Re: Spark Streaming fails - where is the problem?
Update: I can get it to work by disabling iptables temporarily. However, I cannot figure out which port I need to accept traffic on. Neither 4040 nor any of the Master or Worker ports mentioned in the previous post works. Could it be one of the randomly assigned ports in the 30k to 60k range? Those appear to change on every run, which makes it difficult to apply any sensible rules.

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-fails-where-is-the-problem-tp11355p11556.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
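One way to see which ports are actually in play is to list the listening TCP sockets before and after starting the application. The following is a minimal sketch, assuming a Linux host where `/proc/net/tcp` is readable; the function name `listening_ports` and the sample rows are made up for illustration, and JVMs that bind IPv6 sockets would show up in `/proc/net/tcp6` instead (same column layout).

```python
def listening_ports(proc_net_tcp_text):
    """Return the sorted TCP ports in LISTEN state from /proc/net/tcp-style text."""
    ports = set()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        if state == "0A":  # 0A is the kernel's code for the LISTEN state
            # local_address looks like HEXIP:HEXPORT; the port is the last part
            ports.add(int(local_address.rsplit(":", 1)[1], 16))
    return sorted(ports)


# Hypothetical sample in the /proc/net/tcp layout (not taken from the thread).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:0FC8 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 12345
   1: 0100007F:C350 0100007F:0FC8 01 00000000:00000000 00:00000000 00000000  1000        0 12346
   2: 00000000:9C40 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 12347
"""

print(listening_ports(SAMPLE))  # [4040, 40000]
```

On a real machine the input would come from `open("/proc/net/tcp").read()`; comparing the result from before and after launching the job shows which ports the application opened on that run.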
Re: Spark Streaming fails - where is the problem?
Hi Simon,

The drivers and executors currently choose random ports to talk to each other, so the Spark nodes need full TCP access to each other. A very recent commit changes this by making all of these random ports configurable: https://github.com/apache/spark/commit/09f7e4587bbdf74207d2629e8c1314f93d865999

This will be available in Spark 1.1, but for now you will have to open all ports among the nodes in your cluster.

-Andrew

2014-08-06 10:23 GMT-07:00 durin m...@simon-schaefer.net:
> Update: I can get it to work by disabling iptables temporarily. [...]
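Once the ports are configurable, pinning them and opening only those ports could look roughly like the sketch below. The property names are the ones listed in the Spark 1.1 configuration documentation as far as I know and are not quoted in this thread, so treat them as an assumption; the port numbers, the subnet `10.0.0.0/24` and both helper functions are hypothetical.

```python
# Assumed property names (from the Spark 1.1 configuration docs, not from the
# thread); the port numbers are arbitrary placeholders.
FIXED_PORTS = {
    "spark.driver.port": 7001,
    "spark.fileserver.port": 7002,
    "spark.broadcast.port": 7003,
    "spark.replClassServer.port": 7004,
    "spark.blockManager.port": 7005,
    "spark.executor.port": 7006,
}


def spark_defaults_lines(ports):
    """Render 'key value' lines in the format used by conf/spark-defaults.conf."""
    return ["{} {}".format(name, port) for name, port in sorted(ports.items())]


def iptables_rules(ports, source_cidr):
    """Build one ACCEPT rule per pinned port for traffic from the cluster subnet."""
    template = "iptables -A INPUT -p tcp -s {} --dport {} -j ACCEPT"
    return [template.format(source_cidr, port) for port in sorted(set(ports.values()))]


for line in spark_defaults_lines(FIXED_PORTS):
    print(line)
for rule in iptables_rules(FIXED_PORTS, "10.0.0.0/24"):  # hypothetical subnet
    print(rule)
```

The script only prints text; the generated rules still have to be reviewed and applied by hand, and they do nothing on releases where the ports cannot be pinned.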
Re: Spark Streaming fails - where is the problem?
Hi Andrew,

for this test I only have one machine, which provides both the master and the only worker. So all I should need is communication to the Internet to access the Twitter API. I've tried assigning a specific port to the driver and creating iptables rules for this port, but that didn't work.

Best regards,
Simon

On Aug 6, 2014 11:37 AM, "Andrew Or-2 [via Apache Spark User List]" <ml-node+s1001560n11561...@n3.nabble.com> wrote:
> Hi Simon, The drivers and executors currently choose random ports to talk to each other, [...]
Re: Spark Streaming fails - where is the problem?
@Simon: Any progress?

On Tue, Aug 5, 2014 at 12:17 AM, Akhil Das ak...@sigmoidanalytics.com wrote:
> You need to add twitter4j-*-3.0.3.jars to your class path
>
> Thanks
> Best Regards
>
> On Tue, Aug 5, 2014 at 7:18 AM, Tathagata Das tathagata.das1...@gmail.com wrote:
>> Are you able to run it locally? [...]
>>
>> On Mon, Aug 4, 2014 at 5:06 PM, durin m...@simon-schaefer.net wrote:
>>> In the WebUI Environment tab, the section Classpath Entries lists the following ones as part of System Classpath: [...]
Re: Spark Streaming fails - where is the problem?
Using 3.0.3 (downloaded from http://mvnrepository.com/artifact/org.twitter4j ) changes the error to

    Exception in thread Thread-55 java.lang.NoClassDefFoundError: twitter4j/StatusListener
        at org.apache.spark.streaming.twitter.TwitterInputDStream.getReceiver(TwitterInputDStream.scala:55)

It seems yet another version is required. Is there any quick way to find out which? The ScalaDoc for TwitterUtils doesn't seem to mention anything in that direction.
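A quick check for a NoClassDefFoundError like this one is to look inside the jars and see which of them actually ship the missing class. This is a minimal sketch using only the Python standard library; `jars_containing` is a hypothetical helper, and the jar paths you pass in are whatever your deployment uses.

```python
import zipfile


def jars_containing(class_name, jar_paths):
    """Return the jars from jar_paths that contain the given fully qualified class."""
    # twitter4j.StatusListener -> twitter4j/StatusListener.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for path in jar_paths:
        try:
            with zipfile.ZipFile(path) as jar:
                if entry in jar.namelist():
                    hits.append(path)
        except (OSError, zipfile.BadZipFile):
            # Missing or unreadable files are skipped rather than treated as fatal.
            continue
    return hits
```

Calling `jars_containing("twitter4j.StatusListener", paths)` with the jars you deploy shows whether the class is present at all. An empty result means no jar in the list provides it; a non-empty result suggests the jar exists but is not visible to the JVM that threw the error.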
Re: Spark Streaming fails - where is the problem?
3.0.3 is being used: https://github.com/apache/spark/blob/master/external/twitter/pom.xml

Are you sure you are deploying twitter4j 3.0.3, and that there is no other version of twitter4j on the path?

TD

On Mon, Aug 4, 2014 at 4:48 PM, durin m...@simon-schaefer.net wrote:
> Using 3.0.3 (downloaded from http://mvnrepository.com/artifact/org.twitter4j ) changes the error to [...]
Re: Spark Streaming fails - where is the problem?
In the WebUI Environment tab, the section Classpath Entries lists the following ones as part of System Classpath:

/foo/hadoop-2.0.0-cdh4.5.0/etc/hadoop
/foo/spark-master-2014-07-28/assembly/target/scala-2.10/spark-assembly-1.1.0-SNAPSHOT-hadoop2.0.0-cdh4.5.0.jar
/foo/spark-master-2014-07-28/conf
/foo/spark-master-2014-07-28/external/twitter/target/spark-streaming-twitter_2.10-1.1.0-SNAPSHOT.jar
/foo/spark-master-2014-07-28/extrajars/twitter4j-core-3.0.3.jar
/foo/spark-master-2014-07-28/extrajars/twitter4j-stream-3.0.3.jar

So I can't see where any other versions would come from.
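The same check can be automated for a longer classpath. The sketch below groups jar file names by artifact and reports any artifact that appears in more than one version. The file-name pattern (`artifact-version.jar`, with the version starting at the first dash followed by a digit) is an assumption about naming, and `versions_by_artifact` and `conflicts` are hypothetical helpers.

```python
import re
from collections import defaultdict

# Assumed naming scheme: <artifact>-<version>.jar, version starts with a digit.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def versions_by_artifact(classpath_entries):
    """Map each artifact name to the set of versions seen on the classpath."""
    found = defaultdict(set)
    for entry in classpath_entries:
        match = JAR_NAME.match(entry.rsplit("/", 1)[-1])
        if match:  # directories and oddly named files are ignored
            found[match.group("artifact")].add(match.group("version"))
    return dict(found)


def conflicts(classpath_entries):
    """Return only the artifacts that appear in more than one version."""
    grouped = versions_by_artifact(classpath_entries)
    return {name: sorted(vs) for name, vs in grouped.items() if len(vs) > 1}


# The entries listed in the message above.
ENTRIES = [
    "/foo/hadoop-2.0.0-cdh4.5.0/etc/hadoop",
    "/foo/spark-master-2014-07-28/assembly/target/scala-2.10/"
    "spark-assembly-1.1.0-SNAPSHOT-hadoop2.0.0-cdh4.5.0.jar",
    "/foo/spark-master-2014-07-28/conf",
    "/foo/spark-master-2014-07-28/external/twitter/target/"
    "spark-streaming-twitter_2.10-1.1.0-SNAPSHOT.jar",
    "/foo/spark-master-2014-07-28/extrajars/twitter4j-core-3.0.3.jar",
    "/foo/spark-master-2014-07-28/extrajars/twitter4j-stream-3.0.3.jar",
]

print(conflicts(ENTRIES))  # {}
```

For the entries listed here the result is empty, which matches the observation that no second twitter4j version is visible on the system classpath. Classes bundled inside an assembly jar are not detected by a file-name check like this.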
Re: Spark Streaming fails - where is the problem?
Are you able to run it locally? If not, can you try creating an all-inclusive jar with all transitive dependencies together (sbt assembly) and then running the app? That gives a self-contained environment, which will help us debug better.

TD

On Mon, Aug 4, 2014 at 5:06 PM, durin m...@simon-schaefer.net wrote:
> In the WebUI Environment tab, the section Classpath Entries lists the following ones as part of System Classpath: [...]