Re: Spark Streaming fails - where is the problem?

2014-08-06 Thread durin
Update: I can get it to work by temporarily disabling iptables. However, I
cannot figure out which port I need to accept traffic on. Neither 4040 nor
any of the Master or Worker ports mentioned in the previous post works.

Could it be one of the randomly assigned ports in the 30k to 60k range? Those
appear to change every time, which makes it difficult to apply any sensible
rules.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-fails-where-is-the-problem-tp11355p11556.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark Streaming fails - where is the problem?

2014-08-06 Thread Andrew Or
Hi Simon,

The drivers and executors currently choose random ports to talk to each
other, so the Spark nodes need full TCP access to each other. A very recent
commit changes this by making all of these random ports configurable:
https://github.com/apache/spark/commit/09f7e4587bbdf74207d2629e8c1314f93d865999
This will be available in Spark 1.1, but for now you will have to open all
ports among the nodes in your cluster.

-Andrew
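
Once the ports are configurable, a fixed-port setup could look roughly like
the sketch below. It pins a few ports and prints matching spark-defaults.conf
lines and iptables rules. The property names are assumed from the Spark 1.1
configuration documentation and should be checked against the version in use;
the port numbers and the helper names are hypothetical.

```python
# Hypothetical fixed ports; the numbers are arbitrary, chosen only for illustration.
# Property names are assumed from the Spark 1.1 configuration docs; verify them
# against the Spark version you actually run.
FIXED_PORTS = {
    "spark.driver.port": 7001,
    "spark.fileserver.port": 7002,
    "spark.broadcast.port": 7003,
    "spark.blockManager.port": 7005,
}


def spark_defaults_lines(ports):
    """Render 'key value' lines in the format used by conf/spark-defaults.conf."""
    return [f"{key} {port}" for key, port in sorted(ports.items())]


def iptables_rules(ports, chain="INPUT"):
    """Render one iptables ACCEPT rule per distinct TCP port."""
    return [
        f"iptables -A {chain} -p tcp --dport {port} -j ACCEPT"
        for port in sorted(set(ports.values()))
    ]


if __name__ == "__main__":
    print("\n".join(spark_defaults_lines(FIXED_PORTS)))
    print("\n".join(iptables_rules(FIXED_PORTS)))
```

The script only prints text; the rules still have to be reviewed and applied
by hand, and they cover inbound TCP traffic only.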



Re: Spark Streaming fails - where is the problem?

2014-08-06 Thread durin
Hi Andrew,

for this test I only have one machine, which provides both the master and the
only worker. So all I should need is communication to the Internet to access
the Twitter API. I've tried assigning a specific port to the driver and
creating iptables rules for this port, but that didn't work.

Best regards,
Simon

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-fails-where-is-the-problem-tp11355p11566.html

Re: Spark Streaming fails - where is the problem?

2014-08-05 Thread Tathagata Das
@Simon, any progress?

On Tue, Aug 5, 2014 at 12:17 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> You need to add the twitter4j-*-3.0.3.jar files to your classpath.
>
> Thanks
> Best Regards





Re: Spark Streaming fails - where is the problem?

2014-08-04 Thread durin
Using 3.0.3 (downloaded from http://mvnrepository.com/artifact/org.twitter4j)
changes the error to:

Exception in thread "Thread-55" java.lang.NoClassDefFoundError: twitter4j/StatusListener
        at org.apache.spark.streaming.twitter.TwitterInputDStream.getReceiver(TwitterInputDStream.scala:55)

It seems yet another version is required. Is there any quick way to find out
which? The ScalaDoc for TwitterUtils doesn't seem to mention anything in
that direction.
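
One quick way to see which jar, if any, actually provides the missing class
is to scan the jars for the class file named in the error. The sketch below
does this with Python's zipfile module; the function name and the
command-line usage are hypothetical, and it only checks jars it is pointed
at, not what the JVM ends up loading.

```python
import sys
import zipfile
from pathlib import Path


def jars_containing(class_name, jar_paths):
    """Return the jars whose entries include the given class.

    class_name may use the slash form from the error message
    ("twitter4j/StatusListener") or the dotted form.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in jar_paths:
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    hits.append(str(jar))
        except (OSError, zipfile.BadZipFile):
            # Skip missing or unreadable files rather than abort the scan.
            continue
    return hits


# Usage: python find_class.py twitter4j/StatusListener /path/to/jar/directory
if __name__ == "__main__" and len(sys.argv) == 3:
    for hit in jars_containing(sys.argv[1], sorted(Path(sys.argv[2]).glob("*.jar"))):
        print(hit)
```

If no jar is listed, the class is missing from the scanned directory
altogether; if a jar is listed but the error persists, that jar is probably
not reaching the classpath of the process that throws.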



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-fails-where-is-the-problem-tp11355p11387.html



Re: Spark Streaming fails - where is the problem?

2014-08-04 Thread Tathagata Das
Version 3.0.3 is being used:
https://github.com/apache/spark/blob/master/external/twitter/pom.xml

Are you sure you are deploying twitter4j 3.0.3, and that there is no
other version of twitter4j on the path?

TD




Re: Spark Streaming fails - where is the problem?

2014-08-04 Thread durin
In the WebUI "Environment" tab, the "Classpath Entries" section lists the
following as part of the "System Classpath":

/foo/hadoop-2.0.0-cdh4.5.0/etc/hadoop
/foo/spark-master-2014-07-28/assembly/target/scala-2.10/spark-assembly-1.1.0-SNAPSHOT-hadoop2.0.0-cdh4.5.0.jar
/foo/spark-master-2014-07-28/conf
/foo/spark-master-2014-07-28/external/twitter/target/spark-streaming-twitter_2.10-1.1.0-SNAPSHOT.jar
/foo/spark-master-2014-07-28/extrajars/twitter4j-core-3.0.3.jar
/foo/spark-master-2014-07-28/extrajars/twitter4j-stream-3.0.3.jar


So I can't see where any other versions would come from.
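
The same check can be done mechanically. The sketch below groups the jar file
names from the list above by artifact and reports any artifact that appears
with more than one version. The regular expression and function name are
hypothetical, and the check looks only at file names, so it cannot see classes
bundled inside another jar.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Assumes the common "<artifact>-<version>.jar" naming, where the version
# starts with a digit.
JAR_PATTERN = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def conflicting_versions(classpath_entries):
    """Map each artifact to its versions, keeping only artifacts seen with more than one."""
    versions = defaultdict(set)
    for entry in classpath_entries:
        match = JAR_PATTERN.match(PurePosixPath(entry).name)
        if match:  # directories such as .../conf do not match and are ignored
            versions[match.group("artifact")].add(match.group("version"))
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


# The classpath entries exactly as listed above.
ENTRIES = [
    "/foo/hadoop-2.0.0-cdh4.5.0/etc/hadoop",
    "/foo/spark-master-2014-07-28/assembly/target/scala-2.10/spark-assembly-1.1.0-SNAPSHOT-hadoop2.0.0-cdh4.5.0.jar",
    "/foo/spark-master-2014-07-28/conf",
    "/foo/spark-master-2014-07-28/external/twitter/target/spark-streaming-twitter_2.10-1.1.0-SNAPSHOT.jar",
    "/foo/spark-master-2014-07-28/extrajars/twitter4j-core-3.0.3.jar",
    "/foo/spark-master-2014-07-28/extrajars/twitter4j-stream-3.0.3.jar",
]

print(conflicting_versions(ENTRIES))  # → {}
```

An empty result matches the observation above: going by file names, no
artifact appears twice.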



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-fails-where-is-the-problem-tp11355p11391.html



Re: Spark Streaming fails - where is the problem?

2014-08-04 Thread Tathagata Das
Are you able to run it locally? If not, can you try creating an
all-inclusive jar with all transitive dependencies bundled together (sbt
assembly) and then try running the app? That gives us a self-contained
environment, which will help us debug better.

TD

