Re: SparkContext Threading

2015-06-06 Thread Will Briggs
Hi Lee, it's actually not related to threading at all - you would still have the same problem even if you were using a single thread. See this section (https://spark.apache.org/docs/latest/programming-guide.html#passing-functions-to-spark) of the Spark docs. On June 5, 2015, at 5:12 PM, Lee
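The linked docs section explains that passing a *method* of a class instance into a Spark transformation ships the whole enclosing instance to the executors. The effect can be reproduced with plain JVM serialization, no Spark required - a minimal sketch, with hypothetical names (`Helper`, `canSerialize`):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object PassingFunctions {
  class Helper {                        // deliberately NOT Serializable
    def addOne(x: Int): Int = x + 1
  }

  // Try to serialize a closure the way Spark would before shipping it to executors.
  def canSerialize(f: AnyRef): Boolean =
    try { new ObjectOutputStream(new ByteArrayOutputStream).writeObject(f); true }
    catch { case _: NotSerializableException => false }

  // Eta-expanding a method of an instance closes over that instance.
  def methodRefSerializable: Boolean = canSerialize((new Helper).addOne _)

  // A standalone lambda that references nothing from an enclosing object.
  def standaloneSerializable: Boolean = canSerialize((x: Int) => x + 1)

  def main(args: Array[String]): Unit = {
    println(methodRefSerializable)  // false: drags the Helper instance along
    println(standaloneSerializable) // true: the closure is self-contained
  }
}
```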

Re: SparkContext Threading

2015-06-06 Thread Lee McFadden
Hi Will, That doesn't seem to be the case, and this was part of the source of my confusion. The code currently in the run method of the Runnable works perfectly fine with the lambda expressions when it is invoked from the main method. The lambdas also work when they are invoked from within a separate method

Re: SparkContext Threading

2015-06-06 Thread William Briggs
Hi Lee, I'm stuck with only mobile devices for correspondence right now, so I can't get to a shell to play with this issue - this is all supposition; I think that the lambdas are closing over the context because it's a constructor parameter to your Runnable class, which is why inlining the lambdas
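Will's hypothesis can be reproduced without Spark at all. In the sketch below (all names hypothetical), referencing a constructor parameter inside a lambda makes the compiler keep it as a field, so the closure captures `this` and drags the non-serializable object along; copying only what the lambda needs into a local val first keeps the closure self-contained:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ConstructorCapture {
  class Context  // stand-in for SparkContext: deliberately NOT Serializable

  class Job(ctx: Context, prefix: String) extends Serializable {
    // Mentions `ctx`, so the closure captures `this`, whose ctx field
    // cannot be serialized - the same failure mode as a task closure.
    def badFn: Int => String = n => s"$prefix$n@${ctx.hashCode}"

    // Copy the needed value into a local val; the lambda captures only that.
    def goodFn: Int => String = {
      val p = prefix            // plain String: serializable
      n => s"$p$n"
    }
  }

  def canSerialize(f: AnyRef): Boolean =
    try { new ObjectOutputStream(new ByteArrayOutputStream).writeObject(f); true }
    catch { case _: NotSerializableException => false }

  def main(args: Array[String]): Unit = {
    val job = new Job(new Context, "row-")
    println(canSerialize(job.badFn))   // false
    println(canSerialize(job.goodFn))  // true
  }
}
```

Inlining the lambdas helps for the same reason the local val does: nothing forces the constructor parameter into a captured field.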

SparkContext Threading

2015-06-05 Thread Lee McFadden
Hi all, I'm having some issues finding any kind of best practices when attempting to create Spark applications which launch jobs from a thread pool. Initially I had issues passing the SparkContext to other threads as it is not serializable. Eventually I found that adding the @transient
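Worth noting for the thread-pool part of the question: threads within a single JVM share the heap, so handing a non-serializable object to a thread pool involves no serialization at all. A minimal plain-Scala sketch (class names hypothetical, `Context` standing in for SparkContext):

```scala
import java.util.concurrent.{Callable, Executors}

object ThreadPoolJobs {
  class Context { val name = "shared-context" } // NOT Serializable, like SparkContext

  // Runs n jobs on a pool; every job uses the same Context instance directly.
  // Nothing is serialized: the worker threads share the driver JVM's heap.
  def runJobs(n: Int): List[String] = {
    val ctx  = new Context
    val pool = Executors.newFixedThreadPool(4)
    val futures = (1 to n).toList.map { i =>
      pool.submit(new Callable[String] {
        def call(): String = s"job $i used ${ctx.name}"
      })
    }
    val results = futures.map(_.get())  // block until each job finishes
    pool.shutdown()
    results
  }

  def main(args: Array[String]): Unit = runJobs(3).foreach(println)
}
```

Serialization only enters the picture when an object is captured by a closure that Spark ships to executors, which is a separate issue from the threading itself.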

Re: SparkContext Threading

2015-06-05 Thread Marcelo Vanzin
On Fri, Jun 5, 2015 at 11:48 AM, Lee McFadden splee...@gmail.com wrote: Initially I had issues passing the SparkContext to other threads as it is not serializable. Eventually I found that adding the @transient annotation prevents a NotSerializableException. This is really puzzling. How are

Re: SparkContext Threading

2015-06-05 Thread Lee McFadden
You can see an example of the constructor for the class which executes a job in my opening post. I'm attempting to instantiate and run the class using the code below:

```
val conf = new SparkConf()
  .setAppName(appNameBase.format("Test"))
val connector = CassandraConnector(conf)
```

Re: SparkContext Threading

2015-06-05 Thread Igor Berman
+1 to the question about serialization. SparkContext is still in the driver process (even if it has several threads from which you submit jobs). As for the problem, check your classpath, Scala version, Spark version, etc. Such errors usually happen when there is some conflict in the classpath. Maybe you
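One common way such a conflict arises is bundling a different Spark version into the application jar than the one the cluster runs. A hypothetical sbt fragment (versions shown are assumptions matching the thread's Spark 1.2.1) illustrating the usual guard of marking Spark as `provided`:

```scala
// build.sbt (hypothetical versions)
scalaVersion := "2.10.4"  // must match the Scala version Spark was built with

libraryDependencies ++= Seq(
  // "provided": compiled against, but not bundled into the assembly jar,
  // so the cluster's own Spark jars are the only ones on the runtime classpath
  "org.apache.spark" %% "spark-core" % "1.2.1" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.2.1"
)
```

A `NoSuchMethodError` at task time is the classic symptom of two different versions of the same library meeting on the classpath.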

Re: SparkContext Threading

2015-06-05 Thread Igor Berman
Lee, what cluster do you use? standalone, yarn-cluster, yarn-client, mesos? In yarn-cluster the driver program is executed inside one of the nodes in the cluster, so it might be that the driver code needs to be serialized to be sent to some node. On 5 June 2015 at 22:55, Lee McFadden splee...@gmail.com wrote:

Re: SparkContext Threading

2015-06-05 Thread Marcelo Vanzin
Ignoring the serialization thing (seems like a red herring): On Fri, Jun 5, 2015 at 11:48 AM, Lee McFadden splee...@gmail.com wrote: 15/06/05 11:35:32 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.NoSuchMethodError:

Re: SparkContext Threading

2015-06-05 Thread Lee McFadden
On Fri, Jun 5, 2015 at 12:30 PM Marcelo Vanzin van...@cloudera.com wrote: Ignoring the serialization thing (seems like a red herring): People seem surprised that I'm getting the Serialization exception at all - I'm not convinced it's a red herring per se, but on to the blocking issue...

Re: SparkContext Threading

2015-06-05 Thread Marcelo Vanzin
On Fri, Jun 5, 2015 at 12:55 PM, Lee McFadden splee...@gmail.com wrote: Regarding serialization, I'm still confused as to why I was getting a serialization error in the first place, as I'm executing these Runnable classes from a Java thread pool. I'm fairly new to the Scala/JVM world and there

Re: SparkContext Threading

2015-06-05 Thread Will Briggs
Your lambda expressions on the RDDs in the SecondRollup class are closing around the context, and Spark has special logic to ensure that all variables in a closure used on an RDD are Serializable - I hate linking to Quora, but there's a good explanation here:

Re: SparkContext Threading

2015-06-05 Thread Lee McFadden
On Fri, Jun 5, 2015 at 1:00 PM Igor Berman igor.ber...@gmail.com wrote: Lee, what cluster do you use? standalone, yarn-cluster, yarn-client, mesos? Spark standalone, v1.2.1.

Re: SparkContext Threading

2015-06-05 Thread Lee McFadden
On Fri, Jun 5, 2015 at 12:58 PM Marcelo Vanzin van...@cloudera.com wrote: You didn't show the error so the only thing we can do is speculate. You're probably sending the object that's holding the SparkContext reference over the network at some point (e.g. it's used by a task run in an

Re: SparkContext Threading

2015-06-05 Thread Lee McFadden
On Fri, Jun 5, 2015 at 2:05 PM Will Briggs wrbri...@gmail.com wrote: Your lambda expressions on the RDDs in the SecondRollup class are closing around the context, and Spark has special logic to ensure that all variables in a closure used on an RDD are Serializable - I hate linking to Quora,