Re: Task not serializable?
Hi, I am new to Spark and I encountered this error when I try to map an RDD[A] to RDD[Array[Double]] and then collect the results. A is a custom class that extends Serializable (actually it's just a wrapper class which wraps a few variables that are all serializable). I also tried the KryoSerializer according to this guide http://spark.apache.org/docs/0.8.1/tuning.html and it gave the same error message. Daniel Liu
Re: SequenceFileRDDFunctions cannot be used output of spark package
Hi Sonal, There are no custom objects in saveRDD, it is of type RDD[(String, String)]. Thanks, Pradeep -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SequenceFileRDDFunctions-cannot-be-used-output-of-spark-package-tp250p3508.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
java.lang.ClassNotFoundException - spark on mesos
I am facing different kinds of java.lang.ClassNotFoundException when trying to run Spark on Mesos. One error has to do with org.apache.spark.executor.MesosExecutorBackend. Another has to do with org.apache.spark.serializer.JavaSerializer. I see other people complaining about similar issues. I tried different versions of the Spark distribution - 0.9.0 and 1.0.0-SNAPSHOT - and faced the same problem. I think the reason is related to the error below.

$ jar -xf spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar
java.io.IOException: META-INF/license : could not create directory
        at sun.tools.jar.Main.extractFile(Main.java:907)
        at sun.tools.jar.Main.extract(Main.java:850)
        at sun.tools.jar.Main.run(Main.java:240)
        at sun.tools.jar.Main.main(Main.java:1147)

This error happens with all the jars that I created, but the set of classes already extracted before the failure differs from case to case. If JavaSerializer is not extracted before the extractor hits META-INF/license, then that class is not found during execution. If MesosExecutorBackend is not extracted, then that class shows up in the Mesos slave error logs. Can someone confirm whether this is a valid cause for the problem I am seeing? Any way I can debug this further? — Bharath
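One way to rule the extraction failure in or out is to inspect the assembly without unpacking it at all (the `jar -xf` error is typical of case-insensitive filesystems, where META-INF/LICENSE the file collides with META-INF/license the directory). A jar is just a zip, so the stdlib zipfile module can list its entries; the sketch below builds a tiny stand-in jar in memory, since the real assembly path is specific to the poster's machine:

```python
import io
import zipfile

def classes_in_jar(jar_bytes, *class_names):
    """Report whether each class exists in a jar without extracting it,
    sidestepping filesystem case-collision errors like META-INF/license."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        entries = set(jar.namelist())
    return {c: c.replace(".", "/") + ".class" in entries for c in class_names}

# Tiny stand-in for the spark assembly jar, built in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr("org/apache/spark/serializer/JavaSerializer.class", b"")
    jar.writestr("META-INF/LICENSE", b"...")

found = classes_in_jar(buf.getvalue(),
                       "org.apache.spark.serializer.JavaSerializer",
                       "org.apache.spark.executor.MesosExecutorBackend")
print(found)
# {'org.apache.spark.serializer.JavaSerializer': True,
#  'org.apache.spark.executor.MesosExecutorBackend': False}
```

If both classes are listed in the real assembly, the jar itself is fine and the extraction step (not the build) is the problem.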
Re: java.lang.ClassNotFoundException - spark on mesos
What versions are you running? There is a known protobuf 2.5 mismatch, depending on your versions. Cheers, Tim -- Freedom, Features, Friends, First - Fedora https://fedoraproject.org/wiki/SIGs/bigdata
yarn.application.classpath in yarn-site.xml
Hi, I've just tested Spark in YARN mode, but something made me confused. When I *delete* the yarn.application.classpath configuration in yarn-site.xml, the following command works well.

bin/spark-class org.apache.spark.deploy.yarn.Client --jar examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.0-incubating.jar --class org.apache.spark.examples.SparkPi --args yarn-standalone --num-worker 3

However, when I configure it as follows, yarnAppState always stays in the *ACCEPTED* state and the application shows no sign of stopping.

<property>
  <name>yarn.application.classpath</name>
  <value>$HADOOP_HOME/etc/hadoop/conf,
    $HADOOP_HOME/share/hadoop/common/*,$HADOOP_HOME/share/hadoop/common/lib/*,
    $HADOOP_HOME/share/hadoop/hdfs/*,$HADOOP_HOME/share/hadoop/hdfs/lib/*,
    $HADOOP_HOME/share/hadoop/mapreduce/*,$HADOOP_HOME/share/hadoop/mapreduce/lib/*,
    $HADOOP_HOME/share/hadoop/yarn/*,$HADOOP_HOME/share/hadoop/yarn/lib/*
  </value>
</property>

The Hadoop version is 2.2.0 and the cluster has one master and three workers. Does anyone have ideas about this problem? Thanks, Dan -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/yarn-application-classpath-in-yarn-site-xml-tp3512.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: java.lang.ClassNotFoundException - spark on mesos
I tried 0.9.0 and the latest git tree of Spark. For Mesos, I tried 0.17.0 and the latest git tree. Thanks On 31-Mar-2014, at 7:24 pm, Tim St Clair tstcl...@redhat.com wrote: What versions are you running? There is a known protobuf 2.5 mismatch, depending on your versions.
Best practices: Parallelized write to / read from S3
Howdy-doody, I have a single, very large file sitting in S3 that I want to read in with sc.textFile(). What are the best practices for reading in this file as quickly as possible? How do I parallelize the read as much as possible? Similarly, say I have a single, very large RDD sitting in memory that I want to write out to S3 with RDD.saveAsTextFile(). What are the best practices for writing this file out as quickly as possible? Nick -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Best-practices-Parallelized-write-to-read-from-S3-tp3516.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Shouldn't the UNION of SchemaRDDs produce SchemaRDD ?
* unionAll preserves duplicates vs. union, which does not: This is true; if you want to eliminate duplicate items you should follow the union with a distinct().
* SQL UNION and UNION ALL result in the same output format (i.e., another SQL result) vs. different RDD types here.
* Understanding the existing union contract issue: This may be a class-hierarchy discussion for SchemaRDD, UnionRDD, etc.?
This is unfortunately going to be a limitation of the query DSL since it extends standard RDDs. It is not possible for us to return specialized types from functions that are already defined in RDD (such as union), as the base RDD class has a very opaque notion of schema, and at this point the API for RDDs is very fixed. If you use SQL, however, you will always get back SchemaRDDs.
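The duplicate-handling point can be seen with plain collections: RDD.union, like SQL's UNION ALL, is a bag union, and following it with distinct() is what gives SQL-UNION semantics (plain Python lists stand in for RDDs here):

```python
a = [1, 2, 2, 3]
b = [2, 3, 4]

# RDD.union / SQL UNION ALL: concatenation, duplicates preserved.
union_all = a + b
print(union_all)            # [1, 2, 2, 3, 2, 3, 4]

# SQL UNION semantics: follow the union with a distinct().
union_distinct = sorted(set(union_all))
print(union_distinct)       # [1, 2, 3, 4]
```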
Re: groupBy RDD does not have grouping column ?
This is similar to how SQL works: items in the GROUP BY clause are not included in the output by default. You will need to include 'a in the second parameter list (which is similar to the SELECT clause) as well if you want it included in the output. On Sun, Mar 30, 2014 at 9:52 PM, Manoj Samel manojsamelt...@gmail.com wrote: Hi, If I create groupBy('a)(Sum('b) as 'foo, Sum('c) as 'bar), then the resulting RDD should have 'a, 'foo and 'bar. The result RDD just shows 'foo and 'bar and is missing 'a. Thoughts? Thanks, Manoj
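The SQL analogy can be checked directly with the stdlib sqlite3 module (table and column names here are invented to mirror the question): the grouping column only appears in the result if it is selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [("x", 1, 10), ("x", 2, 20), ("y", 4, 40)])

# GROUP BY alone does not put 'a' in the output...
rows = conn.execute(
    "SELECT sum(b), sum(c) FROM t GROUP BY a ORDER BY a").fetchall()
print(rows)                 # [(3, 30), (4, 40)]

# ...it must be selected explicitly, just as 'a must be added to the
# second parameter list in the Spark DSL.
rows = conn.execute(
    "SELECT a, sum(b), sum(c) FROM t GROUP BY a ORDER BY a").fetchall()
print(rows)                 # [('x', 3, 30), ('y', 4, 40)]
```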
Re: Error in SparkSQL Example
val people: RDD[Person] // An RDD of case class objects, from the first example.

is just a placeholder to avoid cluttering up each example with the same code for creating an RDD. The : RDD[Person] is just there to let you know the expected type of the variable 'people'. Perhaps there is a clearer way to indicate this. As you have realized, using the full line from the first example will allow you to run the rest of them. On Sun, Mar 30, 2014 at 7:31 AM, Manoj Samel manojsamelt...@gmail.com wrote: Hi, On http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html, I am trying to run the code under "Writing Language-Integrated Relational Queries" (I have a 1.0.0 snapshot). I am running into an error on

val people: RDD[Person] // An RDD of case class objects, from the first example.

scala> val people: RDD[Person]
<console>:19: error: not found: type RDD
       val people: RDD[Person]
                   ^
scala> val people: org.apache.spark.rdd.RDD[Person]
<console>:18: error: class $iwC needs to be abstract, since value people is not defined
       class $iwC extends Serializable {
             ^

Any idea what the issue is? Also, it's not clear what the RDD[Person] brings. I can run the DSL without the case class objects RDD...

val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))
val teenagers = people.where('age >= 13).where('age <= 19)

Thanks,
Re: Error in SparkSQL Example
Hi Michael, Thanks for the clarification. My question is about the error above (error: class $iwC needs to be abstract) and what the RDD declaration brings, since I can do the DSL without the people: org.apache.spark.rdd.RDD[Person] declaration. Thanks, On Mon, Mar 31, 2014 at 9:13 AM, Michael Armbrust mich...@databricks.com wrote: val people: RDD[Person] // An RDD of case class objects, from the first example. is just a placeholder to avoid cluttering up each example with the same code for creating an RDD.
Re: Best practices: Parallelized write to / read from S3
Note that you may have minSplits set to more than the number of cores in the cluster, and Spark will just run as many as possible at a time. This is better if certain nodes may be slow, for instance. In general, it is not necessarily the case that doubling the number of cores doing IO will double the throughput, because you could be saturating the throughput with fewer cores. However, S3 is odd in that each connection gets way less bandwidth than your network link can provide, and it does seem to scale linearly with the number of connections. So, yes, taking minSplits up to 4 (or higher) will likely result in a 2x performance improvement. saveAsTextFile() will use as many partitions (aka splits) as the RDD it's being called on. So for instance:

sc.textFile(myInputFile, 15).map(lambda x: x + "!!!").saveAsTextFile(myOutputFile)

will use 15 partitions to read the text file (i.e., up to 15 cores at a time) and then again to save back to S3. On Mon, Mar 31, 2014 at 9:46 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: So setting minSplits (http://spark.incubator.apache.org/docs/latest/api/pyspark/pyspark.context.SparkContext-class.html#textFile) will set the parallelism on the read in SparkContext.textFile(), assuming I have the cores in the cluster to deliver that level of parallelism. And if I don't explicitly provide it, Spark will set minSplits to 2. So for example, say I have a cluster with 4 cores total, and it takes 40 minutes to read a single file from S3 with minSplits at 2. It should take roughly 20 minutes to read the same file if I up minSplits to 4. Did I understand that correctly? RDD.saveAsTextFile() doesn't have an analog to minSplits, so I'm guessing that's not an operation the user can tune.
On Mon, Mar 31, 2014 at 12:29 PM, Aaron Davidson ilike...@gmail.com wrote: Spark will only use each core for one task at a time, so doing sc.textFile(s3 location, num reducers), where you set num reducers to at least as many as the total number of cores in your cluster, is about as fast as you can get out of the box. Same goes for saveAsTextFile.
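Nicholas's back-of-envelope arithmetic can be written down as a crude model: if S3 throughput scales roughly linearly with concurrent connections, read time is inversely proportional to min(splits, cores). This is idealized and ignores skew and per-split overhead:

```python
def ideal_read_minutes(base_minutes, base_splits, splits, cores=4):
    """Idealized model: S3 throughput scales linearly with concurrent
    connections, capped by the number of cores running tasks."""
    effective = min(splits, cores)
    base_effective = min(base_splits, cores)
    return base_minutes * base_effective / effective

# 40 minutes at minSplits=2 on a 4-core cluster...
print(ideal_read_minutes(40, 2, 2))   # 40.0
# ...roughly halves at minSplits=4:
print(ideal_read_minutes(40, 2, 4))   # 20.0
# ...but going past the core count stops helping:
print(ideal_read_minutes(40, 2, 8))   # 20.0
```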
Re: Best practices: Parallelized write to / read from S3
OK sweet. Thanks for walking me through that. I wish this were StackOverflow so I could bestow some nice rep on all you helpful people. On Mon, Mar 31, 2014 at 1:06 PM, Aaron Davidson ilike...@gmail.com wrote: Note that you may have minSplits set to more than the number of cores in the cluster, and Spark will just run as many as possible at a time.
Re: Calling Spark enthusiasts in NYC
How about London? -- Martin Goodson | VP Data Science (0)20 3397 1240 On Mon, Mar 31, 2014 at 6:28 PM, Andy Konwinski andykonwin...@gmail.com wrote: Hi folks, We have seen a lot of community growth outside of the Bay Area and we are looking to help spur even more! For starters, the organizers of the Spark meetups here in the Bay Area want to help anybody that is interested in setting up a meetup in a new city. Some amazing Spark champions have stepped forward in Seattle, Vancouver, Boulder/Denver, and a few other areas already. Right now, we are looking to connect with you Spark enthusiasts in NYC about helping to run an inaugural Spark Meetup in your area. You can reply to me directly if you are interested and I can tell you about all of the resources we have to offer (speakers from the core community, a budget for food, help scheduling, etc.), and let's make this happen! Andy
Re: network wordcount example
Not sure what data you are sending in. You could try calling lines.print() instead, which should just output everything that comes in on the stream, just to test that your socket is receiving what you think you are sending. On Mon, Mar 31, 2014 at 12:18 PM, eric perler ericper...@hotmail.com wrote: Hello, I just started working with Spark today... and I am trying to run the network wordcount example. I created a socket server and client, and I am sending data to the server in an infinite loop. When I run the Spark class, I see this output in the console...

-------------------------------------------
Time: 1396281891000 ms
-------------------------------------------
14/03/31 11:04:51 INFO SparkContext: Job finished: take at DStream.scala:586, took 0.056794606 s
14/03/31 11:04:51 INFO JobScheduler: Finished job streaming job 1396281891000 ms.0 from job set of time 1396281891000 ms
14/03/31 11:04:51 INFO JobScheduler: Total delay: 0.101 s for time 1396281891000 ms (execution: 0.058 s)
14/03/31 11:04:51 INFO TaskSchedulerImpl: Remove TaskSet 3.0 from pool

but I don't see any output from the wordcount operation when I make this call... wordCounts.print(); Any help is greatly appreciated. Thanks in advance
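A common gotcha here is that the streaming receiver expects newline-terminated text; if the custom server never sends a "\n", no lines ever arrive. To test the socket side independently of Spark, a minimal line-emitting server (a stand-in for the poster's custom server, analogous to `nc -lk <port>`) can be sketched with the stdlib:

```python
import socket
import threading

def serve_lines(srv, lines):
    """Accept one client and send newline-terminated lines, the shape
    of server a streaming socket receiver expects to connect to."""
    conn, _ = srv.accept()
    for line in lines:
        conn.sendall((line + "\n").encode())  # the trailing \n matters
    conn.close()
    srv.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))              # ephemeral port
srv.listen(1)
port = srv.getsockname()[1]

t = threading.Thread(target=serve_lines,
                     args=(srv, ["hello world", "hello spark"]))
t.start()

client = socket.create_connection(("127.0.0.1", port))
received = client.makefile().read()
client.close()
t.join()
print(received.splitlines())            # ['hello world', 'hello spark']
```

If a plain client like this sees the lines but the streaming job does not, the problem is on the Spark side; if it does not, the server is not sending what you think.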
Re: Calling Spark enthusiasts in NYC
Responses about London, Montreal/Toronto, DC, Chicago. Great coverage so far, and keep 'em coming! (still looking for an NYC connection) I'll reply to each of you off-list to coordinate next steps for setting up a Spark meetup in your home area. Thanks again, this is super exciting. Andy On Mon, Mar 31, 2014 at 10:42 AM, Anurag Dodeja anu...@anuragdodeja.com wrote: How about Chicago? On Mon, Mar 31, 2014 at 12:38 PM, Nan Zhu zhunanmcg...@gmail.com wrote: Montreal or Toronto? On Mon, Mar 31, 2014 at 1:36 PM, Martin Goodson mar...@skimlinks.com wrote: How about London?
Re: Calling Spark enthusiasts in NYC
We'd love to see a Spark user group in Los Angeles and connect with others working with it here. Ping me if you're in the LA area and use Spark at your company ( ch...@retentionscience.com ). Chris Retention Science call: 734.272.3099 visit: Site | like: Facebook | follow: Twitter On Mar 31, 2014, at 10:42 AM, Anurag Dodeja anu...@anuragdodeja.com wrote: How about Chicago?
Re: java.lang.ClassNotFoundException - spark on mesos
It sounds like the protobuf issue. So FWIW, you might want to try updating the 0.9.0 w/ pom mods for mesos protobuf: mesos 0.17.0, protobuf 2.5. Cheers, Tim ----- Original Message ----- From: Bharath Bhushan manku.ti...@outlook.com To: user@spark.apache.org Sent: Monday, March 31, 2014 9:46:32 AM Subject: Re: java.lang.ClassNotFoundException - spark on mesos I tried 0.9.0 and the latest git tree of spark. For mesos, I tried 0.17.0 and the latest git tree. Thanks
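The kind of pom change being suggested might look like the fragment below (the version numbers come from Tim's message; the exact coordinates and where they live in the Spark build are assumptions, so treat this as a sketch rather than a tested recipe):

```xml
<!-- spark pom.xml: align mesos and protobuf with the cluster -->
<dependency>
  <groupId>org.apache.mesos</groupId>
  <artifactId>mesos</artifactId>
  <version>0.17.0</version>
</dependency>
<dependency>
  <groupId>com.google.protobuf</groupId>
  <artifactId>protobuf-java</artifactId>
  <version>2.5.0</version>
</dependency>
```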
how spark dstream handles congestion?
Dear list, I was wondering how Spark handles congestion when the upstream is generating dstreams faster than downstream workers can handle? Thanks -Mo
Re: how spark dstream handles congestion?
Thanks -Mo 2014-03-31 13:16 GMT-05:00 Evgeny Shishkin itparan...@gmail.com: It will eventually OOM.
Re: Calling Spark enthusiasts in NYC
Nicholas, I'm in Boston and would be interested in a Spark group. Not sure if you know this -- there was a meetup that never got off the ground. Anyway, I'd be +1 for attending. Not sure what is involved in organizing. Seems a shame that a city like Boston doesn't have one. On Mon, Mar 31, 2014 at 2:02 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: As in, I am interested in helping organize a Spark meetup in the Boston area. On Mon, Mar 31, 2014 at 2:00 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Well, since this thread has played out as it has, lemme throw in a shout-out for Boston.
Calling Spahk enthusiasts in Boston
My fellow Bostonians and New Englanders, We cannot allow New York to beat us to having a banging Spark meetup. Respond to me (and I guess also Andy?) if you are interested. Yana, I'm not sure either what is involved in organizing, but we can figure it out. I didn't know about the meetup that never took off. Nick On Mon, Mar 31, 2014 at 2:31 PM, Yana Kadiyska yana.kadiy...@gmail.com wrote: Nicholas, I'm in Boston and would be interested in a Spark group. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Calling-Spahk-enthusiasts-in-Boston-tp3544.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Calling Spahk enthusiasts in Boston
I would offer to host one in Cape Town, but we're almost certainly the only Spark users in the country, apart from perhaps one in Johannesburg :) — Sent from Mailbox for iPhone On Mon, Mar 31, 2014 at 8:53 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: My fellow Bostonians and New Englanders, We cannot allow New York to beat us to having a banging Spark meetup.
Re: Calling Spark enthusiasts in NYC
Happy to help with an NYC meetup (just emailed Andy). I recently moved to VA, but am back in NYC quite often, and have been turning several computational people at Columbia / NYU / Simons Foundation onto Spark; there'd definitely be interest in those communities. -- Jeremy - jeremy freeman, phd neuroscientist @thefreemanlab On Mar 31, 2014, at 2:31 PM, Yana Kadiyska yana.kadiy...@gmail.com wrote: Nicholas, I'm in Boston and would be interested in a Spark group.
Re: Calling Spark enthusiasts in NYC
Also in NYC, definitely interested in a spark meetup! Sent from my iPhone On Mar 31, 2014, at 3:07 PM, Jeremy Freeman freeman.jer...@gmail.com wrote: Happy to help with an NYC meet up (just emailed Andy). I recently moved to VA, but am back in NYC quite often, and have been turning several computational people at Columbia / NYU / Simons Foundation onto Spark; there'd definitely be interest in those communities. -- Jeremy [...]
Re: Calling Spark enthusiasts in NYC
If you have any questions on helping to get a Spark Meetup off the ground, please do not hesitate to ping me (denny.g@gmail.com). I helped jump start the one here in Seattle (and tangentially have been helping the Vancouver and Denver ones as well). HTH! On March 31, 2014 at 12:35:38 PM, Patrick Grinaway (pgrina...@gmail.com) wrote: Also in NYC, definitely interested in a spark meetup! [...]
Re: java.lang.ClassNotFoundException - spark on mesos
Your suggestion took me past the ClassNotFoundException. I then hit an akka.actor.ActorNotFound exception. I patched PR 568 into my 0.9.0 spark codebase and everything worked. So thanks a lot, Tim. Is there a JIRA/PR for the protobuf issue? Why is it not fixed in the latest git tree? Thanks. On 31-Mar-2014, at 11:30 pm, Tim St Clair tstcl...@redhat.com wrote: It sounds like the protobuf issue. So FWIW, you might want to try updating the 0.9.0 build with pom mods for the mesos and protobuf versions: mesos 0.17.0, protobuf 2.5. Cheers, Tim - Original Message - From: Bharath Bhushan manku.ti...@outlook.com To: user@spark.apache.org Sent: Monday, March 31, 2014 9:46:32 AM Subject: Re: java.lang.ClassNotFoundException - spark on mesos I tried 0.9.0 and the latest git tree of spark. For mesos, I tried 0.17.0 and the latest git tree. Thanks On 31-Mar-2014, at 7:24 pm, Tim St Clair tstcl...@redhat.com wrote: What versions are you running? There is a known protobuf 2.5 mismatch, depending on your versions. Cheers, Tim - Original Message - From: Bharath Bhushan manku.ti...@outlook.com To: user@spark.apache.org Sent: Monday, March 31, 2014 8:16:19 AM Subject: java.lang.ClassNotFoundException - spark on mesos I am facing different kinds of java.lang.ClassNotFoundException when trying to run spark on mesos. One error has to do with org.apache.spark.executor.MesosExecutorBackend. Another has to do with org.apache.spark.serializer.JavaSerializer. I see other people complaining about similar issues. I tried with different versions of the Spark distribution - 0.9.0 and 1.0.0-SNAPSHOT - and faced the same problem. I think the reason for this is related to the error below. 
$ jar -xf spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar
java.io.IOException: META-INF/license : could not create directory
    at sun.tools.jar.Main.extractFile(Main.java:907)
    at sun.tools.jar.Main.extract(Main.java:850)
    at sun.tools.jar.Main.run(Main.java:240)
    at sun.tools.jar.Main.main(Main.java:1147)
This error happens with all the jars that I created, but the set of classes already extracted before the failure differs from run to run. If JavaSerializer is not extracted before the extraction hits META-INF/license, then that class is not found during execution. If MesosExecutorBackend is not extracted, then that class shows up in the mesos slave error logs. Can someone confirm if this is a valid cause for the problem I am seeing? Any way I can debug this further? — Bharath -- Cheers, Tim Freedom, Features, Friends, First - Fedora https://fedoraproject.org/wiki/SIGs/bigdata
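Tim's suggested pom mods (mesos 0.17.0, protobuf 2.5) might look like the sketch below. The property names are assumptions about the top-level pom.xml of the Spark 0.9 tree, so verify them against your checkout before rebuilding the assembly:

```xml
<!-- Hedged sketch: bump the Mesos and protobuf versions in the
     top-level pom.xml before rebuilding. The property keys here are
     assumptions; check your checkout's pom.xml for the exact names. -->
<properties>
  <mesos.version>0.17.0</mesos.version>
  <protobuf.version>2.5.0</protobuf.version>
</properties>
```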
Re: Using ProtoBuf 2.5 for messages with Spark Streaming
Spark now shades its own protobuf dependency, so protobuf 2.4.1 shouldn't be getting pulled in unless you are directly using akka yourself. Are you? Does your project have other dependencies that might be indirectly pulling in protobuf 2.4.1? It would be helpful if you could list all of your dependencies, including the exact Spark version and other libraries. - Patrick On Sun, Mar 30, 2014 at 10:03 PM, Vipul Pandey vipan...@gmail.com wrote: I'm using ScalaBuff (which depends on protobuf 2.5) and facing the same issue. Any word on this one? On Mar 27, 2014, at 6:41 PM, Kanwaldeep kanwal...@gmail.com wrote: We are using Protocol Buffer 2.5 to send messages to Spark Streaming 0.9 with a Kafka stream setup. I have Protocol Buffer 2.5 as part of the uber jar deployed on each of the spark worker nodes. The message is compiled using 2.5, but at runtime it is being de-serialized by 2.4.1, as I'm getting the following exception:
java.lang.VerifyError (java.lang.VerifyError: class com.snc.sinet.messages.XServerMessage$XServer overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;)
    java.lang.ClassLoader.defineClass1(Native Method)
    java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
    java.lang.ClassLoader.defineClass(ClassLoader.java:615)
    java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
Any suggestions on how I could still use ProtoBuf 2.5? Based on SPARK-995 (https://spark-project.atlassian.net/browse/SPARK-995) we should be able to use a different version of protobuf in the application. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-ProtoBuf-2-5-for-messages-with-Spark-Streaming-tp3396.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
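Patrick's question about other dependencies indirectly pulling in protobuf 2.4.1 can be answered by inspecting the dependency tree. A sketch, assuming a Maven-built project (for sbt builds, a dependency-graph plugin gives a similar view):

```shell
# Print only the branches of the dependency tree that involve protobuf,
# to spot a transitive 2.4.1 sitting alongside the intended 2.5.0.
# Run from the root of the Maven project in question.
mvn dependency:tree -Dincludes=com.google.protobuf:protobuf-java
```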
Calling Spark enthusiasts in Austin, TX
In the spirit of everything being bigger and better in TX ;) - if anyone is in Austin and interested in meeting up over Spark, contact me! There seems to be a Spark meetup group in Austin that has never met, and my initial email to organize the first gathering was never acknowledged. Ognen On 3/31/14, 2:01 PM, Nick Pentreath wrote: I would offer to host one in Cape Town but we're almost certainly the only Spark users in the country apart from perhaps one in Johannesburg :) — Sent from Mailbox for iPhone On Mon, Mar 31, 2014 at 8:53 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: My fellow Bostonians and New Englanders, We cannot allow New York to beat us to having a banging Spark meetup. Respond to me (and I guess also Andy?) if you are interested. Yana, I'm not sure either what is involved in organizing, but we can figure it out. I didn't know about the meetup that never took off. Nick On Mon, Mar 31, 2014 at 2:31 PM, Yana Kadiyska [hidden email] wrote: Nicholas, I'm in Boston and would be interested in a Spark group. Not sure if you know this -- there was a meetup that never got off the ground. Anyway, I'd be +1 for attending. Not sure what is involved in organizing. Seems a shame that a city like Boston doesn't have one. On Mon, Mar 31, 2014 at 2:02 PM, Nicholas Chammas [hidden email] wrote: As in, I am interested in helping organize a Spark meetup in the Boston area. On Mon, Mar 31, 2014 at 2:00 PM, Nicholas Chammas [hidden email] wrote: Well, since this thread has played out as it has, lemme throw in a shout-out for Boston. 
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Calling-Spahk-enthusiasts-in-Boston-tp3544.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: network wordcount example
@eric - i saw this exact issue recently while working on the KinesisWordCount. are you passing local[2] to your example as the MASTER arg versus just local or local[1]? you need at least 2. it's documented as n > 1 in the scala source docs - which is easy to mistake for n = 1. i just ran the NetworkWordCount sample and confirmed that local[1] does not work, but local[2] does work. give that a whirl. -chris On Mon, Mar 31, 2014 at 10:41 AM, Diana Carroll dcarr...@cloudera.com wrote: Not sure what data you are sending in. You could try calling lines.print() instead, which should just output everything that comes in on the stream. Just to test that your socket is receiving what you think you are sending. On Mon, Mar 31, 2014 at 12:18 PM, eric perler ericper...@hotmail.com wrote: Hello i just started working with spark today... and i am trying to run the wordcount network example i created a socket server and client.. and i am sending data to the server in an infinite loop when i run the spark class.. i see this output in the console...
--- Time: 1396281891000 ms ---
14/03/31 11:04:51 INFO SparkContext: Job finished: take at DStream.scala:586, took 0.056794606 s
14/03/31 11:04:51 INFO JobScheduler: Finished job streaming job 1396281891000 ms.0 from job set of time 1396281891000 ms
14/03/31 11:04:51 INFO JobScheduler: Total delay: 0.101 s for time 1396281891000 ms (execution: 0.058 s)
14/03/31 11:04:51 INFO TaskSchedulerImpl: Remove TaskSet 3.0 from pool
but i don't see any output from the wordcount operation when i make this call... wordCounts.print(); any help is greatly appreciated thanks in advance
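Chris's local[2] point can be checked from two terminals with the stock example. A sketch following the Spark 0.9 streaming docs; the run-example path is relative to the Spark distribution root, so adjust it to your layout:

```shell
# Terminal 1: feed text into port 9999 for the example to consume.
nc -lk 9999

# Terminal 2: run NetworkWordCount with at least two local threads --
# one thread runs the socket receiver, the other processes batches.
# With plain "local" or local[1] the receiver occupies the only thread,
# so wordCounts.print() never produces output.
./bin/run-example org.apache.spark.streaming.examples.NetworkWordCount local[2] localhost 9999
```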
Re: java.lang.ClassNotFoundException - spark on mesos
I was talking about the protobuf version issue as not fixed. I could not find any reference to the problem or the fix. Regarding SPARK-1052, I could pull the fix into my 0.9.0 tree (from the tarball on the website) and I see the fix in the latest git. Thanks On 01-Apr-2014, at 3:28 am, deric barton.to...@gmail.com wrote: Which repository do you use? The issue should be fixed in 0.9.1 and 1.0.0: https://spark-project.atlassian.net/browse/SPARK-1052 There's an old repository https://github.com/apache/incubator-spark and as Spark became a top-level project, it was moved to the new repo: https://github.com/apache/spark The 0.9.1 version hasn't been released yet, so you should get it from the new git repo. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-tp3510p3551.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Calling Spark enthusiasts in NYC
Hi Andy, I would be interested in setting up a meetup in Delhi/NCR, India. Can you please let me know how to go about organizing it? Best Regards, Sonal Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Tue, Apr 1, 2014 at 10:04 AM, giive chen thegi...@gmail.com wrote: Hi Andy We are from Taiwan. We are already planning to have a Spark meetup. We already have some resources like a venue and a food budget, but we do need some other resources. Please contact me offline. Thanks Wisely Chen On Tue, Apr 1, 2014 at 1:28 AM, Andy Konwinski andykonwin...@gmail.com wrote: Hi folks, We have seen a lot of community growth outside of the Bay Area and we are looking to help spur even more! [...]
Re: java.lang.ClassNotFoundException - spark on mesos
Another problem I noticed is that the current 1.0.0 git tree still gives me the ClassNotFoundException. I see that SPARK-1052 is already fixed there. I then modified the pom.xml for mesos and protobuf and that still gave the ClassNotFoundException. I also tried modifying the pom.xml only for mesos and that fails too. So I have no way of running the 1.0.0 git tree of spark on mesos yet. Thanks. On 01-Apr-2014, at 3:28 am, deric barton.to...@gmail.com wrote: Which repository do you use? The issue should be fixed in 0.9.1 and 1.0.0: https://spark-project.atlassian.net/browse/SPARK-1052 There's an old repository https://github.com/apache/incubator-spark and as Spark became a top-level project, it was moved to the new repo: https://github.com/apache/spark The 0.9.1 version hasn't been released yet, so you should get it from the new git repo. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-tp3510p3551.html Sent from the Apache Spark User List mailing list archive at Nabble.com.