Re: Issue with Custom Key Class
Does your RDD contain a null key?

On Sat, Nov 8, 2014 at 11:15 AM, Bahubali Jain bahub...@gmail.com wrote:
Hi, I have a custom key class in which equals() and hashCode() have been overridden. I have a JavaPairRDD with this class as the key. When groupByKey() or reduceByKey() is called, a null object is being passed to equals(Object obj), and as a result the grouping fails. Is this a known issue? I am using Spark 0.9. Thanks, Baahu
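For reference, a minimal sketch of a key class with a null-safe equals() and a consistent hashCode(); the class and field names here are hypothetical, not from the original post. A Spark key class should also be Serializable:

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical custom key class. equals() must return false for a null
// argument instead of throwing, and hashCode() must agree with equals();
// groupByKey()/reduceByKey() rely on both contracts.
class CustomKey implements Serializable {
    private final String id;

    CustomKey(String id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false; // null-safe
        CustomKey other = (CustomKey) obj;
        return Objects.equals(id, other.id); // tolerates a null field too
    }

    @Override
    public int hashCode() {
        return Objects.hash(id);
    }
}
```

If equals() throws on a null argument (for example by calling a method on it unconditionally), shuffle-side comparisons can fail in exactly the way described above.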
Re: org/apache/commons/math3/random/RandomGenerator issue
Hi, I'm using breeze.stats.distributions.Binomial with Spark 1.1.0 and having the same error. I tried to add the dependency to math3 with versions 3.11, 3.2, and 3.3, and it didn't help. Any ideas what might be the problem? Thanks, Lev.

anny9699 wrote:
I use breeze.stats.distributions.Bernoulli in my code, however met this problem: java.lang.NoClassDefFoundError: org/apache/commons/math3/random/RandomGenerator
Re: Spark on YARN, ExecutorLostFailure for long running computations in map
So it seems that this problem was related to http://apache-spark-developers-list.1001551.n3.nabble.com/Lost-executor-on-YARN-ALS-iterations-td7916.html and increasing the executor memory worked for me.

__

Hi, I am getting ExecutorLostFailure when I run Spark on YARN and perform very long tasks (a couple of hours) in map. The error log is below. Do you know if it is possible to set something that lets Spark perform these very long-running jobs in map? Thank you very much for any advice. Best regards, Jan

Spark log:
4533,931: [GC 394578K->20882K(1472000K), 0,0226470 secs]
Traceback (most recent call last):
  File "/home/hadoop/spark_stuff/spark_lda.py", line 112, in <module>
    models.saveAsTextFile(sys.argv[1])
  File "/home/hadoop/spark/python/pyspark/rdd.py", line 1324, in saveAsTextFile
    keyed._jrdd.map(self.ctx._jvm.BytesToString()).saveAsTextFile(path)
  File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o36.saveAsTextFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 28 in stage 0.0 failed 4 times, most recent failure: Lost task 28.3 in stage 0.0 (TID 41, ip-172-16-1-90.us-west-2.compute.internal): ExecutorLostFailure (executor lost)
Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
  at scala.Option.foreach(Option.scala:236)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
  at akka.actor.ActorCell.invoke(ActorCell.scala:456)
  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
  at akka.dispatch.Mailbox.run(Mailbox.scala:219)
  at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Yarn log:
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-152.us-west-2.compute.internal:41091 (size: 596.9 KB, free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-152.us-west-2.compute.internal:39160 (size: 596.9 KB, free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-152.us-west-2.compute.internal:45058 (size: 596.9 KB, free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-241.us-west-2.compute.internal:54111 (size: 596.9 KB, free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-238.us-west-2.compute.internal:45772 (size: 596.9 KB, free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-241.us-west-2.compute.internal:59509 (size: 596.9 KB, free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-238.us-west-2.compute.internal:35720 (size: 596.9 KB, free: 775.7 MB)
14/11/08 08:21:11 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,59509)
14/11/08 08:21:11 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,59509)
14/11/08 08:21:11 ERROR network.ConnectionManager: Corresponding SendingConnection to
Debian package for spark?
Are there debian packages for spark? If not I plan on making one… I threw one together in about 20 minutes as they are somewhat easy with maven and jdeb. But of course there are other things I need to install like cassandra support and an init script. So I figured I'd ask here first. If not we will open source our packaging code and put it on github. It's about 50 lines of code :-P

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts
Re: org/apache/commons/math3/random/RandomGenerator issue
lev wrote:
I'm using breeze.stats.distributions.Binomial with spark 1.1.0 and having the same error. I tried to add the dependency to math3 with versions 3.11, 3.2, 3.3 and it didn't help.

I am seeing the same with all of the breeze.stats.distributions classes using any math3 version. I run 'spark-shell --jars commons-math3-3.1.1.jar', import the required classes (org.apache.commons.math3.random.RandomGenerator, etc.), and am unable to create any distributions from breeze despite having just imported the offending class:

java.lang.NoClassDefFoundError: org/apache/commons/math3/random/RandomGenerator
Caused by: java.lang.ClassNotFoundException: org.apache.commons.math3.random.RandomGenerator
Re: Debian package for spark?
Nice! Not sure how I missed that. Building it now. If it has all the init scripts and config in the right place I might use that. I might have to build a cassandra package too which adds cassandra support… I *think* at least. Maybe distribute this .deb with the standard downloads? Kevin

On Sat, Nov 8, 2014 at 11:19 AM, Eugen Cepoi cepoi.eu...@gmail.com wrote:
Yep there is one, have a look here: http://spark.apache.org/docs/latest/building-with-maven.html#building-spark-debian-packages
Re: Debian package for spark?
Looks like it doesn't work:

[ERROR] Failed to execute goal org.vafer:jdeb:0.11:jdeb (default) on project spark-assembly_2.10: Failed to create debian package /Users/burton/Dropbox/projects-macbook-pro-2013-09/spark-1.1.0/assembly/target/spark_1.1.0-${buildNumber}_all.deb: Could not create deb package: Control file descriptor keys are invalid [Version]. The following keys are mandatory [Package, Version, Section, Priority, Architecture, Maintainer, Description]. Please check your pom.xml/build.xml and your control file. -> [Help 1]
Re: Debian package for spark?
OK… here's my version: https://github.com/spinn3r/spark-deb

It's just two files really, so if the standard spark packages get fixed I'll just switch to them. Doesn't look like there's an init script, and the conf isn't in /etc …
Re: org/apache/commons/math3/random/RandomGenerator issue
Hi Lev, I also finally couldn't solve that problem and switched to java.util.Random. Thanks~ Anny
Re: Debian package for spark?
Another note for the official debs: 'spark' is a bad package name because of confusion with the SPARK programming language based on Ada. There are already packages named 'spark' for it, so I named mine 'apache-spark'.
Re: org/apache/commons/math3/random/RandomGenerator issue
This means you haven't actually included commons-math3 in your application. Check the contents of your final app jar, and then go check your build file again.

On Sat, Nov 8, 2014 at 12:20 PM, lev kat...@gmail.com wrote:
I'm using breeze.stats.distributions.Binomial with spark 1.1.0 and having the same error. I tried to add the dependency to math3 with versions 3.11, 3.2, 3.3 and it didn't help.
Re: Debian package for spark?
The building of the Debian package in Spark works just fine for me -- I just did it using a clean check-out of 1.1.1-SNAPSHOT and `mvn -U -Pdeb -DskipTests clean package`. There's likely something else amiss in your build.

Actually, that's not quite true. There is one small problem with the Debian packaging that you should be aware of:
https://issues.apache.org/jira/browse/SPARK-3624
https://github.com/apache/spark/pull/2477#issuecomment-58291272

You should also know there is no such thing as standard Debian packages or official debs for Spark, nor is it likely that there ever will be. What is available was never intended as anything more than a convenient hack (or a starting point for a custom hack) for Spark developers or users who need a way to create a Spark package sufficient to use in a configuration management system or something of that nature. A proper collection of debs that divides Spark up into multiple parts, properly reflects inter-package dependencies, relocates executables, configuration and libraries to conform to the expectations of a larger system, etc. is something that the Apache Spark project does not do, probably won't do, and probably shouldn't do; something like that is better handled by the distributors of OSes or larger software systems like Apache Bigtop.
Does Spark work on multicore systems?
I am a Spark newbie and I use Python (pyspark). I am trying to run a program on a 64-core system, but no matter what I do, it always uses 1 core. It doesn't matter whether I run it using spark-submit --master local[64] run.sh or call x.repartition(64) on an RDD in my code; the Spark program always uses one core. Has anyone had success running Spark programs on multicore processors? Can someone provide a very simple example that properly runs on all cores of a multicore system?
Re: Debian package for spark?
Weird… I'm using a 1.1.0 source tar.gz… but if it's fixed in 1.1.1, that's good.

On Sat, Nov 8, 2014 at 2:08 PM, Mark Hamstra m...@clearstorydata.com wrote:
The building of the Debian package in Spark works just fine for me -- I just did it using a clean check-out of 1.1.1-SNAPSHOT and `mvn -U -Pdeb -DskipTests clean package`. There's likely something else amiss in your build.
Unresolved Attributes
I get an exception when I am trying to run a simple where-clause query. I can see the name attribute is present in the schema, but it still throws the exception.

query = "select name from business where business_id=" + business_id

What am I doing wrong? thx, srinivas

Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 'name, tree:
Project ['name]
 Filter (business_id#1 = 'Ba1hXOqb3Yhix8bhE0k_WQ)
  Subquery business
   SparkLogicalPlan (ExistingRdd [attributes#0,business_id#1,categories#2,city#3,full_address#4,hours#5,latitude#6,longitude#7,name#8,neighborhoods#9,open#10,review_count#11,stars#12,state#13,type#14], MappedRDD[5] at map at JsonRDD.scala:38)
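One common cause of problems with SQL built by string concatenation is a missing pair of single quotes around a string-valued literal; without them the value can be parsed as a column or attribute reference rather than a string. A sketch under that assumption (the method and variable names are illustrative, not from the original post):

```java
// Illustrative only: quote a string-valued id when concatenating it into SQL.
// Real code should escape embedded quotes, or better, use bind parameters.
class QueryBuilder {
    static String byBusinessId(String businessId) {
        // The single quotes around the value make it a SQL string literal.
        return "select name from business where business_id = '" + businessId + "'";
    }
}
```

If the generated string already looks correct, the next thing to check is that the query runs against the same SQLContext on which the table was registered.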
contains in array in Spark SQL
Hi, what would be the syntax to check for a value in an array-typed attribute in my where clause?

select * from business where categories contains 'X' // something like this; is this the right syntax?

attribute: categories, type: Array

thx, srinivas
Re: org/apache/commons/math3/random/RandomGenerator issue
I ran into this problem too, and I know of a workaround but don't exactly know what is happening. The workaround is to explicitly add either the commons-math jar or your application jar (shaded with commons-math) to spark.executor.extraClassPath. My hunch is that this is related to the class loader problem described in [1], where Spark loads breeze at the beginning and then commons-math in the user's jar somehow doesn't get picked up. Thanks, Shivaram

[1] http://apache-spark-user-list.1001560.n3.nabble.com/Native-library-can-not-be-loaded-when-using-Mllib-PCA-td7042.html#a8307
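In spark-defaults.conf, the workaround described above would look something like this (the jar path is illustrative, not from the original post):

```
spark.executor.extraClassPath  /path/to/commons-math3-3.2.jar
```

The same setting can also be passed to spark-submit with --conf, or the application jar can be shaded so that commons-math3 travels with it.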
Weird caching
RDD Name: 8 (http://hadoop-s1.oculus.guest:4042/storage/rdd?id=8)
Storage Level: Memory Deserialized 1x Replicated
Cached Partitions: 426
Fraction Cached: 107%
Size in Memory: 59.7 GB
Size in Tachyon: 0.0 B
Size on Disk: 0.0 B

Anyone understand what it means to have more than 100% of an RDD cached? Thanks, -Nathan
Re: Weird caching
It might mean that some partition was computed on two nodes, because a task for it wasn't able to be scheduled locally on the first node. Did the RDD really have 426 partitions total? You can click on it and see where there are copies of each one. Matei

On Nov 8, 2014, at 10:16 PM, Nathan Kronenfeld nkronenf...@oculusinfo.com wrote:
Anyone understand what it means to have more than 100% of an rdd cached?