Iterative Algorithms with Spark Streaming
I wanted to ask a basic question about the types of algorithms that can be applied to a DStream with Spark Streaming. With Spark it is possible to perform iterative computations on RDDs, as in the gradient descent example:

  val points = spark.textFile(...).map(parsePoint).cache()
  var w = Vector.random(D) // current separating plane
  for (i <- 1 to ITERATIONS) {
    val gradient = points.map(p =>
      (1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y * p.x
    ).reduce(_ + _)
    w -= gradient
  }

This has a global state w that is updated after each iteration, and the updated value is then used in the next iteration. My question is whether this type of algorithm is possible if the points variable were a DStream instead of an RDD. It seems like you could perform the same map as above, which would create a gradient DStream, and also use updateStateByKey to create a DStream for the w variable. But the problem is that there doesn't seem to be a way to reuse the w DStream inside the map. I don't think it is possible for DStreams to communicate this way. Am I correct that this is not possible with DStreams, or am I missing something?

Note: The reason I ask this question is that many machine learning algorithms are trained by stochastic gradient descent (SGD). SGD is similar to the above gradient descent algorithm except that each iteration operates on a new mini-batch of data points rather than on the same data points in every iteration. It seems like Spark Streaming provides a natural way to stream in these mini-batches (as RDDs), but if it is not able to keep track of an updating global state variable then I don't think Spark Streaming can be used for SGD.

Thanks,

Alex
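For what it's worth, here is a sketch of the kind of thing I am after, written as if the weights lived in a driver-side variable updated from foreachRDD (foreachRDD runs its function on the driver, so a local var can serve as the global state). This is my own illustration, not an existing API; the names, dimensions, and input format are all assumptions:

```scala
import breeze.linalg.DenseVector
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative data type: label y and feature vector x.
case class Point(y: Double, x: DenseVector[Double])

object StreamingSGDSketch extends App {
  def parsePoint(line: String): Point = {
    val values = line.split(" ").map(_.toDouble)
    Point(values.head, DenseVector(values.tail))
  }

  val conf = new SparkConf().setMaster("local[2]").setAppName("sgd-sketch")
  val ssc  = new StreamingContext(conf, Seconds(5))

  var w = DenseVector.rand(10) // global state, lives on the driver

  // Each streaming batch plays the role of one SGD mini-batch.
  ssc.textFileStream("/tmp/points").map(parsePoint).foreachRDD { rdd =>
    if (rdd.count() > 0) {
      val wLocal = w // snapshot the current weights into the closure
      val gradient = rdd.map { p =>
        p.x * ((1.0 / (1.0 + math.exp(-p.y * (wLocal dot p.x))) - 1.0) * p.y)
      }.reduce(_ + _)
      w -= gradient // driver-side update, visible to the next mini-batch
    }
  }

  ssc.start()
  ssc.awaitTermination()
}
```

The point of the sketch is only that the update happens on the driver between batches, rather than through a w DStream, which sidesteps the DStream-to-DStream communication problem described above.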
word2vec more distributed
I was wondering if there was any chance of getting a more distributed word2vec implementation. I seem to be running out of memory from big local data structures such as

  val syn1Global = new Array[Float](vocabSize * vectorSize)

Is there any chance of getting a version where these are all put in RDDs?

Thanks,
Using a RowMatrix inside a map
I am working with a RowMatrix, and I noticed in the multiply() method that the local matrix with which it is being multiplied is distributed to all of the rows of the RowMatrix. If that is the case, then is it impossible to multiply a RowMatrix within a map operation, since this would essentially be creating RDDs within RDDs? For example, suppose you had an RDD of local matrices and you wanted to perform a map operation in which each local matrix is multiplied with a distributed matrix. This does not seem possible, since it would require distributing each local matrix inside the map when the multiplication occurs (i.e. creating an RDD in each element of the original RDD). If this is true, does it mean you can only multiply a RowMatrix from the driver, i.e. you cannot parallelize RowMatrix multiplications?

Thanks,

Alex
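To make concrete what I mean about the local matrix being distributed, my understanding is that A * B follows roughly the pattern below (a sketch using Breeze types for illustration, not MLlib's actual code). The broadcast needs the SparkContext, which is why this can only be initiated from the driver and never inside another RDD's map:

```scala
import breeze.linalg.{DenseMatrix, DenseVector}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Sketch of the pattern behind RowMatrix.multiply: broadcast the small
// local matrix B to every executor and map over the distributed rows
// of A. Row i of (A * B) equals B^T * a_i, where a_i is row i of A.
def multiplySketch(sc: SparkContext,
                   aRows: RDD[DenseVector[Double]],
                   b: DenseMatrix[Double]): RDD[DenseVector[Double]] = {
  val bBc = sc.broadcast(b) // requires driver-side SparkContext access
  aRows.map(row => bBc.value.t * row)
}
```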
RowMatrix multiplication
I have a RowMatrix on which I want to perform two multiplications. The first is a right multiplication with a local matrix, which is fine. But after that I also wish to right-multiply the transpose of my RowMatrix with a different local matrix. I understand that there is no functionality to transpose a RowMatrix at this time, but I was wondering if anyone could suggest any kind of work-around for this. I had thought that I might be able to initially create two RowMatrices - a normal version and a transposed version - and use either when appropriate. Can anyone think of another alternative?

Thanks,

Alex
Re: RowMatrix multiplication
That's not quite what I'm looking for. Let me provide an example. I have a RowMatrix A that is n x m, and I have two local matrices b and c, where b is m x 1 and c is n x 1. In my Spark job I wish to perform the following two computations:

  A * b
  A^T * c

I don't think this is possible without being able to transpose a RowMatrix. Am I correct?

Thanks,

Alex

From: Reza Zadeh r...@databricks.com
Sent: Monday, January 12, 2015 1:58 PM
To: Alex Minnaar
Cc: u...@spark.incubator.apache.org
Subject: Re: RowMatrix multiplication

As you mentioned, you can perform A * b, where A is a RowMatrix and b is a local matrix. From your email, I figure you want to compute b * A^T. To do this, you can compute C = A * b^T, whose result is the transpose of what you were looking for, i.e. C^T = b * A^T. To undo the transpose, you would have to transpose C manually yourself. Be careful though, because the result might not have each row fit in memory on a single machine, which is what RowMatrix requires. This danger is why we didn't provide a transpose operation in RowMatrix natively. To address this and more, there is an effort to provide more comprehensive linear algebra through block matrices, which will likely make it into 1.3: https://issues.apache.org/jira/browse/SPARK-3434

Best,

Reza
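As a sanity check on the identity Reza mentions - that C = A * b^T satisfies C^T = b * A^T - here is a tiny local verification using Breeze, with no Spark involved (the matrix values are arbitrary):

```scala
import breeze.linalg.DenseMatrix

// Local check of the transpose trick: (A * b^T)^T == b * A^T.
val a = DenseMatrix((1.0, 2.0), (3.0, 4.0), (5.0, 6.0)) // 3 x 2
val b = DenseMatrix((7.0, 8.0))                          // 1 x 2
val c = a * b.t                                          // 3 x 1
assert(c.t == b * a.t)                                   // both are 1 x 3
```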
Re: RowMatrix multiplication
Good idea! Join each element of c with the corresponding row of A, multiply through, then reduce. I'll give this a try.

Thanks,

Alex

From: Reza Zadeh r...@databricks.com
Sent: Monday, January 12, 2015 3:05 PM
To: Alex Minnaar
Cc: u...@spark.incubator.apache.org
Subject: Re: RowMatrix multiplication

Yes, you are correct: to do it with existing operations you would need a transpose on RowMatrix. However, you can fairly easily perform the operation manually by doing a join (if the c vector is an RDD) or by broadcasting c (if the c vector is small enough to fit in memory on a single machine).
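For anyone finding this thread later, here is a sketch of the broadcast variant (my own illustration, untested): since row i of A contributes c(i) * a_i to A^T * c, you can broadcast c, scale each row by its entry, and reduce. It assumes the RDD's rows come back in their original order from zipWithIndex, so that index i lines up with c(i):

```scala
import breeze.linalg.DenseVector
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Sketch: compute A^T * c without transposing A.
// A is stored as an RDD of rows; c is a small local vector.
// A^T * c = sum over i of c(i) * a_i, where a_i is row i of A.
def atTimesC(sc: SparkContext,
             rows: RDD[DenseVector[Double]],
             c: DenseVector[Double]): DenseVector[Double] = {
  val cBc = sc.broadcast(c)
  rows.zipWithIndex()
    .map { case (row, i) => row * cBc.value(i.toInt) } // scale row i by c(i)
    .reduce(_ + _)                                     // sum the scaled rows
}
```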
Re: what is the best way to implement mini batches?
I am trying to do the same thing and am also wondering what the best strategy is.

Thanks

From: ll duy.huynh@gmail.com
Sent: Wednesday, December 3, 2014 10:28 AM
To: u...@spark.incubator.apache.org
Subject: what is the best way to implement mini batches?

hi. what is the best way to pass through a large dataset in small, sequential mini batches? for example, with 1,000,000 data points and a mini batch size of 10, we would need to do some computation at each of these mini batches: (0..9), (10..19), (20..29), ... (N-9..N). would RDD.repartition(N/10).mapPartitions() work? thanks!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/what-is-the-best-way-to-implement-mini-batches-tp20264.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
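One possible strategy (my own sketch, untested): key each element by a mini-batch id derived from its position, then iterate over the batches with a filter per batch. All names here are illustrative:

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Sketch: process an RDD in sequential mini-batches. Each element gets
// a batch id from its position; each pass filters out one batch.
// Simple, but costs one pass over the (cached) indexed RDD per batch.
def forEachMiniBatch[T: ClassTag](data: RDD[T], batchSize: Int)
                                 (f: RDD[T] => Unit): Unit = {
  val indexed = data.zipWithIndex()
    .map { case (x, i) => (i / batchSize, x) } // (batch id, element)
    .persist(StorageLevel.MEMORY_AND_DISK)
  val numBatches = (indexed.count() + batchSize - 1) / batchSize
  for (b <- 0L until numBatches) {
    f(indexed.filter { case (id, _) => id == b }.map(_._2))
  }
  indexed.unpersist()
}
```

The repartition(N/10).mapPartitions() idea from the original question avoids the per-batch pass, but the partitions would then be processed independently rather than strictly one after another, which matters if each mini-batch's computation depends on the previous batch's result.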
RE: Example standalone app error!
I think this is the problem. I was working in a project that inherited some other Akka dependencies (of a different version). I'm switching to a fresh new project, which should solve the problem.

Thanks,

Alex

From: Tathagata Das tathagata.das1...@gmail.com
Sent: Thursday, July 31, 2014 8:36 PM
To: user@spark.apache.org
Subject: Re: Example standalone app error!

When are you guys getting the error? When the SparkContext is created, or when it is being shut down? If this error is thrown when the SparkContext is created, then one possible reason may be conflicting versions of Akka. Spark depends on a version of Akka that is different from Scala's, and launching a Spark app using the Scala command (instead of Java) can cause issues.

TD

On Thu, Jul 31, 2014 at 6:30 AM, Alex Minnaar aminn...@verticalscope.com wrote:

I am eager to solve this problem, so if anyone has any suggestions I would be glad to hear them.

Thanks,

Alex

From: Andrew Or and...@databricks.com
Sent: Tuesday, July 29, 2014 4:53 PM
To: user@spark.apache.org
Subject: Re: Example standalone app error!

Hi Alex,

Very strange. This error occurs when someone tries to call an abstract method. I have run into this before and resolved it with an SBT clean followed by an assembly, so maybe you could give that a try.
Let me know if that fixes it,

Andrew

2014-07-29 13:01 GMT-07:00 Alex Minnaar aminn...@verticalscope.com:

I am trying to run an example Spark standalone app with the following code ...
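For anyone hitting the same thing: the underlying issue was a second Akka version on the classpath. A build.sbt sketch of the idea (the version numbers and the "com.example" dependency are purely illustrative, not prescriptive):

```scala
// build.sbt (sketch): keep Spark's own Akka and exclude any Akka pulled
// in transitively by other dependencies, so only one version is loaded.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.0.1",
  "org.apache.spark" %% "spark-streaming" % "1.0.1",
  // hypothetical other dependency that also drags in Akka:
  ("com.example" %% "some-akka-library" % "0.1")
    .exclude("com.typesafe.akka", "akka-actor_2.10")
)
```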
RE: Example standalone app error!
Hi Andrew,

I'm not sure why an assembly would help (I don't have the Spark source code; I have just included Spark core and Spark streaming in the dependencies in my build file). I did try it, though, and the error is still occurring. I have also tried cleaning and refreshing SBT many times. Any other ideas?

Thanks,

Alex

From: Andrew Or and...@databricks.com
Sent: Tuesday, July 29, 2014 4:53 PM
To: user@spark.apache.org
Subject: Re: Example standalone app error!

Hi Alex,

Very strange. This error occurs when someone tries to call an abstract method. I have run into this before and resolved it with an SBT clean followed by an assembly, so maybe you could give that a try.

Let me know if that fixes it,

Andrew

2014-07-29 13:01 GMT-07:00 Alex Minnaar aminn...@verticalscope.com:

I am trying to run an example Spark standalone app with the following code ...
Example standalone app error!
I am trying to run an example Spark standalone app with the following code:

  import org.apache.spark.streaming._
  import org.apache.spark.streaming.StreamingContext._

  object SparkGensimLDA extends App {
    val ssc = new StreamingContext("local", "testApp", Seconds(5))
    val lines = ssc.textFileStream("/.../spark_example/")
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }

However I am getting the following error:

15:35:40.170 [spark-akka.actor.default-dispatcher-2] ERROR akka.actor.ActorSystemImpl - Uncaught fatal error from thread [spark-akka.actor.default-dispatcher-3] shutting down ActorSystem [spark]
java.lang.AbstractMethodError: null
	at akka.actor.ActorCell.create(ActorCell.scala:580) ~[akka-actor_2.10-2.3.2.jar:na]
	at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456) ~[akka-actor_2.10-2.3.2.jar:na]
	at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) ~[akka-actor_2.10-2.3.2.jar:na]
	at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) ~[akka-actor_2.10-2.3.2.jar:na]
	at akka.dispatch.Mailbox.run(Mailbox.scala:219) ~[akka-actor_2.10-2.3.2.jar:na]
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) [akka-actor_2.10-2.3.2.jar:na]
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [scala-library-2.10.4.jar:na]
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [scala-library-2.10.4.jar:na]
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [scala-library-2.10.4.jar:na]
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [scala-library-2.10.4.jar:na]
15:35:40.171 [spark-akka.actor.default-dispatcher-2] ERROR akka.actor.ActorSystemImpl - Uncaught fatal error from thread [spark-akka.actor.default-dispatcher-4] shutting down ActorSystem [spark]
java.lang.AbstractMethodError: null
	at akka.actor.ActorCell.create(ActorCell.scala:580) ~[akka-actor_2.10-2.3.2.jar:na]
	at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456) ~[akka-actor_2.10-2.3.2.jar:na]
	at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) ~[akka-actor_2.10-2.3.2.jar:na]
	at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) ~[akka-actor_2.10-2.3.2.jar:na]
	at akka.dispatch.Mailbox.run(Mailbox.scala:219) ~[akka-actor_2.10-2.3.2.jar:na]
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) [akka-actor_2.10-2.3.2.jar:na]
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [scala-library-2.10.4.jar:na]
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [scala-library-2.10.4.jar:na]
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [scala-library-2.10.4.jar:na]
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [scala-library-2.10.4.jar:na]
15:35:40.175 [main] DEBUG o.a.spark.storage.DiskBlockManager - Creating local directories at root dirs '/var/folders/6y/h1f088_j007_d11kpwb1jg6mgp/T/'
15:35:40.176 [spark-akka.actor.default-dispatcher-4] ERROR akka.actor.ActorSystemImpl - Uncaught fatal error from thread [spark-akka.actor.default-dispatcher-2] shutting down ActorSystem [spark]
java.lang.AbstractMethodError: org.apache.spark.storage.BlockManagerMasterActor.aroundPostStop()V
	at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210) ~[akka-actor_2.10-2.3.2.jar:na]
	at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172) ~[akka-actor_2.10-2.3.2.jar:na]
	at akka.actor.ActorCell.terminate(ActorCell.scala:369) ~[akka-actor_2.10-2.3.2.jar:na]
	at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462) ~[akka-actor_2.10-2.3.2.jar:na]
	at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) ~[akka-actor_2.10-2.3.2.jar:na]
	at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) ~[akka-actor_2.10-2.3.2.jar:na]
	at akka.dispatch.Mailbox.run(Mailbox.scala:219) ~[akka-actor_2.10-2.3.2.jar:na]
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) [akka-actor_2.10-2.3.2.jar:na]
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [scala-library-2.10.4.jar:na]
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [scala-library-2.10.4.jar:na]
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [scala-library-2.10.4.jar:na]
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [scala-library-2.10.4.jar:na]
15:35:40.177 [spark-akka.actor.default-dispatcher-4] ERROR akka.actor.ActorSystemImpl - Uncaught fatal error from thread [spark-akka.actor.default-dispatcher-4] shutting down ActorSystem [spark]
java.lang.AbstractMethodError:
Spark java.lang.AbstractMethodError
I am trying to run an example Spark standalone app with the following code ... However I am getting a java.lang.AbstractMethodError (the code and full stack trace are the same as in "Example standalone app error!").