Iterative Algorithms with Spark Streaming

2015-03-16 Thread Alex Minnaar
I wanted to ask a basic question about the types of algorithms that can be 
applied to a DStream with Spark Streaming.  With Spark it is possible 
to perform iterative computations on RDDs, as in the gradient descent example


val points = spark.textFile(...).map(parsePoint).cache()
var w = Vector.random(D) // current separating plane
for (i <- 1 to ITERATIONS) {
  val gradient = points.map(p =>
    (1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y * p.x
  ).reduce(_ + _)
  w -= gradient
}


which has a global state w that is updated after each iteration, with the 
updated value then used in the next iteration.  My question is whether this 
type of algorithm is possible if the points variable were a DStream instead of 
an RDD.  It seems like you could perform the same map as above, which would 
create a gradient DStream, and also use updateStateByKey to create a DStream 
for the w variable.  But the problem is that there doesn't seem to be a way to 
reuse the w DStream inside the map.  I don't think it is possible for DStreams 
to communicate this way.  Am I correct that this is not possible with DStreams, 
or am I missing something?


Note:  The reason I ask this question is that many machine learning algorithms 
are trained by stochastic gradient descent (SGD).  SGD is similar to the above 
gradient descent algorithm except that each iteration operates on a new 
minibatch of data points rather than on the same data points every iteration.  
It seems like Spark Streaming provides a natural way to stream in these 
minibatches (as RDDs), but if it is not able to keep track of an updating 
global state variable then I don't think Spark Streaming can be used for SGD.
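For what it's worth, here is a minimal sketch of a driver-side alternative: instead of modelling w as a DStream, keep it as a mutable variable on the driver and update it once per minibatch.  Plain Scala Seqs stand in below for the stream of minibatch RDDs; in Spark Streaming the update would run inside stream.foreachRDD { rdd => ... }, re-broadcasting w each batch.  The Point class, step helper, and learning rate are illustrative, not from the original post.

```scala
// Sketch: streaming SGD with w held on the driver and updated per minibatch.
object StreamingSgdSketch {
  case class Point(x: Array[Double], y: Double)

  // One SGD step over a single minibatch, mirroring the gradient formula
  // from the batch example above (lr is an assumed learning rate).
  def step(w: Array[Double], batch: Seq[Point], lr: Double): Array[Double] = {
    val gradient = batch.map { p =>
      val margin = p.x.zip(w).map { case (xi, wi) => xi * wi }.sum  // w dot x
      val coeff = (1.0 / (1.0 + math.exp(-p.y * margin)) - 1.0) * p.y
      p.x.map(_ * coeff)                                            // per-point gradient
    }.reduce((u, v) => u.zip(v).map { case (a, b) => a + b })       // sum over the batch
    w.zip(gradient).map { case (wi, gi) => wi - lr * gi }
  }

  def main(args: Array[String]): Unit = {
    var w = Array(0.0, 0.0)                        // driver-side global state
    val minibatches = Seq(                         // stand-in for the DStream of RDDs
      Seq(Point(Array(1.0, 0.0), 1.0)),
      Seq(Point(Array(0.0, 1.0), -1.0))
    )
    minibatches.foreach { batch => w = step(w, batch, lr = 1.0) }  // per-batch update
    println(w.mkString(","))
  }
}
```

This keeps the global state on the driver between batches, which is exactly what a pure DStream-to-DStream dataflow cannot express.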


Thanks,


Alex


word2vec more distributed

2015-02-05 Thread Alex Minnaar
I was wondering if there was any chance of getting a more distributed word2vec 
implementation.  I seem to be running out of memory because of big local data 
structures such as

val syn1Global = new Array[Float](vocabSize * vectorSize)


Is there any chance of getting a version where these are all put in RDDs?


Thanks,


Using a RowMatrix inside a map

2015-01-14 Thread Alex Minnaar
I am working with a RowMatrix and I noticed in the multiply() method that the 
local matrix with which it is being multiplied is distributed to all of the 
rows of the RowMatrix.  If this is the case, is it impossible to multiply a 
RowMatrix within a map operation, since this would essentially create RDDs 
within RDDs?  For example, suppose you had an RDD of local matrices and you 
wanted to perform a map operation in which each local matrix is multiplied by a 
distributed matrix.  This does not seem possible, since it would require 
distributing each local matrix inside the map when the multiplication occurs 
(i.e. creating an RDD in each element of the original RDD).  If this is true, 
does it mean you can only multiply a RowMatrix in the driver, i.e. you cannot 
parallelize RowMatrix multiplications?
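One way to sidestep the nesting, if the local matrices are small enough to broadcast, is to invert it: broadcast the whole collection of local matrices and map over the rows of the distributed matrix instead.  The sketch below uses plain Scala collections to stand in for the RDD of rows; all names are illustrative, not from the original thread.

```scala
// Sketch: compute A*B for every small local matrix B in one pass over the
// rows of the distributed matrix A, instead of mapping over the Bs and
// multiplying each by a distributed matrix (which would need nested RDDs).
object InvertedNestingSketch {
  type Mat = Array[Array[Double]] // local matrix, row-major

  // Multiply one row of A (length m) by an m x p local matrix b.
  def rowTimes(row: Array[Double], b: Mat): Array[Double] = {
    val p = b(0).length
    Array.tabulate(p)(j => row.indices.map(k => row(k) * b(k)(j)).sum)
  }

  // In Spark, bs would be broadcast and rows would be an RDD; here one
  // traversal of the rows yields the rows of A*B for every B.
  def multiplyAll(rows: Seq[Array[Double]], bs: Seq[Mat]): Seq[Seq[Array[Double]]] =
    bs.map(b => rows.map(r => rowTimes(r, b)))
}
```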


Thanks,


Alex


RowMatrix multiplication

2015-01-12 Thread Alex Minnaar
I have a RowMatrix on which I want to perform two multiplications.  The first 
is a right multiplication with a local matrix, which is fine.  But after that I 
also wish to right-multiply the transpose of my RowMatrix with a different 
local matrix.  I understand that there is no functionality to transpose a 
RowMatrix at this time, but I was wondering if anyone could suggest any kind 
of workaround for this.  I had thought that I might be able to initially 
create two RowMatrices - a normal version and a transposed version - and use 
either when appropriate.  Can anyone think of another alternative?


Thanks,


Alex


Re: RowMatrix multiplication

2015-01-12 Thread Alex Minnaar
That's not quite what I'm looking for.  Let me provide an example.  I have a 
RowMatrix A that is n x m and two local matrices b and c; b is m x 1 and c 
is n x 1.  In my Spark job I wish to perform the following two computations


A*b


and


A^T*c


I don't think this is possible without being able to transpose a RowMatrix.  Am 
I correct?


Thanks,


Alex


From: Reza Zadeh r...@databricks.com
Sent: Monday, January 12, 2015 1:58 PM
To: Alex Minnaar
Cc: u...@spark.incubator.apache.org
Subject: Re: RowMatrix multiplication

As you mentioned, you can perform A * b, where A is a RowMatrix and b is a 
local matrix.

From your email, I figure you want to compute b * A^T. To do this, you can 
compute C = A b^T, whose result is the transpose of what you were looking for, 
i.e. C^T = b * A^T. To undo the transpose, you would have to transpose C 
yourself. Be careful though, because the result might not have each row fit in 
memory on a single machine, which is what RowMatrix requires. This danger is 
why we didn't provide a transpose operation in RowMatrix natively.

To address this and more, there is an effort to provide more comprehensive 
linear algebra through block matrices, which will likely make it to 1.3:
https://issues.apache.org/jira/browse/SPARK-3434

Best,
Reza




Re: RowMatrix multiplication

2015-01-12 Thread Alex Minnaar
Good idea! Join each element of c with the corresponding row of A, multiply 
through, then reduce.  I'll give this a try.


Thanks,


Alex


From: Reza Zadeh r...@databricks.com
Sent: Monday, January 12, 2015 3:05 PM
To: Alex Minnaar
Cc: u...@spark.incubator.apache.org
Subject: Re: RowMatrix multiplication

Yes, you are correct: to do it with existing operations you would need a 
transpose on RowMatrix.

However, you can fairly easily perform the operation manually by doing a join 
(if the c vector is an RDD) or broadcasting c (if the c vector is small enough 
to fit in memory on a single machine).
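Concretely, A^T c is the sum over rows of c(i) times row i of A, so the join-and-reduce amounts to the sketch below, with a plain Scala Seq standing in for the RDD of rows (in Spark the zip would be a join on row index, or c would be broadcast, and the final reduce an RDD reduce; the names are illustrative):

```scala
// Sketch: A^T * c without transposing A, by pairing each row A(i, :) with
// c(i), scaling, and summing the scaled rows.
object TransposeMultiplySketch {
  def atc(rows: Seq[Array[Double]], c: Seq[Double]): Array[Double] =
    rows.zip(c)                                   // join row i with c(i)
      .map { case (row, ci) => row.map(_ * ci) }  // scale row i by c(i)
      .reduce((u, v) => u.zip(v).map { case (a, b) => a + b }) // sum the vectors

  def main(args: Array[String]): Unit = {
    // A is 2 x 3 (n = 2 rows, m = 3 columns), c has length n = 2
    val a = Seq(Array(1.0, 2.0, 3.0), Array(4.0, 5.0, 6.0))
    val c = Seq(10.0, 100.0)
    println(atc(a, c).mkString(","))  // prints 410.0,520.0,630.0
  }
}
```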





Re: what is the best way to implement mini batches?

2014-12-03 Thread Alex Minnaar
I am trying to do the same thing and am also wondering what the best strategy is.

Thanks


From: ll duy.huynh@gmail.com
Sent: Wednesday, December 3, 2014 10:28 AM
To: u...@spark.incubator.apache.org
Subject: what is the best way to implement mini batches?

hi.  what is the best way to pass through a large dataset in small,
sequential mini batches?

for example, with 1,000,000 data points and a mini batch size of 10, we
would need to do some computation on each of the mini batches (0..9), (10..19),
(20..29), ..., (N-9..N)

would RDD.repartition(N/10).mapPartitions() work?

thanks!
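One way to frame it: sequential minibatch processing is a fold over batches, since state (e.g. a model) is carried from one batch to the next.  Below is a sketch with plain Scala's grouped() standing in for the partitioning; with an RDD one might zipWithIndex, key each element by index / batchSize, and run one action per batch to preserve order.  The names are illustrative.

```scala
// Sketch: carry state across sequential minibatches with a fold.
object MiniBatchSketch {
  def foldBatches[A, S](data: Seq[A], batchSize: Int, init: S)(update: (S, Seq[A]) => S): S =
    data.grouped(batchSize)       // (0..9), (10..19), ... sized batchSize
      .foldLeft(init)(update)     // thread the state through each batch

  def main(args: Array[String]): Unit = {
    // collect the sum of each batch of 3 over the points 1..9
    val sums = foldBatches((1 to 9).toSeq, 3, List.empty[Int])((acc, b) => acc :+ b.sum)
    println(sums)  // prints List(6, 15, 24)
  }
}
```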



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/what-is-the-best-way-to-implement-mini-batches-tp20264.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org





RE: Example standalone app error!

2014-08-01 Thread Alex Minnaar
I think this is the problem.  I was working in a project that inherited some 
other Akka dependencies (of a different version).  I'm switching to a fresh 
project, which should solve the problem.

Thanks,

Alex

From: Tathagata Das tathagata.das1...@gmail.com
Sent: Thursday, July 31, 2014 8:36 PM
To: user@spark.apache.org
Subject: Re: Example standalone app error!

When are you guys getting the error? When the SparkContext is created, or
when it is being shut down?
If this error is being thrown when the SparkContext is created, then
one possible reason may be conflicting versions of Akka. Spark depends
on a version of Akka which is different from that of Scala, and
launching a Spark app using the scala command (instead of java) can cause
issues.

TD

On Thu, Jul 31, 2014 at 6:30 AM, Alex Minnaar
aminn...@verticalscope.com wrote:
 I am eager to solve this problem.  So if anyone has any suggestions I would
 be glad to hear them.


 Thanks,


 Alex

 

RE: Example standalone app error!

2014-07-30 Thread Alex Minnaar
Hi Andrew,


I'm not sure why an assembly would help (I don't have the Spark source code; I 
have just included Spark Core and Spark Streaming in the dependencies in my 
build file).  I did try it, though, and the error is still occurring.  I have 
tried cleaning and refreshing SBT many times as well.  Any other ideas?


Thanks,


Alex


From: Andrew Or and...@databricks.com
Sent: Tuesday, July 29, 2014 4:53 PM
To: user@spark.apache.org
Subject: Re: Example standalone app error!

Hi Alex,

Very strange. This error occurs when someone tries to call an abstract method. 
I have run into this before and resolved it with a SBT clean followed by an 
assembly, so maybe you could give that a try.

Let me know if that fixes it,
Andrew



Example standalone app error!

2014-07-29 Thread Alex Minnaar
I am trying to run an example Spark standalone app with the following code

import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._

object SparkGensimLDA extends App {

  val ssc = new StreamingContext("local", "testApp", Seconds(5))

  val lines = ssc.textFileStream("/.../spark_example/")

  val words = lines.flatMap(_.split(" "))

  val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)

  wordCounts.print()

  ssc.start()
  ssc.awaitTermination()

}


However I am getting the following error


15:35:40.170 [spark-akka.actor.default-dispatcher-2] ERROR 
akka.actor.ActorSystemImpl - Uncaught fatal error from thread 
[spark-akka.actor.default-dispatcher-3] shutting down ActorSystem [spark]
java.lang.AbstractMethodError: null
at akka.actor.ActorCell.create(ActorCell.scala:580) 
~[akka-actor_2.10-2.3.2.jar:na]
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456) 
~[akka-actor_2.10-2.3.2.jar:na]
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) 
~[akka-actor_2.10-2.3.2.jar:na]
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) 
~[akka-actor_2.10-2.3.2.jar:na]
at akka.dispatch.Mailbox.run(Mailbox.scala:219) ~[akka-actor_2.10-2.3.2.jar:na]
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
 [akka-actor_2.10-2.3.2.jar:na]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
[scala-library-2.10.4.jar:na]
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 [scala-library-2.10.4.jar:na]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) 
[scala-library-2.10.4.jar:na]
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 [scala-library-2.10.4.jar:na]
15:35:40.171 [spark-akka.actor.default-dispatcher-2] ERROR 
akka.actor.ActorSystemImpl - Uncaught fatal error from thread 
[spark-akka.actor.default-dispatcher-4] shutting down ActorSystem [spark]
java.lang.AbstractMethodError: null
at akka.actor.ActorCell.create(ActorCell.scala:580) 
~[akka-actor_2.10-2.3.2.jar:na]
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456) 
~[akka-actor_2.10-2.3.2.jar:na]
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) 
~[akka-actor_2.10-2.3.2.jar:na]
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) 
~[akka-actor_2.10-2.3.2.jar:na]
at akka.dispatch.Mailbox.run(Mailbox.scala:219) ~[akka-actor_2.10-2.3.2.jar:na]
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
 [akka-actor_2.10-2.3.2.jar:na]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
[scala-library-2.10.4.jar:na]
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 [scala-library-2.10.4.jar:na]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) 
[scala-library-2.10.4.jar:na]
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 [scala-library-2.10.4.jar:na]
15:35:40.175 [main] DEBUG o.a.spark.storage.DiskBlockManager - Creating local 
directories at root dirs '/var/folders/6y/h1f088_j007_d11kpwb1jg6mgp/T/'
15:35:40.176 [spark-akka.actor.default-dispatcher-4] ERROR 
akka.actor.ActorSystemImpl - Uncaught fatal error from thread 
[spark-akka.actor.default-dispatcher-2] shutting down ActorSystem [spark]
java.lang.AbstractMethodError: 
org.apache.spark.storage.BlockManagerMasterActor.aroundPostStop()V
at 
akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
 ~[akka-actor_2.10-2.3.2.jar:na]
at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172) 
~[akka-actor_2.10-2.3.2.jar:na]
at akka.actor.ActorCell.terminate(ActorCell.scala:369) 
~[akka-actor_2.10-2.3.2.jar:na]
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462) 
~[akka-actor_2.10-2.3.2.jar:na]
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) 
~[akka-actor_2.10-2.3.2.jar:na]
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) 
~[akka-actor_2.10-2.3.2.jar:na]
at akka.dispatch.Mailbox.run(Mailbox.scala:219) ~[akka-actor_2.10-2.3.2.jar:na]
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
 [akka-actor_2.10-2.3.2.jar:na]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
[scala-library-2.10.4.jar:na]
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 [scala-library-2.10.4.jar:na]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) 
[scala-library-2.10.4.jar:na]
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 [scala-library-2.10.4.jar:na]
15:35:40.177 [spark-akka.actor.default-dispatcher-4] ERROR 
akka.actor.ActorSystemImpl - Uncaught fatal error from thread 
[spark-akka.actor.default-dispatcher-4] shutting down ActorSystem [spark]
java.lang.AbstractMethodError: 

Spark java.lang.AbstractMethodError

2014-07-28 Thread Alex Minnaar
I am trying to run an example Spark standalone app with the following code


import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._

object SparkGensimLDA extends App {

  val ssc = new StreamingContext("local", "testApp", Seconds(5))

  val lines = ssc.textFileStream("/.../spark_example/")

  val words = lines.flatMap(_.split(" "))

  val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)

  wordCounts.print()

  ssc.start()
  ssc.awaitTermination()

}



However I am getting the same java.lang.AbstractMethodError stack trace shown 
in the Example standalone app error! message above.