Re: Spark issue running jar on Linux vs Windows
Thanks for the advice. In my case it turned out to be two issues:

- use Java rather than Scala to launch the process, putting the core Scala libs on the class path
- I needed a merge strategy of Concat for reference.conf files in my build.sbt

Regards,
Mike

> On 23 Oct 2015, at 01:00, Ted Yu wrote:
>
> RemoteActorRefProvider is in akka-remote_2.10-2.3.11.jar
>
> jar tvf ~/.m2/repository/com/typesafe/akka/akka-remote_2.10/2.3.11/akka-remote_2.10-2.3.11.jar | grep RemoteActorRefProvi
>   1761 Fri May 08 16:13:02 PDT 2015 akka/remote/RemoteActorRefProvider$$anonfun$5.class
>   1416 Fri May 08 16:13:02 PDT 2015 akka/remote/RemoteActorRefProvider$$anonfun$6.class
>
> Is the above jar on your classpath?
>
> Cheers
>
>> On Thu, Oct 22, 2015 at 4:39 PM, Michael Lewis wrote:
>> Hi,
>>
>> I have a Spark driver process that I have built into a single ‘fat jar’. It runs fine, in Cygwin, on my development machine; I can run:
>>
>>     scala -cp my-fat-jar-1.0.0.jar com.foo.MyMainClass
>>
>> This works fine: it submits the Spark jobs, they process, all good.
>>
>> However, on Linux (all jars, Spark (1.4) and Scala version (2.10.5) being the same), I get this error:
>>
>>     18:59:14.358 [Curator-QueueBuilder-2] ERROR o.a.c.f.r.queue.DistributedQueue - Exception processing queue item: queue-00
>>     java.lang.NoSuchMethodException: akka.remote.RemoteActorRefProvider.<init>(java.lang.String, akka.actor.ActorSystem$Settings, akka.event.EventStream, akka.actor.Scheduler, akka.act
>>         at java.lang.Class.getConstructor0(Class.java:3082) ~[na:1.8.0_60]
>>         at java.lang.Class.getDeclaredConstructor(Class.java:2178) ~[na:1.8.0_60]
>>
>> i.e. a NoSuchMethodException. Can anyone suggest how I can fix this?
>>
>> I’ve tried changing scala to java and putting scala-lang on the class path, but this just generates new errors about missing akka configuration.
>>
>> Given the identical jars and Scala version, I’m not sure why I’m getting this error running the driver on Linux.
>>
>> Appreciate any help/pointers.
>>
>> Thanks,
>> Mike Lewis
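A rough sketch of the two changes described above (assuming the sbt-assembly plugin; the jar and class names are the placeholders from the original message):

    // build.sbt -- sketch only, using the documented sbt-assembly merge-strategy pattern.
    // Akka reads its defaults from reference.conf, and each Akka module ships its own
    // fragment, so the fragments must be concatenated into the fat jar rather than one
    // being picked and the rest discarded.
    assemblyMergeStrategy in assembly := {
      case "reference.conf" => MergeStrategy.concat
      case other =>
        val defaultStrategy = (assemblyMergeStrategy in assembly).value
        defaultStrategy(other)
    }

and launching the driver with plain java, with the Scala runtime on the class path, e.g.:

    java -cp scala-library-2.10.5.jar:my-fat-jar-1.0.0.jar com.foo.MyMainClass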
Spark issue running jar on Linux vs Windows
Hi,

I have a Spark driver process that I have built into a single ‘fat jar’. It runs fine, in Cygwin, on my development machine; I can run:

    scala -cp my-fat-jar-1.0.0.jar com.foo.MyMainClass

This works fine: it submits the Spark jobs, they process, all good.

However, on Linux (all jars, Spark (1.4) and Scala version (2.10.5) being the same), I get this error:

    18:59:14.358 [Curator-QueueBuilder-2] ERROR o.a.c.f.r.queue.DistributedQueue - Exception processing queue item: queue-00
    java.lang.NoSuchMethodException: akka.remote.RemoteActorRefProvider.<init>(java.lang.String, akka.actor.ActorSystem$Settings, akka.event.EventStream, akka.actor.Scheduler, akka.act
        at java.lang.Class.getConstructor0(Class.java:3082) ~[na:1.8.0_60]
        at java.lang.Class.getDeclaredConstructor(Class.java:2178) ~[na:1.8.0_60]

i.e. a NoSuchMethodException. Can anyone suggest how I can fix this?

I’ve tried changing scala to java and putting scala-lang on the class path, but this just generates new errors about missing akka configuration.

Given the identical jars and Scala version, I’m not sure why I’m getting this error running the driver on Linux.

Appreciate any help/pointers.

Thanks,
Mike Lewis
'nested' RDD problem, advice needed
Hi,

I wonder if someone can help suggest a solution to my problem. I had a simple process working using Strings and now want to convert it to RDD[Char]; the problem is that I end up with a nested call, as follows:

1) Load a text file into an RDD[Char]:

    val inputRDD = sc.textFile(“myFile.txt”).flatMap(_.toIterator)

2) I have a method that takes two parameters:

    object Foo {
      def myFunction(inputRDD: RDD[Char], shift: Int): RDD[Char] = ...

3) I have a method ‘bar’ that the driver process calls once it has loaded the inputRDD, as follows:

    def bar(inputRDD: RDD[Char]): Int = {
      val solutionSet = sc.parallelize((1 to alphabetLength).toList)
        .map(shift => (shift, Foo.myFunction(inputRDD, shift)))

What I’m trying to do is take a list 1..26 and generate a set of tuples { (1, RDD(1)), …, (26, RDD(26)) }, which is the inputRDD passed through the function above, but with a different shift parameter each time.

In my original version I could parallelise the algorithm fine, but my input string had to be in a ‘String’ variable; I’d rather it be an RDD (the string could be large). I think the way I’m trying to do it above won’t work because it’s a nested RDD call.

Can anybody suggest a solution?

Regards,
Mike Lewis
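One possible way around the nesting, sketched here only as an option: if the text can after all be held in memory on each executor, collect it once and broadcast it, then parallelise over the shifts. The Caesar-style shift body below is a placeholder for whatever myFunction really does, and alphabetLength is assumed to be 26:

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    def bar(sc: SparkContext, inputRDD: RDD[Char], alphabetLength: Int): RDD[(Int, Array[Char])] = {
      // Materialise the characters on the driver and ship them once to each executor.
      // This assumes the text fits in memory, which may not hold for very large inputs.
      val chars = sc.broadcast(inputRDD.collect())

      // One record per shift; each task reads the broadcast copy rather than an RDD,
      // so no RDD is referenced inside another RDD's transformation.
      sc.parallelize((1 to alphabetLength).toList).map { shift =>
        (shift, chars.value.map(c => ((c - 'a' + shift) % 26 + 'a').toChar))
      }
    }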
Reading a text file into RDD[Char] instead of RDD[String]
Hi,

I’m struggling to think of the best way to read a text file into an RDD[Char] rather than an RDD[String]. I can do:

    sc.textFile(….)

which gives me the RDD[String]. Can anyone suggest the most efficient way to create the RDD[Char]? I’m sure I’ve missed something simple…

Regards,
Mike
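A sketch of one straightforward way, using the same flatMap that appears in the nested-RDD message above (the file name is a placeholder; note that textFile drops line terminators, so newline characters would need to be re-added per line if they matter):

    // Each line is flattened into its characters, giving an RDD[Char].
    val chars: org.apache.spark.rdd.RDD[Char] =
      sc.textFile("myFile.txt").flatMap(_.toIterator)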
memory leak query
Hi,

I hope someone can help, as I’m not sure if I’m using Spark correctly. Basically, in the simple example below I create an RDD which is just a sequence of random numbers. I then have a loop where I just invoke rdd.count(). What I can see is that the memory use always nudges upwards.

If I attach YourKit to the JVM, I can see the garbage collector in action, but eventually the JVM runs out of memory. Can anyone spot if I am doing something wrong?

(Obviously the example is slightly contrived, but basically I have an RDD with a set of numbers and I’d like to submit lots of jobs that perform some calculation; this was the simplest case I could create that would exhibit the same memory issue.)

Regards & Thanks,
Mike

    import org.apache.spark.rdd.RDD
    import org.apache.spark.{SparkContext, SparkConf}
    import scala.util.Random

    object SparkTest {

      def main(args: Array[String]) {
        println("spark memory test")

        val jars = Seq("spark-test-1.0-SNAPSHOT.jar")

        val sparkConfig: SparkConf = new SparkConf()
          .setMaster("local")
          .setAppName("tester")
          .setJars(jars)

        val sparkContext = new SparkContext(sparkConfig)

        val list = Seq.fill(120)(Random.nextInt)
        val rdd: RDD[Int] = sparkContext.makeRDD(list, 10)

        for (i <- 1 to 100) {
          rdd.count()
        }

        sparkContext.stop()
      }
    }