Re: Spark issue running jar on Linux vs Windows

2015-10-23 Thread Michael Lewis
Thanks for the advice. In my case it turned out to be two issues.

- Use Java rather than Scala to launch the process, putting the core Scala 
libraries on the classpath.

- I needed a merge strategy of Concat for the reference.conf files in my 
build.sbt (a rough sketch of what I mean is below).
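
Roughly, the sbt-assembly fragment looks like this (a sketch from memory; the 
exact key names depend on the sbt-assembly version you're using):

// Concatenate every reference.conf so Akka's default configuration
// survives the fat-jar merge instead of being discarded.
assemblyMergeStrategy in assembly := {
  case "reference.conf" => MergeStrategy.concat
  case other =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(other)
}

(When launching with java instead of the scala runner, the scala-library jar 
also has to go on the classpath explicitly, since the scala command normally 
supplies it.)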

Regards,
Mike


> On 23 Oct 2015, at 01:00, Ted Yu <yuzhih...@gmail.com> wrote:
> 
> RemoteActorRefProvider is in akka-remote_2.10-2.3.11.jar
> 
> jar tvf 
> ~/.m2/repository/com/typesafe/akka/akka-remote_2.10/2.3.11/akka-remote_2.10-2.3.11.jar
>  | grep RemoteActorRefProvi
>   1761 Fri May 08 16:13:02 PDT 2015 
> akka/remote/RemoteActorRefProvider$$anonfun$5.class
>   1416 Fri May 08 16:13:02 PDT 2015 
> akka/remote/RemoteActorRefProvider$$anonfun$6.class
> 
> Is the above jar on your classpath?
> 
> Cheers
> 
>> On Thu, Oct 22, 2015 at 4:39 PM, Michael Lewis <lewi...@icloud.com> wrote:
>> Hi,
>> 
>> I have a Spark driver process that I have built into a single ‘fat jar’. This 
>> runs fine in Cygwin on my development machine, where I can run:
>> 
>> scala -cp my-fat-jar-1.0.0.jar com.foo.MyMainClass
>> 
>> This works fine: it submits the Spark jobs, they process, all good.
>> 
>> 
>> However, on Linux (with the same jars, Spark (1.4) and Scala (2.10.5) 
>> versions), I get this error:
>> 
>> 18:59:14.358 [Curator-QueueBuilder-2] ERROR o.a.c.f.r.queue.DistributedQueue 
>> - Exception processing queue item: queue-00
>> java.lang.NoSuchMethodException: 
>> akka.remote.RemoteActorRefProvider.<init>(java.lang.String, 
>> akka.actor.ActorSystem$Settings, akka.event.EventStream, 
>> akka.actor.Scheduler, akka.act
>> at java.lang.Class.getConstructor0(Class.java:3082) ~[na:1.8.0_60]
>> at java.lang.Class.getDeclaredConstructor(Class.java:2178) 
>> ~[na:1.8.0_60]
>>  
>> i.e. a NoSuchMethodException. Can anyone suggest how I can fix this?
>> 
>> I’ve tried changing scala to java and putting scala-lang on the classpath, 
>> but this just generates new errors about missing Akka configuration.
>> 
>> Given the identical jars and Scala version, I’m not sure why I’m getting 
>> this error when running the driver on Linux.
>> 
>> Appreciate any help/pointers.
>> 
>> 
>> Thanks,
>> Mike Lewis
> 


Spark issue running jar on Linux vs Windows

2015-10-22 Thread Michael Lewis
Hi,

I have a Spark driver process that I have built into a single ‘fat jar’. This 
runs fine in Cygwin on my development machine, where I can run:

scala -cp my-fat-jar-1.0.0.jar com.foo.MyMainClass

This works fine: it submits the Spark jobs, they process, all good.


However, on Linux (with the same jars, Spark (1.4) and Scala (2.10.5) versions), 
I get this error:

18:59:14.358 [Curator-QueueBuilder-2] ERROR o.a.c.f.r.queue.DistributedQueue - 
Exception processing queue item: queue-00
java.lang.NoSuchMethodException: 
akka.remote.RemoteActorRefProvider.<init>(java.lang.String, 
akka.actor.ActorSystem$Settings, akka.event.EventStream, akka.actor.Scheduler, 
akka.act
at java.lang.Class.getConstructor0(Class.java:3082) ~[na:1.8.0_60]
at java.lang.Class.getDeclaredConstructor(Class.java:2178) 
~[na:1.8.0_60]
 
i.e. a NoSuchMethodException. Can anyone suggest how I can fix this?

I’ve tried changing scala to java and putting scala-lang on the classpath, but 
this just generates new errors about missing Akka configuration.

Given the identical jars and Scala version, I’m not sure why I’m getting this 
error when running the driver on Linux.
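
For what it’s worth, here is a small diagnostic I plan to run on both machines 
(just a sketch, assuming akka-remote is on the runtime classpath) to see which 
jar actually supplies the class and what constructor signatures reflection sees:

object ClasspathCheck {
  def main(args: Array[String]): Unit = {
    val cls = Class.forName("akka.remote.RemoteActorRefProvider")
    // Which jar did the class actually come from?
    println(cls.getProtectionDomain.getCodeSource.getLocation)
    // What constructor signatures does reflection see?
    cls.getDeclaredConstructors.foreach(println)
  }
}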

Appreciate any help/pointers.


Thanks,
Mike Lewis

'nested' RDD problem, advise needed

2015-03-21 Thread Michael Lewis
Hi,

I wonder if someone can help suggest a solution to my problem. I had a simple 
process working using Strings and now want to convert it to use RDD[Char]; the 
problem is that I end up with a nested call, as follows:


1) Load a text file into an RDD[Char]:

val inputRDD = sc.textFile("myFile.txt").flatMap(_.toIterator)


2) I have a method that takes two parameters:

object Foo {
  def myFunction(inputRDD: RDD[Char], shift: Int): RDD[Char] = ...
}


3) I have a method, ‘bar’, that the driver process calls once it has loaded the 
inputRDD, as follows:

def bar(inputRDD: RDD[Char]): Int = {

  val solutionSet = sc.parallelize((1 to alphabetLength).toList)
    .map(shift => (shift, Foo.myFunction(inputRDD, shift)))



What I’m trying to do is take the list 1..26 and generate a set of tuples 
{ (1, RDD(1)), …, (26, RDD(26)) }, i.e. the inputRDD passed through the function 
above with a different shift parameter for each element.

In my original version I could parallelise the algorithm fine, but the input had 
to be held in a ‘String’ variable; I’d rather it be an RDD (the string could be 
large). I think the way I’m trying to do it above won’t work because it’s a 
nested RDD call, and the only workaround I can think of is sketched below.
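
This sketch (untested; shiftChars is just a hypothetical stand-in for 
Foo.myFunction) avoids the nesting by broadcasting the characters and keeping 
only the shifts in an RDD:

import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

object ShiftSketch {
  // Hypothetical stand-in for Foo.myFunction: shift lower-case letters by 'shift' places.
  def shiftChars(chars: Array[Char], shift: Int): Array[Char] =
    chars.map(c => ((c - 'a' + shift) % 26 + 'a').toChar)

  def allShifts(sc: SparkContext, input: Array[Char], alphabetLength: Int) = {
    // Broadcast the characters once so every task can read them without nesting RDDs.
    val broadcastInput: Broadcast[Array[Char]] = sc.broadcast(input)
    // One element per shift: (shift, shifted text).
    sc.parallelize(1 to alphabetLength)
      .map(shift => (shift, shiftChars(broadcastInput.value, shift)))
  }
}

I’m not sure whether broadcasting the characters defeats the point of having 
them in an RDD in the first place, though.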

Can anybody suggest a solution?

Regards,
Mike Lewis








Reading a text file into RDD[Char] instead of RDD[String]

2015-03-19 Thread Michael Lewis
Hi,

I’m struggling to think of the best way to read a text file into an RDD[Char] 
rather than an RDD[String].

I can do: 

sc.textFile(….), which gives me the RDD[String].

Can anyone suggest the most efficient way to create the RDD[Char]? I’m sure 
I’ve missed something simple…
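
The only approach I’ve come up with so far is the sketch below, which flattens 
each line into its characters (using a placeholder file name; note that textFile 
strips line terminators, so newlines won’t appear in the result):

import org.apache.spark.rdd.RDD

val chars: RDD[Char] = sc.textFile("myFile.txt").flatMap(_.toIterator)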

Regards,
Mike



memory leak query

2014-07-07 Thread Michael Lewis
Hi,

I hope someone can help, as I’m not sure if I’m using Spark correctly. 
Basically, in the simple example below I create an RDD which is just a sequence 
of random numbers. I then have a loop where I just invoke rdd.count(), and what 
I can see is that the memory use always nudges upwards.

If I attach YourKit to the JVM, I can see the garbage collector in action, but 
eventually the JVM runs out of memory.

Can anyone spot if I am doing something wrong? (Obviously the example is 
slightly contrived, but basically I have an RDD with a set of numbers and I’d 
like to submit lots of jobs that perform some calculation; this was the 
simplest case I could create that would exhibit the same memory issue.)

Regards & Thanks,
Mike


import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkContext, SparkConf}
import scala.util.Random

object SparkTest {
  def main(args: Array[String]) {
    println("spark memory test")

    val jars = Seq("spark-test-1.0-SNAPSHOT.jar")

    val sparkConfig: SparkConf = new SparkConf()
      .setMaster("local")
      .setAppName("tester")
      .setJars(jars)

    val sparkContext = new SparkContext(sparkConfig)

    // An RDD of random integers spread over 10 partitions.
    val list = Seq.fill(120)(Random.nextInt)
    val rdd: RDD[Int] = sparkContext.makeRDD(list, 10)

    // Repeatedly submit the same trivial job; memory use creeps up each time.
    for (i <- 1 to 100) {
      rdd.count()
    }
    sparkContext.stop()
  }
}