Re: ADD_JARS doesn't properly work for spark-shell

2014-01-05 Thread Aureliano Buendia
On Sun, Jan 5, 2014 at 6:01 AM, Aaron Davidson ilike...@gmail.com wrote:

 That sounds like a different issue. What is the type of myrdd (i.e., if
 you just type myrdd into the shell)? It's possible it's defined as an
 RDD[Nothing] and thus all operations try to typecast to Nothing, which
 always fails. Perhaps declaring it initially with respect to your class
 would help, something like
 val myrdd: RDD[mypackage.MyClass] = sc.sequenceFile(...)


This solved the problem, thanks!

Is it because sc.objectFile() returns RDD[Nothing], or is it a spark-shell
problem?
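
If it helps anyone else hitting this: assuming the cause is type inference, the
difference in the shell would look something like this (a sketch; the path is a
placeholder, and the printed types are what I'd expect from a 0.8.x session):

  import org.apache.spark.rdd.RDD

  // No type argument: T in objectFile[T] is inferred as Nothing
  val bad = sc.objectFile("/path/to/data")
  // bad: org.apache.spark.rdd.RDD[Nothing]

  // An explicit annotation pins T to the real element type
  val good: RDD[mypackage.MyClass] = sc.objectFile("/path/to/data")
  // good: org.apache.spark.rdd.RDD[mypackage.MyClass]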




 On Sat, Jan 4, 2014 at 8:29 PM, Aureliano Buendia buendia...@gmail.com wrote:

 While myrdd.count() works, a lot of other actions and transformations do
 not still work in spark-shell. Eg myrdd.first() gives this error:

 java.lang.ClassCastException: mypackage.MyClass cannot be cast to
 scala.runtime.Nothing$

 Also, myrdd.map(r => r) returns:

 org.apache.spark.rdd.RDD[Nothing] = MappedRDD[2]

 Basically, type mypackage.MyClass gets converted to Nothing during any
 action/transformation.
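
 The same failure can be reproduced without Spark at all; casting any value
 to Nothing always throws at runtime (a minimal sketch):

  "hello".asInstanceOf[Nothing]
  // java.lang.ClassCastException: java.lang.String cannot be cast to
  // scala.runtime.Nothing$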



 On Sun, Jan 5, 2014 at 4:06 AM, Aureliano Buendia 
 buendia...@gmail.com wrote:

 Sorry, I had a typo. I can confirm that using ADD_JARS together with
 SPARK_CLASSPATH works as expected in spark-shell.

 It'd make sense to have the two combined as one option.


 On Sun, Jan 5, 2014 at 3:51 AM, Aaron Davidson ilike...@gmail.com wrote:

 Cool. To confirm, you said you can access the class and construct new
 objects -- did you do this in the shell itself (i.e., on the driver), or on
 the executors?

 Specifically, one of the following two should fail in the shell:
  new mypackage.MyClass()                              // runs on the driver (in the REPL)
  sc.parallelize(0 until 10, 2).foreach(_ => new mypackage.MyClass()) // runs on the executors
 (or just import it)

 You could also try running MASTER=local-cluster[2,1,512] which launches
 2 executors, 1 core each, with 512MB in a setup that mimics a real cluster
 more closely, in case it's a bug only related to using local mode.


 On Sat, Jan 4, 2014 at 7:07 PM, Aureliano Buendia buendia...@gmail.com
  wrote:




 On Sun, Jan 5, 2014 at 2:28 AM, Aaron Davidson ilike...@gmail.com wrote:

 Additionally, which version of Spark are you running?


 0.8.1.

 Unfortunately, this doesn't work either:

 MASTER=local[2] ADD_JARS=/path/to/my/jar
 SPARK_CLASSPATH=/path/to/my/jar ./spark-shell




 On Sat, Jan 4, 2014 at 6:27 PM, Aaron Davidson ilike...@gmail.com wrote:

 I am not an expert on these classpath issues, but if you're using
 local mode, you might also try to set SPARK_CLASSPATH to include the path
 to the jar file as well. This should not really help, since adding jars
 is the right way to get the jars to your executors (which is where the
 exception appears to be happening), but it would sure be interesting if it
 did.


 On Sat, Jan 4, 2014 at 4:50 PM, Aureliano Buendia 
 buendia...@gmail.com wrote:

 I should add that I can see in the log that the jar is being shipped
 to the workers:

 14/01/04 15:34:52 INFO Executor: Fetching
 http://192.168.1.111:51031/jars/my.jar.jar with timestamp
 131979092
 14/01/04 15:34:52 INFO Utils: Fetching
 http://192.168.1.111:51031/jars/my.jar.jar to
 /var/folders/3g/jyx81ctj3698wbvphxhm4dw4gn/T/fetchFileTemp8322008964976744710.tmp
 14/01/04 15:34:53 INFO Executor: Adding
 file:/var/folders/3g/jyx81ctj3698wbvphxhm4dw4gn/T/spark-d8ac8f66-fad6-4b3f-8059-73f13b86b070/my.jar.jar
 to class loader


 On Sun, Jan 5, 2014 at 12:46 AM, Aureliano Buendia 
 buendia...@gmail.com wrote:

 Hi,

 I'm trying to access my standalone Spark app from spark-shell. I
 tried starting the shell by:

 MASTER=local[2] ADD_JARS=/path/to/my/jar ./spark-shell

 The log shows that the jar file was loaded. Also, I can access and
 create a new instance of mypackage.MyClass.

 The problem is that myRDD.collect() returns RDD[MyClass], and
 that throws this exception:

 java.lang.ClassNotFoundException: mypackage.MyClass

ADD_JARS doesn't properly work for spark-shell

2014-01-04 Thread Aureliano Buendia
Hi,

I'm trying to access my standalone Spark app from spark-shell. I tried
starting the shell by:

MASTER=local[2] ADD_JARS=/path/to/my/jar ./spark-shell

The log shows that the jar file was loaded. Also, I can access and create a
new instance of mypackage.MyClass.

The problem is that myRDD.collect() returns RDD[MyClass], and that throws
this exception:

java.lang.ClassNotFoundException: mypackage.MyClass
  at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:264)
  at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:622)
  at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1593)
  at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1514)
  at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1642)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1341)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
  at org.apache.spark.util.Utils$.deserialize(Utils.scala:59)
  at
org.apache.spark.SparkContext$$anonfun$objectFile$1.apply(SparkContext.scala:573)
  at
org.apache.spark.SparkContext$$anonfun$objectFile$1.apply(SparkContext.scala:573)
  at scala.collection.Iterator$$anon$21.hasNext(Iterator.scala:440)
  at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:702)
  at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:698)
  at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:872)
  at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:872)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107)
  at org.apache.spark.scheduler.Task.run(Task.scala:53)
  at
org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:215)
  at
org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:50)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
  at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:722)

Does this mean that my jar was not shipped to the workers? Is this a known
issue, or am I doing something wrong here?


Re: ADD_JARS doesn't properly work for spark-shell

2014-01-04 Thread Imran Rashid
Actually, I think adding it to SPARK_CLASSPATH is exactly right. The
exception is not on the executors, but in the driver -- it's happening when
the driver tries to read results that the executor is sending back to it.

So the executors know about mypackage.MyClass, they happily run and send
their data back to the driver, and then the driver tries to read those
results and blows up, because it hasn't loaded the jar.

Probably ADD_JARS should get auto-added to SPARK_CLASSPATH, but for now, I
think it will work if you just list it in both.

On Sat, Jan 4, 2014 at 8:28 PM, Aaron Davidson ilike...@gmail.com wrote:

 Additionally, which version of Spark are you running?


 On Sat, Jan 4, 2014 at 6:27 PM, Aaron Davidson ilike...@gmail.com wrote:

 I am not an expert on these classpath issues, but if you're using local
 mode, you might also try to set SPARK_CLASSPATH to include the path to the
 jar file as well. This should not really help, since adding jars is the
 right way to get the jars to your executors (which is where the exception
 appears to be happening), but it would sure be interesting if it did.


 On Sat, Jan 4, 2014 at 4:50 PM, Aureliano Buendia 
 buendia...@gmail.com wrote:

 I should add that I can see in the log that the jar is being shipped to the
 workers:

 14/01/04 15:34:52 INFO Executor: Fetching
 http://192.168.1.111:51031/jars/my.jar.jar with timestamp 131979092
 14/01/04 15:34:52 INFO Utils: Fetching
 http://192.168.1.111:51031/jars/my.jar.jar to
 /var/folders/3g/jyx81ctj3698wbvphxhm4dw4gn/T/fetchFileTemp8322008964976744710.tmp
 14/01/04 15:34:53 INFO Executor: Adding
 file:/var/folders/3g/jyx81ctj3698wbvphxhm4dw4gn/T/spark-d8ac8f66-fad6-4b3f-8059-73f13b86b070/my.jar.jar
 to class loader


 On Sun, Jan 5, 2014 at 12:46 AM, Aureliano Buendia buendia...@gmail.com
  wrote:

 Hi,

 I'm trying to access my standalone Spark app from spark-shell. I tried
 starting the shell by:

 MASTER=local[2] ADD_JARS=/path/to/my/jar ./spark-shell

 The log shows that the jar file was loaded. Also, I can access and
 create a new instance of mypackage.MyClass.

 The problem is that myRDD.collect() returns RDD[MyClass], and that
 throws this exception:

 java.lang.ClassNotFoundException: mypackage.MyClass

 Does this mean that my jar was not shipped to the workers? Is this a
 known issue, or am I doing something wrong here?







Re: ADD_JARS doesn't properly work for spark-shell

2014-01-04 Thread Aureliano Buendia
On Sun, Jan 5, 2014 at 2:28 AM, Aaron Davidson ilike...@gmail.com wrote:

 Additionally, which version of Spark are you running?


0.8.1.

Unfortunately, this doesn't work either:

MASTER=local[2] ADD_JARS=/path/to/my/jar
SPARK_CLASSPATH=/path/to/my/jar./spark-shell




 On Sat, Jan 4, 2014 at 6:27 PM, Aaron Davidson ilike...@gmail.com wrote:

 I am not an expert on these classpath issues, but if you're using local
 mode, you might also try to set SPARK_CLASSPATH to include the path to the
 jar file as well. This should not really help, since adding jars is the
 right way to get the jars to your executors (which is where the exception
 appears to be happening), but it would sure be interesting if it did.


 On Sat, Jan 4, 2014 at 4:50 PM, Aureliano Buendia 
 buendia...@gmail.com wrote:

 I should add that I can see in the log that the jar is being shipped to the
 workers:

 14/01/04 15:34:52 INFO Executor: Fetching
 http://192.168.1.111:51031/jars/my.jar.jar with timestamp 131979092
 14/01/04 15:34:52 INFO Utils: Fetching
 http://192.168.1.111:51031/jars/my.jar.jar to
 /var/folders/3g/jyx81ctj3698wbvphxhm4dw4gn/T/fetchFileTemp8322008964976744710.tmp
 14/01/04 15:34:53 INFO Executor: Adding
 file:/var/folders/3g/jyx81ctj3698wbvphxhm4dw4gn/T/spark-d8ac8f66-fad6-4b3f-8059-73f13b86b070/my.jar.jar
 to class loader


 On Sun, Jan 5, 2014 at 12:46 AM, Aureliano Buendia buendia...@gmail.com
  wrote:

 Hi,

 I'm trying to access my standalone Spark app from spark-shell. I tried
 starting the shell by:

 MASTER=local[2] ADD_JARS=/path/to/my/jar ./spark-shell

 The log shows that the jar file was loaded. Also, I can access and
 create a new instance of mypackage.MyClass.

 The problem is that myRDD.collect() returns RDD[MyClass], and that
 throws this exception:

 java.lang.ClassNotFoundException: mypackage.MyClass

 Does this mean that my jar was not shipped to the workers? Is this a
 known issue, or am I doing something wrong here?







Re: ADD_JARS doesn't properly work for spark-shell

2014-01-04 Thread Aaron Davidson
Cool. To confirm, you said you can access the class and construct new
objects -- did you do this in the shell itself (i.e., on the driver), or on
the executors?

Specifically, one of the following two should fail in the shell:
 new mypackage.MyClass()                              // runs on the driver (in the REPL)
 sc.parallelize(0 until 10, 2).foreach(_ => new mypackage.MyClass()) // runs on the executors
(or just import it)

You could also try running MASTER=local-cluster[2,1,512] which launches 2
executors, 1 core each, with 512MB in a setup that mimics a real cluster
more closely, in case it's a bug only related to using local mode.
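
For example, something like this (reusing the placeholder jar path from the
thread):

  MASTER=local-cluster[2,1,512] ADD_JARS=/path/to/my/jar ./spark-shell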


On Sat, Jan 4, 2014 at 7:07 PM, Aureliano Buendia buendia...@gmail.com wrote:




 On Sun, Jan 5, 2014 at 2:28 AM, Aaron Davidson ilike...@gmail.com wrote:

 Additionally, which version of Spark are you running?


 0.8.1.

 Unfortunately, this doesn't work either:

 MASTER=local[2] ADD_JARS=/path/to/my/jar 
 SPARK_CLASSPATH=/path/to/my/jar./spark-shell




 On Sat, Jan 4, 2014 at 6:27 PM, Aaron Davidson ilike...@gmail.com wrote:

 I am not an expert on these classpath issues, but if you're using local
 mode, you might also try to set SPARK_CLASSPATH to include the path to the
 jar file as well. This should not really help, since adding jars is the
 right way to get the jars to your executors (which is where the exception
 appears to be happening), but it would sure be interesting if it did.


 On Sat, Jan 4, 2014 at 4:50 PM, Aureliano Buendia 
 buendia...@gmail.com wrote:

 I should add that I can see in the log that the jar is being shipped to
 the workers:

 14/01/04 15:34:52 INFO Executor: Fetching
 http://192.168.1.111:51031/jars/my.jar.jar with timestamp 131979092
 14/01/04 15:34:52 INFO Utils: Fetching
 http://192.168.1.111:51031/jars/my.jar.jar to
 /var/folders/3g/jyx81ctj3698wbvphxhm4dw4gn/T/fetchFileTemp8322008964976744710.tmp
 14/01/04 15:34:53 INFO Executor: Adding
 file:/var/folders/3g/jyx81ctj3698wbvphxhm4dw4gn/T/spark-d8ac8f66-fad6-4b3f-8059-73f13b86b070/my.jar.jar
 to class loader


 On Sun, Jan 5, 2014 at 12:46 AM, Aureliano Buendia 
 buendia...@gmail.com wrote:

 Hi,

 I'm trying to access my standalone Spark app from spark-shell. I
 tried starting the shell by:

 MASTER=local[2] ADD_JARS=/path/to/my/jar ./spark-shell

 The log shows that the jar file was loaded. Also, I can access and
 create a new instance of mypackage.MyClass.

 The problem is that myRDD.collect() returns RDD[MyClass], and that
 throws this exception:

 java.lang.ClassNotFoundException: mypackage.MyClass

 Does this mean that my jar was not shipped to the workers? Is this a
 known issue, or am I doing something wrong here?








Re: ADD_JARS doesn't properly work for spark-shell

2014-01-04 Thread Aureliano Buendia
Sorry, I had a typo. I can confirm that using ADD_JARS together with
SPARK_CLASSPATH works as expected in spark-shell.

It'd make sense to have the two combined as one option.


On Sun, Jan 5, 2014 at 3:51 AM, Aaron Davidson ilike...@gmail.com wrote:

 Cool. To confirm, you said you can access the class and construct new
 objects -- did you do this in the shell itself (i.e., on the driver), or on
 the executors?

 Specifically, one of the following two should fail in the shell:
  new mypackage.MyClass()                              // runs on the driver (in the REPL)
  sc.parallelize(0 until 10, 2).foreach(_ => new mypackage.MyClass()) // runs on the executors
 (or just import it)

 You could also try running MASTER=local-cluster[2,1,512] which launches 2
 executors, 1 core each, with 512MB in a setup that mimics a real cluster
 more closely, in case it's a bug only related to using local mode.


 On Sat, Jan 4, 2014 at 7:07 PM, Aureliano Buendia buendia...@gmail.com wrote:




 On Sun, Jan 5, 2014 at 2:28 AM, Aaron Davidson ilike...@gmail.com wrote:

 Additionally, which version of Spark are you running?


 0.8.1.

 Unfortunately, this doesn't work either:

 MASTER=local[2] ADD_JARS=/path/to/my/jar 
 SPARK_CLASSPATH=/path/to/my/jar./spark-shell




 On Sat, Jan 4, 2014 at 6:27 PM, Aaron Davidson ilike...@gmail.com wrote:

 I am not an expert on these classpath issues, but if you're using local
 mode, you might also try to set SPARK_CLASSPATH to include the path to the
 jar file as well. This should not really help, since adding jars is the
 right way to get the jars to your executors (which is where the exception
 appears to be happening), but it would sure be interesting if it did.


 On Sat, Jan 4, 2014 at 4:50 PM, Aureliano Buendia buendia...@gmail.com
  wrote:

 I should add that I can see in the log that the jar is being shipped to
 the workers:

 14/01/04 15:34:52 INFO Executor: Fetching
 http://192.168.1.111:51031/jars/my.jar.jar with timestamp
 131979092
 14/01/04 15:34:52 INFO Utils: Fetching
 http://192.168.1.111:51031/jars/my.jar.jar to
 /var/folders/3g/jyx81ctj3698wbvphxhm4dw4gn/T/fetchFileTemp8322008964976744710.tmp
 14/01/04 15:34:53 INFO Executor: Adding
 file:/var/folders/3g/jyx81ctj3698wbvphxhm4dw4gn/T/spark-d8ac8f66-fad6-4b3f-8059-73f13b86b070/my.jar.jar
 to class loader


 On Sun, Jan 5, 2014 at 12:46 AM, Aureliano Buendia 
 buendia...@gmail.com wrote:

 Hi,

 I'm trying to access my standalone Spark app from spark-shell. I
 tried starting the shell by:

 MASTER=local[2] ADD_JARS=/path/to/my/jar ./spark-shell

 The log shows that the jar file was loaded. Also, I can access and
 create a new instance of mypackage.MyClass.

 The problem is that myRDD.collect() returns RDD[MyClass], and that
 throws this exception:

 java.lang.ClassNotFoundException: mypackage.MyClass

 Does this mean that my jar was not