Re: NullPointerExceptions when using val or broadcast on a standalone cluster.

2014-06-17 Thread bdamos
Hi, I think this is a bug in Spark: changing my program to use a main
method instead of the App trait fixes the problem. I've filed this as
SPARK-2175; apologies if it turns out to be a duplicate.

https://issues.apache.org/jira/browse/SPARK-2175

Regards,
Brandon.
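
For reference, here is a minimal sketch of the failing pattern and the
workaround (object names and the Seq are hypothetical, not from the
original program). Fields of an object that extends App are initialized
by delayedInit, which only runs when main() is invoked on the driver;
the singleton loaded on an executor never runs it, so its fields are
still null/default when a task reads them. A val declared inside main
is an ordinary local that gets serialized with the closure.

import org.apache.spark.{SparkConf, SparkContext}

// Fails on a standalone cluster: `words` is a field of an App object,
// so it is only initialized on the driver. On executors the field is
// still null, and calling .map on it throws a NullPointerException.
object BrokenJob extends App {
  val sc = new SparkContext(new SparkConf().setAppName("broken"))
  val words = Seq("a", "b") // null on executors
  sc.parallelize(1 to 10).flatMap(i => words.map(w => w + i)).collect()
}

// Works: `words` is a plain local variable captured by the closure
// and shipped to executors along with it.
object WorkingJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("working"))
    val words = Seq("a", "b")
    sc.parallelize(1 to 10).flatMap(i => words.map(w => w + i)).collect()
    sc.stop()
  }
}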





Re: NullPointerExceptions when using val or broadcast on a standalone cluster.

2014-06-12 Thread Gerard Maas
That stack trace is quite similar to the one generated when trying to
collect within a closure. Collecting in a closure feels "wrong" in this
case, but I wonder what the reason behind the NPE is. Curious to know
whether the two are related.

Here's a very simple example:

rrd1.flatMap(x => rrd2.collect.flatMap(y => List(y, x)))
res7: org.apache.spark.rdd.RDD[Int] = FlatMappedRDD[10] at flatMap at <console>:17

scala> res7.collect
14/06/13 01:11:48 INFO SparkContext: Starting job: collect at <console>:19
14/06/13 01:11:48 INFO DAGScheduler: Got job 2 (collect at <console>:19) with 3 output partitions (allowLocal=false)
14/06/13 01:11:48 INFO DAGScheduler: Final stage: Stage 4 (collect at <console>:19)
14/06/13 01:11:48 INFO DAGScheduler: Parents of final stage: List()
14/06/13 01:11:48 INFO DAGScheduler: Missing parents: List()
14/06/13 01:11:48 INFO DAGScheduler: Submitting Stage 4 (FlatMappedRDD[10] at flatMap at <console>:17), which has no missing parents
14/06/13 01:11:48 INFO DAGScheduler: Submitting 3 missing tasks from Stage 4 (FlatMappedRDD[10] at flatMap at <console>:17)
14/06/13 01:11:48 INFO TaskSchedulerImpl: Adding task set 4.0 with 3 tasks
14/06/13 01:11:48 INFO TaskSetManager: Starting task 4.0:0 as TID 16 on executor localhost: localhost (PROCESS_LOCAL)
14/06/13 01:11:48 INFO TaskSetManager: Serialized task 4.0:0 as 1850 bytes in 0 ms
14/06/13 01:11:48 INFO TaskSetManager: Starting task 4.0:1 as TID 17 on executor localhost: localhost (PROCESS_LOCAL)
14/06/13 01:11:48 INFO TaskSetManager: Serialized task 4.0:1 as 1850 bytes in 0 ms
14/06/13 01:11:48 INFO TaskSetManager: Starting task 4.0:2 as TID 18 on executor localhost: localhost (PROCESS_LOCAL)
14/06/13 01:11:48 INFO TaskSetManager: Serialized task 4.0:2 as 1850 bytes in 0 ms
14/06/13 01:11:48 INFO Executor: Running task ID 16
14/06/13 01:11:48 INFO Executor: Running task ID 17
14/06/13 01:11:48 INFO Executor: Running task ID 18
14/06/13 01:11:48 ERROR Executor: Exception in task ID 18
java.lang.NullPointerException
    at org.apache.spark.rdd.RDD.collect(RDD.scala:728)
    at $line45.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:17)
    at $line45.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:17)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:728)
    at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:728)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1079)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1079)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/06/13 01:11:48 ERROR Executor: Exception in task ID 16
... same for each partition.

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1037)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1021)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1019)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1019)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:637)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:637)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:637)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1211)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispa
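
The NPE above is the usual symptom of referencing one RDD inside
another RDD's closure: the captured rrd2 is deserialized on the
executor without a usable SparkContext, so rrd2.collect fails there.
A minimal sketch of the usual workaround (setup and names are
hypothetical, mirroring the rrd1/rrd2 example above): materialize the
inner RDD on the driver and broadcast the plain array instead.

import org.apache.spark.{SparkConf, SparkContext}

object NestedRddSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("nested-rdd-sketch"))
    val rrd1 = sc.parallelize(1 to 3)
    val rrd2 = sc.parallelize(10 to 12)

    // Fails: rrd2 is captured inside rrd1's closure. RDD methods only
    // work on the driver; on an executor rrd2.collect hits a null
    // SparkContext and throws the NPE shown in the logs above.
    // rrd1.flatMap(x => rrd2.collect.flatMap(y => List(y, x))).collect()

    // Works: collect rrd2 once on the driver and broadcast the plain
    // array, so the closure captures ordinary data rather than an RDD.
    val ys = sc.broadcast(rrd2.collect())
    val pairs = rrd1.flatMap(x => ys.value.flatMap(y => List(y, x)))
    println(pairs.collect().mkString(", "))
    sc.stop()
  }
}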