[ https://issues.apache.org/jira/browse/SPARK-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15761109#comment-15761109 ]

Seth Bromberger commented on SPARK-6062:
----------------------------------------

This appears to be a known bug in Scala's immutable HashMap: see
https://issues.scala-lang.org/browse/SI-9895 and
https://issues.apache.org/jira/browse/SPARK-18916.
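
For what it's worth, until the upstream fix lands, the reporter's workaround can also be expressed without HashMap.merged at all, by folding one map into the other key-wise. A minimal sketch with plain Scala collections (the object and method names here are mine, for illustration only):

```scala
import scala.collection.immutable.HashMap

object MergeSketch {
  case class Test(a: String, b: String)

  // Merge y into x, summing values on key collision. This avoids the
  // HashMap.merged code path entirely, mirroring the reduceByKey workaround.
  def mergeSummed(x: HashMap[Test, Double],
                  y: HashMap[Test, Double]): HashMap[Test, Double] =
    y.foldLeft(x) { case (acc, (k, v)) =>
      acc.updated(k, acc.getOrElse(k, 0.0) + v)
    }

  def main(args: Array[String]): Unit = {
    val m1 = HashMap(Test("A", "B") -> 2.0)
    val m2 = HashMap(Test("A", "B") -> 4.0)
    // Single entry mapping Test("A","B") to 6.0
    println(mergeSummed(m1, m2))
  }
}
```

Passing a function like this into RDD.reduce sidesteps the buggy merged path while keeping the same semantics as the original snippet.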

> HashMap.merged - Null Pointer Exception
> ---------------------------------------
>
>                 Key: SPARK-6062
>                 URL: https://issues.apache.org/jira/browse/SPARK-6062
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.1
>            Reporter: John Sandiford
>
> Hi, I am getting an error with the following:
> import scala.collection.immutable.HashMap
>   case class Test(a: String, b: String)
>   val data = sc.parallelize(Seq(
>     HashMap(Test("A", "B") -> 2.0),
>     HashMap(Test("A", "B") -> 4.0)
>   ))
>   println(data.reduce((a, b) => a.merged(b)((a, b) => (a._1, a._2 + b._2))))
> The merge function is being passed null values for a and b.
> If I print out a and b within the merged function, it starts working.
> The workaround I am using is:
> data.flatMap(m => m.iterator).reduceByKey(_ + _).collectAsMap()
> Error message:
> Exception in thread "main" org.apache.spark.SparkDriverExecutionException: Execution error
>       at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:997)
>       at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1417)
>       at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>       at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
>       at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>       at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>       at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
>       at akka.dispatch.Mailbox.run(Mailbox.scala:220)
>       at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>       at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>       at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>       at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>       at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.NullPointerException
>       at Main$$anonfun$1$$anonfun$apply$1.apply(Main.scala:22)
>       at Main$$anonfun$1$$anonfun$apply$1.apply(Main.scala:22)
>       at scala.collection.immutable.HashMap$$anon$2$$anon$3.apply(HashMap.scala:139)
>       at scala.collection.immutable.HashMap$HashMap1.updated0(HashMap.scala:206)
>       at scala.collection.immutable.HashMap$HashMap1.merge0(HashMap.scala:228)
>       at scala.collection.immutable.HashMap.merged(HashMap.scala:106)
>       at Main$$anonfun$1.apply(Main.scala:22)
>       at Main$$anonfun$1.apply(Main.scala:22)
>       at org.apache.spark.rdd.RDD$$anonfun$20.apply(RDD.scala:879)
>       at org.apache.spark.rdd.RDD$$anonfun$20.apply(RDD.scala:876)
>       at org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
>       at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:993)
>       ... 12 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
