[ https://issues.apache.org/jira/browse/SPARK-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15761109#comment-15761109 ]
Seth Bromberger commented on SPARK-6062:
----------------------------------------

This appears to be a known bug in Scala's immutable HashMap: see
https://issues.scala-lang.org/browse/SI-9895 and
https://issues.apache.org/jira/browse/SPARK-18916.

> HashMap.merged - Null Pointer Exception
> ---------------------------------------
>
>                 Key: SPARK-6062
>                 URL: https://issues.apache.org/jira/browse/SPARK-6062
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.1
>            Reporter: John Sandiford
>
> Hi, I am getting an error with the following:
>
>   import scala.collection.immutable.HashMap
>
>   case class Test(a: String, b: String)
>
>   val data = sc.parallelize(Seq(
>     HashMap(Test("A", "B") -> 2.0),
>     HashMap(Test("A", "B") -> 4.0)
>   ))
>
>   println(data.reduce((a, b) => a.merged(b)((a, b) => (a._1, a._2 + b._2))))
>
> The merge function is being passed null values for a and b.
> If I print out a and b within the merged function, it starts working.
> The workaround I am using is:
>
>   data.flatMap(m => m.iterator).reduceByKey(_ + _).collectAsMap()
>
> Error message:
>
> Exception in thread "main" org.apache.spark.SparkDriverExecutionException: Execution error
>         at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:997)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1417)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:220)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.NullPointerException
>         at Main$$anonfun$1$$anonfun$apply$1.apply(Main.scala:22)
>         at Main$$anonfun$1$$anonfun$apply$1.apply(Main.scala:22)
>         at scala.collection.immutable.HashMap$$anon$2$$anon$3.apply(HashMap.scala:139)
>         at scala.collection.immutable.HashMap$HashMap1.updated0(HashMap.scala:206)
>         at scala.collection.immutable.HashMap$HashMap1.merge0(HashMap.scala:228)
>         at scala.collection.immutable.HashMap.merged(HashMap.scala:106)
>         at Main$$anonfun$1.apply(Main.scala:22)
>         at Main$$anonfun$1.apply(Main.scala:22)
>         at org.apache.spark.rdd.RDD$$anonfun$20.apply(RDD.scala:879)
>         at org.apache.spark.rdd.RDD$$anonfun$20.apply(RDD.scala:876)
>         at org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
>         at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:993)
>         ... 12 more
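For illustration, the behavior the reporter expected from `merged`, and the shape of the workaround, can both be sketched with plain Scala collections (no SparkContext needed — the NPE in the report only surfaces once Spark ships the merge closure to executors, per SI-9895). The `flatMap`/`reduceByKey` workaround is approximated here with `groupBy` and a local sum; this is a sketch of the intended semantics, not the Spark code path itself:

```scala
import scala.collection.immutable.HashMap

object MergeSketch {
  case class Test(a: String, b: String)

  def main(args: Array[String]): Unit = {
    val m1 = HashMap(Test("A", "B") -> 2.0)
    val m2 = HashMap(Test("A", "B") -> 4.0)

    // merged works fine locally: colliding keys are combined by the
    // supplied function, yielding Test("A", "B") -> 6.0
    val viaMerged = m1.merged(m2) { case ((k, v1), (_, v2)) => (k, v1 + v2) }

    // local analogue of data.flatMap(_.iterator).reduceByKey(_ + _):
    // flatten both maps to key/value pairs, group by key, sum the values
    val viaGroupBy = (m1.toSeq ++ m2.toSeq)
      .groupBy(_._1)
      .map { case (k, kvs) => k -> kvs.map(_._2).sum }

    println(viaMerged)   // HashMap(Test(A,B) -> 6.0)
    println(viaGroupBy)  // Map(Test(A,B) -> 6.0)
  }
}
```

Both paths produce `Test("A", "B") -> 6.0`; the difference in the reported bug is purely that `merged`'s callback receives nulls after serialization inside `RDD.reduce`, which is why flattening to pairs and using `reduceByKey` sidesteps it.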