[
https://issues.apache.org/jira/browse/SPARK-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15761109#comment-15761109
]
Seth Bromberger commented on SPARK-6062:
----------------------------------------
This appears to be a known bug in Scala's immutable HashMap: see
https://issues.scala-lang.org/browse/SI-9895 and
https://issues.apache.org/jira/browse/SPARK-18916.
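For reference, the same merged call succeeds in plain Scala outside Spark, which is consistent with SI-9895 being a serialization problem in merged's internal anonymous classes rather than a logic error in the merge function. A minimal local sketch (assumes a Scala 2.x scala.collection.immutable.HashMap; names are illustrative, not from the report):

```scala
import scala.collection.immutable.HashMap

case class Test(a: String, b: String)

val m1 = HashMap(Test("A", "B") -> 2.0)
val m2 = HashMap(Test("A", "B") -> 4.0)

// merged's merge function receives the two colliding (key, value) pairs
// and must return the pair to keep.
val combined = m1.merged(m2) { case ((k, v1), (_, v2)) => k -> (v1 + v2) }
// combined(Test("A", "B")) == 6.0
```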
> HashMap.merged - Null Pointer Exception
> ---------------------------------------
>
> Key: SPARK-6062
> URL: https://issues.apache.org/jira/browse/SPARK-6062
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.2.1
> Reporter: John Sandiford
>
> Hi, I am getting an error with the following:
> import scala.collection.immutable.HashMap
> case class Test(a: String, b: String)
> val data = sc.parallelize(Seq(
>   HashMap(Test("A", "B") -> 2.0),
>   HashMap(Test("A", "B") -> 4.0)
> ))
> println(data.reduce((a, b) => a.merged(b)((a, b) => (a._1, a._2 + b._2))))
> The merge function is being passed null values for a and b.
> If I print out a and b within the merged function, it starts working.
> The workaround I am using is:
> data.flatMap(m => m.iterator).reduceByKey(_ + _).collectAsMap()
> Error message:
> Exception in thread "main" org.apache.spark.SparkDriverExecutionException: Execution error
> at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:997)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1417)
> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
> at akka.dispatch.Mailbox.run(Mailbox.scala:220)
> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.NullPointerException
> at Main$$anonfun$1$$anonfun$apply$1.apply(Main.scala:22)
> at Main$$anonfun$1$$anonfun$apply$1.apply(Main.scala:22)
> at scala.collection.immutable.HashMap$$anon$2$$anon$3.apply(HashMap.scala:139)
> at scala.collection.immutable.HashMap$HashMap1.updated0(HashMap.scala:206)
> at scala.collection.immutable.HashMap$HashMap1.merge0(HashMap.scala:228)
> at scala.collection.immutable.HashMap.merged(HashMap.scala:106)
> at Main$$anonfun$1.apply(Main.scala:22)
> at Main$$anonfun$1.apply(Main.scala:22)
> at org.apache.spark.rdd.RDD$$anonfun$20.apply(RDD.scala:879)
> at org.apache.spark.rdd.RDD$$anonfun$20.apply(RDD.scala:876)
> at org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
> at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:993)
> ... 12 more
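The reporter's flatMap/reduceByKey workaround sidesteps merged entirely by combining values per key. The same per-key combine can be sketched without a SparkContext as a plain collection fold (local stand-in only; the Spark version distributes the identical logic):

```scala
import scala.collection.immutable.HashMap

case class Test(a: String, b: String)

val maps = Seq(
  HashMap(Test("A", "B") -> 2.0),
  HashMap(Test("A", "B") -> 4.0)
)

// Local equivalent of data.flatMap(m => m.iterator).reduceByKey(_ + _):
// flatten the maps to (key, value) pairs, group by key, sum each group.
val reduced = maps.flatten
  .groupBy(_._1)
  .map { case (k, kvs) => k -> kvs.map(_._2).sum }
// reduced(Test("A", "B")) == 6.0
```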
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)