Hi,

I initially sent this to the user mailing list, but I didn't get any
response.
I figured this could be a bug, so it might be of more concern to the dev list.

I recently switched to using Kryo serialization and I've been running into
errors with the mutable.LinkedHashMap class.

If I don't register the mutable.LinkedHashMap class, then I get the
ArrayStoreException shown below.
If I do register the class, then when the LinkedHashMap is collected on the
driver, it does not contain any elements.

Here is the snippet of code I used:

import scala.collection.mutable

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf()
  .setMaster("local[*]")
  .setAppName("Sample")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[mutable.LinkedHashMap[String, String]])))

val collect = sc.parallelize(0 to 10)
  .map(p => new mutable.LinkedHashMap[String, String]() ++=
    Array(("hello", "bonjour"), ("good", "bueno")))

val mapSideSizes = collect.map(p => p.size).collect()(0)
val driverSideSizes = collect.collect()(0).size

println("The sizes before collect : " + mapSideSizes)
println("The sizes after collect : " + driverSideSizes)


** The following only occurs if I do not register the
mutable.LinkedHashMap class **
16/08/20 18:10:38 ERROR TaskResultGetter: Exception while getting task result
java.lang.ArrayStoreException: scala.collection.mutable.HashMap
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:311)
at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:97)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:60)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:50)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
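
To narrow down whether this comes from Spark's result handling or from the
chill/Kryo layer itself, here is a minimal round-trip that might help test it
outside Spark (a sketch only; it assumes the com.twitter.chill artifact that
Spark's KryoSerializer builds on, and I haven't verified it exercises exactly
the same code path):

import java.io.ByteArrayOutputStream

import com.esotericsoftware.kryo.io.{Input, Output}
import com.twitter.chill.ScalaKryoInstantiator

import scala.collection.mutable

// Round-trip an Array[LinkedHashMap] through chill/Kryo directly, mimicking how
// task results come back as an object array, to see if the problem reproduces
// without Spark in the picture.
object LinkedHashMapRoundTrip {
  def main(args: Array[String]): Unit = {
    val kryo = new ScalaKryoInstantiator().newKryo()

    val original = Array(
      new mutable.LinkedHashMap[String, String]() ++=
        Array(("hello", "bonjour"), ("good", "bueno")))

    val buffer = new ByteArrayOutputStream()
    val output = new Output(buffer)
    kryo.writeClassAndObject(output, original)
    output.close()

    val input = new Input(buffer.toByteArray)
    val copy = kryo.readClassAndObject(input)
      .asInstanceOf[Array[mutable.LinkedHashMap[String, String]]]
    input.close()

    println("Original size: " + original(0).size + ", copy size: " + copy(0).size)
  }
}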

I hope this is a known issue and/or that I'm simply missing something in my
setup.
I'd appreciate any help or advice!

As a bit of background, this was encountered in the SciSpark project being
developed at NASA JPL.
The mutable.LinkedHashMap is necessary because it lets us handle NetCDF
attributes in the order they appear in the original NetCDF files.
The test case I posted above was just meant to show the error I'm seeing more
clearly.
Our actual use case is slightly different, but we see the same result
(empty LinkedHashMaps).
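
If registration alone really doesn't help, another avenue I've been sketching
is a custom Kryo serializer that writes the map as an explicit ordered list of
entries, wired in through a KryoRegistrator (the class names below are just
placeholders, and I haven't confirmed this is the recommended approach):

import com.esotericsoftware.kryo.{Kryo, Serializer}
import com.esotericsoftware.kryo.io.{Input, Output}
import org.apache.spark.serializer.KryoRegistrator

import scala.collection.mutable

// Sketch: serialize the map as (size, k1, v1, k2, v2, ...) so insertion order
// is preserved explicitly on the wire.
class LinkedHashMapSerializer extends Serializer[mutable.LinkedHashMap[String, String]] {
  override def write(kryo: Kryo, output: Output,
                     map: mutable.LinkedHashMap[String, String]): Unit = {
    output.writeInt(map.size)
    map.foreach { case (k, v) =>
      output.writeString(k)
      output.writeString(v)
    }
  }

  override def read(kryo: Kryo, input: Input,
                    clazz: Class[mutable.LinkedHashMap[String, String]]): mutable.LinkedHashMap[String, String] = {
    val result = new mutable.LinkedHashMap[String, String]()
    val size = input.readInt()
    (0 until size).foreach { _ =>
      val key = input.readString()
      val value = input.readString()
      result += (key -> value)
    }
    result
  }
}

// Registrator that wires the custom serializer in.
class LinkedHashMapRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[mutable.LinkedHashMap[String, String]],
      new LinkedHashMapSerializer())
  }
}

The registrator would then be enabled by setting "spark.kryo.registrator" to
its fully qualified class name on the SparkConf.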

Rahul Palamuttam
