My guess is that Kryo handles Maps generically, or relies on some
mechanism that does, and serializes a Map by iterating over its
key/value pairs and rebuilding the map on the other side. Since there
aren't actually any key/values in this map, what comes back is a plain
empty Map, and the default value is lost along the way. Java
serialization is a much more literal (and expensive) field-by-field
serialization of the object graph, which works here because there's no
special treatment of Maps. I think you could register a custom
serializer that handles this case, or work around it in your client
code. There have been other issues with Kryo and Map; for example,
sometimes a Map in an application is actually some non-serializable
wrapper view.
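
For the custom-serializer route, here is a minimal, untested sketch.
withDefaultValue returns a scala.collection.immutable.Map.WithDefault
wrapper, so one option is to register Kryo's built-in JavaSerializer
for exactly that class; the wrapper then falls back to plain Java
serialization (which we know preserves it) while everything else stays
on Kryo. The name MyRegistrator is made up:

  import com.esotericsoftware.kryo.Kryo
  import com.esotericsoftware.kryo.serializers.JavaSerializer
  import org.apache.spark.serializer.KryoRegistrator

  // Route Map.WithDefault through Java serialization so the
  // default-value wrapper survives the round trip.
  class MyRegistrator extends KryoRegistrator {
    override def registerClasses(kryo: Kryo): Unit = {
      kryo.register(
        classOf[scala.collection.immutable.Map.WithDefault[_, _]],
        new JavaSerializer())
    }
  }

then point Spark at it with
.set("spark.kryo.registrator", "MyRegistrator").

The client-code workaround is even simpler if the default is known at
the call site: don't rely on the wrapper surviving serialization, and
use getOrElse instead:

  scala> sc.parallelize(Seq(aMap)).map(_.getOrElse("a", 0L)).first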

On Wed, Sep 28, 2016 at 3:18 AM, Maciej Szymkiewicz
<mszymkiew...@gmail.com> wrote:
> Hi everyone,
>
> I suspect there is no point in submitting a JIRA to fix this (it's not a
> Spark issue?), but I would like to know if this problem is documented
> anywhere. Somehow Kryo is losing the default value during serialization:
>
> scala> import org.apache.spark.{SparkContext, SparkConf}
> import org.apache.spark.{SparkContext, SparkConf}
>
> scala> val aMap = Map[String, Long]().withDefaultValue(0L)
> aMap: scala.collection.immutable.Map[String,Long] = Map()
>
> scala> aMap("a")
> res6: Long = 0
>
> scala> val sc = new SparkContext(new
> SparkConf().setAppName("bar").set("spark.serializer",
> "org.apache.spark.serializer.KryoSerializer"))
>
> scala> sc.parallelize(Seq(aMap)).map(_("a")).first
> 16/09/28 09:13:47 ERROR Executor: Exception in task 2.0 in stage 2.0 (TID 7)
> java.util.NoSuchElementException: key not found: a
>
> while Java serializer works just fine:
>
> scala> val sc = new SparkContext(new
> SparkConf().setAppName("bar").set("spark.serializer",
> "org.apache.spark.serializer.JavaSerializer"))
>
> scala> sc.parallelize(Seq(aMap)).map(_("a")).first
> res9: Long = 0
>
> --
> Best regards,
> Maciej
