Alexis Sarda-Espinosa created SPARK-26980:
---------------------------------------------

             Summary: Kryo deserialization not working with KryoSerializable class
                 Key: SPARK-26980
                 URL: https://issues.apache.org/jira/browse/SPARK-26980
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.0
         Environment: Local Spark v2.4.0, Kotlin v1.3.21
            Reporter: Alexis Sarda-Espinosa

I'm trying to create an {{Aggregator}} that uses a custom container that should 
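be serialized with {{Kryo}}:

{code:java}
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.KryoSerializable
import com.esotericsoftware.kryo.io.Input
import com.esotericsoftware.kryo.io.Output
import java.util.Collections

class StringSet(other: Collection<String>) : HashSet<String>(other), KryoSerializable {
    companion object {
        @JvmStatic
        private val serialVersionUID = 1L
    }

    // No-arg constructor so that the set can be instantiated before read() is called
    constructor() : this(Collections.emptyList())

    // Writes the element count followed by each string
    override fun write(kryo: Kryo, output: Output) {
        output.writeInt(this.size)
        for (string in this) {
            output.writeString(string)
        }
    }

    // Reads the element count, then adds that many strings to this set
    override fun read(kryo: Kryo, input: Input) {
        val size = input.readInt()
        repeat(size) { this.add(input.readString()) }
    }
}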
{code}
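For context, the aggregator has roughly the following shape. This is a minimal sketch rather than the exact code from the sample repo: the class name {{StringSetAggregator}}, the {{String}} input type, and the use of {{Encoders.kryo}} for both the buffer and the output are assumptions made for illustration.

{code:java}
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical aggregator that collects strings into the StringSet defined above
class StringSetAggregator : Aggregator<String, StringSet, StringSet>() {
    // Empty buffer to start each group
    override fun zero(): StringSet = StringSet()

    // Add one input value to the buffer
    override fun reduce(buffer: StringSet, value: String): StringSet {
        buffer.add(value)
        return buffer
    }

    // Combine two partial buffers
    override fun merge(left: StringSet, right: StringSet): StringSet {
        left.addAll(right)
        return left
    }

    // The buffer is returned as the final result
    override fun finish(reduction: StringSet): StringSet = reduction

    // Assumption: both buffer and output are encoded with Kryo
    override fun bufferEncoder(): Encoder<StringSet> = Encoders.kryo(StringSet::class.java)

    override fun outputEncoder(): Encoder<StringSet> = Encoders.kryo(StringSet::class.java)
}
{code}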
However, when I inspect the corresponding value in the {{Row}} after aggregation 
(for example via {{collectAsList()}}), I see a {{byte[]}} rather than a 
{{StringSet}}. Interestingly, the first byte of that array seems to be some sort 
of noise; if I skip it, I can deserialize the rest like this:

{code:java}
val b = row.getAs<ByteArray>(2)
val input = Input(b.copyOfRange(1, b.size)) // extra byte?
val set = Kryo().readObject(input, StringSet::class.java)
{code}
Configuration used:

{code:java}
SparkConf()
    .setAppName("Hello Spark with Kotlin")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.registrationRequired", "true")
    .registerKryoClasses(arrayOf(StringSet::class.java))
{code}
[Sample repo with all the code|https://github.com/asardaes/hello-spark-kotlin/tree/8e8a54fd81f0412507318149841c69bb17d8572c].

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
