[jira] [Commented] (SPARK-26980) Kryo deserialization not working with KryoSerializable class
[ https://issues.apache.org/jira/browse/SPARK-26980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782082#comment-16782082 ] Alexis Sarda-Espinosa commented on SPARK-26980: --- Kryo works fine if used directly ([example here|https://github.com/asardaes/hello-spark-kotlin/blob/master/src/test/kotlin/hello/spark/kotlin/MainTest.kt#L21]), it just breaks when the data goes through Spark. > Kryo deserialization not working with KryoSerializable class > > > Key: SPARK-26980 > URL: https://issues.apache.org/jira/browse/SPARK-26980 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 > Environment: Local Spark v2.4.0 > Kotlin v1.3.21 >Reporter: Alexis Sarda-Espinosa >Priority: Minor > Labels: kryo, serialization > > I'm trying to create an {{Aggregator}} that uses a custom container that > should be serialized with {{Kryo:}} > {code:java} > class StringSet(other: Collection) : HashSet(other), > KryoSerializable { > companion object { > @JvmStatic > private val serialVersionUID = 1L > } > constructor() : this(Collections.emptyList()) > override fun write(kryo: Kryo, output: Output) { > output.writeInt(this.size) > for (string in this) { > output.writeString(string) > } > } > override fun read(kryo: Kryo, input: Input) { > val size = input.readInt() > repeat(size) { this.add(input.readString()) } > } > } > {code} > However, if I look at the corresponding value in the {{Row}} after > aggregation (for example by using {{collectAsList()}}), I see a {{byte[]}}. > Interestingly, the first byte in that array seems to be some sort of noise, > and I can deserialize by doing something like this: > {code:java} > val b = row.getAs(2) > val input = Input(b.copyOfRange(1, b.size)) // extra byte? > val set = Kryo().readObject(input, StringSet::class.java) > {code} > Used configuration: > {code:java} > SparkConf() > .setAppName("Hello Spark with Kotlin") > .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") > .set("spark.kryo.registrationRequired", "true") > .registerKryoClasses(arrayOf(StringSet::class.java)) > {code} > [Sample repo with all the > code|https://github.com/asardaes/hello-spark-kotlin/tree/8e8a54fd81f0412507318149841c69bb17d8572c]. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26980) Kryo deserialization not working with KryoSerializable class
[ https://issues.apache.org/jira/browse/SPARK-26980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis Sarda-Espinosa updated SPARK-26980: -- Description: I'm trying to create an {{Aggregator}} that uses a custom container that should be serialized with {{Kryo:}} {code:java} class StringSet(other: Collection) : HashSet(other), KryoSerializable { companion object { @JvmStatic private val serialVersionUID = 1L } constructor() : this(Collections.emptyList()) override fun write(kryo: Kryo, output: Output) { output.writeInt(this.size) for (string in this) { output.writeString(string) } } override fun read(kryo: Kryo, input: Input) { val size = input.readInt() repeat(size) { this.add(input.readString()) } } } {code} However, if I look at the corresponding value in the {{Row}} after aggregation (for example by using {{collectAsList()}}), I see a {{byte[]}}. Interestingly, the first byte in that array seems to be some sort of noise, and I can deserialize by doing something like this: {code:java} val b = row.getAs(2) val input = Input(b.copyOfRange(1, b.size)) // extra byte? val set = Kryo().readObject(input, StringSet::class.java) {code} Used configuration: {code:java} SparkConf() .setAppName("Hello Spark with Kotlin") .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") .set("spark.kryo.registrationRequired", "true") .registerKryoClasses(arrayOf(StringSet::class.java)) {code} [Sample repo with all the code|https://github.com/asardaes/hello-spark-kotlin/tree/8e8a54fd81f0412507318149841c69bb17d8572c]. was: I'm trying to create an {{Aggregator}} that uses a custom container that should be serialized with {{Kryo}}: {code:java} class StringSet(other: Collection) : HashSet(other), KryoSerializable { companion object { @JvmStatic private val serialVersionUID = 1L } constructor() : this(Collections.emptyList()) override fun write(kryo: Kryo, output: Output) { output.writeInt(this.size) for (string in this) { output.writeString(string) } } override fun read(kryo: Kryo, input: Input) { val size = input.readInt() repeat(size) { this.add(input.readString()) } } } {code} However, if I look at the corresponding value in the {{Row}} after aggregation (for example by using {{collectAsList()}}), I see a {{byte[]}}. Interestingly, the first byte in that array seems to be some sort of noise, and I can deserialize by doing something like this: {code:java} val b = row.getAs(2) val input = Input(b.copyOfRange(1, b.size)) // extra byte? val set = Kryo().readObject(input, StringSet::class.java) {code} Used configuration: {code:java} SparkConf() .setAppName("Hello Spark with Kotlin") .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") .set("spark.kryo.registrationRequired", "true") .registerKryoClasses(arrayOf(StringSet::class.java)) {code} [Sample repo with all the code|https://github.com/asardaes/hello-spark-kotlin/tree/8e8a54fd81f0412507318149841c69bb17d8572c]. > Kryo deserialization not working with KryoSerializable class > > > Key: SPARK-26980 > URL: https://issues.apache.org/jira/browse/SPARK-26980 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 > Environment: Local Spark v2.4.0 > Kotlin v1.3.21 >Reporter: Alexis Sarda-Espinosa >Priority: Minor > Labels: kryo, serialization > > I'm trying to create an {{Aggregator}} that uses a custom container that > should be serialized with {{Kryo:}} > {code:java} > class StringSet(other: Collection) : HashSet(other), > KryoSerializable { > companion object { > @JvmStatic > private val serialVersionUID = 1L > } > constructor() : this(Collections.emptyList()) > override fun write(kryo: Kryo, output: Output) { > output.writeInt(this.size) > for (string in this) { > output.writeString(string) > } > } > override fun read(kryo: Kryo, input: Input) { > val size = input.readInt() > repeat(size) { this.add(input.readString()) } > } > } > {code} > However, if I look at the corresponding value in the {{Row}} after > aggregation (for example by using {{collectAsList()}}), I see a {{byte[]}}. > Interestingly, the first byte in that array seems to be some sort of noise, > and I can deserialize by doing something like this: > {code:java} > val b = row.getAs(2) > val input = Input(b.copyOfRange(1, b.size)) // extra byte? > val set = Kryo().readObject(input, StringSet::class.java) > {code} > Used configuration: > {code:java} > S
[jira] [Created] (SPARK-26980) Kryo deserialization not working with KryoSerializable class
Alexis Sarda-Espinosa created SPARK-26980: - Summary: Kryo deserialization not working with KryoSerializable class Key: SPARK-26980 URL: https://issues.apache.org/jira/browse/SPARK-26980 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0 Environment: Local Spark v2.4.0 Kotlin v1.3.21 Reporter: Alexis Sarda-Espinosa I'm trying to create an {{Aggregator}} that uses a custom container that should be serialized with {{Kryo}}: {code:java} class StringSet(other: Collection) : HashSet(other), KryoSerializable { companion object { @JvmStatic private val serialVersionUID = 1L } constructor() : this(Collections.emptyList()) override fun write(kryo: Kryo, output: Output) { output.writeInt(this.size) for (string in this) { output.writeString(string) } } override fun read(kryo: Kryo, input: Input) { val size = input.readInt() repeat(size) { this.add(input.readString()) } } } {code} However, if I look at the corresponding value in the {{Row}} after aggregation (for example by using {{collectAsList()}}), I see a {{byte[]}}. Interestingly, the first byte in that array seems to be some sort of noise, and I can deserialize by doing something like this: {code:java} val b = row.getAs(2) val input = Input(b.copyOfRange(1, b.size)) // extra byte? val set = Kryo().readObject(input, StringSet::class.java) {code} Used configuration: {code:java} SparkConf() .setAppName("Hello Spark with Kotlin") .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") .set("spark.kryo.registrationRequired", "true") .registerKryoClasses(arrayOf(StringSet::class.java)) {code} [Sample repo with all the code|https://github.com/asardaes/hello-spark-kotlin/tree/8e8a54fd81f0412507318149841c69bb17d8572c]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org