Re: Map with state keys serialization
That fixed it! I still had the serializer registered as a workaround for SPARK-12591. Thanks so much for your help, Ryan!

-Joey

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
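For readers landing on this thread later: the SPARK-12591 workaround being discussed — telling Kryo to fall back to Java serialization for Spark's OpenHashMapBasedStateMap — might be sketched roughly as below. This is a hedged sketch, not a verified fix: the registrator class name is made up, OpenHashMapBasedStateMap is package-private in Spark (hence the Class.forName lookup), and per Ryan's note the registration is only relevant on affected 1.6.0/1.6.1 builds and should be removed on 1.6.2.

```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.serializers.JavaSerializer;
import org.apache.spark.serializer.KryoRegistrator;

// Hypothetical registrator sketching the SPARK-12591 workaround: force Kryo
// to use plain Java serialization for Spark's internal state map class.
public class StateMapRegistrator implements KryoRegistrator {
    @Override
    public void registerClasses(Kryo kryo) {
        try {
            // The class is private[streaming], so look it up by name.
            kryo.register(
                Class.forName("org.apache.spark.streaming.util.OpenHashMapBasedStateMap"),
                new JavaSerializer());
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}

// Wiring it into the job (configuration fragment):
// SparkConf conf = new SparkConf()
//     .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
//     .set("spark.kryo.registrator", "StateMapRegistrator");
```

Depending on the Spark build, related internal classes may need the same treatment; treat the single registration above as a starting point, not a complete fix.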
Re: Map with state keys serialization
Oh, OpenHashMapBasedStateMap is serialized using Kryo's "com.esotericsoftware.kryo.serializers.JavaSerializer". Did you set it for OpenHashMapBasedStateMap? You don't need to set anything for Spark's classes in 1.6.2.
Re: Map with state keys serialization
I tried with 1.6.2 and saw the same behavior.

-Joey
Re: Map with state keys serialization
There are some known issues in 1.6.0, e.g., https://issues.apache.org/jira/browse/SPARK-12591

Could you try 1.6.1?
Re: Map with state keys serialization
I tried wrapping my Tuple class (which is generated by Avro) in a class that implements Serializable, but now I'm getting a ClassNotFoundException in my Spark application. The exception is thrown while trying to deserialize checkpoint state:

https://gist.github.com/joey/7b374a2d483e25f15e20c0c4cb81b5cf

I set some flags[1] on the JVM and I can see the class get loaded in the logs.

Does anyone have any suggestions/recommendations for debugging class loading issues during checkpoint deserialization?

I also looked into switching to byte[] for the state keys, but byte[] doesn't implement value-based equals() or hashCode(). I can't use ByteBuffer because it doesn't implement Serializable. Spark has a SerializableBuffer class that wraps ByteBuffer, but it also doesn't have value-based equals() or hashCode().

-Joey

[1] -verbose:class -Dsun.misc.URLClassPath.debug

On Mon, Oct 10, 2016 at 11:28 AM, Joey Echeverria wrote:
> I do, I get the stack trace in this gist:
>
> https://gist.github.com/joey/d3bf040af31e854b3be374e2c016d7e1
>
> The class it references, com.rocana.data.Tuple, is registered with
> Kryo. Also, this is with 1.6.0 so if this behavior changed/got fixed
> in a later release let me know.
>
> -Joey
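Since byte[] uses identity-based equals() and hashCode(), one way around the problem described above is a small hand-rolled wrapper. This is a hypothetical sketch (BytesKey is not a Spark class): an immutable, Serializable key with value-based equality, suitable in principle as a state map key.

```java
import java.io.Serializable;
import java.util.Arrays;

// Hypothetical wrapper giving byte[] keys value-based equals()/hashCode()
// plus Java serializability, which byte[] and ByteBuffer each lack in part.
final class BytesKey implements Serializable {
    private static final long serialVersionUID = 1L;
    private final byte[] bytes;

    BytesKey(byte[] bytes) {
        // Defensive copy so the key stays immutable after construction.
        this.bytes = bytes.clone();
    }

    byte[] toByteArray() {
        return bytes.clone();
    }

    @Override
    public boolean equals(Object o) {
        // Arrays.equals compares contents, unlike byte[].equals (identity).
        return o instanceof BytesKey && Arrays.equals(bytes, ((BytesKey) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}
```

Two BytesKey instances built from equal byte arrays compare equal and hash identically, so they behave correctly in hash-based maps.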
Re: Map with state keys serialization
That's enough. Did you see any error?

On Mon, Oct 10, 2016 at 5:08 AM, Joey Echeverria wrote:
> Hi Ryan!
>
> Do you know where I need to configure Kryo for this? I already have
> spark.serializer=org.apache.spark.serializer.KryoSerializer in my
> SparkConf and I registered the class. Is there a different
> configuration setting for the state map keys?
>
> Thanks!
>
> -Joey
>
> On Sun, Oct 9, 2016 at 10:58 PM, Shixiong(Ryan) Zhu wrote:
> > You can use Kryo. It also implements KryoSerializable, which is
> > supported by Kryo.
> >
> > On Fri, Oct 7, 2016 at 11:39 AM, Joey Echeverria wrote:
> >> Looking at the source code for StateMap[1], which is used by
> >> JavaPairDStream#mapWithState(), it looks like state keys are
> >> serialized using an ObjectOutputStream. I couldn't find a reference to
> >> this restriction in the documentation. Did I miss that?
> >>
> >> Unless I'm mistaken, I'm guessing there isn't a way to use Kryo for
> >> this serialization?
> >>
> >> Thanks!
> >>
> >> -Joey
> >>
> >> [1] https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/util/StateMap.scala#L251
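To make the restriction in the original question concrete: ObjectOutputStream refuses any object that isn't java.io.Serializable, which is why a key class that doesn't implement the interface fails when the state map is written. A minimal stand-alone illustration (the key classes below are made up for the example, not Spark's):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Two hypothetical key types: one Serializable, one not.
class SerializableKey implements Serializable {
    private static final long serialVersionUID = 1L;
    final int id;
    SerializableKey(int id) { this.id = id; }
}

class PlainKey {
    final int id;
    PlainKey(int id) { this.id = id; }
}

class StateKeyDemo {
    // Write a key the way an ObjectOutputStream-based state map would.
    // A non-Serializable key triggers NotSerializableException (an
    // IOException subclass), which we report as false here.
    static boolean javaSerializable(Object key) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(key);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(javaSerializable(new SerializableKey(1))); // true
        System.out.println(javaSerializable(new PlainKey(1)));        // false
    }
}
```

This is also why the Serializable-wrapper approach discussed earlier in the thread works for writing state, while leaving the separate ClassNotFoundException problem on the read path.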