Re: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-29 Thread Bryan Jeffrey
ache/spark/sql/types/SQLUserDefinedType.java>` on the class definition. You don't seem to have done it; maybe that's the reason? I would debug by printing the values in the serialize/deserialize methods, and then passing i

Re: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-29 Thread Jungtaek Lim
On Fri, Feb 28, 2020 at 2:45 PM Bryan Jeffrey wrote: Tathagata, The difference is more than hours off. In this instance it's different by 4 years. In other instances it's different by ten

Re: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-29 Thread Jungtaek Lim
ations). We've considered moving to storage as longs, but this makes code much less readable and harder to maintain. The UDT serialization bug also causes issues outside of stateful streaming, as when executing a simple group by.

Fwd: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-28 Thread Bryan Jeffrey
g a simple group by. Regards, Bryan Jeffrey Get Outlook for Android <https://aka.ms/ghei36> -- *From:* Tathagata Das *Sent:* Friday, February 28, 2020 4:56:07 PM *To:*

Re: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-28 Thread Bryan Jeffrey
ail.com>> Cc: user mailto:user@spark.apache.org>> Subject: Re: Structured Streaming: mapGroupsWithState UDT serialization does not work You are deserializing by explicitly specifying the UTC timezone, but when serializing you are not specifying it. Maybe that is the reason? Also, if you can encode

Re: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-28 Thread Tathagata Das
ng a simple group by. Regards, Bryan Jeffrey Get Outlook for Android <https://aka.ms/ghei36> -- *From:* Tathagata Das *Sent:* Friday, February 28, 2020 4:56:07 PM *To:* Bryan Jeffrey *Cc:* user *Subject:*

Re: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-28 Thread Bryan Jeffrey
ey Cc: user Subject: Re: Structured Streaming: mapGroupsWithState UDT serialization does not work You are deserializing by explicitly specifying the UTC timezone, but when serializing you are not specifying it. Maybe that is the reason? Also, if you can encode it using just long, then I recommend just

Re: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-28 Thread Tathagata Das
You are deserializing by explicitly specifying the UTC timezone, but when serializing you are not specifying it. Maybe that is the reason? Also, if you can encode it using just a long, then I recommend just saving the value as a long and eliminating some of the serialization overhead. Spark will probably
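
The suggestion in this message (store the value as a long and treat the time zone identically in both directions) can be sketched roughly as follows. This is only an illustration and not the code from the thread: it uses java.time as a stand-in for Joda-Time so it runs without extra dependencies, and the object name EpochMillisRoundTrip and the helpers serialize and deserialize are hypothetical names, not part of Spark's UDT API.

```scala
import java.time.{Instant, ZoneOffset, ZonedDateTime}

// Hypothetical sketch: java.time stands in for Joda-Time here.
object EpochMillisRoundTrip {
  // Encode the instant as epoch milliseconds; the result does not depend
  // on the zone attached to the input value.
  def serialize(dt: ZonedDateTime): Long = dt.toInstant.toEpochMilli

  // Decode with the same explicit zone (UTC) on every call, so the
  // decoding side never falls back to a JVM default zone.
  def deserialize(millis: Long): ZonedDateTime =
    Instant.ofEpochMilli(millis).atZone(ZoneOffset.UTC)

  def main(args: Array[String]): Unit = {
    // A value carrying a non-UTC offset, to show the zone does not matter.
    val original = ZonedDateTime.of(2020, 2, 28, 16, 56, 7, 0, ZoneOffset.ofHours(-5))
    val restored = deserialize(serialize(original))

    // The round trip preserves the instant even though the zone changes.
    assert(restored.toInstant == original.toInstant)
    println(s"$original -> ${serialize(original)} -> $restored")
  }
}
```

Because only the epoch value is stored, a mismatch between the zones used when writing and reading cannot shift the instant. Whether this accounts for the multi-year differences reported in the thread is not established here.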

Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-28 Thread Bryan Jeffrey
Hello. I'm running Scala 2.11 w/ Spark 2.3.0. I've encountered a problem with mapGroupsWithState, and was wondering if anyone had insight. We use Joda-Time in a number of data structures, and so we've written a custom serializer for Joda. This works well in most dataset/dataframe structured