Re: Spark can't identify the event time column being supplied to withWatermark()

2018-06-08 Thread Tathagata Das
Glad that it worked out! It's unfortunate that there exist such pitfalls. And there is no easy way to get around it. If you can, let us know how your experience with mapGroupsWithState has been. TD On Fri, Jun 8, 2018 at 1:49 PM, frankdede wrote: > You are exactly right! A few hours ago, I

Re: Spark can't identify the event time column being supplied to withWatermark()

2018-06-08 Thread frankdede
You are exactly right! A few hours ago, I tried many things and finally got the example working by defining event timestamp column before groupByKey, just like what you suggested, but I wasn't able to figure out the reasoning behind my fix. val sessionUpdates = events

Re: Spark can't identify the event time column being supplied to withWatermark()

2018-06-08 Thread Tathagata Das
Try to define the watermark on the right column immediately before calling `groupByKey(...).mapGroupsWithState(...)`. You are applying the watermark and then doing a bunch of opaque transformation (user-defined flatMap that the planner has no visibility into). This prevents the planner from

Spark can't identify the event time column being supplied to withWatermark()

2018-06-08 Thread frankdede
I was trying to find a way to resessionize features in different events based on the event timestamps using Spark and I found a code example that uses mapGroupsWithStateto resessionize events using processing timestamps in their repo.