Glad that it worked out! It's unfortunate that there exist such pitfalls.
And there is no easy way to get around it.
If you can, let us know how your experience with mapGroupsWithState has
been.
TD
On Fri, Jun 8, 2018 at 1:49 PM, frankdede
wrote:
> You are exactly right! A few hours ago, I trie
You are exactly right! A few hours ago, I tried many things and finally got
the example working by defining event timestamp column before groupByKey,
just like what you suggested, but I wasn't able to figure out the reasoning
behind my fix.
val sessionUpdates = events
.withWatermark("tim
Try to define the watermark on the right column immediately before calling
`groupByKey(...).mapGroupsWithState(...)`. You are applying the watermark
and then doing a bunch of opaque transformation (user-defined flatMap that
the planner has no visibility into). This prevents the planner from
propaga
I was trying to find a way to resessionize features in different events based
on the event timestamps using Spark and I found a code example that uses
mapGroupsWithStateto resessionize events using processing timestamps in
their repo.
https://github.com/apache/spark/blob/v2.3.0/examples/src/main/s