1) I get an error when I set watermark to 0.
2) I set the window and slide interval to 1 second with no watermark. It still
aggregates messages from the previous batch that fall in the same 1-second window.
So is it fair to say there is no declarative way to do stateless
aggregations?
On Thu, May 3, 2018 at
I think you need to group by a window (tumbling) and define watermarks (put a
very low watermark or even 0) to discard the state. Here the window duration
becomes your logical batch.
- Arun
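
To make the "window duration becomes your logical batch" idea concrete, here is a toy model in plain Python. It is only a sketch of the concept and not Spark's engine: the function name `tumbling_counts`, its parameters, and the rule for closing a window (window end <= max event time seen minus the watermark delay) are assumptions made for illustration, and Spark's actual watermark bookkeeping differs in detail.

```python
from collections import defaultdict

def tumbling_counts(events, window_s=1.0, watermark_s=0.0):
    """Toy model: count events per tumbling event-time window.

    events: event-time timestamps in seconds, in arrival order.
    Yields (window_start, count) once the watermark passes the window end,
    at which point the window's state is dropped.
    """
    open_windows = defaultdict(int)      # window_start -> running count
    max_event_time = float("-inf")
    for ts in events:
        max_event_time = max(max_event_time, ts)
        watermark = max_event_time - watermark_s
        start = (ts // window_s) * window_s
        if start + window_s <= watermark:
            continue                     # late event: its window is already closed
        open_windows[start] += 1
        for w in sorted(open_windows):   # sorted() copies the keys, so pop is safe
            if w + window_s <= watermark:
                yield (w, open_windows.pop(w))
    for w in sorted(open_windows):       # flush whatever is still open at the end
        yield (w, open_windows[w])

closed = list(tumbling_counts([0.1, 0.5, 1.2, 0.9, 2.0]))
print(closed)
```

In this model, events that arrive in different micro-batches but carry event times in the same 1-second window are still counted together, because grouping is by event time rather than by batch. The late event at 0.9 is dropped because its window was already closed.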
From: kant kodali
Date: Thursday, May 3, 2018 at 1:52 AM
To: "user @spark"
After doing some more research using Google, it's clear that aggregations
are stateful by default in Structured Streaming. So the question now is how
to do stateless aggregations (not storing the result from previous batches)
using Structured Streaming 2.3.0? I am trying to do it using raw spark
Hi All,
I was under the assumption that one needs to run groupBy(window(...)) to run
any stateful operation, but it looks like that is not the case, since any
aggregation query like
"select count(*) from some_view" is also stateful: it stores the
result of the count from the previous batch.
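
The difference between the two behaviours can be shown with a small plain-Python sketch. It does not use Spark; `running_counts`, `per_batch_counts`, and the sample `batches` are hypothetical names and data made up for illustration. The first function mimics a count that carries its result across micro-batches, the second one forgets everything after each batch.

```python
def running_counts(batches):
    """Stateful: each result includes the counts of all earlier batches."""
    total = 0
    results = []
    for batch in batches:
        total += len(batch)          # state carried over from previous batches
        results.append(total)
    return results

def per_batch_counts(batches):
    """Stateless: each result depends only on the current batch."""
    return [len(batch) for batch in batches]

# Three hypothetical micro-batches of messages.
batches = [["a", "b"], ["c"], ["d", "e", "f"]]
print(running_counts(batches))    # [2, 3, 6]
print(per_batch_counts(batches))  # [2, 1, 3]
```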