@Rayn It's frequently observed in our production environment that consumption rates vary across partitions for various reasons, including performance differences between the machines holding the partitions, uneven distribution of messages, and so on. So I hope there can be some advice on how to design
I don't think SS currently supports a "partitioned" watermark. And why do the consumption rates of different partitions vary? If the handling logic is quite different, using separate topics is a better approach.
On Fri, Sep 1, 2017 at 4:59 PM, 张万新 wrote:
Thanks, it's true that a looser watermark can guarantee more data will not be dropped, but at the same time more state needs to be kept. I'm just wondering whether something like the Kafka-partition-aware watermark in Flink might be a better solution in SS.
Tathagata Das wrote on Thursday, August 31, 2017:
Why not set the watermark to be looser, one that works across all partitions? The main usage of the watermark is to drop state. If you loosen the watermark threshold (e.g. from 1 hour to 10 hours), then you will keep more state with older data, but you are guaranteed that you will not drop important data.
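To illustrate the tradeoff numerically, here is a minimal sketch (plain Python, not Spark code; the event times and thresholds are hypothetical):

```python
from datetime import datetime, timedelta

def watermark(max_event_time, threshold):
    # Global watermark as described in this thread:
    # max event time seen across all partitions minus the late threshold.
    return max_event_time - threshold

# Hypothetical scenario: a fast partition has reached 12:00 while a
# slow partition is still emitting events stamped 09:00.
max_seen = datetime(2017, 9, 1, 12, 0)
slow_event = datetime(2017, 9, 1, 9, 0)

# With a tight 1-hour threshold the watermark is 11:00, so the slow
# partition's 09:00 event falls below it and is dropped.
print(slow_event >= watermark(max_seen, timedelta(hours=1)))   # False

# Loosening the threshold to 10 hours moves the watermark back to 02:00,
# so the event is kept -- at the cost of retaining more state.
print(slow_event >= watermark(max_seen, timedelta(hours=10)))  # True
```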
Hi,
I'm working with Structured Streaming to process logs from Kafka and using a watermark to handle late events. Currently the watermark is computed as (max event time seen by the engine - late threshold), and the same watermark is used for all partitions.
But in the production environment it happens that the consumption rates of different partitions vary.
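The difference between this global watermark and a partition-aware one can be sketched as follows (illustrative Python, not an existing Spark API; Flink's per-partition watermarking works roughly in this min-across-partitions spirit):

```python
def global_watermark(max_event_times, threshold):
    # Current Structured Streaming behavior as described above: one
    # watermark from the max event time across ALL partitions.
    return max(max_event_times) - threshold

def partition_aware_watermark(max_event_times, threshold):
    # Hypothetical alternative: track a watermark per partition and take
    # the minimum, so a slow partition holds the watermark back.
    return min(t - threshold for t in max_event_times)

# Max event time per partition, in epoch seconds; one partition lags.
per_partition = [12_000, 13_000, 9_000]
threshold = 3_600  # 1 hour

print(global_watermark(per_partition, threshold))           # 9400
print(partition_aware_watermark(per_partition, threshold))  # 5400
# An event stamped 9_000 from the slow partition is dropped under the
# global watermark (9_000 < 9_400) but kept under the partition-aware one.
```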