[Structured Streaming]Usage of watermark

2017-08-31 Thread KevinZwx
Hi, I'm a little confused about the usage of watermark in SS. According to the guideline, when we use a window-based grouping, SS will automatically handle the late event and we should use watermark to limit the state like this(specify a watermark before groupBy): val words = ... // streaming

[Structured Streaming]Data processing and output trigger should be decoupled

2017-08-30 Thread KevinZwx
Hi, I'm working with structured streaming, and I'm wondering whether there should be some improvements about trigger. Currently, when I specify a trigger, i.e. tigger(Trigger.ProcessingTime("10 minutes")), the engine will begin processing data at the time the trigger begins, like 10:00:00,

Different watermark for different kafka partitions in Structured Streaming

2017-08-30 Thread KevinZwx
Hi, I'm working with Structured Streaming to process logs from kafka and use watermark to handle late events. Currently the watermark is computed by (max event time seen by the engine - late threshold), and the same watermark is used for all partitions. But in production environment it happens