Hi, I'm using Structured Streaming to process logs from Kafka, with a watermark to handle late events. Currently the watermark is computed as (max event time seen by the engine - late threshold), and the same watermark is used for all partitions.
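For concreteness, here is a toy model of that rule as I understand it. It is only a sketch in plain Python, not Spark's actual code: the function names, the 10-minute threshold, and the timestamps are all made up for illustration, and it ignores the fact that the engine updates the watermark between micro-batches.

```python
from datetime import datetime, timedelta

# Hypothetical late threshold, chosen only for this illustration.
LATE_THRESHOLD = timedelta(minutes=10)


def global_watermark(max_event_time_per_partition, threshold=LATE_THRESHOLD):
    """One watermark for all partitions:
    (max event time seen across every partition) - (late threshold)."""
    return max(max_event_time_per_partition.values()) - threshold


def is_late(event_time, watermark):
    """In this toy model, an event older than the watermark counts as too late."""
    return event_time < watermark


# Hypothetical state: partition 1 is consumed 30 minutes behind partition 0.
max_seen = {
    0: datetime(2017, 1, 1, 12, 30),
    1: datetime(2017, 1, 1, 12, 0),
}

watermark = global_watermark(max_seen)

print("watermark:", watermark)
for partition, newest in sorted(max_seen.items()):
    print(f"partition {partition}: newest event {newest}, late={is_late(newest, watermark)}")
```

With these made-up numbers the single watermark is driven entirely by partition 0, so even the newest event read from partition 1 is already behind it.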
In production, however, different partitions are frequently consumed at different speeds. Some partitions may lag behind, so the newest event time in those partitions may be much earlier than in the others. In this case, using the same watermark for all partitions may cause heavy data loss. Is there any way to use a different watermark for each Kafka partition, or is there any plan to work on this?

--
Wanxin Zhang
Master candidate
National Lab for Parallel and Distributed Processing (PDL)
School of Computer Science
National University of Defense Technology
Changsha, Hunan, China