Kafka supports server-side and client-side timestamps since version 0.10.1.
KafkaIO in Beam can provide much better watermark, especially for topics
with server-side timestamps. The default implementation currently just uses
processing time for event time and watermark, which is not very useful.

Wrote a short doc
<https://docs.google.com/document/d/1DyWcLJpALRoUfvYUbiPCDVikYb_Xz2X7Co2aDUVVd4I/edit?usp=sharing>[1]
about the proposal. Your feedback is welcome. I am planning to work on it,
and don't mind guiding if anyone else is interested (it is fairly
accessible for newcomers) .

TL;DR :
   *server-side timestamp* : It monotonically increases within a Kafka
partition. We can provide near perfect watermark : min(timestamp of latest
record consumed on a partition).
    *client-side / custom timestamp* : Watermark is min(timestamp over last
few seconds) similar to PubsubIO in Beam. This is not great, but we will
let user provide tighter bounds or provide entirely own implementation.

Thanks,
Raghu.

[1]
https://docs.google.com/document/d/1DyWcLJpALRoUfvYUbiPCDVikYb_Xz2X7Co2aDUVVd4I/edit?usp=sharing

Reply via email to