Hi all!

https://github.com/apache/incubator-druid/pull/6799

A contribution is up that includes a neat feature we have been using
internally called Watermarks. Basically, when operating a large-scale,
multi-tenant system, it is handy to be able to monitor how 'well behaved'
the data is with regard to history. This is commonly used to spot holes in
data, and to give hints to data consumers in a lambda environment on
when data has been run through a thorough check (batch job) vs. a
best-effort sketch of the results, which may or may not handle late data
well (streaming intake).
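
To make the idea concrete, here is a minimal sketch of the kind of
bookkeeping a watermark does. It is not the code in the PR. It assumes each
segment is a half-open [start, end) time interval tagged as batch or
streaming, and the names Segment, find_holes, and contiguous_high are made
up for illustration rather than taken from the extension.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional, Tuple

Interval = Tuple[datetime, datetime]


class Segment(NamedTuple):
    """Hypothetical stand-in for a segment's timeline metadata."""
    start: datetime
    end: datetime   # exclusive
    batch: bool     # True if written by a batch job, False if streaming


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or abutting [start, end) intervals, sorted by start."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extends (or sits inside) the current run of covered time.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def find_holes(segments: Iterable[Segment]) -> List[Interval]:
    """Return the uncovered gaps between the first and last segment."""
    merged = merge((s.start, s.end) for s in segments)
    return [(a[1], b[0]) for a, b in zip(merged, merged[1:])]


def contiguous_high(segments: Iterable[Segment],
                    batch_only: bool = False) -> Optional[datetime]:
    """Return the end of the first gap-free run of data, or None if empty.

    With batch_only=True, only batch segments count, so the result marks
    how far the batch-checked data extends without a hole.
    """
    merged = merge((s.start, s.end) for s in segments
                   if s.batch or not batch_only)
    return merged[0][1] if merged else None
```

For example, with batch segments covering Jan 1 to Jan 3, a streaming
segment covering Jan 3 to Jan 4, and another streaming segment covering
Jan 5 to Jan 6, find_holes reports the gap from Jan 4 to Jan 5,
contiguous_high returns Jan 4, and contiguous_high with batch_only=True
returns Jan 3. A consumer could treat anything before the batch value as
thoroughly checked and anything after it as best effort.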

Unfortunately, I'm not really sure what metadata would be handy to have for
the Kafka indexing service, so I'd love input there as well if anyone knows
of any "watermarks" that would make sense for it.

Since the extension was written to be a standalone service, it can remain
as an extension forever if desired. An alternative I would like to propose
is that the primitives for the watermark feature be added to core Druid,
and the extension points be added to their respective places (the MySQL
extension and the Google extension, to name two explicitly).

Let me know what you think!
Charles Allen
