For Kafka, maybe something that tells you whether all committed data is
actually loaded, and what offset has been committed up to? Would there be any
problems caused by the fact that only the most recent commit is saved in the DB?

Is this feature connected at all to an ask I have heard from a few people:
that there be an option to fail a query (or at least include a special
response header) if some segments in the interval are unavailable? (Which,
currently, the broker can't know since it doesn't know details about all
available segments.)
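
To make that ask concrete, here is roughly the check I am picturing. This is
a Python sketch for illustration only, not Druid code: `find_gaps` and the
plain (start, end) tuples are made-up names, and it ignores segment versions,
overshadowing, and partitions entirely. It only shows the idea of comparing a
query interval against the intervals of the segments known to be available.

```python
from datetime import datetime
from typing import List, Tuple

# Hypothetical representation: a half-open time interval [start, end).
Interval = Tuple[datetime, datetime]


def find_gaps(query: Interval, segments: List[Interval]) -> List[Interval]:
    """Return the parts of `query` not covered by any segment interval."""
    q_start, q_end = query
    gaps: List[Interval] = []
    cursor = q_start  # everything in [q_start, cursor) is known to be covered

    for seg_start, seg_end in sorted(segments):
        if seg_end <= cursor:
            continue  # segment lies entirely in the already-covered part
        if seg_start >= q_end:
            break  # this and all later segments start after the query ends
        if seg_start > cursor:
            gaps.append((cursor, seg_start))  # hole before this segment
        cursor = max(cursor, seg_end)
        if cursor >= q_end:
            break  # query interval fully covered

    if cursor < q_end:
        gaps.append((cursor, q_end))  # trailing hole
    return gaps


# Example: one day missing in the middle of a three-day query.
query = (datetime(2019, 1, 1), datetime(2019, 1, 4))
available = [
    (datetime(2019, 1, 1), datetime(2019, 1, 2)),
    (datetime(2019, 1, 3), datetime(2019, 1, 4)),
]
missing = find_gaps(query, available)
```

A non-empty result is the case where the query would either fail or carry
the special response header.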

Btw, at your site do you have any plans to migrate to Kafka indexing?

On Wed, Jan 2, 2019 at 5:37 PM Charles Allen <charles.al...@snap.com.invalid>
wrote:

> Hi all!
>
> https://github.com/apache/incubator-druid/pull/6799
>
> A contribution is up that includes a neat feature we have been using
> internally called Watermarks. Basically when operating a large scale and
> multi-tenant system, it is handy to be able to monitor how 'well behaved'
> the data is with regard to history. This is commonly used to spot holes in
> data, and to help give hints to data consumers in a lambda environment on
> when data has been run through a thorough check (batch job) vs a best
> effort sketch of the results which may or may not handle late data well
> (streaming intake).
>
> Unfortunately I'm not really sure what meta-data would be handy to have for
> the kafka indexing service, so I'd love input there as well if anyone knows
> of any "watermarks" that would make sense for it.
>
> Since the extension was written to be a stand alone service, it can remain
> as an extension forever if desired. An alternative I would like to propose
> is that the primitives for the watermark feature be added to core druid,
> and the extension points be added to their respective places (mysql
> extension and google extension to name two explicitly).
>
> Let me know what you think!
> Charles Allen
>
