Re: [jira] [Commented] (KAFKA-4113) Allow KTable bootstrap

Guozhang Wang Fri, 21 Oct 2016 13:48:43 -0700

Great to know! Thanks Greg.

Please keep us posted with any new finding you have.



Guozhang

On Fri, Oct 21, 2016 at 12:35 PM, Greg Fodor <gfo...@gmail.com> wrote:

> I managed to track down one case where we were seeing issues with missing
> data when transitioning to a new node; it was due to there being a
> retention policy on the topic. There is an additional case, but I have not
> been able to reproduce it at this time. We recently fixed a problem where we
> were failing to shut down our jobs gracefully in certain cases, so there's a
> chance that might be related. Anyhow, now that I have a better understanding
> of things, I will be able to investigate if we experience missing keys in
> the future. Thanks!
>
> On Oct 20, 2016 2:08 PM, "Greg Fodor (JIRA)" <j...@apache.org> wrote:
>
>
>     [ https://issues.apache.org/jira/browse/KAFKA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593018#comment-15593018 ]
>
> Greg Fodor commented on KAFKA-4113:
> -----------------------------------
>
> Oh, so it should be doing exactly what makes sense to me -- I am on 0.10.0.
> Let me verify that there isn't something else going on! Thanks for the
> info.
>
> > Allow KTable bootstrap
> > ----------------------
> >
> >                 Key: KAFKA-4113
> >                 URL: https://issues.apache.org/jira/browse/KAFKA-4113
> >             Project: Kafka
> >          Issue Type: Sub-task
> >          Components: streams
> >            Reporter: Matthias J. Sax
> >            Assignee: Guozhang Wang
> >
> > On the mailing list, there have been multiple requests for the ability to
> > "fully populate" a KTable before actual stream processing starts.
> > Even though it is somewhat difficult to define when the initial populating
> > phase should end, there are multiple possibilities:
> > The main idea is that there is a rarely updated topic that contains the
> > data. Only after this topic has been read completely and the KTable is
> > ready should the application start processing. This implies that on
> > startup, the current partition sizes (end offsets) must be fetched and
> > stored, and once the KTable has been populated up to those offsets,
> > stream processing can start.
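A minimal sketch of the end-offset approach described above, in Java. It deliberately does not use the Kafka Streams or consumer API: each partition is modeled as an in-memory list whose index is the offset, and `OffsetBootstrap`, `Rec`, `captureEndOffsets`, and `bootstrap` are hypothetical names chosen for illustration. Against a real cluster the startup snapshot would come from the broker instead (for example `KafkaConsumer#endOffsets`, in client versions that provide it).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model, NOT the Kafka API: each partition is a list of
// records in offset order, so a record's offset is its index in the list.
class OffsetBootstrap {

    static final class Rec {
        final String key;
        final String value; // null marks a delete (tombstone), as in a KTable

        Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Startup snapshot: the end offset (next offset to be written) of each partition.
    static Map<Integer, Long> captureEndOffsets(Map<Integer, List<Rec>> partitions) {
        Map<Integer, Long> end = new HashMap<>();
        for (Map.Entry<Integer, List<Rec>> e : partitions.entrySet()) {
            end.put(e.getKey(), (long) e.getValue().size());
        }
        return end;
    }

    // Populate the table from offset 0 up to, but excluding, each captured end offset.
    // A later record for the same key overwrites the earlier one; a null value deletes it.
    static Map<String, String> bootstrap(Map<Integer, List<Rec>> partitions,
                                         Map<Integer, Long> endOffsets) {
        Map<String, String> table = new HashMap<>();
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            List<Rec> log = partitions.get(e.getKey());
            for (int offset = 0; offset < e.getValue(); offset++) {
                Rec r = log.get(offset);
                if (r.value == null) {
                    table.remove(r.key);
                } else {
                    table.put(r.key, r.value);
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Map<Integer, List<Rec>> partitions = new HashMap<>();
        List<Rec> p0 = new ArrayList<>();
        p0.add(new Rec("a", "1"));
        p0.add(new Rec("a", "2"));
        partitions.put(0, p0);
        Map<Integer, Long> end = captureEndOffsets(partitions);
        p0.add(new Rec("a", "3")); // arrives after startup: not part of the bootstrap
        System.out.println(bootstrap(partitions, end)); // prints {a=2}
    }
}
```

Records appended after the snapshot are left for normal processing, so the application can start even if the "rarely updated" topic keeps receiving occasional writes.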
> > Other ideas discussed are:
> > 1) an initial fixed time period for populating
> > (it might be hard for a user to estimate the correct value)
> > 2) an "idle" period, i.e., if the KTable receives no update for a certain
> > time, we consider it populated
> > 3) a timestamp cut-off point, i.e., all records with an older timestamp
> > belong to the initial populating phase
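Options 1 and 2 reduce to simple time checks. The sketch below assumes the caller supplies millisecond timestamps, so it needs no real clock; `PopulateCriteria`, `fixedPeriodElapsed`, and `IdleDetector` are hypothetical names, not part of Kafka Streams.

```java
// Hypothetical helpers, NOT a Kafka Streams API. Times are millisecond values
// supplied by the caller, so the checks can be exercised without a real clock.
class PopulateCriteria {

    // Option 1: the populating phase ends once a fixed period has passed since startup.
    static boolean fixedPeriodElapsed(long startMs, long nowMs, long periodMs) {
        return nowMs - startMs >= periodMs;
    }

    // Option 2: the populating phase ends once the table has seen no update for idleMs.
    static final class IdleDetector {
        private final long idleMs;
        private long lastUpdateMs;

        IdleDetector(long startMs, long idleMs) {
            this.idleMs = idleMs;
            this.lastUpdateMs = startMs; // startup counts as the last activity
        }

        void onUpdate(long nowMs) {
            lastUpdateMs = nowMs;
        }

        boolean populated(long nowMs) {
            return nowMs - lastUpdateMs >= idleMs;
        }
    }

    public static void main(String[] args) {
        IdleDetector idle = new IdleDetector(0L, 1000L);
        idle.onUpdate(500L);
        System.out.println(idle.populated(1200L)); // false: only 700 ms idle
        System.out.println(idle.populated(1500L)); // true: 1000 ms idle
    }
}
```

Both checks hinge on a user-chosen duration, which is the weakness noted under option 1; the idle check can also end the phase early if the upstream producer merely pauses.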
> > The API change is not decided yet, and the API design is part of this
> > JIRA.
> > One suggestion (for option (3)) was:
> > {noformat}
> > // populate the table without reading any other topics until a record with timestamp 1000 is seen
> > KTable table = builder.table("topic", 1000);
> > {noformat}
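The timestamp cut-off (option 3, and the suggested `builder.table("topic", 1000)`) could behave roughly as follows. This is a sketch under stated assumptions: a single partition read in offset order, records with a timestamp strictly below the cut-off belong to the populating phase, and the first record at or past the cut-off ends it. `TimestampCutoff`, `Rec`, and `populate` are hypothetical names; the two-argument `builder.table` overload was only a suggestion in this JIRA, not an existing API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT a Kafka Streams API: one partition read in offset order.
class TimestampCutoff {

    static final class Rec {
        final String key;
        final String value;
        final long timestampMs;

        Rec(String key, String value, long timestampMs) {
            this.key = key;
            this.value = value;
            this.timestampMs = timestampMs;
        }
    }

    // Applies records older than cutoffMs to the table and stops at the first record
    // whose timestamp reaches the cut-off. Returns that record's offset, i.e. where
    // normal stream processing would resume (log.size() if no record reached it).
    static int populate(List<Rec> log, long cutoffMs, Map<String, String> table) {
        int offset = 0;
        while (offset < log.size() && log.get(offset).timestampMs < cutoffMs) {
            Rec r = log.get(offset);
            table.put(r.key, r.value);
            offset++;
        }
        return offset;
    }

    public static void main(String[] args) {
        List<Rec> log = new ArrayList<>();
        log.add(new Rec("a", "x", 10L));
        log.add(new Rec("b", "y", 999L));
        log.add(new Rec("a", "z", 1000L));
        Map<String, String> table = new HashMap<>();
        int resumeAt = populate(log, 1000L, table);
        System.out.println(resumeAt + " " + table); // prints 2 {a=x, b=y}
    }
}
```

Because this stops at the first record that reaches the cut-off, a later record carrying an older (out-of-order) timestamp is treated as normal input rather than bootstrap data.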
>
>
>
>



