Hi Matthias,

Just to add one more meta comment: for consumer, if it gets a
TimeoutException polling records, would start timing all tasks since that
single consumer would affect all tasks? For other blocking calls like
`endOffsets()` etc, they are usually also issued on behalf of a batch of
tasks, so if that gets timeout exception should we start ticking all the
corresponding tasks as well? Maybe worth clarifying a bit more in the wiki.


Guozhang

On Mon, May 18, 2020 at 12:26 AM Bruno Cadonna <[email protected]> wrote:

> Hi Matthias,
>
> I am +1 (non-binding) on the KIP.
>
> Just one final remark: Wouldn't it be better to specify
> task.timeout.ms to -1 if no retry should be done? IMO it would make
> the config more intuitive because 0 would not have two possible
> meanings (i.e. try once and never try) anymore.
>
> Best,
> Bruno
>
> On Sat, May 16, 2020 at 7:51 PM Guozhang Wang <[email protected]> wrote:
> >
> > Hello Matthias,
> >
> > Thanks for the updated KIP, overall I'm +1 on this proposal. Some minor
> > comments (I know gmail mixed that again for me so I'm leaving it as a
> combo
> > for both DISCUSS and VOTE :)
> >
> > 1) There are some inconsistent statements in the proposal regarding what
> to
> > deprecated: at the beginning it says "We propose to deprecate the retries
> > configuration parameter for the producer and admin client" but later in
> > compatibility we say "Producer and admin client behavior does not change;
> > both still respect retries config." My understanding is that we will only
> > deprecate the StreamsConfig#retries while not touch on
> > ProducerConfig/AdminClientConfig#retries, AND we will always override the
> > embedded producer / admin retries config to infinity so that we never
> rely
> > on those configs but always bounded by the timeout config. Is that
> > right, if yes could you clarify in the doc?
> >
> > 2) We should also document the related behavior change in
> PartitionAssignor
> > that uses AdminClient. More specifically, the admin client's retries
> config
> > is piggy-backed inside InternalTopicManager as an outer-loop retry logic
> in
> > addition to AdminClient's own inner retry loop. There are some existing
> > issues like KAFKA-9999 / 10006 that Sophie and Boyang has been working
> on.
> > I exchanged some ideas with them, and generally we should consider if the
> > outer-loop of InternalTopicManager should just be removed and if we got
> > TimeoutException we should just trigger another rebalance etc.
> >
> > BTW as I mentioned in the previous statement, today throwing an exception
> > that kills one thread but not the whole instance is still an issue for
> > monitoring purposes, but I suppose this is not going to be in this KIP
> but
> > addressed by another KIP, right?
> >
> >
> > Guozhang
> >
> >
> > On Fri, May 15, 2020 at 1:14 PM Boyang Chen <[email protected]>
> > wrote:
> >
> > > Good, good.
> > >
> > > Read through the discussions, the KIP looks good to me, +1
> (non-binding)
> > >
> > > On Fri, May 15, 2020 at 11:51 AM Sophie Blee-Goldman <
> [email protected]>
> > > wrote:
> > >
> > > > Called out!
> > > >
> > > > Seems like gmail struggles with [...] prefixed subjects. You'd think
> they
> > > > would adapt
> > > > all their practices to conform to the Apache Kafka mailing list
> > > standards,
> > > > but no!
> > > >
> > > > +1 (non-binding) by the way
> > > >
> > > > On Fri, May 15, 2020 at 11:46 AM John Roesler <[email protected]>
> > > wrote:
> > > >
> > > > > Hi Boyang,
> > > > >
> > > > > It is a separate thread, and you have just revealed yourself as a
> gmail
> > > > > user ;)
> > > > >
> > > > > (Gmail sometimes conflates vote and discuss threads for no apparent
> > > > reason
> > > > > )
> > > > >
> > > > > -John
> > > > >
> > > > > On Fri, May 15, 2020, at 13:39, Boyang Chen wrote:
> > > > > > Hey Matthias, should this be on a separate VOTE thread?
> > > > > >
> > > > > > On Fri, May 15, 2020 at 11:38 AM John Roesler <
> [email protected]>
> > > > > wrote:
> > > > > >
> > > > > > > Thanks, Matthias!
> > > > > > >
> > > > > > > I’m +1 (binding)
> > > > > > >
> > > > > > > -John
> > > > > > >
> > > > > > > On Fri, May 15, 2020, at 11:55, Matthias J. Sax wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I would like to start the vote on KIP-572:
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-572%3A+Improve+timeouts+and+retries+in+Kafka+Streams
> > > > > > > >
> > > > > > > >
> > > > > > > > -Matthias
> > > > > > > >
> > > > > > > >
> > > > > > > > Attachments:
> > > > > > > > * signature.asc
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > -- Guozhang
>


-- 
-- Guozhang

Reply via email to