Hi Matthias, Just to add one more meta comment: for consumer, if it gets a TimeoutException polling records, would start timing all tasks since that single consumer would affect all tasks? For other blocking calls like `endOffsets()` etc, they are usually also issued on behalf of a batch of tasks, so if that gets timeout exception should we start ticking all the corresponding tasks as well? Maybe worth clarifying a bit more in the wiki.
Guozhang On Mon, May 18, 2020 at 12:26 AM Bruno Cadonna <[email protected]> wrote: > Hi Matthias, > > I am +1 (non-binding) on the KIP. > > Just one final remark: Wouldn't it be better to specify > task.timeout.ms to -1 if no retry should be done? IMO it would make > the config more intuitive because 0 would not have two possible > meanings (i.e. try once and never try) anymore. > > Best, > Bruno > > On Sat, May 16, 2020 at 7:51 PM Guozhang Wang <[email protected]> wrote: > > > > Hello Matthias, > > > > Thanks for the updated KIP, overall I'm +1 on this proposal. Some minor > > comments (I know gmail mixed that again for me so I'm leaving it as a > combo > > for both DISCUSS and VOTE :) > > > > 1) There are some inconsistent statements in the proposal regarding what > to > > deprecated: at the beginning it says "We propose to deprecate the retries > > configuration parameter for the producer and admin client" but later in > > compatibility we say "Producer and admin client behavior does not change; > > both still respect retries config." My understanding is that we will only > > deprecate the StreamsConfig#retries while not touch on > > ProducerConfig/AdminClientConfig#retries, AND we will always override the > > embedded producer / admin retries config to infinity so that we never > rely > > on those configs but always bounded by the timeout config. Is that > > right, if yes could you clarify in the doc? > > > > 2) We should also document the related behavior change in > PartitionAssignor > > that uses AdminClient. More specifically, the admin client's retries > config > > is piggy-backed inside InternalTopicManager as an outer-loop retry logic > in > > addition to AdminClient's own inner retry loop. There are some existing > > issues like KAFKA-9999 / 10006 that Sophie and Boyang has been working > on. > > I exchanged some ideas with them, and generally we should consider if the > > outer-loop of InternalTopicManager should just be removed and if we got > > TimeoutException we should just trigger another rebalance etc. > > > > BTW as I mentioned in the previous statement, today throwing an exception > > that kills one thread but not the whole instance is still an issue for > > monitoring purposes, but I suppose this is not going to be in this KIP > but > > addressed by another KIP, right? > > > > > > Guozhang > > > > > > On Fri, May 15, 2020 at 1:14 PM Boyang Chen <[email protected]> > > wrote: > > > > > Good, good. > > > > > > Read through the discussions, the KIP looks good to me, +1 > (non-binding) > > > > > > On Fri, May 15, 2020 at 11:51 AM Sophie Blee-Goldman < > [email protected]> > > > wrote: > > > > > > > Called out! > > > > > > > > Seems like gmail struggles with [...] prefixed subjects. You'd think > they > > > > would adapt > > > > all their practices to conform to the Apache Kafka mailing list > > > standards, > > > > but no! > > > > > > > > +1 (non-binding) by the way > > > > > > > > On Fri, May 15, 2020 at 11:46 AM John Roesler <[email protected]> > > > wrote: > > > > > > > > > Hi Boyang, > > > > > > > > > > It is a separate thread, and you have just revealed yourself as a > gmail > > > > > user ;) > > > > > > > > > > (Gmail sometimes conflates vote and discuss threads for no apparent > > > > reason > > > > > ) > > > > > > > > > > -John > > > > > > > > > > On Fri, May 15, 2020, at 13:39, Boyang Chen wrote: > > > > > > Hey Matthias, should this be on a separate VOTE thread? > > > > > > > > > > > > On Fri, May 15, 2020 at 11:38 AM John Roesler < > [email protected]> > > > > > wrote: > > > > > > > > > > > > > Thanks, Matthias! > > > > > > > > > > > > > > I’m +1 (binding) > > > > > > > > > > > > > > -John > > > > > > > > > > > > > > On Fri, May 15, 2020, at 11:55, Matthias J. Sax wrote: > > > > > > > > Hi, > > > > > > > > > > > > > > > > I would like to start the vote on KIP-572: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-572%3A+Improve+timeouts+and+retries+in+Kafka+Streams > > > > > > > > > > > > > > > > > > > > > > > > -Matthias > > > > > > > > > > > > > > > > > > > > > > > > Attachments: > > > > > > > > * signature.asc > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > -- Guozhang > -- -- Guozhang
