Hi Matthias, Sorry for flooding the thread, but with this KIP I feel the design scope of https://issues.apache.org/jira/browse/KAFKA-6520 can be simplified a lot and may it the design can be just piggy-backed as part of this KIP, wdyt?
Guozhang On Mon, May 18, 2020 at 9:47 AM Guozhang Wang <wangg...@gmail.com> wrote: > Hi Matthias, > > Just to add one more meta comment: for consumer, if it gets a > TimeoutException polling records, would start timing all tasks since that > single consumer would affect all tasks? For other blocking calls like > `endOffsets()` etc, they are usually also issued on behalf of a batch of > tasks, so if that gets timeout exception should we start ticking all the > corresponding tasks as well? Maybe worth clarifying a bit more in the wiki. > > > Guozhang > > On Mon, May 18, 2020 at 12:26 AM Bruno Cadonna <br...@confluent.io> wrote: > >> Hi Matthias, >> >> I am +1 (non-binding) on the KIP. >> >> Just one final remark: Wouldn't it be better to specify >> task.timeout.ms to -1 if no retry should be done? IMO it would make >> the config more intuitive because 0 would not have two possible >> meanings (i.e. try once and never try) anymore. >> >> Best, >> Bruno >> >> On Sat, May 16, 2020 at 7:51 PM Guozhang Wang <wangg...@gmail.com> wrote: >> > >> > Hello Matthias, >> > >> > Thanks for the updated KIP, overall I'm +1 on this proposal. Some minor >> > comments (I know gmail mixed that again for me so I'm leaving it as a >> combo >> > for both DISCUSS and VOTE :) >> > >> > 1) There are some inconsistent statements in the proposal regarding >> what to >> > deprecated: at the beginning it says "We propose to deprecate the >> retries >> > configuration parameter for the producer and admin client" but later in >> > compatibility we say "Producer and admin client behavior does not >> change; >> > both still respect retries config." My understanding is that we will >> only >> > deprecate the StreamsConfig#retries while not touch on >> > ProducerConfig/AdminClientConfig#retries, AND we will always override >> the >> > embedded producer / admin retries config to infinity so that we never >> rely >> > on those configs but always bounded by the timeout config. Is that >> > right, if yes could you clarify in the doc? >> > >> > 2) We should also document the related behavior change in >> PartitionAssignor >> > that uses AdminClient. More specifically, the admin client's retries >> config >> > is piggy-backed inside InternalTopicManager as an outer-loop retry >> logic in >> > addition to AdminClient's own inner retry loop. There are some existing >> > issues like KAFKA-9999 / 10006 that Sophie and Boyang has been working >> on. >> > I exchanged some ideas with them, and generally we should consider if >> the >> > outer-loop of InternalTopicManager should just be removed and if we got >> > TimeoutException we should just trigger another rebalance etc. >> > >> > BTW as I mentioned in the previous statement, today throwing an >> exception >> > that kills one thread but not the whole instance is still an issue for >> > monitoring purposes, but I suppose this is not going to be in this KIP >> but >> > addressed by another KIP, right? >> > >> > >> > Guozhang >> > >> > >> > On Fri, May 15, 2020 at 1:14 PM Boyang Chen <reluctanthero...@gmail.com >> > >> > wrote: >> > >> > > Good, good. >> > > >> > > Read through the discussions, the KIP looks good to me, +1 >> (non-binding) >> > > >> > > On Fri, May 15, 2020 at 11:51 AM Sophie Blee-Goldman < >> sop...@confluent.io> >> > > wrote: >> > > >> > > > Called out! >> > > > >> > > > Seems like gmail struggles with [...] prefixed subjects. You'd >> think they >> > > > would adapt >> > > > all their practices to conform to the Apache Kafka mailing list >> > > standards, >> > > > but no! >> > > > >> > > > +1 (non-binding) by the way >> > > > >> > > > On Fri, May 15, 2020 at 11:46 AM John Roesler <vvcep...@apache.org> >> > > wrote: >> > > > >> > > > > Hi Boyang, >> > > > > >> > > > > It is a separate thread, and you have just revealed yourself as a >> gmail >> > > > > user ;) >> > > > > >> > > > > (Gmail sometimes conflates vote and discuss threads for no >> apparent >> > > > reason >> > > > > ) >> > > > > >> > > > > -John >> > > > > >> > > > > On Fri, May 15, 2020, at 13:39, Boyang Chen wrote: >> > > > > > Hey Matthias, should this be on a separate VOTE thread? >> > > > > > >> > > > > > On Fri, May 15, 2020 at 11:38 AM John Roesler < >> vvcep...@apache.org> >> > > > > wrote: >> > > > > > >> > > > > > > Thanks, Matthias! >> > > > > > > >> > > > > > > I’m +1 (binding) >> > > > > > > >> > > > > > > -John >> > > > > > > >> > > > > > > On Fri, May 15, 2020, at 11:55, Matthias J. Sax wrote: >> > > > > > > > Hi, >> > > > > > > > >> > > > > > > > I would like to start the vote on KIP-572: >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > >> > > > >> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-572%3A+Improve+timeouts+and+retries+in+Kafka+Streams >> > > > > > > > >> > > > > > > > >> > > > > > > > -Matthias >> > > > > > > > >> > > > > > > > >> > > > > > > > Attachments: >> > > > > > > > * signature.asc >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> > >> > -- >> > -- Guozhang >> > > > -- > -- Guozhang > -- -- Guozhang