Thanks, Boyang. That seems reasonable. best, Colin
On Mon, Mar 30, 2020, at 13:32, Boyang Chen wrote: > Thanks Guozhang for the summary. If everyone agrees to the strategy here, > which include: > > 1. No affection to 2.5 release, maintaining the same crashing logic for new > Producer API talking to old clients > 2. Add an internal flag to silently downgrade protocol for Kafka Streams > case only > > I will go ahead to make the corresponding changes to unblock. > > Boyang > > On Mon, Mar 30, 2020 at 11:22 AM Guozhang Wang <wangg...@gmail.com> wrote: > > > Hello Colin, > > > > Thanks for the context you provided with CreateAclsRequest to make the > > point for compatibility policy, and I think they make sense -- given > > KIP-447 is primarily motivated for Streams clients, we should still > > consider it's potential compatibility pitfall for non-Streams clients. > > > > About the resolution here, personally I do not think that punt this KIP for > > the next release is a good idea; instead I think we can make the change for > > Streams only while keeping non-Streams to still throw --- Boyang have > > proposed with the internal flag do to so, and I'd want to add that we > > should emphasize on docs when UnsupportedVersion is thrown on producer it > > might already be too late (as shown in the example in KIP wiki, in such a > > consumer -> process -> producer pattern when producer throws some > > processing may already been done with irreversible side-effects) so that > > users are aware. > > > > > > Guozhang > > > > > > On Sun, Mar 29, 2020 at 9:07 PM Colin McCabe <cmcc...@apache.org> wrote: > > > > > Thanks for the explanations, all. It seems to me that the proposed > > silent > > > downgrade change leaves non-streams applications with no way to safely > > use > > > KIP-447. If they use it, it will sometimes be safe, but sometimes not, > > > depending on what they are doing. > > > > > > This doesn't seem acceptable. People should be able to rely on EOS when > > > it exists. > > > > > > I agree that it is frustrating that the UnsupportedVersionException > > > happens when the applications have already set up many things, rather > > than > > > being accessible when the application first starts up. But this does not > > > fundamentally change the compatibility policy. It is difficult > > (actually, > > > it's impossible) to work around the lack of CreateAclsRequest or > > > DeleteAclsRequest. But this doesn't motivate us to return fake results > > for > > > the corresponding AdminClient calls when the RPCs are not present. Some > > > things can't be faked. > > > > > > It seems like the right thing to do here would be to expose the KIP-447 > > > functionality as a feature flag. Then, applications can test if that > > > feature flag is present before they perform a long setup process. They > > can > > > also decide if they want to provide a fallback path or not. > > > > > > Of course, feature flags won't make 2.5, so we need some more short-term > > > solution here. Either we punt this KIP to the next release, or we find > > > some way to make it safe for everyone, not just Streams (possibly by only > > > enabling it for streams?) > > > > > > best, > > > Colin > > > > > > > > > On Sat, Mar 28, 2020, at 01:31, Matthias J. Sax wrote: > > > > I agree with Guozhang that the issue only arises for the > > > > `read-process-write` pattern and thus we should guard at the read level > > > > already and thus auto-downgrading the write path commitTx request seems > > > > fine. > > > > > > > > Note that for producer-only and consumer-only apps the auto-downgrade > > is > > > > totally fine (and actually desired, especially for the consumer to > > > > guarantee backward compatibility as there is not public API change) for > > > > the corresponding request. For `read-process-write` with no > > exactly-once > > > > or the existing exactly-once pattern auto-downgrading is also fine for > > > > both clients. > > > > > > > > The only issue I see atm might be, that the consumer side guard depends > > > > on an internal config and there is not public API to enable this guard. > > > > Hence, for non-Streams users there might be a gap: If they don't > > upgrade > > > > their brokers first but switch to the new exactly-once pattern they > > > > loose exactly-once protection without any warning... > > > > > > > > I am frankly not sure how many users would actually build exactly-once > > > > apps without using Streams and might be affected? > > > > > > > > To address the issue I consider the following two options: > > > > > > > > - educate users about this internal consumer config > > > > - make the consumer config public > > > > > > > > > > > > Thoughts? > > > > > > > > -Matthias > > > > > > > > > > > > > > > > > > > > On 3/27/20 10:12 PM, Guozhang Wang wrote: > > > > > Hello Colin, Ismael, > > > > > > > > > > Thanks for your feedbacks, they are quite helpful. Just to provide > > some > > > > > context here about OffsetFetch: > > > > > > > > > > 1) When building the offset fetch request, we used to auto > > "downgrade" > > > the > > > > > request by falling back the requireStable flag when broker supporter > > > > > version is < 7. > > > > > > > > > > > > > > > https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/requests/OffsetFetchRequest.java#L82 > > > > > > > > > > 2) The newly introduced config value "eos-beta" in KIP-447 requires > > > brokers > > > > > to be on version 2.5 or later, and if broker is not on newer version, > > > we > > > > > need to let the client to fail fast; in order to do so an internal > > > config > > > > > from consumer is exposed to Streams, which allows the build function > > to > > > > > throw the UnsupportedVersionException instead of auto downgrading > > when > > > > > building the offset-fetch that requireStable flag. This is only > > turned > > > on > > > > > by Streams so that if broker is not on newer version when "eos-beta" > > is > > > > > enabled on the client, we will fail fast . > > > > > > > > > > In that sense, a new client would not use old RPC talking to the > > > brokers -- > > > > > the key point is that, upon starting a task the offset fetch is > > firstly > > > > > sent before we start processing records, so that's where we should > > > stop the > > > > > client right away instead of silently reading unstable offsets which > > > would > > > > > already break the fencing guarantees if producers were not fenced on > > > txn > > > > > coordinator. In other words, we think that letting the overloaded > > > > > producer#sendOffsetsToTxn to throw UnsupportedVersion exception is > > > already > > > > > too late, and hence we should not relying on that API to let clients > > > detect > > > > > if brokers are on older versions. > > > > > > > > > > For KIP-98, however, letting producer#initTxn to throw > > > UnsupportedVersion > > > > > makes total sense since that's the first API triggered before > > > producers try > > > > > to send any records. > > > > > > > > > > 3) With that in mind, I think letting the overloaded > > > > > producer#sendOffsetsToTxn to not throw UnsupportedVersion is > > > appropriate, > > > > > since KIP-447 only applies to a consume -> process -> produce > > scenario > > > and > > > > > does not apply to a producer-only scenario for transactional > > messaging. > > > > > Instead, the consumer API calls that trigger earlier would detect > > older > > > > > brokers. A side benefit of this is that, on the caller side (a.k.a > > > Streams) > > > > > we do not need to first detect the broker version and then decide > > which > > > > > overloaded `producer#sendOffsetsToTxn` to call based on that, which > > > also > > > > > means that we may eventually deprecate the old overloaded function > > one > > > day. > > > > > > > > > > > > > > > Please let me know if you have any questions regarding this > > rationale, > > > and > > > > > we can continue the discussion from there. > > > > > > > > > > > > > > > Guozhang > > > > > > > > > > > > > > > > > > > > On Fri, Mar 27, 2020 at 9:21 PM Ismael Juma <ism...@juma.me.uk> > > wrote: > > > > > > > > > >> I'm a bit puzzled. We added this feature because we thought it was > > > useful. > > > > >> And now we are saying that you don't know if you can rely on it > > since > > > the > > > > >> downgrade happens silently. Can you provide more context on the > > > OffsetFetch > > > > >> downgrade? What is the implication of that? > > > > >> > > > > >> Ismael > > > > >> > > > > >> On Fri, Mar 27, 2020 at 7:30 PM Boyang Chen < > > > reluctanthero...@gmail.com> > > > > >> wrote: > > > > >> > > > > >>> Thanks Colin, I think the point of this change is to make the new > > > client > > > > >>> experience better while working with old brokers, when the upgrade > > is > > > > >>> happening on client side only. For Streams, it is very common to > > have > > > > >> more > > > > >>> advanced client version. On code level, the path to call > > transaction > > > > >> commit > > > > >>> is better to be unified instead of diverged with try-catch. As a > > > matter > > > > >> of > > > > >>> fact, we have already implemented the downgrade of the OffsetFetch > > > > >> protocol > > > > >>> to adapt this broker version incompatibility. New transaction > > commit > > > > >>> protocol thus behaves inconsistent with what we did with other > > > protocols. > > > > >>> > > > > >>> In terms of the impact for this last minute change, silent > > downgrade > > > has > > > > >>> one downside which is the customized EOS applications could not > > > alarm the > > > > >>> user of semantic violation anymore by simply crashing. To be > > honest, > > > I'm > > > > >>> not sure how severe this would affect customized community usages, > > > as we > > > > >>> don't have user stats for that yet :) > > > > >>> > > > > >>> Boyang > > > > >>> > > > > >>> On Fri, Mar 27, 2020 at 6:32 PM Colin McCabe <cmcc...@apache.org> > > > wrote: > > > > >>> > > > > >>>> On Fri, Mar 27, 2020, at 18:29, Colin McCabe wrote: > > > > >>>>> On Fri, Mar 27, 2020, at 16:06, Boyang Chen wrote: > > > > >>>>>> Hey there, > > > > >>>>>> > > > > >>>>>> we would like to address an improvement on the > > > > >>>>>> *Producer#sendOffsetsToTransaction(offsets, > > > > >>>>>> groupMetadata) *API. Previously we assume the calling of this > > > > >>> function > > > > >>>>>> would crash the app if broker is not on version 2.5.0 or higher. > > > As > > > > >>>> Streams > > > > >>>>>> side change was rolling out, the disadvantage of this behavior > > > > >>> becomes > > > > >>>>>> obvious, since on the application level it is hard to detect > > > broker > > > > >>>> version > > > > >>>>>> and do separate call paths for old and new transaction commit > > > > >>>> protocols. > > > > >>>>> > > > > >>>>> Hi Boyang, > > > > >>>>> > > > > >>>>> Applications are never supposed to detect broker versions, but > > they > > > > >> may > > > > >>>>> be required to detect the presence or absence of a feature. That > > > > >> would > > > > >>>>> happen if a new feature was introduced that couldn't be emulated > > > with > > > > >>>>> some combination of the features that previously existed. > > > > >>>>> > > > > >>>>> That was the case for the original EOS work. For example, > > > > >>>>> initTransactions will throw UnsupportedVersionException if the > > > > >> producer > > > > >>>>> tries to use it, but the broker doesn't support it. > > > > >>>>> > > > > >>>>>> > > > > >>>>>> Thus, we are planning to change the behavior from crashing to > > > > >>> silently > > > > >>>>>> downgrade to the older transactional offset commit protocol by > > > > >>>> resetting > > > > >>>>>> the consumer group metadata fields. Details in this PR > > > > >>>>>> <https://github.com/apache/kafka/pull/8375>. > > > > >>>>>> > > > > >>>>> > > > > >>>>> The big question is will using the old client code with the old > > RPC > > > > >> be > > > > >>>>> less safe than using the old client code with the old RPC? If it > > > > >> will, > > > > >>>>> there's a strong case to be made that streams should catch the > > > > >>>>> UnsupportedVersionException and fall back to its old behavior. > > > > >>>> > > > > >>>> Sorry, that should read "new client code with old RPC". > > Basically, > > > are > > > > >>> we > > > > >>>> making things worse for users with older brokers and newer > > clients? > > > > >>>> > > > > >>>> Colin > > > > >>>> > > > > >>>>> > > > > >>>>> best, > > > > >>>>> Colin > > > > >>>>> > > > > >>>>>> > > > > >>>>>> Let me know if you have any questions, thanks! > > > > >>>>>> > > > > >>>>>> On Thu, Mar 26, 2020 at 12:45 PM Matthias J. Sax < > > > mj...@apache.org > > > > >>> > > > > >>>> wrote: > > > > >>>>>> > > > > >>>>>>> One more change for KIP-447. > > > > >>>>>>> > > > > >>>>>>> Currently, Kafka Streams collects task-level metrics called > > > > >>>>>>> "commit-latency-[max|avg]". However, with KIP-447 tasks don't > > get > > > > >>>>>>> committed individually any longer, and thus this metrics do not > > > > >>> make > > > > >>>>>>> sense going forward. > > > > >>>>>>> > > > > >>>>>>> Therefore, we propose to remove those metrics in 2.6 release. > > > > >>>>>>> Deprecation does not make much sense, as there is just nothing > > to > > > > >>> be > > > > >>>>>>> recorded in a useful way any longer. > > > > >>>>>>> > > > > >>>>>>> I also updated the upgrade path section, as the upgrade path is > > > > >>>> actually > > > > >>>>>>> simpler as described originally. > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> If there are any concerns, please let us know! > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> -Matthias > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> On 3/5/20 12:56 PM, Matthias J. Sax wrote: > > > > >>>>>>>> There is one more change to this KIP for the upgrade path of > > > > >>> Kafka > > > > >>>>>>>> Streams applications: > > > > >>>>>>>> > > > > >>>>>>>> We cannot detect broker versions reliable, and thus, we need > > > > >>> users > > > > >>>> to > > > > >>>>>>>> manually opt-in to the feature. Thus, we need to add a third > > > > >>> value > > > > >>>> for > > > > >>>>>>>> configuration parameter `processing.guarantee` that we call > > > > >>>>>>>> `exactly_once_beta` -- specifying this config will enable the > > > > >>>> producer > > > > >>>>>>>> per thread design. > > > > >>>>>>>> > > > > >>>>>>>> I updated the KIP accordingly. Please let us know if there are > > > > >>> any > > > > >>>>>>>> concerns. > > > > >>>>>>>> > > > > >>>>>>>> > > > > >>>>>>>> -Matthias > > > > >>>>>>>> > > > > >>>>>>>> On 2/11/20 4:44 PM, Guozhang Wang wrote: > > > > >>>>>>>>> Boyang, > > > > >>>>>>>> > > > > >>>>>>>>> Thanks for the update. This change makes sense to me. > > > > >>>>>>>> > > > > >>>>>>>>> Guozhang > > > > >>>>>>>> > > > > >>>>>>>>> On Tue, Feb 11, 2020 at 11:37 AM Boyang Chen > > > > >>>>>>>>> <reluctanthero...@gmail.com> wrote: > > > > >>>>>>>> > > > > >>>>>>>>>> Hey there, > > > > >>>>>>>>>> > > > > >>>>>>>>>> we are adding a small change to the KIP-447 public API. The > > > > >>>>>>>>>> default value of > > > > >>>>>>>>>> ` > > transaction.abort.timed.out.transaction.cleanup.interval.ms > > > > >> ` > > > > >>>>>>>>>> shall be changed from 1 minute to 10 seconds. The goal here > > > > >> is > > > > >>> to > > > > >>>>>>>>>> trigger the expired transaction more frequently in order to > > > > >>>>>>>>>> reduce the consumer pending offset fetch wait time. > > > > >>>>>>>>>> > > > > >>>>>>>>>> Let me know if you have further questions, thanks! > > > > >>>>>>>>>> > > > > >>>>>>>>>> > > > > >>>>>>>>>> On Wed, Jan 8, 2020 at 3:44 PM Boyang Chen > > > > >>>>>>>>>> <reluctanthero...@gmail.com> wrote: > > > > >>>>>>>>>> > > > > >>>>>>>>>>> Thanks Guozhang for another review! I have addressed all > > the > > > > >>>>>>>>>>> javadoc changes for PendingTransactionException in the KIP. > > > > >>>>>>>>>>> For > > > > >>>>>>>>>> FENCED_INSTANCE_ID > > > > >>>>>>>>>>> the only thrown place would be on the new send offsets API, > > > > >>>>>>>>>>> which is also addressed. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Thanks Matthias for the vote! As we have 3 binding votes > > > > >>>>>>>>>>> (Guozhang, > > > > >>>>>>>>>> Jason, > > > > >>>>>>>>>>> and Matthias), the KIP is officially accepted and prepared > > > > >> to > > > > >>>>>>>>>>> ship in > > > > >>>>>>>>>> 2.5. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Still feel free to put more thoughts on either discussion > > or > > > > >>>>>>>>>>> voting > > > > >>>>>>>>>> thread > > > > >>>>>>>>>>> to refine the KIP! > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> On Wed, Jan 8, 2020 at 3:15 PM Matthias J. Sax > > > > >>>>>>>>>>> <matth...@confluent.io> wrote: > > > > >>>>>>>>>>> > > > > >>>>>>>>>>>> I just re-read the KIP. Overall I am +1 as well. > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>> > > > > >>>>>>>>>>>> Some minor comments (also apply to the Google design doc): > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> 1) As 2.4 was release, references should be updated to > > 2.5. > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> Addressed > > > > >>>>>>>>>>> > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>>> 2) About the upgrade path, the KIP says: > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> 2a) > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>>> Broker must be upgraded to 2.4 first. This means the > > > > >>>>>>>>>>>> `inter.broker.protocol.version` (IBP) has to be set to the > > > > >>>>>>>>>>>> latest. Any produce request with higher version will > > > > >>>>>>>>>>>> automatically get fenced > > > > >>>>>>>>>> because > > > > >>>>>>>>>>>> of no support. > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> From my understanding, this is not correct? After a broker > > > > >> is > > > > >>>>>>>>>>>> updated to the new binaries, it should accept new > > requests, > > > > >>>>>>>>>>>> even if IBP was not bumped yet? > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> Your understanding was correct, after some offline > > > > >> discussion > > > > >>>>>>>>>>>> we should > > > > >>>>>>>>>>> not worry about IBP in this case. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>>> 2b) > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> About the two rolling bounces for KS apps and the > > statement > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>>> one should never allow task producer and thread producer > > > > >>>>>>>>>>>>> under the > > > > >>>>>>>>>> same > > > > >>>>>>>>>>>> application group > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> In the second rolling bounce, we might actually mix both > > > > >> (ie, > > > > >>>>>>>>>>>> per-task and per-thread producers) but this is fine as > > > > >>>>>>>>>>>> explained in the KIP. The only case we cannot allow is, > > old > > > > >>>>>>>>>>>> per-task producers (without consumer generation fencing) > > to > > > > >>>>>>>>>>>> be mixed with per-thread producers (that rely solely on > > > > >>>>>>>>>>>> consumer generation fencing). > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> Does this sound correct? > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> Correct, that's the purpose of doing 2 rolling bounce, > > > > >> where > > > > >>>>>>>>>>>> the first > > > > >>>>>>>>>>> one is to guarantee everyone's opt-in for generation > > > > >> fencing. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> 3) We should also document how users can use KS 2.5 > > > > >>>>>>>>>>>> applications against older brokers -- for this case, we > > > > >> need > > > > >>>>>>>>>>>> to stay on per-task producers and cannot use the new > > > > >> fencing > > > > >>>>>>>>>>>> mechanism. Currently, the KIP only describe a single way > > > > >> how > > > > >>>>>>>>>>>> users could make this work: by setting (and keeping) > > > > >>>>>>>>>>>> UPGRADE_FROM config to 2.4 (what might not be an ideal > > > > >>>>>>>>>>>> solution and might also not be clear by itself that people > > > > >>>>>>>>>>>> would need to do > > > > >>>>>>>>>> this)? > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> Yes exactly, at the moment we are actively working on a > > > > >> plan > > > > >>>>>>>>>>>> to acquire > > > > >>>>>>>>>>> broker's IBP during stream start-up and initialize based > > off > > > > >>>>>>>>>>> that information, so that user doesn't need to keep > > > > >>>>>>>>>>> UPGRADE_FROM simply for working with > > > > >>>>>>>>>> old > > > > >>>>>>>>>>> brokers. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> -Matthias > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> On 9/18/19 4:41 PM, Boyang Chen wrote: > > > > >>>>>>>>>>>>> Bump this thread to see if someone could also review! > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> On Mon, Sep 9, 2019 at 5:00 PM Boyang Chen < > > > > >>>>>>>>>> reluctanthero...@gmail.com> > > > > >>>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>>> Thank you Jason! Addressed the comments. > > > > >>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>> Thank you Guozhang for explaining. I will document the > > > > >>>>>>>>>>>>>> timeout > > > > >>>>>>>>>> setting > > > > >>>>>>>>>>>>>> reasoning in the design doc. > > > > >>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>> On Mon, Sep 9, 2019 at 1:49 PM Guozhang Wang > > > > >>>>>>>>>>>>>> <wangg...@gmail.com> > > > > >>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>> On Fri, Sep 6, 2019 at 6:33 PM Boyang Chen < > > > > >>>>>>>>>>>> reluctanthero...@gmail.com> > > > > >>>>>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>> Thanks Guozhang, I have polished the design doc to > > > > >>>>>>>>>>>>>>>> make it sync > > > > >>>>>>>>>> with > > > > >>>>>>>>>>>>>>>> current KIP. As for overriding default timeout > > > > >>>>>>>>>>>>>>>> values, I guess it's > > > > >>>>>>>>>>>>>>> already > > > > >>>>>>>>>>>>>>>> stated in the KIP to set txn timeout to 10s, are you > > > > >>>>>>>>>>>>>>>> suggesting we > > > > >>>>>>>>>>>>>>> should > > > > >>>>>>>>>>>>>>>> also put down this recommendation on the KIP for > > > > >>>>>>>>>>>>>>>> non-stream EOS > > > > >>>>>>>>>>>> users? > > > > >>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>> My comment is not for changing any produce / consumer > > > > >>>>>>>>>>>>>>>> default > > > > >>>>>>>>>> config > > > > >>>>>>>>>>>>>>> values, but for the Streams configs, to make sure that > > > > >>>>>>>>>>>>>>> our overridden config values respect the above rules. > > > > >>>>>>>>>>>>>>> That is, we check > > > > >>>>>>>>>>>> the > > > > >>>>>>>>>>>>>>> actual value used in the config if they are ever > > > > >>>>>>>>>>>>>>> overridden by > > > > >>>>>>>>>> users, > > > > >>>>>>>>>>>> and > > > > >>>>>>>>>>>>>>> if the above were not true we can log a warning that it > > > > >>>>>>>>>>>>>>> may be risky > > > > >>>>>>>>>>>> to > > > > >>>>>>>>>>>>>>> encounter some unnecessary rebalances. > > > > >>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>> Again, this is not something we need to include in the > > > > >>>>>>>>>>>>>>> KIP since it > > > > >>>>>>>>>>>> is not > > > > >>>>>>>>>>>>>>> part of public APIs, just to emphasize that the > > > > >>>>>>>>>>>>>>> internal > > > > >>>>>>>>>>>> implementation > > > > >>>>>>>>>>>>>>> can have some safety guard like this. > > > > >>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>> Guozhang > > > > >>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>> Boyang > > > > >>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 8:43 PM Guozhang Wang > > > > >>>>>>>>>>>>>>>> <wangg...@gmail.com> > > > > >>>>>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> Hello Boyang, > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> Just realized one thing about timeout > > > > >>>>>>>>>>>>>>>>> configurations that we > > > > >>>>>>>>>> should > > > > >>>>>>>>>>>>>>>>> consider including in this KIP as well: > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> 1) In Producer we have: max.block.ms (default value > > > > >>>>>>>>>>>>>>>>> 60sec), request.timeout (30sec), > > > > >>>>>>>>>>>>>>>>> delivery.timeout.ms (120sec), transaction.timeout > > > > >>>>>>>>>> (60sec) > > > > >>>>>>>>>>>>>>>>> 2) In Consumer we have: session.timeout (10sec), > > > > >>>>>>>>>>>>>>>>> request.timeout > > > > >>>>>>>>>>>>>>> (30sec), > > > > >>>>>>>>>>>>>>>>> default.api.timeout.ms (60sec). > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> Within a transaction (i.e. after we've beginTxn), > > > > >>>>>>>>>>>>>>>>> we could > > > > >>>>>>>>>>>> potentially > > > > >>>>>>>>>>>>>>>> call > > > > >>>>>>>>>>>>>>>>> consumer blocking APIs that depend on > > > > >>>>>>>>>>>>>>>>> default.api.timeout.ms, and > > > > >>>>>>>>>>>>>>> call > > > > >>>>>>>>>>>>>>>>> producer blocking APIs that depend on max.block.ms. > > > > >>>>>>>>>>>>>>>>> Also, if the > > > > >>>>>>>>>>>>>>> user is > > > > >>>>>>>>>>>>>>>>> following a consumer->producer pattern, then it > > > > >>>>>>>>>>>>>>>>> could be kicked > > > > >>>>>>>>>> and > > > > >>>>>>>>>>>>>>>> fenced > > > > >>>>>>>>>>>>>>>>> either by txn or by consumer group session. > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> So we want to make sure that in the caller, e.g. > > > > >>>>>>>>>>>>>>>>> Kafka Streams: > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> 1) transaction.timeout < max.block.ms 2) > > > > >>>>>>>>>>>>>>>>> transaction.timeout < delivery.timeout.ms 3) > > > > >>>>>>>>>>>>>>>>> transaction.timeout < default.api.timeout.ms 4) > > > > >>>>>>>>>>>>>>>>> transaction.timeout ~= default.api.timeout.ms (I > > > > >>>>>>>>>>>>>>>>> think this is > > > > >>>>>>>>>>>>>>>> already > > > > >>>>>>>>>>>>>>>>> mentioned in the KIP, just wanted to bring this up > > > > >>>>>>>>>>>>>>>>> again) > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> We do not need to override the default since not > > > > >>>>>>>>>>>>>>>>> every users are > > > > >>>>>>>>>>>>>>>> following > > > > >>>>>>>>>>>>>>>>> the consumer -> producer pattern, but in cases like > > > > >>>>>>>>>>>>>>>>> Streams where > > > > >>>>>>>>>> it > > > > >>>>>>>>>>>>>>> is > > > > >>>>>>>>>>>>>>>>> indeed the case, we should override the default > > > > >>>>>>>>>>>>>>>>> values to obey the > > > > >>>>>>>>>>>>>>> above > > > > >>>>>>>>>>>>>>>>> rules. > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> Guozhang > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 5:47 PM Guozhang Wang > > > > >>>>>>>>>>>>>>>>> <wangg...@gmail.com> > > > > >>>>>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>> Thanks Boyang, I'm +1 on the KIP. > > > > >>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>> Could you also update the detailed design doc > > > > >>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>> > > > > >>>>>>>>>>>> > > > > >>>>>>>>>> > > > > >>>> > > > https://docs.google.com/document/d/1LhzHGeX7_Lay4xvrEXxfciuDWATjpUXQh > > > > >>>>>>>> rEIkph9qRE/edit > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>> > > > > >>>>>>>> which > > > > >>>>>>>>>>>>>>>>>> seems a bit out-dated with the latest proposal? > > > > >>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>> Guozhang > > > > >>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>> On Wed, Sep 4, 2019 at 10:45 AM Boyang Chen < > > > > >>>>>>>>>>>>>>>> reluctanthero...@gmail.com> > > > > >>>>>>>>>>>>>>>>>> wrote: > > > > >>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>> Hey all, > > > > >>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>> I would like to start the vote for KIP-447 < > > > > >>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>> > > > > >>>>>>>>>>>> > > > > >>>>>>>>>> > > > > >>>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-447%3A+Producer > > > > >>>>>>>> +scalability+for+exactly+once+semantics > > > > >>>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>> > > > > >>>>>>>> . > > > > >>>>>>>>>>>>>>>>>>> This is a very important step to improve Kafka > > > > >>>>>>>>>>>>>>>>>>> Streams > > > > >>>>>>>>>> scalability > > > > >>>>>>>>>>>>>>> in > > > > >>>>>>>>>>>>>>>>>>> exactly-once semantics, by avoiding linearly > > > > >>>>>>>>>>>>>>>>>>> increasing number > > > > >>>>>>>>>> of > > > > >>>>>>>>>>>>>>>>>>> producers with topic partition increases. > > > > >>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>>> Thanks, Boyang > > > > >>>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>>> -- -- Guozhang > > > > >>>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>>> -- -- Guozhang > > > > >>>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>>> -- -- Guozhang > > > > >>>>>>>>>>>>>>> > > > > >>>>>>>>>>>>>> > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> > > > > >>>>>>>>>> > > > > >>>>>>>> > > > > >>>>>>>> > > > > >>>>>>>> > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>> > > > > >>>> > > > > >>> > > > > >> > > > > > > > > > > > > > > > > > > > > > > Attachments: > > > > * signature.asc > > > > > > > > > -- > > -- Guozhang > > >