Thanks, Boyang.  That seems reasonable.

best,
Colin

On Mon, Mar 30, 2020, at 13:32, Boyang Chen wrote:
> Thanks Guozhang for the summary. If everyone agrees to the strategy here,
> which include:
> 
> 1. No affection to 2.5 release, maintaining the same crashing logic for new
> Producer API talking to old clients
> 2. Add an internal flag to silently downgrade protocol for Kafka Streams
> case only
> 
> I will go ahead to make the corresponding changes to unblock.
> 
> Boyang
> 
> On Mon, Mar 30, 2020 at 11:22 AM Guozhang Wang <wangg...@gmail.com> wrote:
> 
> > Hello Colin,
> >
> > Thanks for the context you provided with CreateAclsRequest to make the
> > point for compatibility policy, and I think they make sense -- given
> > KIP-447 is primarily motivated for Streams clients, we should still
> > consider it's potential compatibility pitfall for non-Streams clients.
> >
> > About the resolution here, personally I do not think that punt this KIP for
> > the next release is a good idea; instead I think we can make the change for
> > Streams only while keeping non-Streams to still throw --- Boyang have
> > proposed with the internal flag do to so, and I'd want to add that we
> > should emphasize on docs when UnsupportedVersion is thrown on producer it
> > might already be too late (as shown in the example in KIP wiki, in such a
> > consumer -> process -> producer pattern when producer throws some
> > processing may already been done with irreversible side-effects) so that
> > users are aware.
> >
> >
> > Guozhang
> >
> >
> > On Sun, Mar 29, 2020 at 9:07 PM Colin McCabe <cmcc...@apache.org> wrote:
> >
> > > Thanks for the explanations, all.  It seems to me that the proposed
> > silent
> > > downgrade change leaves non-streams applications with no way to safely
> > use
> > > KIP-447.  If they use it, it will sometimes be safe, but sometimes not,
> > > depending on what they are doing.
> > >
> > > This doesn't seem acceptable.  People should be able to rely on EOS when
> > > it exists.
> > >
> > > I agree that it is frustrating that the UnsupportedVersionException
> > > happens when the applications have already set up many things, rather
> > than
> > > being accessible when the application first starts up.  But this does not
> > > fundamentally change the compatibility policy.  It is difficult
> > (actually,
> > > it's impossible) to work around the lack of CreateAclsRequest or
> > > DeleteAclsRequest.  But this doesn't motivate us to return fake results
> > for
> > > the corresponding AdminClient calls when the RPCs are not present.  Some
> > > things can't be faked.
> > >
> > > It seems like the right thing to do here would be to expose the KIP-447
> > > functionality as a feature flag.  Then, applications can test if that
> > > feature flag is present before they perform a long setup process.  They
> > can
> > > also  decide if they want to provide a fallback path or not.
> > >
> > > Of course, feature flags won't make 2.5, so we need some more short-term
> > > solution here.  Either we punt this KIP to the next release, or we find
> > > some way to make it safe for everyone, not just Streams (possibly by only
> > > enabling it for streams?)
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Sat, Mar 28, 2020, at 01:31, Matthias J. Sax wrote:
> > > > I agree with Guozhang that the issue only arises for the
> > > > `read-process-write` pattern and thus we should guard at the read level
> > > > already and thus auto-downgrading the write path commitTx request seems
> > > > fine.
> > > >
> > > > Note that for producer-only and consumer-only apps the auto-downgrade
> > is
> > > > totally fine (and actually desired, especially for the consumer to
> > > > guarantee backward compatibility as there is not public API change) for
> > > > the corresponding request. For `read-process-write` with no
> > exactly-once
> > > > or the existing exactly-once pattern auto-downgrading is also fine for
> > > > both clients.
> > > >
> > > > The only issue I see atm might be, that the consumer side guard depends
> > > > on an internal config and there is not public API to enable this guard.
> > > > Hence, for non-Streams users there might be a gap: If they don't
> > upgrade
> > > > their brokers first but switch to the new exactly-once pattern they
> > > > loose exactly-once protection without any warning...
> > > >
> > > > I am frankly not sure how many users would actually build exactly-once
> > > > apps without using Streams and might be affected?
> > > >
> > > > To address the issue I consider the following two options:
> > > >
> > > >  - educate users about this internal consumer config
> > > >  - make the consumer config public
> > > >
> > > >
> > > > Thoughts?
> > > >
> > > > -Matthias
> > > >
> > > >
> > > >
> > > >
> > > > On 3/27/20 10:12 PM, Guozhang Wang wrote:
> > > > > Hello Colin, Ismael,
> > > > >
> > > > > Thanks for your feedbacks, they are quite helpful. Just to provide
> > some
> > > > > context here about OffsetFetch:
> > > > >
> > > > > 1) When building the offset fetch request, we used to auto
> > "downgrade"
> > > the
> > > > > request by falling back the requireStable flag when broker supporter
> > > > > version is < 7.
> > > > >
> > > > >
> > >
> > https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/requests/OffsetFetchRequest.java#L82
> > > > >
> > > > > 2) The newly introduced config value "eos-beta" in KIP-447 requires
> > > brokers
> > > > > to be on version 2.5 or later, and if broker is not on newer version,
> > > we
> > > > > need to let the client to fail fast; in order to do so an internal
> > > config
> > > > > from consumer is exposed to Streams, which allows the build function
> > to
> > > > > throw the UnsupportedVersionException instead of auto downgrading
> > when
> > > > > building the offset-fetch that requireStable flag. This is only
> > turned
> > > on
> > > > > by Streams so that if broker is not on newer version when "eos-beta"
> > is
> > > > > enabled on the client, we will fail fast .
> > > > >
> > > > > In that sense, a new client would not use old RPC talking to the
> > > brokers --
> > > > > the key point is that, upon starting a task the offset fetch is
> > firstly
> > > > > sent before we start processing records, so that's where we should
> > > stop the
> > > > > client right away instead of silently reading unstable offsets which
> > > would
> > > > > already break the fencing guarantees if producers were not fenced on
> > > txn
> > > > > coordinator. In other words, we think that letting the overloaded
> > > > > producer#sendOffsetsToTxn to throw UnsupportedVersion exception is
> > > already
> > > > > too late, and hence we should not relying on that API to let clients
> > > detect
> > > > > if brokers are on older versions.
> > > > >
> > > > > For KIP-98, however, letting producer#initTxn to throw
> > > UnsupportedVersion
> > > > > makes total sense since that's the first API triggered before
> > > producers try
> > > > > to send any records.
> > > > >
> > > > > 3) With that in mind, I think letting the overloaded
> > > > > producer#sendOffsetsToTxn to not throw UnsupportedVersion is
> > > appropriate,
> > > > > since KIP-447 only applies to a consume -> process -> produce
> > scenario
> > > and
> > > > > does not apply to a producer-only scenario for transactional
> > messaging.
> > > > > Instead, the consumer API calls that trigger earlier would detect
> > older
> > > > > brokers. A side benefit of this is that, on the caller side (a.k.a
> > > Streams)
> > > > > we do not need to first detect the broker version and then decide
> > which
> > > > > overloaded `producer#sendOffsetsToTxn` to call based on that, which
> > > also
> > > > > means that we may eventually deprecate the old overloaded function
> > one
> > > day.
> > > > >
> > > > >
> > > > > Please let me know if you have any questions regarding this
> > rationale,
> > > and
> > > > > we can continue the discussion from there.
> > > > >
> > > > >
> > > > > Guozhang
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Mar 27, 2020 at 9:21 PM Ismael Juma <ism...@juma.me.uk>
> > wrote:
> > > > >
> > > > >> I'm a bit puzzled. We added this feature because we thought it was
> > > useful.
> > > > >> And now we are saying that you don't know if you can rely on it
> > since
> > > the
> > > > >> downgrade happens silently. Can you provide more context on the
> > > OffsetFetch
> > > > >> downgrade? What is the implication of that?
> > > > >>
> > > > >> Ismael
> > > > >>
> > > > >> On Fri, Mar 27, 2020 at 7:30 PM Boyang Chen <
> > > reluctanthero...@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >>> Thanks Colin, I think the point of this change is to make the new
> > > client
> > > > >>> experience better while working with old brokers, when the upgrade
> > is
> > > > >>> happening on client side only. For Streams, it is very common to
> > have
> > > > >> more
> > > > >>> advanced client version. On code level, the path to call
> > transaction
> > > > >> commit
> > > > >>> is better to be unified instead of diverged with try-catch. As a
> > > matter
> > > > >> of
> > > > >>> fact, we have already implemented the downgrade of the OffsetFetch
> > > > >> protocol
> > > > >>> to adapt this broker version incompatibility. New transaction
> > commit
> > > > >>> protocol thus behaves inconsistent with what we did with other
> > > protocols.
> > > > >>>
> > > > >>> In terms of the impact for this last minute change, silent
> > downgrade
> > > has
> > > > >>> one downside which is the customized EOS applications could not
> > > alarm the
> > > > >>> user of semantic violation anymore by simply crashing. To be
> > honest,
> > > I'm
> > > > >>> not sure how severe this would affect customized community usages,
> > > as we
> > > > >>> don't have user stats for that yet :)
> > > > >>>
> > > > >>> Boyang
> > > > >>>
> > > > >>> On Fri, Mar 27, 2020 at 6:32 PM Colin McCabe <cmcc...@apache.org>
> > > wrote:
> > > > >>>
> > > > >>>> On Fri, Mar 27, 2020, at 18:29, Colin McCabe wrote:
> > > > >>>>> On Fri, Mar 27, 2020, at 16:06, Boyang Chen wrote:
> > > > >>>>>> Hey there,
> > > > >>>>>>
> > > > >>>>>> we would like to address an improvement on the
> > > > >>>>>> *Producer#sendOffsetsToTransaction(offsets,
> > > > >>>>>> groupMetadata) *API. Previously we assume the calling of this
> > > > >>> function
> > > > >>>>>> would crash the app if broker is not on version 2.5.0 or higher.
> > > As
> > > > >>>> Streams
> > > > >>>>>> side change was rolling out, the disadvantage of this behavior
> > > > >>> becomes
> > > > >>>>>> obvious, since on the application level it is hard to detect
> > > broker
> > > > >>>> version
> > > > >>>>>> and do separate call paths for old and new transaction commit
> > > > >>>> protocols.
> > > > >>>>>
> > > > >>>>> Hi Boyang,
> > > > >>>>>
> > > > >>>>> Applications are never supposed to detect broker versions, but
> > they
> > > > >> may
> > > > >>>>> be required to detect the presence or absence of a feature.  That
> > > > >> would
> > > > >>>>> happen if a new feature was introduced that couldn't be emulated
> > > with
> > > > >>>>> some combination of the features that previously existed.
> > > > >>>>>
> > > > >>>>> That was the case for the original EOS work.  For example,
> > > > >>>>> initTransactions will throw UnsupportedVersionException if the
> > > > >> producer
> > > > >>>>> tries to use it, but the broker doesn't support it.
> > > > >>>>>
> > > > >>>>>>
> > > > >>>>>> Thus, we are planning to change the behavior from crashing to
> > > > >>> silently
> > > > >>>>>> downgrade to the older transactional offset commit protocol by
> > > > >>>> resetting
> > > > >>>>>> the consumer group metadata fields. Details in this PR
> > > > >>>>>> <https://github.com/apache/kafka/pull/8375>.
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>> The big question is will using the old client code with the old
> > RPC
> > > > >> be
> > > > >>>>> less safe than using the old client code with the old RPC?  If it
> > > > >> will,
> > > > >>>>> there's a strong case to be made that streams should catch the
> > > > >>>>> UnsupportedVersionException and fall back to its old behavior.
> > > > >>>>
> > > > >>>> Sorry, that should read "new client code with old RPC".
> > Basically,
> > > are
> > > > >>> we
> > > > >>>> making things worse for users with older brokers and newer
> > clients?
> > > > >>>>
> > > > >>>> Colin
> > > > >>>>
> > > > >>>>>
> > > > >>>>> best,
> > > > >>>>> Colin
> > > > >>>>>
> > > > >>>>>>
> > > > >>>>>> Let me know if you have any questions, thanks!
> > > > >>>>>>
> > > > >>>>>> On Thu, Mar 26, 2020 at 12:45 PM Matthias J. Sax <
> > > mj...@apache.org
> > > > >>>
> > > > >>>> wrote:
> > > > >>>>>>
> > > > >>>>>>> One more change for KIP-447.
> > > > >>>>>>>
> > > > >>>>>>> Currently, Kafka Streams collects task-level metrics called
> > > > >>>>>>> "commit-latency-[max|avg]". However, with KIP-447 tasks don't
> > get
> > > > >>>>>>> committed individually any longer, and thus this metrics do not
> > > > >>> make
> > > > >>>>>>> sense going forward.
> > > > >>>>>>>
> > > > >>>>>>> Therefore, we propose to remove those metrics in 2.6 release.
> > > > >>>>>>> Deprecation does not make much sense, as there is just nothing
> > to
> > > > >>> be
> > > > >>>>>>> recorded in a useful way any longer.
> > > > >>>>>>>
> > > > >>>>>>> I also updated the upgrade path section, as the upgrade path is
> > > > >>>> actually
> > > > >>>>>>> simpler as described originally.
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> If there are any concerns, please let us know!
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> -Matthias
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> On 3/5/20 12:56 PM, Matthias J. Sax wrote:
> > > > >>>>>>>> There is one more change to this KIP for the upgrade path of
> > > > >>> Kafka
> > > > >>>>>>>> Streams applications:
> > > > >>>>>>>>
> > > > >>>>>>>> We cannot detect broker versions reliable, and thus, we need
> > > > >>> users
> > > > >>>> to
> > > > >>>>>>>> manually opt-in to the feature. Thus, we need to add a third
> > > > >>> value
> > > > >>>> for
> > > > >>>>>>>> configuration parameter `processing.guarantee` that we call
> > > > >>>>>>>> `exactly_once_beta` -- specifying this config will enable the
> > > > >>>> producer
> > > > >>>>>>>> per thread design.
> > > > >>>>>>>>
> > > > >>>>>>>> I updated the KIP accordingly. Please let us know if there are
> > > > >>> any
> > > > >>>>>>>> concerns.
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> -Matthias
> > > > >>>>>>>>
> > > > >>>>>>>> On 2/11/20 4:44 PM, Guozhang Wang wrote:
> > > > >>>>>>>>> Boyang,
> > > > >>>>>>>>
> > > > >>>>>>>>> Thanks for the update. This change makes sense to me.
> > > > >>>>>>>>
> > > > >>>>>>>>> Guozhang
> > > > >>>>>>>>
> > > > >>>>>>>>> On Tue, Feb 11, 2020 at 11:37 AM Boyang Chen
> > > > >>>>>>>>> <reluctanthero...@gmail.com> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>>> Hey there,
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> we are adding a small change to the KIP-447 public API. The
> > > > >>>>>>>>>> default value of
> > > > >>>>>>>>>> `
> > transaction.abort.timed.out.transaction.cleanup.interval.ms
> > > > >> `
> > > > >>>>>>>>>> shall be changed from 1 minute to 10 seconds. The goal here
> > > > >> is
> > > > >>> to
> > > > >>>>>>>>>> trigger the expired transaction more frequently in order to
> > > > >>>>>>>>>> reduce the consumer pending offset fetch wait time.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Let me know if you have further questions, thanks!
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> On Wed, Jan 8, 2020 at 3:44 PM Boyang Chen
> > > > >>>>>>>>>> <reluctanthero...@gmail.com> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Thanks Guozhang for another review! I have addressed all
> > the
> > > > >>>>>>>>>>> javadoc changes for PendingTransactionException in the KIP.
> > > > >>>>>>>>>>> For
> > > > >>>>>>>>>> FENCED_INSTANCE_ID
> > > > >>>>>>>>>>> the only thrown place would be on the new send offsets API,
> > > > >>>>>>>>>>> which is also addressed.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Thanks Matthias for the vote! As we have 3 binding votes
> > > > >>>>>>>>>>> (Guozhang,
> > > > >>>>>>>>>> Jason,
> > > > >>>>>>>>>>> and Matthias), the KIP is officially accepted and prepared
> > > > >> to
> > > > >>>>>>>>>>> ship in
> > > > >>>>>>>>>> 2.5.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Still feel free to put more thoughts on either discussion
> > or
> > > > >>>>>>>>>>> voting
> > > > >>>>>>>>>> thread
> > > > >>>>>>>>>>> to refine the KIP!
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Wed, Jan 8, 2020 at 3:15 PM Matthias J. Sax
> > > > >>>>>>>>>>> <matth...@confluent.io> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> I just re-read the KIP. Overall I am +1 as well.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> Some minor comments (also apply to the Google design doc):
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> 1) As 2.4 was release, references should be updated to
> > 2.5.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Addressed
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> 2) About the upgrade path, the KIP says:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> 2a)
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Broker must be upgraded to 2.4 first. This means the
> > > > >>>>>>>>>>>> `inter.broker.protocol.version` (IBP) has to be set to the
> > > > >>>>>>>>>>>> latest. Any produce request with higher version will
> > > > >>>>>>>>>>>> automatically get fenced
> > > > >>>>>>>>>> because
> > > > >>>>>>>>>>>> of no support.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> From my understanding, this is not correct? After a broker
> > > > >> is
> > > > >>>>>>>>>>>> updated to the new binaries, it should accept new
> > requests,
> > > > >>>>>>>>>>>> even if IBP was not bumped yet?
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Your understanding was correct, after some offline
> > > > >> discussion
> > > > >>>>>>>>>>>> we should
> > > > >>>>>>>>>>> not worry about IBP in this case.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> 2b)
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> About the two rolling bounces for KS apps and the
> > statement
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> one should never allow task producer and thread producer
> > > > >>>>>>>>>>>>> under the
> > > > >>>>>>>>>> same
> > > > >>>>>>>>>>>> application group
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> In the second rolling bounce, we might actually mix both
> > > > >> (ie,
> > > > >>>>>>>>>>>> per-task and per-thread producers) but this is fine as
> > > > >>>>>>>>>>>> explained in the KIP. The only case we cannot allow is,
> > old
> > > > >>>>>>>>>>>> per-task producers (without consumer generation fencing)
> > to
> > > > >>>>>>>>>>>> be mixed with per-thread producers (that rely solely on
> > > > >>>>>>>>>>>> consumer generation fencing).
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Does this sound correct?
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Correct, that's the purpose of doing 2 rolling bounce,
> > > > >> where
> > > > >>>>>>>>>>>> the first
> > > > >>>>>>>>>>> one is to guarantee everyone's opt-in for generation
> > > > >> fencing.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> 3) We should also document how users can use KS 2.5
> > > > >>>>>>>>>>>> applications against older brokers -- for this case, we
> > > > >> need
> > > > >>>>>>>>>>>> to stay on per-task producers and cannot use the new
> > > > >> fencing
> > > > >>>>>>>>>>>> mechanism. Currently, the KIP only describe a single way
> > > > >> how
> > > > >>>>>>>>>>>> users could make this work: by setting (and keeping)
> > > > >>>>>>>>>>>> UPGRADE_FROM config to 2.4 (what might not be an ideal
> > > > >>>>>>>>>>>> solution and might also not be clear by itself that people
> > > > >>>>>>>>>>>> would need to do
> > > > >>>>>>>>>> this)?
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Yes exactly, at the moment we are actively working on a
> > > > >> plan
> > > > >>>>>>>>>>>> to acquire
> > > > >>>>>>>>>>> broker's IBP during stream start-up and initialize based
> > off
> > > > >>>>>>>>>>> that information, so that user doesn't need to keep
> > > > >>>>>>>>>>> UPGRADE_FROM simply for working with
> > > > >>>>>>>>>> old
> > > > >>>>>>>>>>> brokers.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> -Matthias
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> On 9/18/19 4:41 PM, Boyang Chen wrote:
> > > > >>>>>>>>>>>>> Bump this thread to see if someone could also review!
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On Mon, Sep 9, 2019 at 5:00 PM Boyang Chen <
> > > > >>>>>>>>>> reluctanthero...@gmail.com>
> > > > >>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Thank you Jason! Addressed the comments.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Thank you Guozhang for explaining. I will document the
> > > > >>>>>>>>>>>>>> timeout
> > > > >>>>>>>>>> setting
> > > > >>>>>>>>>>>>>> reasoning in the design doc.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> On Mon, Sep 9, 2019 at 1:49 PM Guozhang Wang
> > > > >>>>>>>>>>>>>> <wangg...@gmail.com>
> > > > >>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> On Fri, Sep 6, 2019 at 6:33 PM Boyang Chen <
> > > > >>>>>>>>>>>> reluctanthero...@gmail.com>
> > > > >>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Thanks Guozhang, I have polished the design doc to
> > > > >>>>>>>>>>>>>>>> make it sync
> > > > >>>>>>>>>> with
> > > > >>>>>>>>>>>>>>>> current KIP. As for overriding default timeout
> > > > >>>>>>>>>>>>>>>> values, I guess it's
> > > > >>>>>>>>>>>>>>> already
> > > > >>>>>>>>>>>>>>>> stated in the KIP to set txn timeout to 10s, are you
> > > > >>>>>>>>>>>>>>>> suggesting we
> > > > >>>>>>>>>>>>>>> should
> > > > >>>>>>>>>>>>>>>> also put down this recommendation on the KIP for
> > > > >>>>>>>>>>>>>>>> non-stream EOS
> > > > >>>>>>>>>>>> users?
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> My comment is not for changing any produce / consumer
> > > > >>>>>>>>>>>>>>>> default
> > > > >>>>>>>>>> config
> > > > >>>>>>>>>>>>>>> values, but for the Streams configs, to make sure that
> > > > >>>>>>>>>>>>>>> our overridden config values respect the above rules.
> > > > >>>>>>>>>>>>>>> That is, we check
> > > > >>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>> actual value used in the config if they are ever
> > > > >>>>>>>>>>>>>>> overridden by
> > > > >>>>>>>>>> users,
> > > > >>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>>> if the above were not true we can log a warning that it
> > > > >>>>>>>>>>>>>>> may be risky
> > > > >>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>> encounter some unnecessary rebalances.
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> Again, this is not something we need to include in the
> > > > >>>>>>>>>>>>>>> KIP since it
> > > > >>>>>>>>>>>> is not
> > > > >>>>>>>>>>>>>>> part of public APIs, just to emphasize that the
> > > > >>>>>>>>>>>>>>> internal
> > > > >>>>>>>>>>>> implementation
> > > > >>>>>>>>>>>>>>> can have some safety guard like this.
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> Guozhang
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Boyang
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 8:43 PM Guozhang Wang
> > > > >>>>>>>>>>>>>>>> <wangg...@gmail.com>
> > > > >>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> Hello Boyang,
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> Just realized one thing about timeout
> > > > >>>>>>>>>>>>>>>>> configurations that we
> > > > >>>>>>>>>> should
> > > > >>>>>>>>>>>>>>>>> consider including in this KIP as well:
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> 1) In Producer we have: max.block.ms (default value
> > > > >>>>>>>>>>>>>>>>> 60sec), request.timeout (30sec),
> > > > >>>>>>>>>>>>>>>>> delivery.timeout.ms (120sec), transaction.timeout
> > > > >>>>>>>>>> (60sec)
> > > > >>>>>>>>>>>>>>>>> 2) In Consumer we have: session.timeout (10sec),
> > > > >>>>>>>>>>>>>>>>> request.timeout
> > > > >>>>>>>>>>>>>>> (30sec),
> > > > >>>>>>>>>>>>>>>>> default.api.timeout.ms (60sec).
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> Within a transaction (i.e. after we've beginTxn),
> > > > >>>>>>>>>>>>>>>>> we could
> > > > >>>>>>>>>>>> potentially
> > > > >>>>>>>>>>>>>>>> call
> > > > >>>>>>>>>>>>>>>>> consumer blocking APIs that depend on
> > > > >>>>>>>>>>>>>>>>> default.api.timeout.ms, and
> > > > >>>>>>>>>>>>>>> call
> > > > >>>>>>>>>>>>>>>>> producer blocking APIs that depend on max.block.ms.
> > > > >>>>>>>>>>>>>>>>> Also, if the
> > > > >>>>>>>>>>>>>>> user is
> > > > >>>>>>>>>>>>>>>>> following a consumer->producer pattern, then it
> > > > >>>>>>>>>>>>>>>>> could be kicked
> > > > >>>>>>>>>> and
> > > > >>>>>>>>>>>>>>>> fenced
> > > > >>>>>>>>>>>>>>>>> either by txn or by consumer group session.
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> So we want to make sure that in the caller, e.g.
> > > > >>>>>>>>>>>>>>>>> Kafka Streams:
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> 1) transaction.timeout < max.block.ms 2)
> > > > >>>>>>>>>>>>>>>>> transaction.timeout < delivery.timeout.ms 3)
> > > > >>>>>>>>>>>>>>>>> transaction.timeout < default.api.timeout.ms 4)
> > > > >>>>>>>>>>>>>>>>> transaction.timeout ~= default.api.timeout.ms (I
> > > > >>>>>>>>>>>>>>>>> think this is
> > > > >>>>>>>>>>>>>>>> already
> > > > >>>>>>>>>>>>>>>>> mentioned in the KIP, just wanted to bring this up
> > > > >>>>>>>>>>>>>>>>> again)
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> We do not need to override the default since not
> > > > >>>>>>>>>>>>>>>>> every users are
> > > > >>>>>>>>>>>>>>>> following
> > > > >>>>>>>>>>>>>>>>> the consumer -> producer pattern, but in cases like
> > > > >>>>>>>>>>>>>>>>> Streams where
> > > > >>>>>>>>>> it
> > > > >>>>>>>>>>>>>>> is
> > > > >>>>>>>>>>>>>>>>> indeed the case, we should override the default
> > > > >>>>>>>>>>>>>>>>> values to obey the
> > > > >>>>>>>>>>>>>>> above
> > > > >>>>>>>>>>>>>>>>> rules.
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> Guozhang
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 5:47 PM Guozhang Wang
> > > > >>>>>>>>>>>>>>>>> <wangg...@gmail.com>
> > > > >>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> Thanks Boyang, I'm +1 on the KIP.
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> Could you also update the detailed design doc
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>
> > > https://docs.google.com/document/d/1LhzHGeX7_Lay4xvrEXxfciuDWATjpUXQh
> > > > >>>>>>>> rEIkph9qRE/edit
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>> which
> > > > >>>>>>>>>>>>>>>>>> seems a bit out-dated with the latest proposal?
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> Guozhang
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> On Wed, Sep 4, 2019 at 10:45 AM Boyang Chen <
> > > > >>>>>>>>>>>>>>>> reluctanthero...@gmail.com>
> > > > >>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> Hey all,
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> I would like to start the vote for KIP-447 <
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-447%3A+Producer
> > > > >>>>>>>> +scalability+for+exactly+once+semantics
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>> .
> > > > >>>>>>>>>>>>>>>>>>> This is a very important step to improve Kafka
> > > > >>>>>>>>>>>>>>>>>>> Streams
> > > > >>>>>>>>>> scalability
> > > > >>>>>>>>>>>>>>> in
> > > > >>>>>>>>>>>>>>>>>>> exactly-once semantics, by avoiding linearly
> > > > >>>>>>>>>>>>>>>>>>> increasing number
> > > > >>>>>>>>>> of
> > > > >>>>>>>>>>>>>>>>>>> producers with topic partition increases.
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> Thanks, Boyang
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> -- -- Guozhang
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> -- -- Guozhang
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> -- -- Guozhang
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > > >
> > > > >
> > > >
> > > >
> > > > Attachments:
> > > > * signature.asc
> > >
> >
> >
> > --
> > -- Guozhang
> >
>

Reply via email to