Hi Guozhang,

My understanding is that the assignor is always consulted when the next
heartbeat request is constructed. In other words,
`PartitionAssignor#metadata` will be called for every heartbeat. This
gives the assignor the opportunity to enforce a rebalance by
setting the reason to a non-zero value or by changing the bytes. Do
you think that this is not sufficient? Are you concerned about the delay?
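
To make this concrete, here is a rough sketch of the flow. It is only an
illustration, not the API proposed in the KIP: `HeartbeatSketch`,
`ClientAssignor`, `AssignorMetadata`, `Heartbeat`, `HeartbeatBuilder` and
`requestsNewAssignment` are all hypothetical names, and the rule "send the
bytes only when they changed" is an assumption made for the example.

```java
import java.util.Arrays;

// All names below are hypothetical; this is not the KIP-848 client API.
final class HeartbeatSketch {

    /** Stand-in for what the client-side assignor returns from metadata(). */
    static final class AssignorMetadata {
        final byte reason;
        final byte[] bytes; // assumed non-null in this sketch
        AssignorMetadata(byte reason, byte[] bytes) {
            this.reason = reason;
            this.bytes = bytes;
        }
    }

    /** Stand-in for the client-side assignor; only metadata() matters here. */
    interface ClientAssignor {
        AssignorMetadata metadata();
    }

    /** Assignor-related part of a heartbeat; bytes == null means "unchanged". */
    static final class Heartbeat {
        final byte reason;
        final byte[] bytes;
        Heartbeat(byte reason, byte[] bytes) {
            this.reason = reason;
            this.bytes = bytes;
        }
    }

    /** Consults the assignor every time a heartbeat is built. */
    static final class HeartbeatBuilder {
        private final ClientAssignor assignor;
        private byte[] lastSentBytes; // null until the first heartbeat

        HeartbeatBuilder(ClientAssignor assignor) {
            this.assignor = assignor;
        }

        Heartbeat next() {
            AssignorMetadata current = assignor.metadata();
            // Arrays.equals handles the initial null, so the first call counts as a change.
            boolean changed = !Arrays.equals(current.bytes, lastSentBytes);
            if (changed) {
                lastSentBytes = current.bytes.clone();
            }
            return new Heartbeat(current.reason, changed ? lastSentBytes : null);
        }
    }

    /** In this sketch, a non-zero reason or new bytes asks for a new assignment. */
    static boolean requestsNewAssignment(Heartbeat hb) {
        return hb.reason != 0 || hb.bytes != null;
    }

    public static void main(String[] args) {
        byte[] reason = {0};
        byte[][] bytes = {{1}};
        HeartbeatBuilder builder =
            new HeartbeatBuilder(() -> new AssignorMetadata(reason[0], bytes[0]));

        System.out.println(requestsNewAssignment(builder.next())); // first heartbeat carries the bytes
        System.out.println(requestsNewAssignment(builder.next())); // nothing changed since then
        reason[0] = 1; // the assignor now sets a non-zero reason
        System.out.println(requestsNewAssignment(builder.next()));
    }
}
```

With this shape, the worst-case delay before the coordinator learns about
the assignor's request is one heartbeat interval.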

Best,
David

On Fri, Sep 9, 2022 at 7:10 AM Guozhang Wang <wangg...@gmail.com> wrote:
>
> Hello David,
>
> One of Jun's comments got me thinking:
>
> ```
> In this case, a new assignment is triggered by the client side
> assignor. When constructing the HB, the consumer will always consult
> the client side assignor and propagate the information to the group
> coordinator. In other words, we don't expect users to call
> Consumer#enforceRebalance anymore.
> ```
>
> Looking at the current PartitionAssignor interface, we do not yet have
> a way to instruct how the next HB request is constructed. For example,
> when the assignor wants to enforce a new rebalance with a new assignment,
> we'd need some customizable API in the PartitionAssignor to indicate
> that the next HB should tell the broker so. WDYT about adding such an
> API to the PartitionAssignor?
>
>
> Guozhang
>
>
> On Tue, Sep 6, 2022 at 6:09 AM David Jacot <dja...@confluent.io.invalid>
> wrote:
>
> > Hi Jun,
> >
> > I have updated the KIP to include your feedback. I have also tried to
> > clarify the parts which were not clear.
> >
> > Best,
> > David
> >
> > On Fri, Sep 2, 2022 at 4:18 PM David Jacot <dja...@confluent.io> wrote:
> > >
> > > Hi Jun,
> > >
> > > Thanks for your feedback. Let me start by answering your questions
> > > inline and I will update the KIP next week.
> > >
> > > > Thanks for the KIP. Overall, the main benefits of the KIP seem to be
> > > > fewer RPCs during rebalance and more efficient support of wildcard. A
> > > > few comments below.
> > >
> > > I would also add that the KIP removes the global sync barrier in the
> > > protocol which is essential to improve group stability and
> > > scalability, and the KIP also simplifies the client by moving most of
> > > the logic to the server side.
> > >
> > > > 30. ConsumerGroupHeartbeatRequest
> > > > 30.1 ServerAssignor is a singleton. Do we plan to support rolling
> > > > changing of the partition assignor in the consumers?
> > >
> > > Definitely. The group coordinator will use the assignor used by a
> > > majority of the members. This allows the group to move from one
> > > assignor to another by a roll. This is explained in the Assignor
> > > Selection chapter.
> > >
> > > > 30.2 For each field, could you explain whether it's required in every
> > > > request or the scenarios when it needs to be filled? For example, it's
> > > > not clear to me when TopicPartitions needs to be filled.
> > >
> > > The client is expected to set those fields in case of a connection
> > > issue (e.g. timeout) or when the fields have changed since the last
> > > HB. The server populates those fields as long as the member is not
> > > fully reconciled - the member should acknowledge that it has the
> > > expected epoch and assignment. I will clarify this in the KIP.
> > >
> > > > 31. In the current consumer protocol, the rack affinity between the
> > > > client and the broker is only considered during fetching, but not
> > > > during assigning partitions to consumers. Sometimes, once the
> > > > assignment is made, there is no opportunity for read affinity because
> > > > no replicas of assigned partitions are close to the member. I am
> > > > wondering if we should use this opportunity to address this by
> > > > including rack in GroupMember.
> > >
> > > That's an interesting idea. I don't see any issue with adding the rack
> > > to the members. I will do so.
> > >
> > > > 32. On the metric side, often, it's useful to know how busy a group
> > > > coordinator is. By moving to the event loop model, it seems that we
> > > > could add a metric that tracks the fraction of the time the event loop
> > > > is doing the actual work.
> > >
> > > That's a great idea. I will add it. Thanks.
> > >
> > > > 33. Could we add a section on coordinator failover handling? For
> > > > example, does it need to trigger the check if any group with the
> > > > wildcard subscription now has a new matching topic?
> > >
> > > Sure. When the new group coordinator takes over, it has to:
> > > * Set up the session timeouts.
> > > * Trigger a new assignment if a client side assignor is used. We don't
> > > store the information about the member selected to run the assignment
> > > so we have to start a new one.
> > > * Update the topics metadata, verify the wildcard subscriptions, and
> > > trigger a rebalance if needed.
> > >
> > > > 34. ConsumerGroupMetadataValue, ConsumerGroupPartitionMetadataValue,
> > > > ConsumerGroupMemberMetadataValue: Could we document what the epoch
> > > > field reflects? For example, does the epoch in
> > > > ConsumerGroupMetadataValue reflect the latest group epoch? What about
> > > > the one in ConsumerGroupPartitionMetadataValue and
> > > > ConsumerGroupMemberMetadataValue?
> > >
> > > Sure. I will clarify that but it is always the latest group epoch.
> > > When the group state is updated, the group epoch is bumped so we use
> > > that one for all the change records related to the update.
> > >
> > > > 35. "the group coordinator will ensure that the following invariants
> > > > are met: ... All members exists." It's possible for a member not to
> > > > get any assigned partitions, right?
> > >
> > > That's right. Here I meant that the members provided by the assignor
> > > in the assignment must exist in the group. The assignor cannot make
> > > up new member ids.
> > >
> > > > 36. "He can rejoins the group with a member epoch equals to 0": When
> > > > would a consumer rejoin and what member id would be used?
> > >
> > > A member is expected to abandon all its partitions and rejoin when it
> > > receives the FENCED_MEMBER_EPOCH error. In this case, the group
> > > coordinator will have removed the member from the group. The member
> > > can rejoin the group with the same member id but with 0 as epoch. Let
> > > me see if I can clarify this in the KIP.
> > >
> > > > 37. "Instead, power users will have the ability to trigger a
> > > > reassignment by either providing a non-zero reason or by updating the
> > > > assignor metadata." Hmm, this seems to be conflicting with the
> > > > deprecation of Consumer#enforceRebalance.
> > >
> > > In this case, a new assignment is triggered by the client side
> > > assignor. When constructing the HB, the consumer will always consult
> > > the client side assignor and propagate the information to the group
> > > coordinator. In other words, we don't expect users to call
> > > Consumer#enforceRebalance anymore.
> > >
> > > > 38. The reassignment examples are nice. But the section seems to have
> > > > multiple typos.
> > > > 38.1 When the group transitions to epoch 2, B immediately gets into
> > > > "epoch=1, partitions=[foo-2]", which seems incorrect.
> > > > 38.2 When the group transitions to epoch 3, C seems to get into
> > > > epoch=3, partitions=[foo-1] too early.
> > > > 38.3 After A transitions to epoch 3, C still has A - epoch=2,
> > > > partitions=[foo-0].
> > >
> > > Sorry for that! I will revise them.
> > >
> > > > 39. Rolling upgrade of consumers: Do we support the upgrade from any
> > > > old version to new one?
> > >
> > > We will support upgrading from the consumer protocol version 3,
> > > introduced in KIP-792. KIP-792 is not implemented yet so the earliest
> > > version is unknown at the moment. This is explained in the migration
> > > plan chapter.
> > >
> > > Thanks again for your feedback, Jun. I will update the KIP based on it
> > > next week.
> > >
> > > Best,
> > > David
> > >
> > > On Thu, Sep 1, 2022 at 9:07 PM Jun Rao <j...@confluent.io.invalid> wrote:
> > > >
> > > > Hi, David,
> > > >
> > > > Thanks for the KIP. Overall, the main benefits of the KIP seem to be
> > > > fewer RPCs during rebalance and more efficient support of wildcard. A
> > > > few comments below.
> > > >
> > > > 30. ConsumerGroupHeartbeatRequest
> > > > 30.1 ServerAssignor is a singleton. Do we plan to support rolling
> > > > changing of the partition assignor in the consumers?
> > > > 30.2 For each field, could you explain whether it's required in every
> > > > request or the scenarios when it needs to be filled? For example, it's
> > > > not clear to me when TopicPartitions needs to be filled.
> > > >
> > > > 31. In the current consumer protocol, the rack affinity between the
> > > > client and the broker is only considered during fetching, but not
> > > > during assigning partitions to consumers. Sometimes, once the
> > > > assignment is made, there is no opportunity for read affinity because
> > > > no replicas of assigned partitions are close to the member. I am
> > > > wondering if we should use this opportunity to address this by
> > > > including rack in GroupMember.
> > > >
> > > > 32. On the metric side, often, it's useful to know how busy a group
> > > > coordinator is. By moving to the event loop model, it seems that we
> > > > could add a metric that tracks the fraction of the time the event loop
> > > > is doing the actual work.
> > > >
> > > > 33. Could we add a section on coordinator failover handling? For
> > > > example, does it need to trigger the check if any group with the
> > > > wildcard subscription now has a new matching topic?
> > > >
> > > > 34. ConsumerGroupMetadataValue, ConsumerGroupPartitionMetadataValue,
> > > > ConsumerGroupMemberMetadataValue: Could we document what the epoch
> > > > field reflects? For example, does the epoch in
> > > > ConsumerGroupMetadataValue reflect the latest group epoch? What about
> > > > the one in ConsumerGroupPartitionMetadataValue and
> > > > ConsumerGroupMemberMetadataValue?
> > > >
> > > > 35. "the group coordinator will ensure that the following invariants
> > > > are met: ... All members exists." It's possible for a member not to
> > > > get any assigned partitions, right?
> > > >
> > > > 36. "He can rejoins the group with a member epoch equals to 0": When
> > > > would a consumer rejoin and what member id would be used?
> > > >
> > > > 37. "Instead, power users will have the ability to trigger a
> > > > reassignment by either providing a non-zero reason or by updating the
> > > > assignor metadata." Hmm, this seems to be conflicting with the
> > > > deprecation of Consumer#enforceRebalance.
> > > >
> > > > 38. The reassignment examples are nice. But the section seems to have
> > > > multiple typos.
> > > > 38.1 When the group transitions to epoch 2, B immediately gets into
> > > > "epoch=1, partitions=[foo-2]", which seems incorrect.
> > > > 38.2 When the group transitions to epoch 3, C seems to get into
> > > > epoch=3, partitions=[foo-1] too early.
> > > > 38.3 After A transitions to epoch 3, C still has A - epoch=2,
> > > > partitions=[foo-0].
> > > >
> > > > 39. Rolling upgrade of consumers: Do we support the upgrade from any
> > > > old version to new one?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Mon, Aug 29, 2022 at 9:20 AM David Jacot
> > <dja...@confluent.io.invalid>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > The KIP states that we will re-implement the coordinator in Java. I
> > > > > discussed this offline with a few folks and they are concerned that
> > > > > we could introduce many regressions in the old protocol if we do so.
> > > > > Therefore, I am going to remove this statement from the KIP. It is an
> > > > > implementation detail after all so it does not have to be decided at
> > > > > this stage. We will likely start by trying to refactor the current
> > > > > implementation as a first step.
> > > > >
> > > > > Cheers,
> > > > > David
> > > > >
> > > > > On Mon, Aug 29, 2022 at 3:52 PM David Jacot <dja...@confluent.io>
> > wrote:
> > > > > >
> > > > > > Hi Luke,
> > > > > >
> > > > > > > 1.1. I think the state machine are: "Empty, assigning,
> > > > > > > reconciling, stable, dead" mentioned in Consumer Group States
> > > > > > > section, right?
> > > > > >
> > > > > > This sentence does not refer to those group states but rather to a
> > > > > > state machine replication (SMR). This refers to the entire state of
> > > > > > the group coordinator, which is replicated via the log layer. I will
> > > > > > clarify this in the KIP.
> > > > > >
> > > > > > > 1.2. What do you mean "each state machine is modelled as an event
> > > > > > > loop"?
> > > > > >
> > > > > > The idea is to follow a model similar to the new quorum controller.
> > > > > > We will have N threads to process events. Each __consumer_offsets
> > > > > > partition is assigned to a unique thread and all the events (e.g.
> > > > > > requests, callbacks, etc.) are processed by this thread. This
> > > > > > simplifies concurrency and will enable us to do simulation testing
> > > > > > for the group coordinator.
> > > > > >
> > > > > > > 1.3. Why do we need a state machine per *__consumer_offsets*
> > > > > > > partitions? Not a state machine "per consumer group" owned by a
> > > > > > > group coordinator? For example, if one group coordinator owns 2
> > > > > > > consumer groups, and both exist in *__consumer_offsets-0*, will
> > > > > > > we have 1 state machine for it, or 2?
> > > > > >
> > > > > > See 1.1. The confusion comes from there, I think.
> > > > > >
> > > > > > > 1.4. I know the "*group.coordinator.threads*" is the number of
> > > > > > > threads used to run the state machines. But I'm wondering if the
> > > > > > > purpose of the threads is only to keep the state of each consumer
> > > > > > > group (or *__consumer_offsets* partitions?), and no heavy
> > > > > > > computation, why should we need multi-threads here?
> > > > > >
> > > > > > See 1.2. The idea is to have the ability to shard the processing,
> > > > > > as the computation could be heavy.
> > > > > >
> > > > > > > 2.1. The consumer session timeout, why does the default session
> > > > > > > timeout not fall between min (45s) and max (60s)? I thought the
> > > > > > > min/max session timeout is to define lower/upper bound of it, no?
> > > > > > >
> > > > > > > group.consumer.session.timeout.ms int 30s The timeout to detect
> > > > > > > client failures when using the consumer group protocol.
> > > > > > > group.consumer.min.session.timeout.ms int 45s The minimum session
> > > > > > > timeout.
> > > > > > > group.consumer.max.session.timeout.ms int 60s The maximum session
> > > > > > > timeout.
> > > > > >
> > > > > > This is indeed a mistake. The default session timeout should be 45s
> > > > > > (the current default).
> > > > > >
> > > > > > > 2.2. The default server side assignor are [range, uniform], which
> > > > > > > means we'll default to "range" assignor. I'd like to know why not
> > > > > > > uniform one? I thought usually users will choose uniform assignor
> > > > > > > (former sticky assignor) for better evenly distribution. Any
> > > > > > > other reason we choose range assignor as default?
> > > > > > > group.consumer.assignors List range, uniform The server side
> > > > > > > assignors.
> > > > > >
> > > > > > The order on the server side has no influence because the client
> > > > > > must choose the assignor that it wants to use. There is no default
> > > > > > in the current proposal. If the assignor is not specified by the
> > > > > > client, the request is rejected. The default client value for
> > > > > > `group.remote.assignor` is `uniform` though.
> > > > > >
> > > > > > Thanks for your very good comments, Luke. I hope that my answers
> > > > > > help to clarify things. I will update the KIP as well based on your
> > > > > > feedback.
> > > > > >
> > > > > > Cheers,
> > > > > > David
> > > > > >
> > > > > > On Mon, Aug 22, 2022 at 9:29 AM Luke Chen <show...@gmail.com>
> > wrote:
> > > > > > >
> > > > > > > Hi David,
> > > > > > >
> > > > > > > Thanks for the update.
> > > > > > >
> > > > > > > Some more questions:
> > > > > > > 1. In Group Coordinator section, you mentioned:
> > > > > > > > The new group coordinator will have a state machine per
> > > > > > > *__consumer_offsets* partitions, where each state machine is
> > modelled
> > > > > as an
> > > > > > > event loop. Those state machines will be executed in
> > > > > > > *group.coordinator.threads* threads.
> > > > > > >
> > > > > > > 1.1. I think the state machine are: "Empty, assigning,
> > reconciling,
> > > > > stable,
> > > > > > > dead" mentioned in Consumer Group States section, right?
> > > > > > > 1.2. What do you mean "each state machine is modelled as an event
> > > > > loop"?
> > > > > > > 1.3. Why do we need a state machine per *__consumer_offsets*
> > > > > partitions?
> > > > > > > Not a state machine "per consumer group" owned by a group
> > coordinator?
> > > > > For
> > > > > > > example, if one group coordinator owns 2 consumer groups, and
> > both
> > > > > exist in
> > > > > > > *__consumer_offsets-0*, will we have 1 state machine for it, or
> > 2?
> > > > > > > 1.4. I know the "*group.coordinator.threads" *is the number of
> > threads
> > > > > used
> > > > > > > to run the state machines. But I'm wondering if the purpose of
> > the
> > > > > threads
> > > > > > > is only to keep the state of each consumer group (or
> > > > > *__consumer_offsets*
> > > > > > > partitions?), and no heavy computation, why should we need
> > > > > multi-threads
> > > > > > > here?
> > > > > > >
> > > > > > > 2. For the default value in the new configs:
> > > > > > > 2.1. The consumer session timeout, why does the default session
> > > > > timeout not
> > > > > > > locate between min (45s) and max(60s)? I thought the min/max
> > session
> > > > > > > timeout is to define lower/upper bound of it, no?
> > > > > > >
> > > > > > > group.consumer.session.timeout.ms int 30s The timeout to detect
> > client
> > > > > > > failures when using the consumer group protocol.
> > > > > > > group.consumer.min.session.timeout.ms int 45s The minimum
> > session
> > > > > timeout.
> > > > > > > group.consumer.max.session.timeout.ms int 60s The maximum
> > session
> > > > > timeout.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > 2.2. The default server side assignor are [range, uniform],
> > which means
> > > > > > > we'll default to "range" assignor. I'd like to know why not
> > uniform
> > > > > one? I
> > > > > > > thought usually users will choose uniform assignor (former sticky
> > > > > assinor)
> > > > > > > for better evenly distribution. Any other reason we choose range
> > > > > assignor
> > > > > > > as default?
> > > > > > > group.consumer.assignors List range, uniform The server side
> > assignors.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Thank you.
> > > > > > > Luke
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Aug 22, 2022 at 2:10 PM Luke Chen <show...@gmail.com>
> > wrote:
> > > > > > >
> > > > > > > > Hi Sagar,
> > > > > > > >
> > > > > > > > I have some thoughts about Kafka Connect integrating with
> > KIP-848,
> > > > > but I
> > > > > > > > think we should have a separate discussion thread for the Kafka
> > > > > Connect
> > > > > > > > KIP: Integrating Kafka Connect With New Consumer Rebalance
> > Protocol
> > > > > [1],
> > > > > > > > and let this discussion thread focus on consumer rebalance
> > protocol,
> > > > > WDYT?
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/%5BDRAFT%5DIntegrating+Kafka+Connect+With+New+Consumer+Rebalance+Protocol
> > > > > > > >
> > > > > > > > Thank you.
> > > > > > > > Luke
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Aug 12, 2022 at 9:31 PM Sagar <
> > sagarmeansoc...@gmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > >> Thank you Guozhang/David for the feedback. Looks like there's
> > > > > agreement on
> > > > > > > >> using separate APIs for Connect. I would revisit the doc and
> > see
> > > > > what
> > > > > > > >> changes are to be made.
> > > > > > > >>
> > > > > > > >> Thanks!
> > > > > > > >> Sagar.
> > > > > > > >>
> > > > > > > >> On Tue, Aug 9, 2022 at 7:11 PM David Jacot
> > > > > <dja...@confluent.io.invalid>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > Hi Sagar,
> > > > > > > >> >
> > > > > > > >> > Thanks for the feedback and the document. That's really
> > helpful. I
> > > > > > > >> > will take a look at it.
> > > > > > > >> >
> > > > > > > >> > Overall, it seems to me that both Connect and the Consumer
> > could
> > > > > share
> > > > > > > >> > the same underlying "engine". The main difference is that
> > the
> > > > > Consumer
> > > > > > > >> > assigns topic-partitions to members whereas Connect assigns
> > tasks
> > > > > to
> > > > > > > >> > workers. I see two ways to move forward:
> > > > > > > >> > 1) We extend the new proposed APIs to support different
> > resource
> > > > > types
> > > > > > > >> > (e.g. partitions, tasks, etc.); or
> > > > > > > >> > 2) We use new dedicated APIs for Connect. The dedicated APIs
> > > > > would be
> > > > > > > >> > similar to the new ones but different on the
> > content/resources and
> > > > > > > >> > they would rely on the same engine on the coordinator side.
> > > > > > > >> >
> > > > > > > >> > I personally lean towards 2) because I am not a fan of
> > > > > overcharging
> > > > > > > >> > APIs to serve different purposes. That being said, I am not
> > > > > opposed to
> > > > > > > >> > 1) if we can find an elegant way to do it.
> > > > > > > >> >
> > > > > > > >> > I think that we can continue to discuss it here for now in
> > order
> > > > > to
> > > > > > > >> > ensure that this KIP is compatible with what we will do for
> > > > > Connect in
> > > > > > > >> > the future.
> > > > > > > >> >
> > > > > > > >> > Best,
> > > > > > > >> > David
> > > > > > > >> >
> > > > > > > >> > On Mon, Aug 8, 2022 at 2:41 PM David Jacot <
> > dja...@confluent.io>
> > > > > wrote:
> > > > > > > >> > >
> > > > > > > >> > > Hi all,
> > > > > > > >> > >
> > > > > > > >> > > I am back from vacation. I will go through and address
> > your
> > > > > comments
> > > > > > > >> > > in the coming days. Thanks for your feedback.
> > > > > > > >> > >
> > > > > > > >> > > Cheers,
> > > > > > > >> > > David
> > > > > > > >> > >
> > > > > > > >> > > On Wed, Aug 3, 2022 at 10:05 PM Gregory Harris <
> > > > > gharris1...@gmail.com
> > > > > > > >> >
> > > > > > > >> > wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > Hey All!
> > > > > > > >> > > >
> > > > > > > >> > > > Thanks for the KIP, it's wonderful to see cooperative
> > > > > rebalancing
> > > > > > > >> > making it
> > > > > > > >> > > > down the stack!
> > > > > > > >> > > >
> > > > > > > >> > > > I had a few questions:
> > > > > > > >> > > >
> > > > > > > >> > > > 1. The 'Rejected Alternatives' section describes how
> > member
> > > > > epoch
> > > > > > > >> > should
> > > > > > > >> > > > advance in step with the group epoch and assignment
> > epoch
> > > > > values. I
> > > > > > > >> > think
> > > > > > > >> > > > that this is a good idea for the reasons described in
> > the
> > > > > KIP. When
> > > > > > > >> the
> > > > > > > >> > > > protocol is incrementally assigning partitions to a
> > worker,
> > > > > what
> > > > > > > >> member
> > > > > > > >> > > > epoch does each incremental assignment use? Are member
> > epochs
> > > > > > > >> re-used,
> > > > > > > >> > and
> > > > > > > >> > > > a single member epoch can correspond to multiple
> > different
> > > > > > > >> > (monotonically
> > > > > > > >> > > > larger) assignments?
> > > > > > > >> > > >
> > > > > > > >> > > > 2. Is the Assignor's 'Reason' field opaque to the group
> > > > > > > >> coordinator? If
> > > > > > > >> > > > not, should custom client-side assignor implementations
> > > > > interact
> > > > > > > >> with
> > > > > > > >> > the
> > > > > > > >> > > > Reason field, and how is its common meaning agreed
> > upon? If
> > > > > so, what
> > > > > > > >> > is the
> > > > > > > >> > > > benefit of a distinct Reason field over including such
> > > > > functionality
> > > > > > > >> > in the
> > > > > > > >> > > > opaque metadata?
> > > > > > > >> > > >
> > > > > > > >> > > > 3. The following is included in the KIP: "Thanks to
> > this, the
> > > > > input
> > > > > > > >> of
> > > > > > > >> > the
> > > > > > > >> > > > client side assignor is entirely driven by the group
> > > > > coordinator.
> > > > > > > >> The
> > > > > > > >> > > > consumer is no longer responsible for maintaining any
> > state
> > > > > besides
> > > > > > > >> its
> > > > > > > >> > > > assigned partitions." Does this mean that the
> > client-side
> > > > > assignor
> > > > > > > >> MAY
> > > > > > > >> > > > incorporate additional non-Metadata state (such as
> > partition
> > > > > > > >> > throughput,
> > > > > > > >> > > > cpu/memory metrics, config topics, etc), or that
> > additional
> > > > > > > >> > non-Metadata
> > > > > > > >> > > > state SHOULD NOT be used?
> > > > > > > >> > > >
> > > > > > > >> > > > 4. I see that there are separate classes
> > > > > > > >> > > > for
> > org.apache.kafka.server.group.consumer.PartitionAssignor
> > > > > > > >> > > > and org.apache.kafka.clients.consumer.PartitionAssignor
> > that
> > > > > seem to
> > > > > > > >> > > > overlap significantly. Is it possible for these two
> > > > > implementations
> > > > > > > >> to
> > > > > > > >> > be
> > > > > > > >> > > > unified? This would serve to promote feature parity of
> > > > > server-side
> > > > > > > >> and
> > > > > > > >> > > > client-side assignors, and would also facilitate
> > operational
> > > > > > > >> > flexibility in
> > > > > > > >> > > > certain situations. For example, if a server-side
> > assignor
> > > > > has some
> > > > > > > >> > poor
> > > > > > > >> > > > behavior and needs a patch, deploying the patched
> > assignor to
> > > > > the
> > > > > > > >> > client
> > > > > > > >> > > > and switching one consumer group to a client-side
> > assignor
> > > > > may be
> > > > > > > >> > faster
> > > > > > > >> > > > and less risky than patching all of the brokers. With
> > the
> > > > > currently
> > > > > > > >> > > > proposed distinct APIs, a non-trivial reimplementation
> > would
> > > > > have
> > > > > > > >> to be
> > > > > > > >> > > > assembled, and if the two APIs have diverged
> > significantly,
> > > > > then it
> > > > > > > >> is
> > > > > > > >> > > > possible that a reimplementation would not be possible.
> > > > > > > >> > > >
> > > > > > > >> > > > --
> > > > > > > >> > > > Greg Harris
> > > > > > > >> > > > gharris1...@gmail.com
> > > > > > > >> > > > github.com/gharris1727
> > > > > > > >> > > >
> > > > > > > >> > > > On Wed, Aug 3, 2022 at 8:39 AM Sagar <
> > > > > sagarmeansoc...@gmail.com>
> > > > > > > >> > wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > Hi Guozhang/David,
> > > > > > > >> > > > >
> > > > > > > >> > > > > I created a confluence page to discuss how Connect
> > would
> > > > > need to
> > > > > > > >> > change
> > > > > > > >> > > > > based on the new rebalance protocol. Here's the page:
> > > > > > > >> > > > >
> > > > > > > >> > > > >
> > > > > > > >> >
> > > > > > > >>
> > > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/%5BDRAFT%5DIntegrating+Kafka+Connect+With+New+Consumer+Rebalance+Protocol
> > > > > > > >> > > > >
> > > > > > > >> > > > > It's also pretty longish and I have tried to keep a
> > format
> > > > > > > >> similar to
> > > > > > > >> > > > > KIP-848. Let me know what you think. Also, do you
> > think this
> > > > > > > >> should
> > > > > > > >> > be
> > > > > > > >> > > > > moved to a separate discussion thread or is this one
> > fine?
> > > > > > > >> > > > >
> > > > > > > >> > > > > Thanks!
> > > > > > > >> > > > > Sagar.
> > > > > > > >> > > > >
> > > > > > > >> > > > > On Tue, Jul 26, 2022 at 7:37 AM Sagar <
> > > > > sagarmeansoc...@gmail.com>
> > > > > > > >> > wrote:
> > > > > > > >> > > > >
> > > > > > > >> > > > > > Hello Guozhang,
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Thank you so much for the doc on Kafka Streams.
> > Sure, I
> > > > > would do
> > > > > > > >> > the
> > > > > > > >> > > > > > analysis and come up with such a document.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Thanks!
> > > > > > > >> > > > > > Sagar.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > On Tue, Jul 26, 2022 at 4:47 AM Guozhang Wang <
> > > > > > > >> wangg...@gmail.com>
> > > > > > > >> > > > > wrote:
> > > > > > > >> > > > > >
> > > > > > > >> > > > > >> Hello Sagar,
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> It would be great if you could come back with some
> > > > > analysis on
> > > > > > > >> > how to
> > > > > > > >> > > > > >> implement the Connect side integration with the new
> > > > > protocol;
> > > > > > > >> so
> > > > > > > >> > far
> > > > > > > >> > > > > >> besides leveraging on the new "protocol type" we
> > did not
> > > > > yet
> > > > > > > >> think
> > > > > > > >> > > > > through
> > > > > > > >> > > > > >> the Connect side implementations. For Streams
> > here's a
> > > > > draft of
> > > > > > > >> > > > > >> integration
> > > > > > > >> > > > > >> plan:
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >>
> > > > > > > >> > > > >
> > > > > > > >> >
> > > > > > > >>
> > > > >
> > https://docs.google.com/document/d/17PNz2sGoIvGyIzz8vLyJTJTU2rqnD_D9uHJnH9XARjU/edit#heading=h.pdgirmi57dvn
> > > > > > > >> > > > > >> just FYI for your analysis on Connect.
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> On Tue, Jul 19, 2022 at 10:48 PM Sagar <
> > > > > > > >> sagarmeansoc...@gmail.com
> > > > > > > >> > >
> > > > > > > >> > > > > wrote:
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> > Hi David,
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > Thank you for your response. The reason I thought
> > > > > connect can
> > > > > > > >> > also fit
> > > > > > > >> > > > > >> into
> > > > > > > >> > > > > >> > this new scheme is that even today the connect
> > uses a
> > > > > > > >> > > > > WorkerCoordinator
> > > > > > > >> > > > > >> > extending from AbstractCoordinator to empower
> > > > > rebalances of
> > > > > > > >> > > > > >> > tasks/connectors. The WorkerCoordinator sets the
> > > > > > > >> protocolType()
> > > > > > > >> > to
> > > > > > > >> > > > > >> connect
> > > > > > > >> > > > > >> > and uses the metadata() method by plumbing into
> > > > > > > >> > > > > >> JoinGroupRequestProtocol.
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > I think the changes to support connect would be
> > > > > similar at a
> > > > > > > >> > high
> > > > > > > >> > > > > level
> > > > > > > >> > > > > >> to
> > > > > > > >> > > > > >> > the changes in streams mainly because of the
> > Client
> > > > > side
> > > > > > > >> > assignors
> > > > > > > >> > > > > being
> > > > > > > >> > > > > >> > used in both. At an implementation level, we
> > might
> > > > > need to
> > > > > > > >> make
> > > > > > > >> > a lot
> > > > > > > >> > > > > of
> > > > > > > >> > > > > >> > changes to get onto this new assignment protocol
> > like
> > > > > > > >> enhancing
> > > > > > > >> > the
> > > > > > > >> > > > > >> > JoinGroup request/response and SyncGroup and
> > using
> > > > > > > >> > > > > >> ConsumerGroupHeartbeat
> > > > > > > >> > > > > >> > API etc again on similar lines to streams (or
> > there
> > > > > might be
> > > > > > > >> > > > > >> deviations). I
> > > > > > > >> > > > > >> > would try to perform a detailed analysis of the
> > same
> > > > > and we
> > > > > > > >> > can have
> > > > > > > >> > > > > a
> > > > > > > >> > > > > >> > separate discussion thread for that as that would
> > > > > derail this
> > > > > > > >> > > > > discussion
> > > > > > > >> > > > > >> > thread. Let me know if that sounds good to you.
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > Thanks!
> > > > > > > >> > > > > >> > Sagar.
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > On Fri, Jul 15, 2022 at 5:47 PM David Jacot
> > > > > > > >> > > > > <dja...@confluent.io.invalid
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > wrote:
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > > Hi Sagar,
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > Thanks for your comments.
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > 1) Yes. That refers to `Assignment#error`.
> > Sure, I
> > > > > can
> > > > > > > >> > mention it.
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > 2) The idea is to transition C from his current
> > > > > assignment
> > > > > > > >> to
> > > > > > > >> > his
> > > > > > > >> > > > > >> > > target assignment when he can move to epoch 3.
> > When
> > > > > that
> > > > > > > >> > happens,
> > > > > > > >> > > > > the
> > > > > > > >> > > > > >> > > member assignment is updated and persisted
> > with all
> > > > > its
> > > > > > > >> > assigned
> > > > > > > >> > > > > >> > > partitions even if they are not all revoked
> > yet. In
> > > > > other
> > > > > > > >> > words, the
> > > > > > > >> > > > > >> > > member assignment becomes the target
> > assignment.
> > > > > This is
> > > > > > > >> > basically
> > > > > > > >> > > > > an
> > > > > > > >> > > > > >> > > optimization to avoid having to write all the
> > > > > changes to
> > > > > > > >> the
> > > > > > > >> > log.
> > > > > > > >> > > > > The
> > > > > > > >> > > > > >> > > examples are based on the persisted state so I
> > > > > understand
> > > > > > > >> the
> > > > > > > >> > > > > >> > > confusion. Let me see if I can improve this in
> > the
> > > > > > > >> > description.
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > 3) Regarding Connect, it could reuse the
> > protocol
> > > > > with a
> > > > > > > >> > client side
> > > > > > > >> > > > > >> > > assignor if it fits in the protocol. The
> > assignment
> > > > > is
> > > > > > > >> about
> > > > > > > >> > > > > >> > > topicid-partitions + metadata, could Connect
> > fit
> > > > > into this?
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > Best,
> > > > > > > >> > > > > >> > > David
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > On Fri, Jul 15, 2022 at 1:55 PM Sagar <
> > > > > > > >> > sagarmeansoc...@gmail.com>
> > > > > > > >> > > > > >> wrote:
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Hi David,
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Thanks for the KIP. I just had minor
> > observations:
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > 1) In the Assignment Error section in Client
> > Side
> > > > > mode
> > > > > > > >> > Assignment
> > > > > > > >> > > > > >> > > process,
> > > > > > > >> > > > > >> > > > you mentioned => `In this case, the client
> > side
> > > > > assignor
> > > > > > > >> can
> > > > > > > >> > > > > return
> > > > > > > >> > > > > >> an
> > > > > > > >> > > > > >> > > > error to the group coordinator`. In this
> > case are
> > > > > you
> > > > > > > >> > referring to
> > > > > > > >> > > > > >> the
> > > > > > > >> > > > > >> > > > Assignor returning an AssignmentError that's
> > > > > listed down
> > > > > > > >> > towards
> > > > > > > >> > > > > the
> > > > > > > >> > > > > >> > end?
> > > > > > > >> > > > > >> > > > If yes, do you think it would make sense to
> > > > > mention this
> > > > > > > >> > > > > explicitly
> > > > > > > >> > > > > >> > here?
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > 2) In the Case Studies section, I have a
> > slight
> > > > > > > >> confusion,
> > > > > > > >> > not
> > > > > > > >> > > > > sure
> > > > > > > >> > > > > >> if
> > > > > > > >> > > > > >> > > > others have the same. Consider this step:
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > When B heartbeats, the group coordinator
> > > > > transitions him
> > > > > > > >> to
> > > > > > > >> > epoch
> > > > > > > >> > > > > 3
> > > > > > > >> > > > > >> > > because
> > > > > > > >> > > > > >> > > > B has no partitions to revoke. It persists
> > the
> > > > > change and
> > > > > > > >> > replies.
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > >    - Group (epoch=3)
> > > > > > > >> > > > > >> > > >       - A
> > > > > > > >> > > > > >> > > >       - B
> > > > > > > >> > > > > >> > > >       - C
> > > > > > > >> > > > > >> > > >    - Target Assignment (epoch=3)
> > > > > > > >> > > > > >> > > >       - A - partitions=[foo-0]
> > > > > > > >> > > > > >> > > >       - B - partitions=[foo-2]
> > > > > > > >> > > > > >> > > >       - C - partitions=[foo-1]
> > > > > > > >> > > > > >> > > >    - Member Assignment
> > > > > > > >> > > > > >> > > >       - A - epoch=2, partitions=[foo-0,
> > foo-1]
> > > > > > > >> > > > > >> > > >       - B - epoch=3, partitions=[foo-2]
> > > > > > > >> > > > > >> > > >       - C - epoch=3, partitions=[foo-1]
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > When C heartbeats, it transitions to epoch 3
> > but
> > > > > cannot
> > > > > > > >> get
> > > > > > > >> > foo-1
> > > > > > > >> > > > > >> yet.
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Here, it's mentioned that member C can't get
> > the
> > > > > foo-1
> > > > > > > >> > partition
> > > > > > > >> > > > > yet,
> > > > > > > >> > > > > >> > but
> > > > > > > >> > > > > >> > > > based on the description above, it seems it
> > > > > already has
> > > > > > > >> it.
> > > > > > > >> > Do you
> > > > > > > >> > > > > >> > think
> > > > > > > >> > > > > >> > > it
> > > > > > > >> > > > > >> > > > would be better to remove it and populate it
> > only
> > > > > when it
> > > > > > > >> > actually
> > > > > > > >> > > > > >> gets
> > > > > > > >> > > > > >> > > it?
> > > > > > > >> > > > > >> > > > I see this in a lot of other places, so have
> > I
> > > > > > > >> understood it
> > > > > > > >> > > > > >> > incorrectly
> > > > > > > >> > > > > >> > > ?
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Regarding Connect, it might be out of scope
> > of
> > > > > this
> > > > > > > >> > discussion,
> > > > > > > >> > > > > but
> > > > > > > >> > > > > >> > from
> > > > > > > >> > > > > >> > > > what I understood it would probably be
> > running in
> > > > > client
> > > > > > > >> > side
> > > > > > > >> > > > > >> assignor
> > > > > > > >> > > > > >> > > mode
> > > > > > > >> > > > > >> > > > even on the new rebalance protocol as it has
> > its
> > > > > own
> > > > > > > >> Custom
> > > > > > > >> > > > > >> > > Assignors (Eager
> > > > > > > >> > > > > >> > > > and IncrementalCooperative).
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Thanks!
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Sagar.
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > On Fri, Jul 15, 2022 at 5:00 PM David Jacot
> > > > > > > >> > > > > >> > <dja...@confluent.io.invalid
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > wrote:
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > > Thanks Hector! Our goal is to move forward
> > with
> > > > > > > >> > specialized APIs
> > > > > > > >> > > > > >> > > > > instead of relying on one generic API. For
> > > > > Connect, we
> > > > > > > >> > can apply
> > > > > > > >> > > > > >> the
> > > > > > > >> > > > > >> > > > > exact same pattern and reuse/share the core
> > > > > > > >> > implementation on
> > > > > > > >> > > > > the
> > > > > > > >> > > > > >> > > > > server side. For the schema registry, I
> > think
> > > > > that we
> > > > > > > >> > should
> > > > > > > >> > > > > >> consider
> > > > > > > >> > > > > >> > > > > having a tailored API to do simple
> > > > > membership/leader
> > > > > > > >> > election.
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > Best,
> > > > > > > >> > > > > >> > > > > David
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > On Fri, Jul 15, 2022 at 10:22 AM Ismael
> > Juma <
> > > > > > > >> > ism...@juma.me.uk
> > > > > > > >> > > > > >
> > > > > > > >> > > > > >> > > wrote:
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > Three quick comments:
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > 1. Regarding java.util.regex.Pattern vs
> > > > > > > >> > > > > >> com.google.re2j.Pattern, we
> > > > > > > >> > > > > >> > > > > should
> > > > > > > >> > > > > >> > > > > > document the differences in more detail
> > before
> > > > > > > >> deciding
> > > > > > > >> > one
> > > > > > > >> > > > > way
> > > > > > > >> > > > > >> or
> > > > > > > >> > > > > >> > > > > another.
> > > > > > > >> > > > > >> > > > > > That said, if people pass
> > > > > java.util.regex.Pattern,
> > > > > > > >> they
> > > > > > > >> > expect
> > > > > > > >> > > > > >> > their
> > > > > > > >> > > > > >> > > > > > semantics to be honored. If we are doing
> > > > > something
> > > > > > > >> > different,
> > > > > > > >> > > > > >> then
> > > > > > > >> > > > > >> > we
> > > > > > > >> > > > > >> > > > > > should consider adding an overload with
> > our own
> > > > > > > >> Pattern
> > > > > > > >> > class
> > > > > > > >> > > > > (I
> > > > > > > >> > > > > >> > > don't
> > > > > > > >> > > > > >> > > > > > think we'd want to expose re2j's at this
> > > > > point).
> > > > > > > >> > > > > >> > > > > > 2. Regarding topic ids, any major new
> > protocol
> > > > > should
> > > > > > > >> > > > > integrate
> > > > > > > >> > > > > >> > fully
> > > > > > > >> > > > > >> > > > > with
> > > > > > > >> > > > > >> > > > > > it and should handle the topic
> > recreation case
> > > > > > > >> > correctly.
> > > > > > > >> > > > > That's
> > > > > > > >> > > > > >> > the
> > > > > > > >> > > > > >> > > main
> > > > > > > >> > > > > >> > > > > > part we need to handle. I agree with
> > David
> > > > > that we'd
> > > > > > > >> > want to
> > > > > > > >> > > > > add
> > > > > > > >> > > > > >> > > topic
> > > > > > > >> > > > > >> > > > > ids
> > > > > > > >> > > > > >> > > > > > to the relevant protocols that don't
> > have it
> > > > > yet and
> > > > > > > >> > that we
> > > > > > > >> > > > > can
> > > > > > > >> > > > > >> > > probably
> > > > > > > >> > > > > >> > > > > > focus on the internals versus adding new
> > APIs
> > > > > to the
> > > > > > > >> > Java
> > > > > > > >> > > > > >> Consumer
> > > > > > > >> > > > > >> > > > > (unless
> > > > > > > >> > > > > >> > > > > > we find that adding new APIs is required
> > for
> > > > > > > >> reasonable
> > > > > > > >> > > > > >> semantics).
> > > > > > > >> > > > > >> > > > > > 3. I am still not sure about the
> > coordinator
> > > > > storing
> > > > > > > >> the
> > > > > > > >> > > > > >> configs.
> > > > > > > >> > > > > >> > > It's
> > > > > > > >> > > > > >> > > > > > powerful for configs to be centralized
> > in the
> > > > > > > >> metadata
> > > > > > > >> > log for
> > > > > > > >> > > > > >> > > various
> > > > > > > >> > > > > >> > > > > > reasons (auditability, visibility,
> > consistency,
> > > > > > > >> etc.).
> > > > > > > >> > > > > >> Similarly, I
> > > > > > > >> > > > > >> > > am
> > > > > > > >> > > > > >> > > > > not
> > > > > > > >> > > > > >> > > > > > sure about automatically deleting
> > configs in a
> > > > > way
> > > > > > > >> that
> > > > > > > >> > they
> > > > > > > >> > > > > >> cannot
> > > > > > > >> > > > > >> > > be
> > > > > > > >> > > > > >> > > > > > recovered. A good property for modern
> > systems
> > > > > is to
> > > > > > > >> > minimize
> > > > > > > >> > > > > the
> > > > > > > >> > > > > >> > > number
> > > > > > > >> > > > > >> > > > > of
> > > > > > > >> > > > > >> > > > > > unrecoverable data loss scenarios.
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > Ismael
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > On Wed, Jul 13, 2022 at 3:47 PM David
> > Jacot
> > > > > > > >> > > > > >> > > <dja...@confluent.io.invalid
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > wrote:
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > > Thanks Guozhang. My answers are below:
> > > > > > > >> > > > > >> > > > > > >
> > > > > > > >> > > > > >> > > > > > > > 1) the migration path, especially
> > the last
> > > > > step
> > > > > > > >> when
> > > > > > > >> > > > > clients
> > > > > > > >> > > > > >> > > flip the
> > > > > > > >> > > > > >> > > > > > > flag
> > > > > > > >> > > > > >> > > > > > > > to enable the new protocol, in which
> > we
> > > > > would
> > > > > > > >> have a
> > > > > > > >> > > > > window
> > > > > > > >> > > > > >> > where
> > > > > > > >> > > > > >> > > > > both
> > > > > > > >> > > > > >> > > > > > > new
> > > > > > > >> > > > > >> > > > > > > > protocols / rpcs and old protocols /
> > rpcs
> > > > > are
> > > > > > > >> used
> > > > > > > >> > by
> > > > > > > >> > > > > >> members
> > > > > > > >> > > > > >> > of
> > > > > > > >> > > > > >> > > the
> > > > > > > >> > > > > >> > > > > same
> > > > > > > >> > > > > >> > > > > > > > group. How the coordinator could
> > "mimic"
> > > > > the old
> > > > > > > >> > behavior
> > > > > > > >> > > > > >> while
> > > > > > > >> > > > > >> > > > > using the
> > > > > > > >> > > > > >> > > > > > > > new protocol is something we need to
> > > > > present
> > > > > > > >> about.
> > > > > > > >> > > > > >> > > > > > >
> > > > > > > >> > > > > >> > > > > > > Noted. I just published a new version
> > of the KIP
> > > > > which
> > > > > > > >> > includes
> > > > > > > >> > > > > >> more
> > > > > > > >> > > > > >> > > > > > > details about this. See the "Supporting
> > > > > Online
> > > > > > > >> > Consumer
> > > > > > > >> > > > > Group
> > > > > > > >> > > > > >> > > Upgrade"
> > > > > > > >> > > > > >> > > > > > > and the "Compatibility, Deprecation,
> > and
> > > > > Migration
> > > > > > > >> > Plan". I
> > > > > > > >> > > > > >> think
> > > > > > > >> > > > > >> > > that
> > > > > > > >> > > > > >> > > > > > > I have to think through a few cases
> > now but
> > > > > the
> > > > > > > >> > overall idea
> > > > > > > >> > > > > >> and
> > > > > > > >> > > > > >> > > > > > > mechanism should be understandable.
> > > > > > > >> > > > > >> > > > > > >
> > > > > > > >> > > > > >> > > > > > > > 2) the usage of topic ids. As of
> > > > > KIP-516 the
> > > > > > > >> > topic ids
> > > > > > > >> > > > > >> are
> > > > > > > >> > > > > >> > > only
> > > > > > > >> > > > > >> > > > > used
> > > > > > > >> > > > > >> > > > > > > as
> > > > > > > >> > > > > >> > > > > > > > part of RPCs and admin client, but
> > they
> > > > > are not
> > > > > > > >> > exposed
> > > > > > > >> > > > > via
> > > > > > > >> > > > > >> any
> > > > > > > >> > > > > >> > > > > public
> > > > > > > >> > > > > >> > > > > > > APIs
> > > > > > > >> > > > > >> > > > > > > > to consumers yet. I think the
> > question is,
> > > > > first
> > > > > > > >> > should we
> > > > > > > >> > > > > >> let
> > > > > > > >> > > > > >> > > the
> > > > > > > >> > > > > >> > > > > > > consumer
> > > > > > > >> > > > > >> > > > > > > > client to be maintaining the names
> > -> ids
> > > > > mapping
> > > > > > > >> > itself
> > > > > > > >> > > > > to
> > > > > > > >> > > > > >> > fully
> > > > > > > >> > > > > >> > > > > > > leverage
> > > > > > > >> > > > > >> > > > > > > > on all the augmented existing RPCs
> > and the
> > > > > new
> > > > > > > >> RPCs
> > > > > > > >> > with
> > > > > > > >> > > > > the
> > > > > > > >> > > > > >> > > topic
> > > > > > > >> > > > > >> > > > > ids;
> > > > > > > >> > > > > >> > > > > > > and
> > > > > > > >> > > > > >> > > > > > > > secondly, should we ever consider
> > exposing
> > > > > the
> > > > > > > >> > topic ids
> > > > > > > >> > > > > in
> > > > > > > >> > > > > >> the
> > > > > > > >> > > > > >> > > > > consumer
> > > > > > > >> > > > > >> > > > > > > > public APIs as well (both
> > > > > subscribe/assign, as
> > > > > > > >> well
> > > > > > > >> > as in
> > > > > > > >> > > > > >> the
> > > > > > > >> > > > > >> > > > > rebalance
> > > > > > > >> > > > > >> > > > > > > > listener for cases like topic
> > > > > > > >> > deletion-and-recreation).
> > > > > > > >> > > > > >> > > > > > >
> > > > > > > >> > > > > >> > > > > > > a) Assuming that we would include
> > converting
> > > > > all
> > > > > > > >> the
> > > > > > > >> > offsets
> > > > > > > >> > > > > >> > > related
> > > > > > > >> > > > > >> > > > > > > RPCs to using topic ids in this KIP,
> > the
> > > > > consumer
> > > > > > > >> > would be
> > > > > > > >> > > > > >> able
> > > > > > > >> > > > > >> > to
> > > > > > > >> > > > > >> > > > > > > fully operate with topic ids. That
> > being
> > > > > said, it
> > > > > > > >> > still has
> > > > > > > >> > > > > to
> > > > > > > >> > > > > >> > > provide
> > > > > > > >> > > > > >> > > > > > > the topics names in various APIs so
> > having a
> > > > > > > >> mapping
> > > > > > > >> > in the
> > > > > > > >> > > > > >> > > consumer
> > > > > > > >> > > > > >> > > > > > > seems inevitable to me.
> > > > > > > >> > > > > >> > > > > > > b) I don't have a strong opinion on
> > this.
> > > > > Here I
> > > > > > > >> > wonder if
> > > > > > > >> > > > > >> this
> > > > > > > >> > > > > >> > > goes
> > > > > > > >> > > > > >> > > > > > > beyond the scope of this KIP. I would
> > rather
> > > > > focus
> > > > > > > >> on
> > > > > > > >> > the
> > > > > > > >> > > > > >> > internals
> > > > > > > >> > > > > >> > > > > > > here and we can consider this
> > separately if
> > > > > we see
> > > > > > > >> > value in
> > > > > > > >> > > > > >> doing
> > > > > > > >> > > > > >> > > it.
> > > > > > > >> > > > > >> > > > > > >
> > > > > > > >> > > > > >> > > > > > > Coming back to Ismael's point about
> > using
> > > > > topic ids
> > > > > > > >> > in the
> > > > > > > >> > > > > >> > > > > > > ConsumerGroupHeartbeatRequest, I think
> > that
> > > > > there
> > > > > > > >> is
> > > > > > > >> > one
> > > > > > > >> > > > > >> > advantage
> > > > > > > >> > > > > >> > > in
> > > > > > > >> > > > > >> > > > > > > favour of it. The consumer will have
> > the
> > > > > > > >> opportunity
> > > > > > > >> > to
> > > > > > > >> > > > > >> validate
> > > > > > > >> > > > > >> > > that
> > > > > > > >> > > > > >> > > > > > > the topics exist before passing them
> > into
> > > > > the
> > > > > > > >> group
> > > > > > > >> > > > > rebalance
> > > > > > > >> > > > > >> > > > > > > protocol. Obviously, the coordinator
> > will
> > > > > also
> > > > > > > >> notice
> > > > > > > >> > it but
> > > > > > > >> > > > > >> it
> > > > > > > >> > > > > >> > > does
> > > > > > > >> > > > > >> > > > > > > not really have a way to reject an
> > invalid
> > > > > topic in
> > > > > > > >> > the
> > > > > > > >> > > > > >> response.
> > > > > > > >> > > > > >> > > > > > >
> > > > > > > >> > > > > >> > > > > > > > I'm agreeing with David on all other
> > minor
> > > > > > > >> questions
> > > > > > > >> > > > > except
> > > > > > > >> > > > > >> for
> > > > > > > >> > > > > >> > > the
> > > > > > > >> > > > > >> > > > > > > > `subscribe(Pattern)` question:
> > personally
> > > > > I think
> > > > > > > >> > it's not
> > > > > > > >> > > > > >> > > necessary
> > > > > > > >> > > > > >> > > > > to
> > > > > > > >> > > > > >> > > > > > > > deprecate the subscribe API with
> > Pattern,
> > > > > but
> > > > > > > >> > instead we
> > > > > > > >> > > > > >> still
> > > > > > > >> > > > > >> > > use
> > > > > > > >> > > > > >> > > > > > > Pattern
> > > > > > > >> > > > > >> > > > > > > > while just documenting that our
> > > > > subscription may
> > > > > > > >> be
> > > > > > > >> > > > > >> rejected by
> > > > > > > >> > > > > >> > > the
> > > > > > > >> > > > > >> > > > > > > server.
> > > > > > > >> > > > > >> > > > > > > > Since the incompatible case is a
> > very rare
> > > > > > > >> scenario
> > > > > > > >> > I felt
> > > > > > > >> > > > > >> > using
> > > > > > > >> > > > > >> > > an
> > > > > > > >> > > > > >> > > > > > > > overloaded `String` based
> > subscription may
> > > > > be
> > > > > > > >> more
> > > > > > > >> > > > > >> vulnerable
> > > > > > > >> > > > > >> > to
> > > > > > > >> > > > > >> > > > > various
> > > > > > > >> > > > > >> > > > > > > > invalid regexes.
> > > > > > > >> > > > > >> > > > > > >
> > > > > > > >> > > > > >> > > > > > > That could work. I have to look at the
> > > > > differences
> > > > > > > >> > between
> > > > > > > >> > > > > the
> > > > > > > >> > > > > >> > two
> > > > > > > >> > > > > >> > > > > > > engines to better understand the
> > potential
> > > > > issues.
> > > > > > > >> My
> > > > > > > >> > > > > >> > > understanding is
> > > > > > > >> > > > > >> > > > > > > that it would work for all the basic
> > regular
> > > > > > > >> > expressions. The
> > > > > > > >> > > > > >> > > differences
> > > > > > > >> > > > > >> > > > > > > between the two are mainly about the
> > various
> > > > > > > >> character
> > > > > > > >> > > > > >> classes. I
> > > > > > > >> > > > > >> > > > > > > wonder what other people think about
> > this.
> > > > > > > >> > > > > >> > > > > > >
> > > > > > > >> > > > > >> > > > > > > Best,
> > > > > > > >> > > > > >> > > > > > > David
> > > > > > > >> > > > > >> > > > > > >
> > > > > > > >> > > > > >> > > > > > > On Tue, Jul 12, 2022 at 11:28 PM
> > Guozhang
> > > > > Wang <
> > > > > > > >> > > > > >> > wangg...@gmail.com
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > > wrote:
> > > > > > > >> > > > > >> > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > Thanks David! I think on the high
> > level
> > > > > there are
> > > > > > > >> > two meta
> > > > > > > >> > > > > >> > > points we
> > > > > > > >> > > > > >> > > > > need
> > > > > > > >> > > > > >> > > > > > > > to concretize a bit more:
> > > > > > > >> > > > > >> > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > 1) the migration path, especially
> > the last
> > > > > step
> > > > > > > >> when
> > > > > > > >> > > > > clients
> > > > > > > >> > > > > >> > > flip the
> > > > > > > >> > > > > >> > > > > > > flag
> > > > > > > >> > > > > >> > > > > > > > to enable the new protocol, in which
> > we
> > > > > would
> > > > > > > >> have a
> > > > > > > >> > > > > window
> > > > > > > >> > > > > >> > where
> > > > > > > >> > > > > >> > > > > both
> > > > > > > >> > > > > >> > > > > > > new
> > > > > > > >> > > > > >> > > > > > > > protocols / rpcs and old protocols /
> > rpcs
> > > > > are
> > > > > > > >> used
> > > > > > > >> > by
> > > > > > > >> > > > > >> members
> > > > > > > >> > > > > >> > of
> > > > > > > >> > > > > >> > > the
> > > > > > > >> > > > > >> > > > > same
> > > > > > > >> > > > > >> > > > > > > > group. How the coordinator could
> > "mimic"
> > > > > the old
> > > > > > > >> > behavior
> > > > > > > >> > > > > >> while
> > > > > > > >> > > > > >> > > > > using the
> > > > > > > >> > > > > >> > > > > > > > new protocol is something we need to
> > > > > present
> > > > > > > >> about.
> > > > > > > >> > > > > >> > > > > > > > 2) the usage of topic ids. As of
> > > > > KIP-516 the
> > > > > > > >> > topic ids
> > > > > > > >> > > > > >> are
> > > > > > > >> > > > > >> > > only
> > > > > > > >> > > > > >> > > > > used
> > > > > > > >> > > > > >> > > > > > > as
> > > > > > > >> > > > > >> > > > > > > > part of RPCs and admin client, but
> > they
> > > > > are not
> > > > > > > >> > exposed
> > > > > > > >> > > > > via
> > > > > > > >> > > > > >> any
> > > > > > > >> > > > > >> > > > > public
> > > > > > > >> > > > > >> > > > > > > APIs
> > > > > > > >> > > > > >> > > > > > > > to consumers yet. I think the
> > question is,
> > > > > first
> > > > > > > >> > should we
> > > > > > > >> > > > > >> let
> > > > > > > >> > > > > >> > > the
> > > > > > > >> > > > > >> > > > > > > consumer
> > > > > > > >> > > > > >> > > > > > > > client to be maintaining the names
> > -> ids
> > > > > mapping
> > > > > > > >> > itself
> > > > > > > >> > > > > to
> > > > > > > >> > > > > >> > fully
> > > > > > > >> > > > > >> > > > > > > leverage
> > > > > > > >> > > > > >> > > > > > > > on all the augmented existing RPCs
> > and the
> > > > > new
> > > > > > > >> RPCs
> > > > > > > >> > with
> > > > > > > >> > > > > the
> > > > > > > >> > > > > >> > > topic
> > > > > > > >> > > > > >> > > > > ids;
> > > > > > > >> > > > > >> > > > > > > and
> > > > > > > >> > > > > >> > > > > > > > secondly, should we ever consider
> > exposing
> > > > > the
> > > > > > > >> > topic ids
> > > > > > > >> > > > > in
> > > > > > > >> > > > > >> the
> > > > > > > >> > > > > >> > > > > consumer
> > > > > > > >> > > > > >> > > > > > > > public APIs as well (both
> > > > > subscribe/assign, as
> > > > > > > >> well
> > > > > > > >> > as in
> > > > > > > >> > > > > >> the
> > > > > > > >> > > > > >> > > > > rebalance
> > > > > > > >> > > > > >> > > > > > > > listener for cases like topic
> > > > > > > >> > deletion-and-recreation).
> > > > > > > >> > > > > >> > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > I'm agreeing with David on all other
> > minor
> > > > > > > >> questions
> > > > > > > >> > > > > except
> > > > > > > >> > > > > >> for
> > > > > > > >> > > > > >> > > the
> > > > > > > >> > > > > >> > > > > > > > `subscribe(Pattern)` question:
> > personally
> > > > > I think
> > > > > > > >> > it's not
> > > > > > > >> > > > > >> > > necessary
> > > > > > > >> > > > > >> > > > > to
> > > > > > > >> > > > > >> > > > > > > > deprecate the subscribe API with
> > Pattern,
> > > > > but
> > > > > > > >> > instead we
> > > > > > > >> > > > > >> still
> > > > > > > >> > > > > >> > > use
> > > > > > > >> > > > > >> > > > > > > Pattern
> > > > > > > >> > > > > >> > > > > > > > while just documenting that our
> > > > > subscription may
> > > > > > > >> be
> > > > > > > >> > > > > >> rejected by
> > > > > > > >> > > > > >> > > the
> > > > > > > >> > > > > >> > > > > > > server.
> > > > > > > >> > > > > >> > > > > > > > Since the incompatible case is a
> > very rare
> > > > > > > >> scenario
> > > > > > > >> > I felt
> > > > > > > >> > > > > >> > using
> > > > > > > >> > > > > >> > > an
> > > > > > > >> > > > > >> > > > > > > > overloaded `String` based
> > subscription may
> > > > > be
> > > > > > > >> more
> > > > > > > >> > > > > >> vulnerable
> > > > > > > >> > > > > >> > to
> > > > > > > >> > > > > >> > > > > various
> > > > > > > >> > > > > >> > > > > > > > invalid regexes.
> > > > > > > >> > > > > >> > > > > > > >
> > > > > > > >> > > > > >> > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > Guozhang
> > > > > > > >> > > > > >> > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > On Tue, Jul 12, 2022 at 5:23 AM
> > David Jacot
> > > > > > > >> > > > > >> > > > > <dja...@confluent.io.invalid
> > > > > > > >> > > > > >> > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > wrote:
> > > > > > > >> > > > > >> > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > Hi Ismael,
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > Thanks for your feedback. Let me
> > answer
> > > > > your
> > > > > > > >> > questions
> > > > > > > >> > > > > >> > inline.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > 1. I think it's premature to
> > talk about
> > > > > > > >> target
> > > > > > > >> > > > > versions
> > > > > > > >> > > > > >> for
> > > > > > > >> > > > > >> > > > > > > deprecation
> > > > > > > >> > > > > >> > > > > > > > > and
> > > > > > > >> > > > > >> > > > > > > > > > removal of the existing group
> > protocol.
> > > > > > > >> Unlike
> > > > > > > >> > KRaft,
> > > > > > > >> > > > > >> this
> > > > > > > >> > > > > >> > > > > affects a
> > > > > > > >> > > > > >> > > > > > > core
> > > > > > > >> > > > > >> > > > > > > > > > client protocol and hence
> > > > > deprecation/removal
> > > > > > > >> > will be
> > > > > > > >> > > > > >> > heavily
> > > > > > > >> > > > > >> > > > > > > dependent
> > > > > > > >> > > > > >> > > > > > > > > on
> > > > > > > >> > > > > >> > > > > > > > > > how quickly applications migrate
> > to
> > > > > the new
> > > > > > > >> > protocol.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > That makes sense. I will remove it.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > 2. The KIP says we intend to
> > release
> > > > > this in
> > > > > > > >> > 4.x, but
> > > > > > > >> > > > > it
> > > > > > > >> > > > > >> > > wasn't
> > > > > > > >> > > > > >> > > > > made
> > > > > > > >> > > > > >> > > > > > > > > clear
> > > > > > > >> > > > > >> > > > > > > > > > why. If we added that as a way to
> > > > > estimate
> > > > > > > >> when
> > > > > > > >> > we'd
> > > > > > > >> > > > > >> > > deprecate
> > > > > > > >> > > > > >> > > > > and
> > > > > > > >> > > > > >> > > > > > > remove
> > > > > > > >> > > > > >> > > > > > > > > > the group protocol, I also
> > suggest
> > > > > removing
> > > > > > > >> > this part.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > Let me explain my reasoning. As
> > > > > explained, I
> > > > > > > >> plan
> > > > > > > >> > to
> > > > > > > >> > > > > >> rewrite
> > > > > > > >> > > > > >> > > the
> > > > > > > >> > > > > >> > > > > group
> > > > > > > >> > > > > >> > > > > > > > > coordinator in Java while we
> > implement
> > > > > the new
> > > > > > > >> > protocol.
> > > > > > > >> > > > > >> This
> > > > > > > >> > > > > >> > > means
> > > > > > > >> > > > > >> > > > > > > > > that the internals will be slightly
> > > > > different
> > > > > > > >> > (e.g.
> > > > > > > >> > > > > >> threading
> > > > > > > >> > > > > >> > > > > model).
> > > > > > > >> > > > > >> > > > > > > > > Therefore, I wanted to tie the
> > > > > switch from
> > > > > > > >> > the old
> > > > > > > >> > > > > >> group
> > > > > > > >> > > > > >> > > > > > > > > coordinator to the new group
> > coordinator
> > > > > to a
> > > > > > > >> > major
> > > > > > > >> > > > > >> release.
> > > > > > > >> > > > > >> > > The
> > > > > > > >> > > > > >> > > > > > > > > alternative would be to use a flag
> > to do
> > > > > the
> > > > > > > >> > switch
> > > > > > > >> > > > > >> instead
> > > > > > > >> > > > > >> > of
> > > > > > > >> > > > > >> > > > > relying
> > > > > > > >> > > > > >> > > > > > > > > on the software upgrade.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > 3. We need to flesh out the
> > details of
> > > > > the
> > > > > > > >> > migration
> > > > > > > >> > > > > >> story.
> > > > > > > >> > > > > >> > > It
> > > > > > > >> > > > > >> > > > > sounds
> > > > > > > >> > > > > >> > > > > > > > > like
> > > > > > > >> > > > > >> > > > > > > > > > we're saying we will support
> > online
> > > > > > > >> migrations.
> > > > > > > >> > Is
> > > > > > > >> > > > > that
> > > > > > > >> > > > > >> > > correct?
> > > > > > > >> > > > > >> > > > > We
> > > > > > > >> > > > > >> > > > > > > > > should
> > > > > > > >> > > > > >> > > > > > > > > > explain this in detail. It could
> > also
> > > > > be done
> > > > > > > >> > as a
> > > > > > > >> > > > > >> separate
> > > > > > > >> > > > > >> > > KIP,
> > > > > > > >> > > > > >> > > > > if
> > > > > > > >> > > > > >> > > > > > > it's
> > > > > > > >> > > > > >> > > > > > > > > > easier.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > Yes, we will support online
> > migrations
> > > > > for the
> > > > > > > >> > group.
> > > > > > > >> > > > > That
> > > > > > > >> > > > > >> > > means
> > > > > > > >> > > > > >> > > > > that
> > > > > > > >> > > > > >> > > > > > > > > a group using the old protocol
> > will be
> > > > > able to
> > > > > > > >> > switch to
> > > > > > > >> > > > > >> the
> > > > > > > >> > > > > >> > > new
> > > > > > > >> > > > > >> > > > > > > > > protocol.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > Let me briefly explain how that will
> > > > > > > >> > > > > >> > > > > > > > > work though. It is basically a four
> > > > > > > >> > > > > >> > > > > > > > > step process:
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > 1. The cluster must be upgraded or
> > > > > > > >> > > > > >> > > > > > > > > rolled to a software version
> > > > > > > >> > > > > >> > > > > > > > > supporting the new group coordinator.
> > > > > > > >> > > > > >> > > > > > > > > Both the old and the new coordinator
> > > > > > > >> > > > > >> > > > > > > > > will support the old protocol and rely
> > > > > > > >> > > > > >> > > > > > > > > on the same persisted metadata so they
> > > > > > > >> > > > > >> > > > > > > > > can work together. This point is an
> > > > > > > >> > > > > >> > > > > > > > > offline migration. We cannot do this
> > > > > > > >> > > > > >> > > > > > > > > one live because it would require
> > > > > > > >> > > > > >> > > > > > > > > shutting down the current coordinator
> > > > > > > >> > > > > >> > > > > > > > > and starting up the new one and that
> > > > > > > >> > > > > >> > > > > > > > > would cause unavailability.
> > > > > > > >> > > > > >> > > > > > > > > 2. The cluster's metadata version/IBP
> > > > > > > >> > > > > >> > > > > > > > > must be upgraded to X in order to
> > > > > > > >> > > > > >> > > > > > > > > enable the new protocol. This cannot
> > > > > > > >> > > > > >> > > > > > > > > be done before 1) is completed because
> > > > > > > >> > > > > >> > > > > > > > > the old coordinator doesn't support
> > > > > > > >> > > > > >> > > > > > > > > the new protocol.
> > > > > > > >> > > > > >> > > > > > > > > 3. The consumers must be upgraded to a
> > > > > > > >> > > > > >> > > > > > > > > version supporting the online
> > > > > > > >> > > > > >> > > > > > > > > migration (must have KIP-792). If the
> > > > > > > >> > > > > >> > > > > > > > > consumers are already on such a
> > > > > > > >> > > > > >> > > > > > > > > version, nothing needs to be done at
> > > > > > > >> > > > > >> > > > > > > > > this point.
> > > > > > > >> > > > > >> > > > > > > > > 4. The consumers must be rolled with
> > > > > > > >> > > > > >> > > > > > > > > the feature flag turned on. The
> > > > > > > >> > > > > >> > > > > > > > > consumer group is automatically
> > > > > > > >> > > > > >> > > > > > > > > converted when the first consumer
> > > > > > > >> > > > > >> > > > > > > > > using the new protocol joins the
> > > > > > > >> > > > > >> > > > > > > > > group. While the members using the old
> > > > > > > >> > > > > >> > > > > > > > > protocol are being upgraded, the old
> > > > > > > >> > > > > >> > > > > > > > > protocol is proxied into the new one.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > Let me clarify all of this in the
> > KIP.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > 4. I am happy that we are
> > pushing the
> > > > > pattern
> > > > > > > >> > > > > >> subscriptions
> > > > > > > >> > > > > >> > > to
> > > > > > > >> > > > > >> > > > > the
> > > > > > > >> > > > > >> > > > > > > > > server,
> > > > > > > >> > > > > >> > > > > > > > > > but it seems like there could be
> > some
> > > > > tricky
> > > > > > > >> > > > > >> compatibility
> > > > > > > >> > > > > >> > > > > issues.
> > > > > > > >> > > > > >> > > > > > > Will
> > > > > > > >> > > > > >> > > > > > > > > we
> > > > > > > >> > > > > >> > > > > > > > > > have a mechanism for users to
> > detect
> > > > > that
> > > > > > > >> they
> > > > > > > >> > need to
> > > > > > > >> > > > > >> > update
> > > > > > > >> > > > > >> > > > > their
> > > > > > > >> > > > > >> > > > > > > regex
> > > > > > > >> > > > > >> > > > > > > > > > before switching to the new
> > protocol?
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > I think that I am a bit more
> > optimistic
> > > > > than
> > > > > > > >> you
> > > > > > > >> > on this
> > > > > > > >> > > > > >> > > point. I
> > > > > > > >> > > > > >> > > > > > > > > believe that the majority of the
> > cases
> > > > > are
> > > > > > > >> simple
> > > > > > > >> > > > > regexes
> > > > > > > >> > > > > >> > which
> > > > > > > >> > > > > >> > > > > should
> > > > > > > >> > > > > >> > > > > > > > > work with the new engine. The
> > > > > coordinator will
> > > > > > > >> > verify
> > > > > > > >> > > > > the
> > > > > > > >> > > > > >> > regex
> > > > > > > >> > > > > >> > > > > anyway
> > > > > > > >> > > > > >> > > > > > > > > and reject the consumer if the
> > regex is
> > > > > not
> > > > > > > >> valid.
> > > > > > > >> > > > > Coming
> > > > > > > >> > > > > >> > back
> > > > > > > >> > > > > >> > > to
> > > > > > > >> > > > > >> > > > > the
> > > > > > > >> > > > > >> > > > > > > > > migration path, in the worst case,
> > the
> > > > > first
> > > > > > > >> > upgraded
> > > > > > > >> > > > > >> > consumer
> > > > > > > >> > > > > >> > > > > joining
> > > > > > > >> > > > > >> > > > > > > > > the group will be rejected. This
> > should
> > > > > be used
> > > > > > > >> > as the
> > > > > > > >> > > > > >> last
> > > > > > > >> > > > > >> > > > > defence, I
> > > > > > > >> > > > > >> > > > > > > > > would say.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > One way for customers to validate
> > their
> > > > > regex
> > > > > > > >> > before
> > > > > > > >> > > > > >> > upgrading
> > > > > > > >> > > > > >> > > > > their
> > > > > > > >> > > > > >> > > > > > > > > prod would be to test them with
> > another
> > > > > group.
> > > > > > > >> For
> > > > > > > >> > > > > >> instance,
> > > > > > > >> > > > > >> > > that
> > > > > > > >> > > > > >> > > > > > > > > could be done in a pre-prod
> > environment.
> > > > > > > >> Another
> > > > > > > >> > way
> > > > > > > >> > > > > >> would be
> > > > > > > >> > > > > >> > > to
> > > > > > > >> > > > > >> > > > > > > > > extend the consumer-group tool to
> > > > > provide a
> > > > > > > >> regex
> > > > > > > >> > > > > >> validation
> > > > > > > >> > > > > >> > > > > > > > > mechanism. Would this be enough in
> > your
> > > > > > > >> opinion?
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > 5. Related to the last question,
> > will
> > > > > the
> > > > > > > >> Java
> > > > > > > >> > client
> > > > > > > >> > > > > >> allow
> > > > > > > >> > > > > >> > > the
> > > > > > > >> > > > > >> > > > > > > users to
> > > > > > > >> > > > > >> > > > > > > > > > stick with the current regex
> > engine for
> > > > > > > >> > compatibility
> > > > > > > >> > > > > >> > > reasons?
> > > > > > > >> > > > > >> > > > > For
> > > > > > > >> > > > > >> > > > > > > > > example,
> > > > > > > >> > > > > >> > > > > > > > > > it may be handy to keep using
> > client
> > > > > based
> > > > > > > >> > regex at
> > > > > > > >> > > > > >> first
> > > > > > > >> > > > > >> > to
> > > > > > > >> > > > > >> > > keep
> > > > > > > >> > > > > >> > > > > > > > > > migrations simple and then
> > migrate to
> > > > > server
> > > > > > > >> > based
> > > > > > > >> > > > > >> regexes
> > > > > > > >> > > > > >> > > as a
> > > > > > > >> > > > > >> > > > > > > second
> > > > > > > >> > > > > >> > > > > > > > > step.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > I understand your point but I am
> > > > > concerned that
> > > > > > > >> > this
> > > > > > > >> > > > > would
> > > > > > > >> > > > > >> > > allow
> > > > > > > >> > > > > >> > > > > users
> > > > > > > >> > > > > >> > > > > > > > > to actually stay in this mode. That
> > > > > would go
> > > > > > > >> > against our
> > > > > > > >> > > > > >> goal
> > > > > > > >> > > > > >> > > of
> > > > > > > >> > > > > >> > > > > > > > > simplifying the client because we
> > would
> > > > > have to
> > > > > > > >> > continue
> > > > > > > >> > > > > >> > > monitoring
> > > > > > > >> > > > > >> > > > > > > > > the metadata on the client side. I
> > would
> > > > > rather
> > > > > > > >> > not do
> > > > > > > >> > > > > >> this.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > 6. When we say that the group
> > > > > coordinator
> > > > > > > >> will
> > > > > > > >> > be
> > > > > > > >> > > > > >> > > responsible for
> > > > > > > >> > > > > >> > > > > > > storing
> > > > > > > >> > > > > >> > > > > > > > > > the configurations and that the
> > > > > > > >> configurations
> > > > > > > >> > will be
> > > > > > > >> > > > > >> > > deleted
> > > > > > > >> > > > > >> > > > > when
> > > > > > > >> > > > > >> > > > > > > the
> > > > > > > >> > > > > >> > > > > > > > > > group is deleted. Will a
> > transition to
> > > > > DEAD
> > > > > > > >> > trigger
> > > > > > > >> > > > > >> > deletion
> > > > > > > >> > > > > >> > > of
> > > > > > > >> > > > > >> > > > > > > > > > configurations?
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > That's right. The configurations
> > will be
> > > > > > > >> deleted
> > > > > > > >> > when
> > > > > > > >> > > > > the
> > > > > > > >> > > > > >> > > group is
> > > > > > > >> > > > > >> > > > > > > > > deleted. They go together.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > 7. Will the choice to store the
> > > > > configs in
> > > > > > > >> the
> > > > > > > >> > group
> > > > > > > >> > > > > >> > > coordinator
> > > > > > > >> > > > > >> > > > > > > make it
> > > > > > > >> > > > > >> > > > > > > > > > harder to list all cluster
> > configs and
> > > > > their
> > > > > > > >> > values?
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > I don't think so. The group
> > > > > configurations are
> > > > > > > >> > overrides
> > > > > > > >> > > > > >> of
> > > > > > > >> > > > > >> > > cluster
> > > > > > > >> > > > > >> > > > > > > > > configs. If you want to know all
> > the
> > > > > overrides
> > > > > > > >> > though,
> > > > > > > >> > > > > you
> > > > > > > >> > > > > >> > > would
> > > > > > > >> > > > > >> > > > > have
> > > > > > > >> > > > > >> > > > > > > > > to ask all the group coordinators.
> > You
> > > > > cannot
> > > > > > > >> > rely on
> > > > > > > >> > > > > the
> > > > > > > >> > > > > >> > > metadata
> > > > > > > >> > > > > >> > > > > log
> > > > > > > >> > > > > >> > > > > > > > > for instance.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > 8. How would someone configure a
> > group
> > > > > before
> > > > > > > >> > starting
> > > > > > > >> > > > > >> the
> > > > > > > >> > > > > >> > > > > consumers?
> > > > > > > >> > > > > >> > > > > > > > > Have
> > > > > > > >> > > > > >> > > > > > > > > > we considered allowing the
> > explicit
> > > > > creation
> > > > > > > >> of
> > > > > > > >> > > > > groups?
> > > > > > > >> > > > > >> > > > > > > Alternatively,
> > > > > > > >> > > > > >> > > > > > > > > the
> > > > > > > >> > > > > >> > > > > > > > > > configs could be decoupled from
> > the
> > > > > group
> > > > > > > >> > lifecycle.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > Yes. The group will be
> > automatically
> > > > > created in
> > > > > > > >> > this
> > > > > > > >> > > > > case.
> > > > > > > >> > > > > >> > > However,
> > > > > > > >> > > > > >> > > > > > > > > the configs will be lost after the
> > > > > retention
> > > > > > > >> > period of
> > > > > > > >> > > > > the
> > > > > > > >> > > > > >> > > group
> > > > > > > >> > > > > >> > > > > > > > > passes.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > 9. Will the Consumer.subscribe method
> > > > > > > >> > > > > >> > > > > > > > > > for the Java client still take a
> > > > > > > >> > > > > >> > > > > > > > > > `java.util.regex.Pattern` or do we
> > > > > > > >> > > > > >> > > > > > > > > > have to introduce an overload?
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > That's a very good question. I forgot
> > > > > > > >> > > > > >> > > > > > > > > about that one. As the
> > > > > > > >> > > > > >> > > > > > > > > `java.util.regex.Pattern` is not fully
> > > > > > > >> > > > > >> > > > > > > > > compatible with the engine that we
> > > > > > > >> > > > > >> > > > > > > > > plan to use, it might be better to
> > > > > > > >> > > > > >> > > > > > > > > deprecate it and use an overload which
> > > > > > > >> > > > > >> > > > > > > > > takes a string. We would rely on the
> > > > > > > >> > > > > >> > > > > > > > > server side validation. During the
> > > > > > > >> > > > > >> > > > > > > > > migration, I think that we could still
> > > > > > > >> > > > > >> > > > > > > > > try to toString the regex and use it.
> > > > > > > >> > > > > >> > > > > > > > > That should work, I think, in the
> > > > > > > >> > > > > >> > > > > > > > > majority of the cases.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > 10. I agree with Justine that we
> > > > > should be
> > > > > > > >> > clearer
> > > > > > > >> > > > > about
> > > > > > > >> > > > > >> > the
> > > > > > > >> > > > > >> > > > > reason
> > > > > > > >> > > > > >> > > > > > > to
> > > > > > > >> > > > > >> > > > > > > > > > switch to IBP/metadata.version
> > from the
> > > > > > > >> feature
> > > > > > > >> > flag.
> > > > > > > >> > > > > >> Maybe
> > > > > > > >> > > > > >> > > we
> > > > > > > >> > > > > >> > > > > mean
> > > > > > > >> > > > > >> > > > > > > that
> > > > > > > >> > > > > >> > > > > > > > > we
> > > > > > > >> > > > > >> > > > > > > > > > can switch the default for the
> > feature
> > > > > flag
> > > > > > > >> to
> > > > > > > >> > true
> > > > > > > >> > > > > >> based
> > > > > > > >> > > > > >> > on
> > > > > > > >> > > > > >> > > the
> > > > > > > >> > > > > >> > > > > > > > > > metadata.version once we want to
> > make
> > > > > it the
> > > > > > > >> > default.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > My plan was to use that feature
> > flag
> > > > > mainly
> > > > > > > >> > during the
> > > > > > > >> > > > > >> > > development
> > > > > > > >> > > > > >> > > > > > > > > phase. I should not have mentioned
> > it, I
> > > > > think,
> > > > > > > >> > because
> > > > > > > >> > > > > we
> > > > > > > >> > > > > >> > > could
> > > > > > > >> > > > > >> > > > > use
> > > > > > > >> > > > > >> > > > > > > > > an internal config for it.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > 11. Some of the protocol APIs
> > don't
> > > > > mention
> > > > > > > >> the
> > > > > > > >> > > > > required
> > > > > > > >> > > > > >> > > ACLs, it
> > > > > > > >> > > > > >> > > > > > > would
> > > > > > > >> > > > > >> > > > > > > > > be
> > > > > > > >> > > > > >> > > > > > > > > > good to add that for consistency.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > Noted.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > 12. It is a bit odd that
> > > > > > > >> ConsumerGroupHeartbeat
> > > > > > > >> > > > > requires
> > > > > > > >> > > > > >> > > "Read
> > > > > > > >> > > > > >> > > > > Group"
> > > > > > > >> > > > > >> > > > > > > > > even
> > > > > > > >> > > > > >> > > > > > > > > > though it seems to do more than
> > > > > reading.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > I agree. This is how the current
> > > > > protocol works
> > > > > > > >> > though.
> > > > > > > >> > > > > We
> > > > > > > >> > > > > >> > only
> > > > > > > >> > > > > >> > > > > > > > > require "Read Group" to join a
> > group. We
> > > > > could
> > > > > > > >> > consider
> > > > > > > >> > > > > >> > > changing
> > > > > > > >> > > > > >> > > > > this
> > > > > > > >> > > > > >> > > > > > > > > but I am not sure that it is worth
> > it.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > 13. How is topic recreation
> > handled by
> > > > > the
> > > > > > > >> > consumer
> > > > > > > >> > > > > with
> > > > > > > >> > > > > >> > the
> > > > > > > >> > > > > >> > > new
> > > > > > > >> > > > > >> > > > > > > group
> > > > > > > >> > > > > >> > > > > > > > > > protocol? It would be good to
> > have a
> > > > > section
> > > > > > > >> on
> > > > > > > >> > this.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > Noted. From a protocol perspective,
> > > > > > > >> > > > > >> > > > > > > > > the new topic will have a new topic id
> > > > > > > >> > > > > >> > > > > > > > > so it will treat it like a topic with
> > > > > > > >> > > > > >> > > > > > > > > a different name. The only issue is
> > > > > > > >> > > > > >> > > > > > > > > that the fetch/commit offsets APIs do
> > > > > > > >> > > > > >> > > > > > > > > not support topic IDs so the consumer
> > > > > > > >> > > > > >> > > > > > > > > would reuse the offsets based on the
> > > > > > > >> > > > > >> > > > > > > > > same name. I think that we should
> > > > > > > >> > > > > >> > > > > > > > > update those APIs as well in order to
> > > > > > > >> > > > > >> > > > > > > > > be consistent end to end. That would
> > > > > > > >> > > > > >> > > > > > > > > strengthen the semantics of the
> > > > > > > >> > > > > >> > > > > > > > > consumer.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > 14. The KIP mentions we will
> > write the
> > > > > new
> > > > > > > >> > coordinator
> > > > > > > >> > > > > >> in
> > > > > > > >> > > > > >> > > Java.
> > > > > > > >> > > > > >> > > > > Even
> > > > > > > >> > > > > >> > > > > > > > > though
> > > > > > > >> > > > > >> > > > > > > > > > this is an implementation
> > detail, do
> > > > > we plan
> > > > > > > >> to
> > > > > > > >> > have a
> > > > > > > >> > > > > >> new
> > > > > > > >> > > > > >> > > gradle
> > > > > > > >> > > > > >> > > > > > > module
> > > > > > > >> > > > > >> > > > > > > > > > for it?
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > Yes.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > 15. Do we have a scalability
> > goal when
> > > > > it
> > > > > > > >> comes
> > > > > > > >> > to how
> > > > > > > >> > > > > >> many
> > > > > > > >> > > > > >> > > > > members
> > > > > > > >> > > > > >> > > > > > > the
> > > > > > > >> > > > > >> > > > > > > > > new
> > > > > > > >> > > > > >> > > > > > > > > > group protocol can support?
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > We don't have numbers at the
> > moment. The
> > > > > > > >> protocol
> > > > > > > >> > should
> > > > > > > >> > > > > >> > > support
> > > > > > > >> > > > > >> > > > > 1000s
> > > > > > > >> > > > > >> > > > > > > > > of members per group. We will
> > measure
> > > > > this when
> > > > > > > >> > we have
> > > > > > > >> > > > > a
> > > > > > > >> > > > > >> > first
> > > > > > > >> > > > > >> > > > > > > > > implementation. Note that we might
> > have
> > > > > other
> > > > > > > >> > > > > bottlenecks
> > > > > > > >> > > > > >> > down
> > > > > > > >> > > > > >> > > the
> > > > > > > >> > > > > >> > > > > > > > > road (e.g. offset commits).
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > 16. Did we consider having
> > > > > > > >> > > > > >> > > > > > > > > > SubscribedTopicIds instead of
> > > > > > > >> > > > > >> > > > > > > > > > SubscribedTopicNames in
> > > > > > > >> > > > > >> > > > > > > > > > ConsumerGroupHeartbeatRequest? Is the
> > > > > > > >> > > > > >> > > > > > > > > > idea that since we have to resolve the
> > > > > > > >> > > > > >> > > > > > > > > > regex on the server, we can do the
> > > > > > > >> > > > > >> > > > > > > > > > same for the topic name? The
> > > > > > > >> > > > > >> > > > > > > > > > difference is that sending the regex
> > > > > > > >> > > > > >> > > > > > > > > > is more efficient whereas sending the
> > > > > > > >> > > > > >> > > > > > > > > > topic names is less efficient.
> > > > > > > >> > > > > >> > > > > > > > > > Furthermore, delete and recreation is
> > > > > > > >> > > > > >> > > > > > > > > > easier to handle if we have topic ids.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > The idea was to consolidate the
> > > > > > > >> > > > > >> > > > > > > > > metadata lookup on the server for both
> > > > > > > >> > > > > >> > > > > > > > > paths but I do agree with your point.
> > > > > > > >> > > > > >> > > > > > > > > On second thought, using topic ids may
> > > > > > > >> > > > > >> > > > > > > > > be better here for the delete and
> > > > > > > >> > > > > >> > > > > > > > > recreation case. Also, I suppose that
> > > > > > > >> > > > > >> > > > > > > > > we may allow users to subscribe with
> > > > > > > >> > > > > >> > > > > > > > > topic ids in the future because that
> > > > > > > >> > > > > >> > > > > > > > > is the only way to be really robust to
> > > > > > > >> > > > > >> > > > > > > > > topic re-creation.
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > Best,
> > > > > > > >> > > > > >> > > > > > > > > David
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > On Tue, Jul 12, 2022 at 1:38 PM
> > David
> > > > > Jacot <
> > > > > > > >> > > > > >> > > dja...@confluent.io>
> > > > > > > >> > > > > >> > > > > > > wrote:
> > > > > > > >> > > > > >> > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > Hi Justine,
> > > > > > > >> > > > > >> > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > Thanks for your comments. Please
> > find
> > > > > my
> > > > > > > >> answers
> > > > > > > >> > > > > below.
> > > > > > > >> > > > > >> > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > - Yes, the new protocol relies on
> > > > > topic IDs
> > > > > > > >> > with the
> > > > > > > >> > > > > >> > > exception
> > > > > > > >> > > > > >> > > > > of the
> > > > > > > >> > > > > >> > > > > > > > > > topic names based in the
> > > > > > > >> > > > > ConsumerGroupHeartbeatRequest.
> > > > > > > >> > > > > >> I
> > > > > > > >> > > > > >> > am
> > > > > > > >> > > > > >> > > not
> > > > > > > >> > > > > >> > > > > sure
> > > > > > > >> > > > > >> > > > > > > > > > if using topic names is the
> > right call
> > > > > here.
> > > > > > > >> I
> > > > > > > >> > need to
> > > > > > > >> > > > > >> > think
> > > > > > > >> > > > > >> > > > > about it
> > > > > > > >> > > > > >> > > > > > > > > > a little more. Obviously, the
> > KIP does
> > > > > not
> > > > > > > >> > change the
> > > > > > > >> > > > > >> > > > > fetch/commit
> > > > > > > >> > > > > >> > > > > > > > > > offsets RPCs to use topic IDs.
> > This
> > > > > may be
> > > > > > > >> > something
> > > > > > > >> > > > > >> that
> > > > > > > >> > > > > >> > we
> > > > > > > >> > > > > >> > > > > should
> > > > > > > >> > > > > >> > > > > > > > > > include though as it would give
> > better
> > > > > > > >> overall
> > > > > > > >> > > > > >> guarantee in
> > > > > > > >> > > > > >> > > the
> > > > > > > >> > > > > >> > > > > > > > > > producer.
> > > > > > > >> > > > > >> > > > > > > > > > - You're right. I think that I
> > should
> > > > > not
> > > > > > > >> have
> > > > > > > >> > > > > mentioned
> > > > > > > >> > > > > >> > this
> > > > > > > >> > > > > >> > > > > flag at
> > > > > > > >> > > > > >> > > > > > > > > > all. I will remove it. We can
> > use an
> > > > > internal
> > > > > > > >> > > > > >> configuration
> > > > > > > >> > > > > >> > > while
> > > > > > > >> > > > > >> > > > > > > > > > developing the feature.
> > > > > > > >> > > > > >> > > > > > > > > > - Both cluster types will be
> > > > > supported. The
> > > > > > > >> > change is
> > > > > > > >> > > > > >> > > > > orthogonal. The
> > > > > > > >> > > > > >> > > > > > > > > > only requirement is that the
> > cluster
> > > > > uses
> > > > > > > >> topic
> > > > > > > >> > IDs.
> > > > > > > >> > > > > >> > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > Best,
> > > > > > > >> > > > > >> > > > > > > > > > David
> > > > > > > >> > > > > >> > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > On Mon, Jul 11, 2022 at 9:53 PM
> > > > > Guozhang
> > > > > > > >> Wang <
> > > > > > > >> > > > > >> > > > > wangg...@gmail.com>
> > > > > > > >> > > > > >> > > > > > > > > wrote:
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > Hi Ismael,
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > Thanks for the feedback. Here
> > are
> > > > > some
> > > > > > > >> replies
> > > > > > > >> > > > > inlined
> > > > > > > >> > > > > >> > > below:
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > On Sat, Jul 9, 2022 at 2:53 AM
> > > > > Ismael Juma
> > > > > > > >> <
> > > > > > > >> > > > > >> > > ism...@juma.me.uk>
> > > > > > > >> > > > > >> > > > > > > wrote:
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > Thanks for the KIP. This has
> > the
> > > > > > > >> potential
> > > > > > > >> > to be a
> > > > > > > >> > > > > >> > great
> > > > > > > >> > > > > >> > > > > > > > > improvement. A few
> > > > > > > >> > > > > >> > > > > > > > > > > > initial questions/comments:
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > 1. I think it's premature to
> > talk
> > > > > about
> > > > > > > >> > target
> > > > > > > >> > > > > >> versions
> > > > > > > >> > > > > >> > > for
> > > > > > > >> > > > > >> > > > > > > > > deprecation and
> > > > > > > >> > > > > >> > > > > > > > > > > > removal of the existing group
> > > > > protocol.
> > > > > > > >> > Unlike
> > > > > > > >> > > > > >> KRaft,
> > > > > > > >> > > > > >> > > this
> > > > > > > >> > > > > >> > > > > > > affects a
> > > > > > > >> > > > > >> > > > > > > > > core
> > > > > > > >> > > > > >> > > > > > > > > > > > client protocol and hence
> > > > > > > >> > deprecation/removal will
> > > > > > > >> > > > > >> be
> > > > > > > >> > > > > >> > > heavily
> > > > > > > >> > > > > >> > > > > > > > > dependent on
> > > > > > > >> > > > > >> > > > > > > > > > > > how quickly applications
> > migrate
> > > > > to the
> > > > > > > >> new
> > > > > > > >> > > > > >> protocol.
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > Yeah I agree with you. I think
> > we can
> > > > > > > >> remove
> > > > > > > >> > the
> > > > > > > >> > > > > >> proposed
> > > > > > > >> > > > > >> > > > > timeline
> > > > > > > >> > > > > >> > > > > > > in
> > > > > > > >> > > > > >> > > > > > > > > the
> > > > > > > >> > > > > >> > > > > > > > > > > `Compatibility, Deprecation,
> > and
> > > > > Migration
> > > > > > > >> > Plan` and
> > > > > > > >> > > > > >> > > instead
> > > > > > > >> > > > > >> > > > > just
> > > > > > > >> > > > > >> > > > > > > state
> > > > > > > >> > > > > >> > > > > > > > > > > that we will decide in the
> > future
> > > > > about
> > > > > > > >> when
> > > > > > > >> > we
> > > > > > > >> > > > > would
> > > > > > > >> > > > > >> > > > > deprecate old
> > > > > > > >> > > > > >> > > > > > > > > > > protocol and behaviors.
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > 2. The KIP says we intend to
> > > > > release this
> > > > > > > >> > in 4.x,
> > > > > > > >> > > > > >> but
> > > > > > > >> > > > > >> > it
> > > > > > > >> > > > > >> > > > > wasn't
> > > > > > > >> > > > > >> > > > > > > made
> > > > > > > >> > > > > >> > > > > > > > > clear
> > > > > > > >> > > > > >> > > > > > > > > > > > why. If we added that as a
> > way to
> > > > > > > >> estimate
> > > > > > > >> > when
> > > > > > > >> > > > > we'd
> > > > > > > >> > > > > >> > > > > deprecate
> > > > > > > >> > > > > >> > > > > > > and
> > > > > > > >> > > > > >> > > > > > > > > remove
> > > > > > > >> > > > > >> > > > > > > > > > > > the group protocol, I also
> > suggest
> > > > > > > >> removing
> > > > > > > >> > this
> > > > > > > >> > > > > >> part.
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > I think that's not specifically
> > > > > related to
> > > > > > > >> the
> > > > > > > >> > > > > >> > > > > deprecation/removal
> > > > > > > >> > > > > >> > > > > > > > > timeline
> > > > > > > >> > > > > >> > > > > > > > > > > plan, but it's more for client
> > > > > upgrades.
> > > > > > > >> I.e.
> > > > > > > >> > the
> > > > > > > >> > > > > >> > > broker-side
> > > > > > > >> > > > > >> > > > > > > > > > > implementation may be done
> > first,
> > > > > and then
> > > > > > > >> the
> > > > > > > >> > > > > client
> > > > > > > >> > > > > >> > side,
> > > > > > > >> > > > > >> > > > > and we
> > > > > > > >> > > > > >> > > > > > > > > would
> > > > > > > >> > > > > >> > > > > > > > > > > only mark it as "released" by
> > the
> > > > > time
> > > > > > > >> clients
> > > > > > > >> > > > > >> > > implementations
> > > > > > > >> > > > > >> > > > > are
> > > > > > > >> > > > > >> > > > > > > > > done. At
> > > > > > > >> > > > > >> > > > > > > > > > > that time, to enable the
> > feature the
> > > > > > > >> clients
> > > > > > > >> > need to
> > > > > > > >> > > > > >> > first
> > > > > > > >> > > > > >> > > > > swap-in
> > > > > > > >> > > > > >> > > > > > > the
> > > > > > > >> > > > > >> > > > > > > > > > > bytecode with a rolling bounce
> > and
> > > > > then set
> > > > > > > >> > the flag
> > > > > > > >> > > > > >> > with a
> > > > > > > >> > > > > >> > > > > second
> > > > > > > >> > > > > >> > > > > > > > > rolling
> > > > > > > >> > > > > >> > > > > > > > > > > bounce, and hence we feel it's
> > > > > better to be
> > > > > > > >> > released
> > > > > > > >> > > > > >> in a
> > > > > > > >> > > > > >> > > major
> > > > > > > >> > > > > >> > > > > > > > > version.
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > 3. We need to flesh out the
> > > > > details of
> > > > > > > >> the
> > > > > > > >> > > > > migration
> > > > > > > >> > > > > >> > > story.
> > > > > > > >> > > > > >> > > > > It
> > > > > > > >> > > > > >> > > > > > > > > sounds like
> > > > > > > >> > > > > >> > > > > > > > > > > > we're saying we will support
> > online
> > > > > > > >> > migrations. Is
> > > > > > > >> > > > > >> that
> > > > > > > >> > > > > >> > > > > correct?
> > > > > > > >> > > > > >> > > > > > > We
> > > > > > > >> > > > > >> > > > > > > > > should
> > > > > > > >> > > > > >> > > > > > > > > > > > explain this in detail. It
> > could
> > > > > also be
> > > > > > > >> > done as a
> > > > > > > >> > > > > >> > > separate
> > > > > > > >> > > > > >> > > > > KIP,
> > > > > > > >> > > > > >> > > > > > > if
> > > > > > > >> > > > > >> > > > > > > > > it's
> > > > > > > >> > > > > >> > > > > > > > > > > > easier.
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > Yes I think that's the part we
> > can
> > > > > be more
> > > > > > > >> > concrete
> > > > > > > >> > > > > >> about
> > > > > > > >> > > > > >> > > for
> > > > > > > >> > > > > >> > > > > sure
> > > > > > > >> > > > > >> > > > > > > (and
> > > > > > > >> > > > > >> > > > > > > > > > > this is related to your
> > question 2)
> > > > > above).
> > > > > > > >> > We will
> > > > > > > >> > > > > >> work
> > > > > > > >> > > > > >> > on
> > > > > > > >> > > > > >> > > > > making
> > > > > > > >> > > > > >> > > > > > > it
> > > > > > > >> > > > > >> > > > > > > > > more
> > > > > > > >> > > > > >> > > > > > > > > > > explicit in parallel as we
> > solicit
> > > > > more
> > > > > > > >> > feedback.
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > 4. I am happy that we are
> > pushing
> > > > > the
> > > > > > > >> > pattern
> > > > > > > >> > > > > >> > > subscriptions
> > > > > > > >> > > > > >> > > > > to
> > > > > > > >> > > > > >> > > > > > > the
> > > > > > > >> > > > > >> > > > > > > > > server,
> > > > > > > >> > > > > >> > > > > > > > > > > > but it seems like there
> > could be
> > > > > some
> > > > > > > >> tricky
> > > > > > > >> > > > > >> > > compatibility
> > > > > > > >> > > > > >> > > > > > > issues.
> > > > > > > >> > > > > >> > > > > > > > > Will we
> > > > > > > >> > > > > >> > > > > > > > > > > > have a mechanism for users to
> > > > > detect that
> > > > > > > >> > they
> > > > > > > >> > > > > need
> > > > > > > >> > > > > >> to
> > > > > > > >> > > > > >> > > update
> > > > > > > >> > > > > >> > > > > > > their
> > > > > > > >> > > > > >> > > > > > > > > regex
> > > > > > > >> > > > > >> > > > > > > > > > > > before switching to the new
> > > > > protocol?
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > Yes I think we need some
> > tooling for
> > > > > > > >> non-java
> > > > > > > >> > client
> > > > > > > >> > > > > >> > users
> > > > > > > >> > > > > >> > > to
> > > > > > > >> > > > > >> > > > > sort
> > > > > > > >> > > > > >> > > > > > > of
> > > > > > > >> > > > > >> > > > > > > > > > > "dry-run" the client before
> > > > > switching to
> > > > > > > >> the
> > > > > > > >> > new
> > > > > > > >> > > > > >> > protocol.
> > > > > > > >> > > > > >> > > I
> > > > > > > >> > > > > >> > > > > do not
> > > > > > > >> > > > > >> > > > > > > > > have a
> > > > > > > >> > > > > >> > > > > > > > > > > specific idea on top of my head
> > > > > though,
> > > > > > > >> maybe
> > > > > > > >> > others
> > > > > > > >> > > > > >> like
> > > > > > > >> > > > > >> > > @Matt
> > > > > > > >> > > > > >> > > > > > > > > Howlett can
> > > > > > > >> > > > > >> > > > > > > > > > > chime-in here?
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > 5. Related to the last
> > question,
> > > > > will the
> > > > > > > >> > Java
> > > > > > > >> > > > > >> client
> > > > > > > >> > > > > >> > > allow
> > > > > > > >> > > > > >> > > > > the
> > > > > > > >> > > > > >> > > > > > > > > users to
> > > > > > > >> > > > > >> > > > > > > > > > > > stick with the current regex
> > > > > engine for
> > > > > > > >> > > > > >> compatibility
> > > > > > > >> > > > > >> > > > > reasons?
> > > > > > > >> > > > > >> > > > > > > For
> > > > > > > >> > > > > >> > > > > > > > > example,
> > > > > > > >> > > > > >> > > > > > > > > > > > it may be handy to keep using
> > > > > client
> > > > > > > >> based
> > > > > > > >> > regex
> > > > > > > >> > > > > at
> > > > > > > >> > > > > >> > > first to
> > > > > > > >> > > > > >> > > > > keep
> > > > > > > >> > > > > >> > > > > > > > > > > > migrations simple and then
> > migrate
> > > > > to
> > > > > > > >> > server based
> > > > > > > >> > > > > >> > > regexes
> > > > > > > >> > > > > >> > > > > as a
> > > > > > > >> > > > > >> > > > > > > > > second
> > > > > > > >> > > > > >> > > > > > > > > > > > step.
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > Honestly I have not thought
> > about
> > > > > that for
> > > > > > > >> > java
> > > > > > > >> > > > > >> clients,
> > > > > > > >> > > > > >> > > and
> > > > > > > >> > > > > >> > > > > we can
> > > > > > > >> > > > > >> > > > > > > > > discuss
> > > > > > > >> > > > > >> > > > > > > > > > > that. What kind of
> > compatibility
> > > > > issues do
> > > > > > > >> > you have
> > > > > > > >> > > > > in
> > > > > > > >> > > > > >> > > mind?
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > 6. When we say that the group
> > > > > coordinator
> > > > > > > >> > will be
> > > > > > > >> > > > > >> > > > > responsible for
> > > > > > > >> > > > > >> > > > > > > > > storing
> > > > > > > >> > > > > >> > > > > > > > > > > > the configurations and that
> > the
> > > > > > > >> > configurations
> > > > > > > >> > > > > will
> > > > > > > >> > > > > >> be
> > > > > > > >> > > > > >> > > > > deleted
> > > > > > > >> > > > > >> > > > > > > when
> > > > > > > >> > > > > >> > > > > > > > > the
> > > > > > > >> > > > > >> > > > > > > > > > > > group is deleted. Will a
> > > > > transition to
> > > > > > > >> DEAD
> > > > > > > >> > > > > trigger
> > > > > > > >> > > > > >> > > deletion
> > > > > > > >> > > > > >> > > > > of
> > > > > > > >> > > > > >> > > > > > > > > > > > configurations?
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > Yes, since the DEAD state is an
> > > > > ending
> > > > > > > >> state
> > > > > > > >> > (we
> > > > > > > >> > > > > would
> > > > > > > >> > > > > >> > only
> > > > > > > > >> > > > > >> > > > > > > transition to
> > > > > > > >> > > > > >> > > > > > > > > that
> > > > > > > >> > > > > >> > > > > > > > > > > state when the group is EMPTY
> > and
> > > > > also all
> > > > > > > >> of
> > > > > > > >> > its
> > > > > > > >> > > > > >> > metadata
> > > > > > > >> > > > > >> > > are
> > > > > > > >> > > > > >> > > > > > > gone),
> > > > > > > >> > > > > >> > > > > > > > > once
> > > > > > > > >> > > > > >> > > > > > > > > > > it has transitioned to DEAD this
> > group
> > > > > would
> > > > > > > >> never
> > > > > > > >> > be
> > > > > > > >> > > > > >> revived.
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > 7. Will the choice to store
> > the
> > > > > configs
> > > > > > > >> in
> > > > > > > >> > the
> > > > > > > >> > > > > group
> > > > > > > >> > > > > >> > > > > coordinator
> > > > > > > >> > > > > >> > > > > > > > > make it
> > > > > > > >> > > > > >> > > > > > > > > > > > harder to list all cluster
> > configs
> > > > > and
> > > > > > > >> their
> > > > > > > >> > > > > values?
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > That's a good question, and our
> > > > > thoughts
> > > > > > > >> are
> > > > > > > >> > that
> > > > > > > >> > > > > the
> > > > > > > >> > > > > >> > > so-called
> > > > > > > >> > > > > >> > > > > > > "group
> > > > > > > >> > > > > >> > > > > > > > > > > configurations" are overrides
> > of the
> > > > > > > >> > cluster-level
> > > > > > > >> > > > > >> > > > > configurations
> > > > > > > >> > > > > >> > > > > > > > > > > customized per group so when an
> > > > > admin lists
> > > > > > > >> > cluster
> > > > > > > >> > > > > >> > configs
> > > > > > > >> > > > > >> > > it's
> > > > > > > >> > > > > >> > > > > > > okay to
> > > > > > > >> > > > > >> > > > > > > > > > > list just the cluster-level
> > > > > "defaults", not
> > > > > > > >> > showing
> > > > > > > >> > > > > >> any
> > > > > > > >> > > > > >> > > > > per-group
> > > > > > > >> > > > > >> > > > > > > > > > > customizations.
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > 8. How would someone
> > configure a
> > > > > group
> > > > > > > >> > before
> > > > > > > >> > > > > >> starting
> > > > > > > >> > > > > >> > > the
> > > > > > > >> > > > > >> > > > > > > > > consumers? Have
> > > > > > > >> > > > > >> > > > > > > > > > > > we considered allowing the
> > explicit
> > > > > > > >> > creation of
> > > > > > > >> > > > > >> groups?
> > > > > > > >> > > > > >> > > > > > > > > Alternatively, the
> > > > > > > >> > > > > >> > > > > > > > > > > > configs could be decoupled
> > from
> > > > > the group
> > > > > > > >> > > > > lifecycle.
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > The configs can be created
> > before
> > > > > the group
> > > > > > > >> > itself
> > > > > > > >> > > > > as
> > > > > > > >> > > > > >> an
> > > > > > > >> > > > > >> > > > > > > independent
> > > > > > > >> > > > > >> > > > > > > > > entity
> > > > > > > >> > > > > >> > > > > > > > > > > --- of course, this requires
> > the
> > > > > > > >> corresponding
> > > > > > > >> > > > > >> request to
> > > > > > > >> > > > > >> > > be
> > > > > > > >> > > > > >> > > > > > > routed to
> > > > > > > >> > > > > >> > > > > > > > > the
> > > > > > > >> > > > > >> > > > > > > > > > > right coordinator based on the
> > group
> > > > > id ---
> > > > > > > >> > the only
> > > > > > > >> > > > > >> > thing
> > > > > > > >> > > > > >> > > that
> > > > > > > >> > > > > >> > > > > > > > > differs is,
> > > > > > > >> > > > > >> > > > > > > > > > > when the group itself is gone
> > we
> > > > > also check
> > > > > > > >> > if there
> > > > > > > >> > > > > >> are
> > > > > > > >> > > > > >> > > any
> > > > > > > >> > > > > >> > > > > > > > > configuration
> > > > > > > >> > > > > >> > > > > > > > > > > entities related to that group
> > and
> > > > > delete
> > > > > > > >> as
> > > > > > > >> > well.
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > Admittedly this indeed
> > introduces an
> > > > > > > >> > asymmetry on
> > > > > > > >> > > > > the
> > > > > > > >> > > > > >> > > creation
> > > > > > > >> > > > > >> > > > > /
> > > > > > > >> > > > > >> > > > > > > > > deletion
> > > > > > > >> > > > > >> > > > > > > > > > > lifecycles of the config
> > entities,
> > > > > and we
> > > > > > > >> > would like
> > > > > > > >> > > > > >> to
> > > > > > > >> > > > > >> > > hear
> > > > > > > >> > > > > >> > > > > > > everyone's
> > > > > > > >> > > > > >> > > > > > > > > > > feelings whether we should aim
> > for
> > > > > symmetry
> > > > > > > >> > i.e.
> > > > > > > >> > > > > >> totally
> > > > > > > >> > > > > >> > > > > decouple
> > > > > > > >> > > > > >> > > > > > > group
> > > > > > > >> > > > > >> > > > > > > > > > > configs and hence not delete
> > them at
> > > > > all
> > > > > > > >> when
> > > > > > > >> > the
> > > > > > > >> > > > > >> group
> > > > > > > >> > > > > >> > is
> > > > > > > >> > > > > >> > > > > gone,
> > > > > > > >> > > > > >> > > > > > > but
> > > > > > > >> > > > > >> > > > > > > > > always
> > > > > > > >> > > > > >> > > > > > > > > > > require explicit deletion
> > operations
> > > > > by
> > > > > > > >> > themselves.
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > 9. Will the
> > Consumer.subscribe
> > > > > method for
> > > > > > > >> > the Java
> > > > > > > >> > > > > >> > client
> > > > > > > >> > > > > >> > > > > still
> > > > > > > >> > > > > >> > > > > > > take
> > > > > > > >> > > > > >> > > > > > > > > a
> > > > > > > > >> > > > > >> > > > > > > > > > > > `java.util.regex.Pattern` or
> > do we
> > > > > have
> > > > > > > >> to
> > > > > > > >> > > > > >> introduce an
> > > > > > > >> > > > > >> > > > > overload?
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > I think we do not need to
> > introduce
> > > > > an
> > > > > > > >> > overload, but
> > > > > > > >> > > > > >> I'm
> > > > > > > >> > > > > >> > > all
> > > > > > > >> > > > > >> > > > > ears
> > > > > > > >> > > > > >> > > > > > > if
> > > > > > > >> > > > > >> > > > > > > > > there
> > > > > > > >> > > > > >> > > > > > > > > > > may be some compatibility
> > issues
> > > > > that we
> > > > > > > >> may
> > > > > > > >> > > > > overlook.
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > 10. I agree with Justine
> > that we
> > > > > should
> > > > > > > >> be
> > > > > > > >> > clearer
> > > > > > > >> > > > > >> > about
> > > > > > > >> > > > > >> > > the
> > > > > > > >> > > > > >> > > > > > > reason
> > > > > > > >> > > > > >> > > > > > > > > to
> > > > > > > >> > > > > >> > > > > > > > > > > > switch to
> > IBP/metadata.version
> > > > > from the
> > > > > > > >> > feature
> > > > > > > >> > > > > >> flag.
> > > > > > > >> > > > > >> > > Maybe
> > > > > > > >> > > > > >> > > > > we
> > > > > > > >> > > > > >> > > > > > > mean
> > > > > > > >> > > > > >> > > > > > > > > that we
> > > > > > > >> > > > > >> > > > > > > > > > > > can switch the default for
> > the
> > > > > feature
> > > > > > > >> flag
> > > > > > > >> > to
> > > > > > > >> > > > > true
> > > > > > > >> > > > > >> > > based on
> > > > > > > >> > > > > >> > > > > the
> > > > > > > >> > > > > >> > > > > > > > > > > > metadata.version once we
> > want to
> > > > > make it
> > > > > > > >> the
> > > > > > > >> > > > > >> default.
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > 11. Some of the protocol APIs
> > don't
> > > > > mention
> > > > > > > >> > the
> > > > > > > >> > > > > >> required
> > > > > > > >> > > > > >> > > ACLs,
> > > > > > > >> > > > > >> > > > > it
> > > > > > > >> > > > > >> > > > > > > > > would be
> > > > > > > >> > > > > >> > > > > > > > > > > > good to add that for
> > consistency.
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > Ack, we can certainly do that.
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > 12. It is a bit odd that
> > > > > > > >> > ConsumerGroupHeartbeat
> > > > > > > >> > > > > >> > requires
> > > > > > > >> > > > > >> > > > > "Read
> > > > > > > >> > > > > >> > > > > > > > > Group" even
> > > > > > > >> > > > > >> > > > > > > > > > > > though it seems to do more
> > than
> > > > > reading.
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > I had that thought myself as
> > well,
> > > > > but in
> > > > > > > >> the
> > > > > > > >> > end we
> > > > > > > >> > > > > >> > could
> > > > > > > >> > > > > >> > > not
> > > > > > > >> > > > > >> > > > > > > find a
> > > > > > > >> > > > > >> > > > > > > > > > > better alternative: adding
> > Write
> > > > > Group
> > > > > > > >> seems
> > > > > > > >> > an
> > > > > > > >> > > > > >> overkill
> > > > > > > >> > > > > >> > > here
> > > > > > > >> > > > > >> > > > > > > since we
> > > > > > > >> > > > > >> > > > > > > > > do
> > > > > > > >> > > > > >> > > > > > > > > > > not have it elsewhere (we only
> > have
> > > > > Read /
> > > > > > > >> > Delete
> > > > > > > >> > > > > and
> > > > > > > >> > > > > >> > > Describe
> > > > > > > >> > > > > >> > > > > on
> > > > > > > >> > > > > >> > > > > > > > > groups so
> > > > > > > > >> > > > > >> > > > > > > > > > > far). Would like to hear others'
> > > > > thoughts.
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > 13. How is topic recreation
> > > > > handled by
> > > > > > > >> the
> > > > > > > >> > > > > consumer
> > > > > > > >> > > > > >> > with
> > > > > > > >> > > > > >> > > the
> > > > > > > >> > > > > >> > > > > new
> > > > > > > >> > > > > >> > > > > > > > > group
> > > > > > > >> > > > > >> > > > > > > > > > > > protocol? It would be good
> > to have
> > > > > a
> > > > > > > >> > section on
> > > > > > > >> > > > > >> this.
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > You mean with regex
> > subscription
> > > > > right? Yes
> > > > > > > >> > we can
> > > > > > > >> > > > > >> add a
> > > > > > > >> > > > > >> > > > > section
> > > > > > > >> > > > > >> > > > > > > about
> > > > > > > >> > > > > >> > > > > > > > > > > that, but basically the idea
> > is that
> > > > > > > >> consumer
> > > > > > > >> > would
> > > > > > > >> > > > > be
> > > > > > > >> > > > > >> > > totally
> > > > > > > >> > > > > >> > > > > > > > > agnostic in
> > > > > > > >> > > > > >> > > > > > > > > > > the new protocol as it's
> > all handled
> > > > > by the
> > > > > > > >> > brokers.
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > 14. The KIP mentions we will
> > write
> > > > > the
> > > > > > > >> new
> > > > > > > >> > > > > >> coordinator
> > > > > > > >> > > > > >> > in
> > > > > > > >> > > > > >> > > > > Java.
> > > > > > > >> > > > > >> > > > > > > Even
> > > > > > > >> > > > > >> > > > > > > > > though
> > > > > > > >> > > > > >> > > > > > > > > > > > this is an implementation
> > detail,
> > > > > do we
> > > > > > > >> > plan to
> > > > > > > >> > > > > >> have a
> > > > > > > >> > > > > >> > > new
> > > > > > > >> > > > > >> > > > > gradle
> > > > > > > >> > > > > >> > > > > > > > > module
> > > > > > > >> > > > > >> > > > > > > > > > > > for it?
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > We have not thought about
> > that. But
> > > > > I think
> > > > > > > >> > the
> > > > > > > >> > > > > answer
> > > > > > > >> > > > > >> > > should
> > > > > > > >> > > > > >> > > > > be
> > > > > > > >> > > > > >> > > > > > > yes.
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > 15. Do we have a scalability
> > goal
> > > > > when it
> > > > > > > >> > comes to
> > > > > > > >> > > > > >> how
> > > > > > > >> > > > > >> > > many
> > > > > > > >> > > > > >> > > > > > > members
> > > > > > > >> > > > > >> > > > > > > > > the new
> > > > > > > >> > > > > >> > > > > > > > > > > > group protocol can support?
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > Within a group, I think we
> > should
> > > > > shoot for
> > > > > > > >> > 1000s of
> > > > > > > >> > > > > >> > > members.
> > > > > > > >> > > > > >> > > > > But
> > > > > > > >> > > > > >> > > > > > > that
> > > > > > > > >> > > > > >> > > > > > > > > > > scalability goal also depends
> > on the
> > > > > offset
> > > > > > > >> > > > > management
> > > > > > > >> > > > > >> > > (commit,
> > > > > > > >> > > > > >> > > > > > > fetch)
> > > > > > > >> > > > > >> > > > > > > > > > > capabilities of the coordinator
> > > > > which we
> > > > > > > >> did
> > > > > > > >> > not
> > > > > > > >> > > > > >> cover in
> > > > > > > >> > > > > >> > > this
> > > > > > > >> > > > > >> > > > > > > KIP, so
> > > > > > > >> > > > > >> > > > > > > > > it's
> > > > > > > >> > > > > >> > > > > > > > > > > hard to give a number that
> > applies
> > > > > > > >> > universally.
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > 16. Did we consider having
> > > > > > > > >> > SubscribedTopicIds
> > > > > > > >> > > > > >> instead
> > > > > > > >> > > > > >> > of
> > > > > > > >> > > > > >> > > > > > > > > > > > SubscribedTopicNames in
> > > > > > > >> > > > > >> ConsumerGroupHeartbeatRequest?
> > > > > > > >> > > > > >> > > Is the
> > > > > > > >> > > > > >> > > > > > > idea
> > > > > > > >> > > > > >> > > > > > > > > that
> > > > > > > >> > > > > >> > > > > > > > > > > > since we have to resolve the
> > regex
> > > > > on the
> > > > > > > >> > server,
> > > > > > > >> > > > > we
> > > > > > > >> > > > > >> > can
> > > > > > > >> > > > > >> > > do
> > > > > > > >> > > > > >> > > > > the
> > > > > > > >> > > > > >> > > > > > > same
> > > > > > > >> > > > > >> > > > > > > > > for
> > > > > > > >> > > > > >> > > > > > > > > > > > the topic name? The
> > difference is
> > > > > that
> > > > > > > >> > sending the
> > > > > > > >> > > > > >> > regex
> > > > > > > >> > > > > >> > > is
> > > > > > > >> > > > > >> > > > > more
> > > > > > > >> > > > > >> > > > > > > > > efficient
> > > > > > > >> > > > > >> > > > > > > > > > > > whereas sending the topic
> > names is
> > > > > less
> > > > > > > >> > efficient.
> > > > > > > >> > > > > >> > > > > Furthermore,
> > > > > > > >> > > > > >> > > > > > > > > delete and
> > > > > > > >> > > > > >> > > > > > > > > > > > recreation is easier to
> > handle if
> > > > > we have
> > > > > > > >> > topic
> > > > > > > >> > > > > ids.
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > The main reason to still let
> > the
> > > > > clients
> > > > > > > >> send
> > > > > > > >> > names
> > > > > > > >> > > > > >> is to
> > > > > > > >> > > > > >> > > keep
> > > > > > > >> > > > > >> > > > > the
> > > > > > > >> > > > > >> > > > > > > > > > > reasoning of names -> ids on
> > the
> > > > > broker /
> > > > > > > >> > admin
> > > > > > > >> > > > > client
> > > > > > > >> > > > > >> > > only.
> > > > > > > >> > > > > >> > > > > Note
> > > > > > > >> > > > > >> > > > > > > that
> > > > > > > >> > > > > >> > > > > > > > > > > although we added topic id in
> > > > > KIP-516, we
> > > > > > > >> > never
> > > > > > > >> > > > > >> > > implemented the
> > > > > > > >> > > > > >> > > > > > > logic
> > > > > > > >> > > > > >> > > > > > > > > on
> > > > > > > >> > > > > >> > > > > > > > > > > consumer/producers leveraging
> > the
> > > > > related
> > > > > > > >> > newer
> > > > > > > >> > > > > >> versioned
> > > > > > > >> > > > > >> > > RPCs,
> > > > > > > >> > > > > >> > > > > > > > > instead we
> > > > > > > >> > > > > >> > > > > > > > > > > just set the topic id as empty
> > UUID.
> > > > > We
> > > > > > > >> want
> > > > > > > >> > to keep
> > > > > > > >> > > > > >> the
> > > > > > > >> > > > > >> > > > > > > > > consumer/producer
> > > > > > > >> > > > > >> > > > > > > > > > > to be thin and only delegate
> > the
> > > > > reasoning
> > > > > > > >> on
> > > > > > > >> > broker
> > > > > > > >> > > > > >> and
> > > > > > > >> > > > > >> > > > > > > potentially
> > > > > > > >> > > > > >> > > > > > > > > admin
> > > > > > > >> > > > > >> > > > > > > > > > > clients.
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > Thanks,
> > > > > > > >> > > > > >> > > > > > > > > > > > Ismael
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > On Wed, Jul 6, 2022 at 10:45
> > AM
> > > > > David
> > > > > > > >> Jacot
> > > > > > > >> > > > > >> > > > > > > > > <dja...@confluent.io.invalid>
> > > > > > > >> > > > > >> > > > > > > > > > > > wrote:
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > > Hi all,
> > > > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > > I would like to start a
> > > > > discussion
> > > > > > > >> thread
> > > > > > > >> > on
> > > > > > > >> > > > > >> KIP-848:
> > > > > > > >> > > > > >> > > The
> > > > > > > >> > > > > >> > > > > Next
> > > > > > > >> > > > > >> > > > > > > > > > > > > Generation of the Consumer
> > > > > Rebalance
> > > > > > > >> > Protocol.
> > > > > > > >> > > > > >> With
> > > > > > > >> > > > > >> > > this
> > > > > > > >> > > > > >> > > > > KIP,
> > > > > > > >> > > > > >> > > > > > > we
> > > > > > > >> > > > > >> > > > > > > > > aim
> > > > > > > >> > > > > >> > > > > > > > > > > > > to make the rebalance
> > protocol
> > > > > (for
> > > > > > > >> > consumers)
> > > > > > > >> > > > > >> more
> > > > > > > >> > > > > >> > > > > reliable,
> > > > > > > >> > > > > >> > > > > > > more
> > > > > > > >> > > > > >> > > > > > > > > > > > > scalable, easier to
> > implement for
> > > > > > > >> > clients, and
> > > > > > > >> > > > > >> easier
> > > > > > > >> > > > > >> > > to
> > > > > > > >> > > > > >> > > > > debug
> > > > > > > >> > > > > >> > > > > > > for
> > > > > > > >> > > > > >> > > > > > > > > > > > > operators.
> > > > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > > The KIP is here:
> > > > > > > >> > > > > >> > > > >
> > https://cwiki.apache.org/confluence/x/HhD1D.
> > > > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > > Please take a look and let
> > me
> > > > > know what
> > > > > > > >> > you
> > > > > > > >> > > > > think.
> > > > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > > Best,
> > > > > > > >> > > > > >> > > > > > > > > > > > > David
> > > > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > > PS: I will be away from
> > July
> > > > > 18th to
> > > > > > > >> > August 8th.
> > > > > > > >> > > > > >> That
> > > > > > > >> > > > > >> > > gives
> > > > > > > >> > > > > >> > > > > > > you a
> > > > > > > >> > > > > >> > > > > > > > > bit
> > > > > > > >> > > > > >> > > > > > > > > > > > > of time to read and digest
> > this
> > > > > long
> > > > > > > >> KIP.
> > > > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > --
> > > > > > > >> > > > > >> > > > > > > > > > > -- Guozhang
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > >
> > > > > > > >> > > > > >> > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > --
> > > > > > > >> > > > > >> > > > > > > > -- Guozhang
> > > > > > > >> > > > > >> > > > > > >
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> --
> > > > > > > >> > > > > >> -- Guozhang
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >
> > > > > > > >> > > > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > >
> >
>
>
> --
> -- Guozhang
