Hi Luke,

Thanks for your comments. Please find my answers below.

> 1. Will the member reconciling jump to the latest target assignment? For
> example, member epoch are A:1, B:2, but now the group is in target
> assignment epoch 4. Will the members jump to the target assignment result
> in epoch 4 directly, or they should first sync to epoch 2, then 3, and 4? I
> tried to find out if there's any problem if we jump to 4, but I can't. So,
> I think we'll go to epoch 4 directly, right?

That's right. The reconciliation always uses the latest target
assignment so members might skip epochs. The only constraint to jump
to a new epoch is that the member should have revoked the required
partitions. In your example, if A and B do not have to revoke any
partitions, they would be moved to epoch 4 immediately by the
coordinator. Otherwise, the coordinator will ask them to revoke
partitions first and then move them to epoch 4 when the revocation is
acknowledged.

> 2. For the ConsumerGroupPrepareAssignment and
> ConsumerGroupInstallAssignment API, when member epoch doesn't match, it'll
> retry (and hope) in next heartbeat response, the member epoch will be
> bumped. But in ConsumerGroupHeartbeat, Upon receiving the
> FENCED_MEMBER_EPOCH error, the consumer abandon all its partitions and
> rejoins with the same member id and the epoch 0, unless the partitions
> owned by the members are a subset of the target partitions. I'm wondering
> why can't we return the latest member epoch to the member in
> ConsumerGroupHeartbeat response no matter if the owned partitions are
> subset of target partitions or not. After all, the member is still one of
> the group member, the assignment for the member is still valid, right? This
> way, the ConsumerGroupPrepareAssignment and ConsumerGroupInstallAssignment
> API can piggyback it to retry in next heartbeat response. WDYT?

As explained by Guozhang, the error semantic is different based on the
API. For the *PrepareAssignment and the *InstallAssignment, it means
that the member epoch is stale and that the client should retry when
it gets a newer epoch whereas for the ConsumerGroupHeartbeat, it means
that the client has been fenced by the coordinator and thus should
abandon all its partitions and rejoin. I do agree with Guozhang that
it would be better to use different errors in order to be clear. It is
misleading otherwise.

Regarding your second point, I think that we can make the API
idempotent here. There are three cases: 1) If a client sends a
ConsumerGroupHeartbeat request with an older member epoch but with
partitions which are a subset of the expected target partitions, we
can move it to the current member epoch. 2) If the member epoch is
smaller and the partitions differ, it is very likely that the member
owns partitions which are owned by other members now so it seems to me
that we should treat it as a fatal error in this case. We could be a
bit smarter and check whether the partitions are really owned by other
members and if they are not we could proceed. The issue in this case
is that the member won't be able to revoke the partitions and to
commit the offsets with a stale member epoch before the coordinator
moves it to the target epoch. 3) If the member epoch is larger, we
should fence the member because this should never happen.

> 3. When receiving COMPUTE_ASSIGNMENT error in ConsumerGroupHeartbeat API,
> the consumer starts the assignment process. But what if somehow the member
> didn't send out ConsumerGroupPrepareAssignment request, what would we do? I
> think we'll wait for rebalance timeout and kick the member out of the
> group, right? And then, what will the group coordinator do? I think it'll
> select another member to be the leader and return COMPUTE_ASSIGNMENT error
> in that member's heartbeat response, right? Maybe we should add that into
> KIP.

When the member is selected to perform the assignment, the timer
starts ticking so the coordinator gives the rebalance timeout to the
member to perform it. If it does not complete the full process by that
time, the coordinator will pick another member to run the assignment.
I believe that we should not fence the member based on this though. I
was thinking that fencing would be only done via the
ConsumerGroupHeartbeat API and the session timeout. I will clarify
this in the KIP.

> Some typos:
> 4. In "Assignment Process" section:
> If the selected assignor does exist, the group coordinator will reject the
> heartbeat with an UNSUPPORTED_ASSIGNOR error.
> -> I think it should be "does not exist"
>
> 5. In "JoinGroup Handling" section:
> If the member has revoked all its partitions or the required partitions,
> the member can transition to its next epoch. The current assignment become
> the current assignment.
> -> I think it should be "The target assignment becomes the current
> assignment", right?
>
> 6. In "ConsumerGroupDescribe API" section:
> When the group coordinator handle a ConsumerGroupPrepareAssignmentRequest
> request:
> -> It should be "handle a ConsumerGroupDescribe request"

Thanks for reporting the typos. I will fix them.

Best,
David

On Tue, Aug 9, 2022 at 3:40 PM David Jacot <dja...@confluent.io> wrote:
>
> Hi Sagar,
>
> Thanks for the feedback and the document. That's really helpful. I
> will take a look at it.
>
> Overall, it seems to me that both Connect and the Consumer could share
> the same underlying "engine". The main difference is that the Consumer
> assigns topic-partitions to members whereas Connect assigns tasks to
> workers. I see two ways to move forward:
> 1) We extend the new proposed APIs to support different resource types
> (e.g. partitions, tasks, etc.); or
> 2) We use new dedicated APIs for Connect. The dedicated APIs would be
> similar to the new ones but different on the content/resources and
> they would rely on the same engine on the coordinator side.
>
> I personally lean towards 2) because I am not a fan of overcharging
> APIs to serve different purposes. That being said, I am not opposed to
> 1) if we can find an elegant way to do it.
>
> I think that we can continue to discuss it here for now in order to
> ensure that this KIP is compatible with what we will do for Connect in
> the future.
>
> Best,
> David
>
> On Mon, Aug 8, 2022 at 2:41 PM David Jacot <dja...@confluent.io> wrote:
> >
> > Hi all,
> >
> > I am back from vacation. I will go through and address your comments
> > in the coming days. Thanks for your feedback.
> >
> > Cheers,
> > David
> >
> > On Wed, Aug 3, 2022 at 10:05 PM Gregory Harris <gharris1...@gmail.com> 
> > wrote:
> > >
> > > Hey All!
> > >
> > > Thanks for the KIP, it's wonderful to see cooperative rebalancing making 
> > > it
> > > down the stack!
> > >
> > > I had a few questions:
> > >
> > > 1. The 'Rejected Alternatives' section describes how member epoch should
> > > advance in step with the group epoch and assignment epoch values. I think
> > > that this is a good idea for the reasons described in the KIP. When the
> > > protocol is incrementally assigning partitions to a worker, what member
> > > epoch does each incremental assignment use? Are member epochs re-used, and
> > > a single member epoch can correspond to multiple different (monotonically
> > > larger) assignments?
> > >
> > > 2. Is the Assignor's 'Reason' field opaque to the group coordinator? If
> > > not, should custom client-side assignor implementations interact with the
> > > Reason field, and how is its common meaning agreed upon? If so, what is 
> > > the
> > > benefit of a distinct Reason field over including such functionality in 
> > > the
> > > opaque metadata?
> > >
> > > 3. The following is included in the KIP: "Thanks to this, the input of the
> > > client side assignor is entirely driven by the group coordinator. The
> > > consumer is no longer responsible for maintaining any state besides its
> > > assigned partitions." Does this mean that the client-side assignor MAY
> > > incorporate additional non-Metadata state (such as partition throughput,
> > > cpu/memory metrics, config topics, etc), or that additional non-Metadata
> > > state SHOULD NOT be used?
> > >
> > > 4. I see that there are separate classes
> > > for org.apache.kafka.server.group.consumer.PartitionAssignor
> > > and org.apache.kafka.clients.consumer.PartitionAssignor that seem to
> > > overlap significantly. Is it possible for these two implementations to be
> > > unified? This would serve to promote feature parity of server-side and
> > > client-side assignors, and would also facilitate operational flexibility 
> > > in
> > > certain situations. For example, if a server-side assignor has some poor
> > > behavior and needs a patch, deploying the patched assignor to the client
> > > and switching one consumer group to a client-side assignor may be faster
> > > and less risky than patching all of the brokers. With the currently
> > > proposed distinct APIs, a non-trivial reimplementation would have to be
> > > assembled, and if the two APIs have diverged significantly, then it is
> > > possible that a reimplementation would not be possible.
> > >
> > > --
> > > Greg Harris
> > > gharris1...@gmail.com
> > > github.com/gharris1727
> > >
> > > On Wed, Aug 3, 2022 at 8:39 AM Sagar <sagarmeansoc...@gmail.com> wrote:
> > >
> > > > Hi Guozhang/David,
> > > >
> > > > I created a confluence page to discuss how Connect would need to change
> > > > based on the new rebalance protocol. Here's the page:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/%5BDRAFT%5DIntegrating+Kafka+Connect+With+New+Consumer+Rebalance+Protocol
> > > >
> > > > It's also pretty longish and I have tried to keep a format similar to
> > > > KIP-848. Let me know what you think. Also, do you think this should be
> > > > moved to a separate discussion thread or is this one fine?
> > > >
> > > > Thanks!
> > > > Sagar.
> > > >
> > > > On Tue, Jul 26, 2022 at 7:37 AM Sagar <sagarmeansoc...@gmail.com> wrote:
> > > >
> > > > > Hello Guozhang,
> > > > >
> > > > > Thank you so much for the doc on Kafka Streams. Sure, I would do the
> > > > > analysis and come up with such a document.
> > > > >
> > > > > Thanks!
> > > > > Sagar.
> > > > >
> > > > > On Tue, Jul 26, 2022 at 4:47 AM Guozhang Wang <wangg...@gmail.com>
> > > > wrote:
> > > > >
> > > > >> Hello Sagar,
> > > > >>
> > > > >> It would be great if you could come back with some analysis on how to
> > > > >> implement the Connect side integration with the new protocol; so far
> > > > >> besides leveraging on the new "protocol type" we did not yet think
> > > > through
> > > > >> the Connect side implementations. For Streams here's a draft of
> > > > >> integration
> > > > >> plan:
> > > > >>
> > > > >>
> > > > https://docs.google.com/document/d/17PNz2sGoIvGyIzz8vLyJTJTU2rqnD_D9uHJnH9XARjU/edit#heading=h.pdgirmi57dvn
> > > > >> just FYI for your analysis on Connect.
> > > > >>
> > > > >> On Tue, Jul 19, 2022 at 10:48 PM Sagar <sagarmeansoc...@gmail.com>
> > > > wrote:
> > > > >>
> > > > >> > Hi David,
> > > > >> >
> > > > >> > Thank you for your response. The reason I thought connect can also 
> > > > >> > fit
> > > > >> into
> > > > >> > this new scheme is that even today the connect uses a
> > > > WorkerCoordinator
> > > > >> > extending from AbstractCoordinator to empower rebalances of
> > > > >> > tasks/connectors. The WorkerCoordinator sets the protocolType() to
> > > > >> connect
> > > > >> > and uses the metadata() method by plumbing into
> > > > >> JoinGroupRequestProtocol.
> > > > >> >
> > > > >> > I think the changes to support connect would be similar at a high
> > > > level
> > > > >> to
> > > > >> > the changes in streams mainly because of the Client side assignors
> > > > being
> > > > >> > used in both. At an implementation level, we might need to make a 
> > > > >> > lot
> > > > of
> > > > >> > changes to get onto this new assignment protocol like enhancing the
> > > > >> > JoinGroup request/response and SyncGroup and using
> > > > >> ConsumerGroupHeartbeat
> > > > >> > API etc again on similar lines to streams (or there might be
> > > > >> deviations). I
> > > > >> > would try to perform a detailed analysis of the same  and we can 
> > > > >> > have
> > > > a
> > > > >> > separate discussion thread for that as that would derail this
> > > > discussion
> > > > >> > thread. Let me know if that sounds good to you.
> > > > >> >
> > > > >> > Thanks!
> > > > >> > Sagar.
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > On Fri, Jul 15, 2022 at 5:47 PM David Jacot
> > > > <dja...@confluent.io.invalid
> > > > >> >
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Hi Sagar,
> > > > >> > >
> > > > >> > > Thanks for your comments.
> > > > >> > >
> > > > >> > > 1) Yes. That refers to `Assignment#error`. Sure, I can mention 
> > > > >> > > it.
> > > > >> > >
> > > > >> > > 2) The idea is to transition C from his current assignment to his
> > > > >> > > target assignment when he can move to epoch 3. When that happens,
> > > > the
> > > > >> > > member assignment is updated and persisted with all its assigned
> > > > >> > > partitions even if they are not all revoked yet. In other words, 
> > > > >> > > the
> > > > >> > > member assignment becomes the target assignment. This is 
> > > > >> > > basically
> > > > an
> > > > >> > > optimization to avoid having to write all the changes to the log.
> > > > The
> > > > >> > > examples are based on the persisted state so I understand the
> > > > >> > > confusion. Let me see if I can improve this in the description.
> > > > >> > >
> > > > >> > > 3) Regarding Connect, it could reuse the protocol with a client 
> > > > >> > > side
> > > > >> > > assignor if it fits in the protocol. The assignment is about
> > > > >> > > topicid-partitions + metadata, could Connect fit into this?
> > > > >> > >
> > > > >> > > Best,
> > > > >> > > David
> > > > >> > >
> > > > >> > > On Fri, Jul 15, 2022 at 1:55 PM Sagar <sagarmeansoc...@gmail.com>
> > > > >> wrote:
> > > > >> > > >
> > > > >> > > > Hi David,
> > > > >> > > >
> > > > >> > > > Thanks for the KIP. I just had minor observations:
> > > > >> > > >
> > > > >> > > > 1) In the Assignment Error section in Client Side mode 
> > > > >> > > > Assignment
> > > > >> > > process,
> > > > >> > > > you mentioned => `In this case, the client side assignor can
> > > > return
> > > > >> an
> > > > >> > > > error to the group coordinator`. In this case are you 
> > > > >> > > > referring to
> > > > >> the
> > > > >> > > > Assignor returning an AssignmentError that's listed down 
> > > > >> > > > towards
> > > > the
> > > > >> > end?
> > > > >> > > > If yes, do you think it would make sense to mention this
> > > > explicitly
> > > > >> > here?
> > > > >> > > >
> > > > >> > > > 2) In the Case Studies section, I have a slight confusion, not
> > > > sure
> > > > >> if
> > > > >> > > > others have the same. Consider this step:
> > > > >> > > >
> > > > >> > > > When B heartbeats, the group coordinator transitions him to 
> > > > >> > > > epoch
> > > > 3
> > > > >> > > because
> > > > >> > > > B has no partitions to revoke. It persists the change and 
> > > > >> > > > reply.
> > > > >> > > >
> > > > >> > > >    - Group (epoch=3)
> > > > >> > > >       - A
> > > > >> > > >       - B
> > > > >> > > >       - C
> > > > >> > > >    - Target Assignment (epoch=3)
> > > > >> > > >       - A - partitions=[foo-0]
> > > > >> > > >       - B - partitions=[foo-2]
> > > > >> > > >       - C - partitions=[foo-1]
> > > > >> > > >    - Member Assignment
> > > > >> > > >       - A - epoch=2, partitions=[foo-0, foo-1]
> > > > >> > > >       - B - epoch=3, partitions=[foo-2]
> > > > >> > > >       - C - epoch=3, partitions=[foo-1]
> > > > >> > > >
> > > > >> > > > When C heartbeats, it transitions to epoch 3 but cannot get 
> > > > >> > > > foo-1
> > > > >> yet.
> > > > >> > > >
> > > > >> > > > Here,it's mentioned that member C can't get the foo-1 partition
> > > > yet,
> > > > >> > but
> > > > >> > > > based on the description above, it seems it already has it. Do 
> > > > >> > > > you
> > > > >> > think
> > > > >> > > it
> > > > >> > > > would be better to remove it and populate it only when it 
> > > > >> > > > actually
> > > > >> gets
> > > > >> > > it?
> > > > >> > > > I see this in a lot of other places, so have I understood it
> > > > >> > incorrectly
> > > > >> > > ?
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > Regarding connect , it might be out of scope of this 
> > > > >> > > > discussion,
> > > > but
> > > > >> > from
> > > > >> > > > what I understood it would probably be running in client side
> > > > >> assignor
> > > > >> > > mode
> > > > >> > > > even on the new rebalance protocol as it has its own Custom
> > > > >> > > Assignors(Eager
> > > > >> > > > and IncrementalCooperative).
> > > > >> > > >
> > > > >> > > > Thanks!
> > > > >> > > >
> > > > >> > > > Sagar.
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On Fri, Jul 15, 2022 at 5:00 PM David Jacot
> > > > >> > <dja...@confluent.io.invalid
> > > > >> > > >
> > > > >> > > > wrote:
> > > > >> > > >
> > > > >> > > > > Thanks Hector! Our goal is to move forward with specialized 
> > > > >> > > > > API
> > > > >> > > > > instead of relying on one generic API. For Connect, we can 
> > > > >> > > > > apply
> > > > >> the
> > > > >> > > > > exact same pattern and reuse/share the core implementation on
> > > > the
> > > > >> > > > > server side. For the schema registry, I think that we should
> > > > >> consider
> > > > >> > > > > having a tailored API to do simple membership/leader 
> > > > >> > > > > election.
> > > > >> > > > >
> > > > >> > > > > Best,
> > > > >> > > > > David
> > > > >> > > > >
> > > > >> > > > > On Fri, Jul 15, 2022 at 10:22 AM Ismael Juma 
> > > > >> > > > > <ism...@juma.me.uk
> > > > >
> > > > >> > > wrote:
> > > > >> > > > > >
> > > > >> > > > > > Three quick comments:
> > > > >> > > > > >
> > > > >> > > > > > 1. Regarding java.util.regex.Pattern vs
> > > > >> com.google.re2j.Pattern, we
> > > > >> > > > > should
> > > > >> > > > > > document the differences in more detail before deciding one
> > > > way
> > > > >> or
> > > > >> > > > > another.
> > > > >> > > > > > That said, if people pass java.util.regex.Pattern, they 
> > > > >> > > > > > expect
> > > > >> > their
> > > > >> > > > > > semantics to be honored. If we are doing something 
> > > > >> > > > > > different,
> > > > >> then
> > > > >> > we
> > > > >> > > > > > should consider adding an overload with our own Pattern 
> > > > >> > > > > > class
> > > > (I
> > > > >> > > don't
> > > > >> > > > > > think we'd want to expose re2j's at this point).
> > > > >> > > > > > 2. Regarding topic ids, any major new protocol should
> > > > integrate
> > > > >> > fully
> > > > >> > > > > with
> > > > >> > > > > > it and should handle the topic recreation case correctly.
> > > > That's
> > > > >> > the
> > > > >> > > main
> > > > >> > > > > > part we need to handle. I agree with David that we'd want 
> > > > >> > > > > > to
> > > > add
> > > > >> > > topic
> > > > >> > > > > ids
> > > > >> > > > > > to the relevant protocols that don't have it yet and that 
> > > > >> > > > > > we
> > > > can
> > > > >> > > probably
> > > > >> > > > > > focus on the internals versus adding new APIs to the Java
> > > > >> Consumer
> > > > >> > > > > (unless
> > > > >> > > > > > we find that adding new APIs is required for reasonable
> > > > >> semantics).
> > > > >> > > > > > 3. I am still not sure about the coordinator storing the
> > > > >> configs.
> > > > >> > > It's
> > > > >> > > > > > powerful for configs to be centralized in the metadata log 
> > > > >> > > > > > for
> > > > >> > > various
> > > > >> > > > > > reasons (auditability, visibility, consistency, etc.).
> > > > >> Similarly, I
> > > > >> > > am
> > > > >> > > > > not
> > > > >> > > > > > sure about automatically deleting configs in a way that 
> > > > >> > > > > > they
> > > > >> cannot
> > > > >> > > be
> > > > >> > > > > > recovered. A good property for modern systems is to 
> > > > >> > > > > > minimize
> > > > the
> > > > >> > > number
> > > > >> > > > > of
> > > > >> > > > > > unrecoverable data loss scenarios.
> > > > >> > > > > >
> > > > >> > > > > > Ismael
> > > > >> > > > > >
> > > > >> > > > > > On Wed, Jul 13, 2022 at 3:47 PM David Jacot
> > > > >> > > <dja...@confluent.io.invalid
> > > > >> > > > > >
> > > > >> > > > > > wrote:
> > > > >> > > > > >
> > > > >> > > > > > > Thanks Guozhang. My answers are below:
> > > > >> > > > > > >
> > > > >> > > > > > > > 1) the migration path, especially the last step when
> > > > clients
> > > > >> > > flip the
> > > > >> > > > > > > flag
> > > > >> > > > > > > > to enable the new protocol, in which we would have a
> > > > window
> > > > >> > where
> > > > >> > > > > both
> > > > >> > > > > > > new
> > > > >> > > > > > > > protocols / rpcs and old protocols / rpcs are used by
> > > > >> members
> > > > >> > of
> > > > >> > > the
> > > > >> > > > > same
> > > > >> > > > > > > > group. How the coordinator could "mimic" the old 
> > > > >> > > > > > > > behavior
> > > > >> while
> > > > >> > > > > using the
> > > > >> > > > > > > > new protocol is something we need to present about.
> > > > >> > > > > > >
> > > > >> > > > > > > Noted. I just published a new version of KIP which 
> > > > >> > > > > > > includes
> > > > >> more
> > > > >> > > > > > > details about this. See the "Supporting Online Consumer
> > > > Group
> > > > >> > > Upgrade"
> > > > >> > > > > > > and the "Compatibility, Deprecation, and Migration 
> > > > >> > > > > > > Plan". I
> > > > >> think
> > > > >> > > that
> > > > >> > > > > > > I have to think through a few cases now but the overall 
> > > > >> > > > > > > idea
> > > > >> and
> > > > >> > > > > > > mechanism should be understandable.
> > > > >> > > > > > >
> > > > >> > > > > > > > 2) the usage of topic ids. So far as KIP-516 the topic 
> > > > >> > > > > > > > ids
> > > > >> are
> > > > >> > > only
> > > > >> > > > > used
> > > > >> > > > > > > as
> > > > >> > > > > > > > part of RPCs and admin client, but they are not exposed
> > > > via
> > > > >> any
> > > > >> > > > > public
> > > > >> > > > > > > APIs
> > > > >> > > > > > > > to consumers yet. I think the question is, first 
> > > > >> > > > > > > > should we
> > > > >> let
> > > > >> > > the
> > > > >> > > > > > > consumer
> > > > >> > > > > > > > client to be maintaining the names -> ids mapping 
> > > > >> > > > > > > > itself
> > > > to
> > > > >> > fully
> > > > >> > > > > > > leverage
> > > > >> > > > > > > > on all the augmented existing RPCs and the new RPCs 
> > > > >> > > > > > > > with
> > > > the
> > > > >> > > topic
> > > > >> > > > > ids;
> > > > >> > > > > > > and
> > > > >> > > > > > > > secondly, should we ever consider exposing the topic 
> > > > >> > > > > > > > ids
> > > > in
> > > > >> the
> > > > >> > > > > consumer
> > > > >> > > > > > > > public APIs as well (both subscribe/assign, as well as 
> > > > >> > > > > > > > in
> > > > >> the
> > > > >> > > > > rebalance
> > > > >> > > > > > > > listener for cases like topic deletion-and-recreation).
> > > > >> > > > > > >
> > > > >> > > > > > > a) Assuming that we would include converting all the 
> > > > >> > > > > > > offsets
> > > > >> > > related
> > > > >> > > > > > > RPCs to using topic ids in this KIP, the consumer would 
> > > > >> > > > > > > be
> > > > >> able
> > > > >> > to
> > > > >> > > > > > > fully operate with topic ids. That being said, it still 
> > > > >> > > > > > > has
> > > > to
> > > > >> > > provide
> > > > >> > > > > > > the topics names in various APIs so having a mapping in 
> > > > >> > > > > > > the
> > > > >> > > consumer
> > > > >> > > > > > > seems inevitable to me.
> > > > >> > > > > > > b) I don't have a strong opinion on this. Here I wonder 
> > > > >> > > > > > > if
> > > > >> this
> > > > >> > > goes
> > > > >> > > > > > > beyond the scope of this KIP. I would rather focus on the
> > > > >> > internals
> > > > >> > > > > > > here and we can consider this separately if we see value 
> > > > >> > > > > > > in
> > > > >> doing
> > > > >> > > it.
> > > > >> > > > > > >
> > > > >> > > > > > > Coming back to Ismael's point about using topic ids in 
> > > > >> > > > > > > the
> > > > >> > > > > > > ConsumerGroupHeartbeatRequest, I think that there is one
> > > > >> > advantage
> > > > >> > > in
> > > > >> > > > > > > favour of it. The consumer will have the opportunity to
> > > > >> validate
> > > > >> > > that
> > > > >> > > > > > > the topics exists before passing them into the group
> > > > rebalance
> > > > >> > > > > > > protocol. Obviously, the coordinator will also notice it 
> > > > >> > > > > > > but
> > > > >> it
> > > > >> > > does
> > > > >> > > > > > > not really have a way to reject an invalid topic in the
> > > > >> response.
> > > > >> > > > > > >
> > > > >> > > > > > > > I'm agreeing with David on all other minor questions
> > > > except
> > > > >> for
> > > > >> > > the
> > > > >> > > > > > > > `subscribe(Pattern)` question: personally I think it's 
> > > > >> > > > > > > > not
> > > > >> > > necessary
> > > > >> > > > > to
> > > > >> > > > > > > > deprecate the subscribe API with Pattern, but instead 
> > > > >> > > > > > > > we
> > > > >> still
> > > > >> > > use
> > > > >> > > > > > > Pattern
> > > > >> > > > > > > > while just documenting that our subscription may be
> > > > >> rejected by
> > > > >> > > the
> > > > >> > > > > > > server.
> > > > >> > > > > > > > Since the incompatible case is a very rare scenario I 
> > > > >> > > > > > > > felt
> > > > >> > using
> > > > >> > > an
> > > > >> > > > > > > > overloaded `String` based subscription may be more
> > > > >> vulnerable
> > > > >> > to
> > > > >> > > > > various
> > > > >> > > > > > > > invalid regexes.
> > > > >> > > > > > >
> > > > >> > > > > > > That could work. I have to look at the differences 
> > > > >> > > > > > > between
> > > > the
> > > > >> > two
> > > > >> > > > > > > engines to better understand the potential issues. My
> > > > >> > > understanding is
> > > > >> > > > > > > that would work for all the basic regular expressions. 
> > > > >> > > > > > > The
> > > > >> > > differences
> > > > >> > > > > > > between the two are mainly about the various character
> > > > >> classes. I
> > > > >> > > > > > > wonder what other people think about this.
> > > > >> > > > > > >
> > > > >> > > > > > > Best,
> > > > >> > > > > > > David
> > > > >> > > > > > >
> > > > >> > > > > > > On Tue, Jul 12, 2022 at 11:28 PM Guozhang Wang <
> > > > >> > wangg...@gmail.com
> > > > >> > > >
> > > > >> > > > > wrote:
> > > > >> > > > > > > >
> > > > >> > > > > > > > Thanks David! I think on the high level there are two 
> > > > >> > > > > > > > meta
> > > > >> > > points we
> > > > >> > > > > need
> > > > >> > > > > > > > to concretize a bit more:
> > > > >> > > > > > > >
> > > > >> > > > > > > > 1) the migration path, especially the last step when
> > > > clients
> > > > >> > > flip the
> > > > >> > > > > > > flag
> > > > >> > > > > > > > to enable the new protocol, in which we would have a
> > > > window
> > > > >> > where
> > > > >> > > > > both
> > > > >> > > > > > > new
> > > > >> > > > > > > > protocols / rpcs and old protocols / rpcs are used by
> > > > >> members
> > > > >> > of
> > > > >> > > the
> > > > >> > > > > same
> > > > >> > > > > > > > group. How the coordinator could "mimic" the old 
> > > > >> > > > > > > > behavior
> > > > >> while
> > > > >> > > > > using the
> > > > >> > > > > > > > new protocol is something we need to present about.
> > > > >> > > > > > > > 2) the usage of topic ids. So far as KIP-516 the topic 
> > > > >> > > > > > > > ids
> > > > >> are
> > > > >> > > only
> > > > >> > > > > used
> > > > >> > > > > > > as
> > > > >> > > > > > > > part of RPCs and admin client, but they are not exposed
> > > > via
> > > > >> any
> > > > >> > > > > public
> > > > >> > > > > > > APIs
> > > > >> > > > > > > > to consumers yet. I think the question is, first 
> > > > >> > > > > > > > should we
> > > > >> let
> > > > >> > > the
> > > > >> > > > > > > consumer
> > > > >> > > > > > > > client to be maintaining the names -> ids mapping 
> > > > >> > > > > > > > itself
> > > > to
> > > > >> > fully
> > > > >> > > > > > > leverage
> > > > >> > > > > > > > on all the augmented existing RPCs and the new RPCs 
> > > > >> > > > > > > > with
> > > > the
> > > > >> > > topic
> > > > >> > > > > ids;
> > > > >> > > > > > > and
> > > > >> > > > > > > > secondly, should we ever consider exposing the topic 
> > > > >> > > > > > > > ids
> > > > in
> > > > >> the
> > > > >> > > > > consumer
> > > > >> > > > > > > > public APIs as well (both subscribe/assign, as well as 
> > > > >> > > > > > > > in
> > > > >> the
> > > > >> > > > > rebalance
> > > > >> > > > > > > > listener for cases like topic deletion-and-recreation).
> > > > >> > > > > > > >
> > > > >> > > > > > > > I'm agreeing with David on all other minor questions
> > > > except
> > > > >> for
> > > > >> > > the
> > > > >> > > > > > > > `subscribe(Pattern)` question: personally I think it's 
> > > > >> > > > > > > > not
> > > > >> > > necessary
> > > > >> > > > > to
> > > > >> > > > > > > > deprecate the subscribe API with Pattern, but instead 
> > > > >> > > > > > > > we
> > > > >> still
> > > > >> > > use
> > > > >> > > > > > > Pattern
> > > > >> > > > > > > > while just documenting that our subscription may be
> > > > >> rejected by
> > > > >> > > the
> > > > >> > > > > > > server.
> > > > >> > > > > > > > Since the incompatible case is a very rare scenario I 
> > > > >> > > > > > > > felt
> > > > >> > using
> > > > >> > > an
> > > > >> > > > > > > > overloaded `String` based subscription may be more
> > > > >> vulnerable
> > > > >> > to
> > > > >> > > > > various
> > > > >> > > > > > > > invalid regexes.
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > > Guozhang
> > > > >> > > > > > > >
> > > > >> > > > > > > > On Tue, Jul 12, 2022 at 5:23 AM David Jacot
> > > > >> > > > > <dja...@confluent.io.invalid
> > > > >> > > > > > > >
> > > > >> > > > > > > > wrote:
> > > > >> > > > > > > >
> > > > >> > > > > > > > > Hi Ismael,
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Thanks for your feedback. Let me answer your 
> > > > >> > > > > > > > > questions
> > > > >> > inline.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > 1. I think it's premature to talk about target
> > > > versions
> > > > >> for
> > > > >> > > > > > > deprecation
> > > > >> > > > > > > > > and
> > > > >> > > > > > > > > > removal of the existing group protocol. Unlike 
> > > > >> > > > > > > > > > KRaft,
> > > > >> this
> > > > >> > > > > affects a
> > > > >> > > > > > > core
> > > > >> > > > > > > > > > client protocol and hence deprecation/removal will 
> > > > >> > > > > > > > > > be
> > > > >> > heavily
> > > > >> > > > > > > dependent
> > > > >> > > > > > > > > on
> > > > >> > > > > > > > > > how quickly applications migrate to the new 
> > > > >> > > > > > > > > > protocol.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > That makes sense. I will remove it.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > 2. The KIP says we intend to release this in 4.x, 
> > > > >> > > > > > > > > > but
> > > > it
> > > > >> > > wasn't
> > > > >> > > > > made
> > > > >> > > > > > > > > clear
> > > > >> > > > > > > > > > why. If we added that as a way to estimate when 
> > > > >> > > > > > > > > > we'd
> > > > >> > > deprecate
> > > > >> > > > > and
> > > > >> > > > > > > remove
> > > > >> > > > > > > > > > the group protocol, I also suggest removing this 
> > > > >> > > > > > > > > > part.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Let me explain my reasoning. As explained, I plan to
> > > > >> rewrite
> > > > >> > > the
> > > > >> > > > > group
> > > > >> > > > > > > > > coordinator in Java while we implement the new 
> > > > >> > > > > > > > > protocol.
> > > > >> This
> > > > >> > > means
> > > > >> > > > > > > > > that the internals will be slightly different (e.g.
> > > > >> threading
> > > > >> > > > > model).
> > > > >> > > > > > > > > Therefore, I wanted to tighten the switch from the 
> > > > >> > > > > > > > > old
> > > > >> group
> > > > >> > > > > > > > > coordinator to the new group coordinator to a major
> > > > >> release.
> > > > >> > > The
> > > > >> > > > > > > > > alternative would be to use a flag to do the switch
> > > > >> instead
> > > > >> > of
> > > > >> > > > > relying
> > > > >> > > > > > > > > on the software upgrade.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > 3. We need to flesh out the details of the 
> > > > >> > > > > > > > > > migration
> > > > >> story.
> > > > >> > > It
> > > > >> > > > > sounds
> > > > >> > > > > > > > > like
> > > > >> > > > > > > > > > we're saying we will support online migrations. Is
> > > > that
> > > > >> > > correct?
> > > > >> > > > > We
> > > > >> > > > > > > > > should
> > > > >> > > > > > > > > > explain this in detail. It could also be done as a
> > > > >> separate
> > > > >> > > KIP,
> > > > >> > > > > if
> > > > >> > > > > > > it's
> > > > >> > > > > > > > > > easier.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Yes, we will support online migrations for the group.
> > > > That
> > > > >> > > means
> > > > >> > > > > that
> > > > >> > > > > > > > > a group using the old protocol will be able to 
> > > > >> > > > > > > > > switch to
> > > > >> the
> > > > >> > > new
> > > > >> > > > > > > > > protocol.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Let me briefly explain how that will work though. It 
> > > > >> > > > > > > > > is
> > > > >> > > basically a
> > > > >> > > > > > > > > four step process:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > 1. The cluster must be upgraded or rolled to a 
> > > > >> > > > > > > > > software
> > > > >> > > supporting
> > > > >> > > > > the
> > > > >> > > > > > > > > new group coordinator. Both the old and the new
> > > > >> coordinator
> > > > >> > > will
> > > > >> > > > > > > > > support the old protocol and rely on the same 
> > > > >> > > > > > > > > persisted
> > > > >> > > metadata so
> > > > >> > > > > > > > > they can work together. This point is an offline
> > > > >> migration.
> > > > >> > We
> > > > >> > > > > cannot
> > > > >> > > > > > > > > do this one live because it would require shutting 
> > > > >> > > > > > > > > down
> > > > >> the
> > > > >> > > current
> > > > >> > > > > > > > > coordinator and starting up the new one and that 
> > > > >> > > > > > > > > would
> > > > >> cause
> > > > >> > > > > > > > > unavailabilities.
> > > > >> > > > > > > > > 2. The cluster's metadata version/IBP must be 
> > > > >> > > > > > > > > upgraded
> > > > to
> > > > >> X
> > > > >> > in
> > > > >> > > > > order
> > > > >> > > > > > > > > to enable the new protocol. This cannot be done 
> > > > >> > > > > > > > > before
> > > > 1)
> > > > >> is
> > > > >> > > > > > > > > terminated because the old coordinator doesn't 
> > > > >> > > > > > > > > support
> > > > the
> > > > >> > new
> > > > >> > > > > > > > > protocol.
> > > > >> > > > > > > > > 3. The consumers must be upgraded to a version
> > > > supporting
> > > > >> the
> > > > >> > > > > online
> > > > >> > > > > > > > > migration (must have KIP-792). If the consumer is
> > > > already
> > > > >> > > there.
> > > > >> > > > > > > > > Nothing must be done at this point.
> > > > >> > > > > > > > > 4. The consumers must be rolled with the feature flag
> > > > >> turned
> > > > >> > > on.
> > > > >> > > > > The
> > > > >> > > > > > > > > consumer group is automatically converted when the 
> > > > >> > > > > > > > > first
> > > > >> > > consumer
> > > > >> > > > > > > > > using the new protocol joins the group. While the
> > > > members
> > > > >> > > using the
> > > > >> > > > > > > > > old protocol are being upgraded, the old protocol is
> > > > >> proxied
> > > > >> > > into
> > > > >> > > > > the
> > > > >> > > > > > > > > new one.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Let me clarify all of this in the KIP.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > 4. I am happy that we are pushing the pattern
> > > > >> subscriptions
> > > > >> > > to
> > > > >> > > > > the
> > > > >> > > > > > > > > server,
> > > > >> > > > > > > > > > but it seems like there could be some tricky
> > > > >> compatibility
> > > > >> > > > > issues.
> > > > >> > > > > > > Will
> > > > >> > > > > > > > > we
> > > > >> > > > > > > > > > have a mechanism for users to detect that they 
> > > > >> > > > > > > > > > need to
> > > > >> > update
> > > > >> > > > > their
> > > > >> > > > > > > regex
> > > > >> > > > > > > > > > before switching to the new protocol?
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > I think that I am a bit more optimistic than you on 
> > > > >> > > > > > > > > this
> > > > >> > > point. I
> > > > >> > > > > > > > > believe that the majority of the cases are simple
> > > > regexes
> > > > >> > which
> > > > >> > > > > should
> > > > >> > > > > > > > > work with the new engine. The coordinator will verify
> > > > the
> > > > >> > regex
> > > > >> > > > > anyway
> > > > >> > > > > > > > > and reject the consumer if the regex is not valid.
> > > > Coming
> > > > >> > back
> > > > >> > > to
> > > > >> > > > > the
> > > > >> > > > > > > > > migration path, in the worst case, the first upgraded
> > > > >> > consumer
> > > > >> > > > > joining
> > > > >> > > > > > > > > the group will be rejected. This should be used as 
> > > > >> > > > > > > > > the
> > > > >> last
> > > > >> > > > > defence, I
> > > > >> > > > > > > > > would say.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > One way for customers to validate their regex before
> > > > >> > upgrading
> > > > >> > > > > their
> > > > >> > > > > > > > > prod would be to test them with another group. For
> > > > >> instance,
> > > > >> > > that
> > > > >> > > > > > > > > could be done in a pre-prod environment. Another way
> > > > >> would be
> > > > >> > > to
> > > > >> > > > > > > > > extend the consumer-group tool to provide a regex
> > > > >> validation
> > > > >> > > > > > > > > mechanism. Would this be enough in your opinion?
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > 5. Related to the last question, will the Java 
> > > > >> > > > > > > > > > client
> > > > >> allow
> > > > >> > > the
> > > > >> > > > > > > users to
> > > > >> > > > > > > > > > stick with the current regex engine for 
> > > > >> > > > > > > > > > compatibility
> > > > >> > > reasons?
> > > > >> > > > > For
> > > > >> > > > > > > > > example,
> > > > >> > > > > > > > > > it may be handy to keep using client based regex at
> > > > >> first
> > > > >> > to
> > > > >> > > keep
> > > > >> > > > > > > > > > migrations simple and then migrate to server based
> > > > >> regexes
> > > > >> > > as a
> > > > >> > > > > > > second
> > > > >> > > > > > > > > step.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > I understand your point but I am concerned that this
> > > > would
> > > > >> > > allow
> > > > >> > > > > users
> > > > >> > > > > > > > > to actually stay in this mode. That would go against 
> > > > >> > > > > > > > > our
> > > > >> goal
> > > > >> > > of
> > > > >> > > > > > > > > simplifying the client because we would have to 
> > > > >> > > > > > > > > continue
> > > > >> > > monitoring
> > > > >> > > > > > > > > the metadata on the client side. I would rather not 
> > > > >> > > > > > > > > do
> > > > >> this.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > 6. When we say that the group coordinator will be
> > > > >> > > responsible for
> > > > >> > > > > > > storing
> > > > >> > > > > > > > > > the configurations and that the configurations 
> > > > >> > > > > > > > > > will be
> > > > >> > > deleted
> > > > >> > > > > when
> > > > >> > > > > > > the
> > > > >> > > > > > > > > > group is deleted. Will a transition to DEAD trigger
> > > > >> > deletion
> > > > >> > > of
> > > > >> > > > > > > > > > configurations?
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > That's right. The configurations will be deleted when
> > > > the
> > > > >> > > group is
> > > > >> > > > > > > > > deleted. They go together.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > 7. Will the choice to store the configs in the 
> > > > >> > > > > > > > > > group
> > > > >> > > coordinator
> > > > >> > > > > > > make it
> > > > >> > > > > > > > > > harder to list all cluster configs and their 
> > > > >> > > > > > > > > > values?
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > I don't think so. The group configurations are 
> > > > >> > > > > > > > > overrides
> > > > >> of
> > > > >> > > cluster
> > > > >> > > > > > > > > configs. If you want to know all the overrides 
> > > > >> > > > > > > > > though,
> > > > you
> > > > >> > > would
> > > > >> > > > > have
> > > > >> > > > > > > > > to ask all the group coordinators. You cannot rely on
> > > > the
> > > > >> > > metadata
> > > > >> > > > > log
> > > > >> > > > > > > > > for instance.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > 8. How would someone configure a group before 
> > > > >> > > > > > > > > > starting
> > > > >> the
> > > > >> > > > > consumers?
> > > > >> > > > > > > > > Have
> > > > >> > > > > > > > > > we considered allowing the explicit creation of
> > > > groups?
> > > > >> > > > > > > Alternatively,
> > > > >> > > > > > > > > the
> > > > >> > > > > > > > > > configs could be decoupled from the group 
> > > > >> > > > > > > > > > lifecycle.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Yes. The group will be automatically created in this
> > > > case.
> > > > >> > > However,
> > > > >> > > > > > > > > the configs will be lost after the retention period 
> > > > >> > > > > > > > > of
> > > > the
> > > > >> > > group
> > > > >> > > > > > > > > passes.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > 9. Will the Consumer.subscribe method for the Java
> > > > >> client
> > > > >> > > still
> > > > >> > > > > take
> > > > >> > > > > > > a
> > > > >> > > > > > > > > > `java.util.regex.Pattern` of do we have to 
> > > > >> > > > > > > > > > introduce
> > > > an
> > > > >> > > overload?
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > That's a very group question. I forgot about that 
> > > > >> > > > > > > > > one.
> > > > As
> > > > >> the
> > > > >> > > > > > > > > `java.util.regex.Pattern` is not fully compatible 
> > > > >> > > > > > > > > with
> > > > the
> > > > >> > > engine
> > > > >> > > > > that
> > > > >> > > > > > > > > we plan to use, it might be better to deprecate it 
> > > > >> > > > > > > > > and
> > > > >> use an
> > > > >> > > > > overload
> > > > >> > > > > > > > > which takes a string. We would rely on the server 
> > > > >> > > > > > > > > side
> > > > >> > > validation.
> > > > >> > > > > > > > > During the migration, I think that we could still 
> > > > >> > > > > > > > > try to
> > > > >> > > toString
> > > > >> > > > > the
> > > > >> > > > > > > > > regex and use it. That should work, I think, in the
> > > > >> majority
> > > > >> > > of the
> > > > >> > > > > > > > > cases.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > 10. I agree with Justine that we should be clearer
> > > > about
> > > > >> > the
> > > > >> > > > > reason
> > > > >> > > > > > > to
> > > > >> > > > > > > > > > switch to IBP/metadata.version from the feature 
> > > > >> > > > > > > > > > flag.
> > > > >> Maybe
> > > > >> > > we
> > > > >> > > > > mean
> > > > >> > > > > > > that
> > > > >> > > > > > > > > we
> > > > >> > > > > > > > > > can switch the default for the feature flag to true
> > > > >> based
> > > > >> > on
> > > > >> > > the
> > > > >> > > > > > > > > > metadata.version once we want to make it the 
> > > > >> > > > > > > > > > default.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > My plan was to use that feature flag mainly during 
> > > > >> > > > > > > > > the
> > > > >> > > development
> > > > >> > > > > > > > > phase. I should not have mentioned it, I think, 
> > > > >> > > > > > > > > because
> > > > we
> > > > >> > > could
> > > > >> > > > > use
> > > > >> > > > > > > > > an internal config for it.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > 11. Some of the protocol APIs don't mention the
> > > > required
> > > > >> > > ACLs, it
> > > > >> > > > > > > would
> > > > >> > > > > > > > > be
> > > > >> > > > > > > > > > good to add that for consistency.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Noted.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > 12. It is a bit odd that ConsumerGroupHeartbeat
> > > > requires
> > > > >> > > "Read
> > > > >> > > > > Group"
> > > > >> > > > > > > > > even
> > > > >> > > > > > > > > > though it seems to do more than reading.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > I agree. This is how the current protocol works 
> > > > >> > > > > > > > > though.
> > > > We
> > > > >> > only
> > > > >> > > > > > > > > require "Read Group" to join a group. We could 
> > > > >> > > > > > > > > consider
> > > > >> > > changing
> > > > >> > > > > this
> > > > >> > > > > > > > > but I am not sure that it is worth it.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > 13. How is topic recreation handled by the consumer
> > > > with
> > > > >> > the
> > > > >> > > new
> > > > >> > > > > > > group
> > > > >> > > > > > > > > > protocol? It would be good to have a section on 
> > > > >> > > > > > > > > > this.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Noted. From a protocol perspective, the new topic 
> > > > >> > > > > > > > > will
> > > > >> have a
> > > > >> > > new
> > > > >> > > > > > > > > topic id so it will treat it like a topic with a
> > > > different
> > > > >> > > name.
> > > > >> > > > > The
> > > > >> > > > > > > > > only issue is that the fetch/commit offsets APIs do 
> > > > >> > > > > > > > > not
> > > > >> > support
> > > > >> > > > > topic
> > > > >> > > > > > > > > IDs so the consumer would reuse the offsets based on 
> > > > >> > > > > > > > > the
> > > > >> > same.
> > > > >> > > I
> > > > >> > > > > think
> > > > >> > > > > > > > > that we should update those APIs as well in order to 
> > > > >> > > > > > > > > be
> > > > >> > > consistent
> > > > >> > > > > end
> > > > >> > > > > > > > > to end. That would strengthen the semantics of the
> > > > >> consumer.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > 14. The KIP mentions we will write the new 
> > > > >> > > > > > > > > > coordinator
> > > > >> in
> > > > >> > > Java.
> > > > >> > > > > Even
> > > > >> > > > > > > > > though
> > > > >> > > > > > > > > > this is an implementation detail, do we plan to 
> > > > >> > > > > > > > > > have a
> > > > >> new
> > > > >> > > gradle
> > > > >> > > > > > > module
> > > > >> > > > > > > > > > for it?
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Yes.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > 15. Do we have a scalability goal when it comes to 
> > > > >> > > > > > > > > > how
> > > > >> many
> > > > >> > > > > members
> > > > >> > > > > > > the
> > > > >> > > > > > > > > new
> > > > >> > > > > > > > > > group protocol can support?
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > We don't have numbers at the moment. The protocol 
> > > > >> > > > > > > > > should
> > > > >> > > support
> > > > >> > > > > 1000s
> > > > >> > > > > > > > > of members per group. We will measure this when we 
> > > > >> > > > > > > > > have
> > > > a
> > > > >> > first
> > > > >> > > > > > > > > implementation. Note that we might have other
> > > > bottlenecks
> > > > >> > down
> > > > >> > > the
> > > > >> > > > > > > > > road (e.g. offset commits).
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > 16. Did we consider having SubscribedTopidIds 
> > > > >> > > > > > > > > > instead
> > > > of
> > > > >> > > > > > > > > > SubscribedTopicNames in 
> > > > >> > > > > > > > > > ConsumerGroupHeartbeatRequest?
> > > > >> Is
> > > > >> > the
> > > > >> > > > > idea
> > > > >> > > > > > > that
> > > > >> > > > > > > > > > since we have to resolve the regex on the server, 
> > > > >> > > > > > > > > > we
> > > > >> can do
> > > > >> > > the
> > > > >> > > > > same
> > > > >> > > > > > > for
> > > > >> > > > > > > > > > the topic name? The difference is that sending the
> > > > >> regex is
> > > > >> > > more
> > > > >> > > > > > > > > efficient
> > > > >> > > > > > > > > > whereas sending the topic names is less efficient.
> > > > >> > > Furthermore,
> > > > >> > > > > > > delete
> > > > >> > > > > > > > > and
> > > > >> > > > > > > > > > recreation is easier to handle if we have topic 
> > > > >> > > > > > > > > > ids.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > The idea was to consolidate the metadata lookup on 
> > > > >> > > > > > > > > the
> > > > >> server
> > > > >> > > for
> > > > >> > > > > both
> > > > >> > > > > > > > > paths but I do agree with your point. As a second
> > > > though,
> > > > >> > using
> > > > >> > > > > topic
> > > > >> > > > > > > > > ids may be better here for the delete and recreation
> > > > case.
> > > > >> > > Also, I
> > > > >> > > > > > > > > suppose that we may allow users to subscribe with 
> > > > >> > > > > > > > > topic
> > > > >> ids
> > > > >> > in
> > > > >> > > the
> > > > >> > > > > > > > > future because that is the only way to be really 
> > > > >> > > > > > > > > robust
> > > > to
> > > > >> > > topic
> > > > >> > > > > > > > > re-creation.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Best,
> > > > >> > > > > > > > > David
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > On Tue, Jul 12, 2022 at 1:38 PM David Jacot <
> > > > >> > > dja...@confluent.io>
> > > > >> > > > > > > wrote:
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Hi Justine,
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Thanks for your comments. Please find my answers
> > > > below.
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > - Yes, the new protocol relies on topic IDs with 
> > > > >> > > > > > > > > > the
> > > > >> > > exception
> > > > >> > > > > of the
> > > > >> > > > > > > > > > topic names based in the
> > > > ConsumerGroupHeartbeatRequest.
> > > > >> I
> > > > >> > am
> > > > >> > > not
> > > > >> > > > > sure
> > > > >> > > > > > > > > > if using topic names is the right call here. I 
> > > > >> > > > > > > > > > need to
> > > > >> > think
> > > > >> > > > > about it
> > > > >> > > > > > > > > > a little more. Obviously, the KIP does not change 
> > > > >> > > > > > > > > > the
> > > > >> > > > > fetch/commit
> > > > >> > > > > > > > > > offsets RPCs to use topic IDs. This may be 
> > > > >> > > > > > > > > > something
> > > > >> that
> > > > >> > we
> > > > >> > > > > should
> > > > >> > > > > > > > > > include though as it would give better overall
> > > > >> guarantee in
> > > > >> > > the
> > > > >> > > > > > > > > > producer.
> > > > >> > > > > > > > > > - You're right. I think that I should not have
> > > > mentioned
> > > > >> > this
> > > > >> > > > > flag at
> > > > >> > > > > > > > > > all. I will remove it. We can use an internal
> > > > >> configuration
> > > > >> > > while
> > > > >> > > > > > > > > > developing the feature.
> > > > >> > > > > > > > > > - Both cluster types will be supported. The change 
> > > > >> > > > > > > > > > is
> > > > >> > > > > orthogonal. The
> > > > >> > > > > > > > > > only requirement is that the cluster uses topic 
> > > > >> > > > > > > > > > IDs.
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Best,
> > > > >> > > > > > > > > > David
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > On Mon, Jul 11, 2022 at 9:53 PM Guozhang Wang <
> > > > >> > > > > wangg...@gmail.com>
> > > > >> > > > > > > > > wrote:
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Hi Ismael,
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Thanks for the feedback. Here are some replies
> > > > inlined
> > > > >> > > below:
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > On Sat, Jul 9, 2022 at 2:53 AM Ismael Juma <
> > > > >> > > ism...@juma.me.uk>
> > > > >> > > > > > > wrote:
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > Thanks for the KIP. This has the potential to 
> > > > >> > > > > > > > > > > > be a
> > > > >> > great
> > > > >> > > > > > > > > improvement. A few
> > > > >> > > > > > > > > > > > initial questions/comments:
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > 1. I think it's premature to talk about target
> > > > >> versions
> > > > >> > > for
> > > > >> > > > > > > > > deprecation and
> > > > >> > > > > > > > > > > > removal of the existing group protocol. Unlike
> > > > >> KRaft,
> > > > >> > > this
> > > > >> > > > > > > affects a
> > > > >> > > > > > > > > core
> > > > >> > > > > > > > > > > > client protocol and hence deprecation/removal 
> > > > >> > > > > > > > > > > > will
> > > > >> be
> > > > >> > > heavily
> > > > >> > > > > > > > > dependent on
> > > > >> > > > > > > > > > > > how quickly applications migrate to the new
> > > > >> protocol.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Yeah I agree with you. I think we can remove the
> > > > >> proposed
> > > > >> > > > > timeline
> > > > >> > > > > > > in
> > > > >> > > > > > > > > the
> > > > >> > > > > > > > > > > `Compatibility, Deprecation, and Migration Plan` 
> > > > >> > > > > > > > > > > and
> > > > >> > > instead
> > > > >> > > > > just
> > > > >> > > > > > > state
> > > > >> > > > > > > > > > > that we will decide in the future about when we
> > > > would
> > > > >> > > > > deprecate old
> > > > >> > > > > > > > > > > protocol and behaviors.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > 2. The KIP says we intend to release this in 
> > > > >> > > > > > > > > > > > 4.x,
> > > > >> but
> > > > >> > it
> > > > >> > > > > wasn't
> > > > >> > > > > > > made
> > > > >> > > > > > > > > clear
> > > > >> > > > > > > > > > > > why. If we added that as a way to estimate when
> > > > we'd
> > > > >> > > > > deprecate
> > > > >> > > > > > > and
> > > > >> > > > > > > > > remove
> > > > >> > > > > > > > > > > > the group protocol, I also suggest removing 
> > > > >> > > > > > > > > > > > this
> > > > >> part.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > I think that's not specifically related to the
> > > > >> > > > > deprecation/removal
> > > > >> > > > > > > > > timeline
> > > > >> > > > > > > > > > > plan, but it's more for client upgrades. I.e. the
> > > > >> > > broker-side
> > > > >> > > > > > > > > > > implementation may be done first, and then the
> > > > client
> > > > >> > side,
> > > > >> > > > > and we
> > > > >> > > > > > > > > would
> > > > >> > > > > > > > > > > only mark it as "released" by the time clients
> > > > >> > > implementations
> > > > >> > > > > are
> > > > >> > > > > > > > > done. At
> > > > >> > > > > > > > > > > that time, to enable the feature the clients 
> > > > >> > > > > > > > > > > need to
> > > > >> > first
> > > > >> > > > > swap-in
> > > > >> > > > > > > the
> > > > >> > > > > > > > > > > bytecode with a rolling bounce and then set the 
> > > > >> > > > > > > > > > > flag
> > > > >> > with a
> > > > >> > > > > second
> > > > >> > > > > > > > > rolling
> > > > >> > > > > > > > > > > bounce, and hence we feel it's better to be 
> > > > >> > > > > > > > > > > released
> > > > >> in a
> > > > >> > > major
> > > > >> > > > > > > > > version.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > 3. We need to flesh out the details of the
> > > > migration
> > > > >> > > story.
> > > > >> > > > > It
> > > > >> > > > > > > > > sounds like
> > > > >> > > > > > > > > > > > we're saying we will support online 
> > > > >> > > > > > > > > > > > migrations. Is
> > > > >> that
> > > > >> > > > > correct?
> > > > >> > > > > > > We
> > > > >> > > > > > > > > should
> > > > >> > > > > > > > > > > > explain this in detail. It could also be done 
> > > > >> > > > > > > > > > > > as a
> > > > >> > > separate
> > > > >> > > > > KIP,
> > > > >> > > > > > > if
> > > > >> > > > > > > > > it's
> > > > >> > > > > > > > > > > > easier.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Yes I think that's the part we can be more 
> > > > >> > > > > > > > > > > concrete
> > > > >> about
> > > > >> > > for
> > > > >> > > > > sure
> > > > >> > > > > > > (and
> > > > >> > > > > > > > > > > this is related to your question 2) above). We 
> > > > >> > > > > > > > > > > will
> > > > >> work
> > > > >> > on
> > > > >> > > > > making
> > > > >> > > > > > > it
> > > > >> > > > > > > > > more
> > > > >> > > > > > > > > > > explicit in parallel as we solicit more feedback.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > 4. I am happy that we are pushing the pattern
> > > > >> > > subscriptions
> > > > >> > > > > to
> > > > >> > > > > > > the
> > > > >> > > > > > > > > server,
> > > > >> > > > > > > > > > > > but it seems like there could be some tricky
> > > > >> > > compatibility
> > > > >> > > > > > > issues.
> > > > >> > > > > > > > > Will we
> > > > >> > > > > > > > > > > > have a mechanism for users to detect that they
> > > > need
> > > > >> to
> > > > >> > > update
> > > > >> > > > > > > their
> > > > >> > > > > > > > > regex
> > > > >> > > > > > > > > > > > before switching to the new protocol?
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Yes I think we need some tooling for non-java 
> > > > >> > > > > > > > > > > client
> > > > >> > users
> > > > >> > > to
> > > > >> > > > > sort
> > > > >> > > > > > > of
> > > > >> > > > > > > > > > > "dry-run" the client before switching to the new
> > > > >> > protocol.
> > > > >> > > I
> > > > >> > > > > do not
> > > > >> > > > > > > > > have a
> > > > >> > > > > > > > > > > specific idea on top of my head though, maybe 
> > > > >> > > > > > > > > > > others
> > > > >> like
> > > > >> > > @Matt
> > > > >> > > > > > > > > Howlett can
> > > > >> > > > > > > > > > > chime-in here?
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > 5. Related to the last question, will the Java
> > > > >> client
> > > > >> > > allow
> > > > >> > > > > the
> > > > >> > > > > > > > > users to
> > > > >> > > > > > > > > > > > stick with the current regex engine for
> > > > >> compatibility
> > > > >> > > > > reasons?
> > > > >> > > > > > > For
> > > > >> > > > > > > > > example,
> > > > >> > > > > > > > > > > > it may be handy to keep using client based 
> > > > >> > > > > > > > > > > > regex
> > > > at
> > > > >> > > first to
> > > > >> > > > > keep
> > > > >> > > > > > > > > > > > migrations simple and then migrate to server 
> > > > >> > > > > > > > > > > > based
> > > > >> > > regexes
> > > > >> > > > > as a
> > > > >> > > > > > > > > second
> > > > >> > > > > > > > > > > > step.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Honestly I have not thought about that for java
> > > > >> clients,
> > > > >> > > and
> > > > >> > > > > we can
> > > > >> > > > > > > > > discuss
> > > > >> > > > > > > > > > > that. What kind of compatibility issues do you 
> > > > >> > > > > > > > > > > have
> > > > in
> > > > >> > > mind?
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > 6. When we say that the group coordinator will 
> > > > >> > > > > > > > > > > > be
> > > > >> > > > > responsible for
> > > > >> > > > > > > > > storing
> > > > >> > > > > > > > > > > > the configurations and that the configurations
> > > > will
> > > > >> be
> > > > >> > > > > deleted
> > > > >> > > > > > > when
> > > > >> > > > > > > > > the
> > > > >> > > > > > > > > > > > group is deleted. Will a transition to DEAD
> > > > trigger
> > > > >> > > deletion
> > > > >> > > > > of
> > > > >> > > > > > > > > > > > configurations?
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Yes, since the DEAD state is an ending state (we
> > > > would
> > > > >> > only
> > > > >> > > > > > > transit to
> > > > >> > > > > > > > > that
> > > > >> > > > > > > > > > > state when the group is EMPTY and also all of its
> > > > >> > metadata
> > > > >> > > are
> > > > >> > > > > > > gone),
> > > > >> > > > > > > > > once
> > > > >> > > > > > > > > > > it's transited to DEAD this group would never be
> > > > >> revived.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > 7. Will the choice to store the configs in the
> > > > group
> > > > >> > > > > coordinator
> > > > >> > > > > > > > > make it
> > > > >> > > > > > > > > > > > harder to list all cluster configs and their
> > > > values?
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > That's a good question, and our thoughts are that
> > > > the
> > > > >> > > so-called
> > > > >> > > > > > > "group
> > > > >> > > > > > > > > > > configurations" are overrides of the 
> > > > >> > > > > > > > > > > cluster-level
> > > > >> > > > > configurations
> > > > >> > > > > > > > > > > customized per group so when an admin list 
> > > > >> > > > > > > > > > > cluster
> > > > >> > configs
> > > > >> > > it's
> > > > >> > > > > > > okay to
> > > > >> > > > > > > > > > > list just the cluster-level "defaults", not 
> > > > >> > > > > > > > > > > showing
> > > > >> any
> > > > >> > > > > per-group
> > > > >> > > > > > > > > > > customizations.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > 8. How would someone configure a group before
> > > > >> starting
> > > > >> > > the
> > > > >> > > > > > > > > consumers? Have
> > > > >> > > > > > > > > > > > we considered allowing the explicit creation of
> > > > >> groups?
> > > > >> > > > > > > > > Alternatively, the
> > > > >> > > > > > > > > > > > configs could be decoupled from the group
> > > > lifecycle.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > The configs can be created before the group 
> > > > >> > > > > > > > > > > itself
> > > > as
> > > > >> an
> > > > >> > > > > > > independent
> > > > >> > > > > > > > > entity
> > > > >> > > > > > > > > > > --- of course, this requires the corresponding
> > > > >> request to
> > > > >> > > be
> > > > >> > > > > > > routed to
> > > > >> > > > > > > > > the
> > > > >> > > > > > > > > > > right coordinator based on the group id --- the 
> > > > >> > > > > > > > > > > only
> > > > >> > thing
> > > > >> > > that
> > > > >> > > > > > > > > differs is,
> > > > >> > > > > > > > > > > when the group itself is gone we also check if 
> > > > >> > > > > > > > > > > there
> > > > >> are
> > > > >> > > any
> > > > >> > > > > > > > > configuration
> > > > >> > > > > > > > > > > entities related to that group and delete as 
> > > > >> > > > > > > > > > > well.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Admittedly this indeed introduces an asymmetry on
> > > > the
> > > > >> > > creation
> > > > >> > > > > /
> > > > >> > > > > > > > > deletion
> > > > >> > > > > > > > > > > lifecycles of the config entities, and we would 
> > > > >> > > > > > > > > > > like
> > > > >> to
> > > > >> > > hear
> > > > >> > > > > > > everyone's
> > > > >> > > > > > > > > > > feelings whether we should aim for symmetry i.e.
> > > > >> totally
> > > > >> > > > > decouple
> > > > >> > > > > > > group
> > > > >> > > > > > > > > > > configs and hence not delete them at all when the
> > > > >> group
> > > > >> > is
> > > > >> > > > > gone,
> > > > >> > > > > > > but
> > > > >> > > > > > > > > always
> > > > >> > > > > > > > > > > require explicit deletion operations by 
> > > > >> > > > > > > > > > > themselves.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > 9. Will the Consumer.subscribe method for the 
> > > > >> > > > > > > > > > > > Java
> > > > >> > client
> > > > >> > > > > still
> > > > >> > > > > > > take
> > > > >> > > > > > > > > a
> > > > >> > > > > > > > > > > > `java.util.regex.Pattern` of do we have to
> > > > >> introduce an
> > > > >> > > > > overload?
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > I think we do not need to introduce an overload, 
> > > > >> > > > > > > > > > > but
> > > > >> I'm
> > > > >> > > all
> > > > >> > > > > ears
> > > > >> > > > > > > if
> > > > >> > > > > > > > > there
> > > > >> > > > > > > > > > > may be some compatibility issues that we may
> > > > overlook.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > 10. I agree with Justine that we should be 
> > > > >> > > > > > > > > > > > clearer
> > > > >> > about
> > > > >> > > the
> > > > >> > > > > > > reason
> > > > >> > > > > > > > > to
> > > > >> > > > > > > > > > > > switch to IBP/metadata.version from the feature
> > > > >> flag.
> > > > >> > > Maybe
> > > > >> > > > > we
> > > > >> > > > > > > mean
> > > > >> > > > > > > > > that we
> > > > >> > > > > > > > > > > > can switch the default for the feature flag to
> > > > true
> > > > >> > > based on
> > > > >> > > > > the
> > > > >> > > > > > > > > > > > metadata.version once we want to make it the
> > > > >> default.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > 11. Some of the protocol APIs don't mention the
> > > > >> required
> > > > >> > > ACLs,
> > > > >> > > > > it
> > > > >> > > > > > > > > would be
> > > > >> > > > > > > > > > > > good to add that for consistency.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Ack, we can certainly do that.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > 12. It is a bit odd that ConsumerGroupHeartbeat
> > > > >> > requires
> > > > >> > > > > "Read
> > > > >> > > > > > > > > Group" even
> > > > >> > > > > > > > > > > > though it seems to do more than reading.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > I had that thought myself as well, but in the 
> > > > >> > > > > > > > > > > end we
> > > > >> > could
> > > > >> > > not
> > > > >> > > > > > > find a
> > > > >> > > > > > > > > > > better alternative: adding Write Group seems an
> > > > >> overkill
> > > > >> > > here
> > > > >> > > > > > > since we
> > > > >> > > > > > > > > do
> > > > >> > > > > > > > > > > not have it elsewhere (we only have Read / Delete
> > > > and
> > > > >> > > Describe
> > > > >> > > > > on
> > > > >> > > > > > > > > groups so
> > > > >> > > > > > > > > > > far). Would like to hear others thoughts.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > 13. How is topic recreation handled by the
> > > > consumer
> > > > >> > with
> > > > >> > > the
> > > > >> > > > > new
> > > > >> > > > > > > > > group
> > > > >> > > > > > > > > > > > protocol? It would be good to have a section on
> > > > >> this.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > You mean with regex subscription right? Yes we 
> > > > >> > > > > > > > > > > can
> > > > >> add a
> > > > >> > > > > section
> > > > >> > > > > > > about
> > > > >> > > > > > > > > > > that, but basically the idea is that consumer 
> > > > >> > > > > > > > > > > would
> > > > be
> > > > >> > > totally
> > > > >> > > > > > > > > agnostic in
> > > > >> > > > > > > > > > > the new protocol as it's handled all by the 
> > > > >> > > > > > > > > > > brokers.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > 14. The KIP mentions we will write the new
> > > > >> coordinator
> > > > >> > in
> > > > >> > > > > Java.
> > > > >> > > > > > > Even
> > > > >> > > > > > > > > though
> > > > >> > > > > > > > > > > > this is an implementation detail, do we plan to
> > > > >> have a
> > > > >> > > new
> > > > >> > > > > gradle
> > > > >> > > > > > > > > module
> > > > >> > > > > > > > > > > > for it?
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > We have not thought about that. But I think the
> > > > answer
> > > > >> > > should
> > > > >> > > > > be
> > > > >> > > > > > > yes.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > 15. Do we have a scalability goal when it 
> > > > >> > > > > > > > > > > > comes to
> > > > >> how
> > > > >> > > many
> > > > >> > > > > > > members
> > > > >> > > > > > > > > the new
> > > > >> > > > > > > > > > > > group protocol can support?
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Within a group, I think we should shoot for 
> > > > >> > > > > > > > > > > 1000s of
> > > > >> > > members.
> > > > >> > > > > But
> > > > >> > > > > > > that
> > > > >> > > > > > > > > > > scalability goals also depend on the offset
> > > > management
> > > > >> > > (commit,
> > > > >> > > > > > > fetch)
> > > > >> > > > > > > > > > > capabilities of the coordinator which we did not
> > > > >> cover in
> > > > >> > > this
> > > > >> > > > > > > KIP, so
> > > > >> > > > > > > > > it's
> > > > >> > > > > > > > > > > hard to give a number that applies universally.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > 16. Did we consider having SubscribedTopidIds
> > > > >> instead
> > > > >> > of
> > > > >> > > > > > > > > > > > SubscribedTopicNames in
> > > > >> ConsumerGroupHeartbeatRequest?
> > > > >> > > Is the
> > > > >> > > > > > > idea
> > > > >> > > > > > > > > that
> > > > >> > > > > > > > > > > > since we have to resolve the regex on the 
> > > > >> > > > > > > > > > > > server,
> > > > we
> > > > >> > can
> > > > >> > > do
> > > > >> > > > > the
> > > > >> > > > > > > same
> > > > >> > > > > > > > > for
> > > > >> > > > > > > > > > > > the topic name? The difference is that sending 
> > > > >> > > > > > > > > > > > the
> > > > >> > regex
> > > > >> > > is
> > > > >> > > > > more
> > > > >> > > > > > > > > efficient
> > > > >> > > > > > > > > > > > whereas sending the topic names is less 
> > > > >> > > > > > > > > > > > efficient.
> > > > >> > > > > Furthermore,
> > > > >> > > > > > > > > delete and
> > > > >> > > > > > > > > > > > recreation is easier to handle if we have topic
> > > > ids.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > The main reason to still let the clients send 
> > > > >> > > > > > > > > > > names
> > > > >> is to
> > > > >> > > keep
> > > > >> > > > > the
> > > > >> > > > > > > > > > > reasoning of names -> ids on the broker / admin
> > > > client
> > > > >> > > only.
> > > > >> > > > > Note
> > > > >> > > > > > > that
> > > > >> > > > > > > > > > > although we added topic id in KIP-516, we never
> > > > >> > > implemented the
> > > > >> > > > > > > logic
> > > > >> > > > > > > > > on
> > > > >> > > > > > > > > > > consumer/producers leveraging the related newer
> > > > >> versioned
> > > > >> > > RPCs,
> > > > >> > > > > > > > > instead we
> > > > >> > > > > > > > > > > just set the topic id as empty UUID. We want to 
> > > > >> > > > > > > > > > > keep
> > > > >> the
> > > > >> > > > > > > > > consumer/producer
> > > > >> > > > > > > > > > > to be thin and only delegate the reasoning on 
> > > > >> > > > > > > > > > > broker
> > > > >> and
> > > > >> > > > > > > potentially
> > > > >> > > > > > > > > admin
> > > > >> > > > > > > > > > > clients.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > Thanks,
> > > > >> > > > > > > > > > > > Ismael
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > On Wed, Jul 6, 2022 at 10:45 AM David Jacot
> > > > >> > > > > > > > > <dja...@confluent.io.invalid>
> > > > >> > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > Hi all,
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > I would like to start a discussion thread on
> > > > >> KIP-848:
> > > > >> > > The
> > > > >> > > > > Next
> > > > >> > > > > > > > > > > > > Generation of the Consumer Rebalance 
> > > > >> > > > > > > > > > > > > Protocol.
> > > > >> With
> > > > >> > > this
> > > > >> > > > > KIP,
> > > > >> > > > > > > we
> > > > >> > > > > > > > > aim
> > > > >> > > > > > > > > > > > > to make the rebalance protocol (for 
> > > > >> > > > > > > > > > > > > consumers)
> > > > >> more
> > > > >> > > > > reliable,
> > > > >> > > > > > > more
> > > > >> > > > > > > > > > > > > scalable, easier to implement for clients, 
> > > > >> > > > > > > > > > > > > and
> > > > >> easier
> > > > >> > > to
> > > > >> > > > > debug
> > > > >> > > > > > > for
> > > > >> > > > > > > > > > > > > operators.
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > The KIP is here:
> > > > >> > > > > https://cwiki.apache.org/confluence/x/HhD1D.
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > Please take a look and let me know what you
> > > > think.
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > Best,
> > > > >> > > > > > > > > > > > > David
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > PS: I will be away from July 18th to August 
> > > > >> > > > > > > > > > > > > 8th.
> > > > >> That
> > > > >> > > gives
> > > > >> > > > > > > you a
> > > > >> > > > > > > > > bit
> > > > >> > > > > > > > > > > > > of time to read and digest this long KIP.
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > --
> > > > >> > > > > > > > > > > -- Guozhang
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > > --
> > > > >> > > > > > > > -- Guozhang
> > > > >> > > > > > >
> > > > >> > > > >
> > > > >> > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >>
> > > > >> --
> > > > >> -- Guozhang
> > > > >>
> > > > >
> > > >

Reply via email to