Re: [DISCUSS] KIP-848: The Next Generation of the Consumer Rebalance Protocol

David Jacot Fri, 15 Jul 2022 05:04:44 -0700

Thanks, Ismael.

> 1. Regarding java.util.regex.Pattern vs com.google.re2j.Pattern, we should
> document the differences in more detail before deciding one way or another.
> That said, if people pass java.util.regex.Pattern, they expect their
> semantics to be honored. If we are doing something different, then we
> should consider adding an overload with our own Pattern class (I don't
> think we'd want to expose re2j's at this point).


Yeah. I do agree with you. I looked a bit more into re2j and the
differences are subtle. Simple regexes will work with both but special
constructs or character classes won't work. I like the idea of having
our own Pattern. That would make things really clear and avoid
compatibility issues. I do agree that we should not expose re2j's
Pattern in our public API.

> 2. Regarding topic ids, any major new protocol should integrate fully with
> it and should handle the topic recreation case correctly. That's the main
> part we need to handle. I agree with David that we'd want to add topic ids
> to the relevant protocols that don't have it yet and that we can probably
> focus on the internals versus adding new APIs to the Java Consumer (unless
> we find that adding new APIs is required for reasonable semantics).

I wonder what we should do with the admin client though. We have a
couple of apis related to offsets there. Do we want to allow users to
fetch, get or delete offsets based on topic ids? On the server side,
we will have to store the offsets based on the topic id. When a topic
is recreated, we will likely have offsets stored with both the old
topic id and the new topic id. From an admin perspective, we can't
really differentiate them without letting the user use the topic id
and providing the topic ids to him as well. One way would be to
disambiguate them based on the group current assignment. I need to
think a little more about this.

> 3. I am still not sure about the coordinator storing the configs. It's
> powerful for configs to be centralized in the metadata log for various
> reasons (auditability, visibility, consistency, etc.). Similarly, I am not
> sure about automatically deleting configs in a way that they cannot be
> recovered. A good property for modern systems is to minimize the number of
> unrecoverable data loss scenarios.

I see your point. One concern that I have regarding storing them in
the controller is that they will never be deleted unless the user does
it. As users consider groups as throwing away resources, it will never
happen. If we draw a parallel with topics, we do delete their configs
when they are deleted. It is also worth pointing out that offsets are
deleted when the group is expired and that is even worse than deleting
configs in my opinion. One way would be to store configs in the
controller and to let the coordinator delete them when a group is
deleted. I suppose that the problem is the same because the configs
are also automatically deleted. Things would be different if groups
were a first class resource in the cluster.

Best,
David

On Fri, Jul 15, 2022 at 1:30 PM David Jacot <dja...@confluent.io> wrote:
>
> Thanks Hector! Our goal is to move forward with specialized API
> instead of relying on one generic API. For Connect, we can apply the
> exact same pattern and reuse/share the core implementation on the
> server side. For the schema registry, I think that we should consider
> having a tailored API to do simple membership/leader election.
>
> Best,
> David
>
> On Fri, Jul 15, 2022 at 10:22 AM Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > Three quick comments:
> >
> > 1. Regarding java.util.regex.Pattern vs com.google.re2j.Pattern, we should
> > document the differences in more detail before deciding one way or another.
> > That said, if people pass java.util.regex.Pattern, they expect their
> > semantics to be honored. If we are doing something different, then we
> > should consider adding an overload with our own Pattern class (I don't
> > think we'd want to expose re2j's at this point).
> > 2. Regarding topic ids, any major new protocol should integrate fully with
> > it and should handle the topic recreation case correctly. That's the main
> > part we need to handle. I agree with David that we'd want to add topic ids
> > to the relevant protocols that don't have it yet and that we can probably
> > focus on the internals versus adding new APIs to the Java Consumer (unless
> > we find that adding new APIs is required for reasonable semantics).
> > 3. I am still not sure about the coordinator storing the configs. It's
> > powerful for configs to be centralized in the metadata log for various
> > reasons (auditability, visibility, consistency, etc.). Similarly, I am not
> > sure about automatically deleting configs in a way that they cannot be
> > recovered. A good property for modern systems is to minimize the number of
> > unrecoverable data loss scenarios.
> >
> > Ismael
> >
> > On Wed, Jul 13, 2022 at 3:47 PM David Jacot <dja...@confluent.io.invalid>
> > wrote:
> >
> > > Thanks Guozhang. My answers are below:
> > >
> > > > 1) the migration path, especially the last step when clients flip the
> > > flag
> > > > to enable the new protocol, in which we would have a window where both
> > > new
> > > > protocols / rpcs and old protocols / rpcs are used by members of the 
> > > > same
> > > > group. How the coordinator could "mimic" the old behavior while using 
> > > > the
> > > > new protocol is something we need to present about.
> > >
> > > Noted. I just published a new version of KIP which includes more
> > > details about this. See the "Supporting Online Consumer Group Upgrade"
> > > and the "Compatibility, Deprecation, and Migration Plan". I think that
> > > I have to think through a few cases now but the overall idea and
> > > mechanism should be understandable.
> > >
> > > > 2) the usage of topic ids. So far as KIP-516 the topic ids are only used
> > > as
> > > > part of RPCs and admin client, but they are not exposed via any public
> > > APIs
> > > > to consumers yet. I think the question is, first should we let the
> > > consumer
> > > > client to be maintaining the names -> ids mapping itself to fully
> > > leverage
> > > > on all the augmented existing RPCs and the new RPCs with the topic ids;
> > > and
> > > > secondly, should we ever consider exposing the topic ids in the consumer
> > > > public APIs as well (both subscribe/assign, as well as in the rebalance
> > > > listener for cases like topic deletion-and-recreation).
> > >
> > > a) Assuming that we would include converting all the offsets related
> > > RPCs to using topic ids in this KIP, the consumer would be able to
> > > fully operate with topic ids. That being said, it still has to provide
> > > the topics names in various APIs so having a mapping in the consumer
> > > seems inevitable to me.
> > > b) I don't have a strong opinion on this. Here I wonder if this goes
> > > beyond the scope of this KIP. I would rather focus on the internals
> > > here and we can consider this separately if we see value in doing it.
> > >
> > > Coming back to Ismael's point about using topic ids in the
> > > ConsumerGroupHeartbeatRequest, I think that there is one advantage in
> > > favour of it. The consumer will have the opportunity to validate that
> > > the topics exists before passing them into the group rebalance
> > > protocol. Obviously, the coordinator will also notice it but it does
> > > not really have a way to reject an invalid topic in the response.
> > >
> > > > I'm agreeing with David on all other minor questions except for the
> > > > `subscribe(Pattern)` question: personally I think it's not necessary to
> > > > deprecate the subscribe API with Pattern, but instead we still use
> > > Pattern
> > > > while just documenting that our subscription may be rejected by the
> > > server.
> > > > Since the incompatible case is a very rare scenario I felt using an
> > > > overloaded `String` based subscription may be more vulnerable to various
> > > > invalid regexes.
> > >
> > > That could work. I have to look at the differences between the two
> > > engines to better understand the potential issues. My understanding is
> > > that would work for all the basic regular expressions. The differences
> > > between the two are mainly about the various character classes. I
> > > wonder what other people think about this.
> > >
> > > Best,
> > > David
> > >
> > > On Tue, Jul 12, 2022 at 11:28 PM Guozhang Wang <wangg...@gmail.com> wrote:
> > > >
> > > > Thanks David! I think on the high level there are two meta points we 
> > > > need
> > > > to concretize a bit more:
> > > >
> > > > 1) the migration path, especially the last step when clients flip the
> > > flag
> > > > to enable the new protocol, in which we would have a window where both
> > > new
> > > > protocols / rpcs and old protocols / rpcs are used by members of the 
> > > > same
> > > > group. How the coordinator could "mimic" the old behavior while using 
> > > > the
> > > > new protocol is something we need to present about.
> > > > 2) the usage of topic ids. So far as KIP-516 the topic ids are only used
> > > as
> > > > part of RPCs and admin client, but they are not exposed via any public
> > > APIs
> > > > to consumers yet. I think the question is, first should we let the
> > > consumer
> > > > client to be maintaining the names -> ids mapping itself to fully
> > > leverage
> > > > on all the augmented existing RPCs and the new RPCs with the topic ids;
> > > and
> > > > secondly, should we ever consider exposing the topic ids in the consumer
> > > > public APIs as well (both subscribe/assign, as well as in the rebalance
> > > > listener for cases like topic deletion-and-recreation).
> > > >
> > > > I'm agreeing with David on all other minor questions except for the
> > > > `subscribe(Pattern)` question: personally I think it's not necessary to
> > > > deprecate the subscribe API with Pattern, but instead we still use
> > > Pattern
> > > > while just documenting that our subscription may be rejected by the
> > > server.
> > > > Since the incompatible case is a very rare scenario I felt using an
> > > > overloaded `String` based subscription may be more vulnerable to various
> > > > invalid regexes.
> > > >
> > > >
> > > > Guozhang
> > > >
> > > > On Tue, Jul 12, 2022 at 5:23 AM David Jacot <dja...@confluent.io.invalid
> > > >
> > > > wrote:
> > > >
> > > > > Hi Ismael,
> > > > >
> > > > > Thanks for your feedback. Let me answer your questions inline.
> > > > >
> > > > > > 1. I think it's premature to talk about target versions for
> > > deprecation
> > > > > and
> > > > > > removal of the existing group protocol. Unlike KRaft, this affects a
> > > core
> > > > > > client protocol and hence deprecation/removal will be heavily
> > > dependent
> > > > > on
> > > > > > how quickly applications migrate to the new protocol.
> > > > >
> > > > > That makes sense. I will remove it.
> > > > >
> > > > > > 2. The KIP says we intend to release this in 4.x, but it wasn't made
> > > > > clear
> > > > > > why. If we added that as a way to estimate when we'd deprecate and
> > > remove
> > > > > > the group protocol, I also suggest removing this part.
> > > > >
> > > > > Let me explain my reasoning. As explained, I plan to rewrite the group
> > > > > coordinator in Java while we implement the new protocol. This means
> > > > > that the internals will be slightly different (e.g. threading model).
> > > > > Therefore, I wanted to tighten the switch from the old group
> > > > > coordinator to the new group coordinator to a major release. The
> > > > > alternative would be to use a flag to do the switch instead of relying
> > > > > on the software upgrade.
> > > > >
> > > > > > 3. We need to flesh out the details of the migration story. It 
> > > > > > sounds
> > > > > like
> > > > > > we're saying we will support online migrations. Is that correct? We
> > > > > should
> > > > > > explain this in detail. It could also be done as a separate KIP, if
> > > it's
> > > > > > easier.
> > > > >
> > > > > Yes, we will support online migrations for the group. That means that
> > > > > a group using the old protocol will be able to switch to the new
> > > > > protocol.
> > > > >
> > > > > Let me briefly explain how that will work though. It is basically a
> > > > > four step process:
> > > > >
> > > > > 1. The cluster must be upgraded or rolled to a software supporting the
> > > > > new group coordinator. Both the old and the new coordinator will
> > > > > support the old protocol and rely on the same persisted metadata so
> > > > > they can work together. This point is an offline migration. We cannot
> > > > > do this one live because it would require shutting down the current
> > > > > coordinator and starting up the new one and that would cause
> > > > > unavailabilities.
> > > > > 2. The cluster's metadata version/IBP must be upgraded to X in order
> > > > > to enable the new protocol. This cannot be done before 1) is
> > > > > terminated because the old coordinator doesn't support the new
> > > > > protocol.
> > > > > 3. The consumers must be upgraded to a version supporting the online
> > > > > migration (must have KIP-792). If the consumer is already there.
> > > > > Nothing must be done at this point.
> > > > > 4. The consumers must be rolled with the feature flag turned on. The
> > > > > consumer group is automatically converted when the first consumer
> > > > > using the new protocol joins the group. While the members using the
> > > > > old protocol are being upgraded, the old protocol is proxied into the
> > > > > new one.
> > > > >
> > > > > Let me clarify all of this in the KIP.
> > > > >
> > > > > > 4. I am happy that we are pushing the pattern subscriptions to the
> > > > > server,
> > > > > > but it seems like there could be some tricky compatibility issues.
> > > Will
> > > > > we
> > > > > > have a mechanism for users to detect that they need to update their
> > > regex
> > > > > > before switching to the new protocol?
> > > > >
> > > > > I think that I am a bit more optimistic than you on this point. I
> > > > > believe that the majority of the cases are simple regexes which should
> > > > > work with the new engine. The coordinator will verify the regex anyway
> > > > > and reject the consumer if the regex is not valid. Coming back to the
> > > > > migration path, in the worst case, the first upgraded consumer joining
> > > > > the group will be rejected. This should be used as the last defence, I
> > > > > would say.
> > > > >
> > > > > One way for customers to validate their regex before upgrading their
> > > > > prod would be to test them with another group. For instance, that
> > > > > could be done in a pre-prod environment. Another way would be to
> > > > > extend the consumer-group tool to provide a regex validation
> > > > > mechanism. Would this be enough in your opinion?
> > > > >
> > > > > > 5. Related to the last question, will the Java client allow the
> > > users to
> > > > > > stick with the current regex engine for compatibility reasons? For
> > > > > example,
> > > > > > it may be handy to keep using client based regex at first to keep
> > > > > > migrations simple and then migrate to server based regexes as a
> > > second
> > > > > step.
> > > > >
> > > > > I understand your point but I am concerned that this would allow users
> > > > > to actually stay in this mode. That would go against our goal of
> > > > > simplifying the client because we would have to continue monitoring
> > > > > the metadata on the client side. I would rather not do this.
> > > > >
> > > > > > 6. When we say that the group coordinator will be responsible for
> > > storing
> > > > > > the configurations and that the configurations will be deleted when
> > > the
> > > > > > group is deleted. Will a transition to DEAD trigger deletion of
> > > > > > configurations?
> > > > >
> > > > > That's right. The configurations will be deleted when the group is
> > > > > deleted. They go together.
> > > > >
> > > > > > 7. Will the choice to store the configs in the group coordinator
> > > make it
> > > > > > harder to list all cluster configs and their values?
> > > > >
> > > > > I don't think so. The group configurations are overrides of cluster
> > > > > configs. If you want to know all the overrides though, you would have
> > > > > to ask all the group coordinators. You cannot rely on the metadata log
> > > > > for instance.
> > > > >
> > > > > > 8. How would someone configure a group before starting the 
> > > > > > consumers?
> > > > > Have
> > > > > > we considered allowing the explicit creation of groups?
> > > Alternatively,
> > > > > the
> > > > > > configs could be decoupled from the group lifecycle.
> > > > >
> > > > > Yes. The group will be automatically created in this case. However,
> > > > > the configs will be lost after the retention period of the group
> > > > > passes.
> > > > >
> > > > > > 9. Will the Consumer.subscribe method for the Java client still take
> > > a
> > > > > > `java.util.regex.Pattern` of do we have to introduce an overload?
> > > > >
> > > > > That's a very group question. I forgot about that one. As the
> > > > > `java.util.regex.Pattern` is not fully compatible with the engine that
> > > > > we plan to use, it might be better to deprecate it and use an overload
> > > > > which takes a string. We would rely on the server side validation.
> > > > > During the migration, I think that we could still try to toString the
> > > > > regex and use it. That should work, I think, in the majority of the
> > > > > cases.
> > > > >
> > > > > > 10. I agree with Justine that we should be clearer about the reason
> > > to
> > > > > > switch to IBP/metadata.version from the feature flag. Maybe we mean
> > > that
> > > > > we
> > > > > > can switch the default for the feature flag to true based on the
> > > > > > metadata.version once we want to make it the default.
> > > > >
> > > > > My plan was to use that feature flag mainly during the development
> > > > > phase. I should not have mentioned it, I think, because we could use
> > > > > an internal config for it.
> > > > >
> > > > > > 11. Some of the protocol APIs don't mention the required ACLs, it
> > > would
> > > > > be
> > > > > > good to add that for consistency.
> > > > >
> > > > > Noted.
> > > > >
> > > > > > 12. It is a bit odd that ConsumerGroupHeartbeat requires "Read 
> > > > > > Group"
> > > > > even
> > > > > > though it seems to do more than reading.
> > > > >
> > > > > I agree. This is how the current protocol works though. We only
> > > > > require "Read Group" to join a group. We could consider changing this
> > > > > but I am not sure that it is worth it.
> > > > >
> > > > > > 13. How is topic recreation handled by the consumer with the new
> > > group
> > > > > > protocol? It would be good to have a section on this.
> > > > >
> > > > > Noted. From a protocol perspective, the new topic will have a new
> > > > > topic id so it will treat it like a topic with a different name. The
> > > > > only issue is that the fetch/commit offsets APIs do not support topic
> > > > > IDs so the consumer would reuse the offsets based on the same. I think
> > > > > that we should update those APIs as well in order to be consistent end
> > > > > to end. That would strengthen the semantics of the consumer.
> > > > >
> > > > > > 14. The KIP mentions we will write the new coordinator in Java. Even
> > > > > though
> > > > > > this is an implementation detail, do we plan to have a new gradle
> > > module
> > > > > > for it?
> > > > >
> > > > > Yes.
> > > > >
> > > > > > 15. Do we have a scalability goal when it comes to how many members
> > > the
> > > > > new
> > > > > > group protocol can support?
> > > > >
> > > > > We don't have numbers at the moment. The protocol should support 1000s
> > > > > of members per group. We will measure this when we have a first
> > > > > implementation. Note that we might have other bottlenecks down the
> > > > > road (e.g. offset commits).
> > > > >
> > > > > > 16. Did we consider having SubscribedTopidIds instead of
> > > > > > SubscribedTopicNames in ConsumerGroupHeartbeatRequest? Is the idea
> > > that
> > > > > > since we have to resolve the regex on the server, we can do the same
> > > for
> > > > > > the topic name? The difference is that sending the regex is more
> > > > > efficient
> > > > > > whereas sending the topic names is less efficient. Furthermore,
> > > delete
> > > > > and
> > > > > > recreation is easier to handle if we have topic ids.
> > > > >
> > > > > The idea was to consolidate the metadata lookup on the server for both
> > > > > paths but I do agree with your point. As a second though, using topic
> > > > > ids may be better here for the delete and recreation case. Also, I
> > > > > suppose that we may allow users to subscribe with topic ids in the
> > > > > future because that is the only way to be really robust to topic
> > > > > re-creation.
> > > > >
> > > > > Best,
> > > > > David
> > > > >
> > > > > On Tue, Jul 12, 2022 at 1:38 PM David Jacot <dja...@confluent.io>
> > > wrote:
> > > > > >
> > > > > > Hi Justine,
> > > > > >
> > > > > > Thanks for your comments. Please find my answers below.
> > > > > >
> > > > > > - Yes, the new protocol relies on topic IDs with the exception of 
> > > > > > the
> > > > > > topic names based in the ConsumerGroupHeartbeatRequest. I am not 
> > > > > > sure
> > > > > > if using topic names is the right call here. I need to think about 
> > > > > > it
> > > > > > a little more. Obviously, the KIP does not change the fetch/commit
> > > > > > offsets RPCs to use topic IDs. This may be something that we should
> > > > > > include though as it would give better overall guarantee in the
> > > > > > producer.
> > > > > > - You're right. I think that I should not have mentioned this flag 
> > > > > > at
> > > > > > all. I will remove it. We can use an internal configuration while
> > > > > > developing the feature.
> > > > > > - Both cluster types will be supported. The change is orthogonal. 
> > > > > > The
> > > > > > only requirement is that the cluster uses topic IDs.
> > > > > >
> > > > > > Best,
> > > > > > David
> > > > > >
> > > > > > On Mon, Jul 11, 2022 at 9:53 PM Guozhang Wang <wangg...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > Hi Ismael,
> > > > > > >
> > > > > > > Thanks for the feedback. Here are some replies inlined below:
> > > > > > >
> > > > > > > On Sat, Jul 9, 2022 at 2:53 AM Ismael Juma <ism...@juma.me.uk>
> > > wrote:
> > > > > > >
> > > > > > > > Thanks for the KIP. This has the potential to be a great
> > > > > improvement. A few
> > > > > > > > initial questions/comments:
> > > > > > > >
> > > > > > > > 1. I think it's premature to talk about target versions for
> > > > > deprecation and
> > > > > > > > removal of the existing group protocol. Unlike KRaft, this
> > > affects a
> > > > > core
> > > > > > > > client protocol and hence deprecation/removal will be heavily
> > > > > dependent on
> > > > > > > > how quickly applications migrate to the new protocol.
> > > > > > > >
> > > > > > >
> > > > > > > Yeah I agree with you. I think we can remove the proposed timeline
> > > in
> > > > > the
> > > > > > > `Compatibility, Deprecation, and Migration Plan` and instead just
> > > state
> > > > > > > that we will decide in the future about when we would deprecate 
> > > > > > > old
> > > > > > > protocol and behaviors.
> > > > > > >
> > > > > > >
> > > > > > > > 2. The KIP says we intend to release this in 4.x, but it wasn't
> > > made
> > > > > clear
> > > > > > > > why. If we added that as a way to estimate when we'd deprecate
> > > and
> > > > > remove
> > > > > > > > the group protocol, I also suggest removing this part.
> > > > > > > >
> > > > > > >
> > > > > > > I think that's not specifically related to the deprecation/removal
> > > > > timeline
> > > > > > > plan, but it's more for client upgrades. I.e. the broker-side
> > > > > > > implementation may be done first, and then the client side, and we
> > > > > would
> > > > > > > only mark it as "released" by the time clients implementations are
> > > > > done. At
> > > > > > > that time, to enable the feature the clients need to first swap-in
> > > the
> > > > > > > bytecode with a rolling bounce and then set the flag with a second
> > > > > rolling
> > > > > > > bounce, and hence we feel it's better to be released in a major
> > > > > version.
> > > > > > >
> > > > > > >
> > > > > > > > 3. We need to flesh out the details of the migration story. It
> > > > > sounds like
> > > > > > > > we're saying we will support online migrations. Is that correct?
> > > We
> > > > > should
> > > > > > > > explain this in detail. It could also be done as a separate KIP,
> > > if
> > > > > it's
> > > > > > > > easier.
> > > > > > > >
> > > > > > >
> > > > > > > Yes I think that's the part we can be more concrete about for sure
> > > (and
> > > > > > > this is related to your question 2) above). We will work on making
> > > it
> > > > > more
> > > > > > > explicit in parallel as we solicit more feedback.
> > > > > > >
> > > > > > >
> > > > > > > > 4. I am happy that we are pushing the pattern subscriptions to
> > > the
> > > > > server,
> > > > > > > > but it seems like there could be some tricky compatibility
> > > issues.
> > > > > Will we
> > > > > > > > have a mechanism for users to detect that they need to update
> > > their
> > > > > regex
> > > > > > > > before switching to the new protocol?
> > > > > > > >
> > > > > > >
> > > > > > > Yes I think we need some tooling for non-java client users to sort
> > > of
> > > > > > > "dry-run" the client before switching to the new protocol. I do 
> > > > > > > not
> > > > > have a
> > > > > > > specific idea on top of my head though, maybe others like @Matt
> > > > > Howlett can
> > > > > > > chime-in here?
> > > > > > >
> > > > > > >
> > > > > > > > 5. Related to the last question, will the Java client allow the
> > > > > users to
> > > > > > > > stick with the current regex engine for compatibility reasons?
> > > For
> > > > > example,
> > > > > > > > it may be handy to keep using client based regex at first to 
> > > > > > > > keep
> > > > > > > > migrations simple and then migrate to server based regexes as a
> > > > > second
> > > > > > > > step.
> > > > > > > >
> > > > > > >
> > > > > > > Honestly I have not thought about that for java clients, and we 
> > > > > > > can
> > > > > discuss
> > > > > > > that. What kind of compatibility issues do you have in mind?
> > > > > > >
> > > > > > >
> > > > > > > > 6. When we say that the group coordinator will be responsible 
> > > > > > > > for
> > > > > storing
> > > > > > > > the configurations and that the configurations will be deleted
> > > when
> > > > > the
> > > > > > > > group is deleted. Will a transition to DEAD trigger deletion of
> > > > > > > > configurations?
> > > > > > > >
> > > > > > >
> > > > > > > Yes, since the DEAD state is an ending state (we would only
> > > transit to
> > > > > that
> > > > > > > state when the group is EMPTY and also all of its metadata are
> > > gone),
> > > > > once
> > > > > > > it's transited to DEAD this group would never be revived.
> > > > > > >
> > > > > > >
> > > > > > > > 7. Will the choice to store the configs in the group coordinator
> > > > > make it
> > > > > > > > harder to list all cluster configs and their values?
> > > > > > > >
> > > > > > >
> > > > > > > That's a good question, and our thoughts are that the so-called
> > > "group
> > > > > > > configurations" are overrides of the cluster-level configurations
> > > > > > > customized per group so when an admin list cluster configs it's
> > > okay to
> > > > > > > list just the cluster-level "defaults", not showing any per-group
> > > > > > > customizations.
> > > > > > >
> > > > > > >
> > > > > > > > 8. How would someone configure a group before starting the
> > > > > consumers? Have
> > > > > > > > we considered allowing the explicit creation of groups?
> > > > > Alternatively, the
> > > > > > > > configs could be decoupled from the group lifecycle.
> > > > > > > >
> > > > > > >
> > > > > > > The configs can be created before the group itself as an
> > > independent
> > > > > entity
> > > > > > > --- of course, this requires the corresponding request to be
> > > routed to
> > > > > the
> > > > > > > right coordinator based on the group id --- the only thing that
> > > > > differs is,
> > > > > > > when the group itself is gone we also check if there are any
> > > > > configuration
> > > > > > > entities related to that group and delete as well.
> > > > > > >
> > > > > > > Admittedly this indeed introduces an asymmetry on the creation /
> > > > > deletion
> > > > > > > lifecycles of the config entities, and we would like to hear
> > > everyone's
> > > > > > > feelings whether we should aim for symmetry i.e. totally decouple
> > > group
> > > > > > > configs and hence not delete them at all when the group is gone,
> > > but
> > > > > always
> > > > > > > require explicit deletion operations by themselves.
> > > > > > >
> > > > > > >
> > > > > > > > 9. Will the Consumer.subscribe method for the Java client still
> > > take
> > > > > a
> > > > > > > > `java.util.regex.Pattern` of do we have to introduce an 
> > > > > > > > overload?
> > > > > > > >
> > > > > > >
> > > > > > > I think we do not need to introduce an overload, but I'm all ears
> > > if
> > > > > there
> > > > > > > may be some compatibility issues that we may overlook.
> > > > > > >
> > > > > > >
> > > > > > > > 10. I agree with Justine that we should be clearer about the
> > > reason
> > > > > to
> > > > > > > > switch to IBP/metadata.version from the feature flag. Maybe we
> > > mean
> > > > > that we
> > > > > > > > can switch the default for the feature flag to true based on the
> > > > > > > > metadata.version once we want to make it the default.
> > > > > > >
> > > > > > > 11. Some of the protocol APIs don't mention the required ACLs, it
> > > > > would be
> > > > > > > > good to add that for consistency.
> > > > > > > >
> > > > > > >
> > > > > > > Ack, we can certainly do that.
> > > > > > >
> > > > > > >
> > > > > > > > 12. It is a bit odd that ConsumerGroupHeartbeat requires "Read
> > > > > Group" even
> > > > > > > > though it seems to do more than reading.
> > > > > > > >
> > > > > > >
> > > > > > > I had that thought myself as well, but in the end we could not
> > > find a
> > > > > > > better alternative: adding Write Group seems an overkill here
> > > since we
> > > > > do
> > > > > > > not have it elsewhere (we only have Read / Delete and Describe on
> > > > > groups so
> > > > > > > far). Would like to hear others thoughts.
> > > > > > >
> > > > > > >
> > > > > > > > 13. How is topic recreation handled by the consumer with the new
> > > > > group
> > > > > > > > protocol? It would be good to have a section on this.
> > > > > > > >
> > > > > > >
> > > > > > > You mean with regex subscription right? Yes we can add a section
> > > about
> > > > > > > that, but basically the idea is that consumer would be totally
> > > > > agnostic in
> > > > > > > the new protocol as it's handled all by the brokers.
> > > > > > >
> > > > > > >
> > > > > > > > 14. The KIP mentions we will write the new coordinator in Java.
> > > Even
> > > > > though
> > > > > > > > this is an implementation detail, do we plan to have a new 
> > > > > > > > gradle
> > > > > module
> > > > > > > > for it?
> > > > > > > >
> > > > > > >
> > > > > > > We have not thought about that. But I think the answer should be
> > > yes.
> > > > > > >
> > > > > > >
> > > > > > > > 15. Do we have a scalability goal when it comes to how many
> > > members
> > > > > the new
> > > > > > > > group protocol can support?
> > > > > > > >
> > > > > > >
> > > > > > > Within a group, I think we should shoot for 1000s of members. But
> > > that
> > > > > > > scalability goals also depend on the offset management (commit,
> > > fetch)
> > > > > > > capabilities of the coordinator which we did not cover in this
> > > KIP, so
> > > > > it's
> > > > > > > hard to give a number that applies universally.
> > > > > > >
> > > > > > >
> > > > > > > > 16. Did we consider having SubscribedTopidIds instead of
> > > > > > > > SubscribedTopicNames in ConsumerGroupHeartbeatRequest? Is the
> > > idea
> > > > > that
> > > > > > > > since we have to resolve the regex on the server, we can do the
> > > same
> > > > > for
> > > > > > > > the topic name? The difference is that sending the regex is more
> > > > > efficient
> > > > > > > > whereas sending the topic names is less efficient. Furthermore,
> > > > > delete and
> > > > > > > > recreation is easier to handle if we have topic ids.
> > > > > > > >
> > > > > > >
> > > > > > > The main reason to still let the clients send names is to keep the
> > > > > > > reasoning of names -> ids on the broker / admin client only. Note
> > > that
> > > > > > > although we added topic id in KIP-516, we never implemented the
> > > logic
> > > > > on
> > > > > > > consumer/producers leveraging the related newer versioned RPCs,
> > > > > instead we
> > > > > > > just set the topic id as empty UUID. We want to keep the
> > > > > consumer/producer
> > > > > > > to be thin and only delegate the reasoning on broker and
> > > potentially
> > > > > admin
> > > > > > > clients.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Ismael
> > > > > > > >
> > > > > > > > On Wed, Jul 6, 2022 at 10:45 AM David Jacot
> > > > > <dja...@confluent.io.invalid>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I would like to start a discussion thread on KIP-848: The Next
> > > > > > > > > Generation of the Consumer Rebalance Protocol. With this KIP,
> > > we
> > > > > aim
> > > > > > > > > to make the rebalance protocol (for consumers) more reliable,
> > > more
> > > > > > > > > scalable, easier to implement for clients, and easier to debug
> > > for
> > > > > > > > > operators.
> > > > > > > > >
> > > > > > > > > The KIP is here: https://cwiki.apache.org/confluence/x/HhD1D.
> > > > > > > > >
> > > > > > > > > Please take a look and let me know what you think.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > David
> > > > > > > > >
> > > > > > > > > PS: I will be away from July 18th to August 8th. That gives
> > > you a
> > > > > bit
> > > > > > > > > of time to read and digest this long KIP.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -- Guozhang
> > > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > >

Re: [DISCUSS] KIP-848: The Next Generation of the Consumer Rebalance Protocol

Reply via email to