Hi David, Thank you for your response. The reason I thought connect can also fit into this new scheme is that even today the connect uses a WorkerCoordinator extending from AbstractCoordinator to empower rebalances of tasks/connectors. The WorkerCoordinator sets the protocolType() to connect and uses the metadata() method by plumbing into JoinGroupRequestProtocol.
I think the changes to support connect would be similar at a high level to the changes in streams mainly because of the Client side assignors being used in both. At an implementation level, we might need to make a lot of changes to get onto this new assignment protocol like enhancing the JoinGroup request/response and SyncGroup and using ConsumerGroupHeartbeat API etc again on similar lines to streams (or there might be deviations). I would try to perform a detailed analysis of the same and we can have a separate discussion thread for that as that would derail this discussion thread. Let me know if that sounds good to you. Thanks! Sagar. On Fri, Jul 15, 2022 at 5:47 PM David Jacot <dja...@confluent.io.invalid> wrote: > Hi Sagar, > > Thanks for your comments. > > 1) Yes. That refers to `Assignment#error`. Sure, I can mention it. > > 2) The idea is to transition C from his current assignment to his > target assignment when he can move to epoch 3. When that happens, the > member assignment is updated and persisted with all its assigned > partitions even if they are not all revoked yet. In other words, the > member assignment becomes the target assignment. This is basically an > optimization to avoid having to write all the changes to the log. The > examples are based on the persisted state so I understand the > confusion. Let me see if I can improve this in the description. > > 3) Regarding Connect, it could reuse the protocol with a client side > assignor if it fits in the protocol. The assignment is about > topicid-partitions + metadata, could Connect fit into this? > > Best, > David > > On Fri, Jul 15, 2022 at 1:55 PM Sagar <sagarmeansoc...@gmail.com> wrote: > > > > Hi David, > > > > Thanks for the KIP. I just had minor observations: > > > > 1) In the Assignment Error section in Client Side mode Assignment > process, > > you mentioned => `In this case, the client side assignor can return an > > error to the group coordinator`. In this case are you referring to the > > Assignor returning an AssignmentError that's listed down towards the end? > > If yes, do you think it would make sense to mention this explicitly here? > > > > 2) In the Case Studies section, I have a slight confusion, not sure if > > others have the same. Consider this step: > > > > When B heartbeats, the group coordinator transitions him to epoch 3 > because > > B has no partitions to revoke. It persists the change and reply. > > > > - Group (epoch=3) > > - A > > - B > > - C > > - Target Assignment (epoch=3) > > - A - partitions=[foo-0] > > - B - partitions=[foo-2] > > - C - partitions=[foo-1] > > - Member Assignment > > - A - epoch=2, partitions=[foo-0, foo-1] > > - B - epoch=3, partitions=[foo-2] > > - C - epoch=3, partitions=[foo-1] > > > > When C heartbeats, it transitions to epoch 3 but cannot get foo-1 yet. > > > > Here,it's mentioned that member C can't get the foo-1 partition yet, but > > based on the description above, it seems it already has it. Do you think > it > > would be better to remove it and populate it only when it actually gets > it? > > I see this in a lot of other places, so have I understood it incorrectly > ? > > > > > > Regarding connect , it might be out of scope of this discussion, but from > > what I understood it would probably be running in client side assignor > mode > > even on the new rebalance protocol as it has its own Custom > Assignors(Eager > > and IncrementalCooperative). > > > > Thanks! > > > > Sagar. > > > > > > > > > > > > > > On Fri, Jul 15, 2022 at 5:00 PM David Jacot <dja...@confluent.io.invalid > > > > wrote: > > > > > Thanks Hector! Our goal is to move forward with specialized API > > > instead of relying on one generic API. For Connect, we can apply the > > > exact same pattern and reuse/share the core implementation on the > > > server side. For the schema registry, I think that we should consider > > > having a tailored API to do simple membership/leader election. > > > > > > Best, > > > David > > > > > > On Fri, Jul 15, 2022 at 10:22 AM Ismael Juma <ism...@juma.me.uk> > wrote: > > > > > > > > Three quick comments: > > > > > > > > 1. Regarding java.util.regex.Pattern vs com.google.re2j.Pattern, we > > > should > > > > document the differences in more detail before deciding one way or > > > another. > > > > That said, if people pass java.util.regex.Pattern, they expect their > > > > semantics to be honored. If we are doing something different, then we > > > > should consider adding an overload with our own Pattern class (I > don't > > > > think we'd want to expose re2j's at this point). > > > > 2. Regarding topic ids, any major new protocol should integrate fully > > > with > > > > it and should handle the topic recreation case correctly. That's the > main > > > > part we need to handle. I agree with David that we'd want to add > topic > > > ids > > > > to the relevant protocols that don't have it yet and that we can > probably > > > > focus on the internals versus adding new APIs to the Java Consumer > > > (unless > > > > we find that adding new APIs is required for reasonable semantics). > > > > 3. I am still not sure about the coordinator storing the configs. > It's > > > > powerful for configs to be centralized in the metadata log for > various > > > > reasons (auditability, visibility, consistency, etc.). Similarly, I > am > > > not > > > > sure about automatically deleting configs in a way that they cannot > be > > > > recovered. A good property for modern systems is to minimize the > number > > > of > > > > unrecoverable data loss scenarios. > > > > > > > > Ismael > > > > > > > > On Wed, Jul 13, 2022 at 3:47 PM David Jacot > <dja...@confluent.io.invalid > > > > > > > > wrote: > > > > > > > > > Thanks Guozhang. My answers are below: > > > > > > > > > > > 1) the migration path, especially the last step when clients > flip the > > > > > flag > > > > > > to enable the new protocol, in which we would have a window where > > > both > > > > > new > > > > > > protocols / rpcs and old protocols / rpcs are used by members of > the > > > same > > > > > > group. How the coordinator could "mimic" the old behavior while > > > using the > > > > > > new protocol is something we need to present about. > > > > > > > > > > Noted. I just published a new version of KIP which includes more > > > > > details about this. See the "Supporting Online Consumer Group > Upgrade" > > > > > and the "Compatibility, Deprecation, and Migration Plan". I think > that > > > > > I have to think through a few cases now but the overall idea and > > > > > mechanism should be understandable. > > > > > > > > > > > 2) the usage of topic ids. So far as KIP-516 the topic ids are > only > > > used > > > > > as > > > > > > part of RPCs and admin client, but they are not exposed via any > > > public > > > > > APIs > > > > > > to consumers yet. I think the question is, first should we let > the > > > > > consumer > > > > > > client to be maintaining the names -> ids mapping itself to fully > > > > > leverage > > > > > > on all the augmented existing RPCs and the new RPCs with the > topic > > > ids; > > > > > and > > > > > > secondly, should we ever consider exposing the topic ids in the > > > consumer > > > > > > public APIs as well (both subscribe/assign, as well as in the > > > rebalance > > > > > > listener for cases like topic deletion-and-recreation). > > > > > > > > > > a) Assuming that we would include converting all the offsets > related > > > > > RPCs to using topic ids in this KIP, the consumer would be able to > > > > > fully operate with topic ids. That being said, it still has to > provide > > > > > the topics names in various APIs so having a mapping in the > consumer > > > > > seems inevitable to me. > > > > > b) I don't have a strong opinion on this. Here I wonder if this > goes > > > > > beyond the scope of this KIP. I would rather focus on the internals > > > > > here and we can consider this separately if we see value in doing > it. > > > > > > > > > > Coming back to Ismael's point about using topic ids in the > > > > > ConsumerGroupHeartbeatRequest, I think that there is one advantage > in > > > > > favour of it. The consumer will have the opportunity to validate > that > > > > > the topics exists before passing them into the group rebalance > > > > > protocol. Obviously, the coordinator will also notice it but it > does > > > > > not really have a way to reject an invalid topic in the response. > > > > > > > > > > > I'm agreeing with David on all other minor questions except for > the > > > > > > `subscribe(Pattern)` question: personally I think it's not > necessary > > > to > > > > > > deprecate the subscribe API with Pattern, but instead we still > use > > > > > Pattern > > > > > > while just documenting that our subscription may be rejected by > the > > > > > server. > > > > > > Since the incompatible case is a very rare scenario I felt using > an > > > > > > overloaded `String` based subscription may be more vulnerable to > > > various > > > > > > invalid regexes. > > > > > > > > > > That could work. I have to look at the differences between the two > > > > > engines to better understand the potential issues. My > understanding is > > > > > that would work for all the basic regular expressions. The > differences > > > > > between the two are mainly about the various character classes. I > > > > > wonder what other people think about this. > > > > > > > > > > Best, > > > > > David > > > > > > > > > > On Tue, Jul 12, 2022 at 11:28 PM Guozhang Wang <wangg...@gmail.com > > > > > wrote: > > > > > > > > > > > > Thanks David! I think on the high level there are two meta > points we > > > need > > > > > > to concretize a bit more: > > > > > > > > > > > > 1) the migration path, especially the last step when clients > flip the > > > > > flag > > > > > > to enable the new protocol, in which we would have a window where > > > both > > > > > new > > > > > > protocols / rpcs and old protocols / rpcs are used by members of > the > > > same > > > > > > group. How the coordinator could "mimic" the old behavior while > > > using the > > > > > > new protocol is something we need to present about. > > > > > > 2) the usage of topic ids. So far as KIP-516 the topic ids are > only > > > used > > > > > as > > > > > > part of RPCs and admin client, but they are not exposed via any > > > public > > > > > APIs > > > > > > to consumers yet. I think the question is, first should we let > the > > > > > consumer > > > > > > client to be maintaining the names -> ids mapping itself to fully > > > > > leverage > > > > > > on all the augmented existing RPCs and the new RPCs with the > topic > > > ids; > > > > > and > > > > > > secondly, should we ever consider exposing the topic ids in the > > > consumer > > > > > > public APIs as well (both subscribe/assign, as well as in the > > > rebalance > > > > > > listener for cases like topic deletion-and-recreation). > > > > > > > > > > > > I'm agreeing with David on all other minor questions except for > the > > > > > > `subscribe(Pattern)` question: personally I think it's not > necessary > > > to > > > > > > deprecate the subscribe API with Pattern, but instead we still > use > > > > > Pattern > > > > > > while just documenting that our subscription may be rejected by > the > > > > > server. > > > > > > Since the incompatible case is a very rare scenario I felt using > an > > > > > > overloaded `String` based subscription may be more vulnerable to > > > various > > > > > > invalid regexes. > > > > > > > > > > > > > > > > > > Guozhang > > > > > > > > > > > > On Tue, Jul 12, 2022 at 5:23 AM David Jacot > > > <dja...@confluent.io.invalid > > > > > > > > > > > > wrote: > > > > > > > > > > > > > Hi Ismael, > > > > > > > > > > > > > > Thanks for your feedback. Let me answer your questions inline. > > > > > > > > > > > > > > > 1. I think it's premature to talk about target versions for > > > > > deprecation > > > > > > > and > > > > > > > > removal of the existing group protocol. Unlike KRaft, this > > > affects a > > > > > core > > > > > > > > client protocol and hence deprecation/removal will be heavily > > > > > dependent > > > > > > > on > > > > > > > > how quickly applications migrate to the new protocol. > > > > > > > > > > > > > > That makes sense. I will remove it. > > > > > > > > > > > > > > > 2. The KIP says we intend to release this in 4.x, but it > wasn't > > > made > > > > > > > clear > > > > > > > > why. If we added that as a way to estimate when we'd > deprecate > > > and > > > > > remove > > > > > > > > the group protocol, I also suggest removing this part. > > > > > > > > > > > > > > Let me explain my reasoning. As explained, I plan to rewrite > the > > > group > > > > > > > coordinator in Java while we implement the new protocol. This > means > > > > > > > that the internals will be slightly different (e.g. threading > > > model). > > > > > > > Therefore, I wanted to tighten the switch from the old group > > > > > > > coordinator to the new group coordinator to a major release. > The > > > > > > > alternative would be to use a flag to do the switch instead of > > > relying > > > > > > > on the software upgrade. > > > > > > > > > > > > > > > 3. We need to flesh out the details of the migration story. > It > > > sounds > > > > > > > like > > > > > > > > we're saying we will support online migrations. Is that > correct? > > > We > > > > > > > should > > > > > > > > explain this in detail. It could also be done as a separate > KIP, > > > if > > > > > it's > > > > > > > > easier. > > > > > > > > > > > > > > Yes, we will support online migrations for the group. That > means > > > that > > > > > > > a group using the old protocol will be able to switch to the > new > > > > > > > protocol. > > > > > > > > > > > > > > Let me briefly explain how that will work though. It is > basically a > > > > > > > four step process: > > > > > > > > > > > > > > 1. The cluster must be upgraded or rolled to a software > supporting > > > the > > > > > > > new group coordinator. Both the old and the new coordinator > will > > > > > > > support the old protocol and rely on the same persisted > metadata so > > > > > > > they can work together. This point is an offline migration. We > > > cannot > > > > > > > do this one live because it would require shutting down the > current > > > > > > > coordinator and starting up the new one and that would cause > > > > > > > unavailabilities. > > > > > > > 2. The cluster's metadata version/IBP must be upgraded to X in > > > order > > > > > > > to enable the new protocol. This cannot be done before 1) is > > > > > > > terminated because the old coordinator doesn't support the new > > > > > > > protocol. > > > > > > > 3. The consumers must be upgraded to a version supporting the > > > online > > > > > > > migration (must have KIP-792). If the consumer is already > there. > > > > > > > Nothing must be done at this point. > > > > > > > 4. The consumers must be rolled with the feature flag turned > on. > > > The > > > > > > > consumer group is automatically converted when the first > consumer > > > > > > > using the new protocol joins the group. While the members > using the > > > > > > > old protocol are being upgraded, the old protocol is proxied > into > > > the > > > > > > > new one. > > > > > > > > > > > > > > Let me clarify all of this in the KIP. > > > > > > > > > > > > > > > 4. I am happy that we are pushing the pattern subscriptions > to > > > the > > > > > > > server, > > > > > > > > but it seems like there could be some tricky compatibility > > > issues. > > > > > Will > > > > > > > we > > > > > > > > have a mechanism for users to detect that they need to update > > > their > > > > > regex > > > > > > > > before switching to the new protocol? > > > > > > > > > > > > > > I think that I am a bit more optimistic than you on this > point. I > > > > > > > believe that the majority of the cases are simple regexes which > > > should > > > > > > > work with the new engine. The coordinator will verify the regex > > > anyway > > > > > > > and reject the consumer if the regex is not valid. Coming back > to > > > the > > > > > > > migration path, in the worst case, the first upgraded consumer > > > joining > > > > > > > the group will be rejected. This should be used as the last > > > defence, I > > > > > > > would say. > > > > > > > > > > > > > > One way for customers to validate their regex before upgrading > > > their > > > > > > > prod would be to test them with another group. For instance, > that > > > > > > > could be done in a pre-prod environment. Another way would be > to > > > > > > > extend the consumer-group tool to provide a regex validation > > > > > > > mechanism. Would this be enough in your opinion? > > > > > > > > > > > > > > > 5. Related to the last question, will the Java client allow > the > > > > > users to > > > > > > > > stick with the current regex engine for compatibility > reasons? > > > For > > > > > > > example, > > > > > > > > it may be handy to keep using client based regex at first to > keep > > > > > > > > migrations simple and then migrate to server based regexes > as a > > > > > second > > > > > > > step. > > > > > > > > > > > > > > I understand your point but I am concerned that this would > allow > > > users > > > > > > > to actually stay in this mode. That would go against our goal > of > > > > > > > simplifying the client because we would have to continue > monitoring > > > > > > > the metadata on the client side. I would rather not do this. > > > > > > > > > > > > > > > 6. When we say that the group coordinator will be > responsible for > > > > > storing > > > > > > > > the configurations and that the configurations will be > deleted > > > when > > > > > the > > > > > > > > group is deleted. Will a transition to DEAD trigger deletion > of > > > > > > > > configurations? > > > > > > > > > > > > > > That's right. The configurations will be deleted when the > group is > > > > > > > deleted. They go together. > > > > > > > > > > > > > > > 7. Will the choice to store the configs in the group > coordinator > > > > > make it > > > > > > > > harder to list all cluster configs and their values? > > > > > > > > > > > > > > I don't think so. The group configurations are overrides of > cluster > > > > > > > configs. If you want to know all the overrides though, you > would > > > have > > > > > > > to ask all the group coordinators. You cannot rely on the > metadata > > > log > > > > > > > for instance. > > > > > > > > > > > > > > > 8. How would someone configure a group before starting the > > > consumers? > > > > > > > Have > > > > > > > > we considered allowing the explicit creation of groups? > > > > > Alternatively, > > > > > > > the > > > > > > > > configs could be decoupled from the group lifecycle. > > > > > > > > > > > > > > Yes. The group will be automatically created in this case. > However, > > > > > > > the configs will be lost after the retention period of the > group > > > > > > > passes. > > > > > > > > > > > > > > > 9. Will the Consumer.subscribe method for the Java client > still > > > take > > > > > a > > > > > > > > `java.util.regex.Pattern` of do we have to introduce an > overload? > > > > > > > > > > > > > > That's a very group question. I forgot about that one. As the > > > > > > > `java.util.regex.Pattern` is not fully compatible with the > engine > > > that > > > > > > > we plan to use, it might be better to deprecate it and use an > > > overload > > > > > > > which takes a string. We would rely on the server side > validation. > > > > > > > During the migration, I think that we could still try to > toString > > > the > > > > > > > regex and use it. That should work, I think, in the majority > of the > > > > > > > cases. > > > > > > > > > > > > > > > 10. I agree with Justine that we should be clearer about the > > > reason > > > > > to > > > > > > > > switch to IBP/metadata.version from the feature flag. Maybe > we > > > mean > > > > > that > > > > > > > we > > > > > > > > can switch the default for the feature flag to true based on > the > > > > > > > > metadata.version once we want to make it the default. > > > > > > > > > > > > > > My plan was to use that feature flag mainly during the > development > > > > > > > phase. I should not have mentioned it, I think, because we > could > > > use > > > > > > > an internal config for it. > > > > > > > > > > > > > > > 11. Some of the protocol APIs don't mention the required > ACLs, it > > > > > would > > > > > > > be > > > > > > > > good to add that for consistency. > > > > > > > > > > > > > > Noted. > > > > > > > > > > > > > > > 12. It is a bit odd that ConsumerGroupHeartbeat requires > "Read > > > Group" > > > > > > > even > > > > > > > > though it seems to do more than reading. > > > > > > > > > > > > > > I agree. This is how the current protocol works though. We only > > > > > > > require "Read Group" to join a group. We could consider > changing > > > this > > > > > > > but I am not sure that it is worth it. > > > > > > > > > > > > > > > 13. How is topic recreation handled by the consumer with the > new > > > > > group > > > > > > > > protocol? It would be good to have a section on this. > > > > > > > > > > > > > > Noted. From a protocol perspective, the new topic will have a > new > > > > > > > topic id so it will treat it like a topic with a different > name. > > > The > > > > > > > only issue is that the fetch/commit offsets APIs do not support > > > topic > > > > > > > IDs so the consumer would reuse the offsets based on the same. > I > > > think > > > > > > > that we should update those APIs as well in order to be > consistent > > > end > > > > > > > to end. That would strengthen the semantics of the consumer. > > > > > > > > > > > > > > > 14. The KIP mentions we will write the new coordinator in > Java. > > > Even > > > > > > > though > > > > > > > > this is an implementation detail, do we plan to have a new > gradle > > > > > module > > > > > > > > for it? > > > > > > > > > > > > > > Yes. > > > > > > > > > > > > > > > 15. Do we have a scalability goal when it comes to how many > > > members > > > > > the > > > > > > > new > > > > > > > > group protocol can support? > > > > > > > > > > > > > > We don't have numbers at the moment. The protocol should > support > > > 1000s > > > > > > > of members per group. We will measure this when we have a first > > > > > > > implementation. Note that we might have other bottlenecks down > the > > > > > > > road (e.g. offset commits). > > > > > > > > > > > > > > > 16. Did we consider having SubscribedTopidIds instead of > > > > > > > > SubscribedTopicNames in ConsumerGroupHeartbeatRequest? Is the > > > idea > > > > > that > > > > > > > > since we have to resolve the regex on the server, we can do > the > > > same > > > > > for > > > > > > > > the topic name? The difference is that sending the regex is > more > > > > > > > efficient > > > > > > > > whereas sending the topic names is less efficient. > Furthermore, > > > > > delete > > > > > > > and > > > > > > > > recreation is easier to handle if we have topic ids. > > > > > > > > > > > > > > The idea was to consolidate the metadata lookup on the server > for > > > both > > > > > > > paths but I do agree with your point. As a second though, using > > > topic > > > > > > > ids may be better here for the delete and recreation case. > Also, I > > > > > > > suppose that we may allow users to subscribe with topic ids in > the > > > > > > > future because that is the only way to be really robust to > topic > > > > > > > re-creation. > > > > > > > > > > > > > > Best, > > > > > > > David > > > > > > > > > > > > > > On Tue, Jul 12, 2022 at 1:38 PM David Jacot < > dja...@confluent.io> > > > > > wrote: > > > > > > > > > > > > > > > > Hi Justine, > > > > > > > > > > > > > > > > Thanks for your comments. Please find my answers below. > > > > > > > > > > > > > > > > - Yes, the new protocol relies on topic IDs with the > exception > > > of the > > > > > > > > topic names based in the ConsumerGroupHeartbeatRequest. I am > not > > > sure > > > > > > > > if using topic names is the right call here. I need to think > > > about it > > > > > > > > a little more. Obviously, the KIP does not change the > > > fetch/commit > > > > > > > > offsets RPCs to use topic IDs. This may be something that we > > > should > > > > > > > > include though as it would give better overall guarantee in > the > > > > > > > > producer. > > > > > > > > - You're right. I think that I should not have mentioned this > > > flag at > > > > > > > > all. I will remove it. We can use an internal configuration > while > > > > > > > > developing the feature. > > > > > > > > - Both cluster types will be supported. The change is > > > orthogonal. The > > > > > > > > only requirement is that the cluster uses topic IDs. > > > > > > > > > > > > > > > > Best, > > > > > > > > David > > > > > > > > > > > > > > > > On Mon, Jul 11, 2022 at 9:53 PM Guozhang Wang < > > > wangg...@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > > > > > > Hi Ismael, > > > > > > > > > > > > > > > > > > Thanks for the feedback. Here are some replies inlined > below: > > > > > > > > > > > > > > > > > > On Sat, Jul 9, 2022 at 2:53 AM Ismael Juma < > ism...@juma.me.uk> > > > > > wrote: > > > > > > > > > > > > > > > > > > > Thanks for the KIP. This has the potential to be a great > > > > > > > improvement. A few > > > > > > > > > > initial questions/comments: > > > > > > > > > > > > > > > > > > > > 1. I think it's premature to talk about target versions > for > > > > > > > deprecation and > > > > > > > > > > removal of the existing group protocol. Unlike KRaft, > this > > > > > affects a > > > > > > > core > > > > > > > > > > client protocol and hence deprecation/removal will be > heavily > > > > > > > dependent on > > > > > > > > > > how quickly applications migrate to the new protocol. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yeah I agree with you. I think we can remove the proposed > > > timeline > > > > > in > > > > > > > the > > > > > > > > > `Compatibility, Deprecation, and Migration Plan` and > instead > > > just > > > > > state > > > > > > > > > that we will decide in the future about when we would > > > deprecate old > > > > > > > > > protocol and behaviors. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. The KIP says we intend to release this in 4.x, but it > > > wasn't > > > > > made > > > > > > > clear > > > > > > > > > > why. If we added that as a way to estimate when we'd > > > deprecate > > > > > and > > > > > > > remove > > > > > > > > > > the group protocol, I also suggest removing this part. > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think that's not specifically related to the > > > deprecation/removal > > > > > > > timeline > > > > > > > > > plan, but it's more for client upgrades. I.e. the > broker-side > > > > > > > > > implementation may be done first, and then the client side, > > > and we > > > > > > > would > > > > > > > > > only mark it as "released" by the time clients > implementations > > > are > > > > > > > done. At > > > > > > > > > that time, to enable the feature the clients need to first > > > swap-in > > > > > the > > > > > > > > > bytecode with a rolling bounce and then set the flag with a > > > second > > > > > > > rolling > > > > > > > > > bounce, and hence we feel it's better to be released in a > major > > > > > > > version. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3. We need to flesh out the details of the migration > story. > > > It > > > > > > > sounds like > > > > > > > > > > we're saying we will support online migrations. Is that > > > correct? > > > > > We > > > > > > > should > > > > > > > > > > explain this in detail. It could also be done as a > separate > > > KIP, > > > > > if > > > > > > > it's > > > > > > > > > > easier. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes I think that's the part we can be more concrete about > for > > > sure > > > > > (and > > > > > > > > > this is related to your question 2) above). We will work on > > > making > > > > > it > > > > > > > more > > > > > > > > > explicit in parallel as we solicit more feedback. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 4. I am happy that we are pushing the pattern > subscriptions > > > to > > > > > the > > > > > > > server, > > > > > > > > > > but it seems like there could be some tricky > compatibility > > > > > issues. > > > > > > > Will we > > > > > > > > > > have a mechanism for users to detect that they need to > update > > > > > their > > > > > > > regex > > > > > > > > > > before switching to the new protocol? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes I think we need some tooling for non-java client users > to > > > sort > > > > > of > > > > > > > > > "dry-run" the client before switching to the new protocol. > I > > > do not > > > > > > > have a > > > > > > > > > specific idea on top of my head though, maybe others like > @Matt > > > > > > > Howlett can > > > > > > > > > chime-in here? > > > > > > > > > > > > > > > > > > > > > > > > > > > > 5. Related to the last question, will the Java client > allow > > > the > > > > > > > users to > > > > > > > > > > stick with the current regex engine for compatibility > > > reasons? > > > > > For > > > > > > > example, > > > > > > > > > > it may be handy to keep using client based regex at > first to > > > keep > > > > > > > > > > migrations simple and then migrate to server based > regexes > > > as a > > > > > > > second > > > > > > > > > > step. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Honestly I have not thought about that for java clients, > and > > > we can > > > > > > > discuss > > > > > > > > > that. What kind of compatibility issues do you have in > mind? > > > > > > > > > > > > > > > > > > > > > > > > > > > > 6. When we say that the group coordinator will be > > > responsible for > > > > > > > storing > > > > > > > > > > the configurations and that the configurations will be > > > deleted > > > > > when > > > > > > > the > > > > > > > > > > group is deleted. Will a transition to DEAD trigger > deletion > > > of > > > > > > > > > > configurations? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, since the DEAD state is an ending state (we would only > > > > > transit to > > > > > > > that > > > > > > > > > state when the group is EMPTY and also all of its metadata > are > > > > > gone), > > > > > > > once > > > > > > > > > it's transited to DEAD this group would never be revived. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 7. Will the choice to store the configs in the group > > > coordinator > > > > > > > make it > > > > > > > > > > harder to list all cluster configs and their values? > > > > > > > > > > > > > > > > > > > > > > > > > > > > That's a good question, and our thoughts are that the > so-called > > > > > "group > > > > > > > > > configurations" are overrides of the cluster-level > > > configurations > > > > > > > > > customized per group so when an admin list cluster configs > it's > > > > > okay to > > > > > > > > > list just the cluster-level "defaults", not showing any > > > per-group > > > > > > > > > customizations. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 8. How would someone configure a group before starting > the > > > > > > > consumers? Have > > > > > > > > > > we considered allowing the explicit creation of groups? > > > > > > > Alternatively, the > > > > > > > > > > configs could be decoupled from the group lifecycle. > > > > > > > > > > > > > > > > > > > > > > > > > > > > The configs can be created before the group itself as an > > > > > independent > > > > > > > entity > > > > > > > > > --- of course, this requires the corresponding request to > be > > > > > routed to > > > > > > > the > > > > > > > > > right coordinator based on the group id --- the only thing > that > > > > > > > differs is, > > > > > > > > > when the group itself is gone we also check if there are > any > > > > > > > configuration > > > > > > > > > entities related to that group and delete as well. > > > > > > > > > > > > > > > > > > Admittedly this indeed introduces an asymmetry on the > creation > > > / > > > > > > > deletion > > > > > > > > > lifecycles of the config entities, and we would like to > hear > > > > > everyone's > > > > > > > > > feelings whether we should aim for symmetry i.e. totally > > > decouple > > > > > group > > > > > > > > > configs and hence not delete them at all when the group is > > > gone, > > > > > but > > > > > > > always > > > > > > > > > require explicit deletion operations by themselves. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 9. Will the Consumer.subscribe method for the Java client > > > still > > > > > take > > > > > > > a > > > > > > > > > > `java.util.regex.Pattern` of do we have to introduce an > > > overload? > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think we do not need to introduce an overload, but I'm > all > > > ears > > > > > if > > > > > > > there > > > > > > > > > may be some compatibility issues that we may overlook. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 10. I agree with Justine that we should be clearer about > the > > > > > reason > > > > > > > to > > > > > > > > > > switch to IBP/metadata.version from the feature flag. > Maybe > > > we > > > > > mean > > > > > > > that we > > > > > > > > > > can switch the default for the feature flag to true > based on > > > the > > > > > > > > > > metadata.version once we want to make it the default. > > > > > > > > > > > > > > > > > > 11. Some of the protocol APIs don't mention the required > ACLs, > > > it > > > > > > > would be > > > > > > > > > > good to add that for consistency. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Ack, we can certainly do that. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 12. It is a bit odd that ConsumerGroupHeartbeat requires > > > "Read > > > > > > > Group" even > > > > > > > > > > though it seems to do more than reading. > > > > > > > > > > > > > > > > > > > > > > > > > > > > I had that thought myself as well, but in the end we could > not > > > > > find a > > > > > > > > > better alternative: adding Write Group seems an overkill > here > > > > > since we > > > > > > > do > > > > > > > > > not have it elsewhere (we only have Read / Delete and > Describe > > > on > > > > > > > groups so > > > > > > > > > far). Would like to hear others thoughts. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 13. How is topic recreation handled by the consumer with > the > > > new > > > > > > > group > > > > > > > > > > protocol? It would be good to have a section on this. > > > > > > > > > > > > > > > > > > > > > > > > > > > > You mean with regex subscription right? Yes we can add a > > > section > > > > > about > > > > > > > > > that, but basically the idea is that consumer would be > totally > > > > > > > agnostic in > > > > > > > > > the new protocol as it's handled all by the brokers. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 14. The KIP mentions we will write the new coordinator in > > > Java. > > > > > Even > > > > > > > though > > > > > > > > > > this is an implementation detail, do we plan to have a > new > > > gradle > > > > > > > module > > > > > > > > > > for it? > > > > > > > > > > > > > > > > > > > > > > > > > > > > We have not thought about that. But I think the answer > should > > > be > > > > > yes. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 15. Do we have a scalability goal when it comes to how > many > > > > > members > > > > > > > the new > > > > > > > > > > group protocol can support? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Within a group, I think we should shoot for 1000s of > members. > > > But > > > > > that > > > > > > > > > scalability goals also depend on the offset management > (commit, > > > > > fetch) > > > > > > > > > capabilities of the coordinator which we did not cover in > this > > > > > KIP, so > > > > > > > it's > > > > > > > > > hard to give a number that applies universally. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 16. Did we consider having SubscribedTopidIds instead of > > > > > > > > > > SubscribedTopicNames in ConsumerGroupHeartbeatRequest? > Is the > > > > > idea > > > > > > > that > > > > > > > > > > since we have to resolve the regex on the server, we can > do > > > the > > > > > same > > > > > > > for > > > > > > > > > > the topic name? The difference is that sending the regex > is > > > more > > > > > > > efficient > > > > > > > > > > whereas sending the topic names is less efficient. > > > Furthermore, > > > > > > > delete and > > > > > > > > > > recreation is easier to handle if we have topic ids. > > > > > > > > > > > > > > > > > > > > > > > > > > > > The main reason to still let the clients send names is to > keep > > > the > > > > > > > > > reasoning of names -> ids on the broker / admin client > only. > > > Note > > > > > that > > > > > > > > > although we added topic id in KIP-516, we never > implemented the > > > > > logic > > > > > > > on > > > > > > > > > consumer/producers leveraging the related newer versioned > RPCs, > > > > > > > instead we > > > > > > > > > just set the topic id as empty UUID. We want to keep the > > > > > > > consumer/producer > > > > > > > > > to be thin and only delegate the reasoning on broker and > > > > > potentially > > > > > > > admin > > > > > > > > > clients. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > Ismael > > > > > > > > > > > > > > > > > > > > On Wed, Jul 6, 2022 at 10:45 AM David Jacot > > > > > > > <dja...@confluent.io.invalid> > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > > > I would like to start a discussion thread on KIP-848: > The > > > Next > > > > > > > > > > > Generation of the Consumer Rebalance Protocol. With > this > > > KIP, > > > > > we > > > > > > > aim > > > > > > > > > > > to make the rebalance protocol (for consumers) more > > > reliable, > > > > > more > > > > > > > > > > > scalable, easier to implement for clients, and easier > to > > > debug > > > > > for > > > > > > > > > > > operators. > > > > > > > > > > > > > > > > > > > > > > The KIP is here: > > > https://cwiki.apache.org/confluence/x/HhD1D. > > > > > > > > > > > > > > > > > > > > > > Please take a look and let me know what you think. > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > PS: I will be away from July 18th to August 8th. That > gives > > > > > you a > > > > > > > bit > > > > > > > > > > > of time to read and digest this long KIP. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > -- Guozhang > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > -- Guozhang > > > > > > > > > >