Re: [DISCUSS] KIP-848: The Next Generation of the Consumer Rebalance Protocol

David Jacot Tue, 12 Jul 2022 04:39:36 -0700

Hi Justine,

Thanks for your comments. Please find my answers below.


- Yes, the new protocol relies on topic IDs with the exception of the
topic names based in the ConsumerGroupHeartbeatRequest. I am not sure
if using topic names is the right call here. I need to think about it
a little more. Obviously, the KIP does not change the fetch/commit
offsets RPCs to use topic IDs. This may be something that we should
include though as it would give better overall guarantee in the
producer.
- You're right. I think that I should not have mentioned this flag at
all. I will remove it. We can use an internal configuration while
developing the feature.
- Both cluster types will be supported. The change is orthogonal. The
only requirement is that the cluster uses topic IDs.

Best,
David

On Mon, Jul 11, 2022 at 9:53 PM Guozhang Wang <wangg...@gmail.com> wrote:
>
> Hi Ismael,
>
> Thanks for the feedback. Here are some replies inlined below:
>
> On Sat, Jul 9, 2022 at 2:53 AM Ismael Juma <ism...@juma.me.uk> wrote:
>
> > Thanks for the KIP. This has the potential to be a great improvement. A few
> > initial questions/comments:
> >
> > 1. I think it's premature to talk about target versions for deprecation and
> > removal of the existing group protocol. Unlike KRaft, this affects a core
> > client protocol and hence deprecation/removal will be heavily dependent on
> > how quickly applications migrate to the new protocol.
> >
>
> Yeah I agree with you. I think we can remove the proposed timeline in the
> `Compatibility, Deprecation, and Migration Plan` and instead just state
> that we will decide in the future about when we would deprecate old
> protocol and behaviors.
>
>
> > 2. The KIP says we intend to release this in 4.x, but it wasn't made clear
> > why. If we added that as a way to estimate when we'd deprecate and remove
> > the group protocol, I also suggest removing this part.
> >
>
> I think that's not specifically related to the deprecation/removal timeline
> plan, but it's more for client upgrades. I.e. the broker-side
> implementation may be done first, and then the client side, and we would
> only mark it as "released" by the time clients implementations are done. At
> that time, to enable the feature the clients need to first swap-in the
> bytecode with a rolling bounce and then set the flag with a second rolling
> bounce, and hence we feel it's better to be released in a major version.
>
>
> > 3. We need to flesh out the details of the migration story. It sounds like
> > we're saying we will support online migrations. Is that correct? We should
> > explain this in detail. It could also be done as a separate KIP, if it's
> > easier.
> >
>
> Yes I think that's the part we can be more concrete about for sure (and
> this is related to your question 2) above). We will work on making it more
> explicit in parallel as we solicit more feedback.
>
>
> > 4. I am happy that we are pushing the pattern subscriptions to the server,
> > but it seems like there could be some tricky compatibility issues. Will we
> > have a mechanism for users to detect that they need to update their regex
> > before switching to the new protocol?
> >
>
> Yes I think we need some tooling for non-java client users to sort of
> "dry-run" the client before switching to the new protocol. I do not have a
> specific idea on top of my head though, maybe others like @Matt Howlett can
> chime-in here?
>
>
> > 5. Related to the last question, will the Java client allow the users to
> > stick with the current regex engine for compatibility reasons? For example,
> > it may be handy to keep using client based regex at first to keep
> > migrations simple and then migrate to server based regexes as a second
> > step.
> >
>
> Honestly I have not thought about that for java clients, and we can discuss
> that. What kind of compatibility issues do you have in mind?
>
>
> > 6. When we say that the group coordinator will be responsible for storing
> > the configurations and that the configurations will be deleted when the
> > group is deleted. Will a transition to DEAD trigger deletion of
> > configurations?
> >
>
> Yes, since the DEAD state is an ending state (we would only transit to that
> state when the group is EMPTY and also all of its metadata are gone), once
> it's transited to DEAD this group would never be revived.
>
>
> > 7. Will the choice to store the configs in the group coordinator make it
> > harder to list all cluster configs and their values?
> >
>
> That's a good question, and our thoughts are that the so-called "group
> configurations" are overrides of the cluster-level configurations
> customized per group so when an admin list cluster configs it's okay to
> list just the cluster-level "defaults", not showing any per-group
> customizations.
>
>
> > 8. How would someone configure a group before starting the consumers? Have
> > we considered allowing the explicit creation of groups? Alternatively, the
> > configs could be decoupled from the group lifecycle.
> >
>
> The configs can be created before the group itself as an independent entity
> --- of course, this requires the corresponding request to be routed to the
> right coordinator based on the group id --- the only thing that differs is,
> when the group itself is gone we also check if there are any configuration
> entities related to that group and delete as well.
>
> Admittedly this indeed introduces an asymmetry on the creation / deletion
> lifecycles of the config entities, and we would like to hear everyone's
> feelings whether we should aim for symmetry i.e. totally decouple group
> configs and hence not delete them at all when the group is gone, but always
> require explicit deletion operations by themselves.
>
>
> > 9. Will the Consumer.subscribe method for the Java client still take a
> > `java.util.regex.Pattern` of do we have to introduce an overload?
> >
>
> I think we do not need to introduce an overload, but I'm all ears if there
> may be some compatibility issues that we may overlook.
>
>
> > 10. I agree with Justine that we should be clearer about the reason to
> > switch to IBP/metadata.version from the feature flag. Maybe we mean that we
> > can switch the default for the feature flag to true based on the
> > metadata.version once we want to make it the default.
>
> 11. Some of the protocol APIs don't mention the required ACLs, it would be
> > good to add that for consistency.
> >
>
> Ack, we can certainly do that.
>
>
> > 12. It is a bit odd that ConsumerGroupHeartbeat requires "Read Group" even
> > though it seems to do more than reading.
> >
>
> I had that thought myself as well, but in the end we could not find a
> better alternative: adding Write Group seems an overkill here since we do
> not have it elsewhere (we only have Read / Delete and Describe on groups so
> far). Would like to hear others thoughts.
>
>
> > 13. How is topic recreation handled by the consumer with the new group
> > protocol? It would be good to have a section on this.
> >
>
> You mean with regex subscription right? Yes we can add a section about
> that, but basically the idea is that consumer would be totally agnostic in
> the new protocol as it's handled all by the brokers.
>
>
> > 14. The KIP mentions we will write the new coordinator in Java. Even though
> > this is an implementation detail, do we plan to have a new gradle module
> > for it?
> >
>
> We have not thought about that. But I think the answer should be yes.
>
>
> > 15. Do we have a scalability goal when it comes to how many members the new
> > group protocol can support?
> >
>
> Within a group, I think we should shoot for 1000s of members. But that
> scalability goals also depend on the offset management (commit, fetch)
> capabilities of the coordinator which we did not cover in this KIP, so it's
> hard to give a number that applies universally.
>
>
> > 16. Did we consider having SubscribedTopidIds instead of
> > SubscribedTopicNames in ConsumerGroupHeartbeatRequest? Is the idea that
> > since we have to resolve the regex on the server, we can do the same for
> > the topic name? The difference is that sending the regex is more efficient
> > whereas sending the topic names is less efficient. Furthermore, delete and
> > recreation is easier to handle if we have topic ids.
> >
>
> The main reason to still let the clients send names is to keep the
> reasoning of names -> ids on the broker / admin client only. Note that
> although we added topic id in KIP-516, we never implemented the logic on
> consumer/producers leveraging the related newer versioned RPCs, instead we
> just set the topic id as empty UUID. We want to keep the consumer/producer
> to be thin and only delegate the reasoning on broker and potentially admin
> clients.
>
>
> >
> > Thanks,
> > Ismael
> >
> > On Wed, Jul 6, 2022 at 10:45 AM David Jacot <dja...@confluent.io.invalid>
> > wrote:
> >
> > > Hi all,
> > >
> > > I would like to start a discussion thread on KIP-848: The Next
> > > Generation of the Consumer Rebalance Protocol. With this KIP, we aim
> > > to make the rebalance protocol (for consumers) more reliable, more
> > > scalable, easier to implement for clients, and easier to debug for
> > > operators.
> > >
> > > The KIP is here: https://cwiki.apache.org/confluence/x/HhD1D.
> > >
> > > Please take a look and let me know what you think.
> > >
> > > Best,
> > > David
> > >
> > > PS: I will be away from July 18th to August 8th. That gives you a bit
> > > of time to read and digest this long KIP.
> > >
> >
>
>
> --
> -- Guozhang

Re: [DISCUSS] KIP-848: The Next Generation of the Consumer Rebalance Protocol

Reply via email to