Thanks for the explanation! So effectively, a share group is subscribed to each partition - but the data is not pushed to the consumer, but only sent on demand. And when demand is signalled, a batch of messages is sent? Hence it would be up to the consumer to prefetch a sufficient number of batches to ensure, that it will never be "bored"?
Adam > On 30 May 2023, at 15:25, Andrew Schofield <andrew_schofi...@live.com> wrote: > > Hi Adam, > Thanks for your question. > > With a share group, each fetch is able to grab available records from any > partition. So, it alleviates > the “head-of-line” blocking problem where a slow consumer gets in the way. > There’s no actual > stealing from a slow consumer, but it can be overtaken and must complete its > processing within > the timeout. > > The way I see this working is that when a consumer joins a share group, it > receives a set of > assigned share-partitions. To start with, every consumer will be assigned all > partitions. We > can be smarter than that, but I think that’s really a question of writing a > smarter assignor > just as has occurred over the years with consumer groups. > > Only a small proportion of Kafka workloads are super high throughput. Share > groups would > struggle with those I’m sure. Share groups do not diminish the value of > consumer groups > for streaming. They just give another option for situations where a different > style of > consumption is more appropriate. > > Thanks, > Andrew > >> On 29 May 2023, at 17:18, Adam Warski <a...@warski.org> wrote: >> >> Hello, >> >> thank you for the proposal! A very interesting read. >> >> I do have one question, though. When you subscribe to a topic using consumer >> groups, it might happen that one consumer has processed all messages from >> its partitions, while another one still has a lot of work to do (this might >> be due to unbalanced partitioning, long processing times etc.). In a >> message-queue approach, it would be great to solve this problem - so that a >> consumer that is free can steal work from other consumers. Is this somehow >> covered by share groups? >> >> Maybe this is planned as "further work", as indicated here: >> >> " >> It manages the topic-partition assignments for the share-group members. An >> initial, trivial implementation would be to give each member the list of all >> topic-partitions which matches its subscriptions and then use the pull-based >> protocol to fetch records from all partitions. A more sophisticated >> implementation could use topic-partition load and lag metrics to distribute >> partitions among the consumers as a kind of autonomous, self-balancing >> partition assignment, steering more consumers to busier partitions, for >> example. Alternatively, a push-based fetching scheme could be used. Protocol >> details will follow later. >> " >> >> but I’m not sure if I understand this correctly. A fully-connected graph >> seems like a lot of connections, and I’m not sure if this would play well >> with streaming. >> >> This also seems as one of the central problems - a key differentiator >> between share and consumer groups (the other one being persisting state of >> messages). And maybe the exact way we’d want to approach this would, to a >> certain degree, dictate the design of the queueing system? >> >> Best, >> Adam Warski >> >> On 2023/05/15 11:55:14 Andrew Schofield wrote: >>> Hi, >>> I would like to start a discussion thread on KIP-932: Queues for Kafka. >>> This KIP proposes an alternative to consumer groups to enable cooperative >>> consumption by consumers without partition assignment. You end up with >>> queue semantics on top of regular Kafka topics, with per-message >>> acknowledgement and automatic handling of messages which repeatedly fail to >>> be processed. >>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka >>> >>> Please take a look and let me know what you think. >>> >>> Thanks. >>> Andrew >> > -- Adam Warski https://www.softwaremill.com https://twitter.com/adamwarski