Hi Philip,

I have been scanning through https://cwiki.apache.org/confluence/display/KAFKA/Consumer+threading+refactor+design and KIP-848 and from this I understand that the kafka consumer API will not change.

Perhaps the refactoring and/or KIP-848 is a good opportunity to improve the API somewhat. In this email I explain why and also give a rough idea what that could look like.

In the current API, the rebalance listener callback gives the user a chance to commit all work in progress before a partition is actually revoked and assigned to another consumer.

While the callback is doing all this, the main user thread is not able to process new incoming data. So the rebalance listener affects latency and throughput for non-revoked partitions during a rebalance.

In addition, I feel that doing a lot of stuff /in/ a callback is always quite awkward. Better only use it to trigger some processing elsewhere.

Therefore, I would like to propose a new API that does not have these problems and is easy to use (and I hope still easy to implement). In my ideal world, poll is the only method that you need. Lets call it poll2 (to do: come up with a less crappy name). Poll2 returns more than just the polled records, it will also contain newly assigned partitions, partitions that will be revoked during the next call to poll2, partitions that were lost, and perhaps it will even contain the offsets committed so far.

The most important idea here is that partitions are not revoked immediately, but in the next call to poll2.

With this API, a user can commit offsets at their own pace during a rebalance. Optionally, for the case that processing of data from the to-be-revoked partition is stil ongoing, we allow the user to postpone the actual revocation in the next poll, so that polling can continue for other partitions.

Since we are no longer blocking the main user thread, partitions that are not revoked can be processed at full speed.

Removal of the rebalance listener also makes the API safer; there is no more need for the thread-id check (nor KIP-944) because, concurrent invocations are simply no longer needed. (Of course, if backward compatibility is a goal, not all of these things can be done.)

Curious to your thoughts and kind regards,
    Erik.

--
Erik van Oosten
e.vanoos...@grons.nl
https://day-to-day-stuff.blogspot.com

Reply via email to