Hi Erik,

Regarding the consumer refactor project, we’re in the process of converting 
Philip’s design to a “proper” KIP here:

    
https://cwiki.apache.org/confluence/display/KAFKA/KIP-945%3A+Update+threading+model+for+Consumer
 

It’s still very much a draft and not ready for a formal DISCUSS thread, but 
we’d welcome feedback.

That said, the callback issue being discussed here may be better served with a 
dedicated KIP so as not to entangle the fate of one with the other.

Thanks,
Kirk

> On Jul 13, 2023, at 11:44 AM, Erik van Oosten <e.vanoos...@grons.nl.INVALID> 
> wrote:
> 
> Hi Philip,
> 
> I have been scanning through 
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+threading+refactor+design
>  and KIP-848 and from this I understand that the kafka consumer API will not 
> change.
> 
> Perhaps the refactoring and/or KIP-848 is a good opportunity to improve the 
> API somewhat. In this email I explain why and also give a rough idea what 
> that could look like.
> 
> In the current API, the rebalance listener callback gives the user a chance 
> to commit all work in progress before a partition is actually revoked and 
> assigned to another consumer.
> 
> While the callback is doing all this, the main user thread is not able to 
> process new incoming data. So the rebalance listener affects latency and 
> throughput for non-revoked partitions during a rebalance.
> 
> In addition, I feel that doing a lot of stuff /in/ a callback is always quite 
> awkward. Better only use it to trigger some processing elsewhere.
> 
> Therefore, I would like to propose a new API that does not have these 
> problems and is easy to use (and I hope still easy to implement). In my ideal 
> world, poll is the only method that you need. Lets call it poll2 (to do: come 
> up with a less crappy name). Poll2 returns more than just the polled records, 
> it will also contain newly assigned partitions, partitions that will be 
> revoked during the next call to poll2, partitions that were lost, and perhaps 
> it will even contain the offsets committed so far.
> 
> The most important idea here is that partitions are not revoked immediately, 
> but in the next call to poll2.
> 
> With this API, a user can commit offsets at their own pace during a 
> rebalance. Optionally, for the case that processing of data from the 
> to-be-revoked partition is stil ongoing, we allow the user to postpone the 
> actual revocation in the next poll, so that polling can continue for other 
> partitions.
> 
> Since we are no longer blocking the main user thread, partitions that are not 
> revoked can be processed at full speed.
> 
> Removal of the rebalance listener also makes the API safer; there is no more 
> need for the thread-id check (nor KIP-944) because, concurrent invocations 
> are simply no longer needed. (Of course, if backward compatibility is a goal, 
> not all of these things can be done.)
> 
> Curious to your thoughts and kind regards,
>     Erik.
> 
> -- 
> Erik van Oosten
> e.vanoos...@grons.nl
> https://day-to-day-stuff.blogspot.com
> 

Reply via email to