[ 
https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562104#comment-14562104
 ] 

Jay Kreps commented on KAFKA-2168:
----------------------------------

Yeah for processing messages we really thought about two models:
1. N fetcher threads with N clients feeding M processors through a blocking 
queue (M could be 1) 
2. N threads each with a client which does it's own processing
I think what you are suggesting is 
3. M processor threads each fetching and processing using a shared client

I wonder how well (3) will work given that poll() returns all available 
messages. So it may not distribute load very well.

[~ewencp] your use cases are good. Here is my take on those:
1. See above
2. Doesn't this work now? The concern with allowing simultaneous commit() and 
poll is just that it will expose weird intermediate states where your offset is 
being updated.
3. I don't think the consumer's close is blocking in the way the producer's is. 
The producer has to block until all sent messages are gone but the consumer 
just shuts down your tcp connections and exits so I'm not sure if this is 
needed?
4. I think metrics() can actually be called in parallel from any number of 
threads and isn't blocked on poll
5. This is a good point. It may be okay just to wait, though, right and have MM 
just set a low timeout?

At a high-level my concern is just that this may be a fairly deep change and 
break a lot of assumptions. That code is very single-threaded top to bottom so 
I'm worried about trying to change that and kind of hoping we can wave our 
hands and not need to :-)


> New consumer poll() can block other calls like position(), commit(), and 
> close() indefinitely
> ---------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2168
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2168
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Jason Gustafson
>
> The new consumer is currently using very coarse-grained synchronization. For 
> most methods this isn't a problem since they finish quickly once the lock is 
> acquired, but poll() might run for a long time (and commonly will since 
> polling with long timeouts is a normal use case). This means any operations 
> invoked from another thread may block until the poll() call completes.
> Some example use cases where this can be a problem:
> * A shutdown hook is registered to trigger shutdown and invokes close(). It 
> gets invoked from another thread and blocks indefinitely.
> * User wants to manage offset commit themselves in a background thread. If 
> the commit policy is not purely time based, it's not currently possibly to 
> make sure the call to commit() will be processed promptly.
> Two possible solutions to this:
> 1. Make sure a lock is not held during the actual select call. Since we have 
> multiple layers (KafkaConsumer -> NetworkClient -> Selector -> nio Selector) 
> this is probably hard to make work cleanly since locking is currently only 
> performed at the KafkaConsumer level and we'd want it unlocked around a 
> single line of code in Selector.
> 2. Wake up the selector before synchronizing for certain operations. This 
> would require some additional coordination to make sure the caller of 
> wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() 
> thread being woken up and then promptly reacquiring the lock with a 
> subsequent long poll() call).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to