[ 
https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-2168:
---------------------------------
    Fix Version/s: 0.8.3

> New consumer poll() can block other calls like position(), commit(), and 
> close() indefinitely
> ---------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2168
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2168
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: clients, consumer
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Jason Gustafson
>            Priority: Critical
>             Fix For: 0.8.3
>
>         Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, 
> KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, 
> KAFKA-2168_2015-06-03_21:06:42.patch, KAFKA-2168_2015-06-04_14:36:04.patch, 
> KAFKA-2168_2015-06-05_12:01:28.patch, KAFKA-2168_2015-06-05_12:44:48.patch, 
> KAFKA-2168_2015-06-11_14:09:59.patch
>
>
> The new consumer is currently using very coarse-grained synchronization. For 
> most methods this isn't a problem since they finish quickly once the lock is 
> acquired, but poll() might run for a long time (and commonly will since 
> polling with long timeouts is a normal use case). This means any operations 
> invoked from another thread may block until the poll() call completes.
> Some example use cases where this can be a problem:
> * A shutdown hook is registered to trigger shutdown and invokes close(). It 
> gets invoked from another thread and blocks indefinitely.
> * User wants to manage offset commit themselves in a background thread. If 
> the commit policy is not purely time based, it's not currently possibly to 
> make sure the call to commit() will be processed promptly.
> Two possible solutions to this:
> 1. Make sure a lock is not held during the actual select call. Since we have 
> multiple layers (KafkaConsumer -> NetworkClient -> Selector -> nio Selector) 
> this is probably hard to make work cleanly since locking is currently only 
> performed at the KafkaConsumer level and we'd want it unlocked around a 
> single line of code in Selector.
> 2. Wake up the selector before synchronizing for certain operations. This 
> would require some additional coordination to make sure the caller of 
> wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() 
> thread being woken up and then promptly reacquiring the lock with a 
> subsequent long poll() call).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to