[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554724#comment-14554724 ]
Jason Gustafson commented on KAFKA-2168:
----------------------------------------
I feel a little wary about finer-grained synchronization given all the state in
the consumer, the network client, and the selector. I actually think the
two-lock approach is the least intrusive since it only touches the
KafkaConsumer and preserves the current coarse synchronization design, but I
agree that it's unusual. Here's an idea of what it might look like in the code:
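{code:java}
lock.queue(); // proposed: line up as the next thread to take the lock
try {
    client.wakeup(); // interrupt a poll() in progress so it releases the lock
    lock.lock();
    // critical section
} finally {
    lock.unlock();
}
{code}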
Definitely weird, but not that hard to understand. You'd still run into the
same problem if multiple threads are trying to poll, but that seems like
unintended usage anyway.
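
A minimal sketch of how the two-lock idea above could be built from two ReentrantLocks, under some assumptions: HandoffLock and HandoffDemo are hypothetical names, not part of the Kafka client, and a Semaphore stands in for client.wakeup() and the blocking select. Every thread, including the one calling poll(), has to pass through the queue lock before taking the main lock, so a caller that has queued cannot be overtaken when the poll thread loops around.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper for illustration; not part of the Kafka client API.
class HandoffLock {
    // Threads line up on this lock first, so a waiter cannot be overtaken.
    private final ReentrantLock queueLock = new ReentrantLock();
    // Guards the actual critical section (the consumer's state).
    private final ReentrantLock mainLock = new ReentrantLock();

    // Announce intent to enter; later threads must wait behind this one.
    void queue() {
        queueLock.lock();
    }

    // Enter the critical section, then let the next thread line up.
    void lock() {
        mainLock.lock();
        if (queueLock.isHeldByCurrentThread()) {
            queueLock.unlock();
        }
    }

    // Release whatever this thread still holds, so it is safe in a finally
    // block even if the code between queue() and lock() threw.
    void unlock() {
        if (mainLock.isHeldByCurrentThread()) {
            mainLock.unlock();
        }
        if (queueLock.isHeldByCurrentThread()) {
            queueLock.unlock();
        }
    }

    boolean isHeldByCurrentThread() {
        return mainLock.isHeldByCurrentThread();
    }
}

class HandoffDemo {
    // Returns true if the close-like call got the lock and the poller exited.
    static boolean run() {
        HandoffLock lock = new HandoffLock();
        Semaphore wakeup = new Semaphore(0); // stand-in for client.wakeup()
        boolean[] closed = {false};          // only touched while holding lock

        Thread poller = new Thread(() -> {
            while (true) {
                lock.queue();
                try {
                    lock.lock();
                    if (closed[0]) {
                        return;
                    }
                    // Simulated long poll: holds the lock until woken up.
                    wakeup.tryAcquire(30, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    return;
                } finally {
                    lock.unlock();
                }
            }
        });
        poller.start();

        // The pattern from the comment above, as a close()-like caller.
        lock.queue();
        try {
            wakeup.release();
            lock.lock();
            closed[0] = true; // critical section
        } finally {
            lock.unlock();
        }

        try {
            poller.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !poller.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("poller exited: " + run());
    }
}
```

The poll thread re-checks the closed flag only after it holds the lock again, which is what keeps it from starting another long poll ahead of the queued caller. With several threads polling at once the hand-off order is no longer guaranteed, which matches the caveat above.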
> New consumer poll() can block other calls like position(), commit(), and
> close() indefinitely
> ---------------------------------------------------------------------------------------------
>
> Key: KAFKA-2168
> URL: https://issues.apache.org/jira/browse/KAFKA-2168
> Project: Kafka
> Issue Type: Bug
> Components: clients, consumer
> Reporter: Ewen Cheslack-Postava
> Assignee: Jason Gustafson
>
> The new consumer is currently using very coarse-grained synchronization. For
> most methods this isn't a problem since they finish quickly once the lock is
> acquired, but poll() might run for a long time (and commonly will since
> polling with long timeouts is a normal use case). This means any operations
> invoked from another thread may block until the poll() call completes.
> Some example use cases where this can be a problem:
> * A shutdown hook is registered to trigger shutdown and invokes close(). It
> gets invoked from another thread and blocks indefinitely.
> * User wants to manage offset commits themselves in a background thread. If
> the commit policy is not purely time-based, it's not currently possible to
> make sure the call to commit() will be processed promptly.
> Two possible solutions to this:
> 1. Make sure a lock is not held during the actual select call. Since we have
> multiple layers (KafkaConsumer -> NetworkClient -> Selector -> nio Selector)
> this is probably hard to make work cleanly since locking is currently only
> performed at the KafkaConsumer level and we'd want it unlocked around a
> single line of code in Selector.
> 2. Wake up the selector before synchronizing for certain operations. This
> would require some additional coordination to make sure the caller of
> wakeup() is able to acquire the lock promptly (instead of, e.g., the poll()
> thread being woken up and then promptly reacquiring the lock with a
> subsequent long poll() call).