[
https://issues.apache.org/jira/browse/KAFKA-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Neha Narkhede updated KAFKA-749:
--------------------------------
Attachment: kafka-749-v2.patch
Thanks for the review! I think I was overthinking the issue of the request
queue size having to be larger than the number of io threads. Even if it is
smaller, some io thread's shutdown will simply wait for space to free up, and
space will free up because some other io thread will dequeue its AllDone command.
This patch is very simple. It changes the shutdown logic of the Kafka server to
go through the following steps:
1. Shut down the acceptor, so no new connections are accepted.
2. Shut down the processor threads. They will enqueue the requests for the
currently selected keys in the request queue. This is fine since the io threads
are still alive and will dequeue requests, so this step will not block.
3. Shut down the request channel, which clears the queue. At this point, no
thread is enqueuing more data. IO threads trying to dequeue data will block in
receiveRequest.
4. Shut down the io threads. This enqueues the AllDone command in the queue, and
the io threads shut down one after the other. Even if the request queue is
smaller than the number of io threads, they will all eventually shut down.
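
The ordering above can be sketched in miniature. This is an illustrative Python
model only, not the Kafka code: the names ALL_DONE, io_thread, drain and
run_shutdown are hypothetical, a bounded queue.Queue stands in for the request
queue, and step 1 (the acceptor) is not modelled. It is meant to show that step
4 finishes even when the queue is smaller than the number of io threads.

```python
import queue
import threading

ALL_DONE = object()  # hypothetical stand-in for the AllDone command


def io_thread(request_queue, handled):
    """Dequeue requests until the AllDone marker arrives, then exit."""
    while True:
        item = request_queue.get()
        if item is ALL_DONE:
            return
        handled.append(item)


def drain(request_queue):
    """Step 3: discard whatever is still sitting in the queue."""
    while True:
        try:
            request_queue.get_nowait()
        except queue.Empty:
            return


def run_shutdown(num_io_threads=4, queue_size=2):
    request_queue = queue.Queue(maxsize=queue_size)
    handled = []
    threads = [
        threading.Thread(target=io_thread, args=(request_queue, handled), daemon=True)
        for _ in range(num_io_threads)
    ]
    for t in threads:
        t.start()

    # Step 2: the processor hands over its pending requests. The io threads
    # are still alive, so these puts cannot block forever.
    for i in range(3):
        request_queue.put(("request", i))

    # Step 3: nobody is enqueuing any more, so the queue can be cleared.
    drain(request_queue)

    # Step 4: one AllDone per io thread. put() blocks while the queue is full
    # and resumes as soon as some io thread dequeues an earlier AllDone.
    for _ in threads:
        request_queue.put(ALL_DONE)
    for t in threads:
        t.join()
    return all(not t.is_alive() for t in threads)
```

With the defaults, four io threads share a queue of size two, so the third and
fourth AllDone enqueues have to wait for earlier ones to be consumed, and the
call still returns.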
> Bug in socket server shutdown logic makes the broker hang on shutdown until
> it has to be killed
> -----------------------------------------------------------------------------------------------
>
> Key: KAFKA-749
> URL: https://issues.apache.org/jira/browse/KAFKA-749
> Project: Kafka
> Issue Type: Bug
> Components: network
> Affects Versions: 0.8
> Reporter: Neha Narkhede
> Assignee: Neha Narkhede
> Priority: Blocker
> Labels: bugs, p1
> Attachments: kafka-749-v1.patch, kafka-749-v2.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> The current shutdown logic of the server shuts down the io threads first,
> followed by the acceptor and finally the processor threads. The shutdown API
> of the io threads enqueues a special AllDone command into the common request
> queue. An io thread shuts down when it dequeues this AllDone command. While
> this shutdown command is being processed on the io threads, the
> network/processor threads can still accept new connections and requests and
> will add those new requests to the request queue. That means more requests
> can be enqueued after the AllDone command. Once the io threads have shut
> down, there is no thread available to dequeue from the request queue, so the
> processor threads can hang while adding new requests to a full request
> queue, thereby blocking the server from shutting down.
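
The hang described in the issue can be illustrated with a small model. This is
a hedged Python sketch, not the Kafka code: ALL_DONE, io_thread and
processor_blocks_after_io_shutdown are hypothetical names, and a timeout on
put() is used so the example reports the blocked enqueue instead of hanging.

```python
import queue
import threading

ALL_DONE = object()  # hypothetical stand-in for the AllDone command


def io_thread(request_queue):
    """Dequeue until the AllDone marker arrives, then exit."""
    while request_queue.get() is not ALL_DONE:
        pass


def processor_blocks_after_io_shutdown(num_io_threads=2, queue_size=2, timeout=0.2):
    request_queue = queue.Queue(maxsize=queue_size)
    threads = [
        threading.Thread(target=io_thread, args=(request_queue,), daemon=True)
        for _ in range(num_io_threads)
    ]
    for t in threads:
        t.start()

    # Old ordering: the io threads are shut down first.
    for _ in threads:
        request_queue.put(ALL_DONE)
    for t in threads:
        t.join()

    # A processor thread that is still running keeps enqueuing requests,
    # but no io thread is left to dequeue them.
    try:
        for i in range(queue_size + 1):
            request_queue.put(("request", i), timeout=timeout)
    except queue.Full:
        return True  # a real processor thread would block here indefinitely
    return False
```

In this model the first queue_size requests fit, and the next enqueue never
finds space, which corresponds to the processor thread hanging on a full
request queue.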