[ https://issues.apache.org/jira/browse/KAFKA-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771981#comment-13771981 ]
Guozhang Wang commented on KAFKA-1043:
--------------------------------------
IMHO the local time spent processing the fetch response is linear in the number
of partitions in the request, while the network time spent writing to the
socket buffer is not: it depends on whether the data is still in the file cache
or not. Hence, with either the 1) reset-socket-buffer-size or the 2)
subset-topic-partitions-at-a-time method, we would need to either 1) set the
buffer size too small, which is unfair to other requests that do not hit I/O
and may result in unnecessary round trips, or 2) fetch too small a subset of
topic-partitions, which ends up being the same case as 1).
Capping based on time is better since it provides "fairness", but that seems a
little hacky.
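To make the time-capping idea concrete, here is a minimal sketch in Python. It
is a simulation, not Kafka code: the response names, the write costs in
milliseconds, the 10 ms quantum, and the functions drain_fifo and
drain_time_capped are all made up for illustration. It compares writing each
response to completion in queue order with spending at most a fixed time slice
on a response before moving on to the next one.

```python
from collections import deque


def drain_fifo(write_costs_ms):
    """Write each response to completion, in queue order; return finish times."""
    clock = 0
    finish = {}
    for name, cost in write_costs_ms:
        clock += cost
        finish[name] = clock
    return finish


def drain_time_capped(write_costs_ms, quantum_ms):
    """Spend at most quantum_ms on a response, then re-queue what is left."""
    clock = 0
    finish = {}
    pending = deque(write_costs_ms)
    while pending:
        name, remaining = pending.popleft()
        spent = min(remaining, quantum_ms)
        clock += spent
        if remaining > spent:
            # Not fully written yet: go to the back of the queue.
            pending.append((name, remaining - spent))
        else:
            finish[name] = clock
    return finish


# Hypothetical workload: one fetch that hits disk, followed by two cheap responses.
responses = [("slow_fetch", 100), ("produce_ack", 1), ("small_fetch", 2)]
fifo = drain_fifo(responses)
capped = drain_time_capped(responses, quantum_ms=10)
print(fifo)    # slow_fetch finishes at 100, produce_ack at 101, small_fetch at 103
print(capped)  # produce_ack finishes at 11, small_fetch at 13, slow_fetch at 103
```

Under these assumed numbers the cheap responses no longer wait behind the slow
fetch, while the slow fetch itself finishes at the same time as before.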
My reasoning for decoupling the socket and the network processor is the
following. As we scale up, the principle should be "various clients are
isolated from each other". For fetch requests this means "if you request old
data from many topic partitions, only your own request should take a long time,
and other requests should not be impacted". Today a request's lifetime on the
server is
socket -> network processor -> request handler -> (possible) disk I/O due to
flush for produce request -> socket processor -> network I/O
and one way to enable isolation is to make sure that no pair on this path is
single-threaded. Today socket -> network processor goes via the acceptor,
network processor -> request handler goes via the request queue, and request
handler -> (possible) disk I/O due to flush for produce request is fixed in
KAFKA-615; but socket processor -> network I/O is still coupled, and fixes to
issues resulting from this coupling would be taking care of the "worst case",
which does not obey the "isolation" principle.
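As a rough illustration of why the last hop matters, the following Python
sketch simulates responses being written by one writer (the coupled case)
versus a small pool of writers sharing one response queue. This is only a
model under stated assumptions, not the SocketServer design: finish_times, the
writer pool, the response names, and the write costs in milliseconds are all
hypothetical.

```python
import heapq


def finish_times(write_costs_ms, num_writers):
    """Simulate num_writers threads draining one shared response queue.

    Each response is handed to whichever writer becomes free first. Returns
    the time at which each response has been fully written.
    """
    # Heap of (time at which the writer becomes free, writer id).
    writers = [(0, w) for w in range(num_writers)]
    heapq.heapify(writers)
    finish = {}
    for name, cost in write_costs_ms:
        free_at, w = heapq.heappop(writers)
        done = free_at + cost
        finish[name] = done
        heapq.heappush(writers, (done, w))
    return finish


# Hypothetical workload: a fetch of old data followed by two cheap responses.
responses = [("old_data_fetch", 100), ("produce_ack", 1), ("small_fetch", 2)]
coupled = finish_times(responses, num_writers=1)
decoupled = finish_times(responses, num_writers=2)
print(coupled)    # old_data_fetch at 100, produce_ack at 101, small_fetch at 103
print(decoupled)  # old_data_fetch at 100, produce_ack at 1, small_fetch at 3
```

In this model only the request for old data takes long once the write path is
no longer single-threaded; the other responses complete almost immediately.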
I agree this is rather complex and would be a long-term thing.
> Time-consuming FetchRequest could block other request in the response queue
> ---------------------------------------------------------------------------
>
> Key: KAFKA-1043
> URL: https://issues.apache.org/jira/browse/KAFKA-1043
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8.1
> Reporter: Guozhang Wang
> Assignee: Guozhang Wang
> Fix For: 0.8, 0.8.1
>
>
> Since in SocketServer the processor that takes a request is also responsible
> for writing the response for that request, each processor owns its own
> response queue. If a FetchRequest takes an irregularly long time to write to
> the channel buffer, it would block all other responses in the queue.