[ https://issues.apache.org/jira/browse/KAFKA-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Gustafson updated KAFKA-747:
----------------------------------
Fix Version/s: 0.10.2.0 (was: 0.10.1.0)

RequestChannel re-design
------------------------
Key: KAFKA-747
URL: https://issues.apache.org/jira/browse/KAFKA-747
Project: Kafka
Issue Type: New Feature
Components: network
Reporter: Jay Kreps
Assignee: Neha Narkhede
Fix For: 0.10.2.0

We have had some discussion around how to handle queuing requests. There are two competing concerns:
1. We need to maintain request order on a per-socket basis.
2. We want to be able to balance load flexibly over a pool of threads so that if one thread blocks on I/O, request processing continues.

Two Approaches We Have Considered
1. Have a global queue of unprocessed requests. All I/O threads read requests off this global queue and process them. To avoid re-ordering, have the network layer read only one request at a time from each socket.
2. Have a queue per I/O thread and have the network threads statically map sockets to I/O thread request queues.

Problems With These Approaches
In the first case you do not get any per-producer parallelism; that is, you can't read the next request from a socket while the current one is being handled. This seems like it would not be a big deal, but preliminary benchmarks show that it might be.
In the second case there are two problems. The first is that when an I/O thread gets blocked, all request processing for sockets attached to that I/O thread grinds to a halt. If you have 10,000 connections and 10 I/O threads, then each blockage stops 1,000 producers. If there is one topic that has long synchronous flush times enabled (or is experiencing fsync locking), this will cause big latency blips for all producers using that I/O thread. The second problem is around backpressure and memory management. Say we use BlockingQueues to feed the I/O threads, and say that one I/O thread stalls. Its request queue will fill up and it will then block ALL network threads, since they will block on inserting into that queue, even though the other I/O threads are unused and have empty queues.

A Proposed Better Solution
The problem with the first solution is that we are not pipelining requests. The problem with the second approach is that we are too constrained in moving work from one I/O thread to another. Instead we should have a single request queue-like structure, but internally enforce the condition that requests are not re-ordered.
Here are the details. We retain RequestChannel but refactor its internals. Internally we replace the blocking queue with a linked list. We also keep an in-flight-keys array with one entry per I/O thread. When removing a work item from the list we can't just take the first thing; instead we need to walk the list and look for something with a request key not in the in-flight-keys array. When a response is sent, we remove that key from the in-flight array. This guarantees that requests for a socket with key K are ordered, but that processing for K can only block requests made by K.
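A minimal Java sketch of the proposed scheme, for illustration only: the class and method names (OrderedRequestChannel, send, receive, complete) are not from the Kafka codebase, and a set of in-flight connection keys stands in for the per-I/O-thread in-flight-keys array described above, since each I/O thread contributes at most one in-flight key at a time.

{code:java}
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Set;

// Illustrative sketch only; names do not come from the Kafka codebase.
// A request is tagged with the key of the socket (connection) it arrived on.
class Request {
    final String connectionKey;
    final byte[] payload;

    Request(String connectionKey, byte[] payload) {
        this.connectionKey = connectionKey;
        this.payload = payload;
    }
}

// Single shared queue that preserves per-connection ordering: a request for
// connection K is handed out only if no earlier request for K is still in flight.
class OrderedRequestChannel {
    private final LinkedList<Request> pending = new LinkedList<>();
    private final Set<String> inFlightKeys = new HashSet<>();

    // Network threads append requests in arrival order.
    public synchronized void send(Request request) {
        pending.addLast(request);
        notifyAll();
    }

    // I/O threads take the first pending request whose connection is not
    // already being processed; blocks until such a request exists.
    public synchronized Request receive() throws InterruptedException {
        while (true) {
            Iterator<Request> it = pending.iterator();
            while (it.hasNext()) {
                Request r = it.next();
                if (!inFlightKeys.contains(r.connectionKey)) {
                    it.remove();
                    inFlightKeys.add(r.connectionKey); // later requests for this key must wait
                    return r;
                }
            }
            wait(); // nothing dispatchable right now
        }
    }

    // Called once the response for this connection has been sent,
    // releasing the next queued request (if any) for the same key.
    public synchronized void complete(String connectionKey) {
        inFlightKeys.remove(connectionKey);
        notifyAll();
    }
}
{code}

In this sketch an I/O thread loops over receive(), processes the request, sends the response, and then calls complete() with the request's connection key. A thread that blocks mid-request only delays later requests from the connections it currently holds in flight; requests from every other connection remain dispatchable to the idle I/O threads.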