[ https://issues.apache.org/jira/browse/KAFKA-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Gustafson updated KAFKA-747:
----------------------------------
Fix Version/s: 0.10.2.0 (was: 0.10.1.0)

RequestChannel re-design
------------------------
Key: KAFKA-747
URL: https://issues.apache.org/jira/browse/KAFKA-747
Project: Kafka
Issue Type: New Feature
Components: network
Reporter: Jay Kreps
Assignee: Neha Narkhede
Fix For: 0.10.2.0

We have had some discussion around how to handle queuing requests. There are two competing concerns:
1. We need to maintain request order on a per-socket basis.
2. We want to be able to balance load flexibly over a pool of threads so that if one thread blocks on I/O, request processing continues.

Two Approaches We Have Considered
1. Have a global queue of unprocessed requests. All I/O threads read requests off this global queue and process them. To avoid re-ordering, have the network layer read only one request at a time from each socket.
2. Have a queue per I/O thread and have the network threads statically map sockets to I/O thread request queues.

Problems With These Approaches
In the first case you do not get any per-producer parallelism; that is, you can't read the next request from a socket while the current one is being handled. This seems like it would not be a big deal, but preliminary benchmarks show that it might be.
In the second case there are two problems. The first is that when an I/O thread gets blocked, all request processing for sockets attached to that I/O thread grinds to a halt. If you have 10,000 connections and 10 I/O threads, then each blockage stops 1,000 producers. If there is one topic that has long synchronous flush times enabled (or is experiencing fsync locking), this will cause big latency blips for all producers using that I/O thread. The second problem is around backpressure and memory management. Say we use BlockingQueues to feed the I/O threads, and say that one I/O thread stalls. Its request queue will fill up and it will then block ALL network threads, since they will block on inserting into that queue, even though the other I/O threads are unused and have empty queues.

A Proposed Better Solution
The problem with the first solution is that we are not pipelining requests. The problem with the second approach is that we are too constrained in moving work from one I/O thread to another. Instead we should have a single request queue-like structure, but internally enforce the condition that requests are not re-ordered.
Here are the details. We retain RequestChannel but refactor its internals. Internally we replace the blocking queue with a linked list. We also keep an in-flight-keys array with one entry per I/O thread. When removing a work item from the list we can't just take the first thing; instead we need to walk the list and look for something with a request key not in the in-flight-keys array. When a response is sent, we remove that key from the in-flight array. This guarantees that requests for a socket with key K are ordered, but that processing for K can only block requests made by K.
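A minimal Java sketch of the proposed scheme, for illustration only: the class and method names (OrderedRequestChannel, send, receive, complete) are not from the Kafka codebase, and a set of in-flight connection keys stands in for the per-I/O-thread in-flight-keys array described above, since each I/O thread contributes at most one in-flight key at a time.

{code:java}
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Set;

// Illustrative sketch only; names do not come from the Kafka codebase.
// A request is tagged with the key of the socket (connection) it arrived on.
class Request {
    final String connectionKey;
    final byte[] payload;

    Request(String connectionKey, byte[] payload) {
        this.connectionKey = connectionKey;
        this.payload = payload;
    }
}

// Single shared queue that preserves per-connection ordering: a request for
// connection K is handed out only if no earlier request for K is still in flight.
class OrderedRequestChannel {
    private final LinkedList<Request> pending = new LinkedList<>();
    private final Set<String> inFlightKeys = new HashSet<>();

    // Network threads append requests in arrival order.
    public synchronized void send(Request request) {
        pending.addLast(request);
        notifyAll();
    }

    // I/O threads take the first pending request whose connection is not
    // already being processed; blocks until such a request exists.
    public synchronized Request receive() throws InterruptedException {
        while (true) {
            Iterator<Request> it = pending.iterator();
            while (it.hasNext()) {
                Request r = it.next();
                if (!inFlightKeys.contains(r.connectionKey)) {
                    it.remove();
                    inFlightKeys.add(r.connectionKey); // later requests for this key must wait
                    return r;
                }
            }
            wait(); // nothing dispatchable right now
        }
    }

    // Called once the response for this connection has been sent,
    // releasing the next queued request (if any) for the same key.
    public synchronized void complete(String connectionKey) {
        inFlightKeys.remove(connectionKey);
        notifyAll();
    }
}
{code}

In this sketch an I/O thread loops over receive(), processes the request, sends the response, and then calls complete() with the request's connection key. A thread that blocks mid-request only delays later requests from the connections it currently holds in flight; requests from every other connection remain dispatchable to the idle I/O threads.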