Wang, Xinglong created HADOOP-17237:
---------------------------------------
Summary: Improve NameNode RPC throughput with
ReadWriteRpcCallQueue
Key: HADOOP-17237
URL: https://issues.apache.org/jira/browse/HADOOP-17237
Project: Hadoop Common
Issue Type: Improvement
Components: rpc-server
Reporter: Wang, Xinglong
*Current*
In our production cluster, the typical read-to-write ratio is 10:1, and it
sometimes rises to 30:1.
NameNode uses ReentrantReadWriteLock under the hood of FSNamesystemLock.
The read lock is shared, while the write lock is exclusive.
Read RPCs and write RPCs arrive at the NameNode in random order, so reads and
writes are interleaved and only a small fraction of the reads can actually
share a read lock.
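To make the shared/exclusive behaviour concrete, here is a minimal sketch
using java.util.concurrent.locks.ReentrantReadWriteLock directly. The class
and method names (SharedReadLockDemo, probeWhileReadLocked) are made up for
illustration and are not part of NameNode or FSNamesystemLock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustration only: shows that readers share the lock while a writer is kept out. */
final class SharedReadLockDemo {
    /**
     * The calling thread holds the read lock while a second thread tries
     * to take the read lock and then the write lock without blocking.
     * Returns {secondReaderGotIn, writerGotIn}.
     */
    static boolean[] probeWhileReadLocked() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        boolean[] result = new boolean[2];
        lock.readLock().lock();
        try {
            Thread other = new Thread(() -> {
                // A second reader can share the read lock.
                result[0] = lock.readLock().tryLock();
                if (result[0]) {
                    lock.readLock().unlock();
                }
                // A writer cannot get in while any other thread holds the read lock.
                result[1] = lock.writeLock().tryLock();
                if (result[1]) {
                    lock.writeLock().unlock();
                }
            });
            other.start();
            other.join(); // join() makes the other thread's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.readLock().unlock();
        }
        return result;
    }
}
```

The second reader is admitted while the writer is refused, which is why
consecutive reads can overlap and every write forces a separate exclusive
slot.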
Currently we have the default call queue and FairCallQueue, and we can
refreshCallQueue on the fly. This leaves room to design a new call queue.
*Idea*
If we reorder the RPC calls in the call queue so that read RPCs are grouped
together and write RPCs are grouped together, we gain some control: a batch of
read RPCs reaches the handlers together and can possibly share the same read
lock. This reduces the fragmentation of read locks.
This only improves the chance that a batch of read RPCs shares the read lock,
because some NameNode-internal write lock acquisitions do not go through the
call queue.
Inside ReentrantReadWriteLock, there is a queue that manages the threads
waiting for the lock. Here is an example.
R: stands for read rpc
W: stands for write rpc
e.g.
RRRRWRRRRWRRRRWRRRRWRRRRWRRRRWRRRRWRRRRW
In this case, we need 16 lock timeslices (8 shared read slices and 8 write
slices).
Optimized:
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRWWWWWWWW
In this case, we only need 9 lock timeslices (1 shared read slice and 8 write
slices).
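The counts above can be reproduced with a small sketch. It assumes a
simplified model in which a run of consecutive reads shares one lock timeslice
and every write takes its own. The class name LockTimeslices is made up for
illustration.

```java
/** Illustration only: counts lock timeslices under a simplified model. */
final class LockTimeslices {
    /**
     * Assumed model: consecutive 'R' calls share one read-lock timeslice,
     * and each 'W' call needs its own exclusive timeslice.
     */
    static int count(String calls) {
        int slices = 0;
        char previous = 0;
        for (char c : calls.toCharArray()) {
            if (c == 'W') {
                slices++;                 // every write is exclusive
            } else if (c == 'R') {
                if (previous != 'R') {
                    slices++;             // first read of a new run
                }
            } else {
                throw new IllegalArgumentException("unexpected call type: " + c);
            }
            previous = c;
        }
        return slices;
    }

    public static void main(String[] args) {
        // The two sequences from the example above.
        System.out.println(count("RRRRWRRRRWRRRRWRRRRWRRRRWRRRRWRRRRWRRRRW")); // 16
        System.out.println(count("RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRWWWWWWWW")); // 9
    }
}
```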
*Correctness*
The execution order of any two concurrent or queued RPCs in the NameNode is
not guaranteed, so we can reorder the RPCs in the call queue into a read group
and a write group, and then dequeue from these two queues according to a
designed strategy. For example: dequeue 100 read RPCs, then 5 write RPCs, then
reads again, then writes again.
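A dequeue strategy of this kind could be sketched as below. This is only an
illustration, not the proposed ReadWriteRpcCallQueue itself: the class
ReadWriteBatchQueue, its put/poll methods and the rule of continuing with the
same group when the other one is empty are all assumptions. It also does not
implement BlockingQueue, which a real Hadoop call queue has to do.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical sketch: alternates between a batch of reads and a batch of writes. */
final class ReadWriteBatchQueue<E> {
    private final Queue<E> reads = new ArrayDeque<>();
    private final Queue<E> writes = new ArrayDeque<>();
    private final int readBatch;   // e.g. 100
    private final int writeBatch;  // e.g. 5
    private boolean servingReads = true;
    private int servedInBatch = 0;

    ReadWriteBatchQueue(int readBatch, int writeBatch) {
        if (readBatch <= 0 || writeBatch <= 0) {
            throw new IllegalArgumentException("batch sizes must be positive");
        }
        this.readBatch = readBatch;
        this.writeBatch = writeBatch;
    }

    /** Enqueues a call into the read group or the write group. */
    synchronized void put(E call, boolean isWrite) {
        (isWrite ? writes : reads).add(call);
    }

    /** Returns the next call according to the batch strategy, or null if both groups are empty. */
    synchronized E poll() {
        if (reads.isEmpty() && writes.isEmpty()) {
            return null;
        }
        Queue<E> current = servingReads ? reads : writes;
        int limit = servingReads ? readBatch : writeBatch;
        if (servedInBatch >= limit || current.isEmpty()) {
            // Batch finished or group drained: switch groups if the other
            // one has work, otherwise start a new batch of the same group.
            Queue<E> other = servingReads ? writes : reads;
            if (!other.isEmpty()) {
                servingReads = !servingReads;
                current = other;
            }
            servedInBatch = 0;
        }
        servedInBatch++;
        return current.poll();
    }
}
```

With readBatch = 100 and writeBatch = 5 this yields the "100 reads, then 5
writes" pattern whenever both groups have enough queued calls. The batch sizes
trade read-lock sharing against write latency.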
FairCallQueue also reorders RPC calls in the call queue, so I think the same
reasoning guarantees the correctness of RPC results here.
*Performance*
In a test environment, we see a 15% - 20% NameNode RPC throughput improvement
compared with the default call queue.
Test traffic is 30 read : 3 write : 1 list, generated with NNLoadGeneratorMR.
This result is not a surprise, because some write RPCs are not managed by the
call queue and so cannot be reordered by reordering calls in the call queue.
We could still achieve a full read/write reordering by redesigning
ReentrantReadWriteLock. That would be a further step after this one.