[ https://issues.apache.org/jira/browse/HADOOP-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525519 ]
Raghu Angadi commented on HADOOP-1849:
--------------------------------------
The server log for HADOOP-1763 would have been very useful for this. As far as I
remember, Dhruba looked for "dropping because max q reached" messages while working
on scalability improvements for the Namenode. When these messages went away, that
was a good indicator of improvement. With a large cluster this is pretty easy to test.
Yes, memory is also a concern, though increasing the handler count has the
same memory increase, plus memory for each of the threads (maybe 512k of
virtual memory per thread). Datanode blockReports are one example where
each RPC takes a lot of memory.
> IPC server max queue size should be configurable
> ------------------------------------------------
>
> Key: HADOOP-1849
> URL: https://issues.apache.org/jira/browse/HADOOP-1849
> Project: Hadoop
> Issue Type: Improvement
> Reporter: Raghu Angadi
> Fix For: 0.15.0
>
>
> Currently the max queue size for the IPC server is set to (100 * handlers). Usually
> when RPC failures are observed (e.g. HADOOP-1763), we increase the number of
> handlers and the problem goes away. I think a big part of such a fix is the
> increase in max queue size. I think we should make maxQsize per handler
> configurable (with a bigger default than 100). There are other possible
> improvements as well (HADOOP-1841).
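A minimal sketch of the proposed sizing, assuming the per-handler value is read from a plain `java.util.Properties` object. The class `CallQueueSizing` and the key `ipc.server.max.queue.size.per.handler` are hypothetical illustrations, not the actual Hadoop `Server` or `Configuration` API. The fallback of 100 reproduces today's hard-coded (100 * handlers) behavior.

```java
import java.util.Properties;

// Hypothetical sketch: the class name and config key are illustrative only,
// not the actual Hadoop IPC Server or Configuration API.
public class CallQueueSizing {
    // Hypothetical key for the per-handler queue size.
    static final String QUEUE_SIZE_PER_HANDLER_KEY =
        "ipc.server.max.queue.size.per.handler";
    // The multiplier that is currently hard-coded (100 * handlers).
    static final int DEFAULT_QUEUE_SIZE_PER_HANDLER = 100;

    // Max call-queue length = handler count * per-handler size,
    // falling back to the current multiplier when the key is unset.
    static int maxQueueSize(Properties conf, int handlerCount) {
        String value = conf.getProperty(QUEUE_SIZE_PER_HANDLER_KEY);
        int perHandler = (value == null)
            ? DEFAULT_QUEUE_SIZE_PER_HANDLER
            : Integer.parseInt(value.trim());
        if (perHandler <= 0 || handlerCount <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return handlerCount * perHandler;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(maxQueueSize(conf, 10)); // 1000 (today's behavior)
        conf.setProperty(QUEUE_SIZE_PER_HANDLER_KEY, "500");
        System.out.println(maxQueueSize(conf, 10)); // 5000
    }
}
```

Keeping the knob per handler (rather than an absolute queue length) means the queue still scales when the handler count is raised, which matches how the problem is usually worked around today.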
> The Server keeps reading RPC requests from clients. When the number of in-flight
> RPCs is larger than maxQsize, the earliest RPCs are dropped. This is the main
> feedback the Server gives to clients. I have often heard from users that Hadoop
> doesn't handle bursty traffic.
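The overflow policy described above (accept every new call, then discard the earliest queued calls once the queue exceeds maxQsize) can be sketched as follows. This is an illustrative stand-in using `java.util.ArrayDeque`; the class `DropOldestCallQueue` is hypothetical and is not the Hadoop `Server` code. The `main` method models a burst with no handler draining the queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a drop-oldest call queue, not Hadoop's Server class.
public class DropOldestCallQueue<T> {
    private final Deque<T> queue = new ArrayDeque<>();
    private final int maxQueueSize;
    private long dropped = 0;

    public DropOldestCallQueue(int maxQueueSize) {
        if (maxQueueSize <= 0) {
            throw new IllegalArgumentException("maxQueueSize must be positive");
        }
        this.maxQueueSize = maxQueueSize;
    }

    // Called by the reader for every request it reads; never blocks.
    public synchronized void add(T call) {
        queue.addLast(call);
        while (queue.size() > maxQueueSize) {
            queue.removeFirst(); // the earliest call is discarded; its client must retry
            dropped++;
        }
    }

    // Called by handlers; returns null when the queue is empty.
    public synchronized T poll() {
        return queue.pollFirst();
    }

    public synchronized int size() { return queue.size(); }

    public synchronized long droppedCount() { return dropped; }

    public static void main(String[] args) {
        DropOldestCallQueue<Integer> q = new DropOldestCallQueue<>(1000);
        for (int i = 0; i < 3000; i++) {
            q.add(i); // burst of 3000 calls arriving before any handler runs
        }
        System.out.println(q.droppedCount()); // 2000
        System.out.println(q.poll());         // 2000 (the earliest surviving call)
    }
}
```

Because the reader never waits, the only signal a client gets under load is that its call silently disappears, which is the abrupt feedback the description complains about.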
> Say the handler count is 10 (the default), so maxQsize is 100 * 10 = 1000, and the
> Server can handle 1000 RPCs a sec (quite conservative/low for a typical server).
> This implies that an RPC can wait for only 1 sec before it is dropped. If there
> are 3000 clients and all of them send RPCs around the same time (not very rare,
> with heartbeats etc.), 2000 will be dropped. Instead of dropping the earliest
> RPCs, if the server delayed reading new RPCs, the feedback to clients would be
> much smoother. I will file another jira regarding queue management.
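One way to sketch the "delay reading new RPCs" alternative is a bounded `java.util.concurrent.ArrayBlockingQueue` whose `put()` blocks the reader while the queue is full, so nothing is discarded. This is only an assumption about what such queue management could look like, not the design of the follow-up jira; the class `BackpressureCallQueue` and the `runBurst` helper are hypothetical, and the reader here is a simple loop rather than socket reads.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical backpressure sketch: the reader blocks instead of dropping calls.
public class BackpressureCallQueue {
    private static final int SHUTDOWN = -1; // marker telling a handler to exit

    // Runs one simulated burst and returns how many calls the handlers processed.
    static int runBurst(int calls, int handlerCount, int maxQueueSize) {
        BlockingQueue<Integer> callQueue = new ArrayBlockingQueue<>(maxQueueSize);
        AtomicInteger processed = new AtomicInteger();
        Thread[] handlers = new Thread[handlerCount];
        for (int h = 0; h < handlerCount; h++) {
            handlers[h] = new Thread(() -> {
                try {
                    while (true) {
                        int call = callQueue.take();
                        if (call == SHUTDOWN) {
                            return;
                        }
                        processed.incrementAndGet(); // stand-in for executing the RPC
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            handlers[h].start();
        }
        try {
            // Reader: put() waits while the queue is full, so the reader simply
            // stops reading new requests instead of discarding queued ones.
            for (int i = 0; i < calls; i++) {
                callQueue.put(i);
            }
            // FIFO order: every real call is taken before any shutdown marker.
            for (int h = 0; h < handlerCount; h++) {
                callQueue.put(SHUTDOWN);
            }
            for (Thread t : handlers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during burst", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(runBurst(3000, 10, 1000)); // 3000: no call is dropped
    }
}
```

In a real server, a reader that stops reading from its sockets lets TCP flow control push the wait back onto the clients' sends, which is one plausible reading of "much smoother" feedback; the trade-off is that a blocked reader also stalls the other connections it serves.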
> For this jira I propose to make the queue size per handler configurable, with a
> larger default (maybe 500).
--
This message is automatically generated by JIRA.