[ 
https://issues.apache.org/jira/browse/HADOOP-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated HADOOP-1849:
---------------------------------

    Fix Version/s:     (was: 0.15.0)
      Description: 
Currently max queue size for IPC server is set to (100 * handlers). Usually 
when RPC failures are observed (e.g. HADOOP-1763), we increase number of 
handlers and the problem goes away. I think a big part of such a fix is 
increase in max queue size. I think we should make maxQsize per handler 
configurable (with a bigger default than 100). There are other improvements 
also (HADOOP-1841).

Server keeps reading RPC requests from clients. When the number of in-flight RPCs 
is larger than maxQsize, the earliest RPCs are deleted. This is the main 
feedback Server has for the client. I have often heard from users that Hadoop 
doesn't handle bursty traffic.
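
To make the drop behaviour described above concrete, here is a minimal sketch of a bounded queue that discards the earliest entry once the limit is exceeded. It is only a model of the behaviour, not the actual Server code; the class name DropOldestQueue and its methods are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative model only: a bounded call queue that evicts the earliest call when full. */
final class DropOldestQueue<T> {
    private final Deque<T> calls = new ArrayDeque<>();
    private final int maxQueueSize;   // plays the role of maxQsize in the description
    private long dropped = 0;

    DropOldestQueue(int maxQueueSize) {
        if (maxQueueSize <= 0) {
            throw new IllegalArgumentException("maxQueueSize must be positive");
        }
        this.maxQueueSize = maxQueueSize;
    }

    /** Enqueues a call; returns the evicted earliest call, or null if nothing was dropped. */
    synchronized T offer(T call) {
        calls.addLast(call);
        if (calls.size() > maxQueueSize) {
            dropped++;
            return calls.removeFirst();
        }
        return null;
    }

    /** Called by a handler to take the next call; returns null when the queue is empty. */
    synchronized T poll() {
        return calls.pollFirst();
    }

    synchronized int size() {
        return calls.size();
    }

    synchronized long droppedCount() {
        return dropped;
    }
}
```

With a limit of 3 and five calls offered before any handler polls, the first two calls are evicted and the client only finds out when its request never gets a response.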

Say handler count is 10 (default) and Server can handle 1000 RPCs a sec (quite 
conservative/low for a typical server). With the current limit of 100 per 
handler, maxQsize is 1000, which implies that an RPC can wait only about 
1 sec before it is dropped. If there are 3000 clients and all of them send 
RPCs around the same time (not very rare, with heartbeats etc.), 2000 will be 
dropped. Instead of dropping the earliest RPCs, if the server delayed reading 
new RPCs, the feedback to clients would be much smoother. I will file another 
jira regarding queue management.
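
The arithmetic above can be checked with a short sketch. The class and method names are hypothetical, and the burst estimate assumes all RPCs arrive before the handlers drain any of them, which is a simplification.

```java
/** Back-of-the-envelope numbers for the burst example; all names are illustrative. */
final class QueueBurstMath {
    /** Total queue limit: per-handler limit times number of handlers. */
    static int maxQueueSize(int handlers, int perHandlerLimit) {
        return handlers * perHandlerLimit;
    }

    /** Longest time a queued RPC can wait if the server drains at a steady rate. */
    static double maxWaitSeconds(int maxQueueSize, int rpcsPerSecond) {
        return (double) maxQueueSize / rpcsPerSecond;
    }

    /** RPCs dropped when a burst arrives all at once (assumes no draining during the burst). */
    static int droppedInBurst(int burstSize, int maxQueueSize) {
        return Math.max(0, burstSize - maxQueueSize);
    }

    public static void main(String[] args) {
        int current = maxQueueSize(10, 100);   // today's limit: 100 per handler
        int proposed = maxQueueSize(10, 500);  // proposed default: 500 per handler
        System.out.println("current limit=" + current
                + " wait=" + maxWaitSeconds(current, 1000) + "s"
                + " dropped=" + droppedInBurst(3000, current));
        System.out.println("proposed limit=" + proposed
                + " wait=" + maxWaitSeconds(proposed, 1000) + "s"
                + " dropped=" + droppedInBurst(3000, proposed));
    }
}
```

Under these assumptions the current limit drops 2000 of the 3000 RPCs, while a per-handler limit of 500 absorbs the whole burst at the cost of a longer worst-case wait.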

For this jira I propose to make queue size per handler configurable, with a 
larger default (maybe 500).
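
As a rough idea of what the configurable limit could look like, here is a sketch that reads a per-handler queue size with a default of 500. It uses java.util.Properties as a stand-in for the real configuration class, and the key name is an assumption for illustration, not an existing setting.

```java
import java.util.Properties;

/** Sketch only: derives the total queue limit from a per-handler setting. */
final class QueueSizeConfig {
    // Hypothetical key name, chosen for this example only.
    static final String PER_HANDLER_KEY = "ipc.server.max.queue.size.per.handler";
    // Default suggested in the proposal above.
    static final int DEFAULT_PER_HANDLER = 500;

    /** Returns perHandler * handlerCount, falling back to the default when the key is unset. */
    static int maxQueueSize(Properties conf, int handlerCount) {
        if (handlerCount <= 0) {
            throw new IllegalArgumentException("handlerCount must be positive");
        }
        String raw = conf.getProperty(PER_HANDLER_KEY);
        int perHandler = (raw == null) ? DEFAULT_PER_HANDLER : Integer.parseInt(raw.trim());
        if (perHandler <= 0) {
            throw new IllegalArgumentException(PER_HANDLER_KEY + " must be positive");
        }
        // multiplyExact throws on overflow instead of silently wrapping to a negative limit.
        return Math.multiplyExact(perHandler, handlerCount);
    }
}
```

Keeping the setting per handler means that raising the handler count still scales the queue, as it does today, while the queue depth can also be tuned on its own.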




> IPC server max queue size should be configurable
> ------------------------------------------------
>
>                 Key: HADOOP-1849
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1849
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: ipc
>            Reporter: Raghu Angadi
>
> Currently max queue size for IPC server is set to (100 * handlers). Usually 
> when RPC failures are observed (e.g. HADOOP-1763), we increase number of 
> handlers and the problem goes away. I think a big part of such a fix is 
> increase in max queue size. I think we should make maxQsize per handler 
> configurable (with a bigger default than 100). There are other improvements 
> also (HADOOP-1841).
> Server keeps reading RPC requests from clients. When the number of in-flight 
> RPCs is larger than maxQsize, the earliest RPCs are deleted. This is the main 
> feedback Server has for the client. I have often heard from users that Hadoop 
> doesn't handle bursty traffic.
> Say handler count is 10 (default) and Server can handle 1000 RPCs a sec 
> (quite conservative/low for a typical server). With the current limit of 100 
> per handler, maxQsize is 1000, which implies that an RPC can wait only about 
> 1 sec before it is dropped. If there are 3000 clients and all of them send 
> RPCs around the same time (not very rare, with heartbeats etc.), 2000 will be 
> dropped. Instead of dropping the earliest RPCs, if the server delayed reading 
> new RPCs, the feedback to clients would be much smoother. I will file another 
> jira regarding queue management.
> For this jira I propose to make queue size per handler configurable, with a 
> larger default (maybe 500).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
