He Tianyi created HDFS-9665:
-------------------------------

             Summary: Clients are easily affected by standby namenode
                 Key: HDFS-9665
                 URL: https://issues.apache.org/jira/browse/HDFS-9665
             Project: Hadoop HDFS
          Issue Type: Wish
          Components: hdfs-client, namenode
    Affects Versions: 2.6.0
            Reporter: He Tianyi
            Assignee: He Tianyi
            Priority: Minor

In my case, while the standby NameNode is restarting, there is a chance that a {{hadoop fs}} command hangs until either the IPC timeout is reached or a {{StandbyException}} is received, and only then fails over to the active NameNode.

Normally, the duration of this 'hung stage' is {{min(timeout_configuration, rpc_queue_time)}}. However, during this period the RPC queue in the standby NameNode is usually filled with block reports, so client requests cannot be processed quickly.

I would like to get rid of this in one of the following ways:
a) distinguish priorities in the RPC queue (may cause starvation)
b) speculate on the first request: send it to both NameNodes and take whichever response is valid
c) make the client aware of the HA state (probably by accessing ZK, which may cause performance issues)

Any suggestions or comments?
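To make option b) more concrete, here is a rough sketch of the idea as a small Python simulation. It is not Hadoop client code: {{make_namenode}} and {{speculative_call}} are hypothetical helpers, the {{StandbyException}} class is only a stand-in for the real one, and the sleep stands in for RPC queue time. It assumes the first request is safe to send to both NameNodes (for example a read).

```python
import concurrent.futures as cf
import time


class StandbyException(Exception):
    """Stand-in for the error a standby NameNode returns to client calls."""


def make_namenode(name, is_active, delay_s):
    """Build a hypothetical NameNode stub that answers after delay_s seconds."""
    def call(request):
        time.sleep(delay_s)  # simulated RPC queue time
        if not is_active:
            raise StandbyException(name + " is in standby state")
        return name + " handled " + request
    return call


def speculative_call(namenodes, request, timeout_s):
    """Send request to every NameNode and return the first valid response."""
    pool = cf.ThreadPoolExecutor(max_workers=len(namenodes))
    try:
        futures = [pool.submit(nn, request) for nn in namenodes]
        # as_completed yields futures in completion order, so a slow standby
        # does not delay the answer from the active NameNode.
        for fut in cf.as_completed(futures, timeout=timeout_s):
            try:
                return fut.result()
            except StandbyException:
                continue  # ignore the standby and wait for the other reply
        raise RuntimeError("no active NameNode responded")
    finally:
        pool.shutdown(wait=False)  # do not block on the slow standby call
```

With a standby stub that is slow to answer and an active stub that answers quickly, {{speculative_call([standby, active], "ls /", 2.0)}} returns the active NameNode's response without waiting for the standby. The cost is one extra request to the standby for each speculated call.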