[jira] [Commented] (HDFS-9665) Clients are easily affected by standby namenode
[ https://issues.apache.org/jira/browse/HDFS-9665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107854#comment-15107854 ]

He Tianyi commented on HDFS-9665:
---------------------------------

Also, I observed that {{doTailEdits}} may acquire the write lock in the standby NameNode and hold it for several seconds in my cluster. This causes the first request from a client to block as well.

> Clients are easily affected by standby namenode
> -----------------------------------------------
>
>                 Key: HDFS-9665
>                 URL: https://issues.apache.org/jira/browse/HDFS-9665
>             Project: Hadoop HDFS
>          Issue Type: Wish
>          Components: hdfs-client, namenode
>    Affects Versions: 2.6.0
>            Reporter: He Tianyi
>            Assignee: He Tianyi
>            Priority: Minor
>
> My case is: during a restart of the standby NameNode, there is a chance that a {{hadoop fs}} command hangs until either the IPC timeout is reached or a {{StandbyException}} is received, and only then fails over to the active NameNode.
> Normally, the duration of the hung stage depends on {{min(timeout_configuration, rpc_queue_time)}}. However, the RPC queue in the standby NameNode is usually filled with block reports during this period, so client requests cannot be processed quickly.
> I wish to get rid of this by one of the following approaches:
> a) distinguish priorities in the RPC queue (risks causing starvation)
> b) speculatively send the first request to both NameNodes and take the one valid response
> c) make the client aware of the HA state (probably by accessing ZK; risks causing a performance issue)
> Any suggestions or comments?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
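Option (b) above, racing the first request against both NameNodes and taking the first valid response, can be sketched outside of HDFS with plain {{java.util.concurrent}} primitives. This is a minimal illustration, not HDFS client code; the two callables stand in for RPC proxies, and the standby's {{IllegalStateException}} stands in for a {{StandbyException}}:

```java
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class HedgedRequest {
    // Race the same request against two endpoints and return the first
    // successful response. invokeAny ignores tasks that throw (the
    // standby-style failure) as long as at least one task succeeds.
    static String hedge(Callable<String> nn1, Callable<String> nn2)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            return pool.invokeAny(Arrays.asList(nn1, nn2));
        } finally {
            pool.shutdownNow(); // cancel the losing request
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated endpoints: one fails like a standby, one answers slowly.
        Callable<String> standby = () -> {
            throw new IllegalStateException("operation not allowed in standby");
        };
        Callable<String> active = () -> {
            TimeUnit.MILLISECONDS.sleep(50);
            return "ok";
        };
        System.out.println(hedge(standby, active)); // prints "ok"
    }
}
```

The cost of this scheme is the doubled request load on the NameNodes, which is why it is usually restricted to the first call (to discover which node is active) rather than applied to every request.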
[jira] [Commented] (HDFS-9665) Clients are easily affected by standby namenode

[ https://issues.apache.org/jira/browse/HDFS-9665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107896#comment-15107896 ]

He Tianyi commented on HDFS-9665:
---------------------------------

Thanks. That patch worked well. Closing the issue.
[jira] [Commented] (HDFS-9665) Clients are easily affected by standby namenode

[ https://issues.apache.org/jira/browse/HDFS-9665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107858#comment-15107858 ]

Jing Zhao commented on HDFS-9665:
---------------------------------

[~He Tianyi], maybe you can try to use {{RequestHedgingProxyProvider}} from HDFS-7858?
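For reference, the provider from HDFS-7858 is enabled on the client side through the failover proxy provider key in {{hdfs-site.xml}}. A sketch, assuming a nameservice ID of {{mycluster}} (replace it with the cluster's actual {{dfs.nameservices}} value):

```xml
<!-- hdfs-site.xml: hedge calls across both NameNodes instead of
     trying them one at a time ("mycluster" is a placeholder). -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider</value>
</property>
```

With this provider the client concurrently invokes the first call against all configured NameNodes, takes the response from the one that answers as active, and pins subsequent calls to it, avoiding the hang described in this issue when the first node tried happens to be a busy standby.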