[ https://issues.apache.org/jira/browse/HDFS-13924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622626#comment-16622626 ]
Chen Liang commented on HDFS-13924:
-----------------------------------

Thanks for the update [~csun]! Just to add to my previous comment: what I had in mind was to make the retry more uniform. Ideally, on the client side, only the ProxyProvider should handle NN redirection logic. To that extent, I think the server side is a better place to handle this than DFSInputStream: the server side throws an exception, and the ProxyProvider then does the redirection, so the retry is hidden from DFSInputStream and it doesn't need to do anything extra.

So IMO, the better way may be, just as you mentioned, to create a new exception, say ObserverOperationFailException, and throw it in every situation where the Observer cannot successfully handle a request and a retry on the active is worthwhile. Whenever ObserverProxyProvider sees this exception, it tries again with the active. Something along these lines.

> Handle BlockMissingException when reading from observer
> -------------------------------------------------------
>
>                 Key: HDFS-13924
>                 URL: https://issues.apache.org/jira/browse/HDFS-13924
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Chao Sun
>            Priority: Major
>
> Internally we found that reading from ObserverNode may result in
> {{BlockMissingException}}. This may happen when the observer sees a smaller
> number of DNs than the active (maybe due to communication issues with those DNs),
> or (we guess) because of late block reports from some DNs to the observer. This error
> happens in
> [DFSInputStream#chooseDataNode|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L846],
> when no valid DN can be found for the {{LocatedBlock}} obtained from the NN.
> One potential solution (although a little hacky) is to ask the
> {{DFSInputStream}} to retry on the active when this happens.
> The retry logic is already
> present in the code; we just have to dynamically set a flag to ask the
> {{ObserverReadProxyProvider}} to try the active in this case.
> cc [~shv], [~xkrogen], [~vagarychen], [~zero45] for discussion.
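
For illustration only, the server-side approach suggested in the comment above could look roughly like the sketch below. It is not Hadoop code. ObserverOperationFailException is the name proposed in the comment and does not refer to an existing class, while NameNodeEndpoint, FallbackProxy, and demo() are hypothetical stand-ins invented for this example. The only point is that the caller (the role DFSInputStream plays) never sees the observer failure, because the proxy layer retries on the active.

```java
import java.io.IOException;

public class ObserverRetrySketch {

    /** Name proposed in the comment; hypothetical, not an existing Hadoop class. */
    static class ObserverOperationFailException extends IOException {
        ObserverOperationFailException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the NameNode RPC interface. */
    interface NameNodeEndpoint {
        String getBlockLocations(String path) throws IOException;
    }

    /**
     * Hypothetical, heavily simplified stand-in for the proxy provider:
     * try the observer first, fall back to the active on the new exception.
     */
    static class FallbackProxy implements NameNodeEndpoint {
        private final NameNodeEndpoint observer;
        private final NameNodeEndpoint active;

        FallbackProxy(NameNodeEndpoint observer, NameNodeEndpoint active) {
            this.observer = observer;
            this.active = active;
        }

        @Override
        public String getBlockLocations(String path) throws IOException {
            try {
                return observer.getBlockLocations(path);
            } catch (ObserverOperationFailException e) {
                // The observer could not serve the request; redirect to the active.
                return active.getBlockLocations(path);
            }
        }
    }

    /** Runs one read through the proxy with an observer that always fails. */
    static String demo() {
        NameNodeEndpoint observer = path -> {
            throw new ObserverOperationFailException("no valid DN for " + path);
        };
        NameNodeEndpoint active = path -> "active:" + path;
        NameNodeEndpoint proxy = new FallbackProxy(observer, active);
        try {
            // The caller only talks to the proxy and is unaware of the retry.
            return proxy.getBlockLocations("/tmp/f");
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Other IOExceptions are deliberately not caught in FallbackProxy, so only failures the server explicitly marks as worth retrying are redirected to the active.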