[ 
https://issues.apache.org/jira/browse/HDFS-13924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622626#comment-16622626
 ] 

Chen Liang commented on HDFS-13924:
-----------------------------------

Thanks for the update [~csun]! Just to add to my previous comment: what I was 
thinking of was making the retry more uniform. Specifically, in the ideal 
situation, I think that on the client side it should only ever be the 
ProxyProvider that handles the NN redirecting logic. To this extent, I think 
the server side is a better place to handle this than DFSInputStream: the 
server side throws an exception, then the ProxyProvider does the redirecting 
properly, so the retry is hidden from DFSInputStream and it doesn't need to do 
anything extra.

So IMO the better way may be, just as you mentioned, to create a new 
exception, say ObserverOperationFailException, for all the situations where 
the Observer cannot successfully handle a request and it is worth retrying on 
the active: the server just throws this exception. Whenever 
ObserverReadProxyProvider sees this exception, it tries again with the active. 
Something along these lines.
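
To make the idea a bit more concrete, here is a rough, self-contained sketch of 
the fallback. It is not the actual ObserverReadProxyProvider code: 
ObserverOperationFailException is only the name floated above, and NameNodeCall 
and invokeWithFallback are hypothetical stand-ins I made up for illustration.

```java
import java.io.IOException;

public class ObserverRetrySketch {

    /** Hypothetical exception: the observer could not serve the request. */
    static class ObserverOperationFailException extends IOException {
        ObserverOperationFailException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for a single RPC against one NameNode. */
    interface NameNodeCall<T> {
        T call() throws IOException;
    }

    /**
     * Simplified stand-in for the proxy provider: try the observer first and
     * fall back to the active only when the observer signals it cannot serve
     * the request. Any other IOException is propagated unchanged.
     */
    static <T> T invokeWithFallback(NameNodeCall<T> observer,
                                    NameNodeCall<T> active) throws IOException {
        try {
            return observer.call();
        } catch (ObserverOperationFailException e) {
            // The caller (e.g. the input stream) never sees this failure.
            return active.call();
        }
    }

    public static void main(String[] args) throws IOException {
        String result = invokeWithFallback(
            () -> { throw new ObserverOperationFailException("no valid DN for block"); },
            () -> "block locations from active");
        System.out.println(result);
    }
}
```

With something of this shape, the caller only sees the final result, so the 
stream code would not need its own retry-on-active path.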

> Handle BlockMissingException when reading from observer
> -------------------------------------------------------
>
>                 Key: HDFS-13924
>                 URL: https://issues.apache.org/jira/browse/HDFS-13924
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Chao Sun
>            Priority: Major
>
> Internally we found that reading from ObserverNode may result in 
> {{BlockMissingException}}. This may happen when the observer sees a smaller 
> number of DNs than the active (maybe due to communication issues with those 
> DNs), or (we guess) because of late block reports from some DNs to the 
> observer. This error happens in 
> [DFSInputStream#chooseDataNode|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L846],
>  when no valid DN can be found for the {{LocatedBlock}} obtained from the NN 
> side.
> One potential solution (although a little hacky) is to ask the 
> {{DFSInputStream}} to retry on the active when this happens. The retry logic 
> is already present in the code; we just have to dynamically set a flag to ask 
> the {{ObserverReadProxyProvider}} to try the active in this case.
> cc [~shv], [~xkrogen], [~vagarychen], [~zero45] for discussion.



