[ 
https://issues.apache.org/jira/browse/HDFS-13924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618281#comment-16618281
 ] 

Chen Liang commented on HDFS-13924:
-----------------------------------

Thanks [~csun]. I see, so I imagine the error did not happen on the server 
side, because the server does not treat this as an error: it still returns a 
LocatedBlock, but with an empty block info list. This only becomes an exception 
later, when the client actually tries to read the block? If that is what was 
happening, maybe another fix would be on the server side: if the server finds 
itself in observer state and getBlockLocations is called with no known block 
info, then instead of returning an empty list it throws an exception, so that 
the client side triggers a retry to a different node.
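
To make the server-side idea concrete, here is a minimal sketch under stated assumptions. It does not use the real HDFS classes: {{HAState}}, {{Block}}, {{RetryOnOtherNodeException}} and {{checkLocations}} are hypothetical stand-ins, and the actual NameNode code path may look quite different.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical stand-in types for illustration; these are NOT the real HDFS classes.
class ObserverRetrySketch {
    enum HAState { ACTIVE, OBSERVER }

    /** Hypothetical exception meaning "retry this call on another NameNode". */
    static class RetryOnOtherNodeException extends IOException {
        RetryOnOtherNodeException(String msg) { super(msg); }
    }

    /** Simplified located block: a block id plus the DataNodes known to hold it. */
    static class Block {
        final long id;
        final List<String> locations;
        Block(long id, List<String> locations) {
            this.id = id;
            this.locations = locations;
        }
    }

    /**
     * Returns the blocks unchanged, except that an observer refuses to hand out
     * a block with no known locations and throws instead, so the caller can retry
     * on a different node.
     */
    static List<Block> checkLocations(HAState state, List<Block> blocks)
            throws RetryOnOtherNodeException {
        if (state == HAState.OBSERVER) {
            for (Block b : blocks) {
                if (b.locations.isEmpty()) {
                    throw new RetryOnOtherNodeException(
                        "Observer has no locations for block " + b.id);
                }
            }
        }
        return blocks;
    }
}
```

In this sketch the active still returns the empty location list as before, so a block that is really missing surfaces to the client in the usual way; only the observer turns the empty list into a retriable error.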

Letting DFSInputStream switch to active also makes sense to me, though.
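
For comparison, the client-side alternative could be sketched roughly as follows. This is an illustration only: {{NameNode}}, {{ProxyProvider}}, {{forceActive}} and {{MissingBlockException}} are hypothetical names, not the actual {{ObserverReadProxyProvider}} or {{DFSInputStream}} API.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical stand-in types for illustration; these are NOT the real HDFS classes.
class ActiveFallbackSketch {
    /** Hypothetical stand-in for the client-side "no valid DataNode" failure. */
    static class MissingBlockException extends IOException {
        MissingBlockException(String msg) { super(msg); }
    }

    /** Hypothetical view of a NameNode: returns DataNode locations for a path. */
    interface NameNode {
        List<String> getLocations(String path);
    }

    /** Hypothetical proxy provider with a flag that routes calls to the active. */
    static class ProxyProvider {
        private final NameNode observer;
        private final NameNode active;
        private boolean useActive = false;

        ProxyProvider(NameNode observer, NameNode active) {
            this.observer = observer;
            this.active = active;
        }

        void forceActive(boolean value) { useActive = value; }
        boolean isForcingActive() { return useActive; }
        NameNode current() { return useActive ? active : observer; }
    }

    /** Picks a DataNode, retrying once against the active if the observer has none. */
    static String chooseDataNode(ProxyProvider provider, String path)
            throws MissingBlockException {
        List<String> locations = provider.current().getLocations(path);
        if (locations.isEmpty()) {
            // The observer may be behind on block reports: ask the active once.
            provider.forceActive(true);
            try {
                locations = provider.current().getLocations(path);
            } finally {
                provider.forceActive(false); // later reads go back to the observer
            }
        }
        if (locations.isEmpty()) {
            throw new MissingBlockException("No valid DataNode for " + path);
        }
        return locations.get(0);
    }
}
```

The flag is reset in a {{finally}} block so that a single stale observer answer does not permanently redirect reads to the active; whether the real fix should reset it that eagerly is an open design question.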

> Handle BlockMissingException when reading from observer
> -------------------------------------------------------
>
>                 Key: HDFS-13924
>                 URL: https://issues.apache.org/jira/browse/HDFS-13924
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Chao Sun
>            Priority: Major
>
> Internally we found that reading from ObserverNode may result in 
> {{BlockMissingException}}. This may happen when the observer sees a smaller 
> number of DNs than the active (maybe due to communication issues with those 
> DNs), or (we guess) when block reports from some DNs reach the observer late. 
> This error happens in 
> [DFSInputStream#chooseDataNode|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L846],
>  when no valid DN can be found for the {{LocatedBlock}} returned by the NN.
> One potential solution (although a little hacky) is to ask the 
> {{DFSInputStream}} to retry on the active when this happens. The retry logic 
> is already present in the code - we just have to dynamically set a flag to ask 
> the {{ObserverReadProxyProvider}} to try the active in this case.
> cc [~shv], [~xkrogen], [~vagarychen], [~zero45] for discussion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
