[ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992443#comment-14992443
 ] 

Bob Hansen commented on HDFS-9103:
----------------------------------

{quote}
"I would like to see the FileHandle::Pread method implement the retry logic 
internally so we have a simple "read all this data or completely fail" method 
rather than forcing partial read and retry onto our consumer. Understanding the 
logic that these errors mean you should retry, but this error means that you 
shouldn't retry could be abstracted away as a kindness to the consumer."
I agree with this. I think the BadDataNodeTracker should be part of the 
filesystem; it seems like it complicates the API to have the user declare it. 
With the set<string> for exclusion I think it was reasonable to pass in, but now 
that it's a more complicated class that needs to be passed in, it might not be a 
good fit for the API.

If I recall, Haohui Mai wanted the passing of failed nodes to be very explicit 
by design. Do you have an opinion now that I've changed how failures are 
tracked, Haohui? I think a reasonable middle ground might be keeping the failed 
DN tracking mechanism internal but providing a hook to ask for failed datanodes 
that were tried during the read. Optionally passing in a pointer to a vector of 
strings might work well for this.
{quote}
It is a good thing to have a method where the failed nodes are passed explicitly.  I would 
hope that because we love our userbase, we also have a method where that is 
taken care of for them (both to reduce cognitive load and errors in 
re-implementing code that should be done for them).  In the HDFS-9144 
refactoring, I have the easy-bake method that just passes in a buffer, size, 
and offset (taken from the hdfs_cpp FileHandle API) while keeping a 
semi-stateless AsyncPReadSome that takes explicit values for the active 
parameters (such as the dead data nodes).  I think it's a good trade.
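
As a rough illustration of that split (this is not the HDFS-9144 code; ReadResult, PReadSomeFn, and the PRead signature below are hypothetical names made up for the sketch), the simple method can loop over the explicit one, keep the excluded set to itself, and optionally report the failed datanodes through a pointer to a vector of strings:

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical outcome of one low-level read attempt.
struct ReadResult {
  bool ok;               // true if this attempt read data
  bool retriable;        // on failure: is it worth trying another DN?
  std::size_t bytes;     // bytes read by this attempt
  std::string datanode;  // DN that served (or failed) the attempt
};

// Stand-in for the explicit, semi-stateless call: the caller passes the
// excluded datanodes on every invocation.
using PReadSomeFn = std::function<ReadResult(
    char* buf, std::size_t size, std::size_t offset,
    const std::set<std::string>& excluded)>;

// "Read all this data or completely fail". The excluded set stays internal;
// failed_out is an optional hook for callers who want to see which
// datanodes failed during the read.
bool PRead(const PReadSomeFn& read_some, char* buf, std::size_t size,
           std::size_t offset,
           std::vector<std::string>* failed_out = nullptr,
           int max_retries = 3) {
  std::set<std::string> excluded;
  std::size_t done = 0;
  int failures = 0;
  while (done < size) {
    ReadResult r = read_some(buf + done, size - done, offset + done, excluded);
    if (r.ok) {
      if (r.bytes == 0) return false;  // no progress (e.g. EOF): give up
      done += r.bytes;                 // partial read: keep going
      continue;
    }
    if (failed_out) failed_out->push_back(r.datanode);
    if (!r.retriable || failures >= max_retries) return false;
    ++failures;
    excluded.insert(r.datanode);  // do not try this DN again
  }
  return true;
}
```

A caller that does not care about failures just calls PRead(read_some, buf, size, offset); one that does passes a vector and inspects it afterwards. The decision about which errors are retriable lives in one place instead of in every consumer.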

> Retry reads on DN failure
> -------------------------
>
>                 Key: HDFS-9103
>                 URL: https://issues.apache.org/jira/browse/HDFS-9103
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Bob Hansen
>            Assignee: James Clampffer
>             Fix For: HDFS-8707
>
>         Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch, 
> HDFS-9103.HDFS-8707.006.patch, HDFS-9103.HDFS-8707.007.patch, 
> HDFS-9103.HDFS-8707.3.patch, HDFS-9103.HDFS-8707.4.patch, 
> HDFS-9103.HDFS-8707.5.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
