[ https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992443#comment-14992443 ]
Bob Hansen commented on HDFS-9103:
----------------------------------

{quote}
"I would like to see the FileHandle::Pread method implement the retry logic internally so we have a simple "read all this data or completely fail" method rather than forcing partial read and retry onto our consumer. Understanding the logic that these errors mean you should retry, but this error means that you shouldn't retry could be abstracted away as a kindness to the consumer."

I agree with this. I think the BadDataNodeTracker should be part of the filesystem; it seems like it complicates the API to have the user declare it. With the set<string> for exclusion I think it was reasonable to pass in, but now that it's a more complicated class that needs to be passed, it might not be a good fit for the API.

If I recall, Haohui Mai wanted the passing of failed nodes to be very explicit by design. Do you have an opinion now that I've changed how failures are tracked, Haohui?

I think a reasonable middle ground might be keeping the failed DN tracking mechanism internal but providing a hook to ask for failed datanodes that were tried during the read. Optionally passing in a pointer to a vector of strings might work well for this.
{quote}

It is a good thing to have a method where the failed nodes are passed explicitly. I would hope that, because we love our userbase, we also have a method where that is taken care of for them (both to reduce cognitive load and to avoid errors in re-implementing code that should be done for them).

In the HDFS-9144 refactoring, I have the easy-bake method that just passes in a buffer, size, and offset (taken from the hdfs_cpp FileHandle API) while keeping a semi-stateless AsyncPReadSome that takes explicit values for the active parameters (such as the dead data nodes). I think it's a good trade.
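To make the trade concrete, here is a rough sketch of how an easy "read it all or fail" call could wrap an explicit partial read. It is not the HDFS-9144 code: ReadResult, PreadSomeFn, PreadAll, and max_failures are hypothetical names made up for illustration, and the partial read is modeled as a synchronous callback, whereas the real AsyncPReadSome is asynchronous. The optional vector of failed datanodes follows the "pointer to a vector of strings" hook suggested in the quote above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical outcome of one partial read attempt (not the libhdfs++ type).
struct ReadResult {
  bool ok;                 // true if bytes_read bytes were placed in the buffer
  bool retriable;          // on failure: is trying another datanode worthwhile?
  std::size_t bytes_read;  // bytes read on success
  std::string datanode;    // datanode that served (or failed) this attempt
};

// Hypothetical synchronous stand-in for the explicit, semi-stateless partial
// read: the caller passes the excluded datanodes on every call.
using PreadSomeFn = std::function<ReadResult(
    char* buf, std::size_t size, std::uint64_t offset,
    const std::set<std::string>& excluded)>;

// Easy wrapper: returns true only if all `size` bytes were read. Retriable
// failures add the datanode to an internal exclusion set and try again, up to
// max_failures times. If failed_out is non-null, every datanode that failed is
// appended to it so the caller can still inspect what went wrong.
bool PreadAll(const PreadSomeFn& pread_some, char* buf, std::size_t size,
              std::uint64_t offset, std::size_t max_failures,
              std::vector<std::string>* failed_out = nullptr) {
  assert(buf != nullptr || size == 0);
  std::set<std::string> excluded;
  std::size_t done = 0;
  std::size_t failures = 0;
  while (done < size) {
    ReadResult r = pread_some(buf + done, size - done, offset + done, excluded);
    if (r.ok) {
      if (r.bytes_read == 0) return false;  // no progress, e.g. unexpected EOF
      done += r.bytes_read;
      continue;
    }
    if (failed_out != nullptr) failed_out->push_back(r.datanode);
    if (!r.retriable || ++failures > max_failures) return false;
    excluded.insert(r.datanode);  // skip this datanode on the next attempt
  }
  return true;
}
```

A consumer who wants full control can keep calling the partial read directly and manage the exclusion set themselves; everyone else calls the wrapper and only looks at the failed list if they care.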
> Retry reads on DN failure
> -------------------------
>
>                 Key: HDFS-9103
>                 URL: https://issues.apache.org/jira/browse/HDFS-9103
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Bob Hansen
>            Assignee: James Clampffer
>             Fix For: HDFS-8707
>
>         Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch,
> HDFS-9103.HDFS-8707.006.patch, HDFS-9103.HDFS-8707.007.patch,
> HDFS-9103.HDFS-8707.3.patch, HDFS-9103.HDFS-8707.4.patch,
> HDFS-9103.HDFS-8707.5.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and
> try again.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)