[ https://issues.apache.org/jira/browse/HDFS-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016251#comment-17016251 ]
Ayush Saxena commented on HDFS-15112:
-------------------------------------

bq. Any thoughts on RetriableException and StandbyException?
I think we should handle these two as well.

bq. That code in getCreateLocation() is only to check if the file actually exists, I think for that particular case we should be fine.
What if a cluster is unavailable and the file exists there? Then, when the cluster comes back up, there would be two files: one in the old cluster and the one that just got created, without the entry being fault tolerant.

{{getExistingLocation}} calls {{invokeConcurrent}}, but we made the change in {{invokeSequential}}. Does the change have any effect here?

I think the problem we tried to fix in {{invokeSequential}} is also present in {{invokeConcurrent}}, because of which the test failed randomly here:

{code:java}
// Throw the exception for the first location if there are no results
if (ret.isEmpty()) {
  final RemoteResult<T, R> result = results.get(0);
  if (result.hasException()) {
    throw result.getException();
  }
}
{code}

If the file is not there, there won't be any result, and if one NS isn't available, one of the results will carry an {{UnavailableException}}. If that result happens to be the first one, the {{UnavailableException}} is thrown and the write fails; if it isn't the first one, the {{FileNotFound}} from the other NS is thrown and the write succeeds.

I don't think the test failed because of the changes here. It may be a random failure, because with v07 the test passes locally for me.
Please check whether I am reading this correctly; you can also handle the {{invokeConcurrent}} case in a separate JIRA.
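
To make the ordering issue described above concrete, here is a minimal, self-contained sketch. It does not use the real Router classes: {{SubclusterResult}}, {{SubclusterUnavailableException}} and {{FirstExceptionDemo}} are hypothetical stand-ins invented for illustration, and {{preferUnavailable}} is only one possible way the selection could be made order-independent, not what any patch on this JIRA does.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a per-subcluster result (not the real RemoteResult). */
final class SubclusterResult {
  final String nameservice;
  final IOException exception; // null would mean the call succeeded

  SubclusterResult(String nameservice, IOException exception) {
    this.nameservice = nameservice;
    this.exception = exception;
  }
}

/** Hypothetical marker for "this subcluster could not be reached". */
class SubclusterUnavailableException extends IOException {
  SubclusterUnavailableException(String message) {
    super(message);
  }
}

final class FirstExceptionDemo {

  /** Mirrors the quoted snippet: report whatever the first location returned. */
  static IOException firstException(List<SubclusterResult> results) {
    return results.isEmpty() ? null : results.get(0).exception;
  }

  /**
   * Assumed alternative: report an unavailable subcluster if any location
   * saw one, otherwise fall back to the first exception encountered.
   */
  static IOException preferUnavailable(List<SubclusterResult> results) {
    IOException first = null;
    for (SubclusterResult r : results) {
      if (r.exception instanceof SubclusterUnavailableException) {
        return r.exception;
      }
      if (first == null) {
        first = r.exception;
      }
    }
    return first;
  }

  /** Two locations: one subcluster is down, the other does not have the file. */
  static List<SubclusterResult> sample(boolean unavailableFirst) {
    SubclusterResult down = new SubclusterResult(
        "ns0", new SubclusterUnavailableException("ns0 is unavailable"));
    SubclusterResult missing = new SubclusterResult(
        "ns1", new FileNotFoundException("/mount/file"));
    return unavailableFirst
        ? Arrays.asList(down, missing)
        : Arrays.asList(missing, down);
  }

  public static void main(String[] args) {
    for (boolean unavailableFirst : new boolean[] {true, false}) {
      List<SubclusterResult> results = sample(unavailableFirst);
      System.out.println("unavailableFirst=" + unavailableFirst
          + " first=" + firstException(results).getClass().getSimpleName()
          + " preferred=" + preferUnavailable(results).getClass().getSimpleName());
    }
  }
}
```

With the first-location rule, the exception that reaches the caller depends on which location happens to be at index 0, which would explain a test that fails only some of the time. The alternative returns the unavailable case for either ordering.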

> RBF: Do not return FileNotFoundException when a subcluster is unavailable
> --------------------------------------------------------------------------
>
>                 Key: HDFS-15112
>                 URL: https://issues.apache.org/jira/browse/HDFS-15112
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Íñigo Goiri
>            Assignee: Íñigo Goiri
>            Priority: Major
>         Attachments: HDFS-15112.000.patch, HDFS-15112.001.patch,
> HDFS-15112.002.patch, HDFS-15112.004.patch, HDFS-15112.005.patch,
> HDFS-15112.006.patch, HDFS-15112.007.patch, HDFS-15112.008.patch,
> HDFS-15112.patch
>
>
> If we have a mount point using HASH_ALL across two subclusters and one of
> them is down, we may return FileNotFoundException while the file is just in
> the unavailable subcluster.
> We should not return FileNotFoundException but something that shows that the
> subcluster is unavailable.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)