[jira] [Commented] (HDFS-15112) RBF: Do not return FileNotFoundException when a subcluster is unavailable

Ayush Saxena (Jira) Wed, 15 Jan 2020 11:26:45 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016251#comment-17016251
 ]


Ayush Saxena commented on HDFS-15112:
-------------------------------------

bq. Any thoughts on RetriableException and StandbyException?
I think we should handle these two as well.

bq. That code in getCreateLocation() is only to check if the file actually 
exists, I think for that particular case we should be fine.
What if a cluster is unavailable and the file already exists there? Once that 
cluster comes back up, there would be two files, one in the old subcluster and 
one that was just created, even though the entry is not fault tolerant.
{{getExistingLocation}} calls {{invokeConcurrent}}, but the change was made in 
{{invokeSequential}}. Does the change have any effect here? 

I think the problem we tried to fix for {{invokeSequential}} is also present in 
{{invokeConcurrent}}, which is why the test failed randomly here:


{code:java}

    // Throw the exception for the first location if there are no results
    if (ret.isEmpty()) {
      final RemoteResult<T, R> result = results.get(0);
      if (result.hasException()) {
        throw result.getException();
      }
    }
{code}

If the file does not exist, there won't be any result, and if one NS is 
unavailable, one of the results will carry an {{UnavailableException}}. If that 
happens to be the first one, the {{UnavailableException}} is thrown and the 
write fails; if it isn't the first one, the {{FileNotFound}} from the other NS 
is thrown instead and the write succeeds. 
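
To make the ordering dependence concrete, here is a minimal, self-contained sketch. It is a toy model, not the actual Router code: {{Result}}, {{collect}}, {{thrownBy}} and {{SubclusterUnavailableException}} are hypothetical stand-ins for {{RemoteResult}}, the tail of {{invokeConcurrent}} and the exception an unavailable NS produces. The only behavior it copies from the snippet above is "when there are no results, throw the exception of the first location".

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the "throw the first exception" pattern; not the real Router code. */
class FirstExceptionOrderDemo {

  /** Hypothetical stand-in for the exception an unavailable subcluster produces. */
  static class SubclusterUnavailableException extends IOException {
    SubclusterUnavailableException(String msg) {
      super(msg);
    }
  }

  /** Simplified stand-in for RemoteResult: holds either a value or an exception. */
  static class Result {
    final String value;
    final IOException exception;

    Result(String value, IOException exception) {
      this.value = value;
      this.exception = exception;
    }

    boolean hasException() {
      return exception != null;
    }
  }

  /** With no successful results, rethrows only the exception of the first location. */
  static List<String> collect(List<Result> results) throws IOException {
    List<String> ret = new ArrayList<>();
    for (Result r : results) {
      if (!r.hasException()) {
        ret.add(r.value);
      }
    }
    // The emptiness guard on results is added here only to keep the toy safe.
    if (ret.isEmpty() && !results.isEmpty()) {
      Result first = results.get(0);
      if (first.hasException()) {
        throw first.exception;
      }
    }
    return ret;
  }

  /** Returns the simple class name of the exception thrown by collect, or "none". */
  static String thrownBy(List<Result> results) {
    try {
      collect(results);
      return "none";
    } catch (IOException e) {
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    Result down = new Result(null, new SubclusterUnavailableException("ns0 is down"));
    Result missing = new Result(null, new FileNotFoundException("/f not found in ns1"));
    // Same two outcomes, different order, different exception surfaced.
    System.out.println(thrownBy(Arrays.asList(down, missing)));
    System.out.println(thrownBy(Arrays.asList(missing, down)));
  }
}
```

Under these assumptions the first call surfaces the unavailable-subcluster exception and the second surfaces {{FileNotFoundException}}, so whichever location happens to come first decides whether the caller treats the file as missing.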

I don't think the test failed because of the changes here. It may be a random 
failure, because the test passes locally for me with v07. Please check whether 
I am reading this correctly; you can handle the {{invokeConcurrent}} case in a 
separate JIRA too.

> RBF: Do not return FileNotFoundException when a subcluster is unavailable 
> --------------------------------------------------------------------------
>
>                 Key: HDFS-15112
>                 URL: https://issues.apache.org/jira/browse/HDFS-15112
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Íñigo Goiri
>            Assignee: Íñigo Goiri
>            Priority: Major
>         Attachments: HDFS-15112.000.patch, HDFS-15112.001.patch, 
> HDFS-15112.002.patch, HDFS-15112.004.patch, HDFS-15112.005.patch, 
> HDFS-15112.006.patch, HDFS-15112.007.patch, HDFS-15112.008.patch, 
> HDFS-15112.patch
>
>
> If we have a mount point using HASH_ALL across two subclusters and one of 
> them is down, we may return FileNotFoundException while the file is just in 
> the unavailable subcluster.
> We should not return FileNotFoundException but something that shows that the 
> subcluster is unavailable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
