[ https://issues.apache.org/jira/browse/HADOOP-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676731#action_12676731 ]

dhruba borthakur commented on HADOOP-3998:
------------------------------------------

My thinking is that it is bad if the client gives up too early and does not 
retry: if the client gives up prematurely, the application will encounter an 
IO error.

>an ongoing pipeline recovery on the same block.

It is possible that the first attempt from the client encountered an 
ongoing pipeline recovery on the primary datanode. But that does not mean that 
if the client retries the recoverBlock on the newly selected primary 
(originally the second datanode in the pipeline), it too will encounter an 
ongoing pipeline recovery! It is possible that the original primary is network 
partitioned from the remaining datanodes in the pipeline and the 
original pipeline recovery never succeeded. Isn't this situation possible?

I am wondering why we would need to not retry. Not retrying means that the client IO 
will fail. This is very bad, isn't it? My assumption is that as long as there is 
some possibility of recovery, the system should try all those opportunities rather 
than let the client IO fail, especially when the tradeoff is a negligible amount of 
extra RPC overhead, and that too only in error cases.

However, I like the idea of the client checking whether it is an AlreadyCommitted 
exception and not retrying in that case.
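
To make the intent concrete, here is a minimal sketch of the retry policy I am 
describing. The names (recoverBlockOn, AlreadyCommittedException) are placeholders 
for illustration only, not the actual DFSClient code or exception class:

{noformat}
// Sketch only: placeholder names, not the real DFSClient/ClientDatanodeProtocol API.
import java.io.IOException;

class BlockRecoveryRetry {
  /** Try each surviving datanode as primary. Give up early only when the block
   *  is already committed; otherwise exhaust all candidates before failing the IO. */
  static boolean recoverWithRetry(String[] pipeline) {
    for (String primary : pipeline) {
      try {
        recoverBlockOn(primary, pipeline);   // placeholder for the recoverBlock RPC
        return true;                         // recovery succeeded; client IO continues
      } catch (AlreadyCommittedException e) {
        return true;                         // block already finalized; retrying is pointless
      } catch (IOException e) {
        // This primary may be network partitioned or mid-recovery; try the next one.
        System.err.println("recoverBlock failed on " + primary + ", trying next: " + e);
      }
    }
    return false;                            // only now surface the IO error to the application
  }

  // Placeholders so the sketch compiles on its own.
  static class AlreadyCommittedException extends IOException {}
  static void recoverBlockOn(String primary, String[] pipeline) throws IOException {}
}
{noformat}

The key point is that the loop tries every surviving datanode before letting the 
client write fail, while an already-committed block short-circuits the retries.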

> Got an exception from ClientFinalizer when the JT is terminated
> ---------------------------------------------------------------
>
>                 Key: HADOOP-3998
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3998
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.19.0
>            Reporter: Amar Kamat
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4, 0.19.2, 0.20.0
>
>         Attachments: closeAll.patch, closeAll.patch
>
>
> This happens when we terminate the JT using _control-C_. It throws the 
> following exception
> {noformat}
> Exception closing file my-file
> java.io.IOException: Filesystem closed
>         at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:193)
>         at org.apache.hadoop.hdfs.DFSClient.access$700(DFSClient.java:64)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:2868)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:2837)
>         at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:808)
>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:205)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:253)
>         at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1367)
>         at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:234)
>         at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:219)
> {noformat}
> Note that _my-file_ is some file used by the JT.
> Also, if some file renaming was done, the exception states that the 
> earlier file does not exist. I am not sure if this is an MR issue or a DFS 
> issue. Opening this issue for investigation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
