[
https://issues.apache.org/jira/browse/HADOOP-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704010#action_12704010
]
dhruba borthakur commented on HADOOP-2757:
------------------------------------------
You are referring to dfs.datanode.socket.write.timeout. This is a configurable
parameter, and I have already set it to an appropriate value, e.g. 20 seconds,
because I want real-time-ish behaviour.
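(Purely as an illustration, not part of the patch: one way such a value could be
set on the client-side Configuration. This is a sketch and assumes the value is
read in milliseconds, as the DFS client does for its socket timeouts.)
{code}
import org.apache.hadoop.conf.Configuration;

public class WriteTimeoutConfig {
  public static void main(String[] args) {
    // Lower the datanode write timeout to roughly 20 seconds for more
    // real-time-ish failure detection (value assumed to be in milliseconds).
    Configuration conf = new Configuration();
    conf.setInt("dfs.datanode.socket.write.timeout", 20 * 1000);
    System.out.println(conf.getInt("dfs.datanode.socket.write.timeout", -1));
  }
}
{code}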
If all the datanodes in the pipeline die, then the client detects an error and
aborts; that is intended behaviour. If one datanode is not really dead but
hangs, then the client will hang too. This patch does not fix that problem.
The main motivation for this patch is to detect namenode failures early. If a
client is writing to a block, it might take a while for the block to fill
up; that time depends on the rate at which the client is writing data. If the
client is trickling data into the block, it will not hit the
dfs.datanode.socket.write.timeout for a while. In the existing code in trunk,
the lease recovery thread will detect the NN problem after a while, but it
does nothing to terminate the threads that were writing to the block. This
patch does that.
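To make the trickling scenario concrete, here is a hypothetical slow writer
(the path, record contents and rate are made up). Because bytes go out so
rarely, the per-write socket timeout on the pipeline is of little help in
noticing that the namenode has died:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TricklingWriter {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/trickle.log"));
    byte[] record = "heartbeat\n".getBytes();
    // One tiny record per minute: the block fills very slowly, so
    // dfs.datanode.socket.write.timeout is not exercised for a long time.
    for (int i = 0; i < 1000; i++) {
      out.write(record);
      Thread.sleep(60 * 1000);
    }
    out.close();
  }
}
{code}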
> Should DFS outputstream's close wait forever?
> ---------------------------------------------
>
> Key: HADOOP-2757
> URL: https://issues.apache.org/jira/browse/HADOOP-2757
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Reporter: Raghu Angadi
> Assignee: dhruba borthakur
> Attachments: softMount1.patch, softMount1.patch, softMount2.patch
>
>
> Currently {{DFSOutputStream.close()}} waits forever if the Namenode keeps
> throwing {{NotYetReplicated}} exceptions, for whatever reason. It's pretty
> annoying for a user. Should the loop inside close have a timeout? If so, how
> much? It could probably be something like 10 minutes.
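As a rough illustration of the kind of bounded wait being asked for above, the
close loop could give up after a deadline instead of retrying forever. This is
a hypothetical sketch, not the actual DFSOutputStream code; completeFile() is
only a stand-in for the namenode call that keeps failing:
{code}
import java.io.IOException;

public class BoundedCloseWait {
  static final long MAX_WAIT_MS = 10 * 60 * 1000L; // e.g. 10 minutes

  // Stand-in for the namenode call that may keep reporting NotYetReplicated.
  static boolean completeFile() { return false; }

  static void closeWithTimeout() throws IOException {
    long deadline = System.currentTimeMillis() + MAX_WAIT_MS;
    while (!completeFile()) {
      if (System.currentTimeMillis() > deadline) {
        throw new IOException("could not complete file within " + MAX_WAIT_MS + " ms");
      }
      try {
        Thread.sleep(400); // back off before asking the namenode again
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        throw new IOException("interrupted while closing");
      }
    }
  }
}
{code}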
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.