[ https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540308 ]
dhruba borthakur commented on HADOOP-1707:
------------------------------------------

In the current trunk, the first datanode in the pipeline sets a timeout of 2 minutes, the second datanode sets a timeout of 1 minute, and so on. If a datanode does not receive a response from a downstream datanode within this timeout period, it declares the downstream datanode dead.

With the client-side disk buffer removed by this patch, the connection between datanodes in the pipeline could remain open for extended periods of time, especially for clients that produce output slowly. I propose that we change the timeouts to behave as follows:

1. Each datanode in the pipeline has the same timeout of 1 minute. If a datanode does not receive a response from a downstream datanode within 1 minute, it declares the downstream datanode dead.

2. Each datanode sends a heartbeat message to its upstream datanode once every half-timeout period (30 seconds), so that a pipeline kept open by a slow client does not trip the upstream timeout. (A sketch of this scheme follows at the end of this message.)

> Remove the DFS Client disk-based cache
> --------------------------------------
>
>                 Key: HADOOP-1707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1707
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.16.0
>
>         Attachments: clientDiskBuffer.patch, clientDiskBuffer2.patch
>
>
> The DFS client currently uses a staging file on local disk to cache all user-writes to a file. When the staging file accumulates 1 block worth of data, its contents are flushed to an HDFS datanode. These operations occur sequentially. A simple optimization of allowing the user to write to another staging file while simultaneously uploading the contents of the first staging file to HDFS will improve file-upload performance.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
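To make the proposal concrete, here is a minimal Java sketch of the scheme. It is illustrative only: the class, the HEARTBEAT_SEQNO marker, and the method names are hypothetical assumptions, not the actual DataNode code.

{code:java}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;

// Hypothetical sketch of the proposed pipeline timeout/heartbeat scheme.
public class PipelineTimeoutSketch {

    // Proposal: every datanode uses the same downstream timeout.
    static final int DOWNSTREAM_TIMEOUT_MS = 60 * 1000;                  // 1 minute
    static final int HEARTBEAT_INTERVAL_MS = DOWNSTREAM_TIMEOUT_MS / 2;  // 30 seconds

    // Assumed marker distinguishing a heartbeat from a real ack sequence number.
    static final long HEARTBEAT_SEQNO = -1L;

    /** Reads one ack from the downstream datanode; a read that blocks longer
     *  than DOWNSTREAM_TIMEOUT_MS throws SocketTimeoutException, at which point
     *  the downstream datanode is declared dead. */
    static long readAck(Socket downstream) throws IOException {
        downstream.setSoTimeout(DOWNSTREAM_TIMEOUT_MS);
        DataInputStream in = new DataInputStream(downstream.getInputStream());
        return in.readLong();
    }

    /** Background thread that keeps the upstream connection alive while the
     *  client is producing data slowly. */
    static Thread heartbeatSender(final DataOutputStream upstream) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        Thread.sleep(HEARTBEAT_INTERVAL_MS);
                        // Synchronize so heartbeats do not interleave with real acks.
                        synchronized (upstream) {
                            upstream.writeLong(HEARTBEAT_SEQNO);
                            upstream.flush();
                        }
                    }
                } catch (InterruptedException ignored) {
                    // shutting down
                } catch (IOException e) {
                    // upstream connection is gone; nothing more to do here
                }
            }
        });
        t.setDaemon(true);
        return t;
    }
}
{code}

Because every node in the pipeline uses the same 1-minute timeout and heartbeats arrive every 30 seconds, a healthy but idle pipeline never trips the timeout, regardless of pipeline depth.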