[ https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533885 ]
Doug Cutting commented on HADOOP-1707:
--------------------------------------
bq. If a datanode fails to write a buffer to its disk, it is reported back to
the client. The client removes this datanode from the pipeline and continues to
write to the remaining two datanodes. [ ... ] When the file is closed, the
under-replicated blocks will be replicated by the namenode.
I think the more typical failure mode will be a timeout. I'm also still not
sure of the answer to my question: if the first datanode in the pipeline times
out, does the write fail, throwing an exception to the client? Or does the
client route around the first datanode in the pipeline and continue until all
datanodes in the pipeline time out? If so, how can it be sure that the other
datanodes have received their copies of prior chunks from the first datanode in
the pipeline?
Also, HADOOP-1927 states that we should fail as soon as any element in the
pipeline fails. Do you agree? Currently this would be invisible to clients,
since the entire block can be replayed to a new pipeline. But, without a local
file, this would force us to fail the write when any element of the pipeline
fails. Thoughts?
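To make the two alternatives concrete, here is a rough sketch of the client-side policies in question: fail fast on the first pipeline error, or route around the failed datanode and retry. The class and method names are illustrative only, not the actual DFSClient code:
{code:java}
// Hypothetical sketch of the two failure policies under discussion;
// names are illustrative, not the real DFSClient implementation.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class PipelineWriter {
    private final List<String> pipeline;    // remaining datanodes, in order
    private final boolean failFast;         // HADOOP-1927 behavior if true

    PipelineWriter(List<String> datanodes, boolean failFast) {
        this.pipeline = new ArrayList<String>(datanodes);
        this.failFast = failFast;
    }

    void writeChunk(byte[] chunk) throws IOException {
        while (true) {
            String first = pipeline.get(0);
            try {
                sendToPipeline(first, chunk);   // may time out
                return;
            } catch (IOException timeout) {
                if (failFast || pipeline.size() == 1) {
                    // Option A: surface the failure to the client immediately.
                    throw new IOException("pipeline failed at " + first, timeout);
                }
                // Option B: route around the failed first datanode and retry.
                // Open question from the comment above: the surviving datanodes
                // may be missing prior chunks that only reached 'first'.
                pipeline.remove(0);
            }
        }
    }

    private void sendToPipeline(String datanode, byte[] chunk) throws IOException {
        // placeholder for the actual data-transfer protocol
    }
}
{code}
Option B is only safe if the client can verify that the surviving datanodes already hold every prior chunk, which is exactly the question raised above.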
> DFS client can allow user to write data to the next block while uploading previous block to HDFS
> -------------------------------------------------------------------------------------------------
>
> Key: HADOOP-1707
> URL: https://issues.apache.org/jira/browse/HADOOP-1707
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
>
> The DFS client currently uses a staging file on local disk to cache all
> user writes to a file. When the staging file accumulates one block's worth of
> data, its contents are flushed to an HDFS datanode. These operations occur
> sequentially.
> A simple optimization, allowing the user to write to a second staging file
> while the contents of the first are simultaneously uploaded to HDFS, would
> improve file-upload performance.
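To illustrate the overlap being proposed: a simple hand-off between the writer and an uploader thread lets the user fill one staging buffer while the previous one is uploaded. This is only a sketch under assumed names; the real client stages to a local file, not an in-memory byte array:
{code:java}
// Minimal sketch of the proposed double buffering; names are hypothetical.
import java.util.concurrent.SynchronousQueue;

class DoubleBufferedUploader {
    private final SynchronousQueue<byte[]> handoff = new SynchronousQueue<byte[]>();
    private final Thread uploader;

    DoubleBufferedUploader() {
        uploader = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        byte[] block = handoff.take();  // wait for a staged block
                        uploadToDatanode(block);        // overlaps with user writes
                    }
                } catch (InterruptedException done) {
                    // close() interrupts us once the last block is handed off
                }
            }
        });
        uploader.start();
    }

    // Called when a staging buffer fills up: hand it to the uploader and
    // return, so the user can keep writing into a fresh buffer. Blocks only
    // if the previous block's upload has not yet completed.
    void blockFull(byte[] stagedBlock) throws InterruptedException {
        handoff.put(stagedBlock);
    }

    void close() throws InterruptedException {
        uploader.interrupt();   // in-flight upload finishes; next take() exits
        uploader.join();
    }

    private void uploadToDatanode(byte[] block) {
        // placeholder for the pipeline write of one block
    }
}
{code}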