[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469952#comment-13469952
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-3979:
----------------------------------------------

{quote}
For applications like HBase we'd like API4 as well as API5.
(API4 allows a hypothetical kill -9 of all DNs without loss of acknowledged 
data, API5 allows HW failures of all data nodes - i.e. a DC outage - without 
loss of acknowledged data)
{quote}
Why is API4 needed for HBase?

As everyone knows, HDFS usually keeps 3 replicas of each block.  If only one 
of the datanodes is killed, the data is still available on the other two 
datanodes.  That's why we invented "hflush" (i.e. API3) in HDFS-265.
                
> Fix hsync and hflush semantics.
> -------------------------------
>
>                 Key: HDFS-3979
>                 URL: https://issues.apache.org/jira/browse/HDFS-3979
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, hdfs client
>    Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>         Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt
>
>
> See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
> is not on a synchronous path from the DFSClient, hence it is possible that a 
> DN loses data that it has already acknowledged as persisted to a client.
> Edit: Spelling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
