[ https://issues.apache.org/jira/browse/HDFS-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HDFS-1539:
-----------------------------------

    Attachment: syncOnClose1.txt

Here is a patch that makes the datanode flush and sync all data and metadata of 
a block file to disk when the block is closed. This occurs only if 
dfs.datanode.synconclose is set to true. The default value of 
dfs.datanode.synconclose is false.

If the admin does not set any value for the new config parameter, then the 
behaviour of the datanode stays the same as before this patch.
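
For illustration, here is a minimal sketch of the close path described above, 
assuming hypothetical class, field and stream names (the actual change is in 
syncOnClose1.txt):

    import java.io.FileOutputStream;
    import java.io.IOException;

    // Illustrative sketch only; names and structure are assumptions, not the patch itself.
    class SyncOnCloseSketch {
      // Mirrors dfs.datanode.synconclose; the datanode would read it from its Configuration.
      private final boolean syncOnClose;

      SyncOnCloseSketch(boolean syncOnClose) {
        this.syncOnClose = syncOnClose;  // default false
      }

      // Close the block's data and checksum (metadata) streams, optionally syncing to disk.
      void closeBlockFiles(FileOutputStream dataOut, FileOutputStream checksumOut)
          throws IOException {
        dataOut.flush();
        checksumOut.flush();
        if (syncOnClose) {
          // Force file contents to stable storage before the block is finalized,
          // so a power loss right after close cannot leave a partially written block.
          dataOut.getChannel().force(true);
          checksumOut.getChannel().force(true);
        }
        dataOut.close();
        checksumOut.close();
      }
    }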

> prevent data loss when a cluster suffers a power loss
> -----------------------------------------------------
>
>                 Key: HDFS-1539
>                 URL: https://issues.apache.org/jira/browse/HDFS-1539
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node, hdfs client, name-node
>            Reporter: dhruba borthakur
>         Attachments: syncOnClose1.txt
>
>
> We have seen an instance where an external outage caused many datanodes to 
> reboot at around the same time.  This resulted in many corrupted blocks. 
> These were recently written blocks; the current implementation of the HDFS 
> datanode does not sync the data of a block file when the block is closed.
> 1. Have a cluster-wide config setting that causes the datanode to sync a 
> block file when a block is finalized.
> 2. Introduce a new parameter to FileSystem.create() to trigger the new 
> behaviour, i.e. cause the datanode to sync a block file when it is finalized.
> 3. Implement the FSDataOutputStream.hsync() to cause all data written to the 
> specified file to be written to stable storage.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
