[ 
https://issues.apache.org/jira/browse/HDFS-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973093#action_12973093
 ] 

dhruba borthakur commented on HDFS-1539:
----------------------------------------

I could make it the default, but I would like to hear the opinions of people 
who are running Hadoop clusters. Also, performance numbers could vary a lot 
based on the operating system (CentOS, Red Hat, Windows) and the filesystem 
(ext4, xfs), so it would be difficult to choose the right default based solely 
on performance. On the other hand, if the entire community thinks that it is 
better to have a default that prevents data loss at all costs, then this could 
be the default. If the debate on either side is fierce, then I would like to 
get this in first and then open another JIRA to debate the default settings.

We are definitely going to deploy this first on our "archival" cluster. 
This is a cluster that is used purely to back up and restore data from MySQL 
databases.

> prevent data loss when a cluster suffers a power loss
> -----------------------------------------------------
>
>                 Key: HDFS-1539
>                 URL: https://issues.apache.org/jira/browse/HDFS-1539
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node, hdfs client, name-node
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: syncOnClose1.txt
>
>
> We have seen an instance where an external outage caused many datanodes to 
> reboot at around the same time. This resulted in many corrupted blocks. 
> These were recently written blocks; the current implementation of HDFS 
> datanodes does not sync the data of a block file when the block is closed.
> 1. Have a cluster-wide config setting that causes the datanode to sync a 
> block file when a block is finalized.
> 2. Introduce a new parameter to the FileSystem.create() to trigger the new 
> behaviour, i.e. cause the datanode to sync a block-file when it is finalized.
> 3. Implement the FSDataOutputStream.hsync() to cause all data written to the 
> specified file to be written to stable storage.
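
As a rough illustration of what "sync a block file when a block is finalized" 
could mean at the local-file level, here is a minimal sketch using only the 
JDK. It is not the code in syncOnClose1.txt and it does not use any HDFS 
classes. The class name SyncOnCloseSketch, the method writeBlock and the 
syncOnClose flag are hypothetical names chosen for this example. 
FileChannel.force(true) asks the operating system to flush the file's content 
and metadata to the storage device; how durable that is in practice still 
depends on the OS, the filesystem and the disk's write cache.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: plain JDK file I/O, not the actual datanode code.
class SyncOnCloseSketch {

    // Writes data to blockFile. If syncOnClose is true, the file's content
    // and metadata are forced to the storage device before the channel closes.
    static void writeBlock(Path blockFile, byte[] data, boolean syncOnClose)
            throws IOException {
        try (FileChannel ch = FileChannel.open(blockFile,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            // A single write() call may be partial, so loop until drained.
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            if (syncOnClose) {
                // Without this call the data may still sit in the OS page
                // cache when close() returns, and a power loss can lose it.
                ch.force(true);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path blockFile = Files.createTempFile("blk_", ".data");
        byte[] data = "block contents".getBytes(StandardCharsets.UTF_8);
        writeBlock(blockFile, data, true);
        System.out.println("wrote " + Files.size(blockFile) + " bytes");
        Files.delete(blockFile);
    }
}
```

The flag stands in for the cluster-wide setting in proposal 1. The extra 
force() call on every block close is where the performance cost discussed 
above would come from.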

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
