[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227406#comment-15227406 ]
Zhe Zhang commented on HDFS-7661:
---------------------------------

Thanks for the discussions [~demongaorui], [~jingzhao], [~liuml07]. Addressing Mingliang's comment first:

bq. If we manage all cellBuffers across different stripes in one file, we also need some kind of "undo" mechanism for rolling back to last flushed data to handle failure. Otherwise, managing those individual files may be nontrivial.

I was proposing to store only non-full cellBuffers on parity DNs. If a stripe is full, we always generate the parity cells; at that point the cellBuffers stored on the parity DNs become meaningless and should be deleted. So at any given time each parity DN will store 6 cellBuffers (for each stored parity block). In other words, for any given stripe we always check whether the parity cell is available (by checking the length of the stored parity block). If so, we always use the parity cell itself in decoding; we use the stored cellBuffers only if the parity cell has not been generated. I should probably draw an example. Will do soon.

bq. If the writer fails during the hflush operation, we have to make sure last flushed cellBuffer is still available

If the last flush was in the same stripe as the current flush, the last flushed cellBuffers will always be available. If the last flush was in an earlier stripe, we'll just use the parity cells. So essentially, all files stored on parity DNs (including parity blocks and flushed cellBuffers) are only ever appended to; they are never overwritten. The proposal is basically to temporarily turn the blockGroup into replication mode.

I'm still trying to fully understand Rui's proposal of using 2 versions of parity cells, and will post a comment soon. [~demongaorui], if you could provide some more details on how the 2 versions are used to handle failures, that would be very helpful.

[~jingzhao], if we store cellBuffers I think the "last time flushed data" is always safe, because the buffers are never overwritten.
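To illustrate the decode-source selection described above, here is a minimal sketch. The class, method names, and cell size are illustrative assumptions, not actual HDFS code; the point is only the rule "use the parity cell if the stored parity block is long enough to contain it, else fall back to the flushed cellBuffers":

```java
// Hypothetical sketch of the decode-source selection rule; names are
// illustrative, not HDFS APIs.
public class StripeDecodeSource {
    // Assumed fixed cell size in bytes (illustrative value).
    static final long CELL_SIZE = 64 * 1024;

    /**
     * The parity cell for stripe i has been generated once the stored
     * parity block covers at least (i + 1) full cells.
     */
    static boolean parityCellAvailable(long parityBlockLength, int stripeIndex) {
        return parityBlockLength >= (long) (stripeIndex + 1) * CELL_SIZE;
    }

    /** Decide which stored data to use when reconstructing a stripe. */
    static String decodeSourceFor(long parityBlockLength, int stripeIndex) {
        // Always prefer the parity cell itself; fall back to the flushed
        // cellBuffers only for a stripe whose parity was never generated.
        return parityCellAvailable(parityBlockLength, stripeIndex)
            ? "PARITY_CELL"
            : "FLUSHED_CELL_BUFFERS";
    }

    public static void main(String[] args) {
        long parityLen = 3 * CELL_SIZE;  // parity generated for stripes 0..2
        System.out.println(decodeSourceFor(parityLen, 2)); // full stripe
        System.out.println(decodeSourceFor(parityLen, 3)); // partial stripe
    }
}
```

Because the check is just a length comparison, a reader of the parity block needs no extra metadata to decide which source is authoritative for a given stripe.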
If the current flush failed, you are always guaranteed to have at least as much safe (uncorrupt) data as at the last flush.

> [umbrella] support hflush and hsync for erasure coded files
> -----------------------------------------------------------
>
>                 Key: HDFS-7661
>                 URL: https://issues.apache.org/jira/browse/HDFS-7661
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: erasure-coding
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: GAO Rui
>         Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, HDFS-7661-unitTest-wip-trunk.patch, HDFS-7661-wip.01.patch, HDFS-EC-file-flush-sync-design-v20160323.pdf, HDFS-EC-file-flush-sync-design-version1.1.pdf
>
>
> We also need to support hflush/hsync and visible length.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
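To make the append-only safety argument in the comment concrete, here is a minimal model (illustrative names only, not HDFS code) of why a failed flush never reduces the amount of safe data: new bytes are only ever appended, and the "safe" watermark advances only on a successful flush.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an append-only flushed-cellBuffer file on a parity
// DN. Nothing before the safe watermark is ever rewritten, so a failed
// flush leaves at least as much safe data as the previous successful one.
public class AppendOnlyFlushModel {
    private final List<byte[]> appendedChunks = new ArrayList<>();
    private long safeLength = 0;  // bytes known durable after last good flush

    /** Append new data; mark it safe only if the flush succeeds. */
    void flush(byte[] chunk, boolean flushSucceeds) {
        appendedChunks.add(chunk);         // append-only: never overwrite
        if (flushSucceeds) {
            safeLength += chunk.length;    // advance the safe watermark
        }
        // On failure, safeLength is untouched: the previous flush's data
        // is still intact because nothing before it was modified.
    }

    long safeLength() { return safeLength; }

    public static void main(String[] args) {
        AppendOnlyFlushModel dn = new AppendOnlyFlushModel();
        dn.flush(new byte[100], true);   // safeLength becomes 100
        long before = dn.safeLength();
        dn.flush(new byte[50], false);   // failed flush
        // The guarantee from the comment: safe data is monotone across flushes.
        System.out.println(dn.safeLength() >= before);
    }
}
```

This is the sense in which "the same amount or more safe data than the last flush" holds: the invariant depends only on the files being append-only, not on the failed flush being cleaned up.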