[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227406#comment-15227406 ]

Zhe Zhang commented on HDFS-7661:
---------------------------------

Thanks for the discussions [~demongaorui], [~jingzhao], [~liuml07].

Addressing Mingliang's comment first:
bq. If we manage all cellBuffers across different stripes in one file, we also 
need some kind of "undo" mechanism for rolling back to last flushed data to 
handle failure. Otherwise, managing those individual files may be nontrivial.
I was proposing to only store non-full cellBuffers on parity DNs. If a stripe 
is full, we always generate the parity cells; at that point the cellBuffers 
stored on parity DNs become meaningless and should be deleted. So at any given 
time each parity DN will store 6 cellBuffers (one for each stored parity 
block). In other words, for any given stripe, we always check whether the 
parity cell is available (by checking the length of the stored parity block). 
If so, we always use the parity cell itself in decoding; we use the stored 
cellBuffers only if the parity cell has not been generated. I should probably 
draw an example. Will do soon.
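To make the decision rule above concrete, here is a minimal Java sketch. All names here (ParityCellChooser, ParityDnState, decodeSource, the assumed cell size) are hypothetical illustrations, not actual HDFS classes or the patch's implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class ParityCellChooser {

  /**
   * Per-DN state as proposed: the parity block is append-only, and
   * non-full cellBuffers for partially flushed stripes are stored
   * separately (and deleted once the stripe's parity cell exists).
   */
  static class ParityDnState {
    long parityBlockLength;                                  // parity bytes generated so far
    Map<Long, byte[]> flushedCellBuffers = new HashMap<>();  // stripe index -> buffer
  }

  static final long CELL_SIZE = 64 * 1024;                   // assumed cell size

  /**
   * Decide what to use when decoding stripe {@code stripeIndex}: if the
   * stored parity block is long enough to cover the stripe, the parity
   * cell was generated and we always use it; otherwise fall back to the
   * flushed cellBuffer, if one was stored.
   */
  static String decodeSource(ParityDnState dn, long stripeIndex) {
    long neededLength = (stripeIndex + 1) * CELL_SIZE;
    if (dn.parityBlockLength >= neededLength) {
      return "PARITY_CELL";
    }
    if (dn.flushedCellBuffers.containsKey(stripeIndex)) {
      return "CELL_BUFFER";
    }
    return "NONE";
  }
}
```

The key point the sketch captures is that the parity block length alone tells the reader which source is authoritative, so stale cellBuffers from completed stripes can never be picked up by mistake.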

bq. If the writer fails during the hflush operation, we have to make sure last 
flushed cellBuffer is still available
If the last flush was in the same stripe as the current flush, the last 
flushed cellBuffers will always be available. If the last flush was in an 
earlier stripe, we'll just use the parity cells. So essentially, all files 
stored on parity DNs (including parity blocks and flushed cellBuffers) are 
only ever appended to; they are never overwritten.

The proposal basically amounts to temporarily putting the blockGroup into 
replication mode.

I'm still trying to fully understand Rui's proposal using 2 versions of parity 
cells. Will post a comment soon. [~demongaorui] If you could provide some more 
details of how to use the 2 versions to handle failures that'd be very helpful.

[~jingzhao] If we store cellBuffers, I think the "last time flushed data" is 
always safe because it is never overwritten. If the current flush fails, you 
are always guaranteed to have at least as much safe (uncorrupted) data as at 
the last flush.


> [umbrella] support hflush and hsync for erasure coded files
> -----------------------------------------------------------
>
>                 Key: HDFS-7661
>                 URL: https://issues.apache.org/jira/browse/HDFS-7661
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: erasure-coding
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: GAO Rui
>         Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, 
> HDFS-7661-unitTest-wip-trunk.patch, HDFS-7661-wip.01.patch, 
> HDFS-EC-file-flush-sync-design-v20160323.pdf, 
> HDFS-EC-file-flush-sync-design-version1.1.pdf
>
>
> We also need to support hflush/hsync and visible length. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
