[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files

2016-04-11 Thread GAO Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15234667#comment-15234667
 ] 

GAO Rui commented on HDFS-7661:
---

Thanks for your comments, [~drankye],[~tlipcon],[~zhz],[~walter.k.su].

I totally agree that we should not let this JIRA block the 3.0 release. We 
could mark hflush/hsync of EC files as TO-BE-IMPLEMENTED and release 3.0. 
In a future phase, along with the new decode/encode library and hardware 
support for EC, maybe we could use striped EC instead of replication as the 
default and major file format of HDFS. hflush/hsync would then become 
necessary, so [~liuml07] and I should probably continue to implement 
hflush/hsync based on the latest design. We would break this JIRA down into 
several subtasks and solve them one by one. :D

> [umbrella] support hflush and hsync for erasure coded files
> ---
>
> Key: HDFS-7661
> URL: https://issues.apache.org/jira/browse/HDFS-7661
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: erasure-coding
> Reporter: Tsz Wo Nicholas Sze
> Assignee: GAO Rui
> Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, 
> HDFS-7661-unitTest-wip-trunk.patch, HDFS-7661-wip.01.patch, 
> HDFS-EC-file-flush-sync-design-v20160323.pdf, 
> HDFS-EC-file-flush-sync-design-version1.1.pdf, Undo-Log-Design-20160406.jpg
>
>
> We also need to support hflush/hsync and visible length. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files

2016-04-08 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232103#comment-15232103
 ] 

Walter Su commented on HDFS-7661:
-

Great design/discussion. Since we have come back to discussing the use cases 
and "effort vs. benefit", I'm thinking that if the use cases are rare, we can 
provide a simpler workaround. We could provide:
1. a fake "flush", which only flushes full stripes and doesn't flush the 
last partial stripe. It won't make sure every byte is safe, but it helps the 
recovery logic recover more data.
2. a real "flush". The easiest way to do this is to start a new block group. It 
makes sure the data written before the "flush" is safe and visible. It saves 
the user the trouble of closing and appending to the same file.

Since we support variable-length blocks, it's totally doable. I should mention 
that the implementation of appending to a striped file also utilizes 
variable-length blocks. The trouble is creating too many block groups. But if 
there are too many small blocks, and they are adjacent in the same file, we can 
concatenate them into a bigger block, although concatenating striped blocks 
doesn't seem easy either.
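
To make the fake "flush" concrete, here is a minimal sketch of how much data 
it would cover. The class and method names are hypothetical, and the figures 
(6 data cells per stripe, 64KB cells) are simply the ones mentioned elsewhere 
in this thread, not a statement about any particular EC policy.

```java
final class FullStripeFlush {
    /**
     * Length covered by a flush that only includes complete stripes.
     * Bytes in the last partial stripe are left out.
     */
    static long flushedLength(long bytesWritten, int dataUnits, int cellSize) {
        long stripeSize = (long) dataUnits * cellSize;
        return (bytesWritten / stripeSize) * stripeSize;
    }

    public static void main(String[] args) {
        // 1,000,000 bytes written, stripes of 6 x 64KB = 393,216 bytes.
        System.out.println(flushedLength(1_000_000L, 6, 64 * 1024)); // prints 786432
    }
}
```

In this example two full stripes are covered and the remaining 213,568 bytes 
are the part a fake "flush" would not protect.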



[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files

2016-04-07 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231367#comment-15231367
 ] 

Zhe Zhang commented on HDFS-7661:
-

Thanks for the helpful discussion [~drankye], [~tlipcon]. I agree that at least 
for the HBase case, hflush-on-EC is not very useful.

[~jingzhao], [~szetszwo], [~liuml07], [~demongaorui] Are you aware of other 
important use cases of hflush-on-EC? If not, I think we should target having 
this as an unsupported feature for 3.0. After some production experience with 
EC, I think we'll have more insights which could lead to a better (simpler and 
more robust) solution.

I'm also planning to make a pass on remaining subtasks under HDFS-8031 and find 
out those "3.0 blockers". For example, HDFS-9869 could add metrics keys, and we 
are also planning to change some of the EC policy names. Any comments and 
suggestions on this are very welcome.



[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files

2016-04-06 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229509#comment-15229509
 ] 

Kai Zheng commented on HDFS-7661:
-

Thanks [~tlipcon] for the quick response and insights for us!



[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files

2016-04-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229490#comment-15229490
 ] 

Todd Lipcon commented on HDFS-7661:
---

I'm not super active on either HDFS or HBase anymore, but Kai asked me to take 
a look at the issue, especially with regard to his latest comment. My (slightly 
ill-informed) opinion is that he's right -- this sounds like a very complicated 
feature to get right, and it may have minimal benefit. For HBase, the use case 
for hflush is the WALs, which typically make up a small minority of the disk 
space usage of the cluster. So, the benefits of EC from a space-savings 
perspective are not so large. The benefits from a throughput perspective due to 
striping sound enticing at first, but I think that's probably better addressed 
by the "multi-WAL" feature, which already allows striping at the application 
level.

So, my gut feel is that for the first EC-supporting release it might be safest 
to not include the feature, or to do so only as an experimental feature that 
has to be enabled by a config (somewhat like dfs.support.append was, way back 
in the day when it wasn't super stable).
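
As a rough sketch of what such a config gate could look like: the key name 
below is hypothetical and invented for this illustration (it is not an 
existing HDFS setting), and plain java.util.Properties stands in for the 
Hadoop configuration object.

```java
import java.util.Properties;

final class ExperimentalFlushGate {
    // Hypothetical key, made up for this sketch; not a real HDFS setting.
    static final String KEY = "dfs.ec.hflush.experimental.enabled";

    /** Rejects hflush/hsync on striped files unless the gate is switched on. */
    static void checkFlushAllowed(Properties conf, boolean fileIsStriped) {
        boolean enabled = Boolean.parseBoolean(conf.getProperty(KEY, "false"));
        if (fileIsStriped && !enabled) {
            throw new UnsupportedOperationException(
                    "hflush/hsync on striped files is disabled; set " + KEY + "=true");
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        checkFlushAllowed(conf, false);          // replicated file: always allowed
        conf.setProperty(KEY, "true");
        checkFlushAllowed(conf, true);           // striped file: allowed once enabled
        System.out.println("ok");
    }
}
```

Defaulting the gate to off keeps the behaviour for replicated files unchanged 
and makes the unsupported case fail loudly instead of silently doing nothing.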



[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files

2016-04-06 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229446#comment-15229446
 ] 

Kai Zheng commented on HDFS-7661:
-

Thanks all for the great discussions and hard work. Sorry for interrupting.

The proposed approach of additionally sending all cellBuffers to all parity DNs 
sounds workable, because all the real data to be flushed is replicated to the 
parity DNs, so the parity cells can be computed on the DNs on demand. On the 
other hand, this will complicate the effort to support advanced erasure codecs, 
because any codec/coder will then need to be aware of the trick played here. 

I understand there is no easy solution to this problem. I'm wondering whether 
the complexity and overhead involved in supporting this may be too much, 
because we don't have any solid, practical use cases that require hflush/hsync 
in striping mode. The HBase workload was mentioned, but I doubt there is any 
benchmark that proves striping is suitable for HBase. Could any HBase folks 
share some thoughts here? How about marking the hflush/hsync API as 
NOT-SUPPORTED or TO-BE-IMPLEMENTED for striped files? This may resolve a 
blocker for the 3.0 release, and when practical use cases or requirements 
for it emerge, this effort can be revisited.



[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files

2016-04-06 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229367#comment-15229367
 ] 

Zhe Zhang commented on HDFS-7661:
-

Addressing Mingliang's analysis first:
bq.  what if the writer fails when the full stripe has not been sent to all 
parity DNs? In this case, some of the parity DNs have deleted the cellBuffers so 
the last flushed data is not available any more, while the other parity DNs 
have not received the parity cell for the full stripe.
In that case, any parity DN which has "not received the parity cell for the 
full stripe" will still have the cellBuffers. Note that we don't need all parity 
DNs to have the full set of cellBuffers. As long as 1 parity DN has the full set 
of cellBuffers we can recover to the state of the last successful flush.



[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files

2016-04-05 Thread GAO Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227810#comment-15227810
 ] 

GAO Rui commented on HDFS-7661:
---

Hi [~zhz], [~liuml07] and I have discussed the {{two version cells undo log 
design}}. I have attached an illustration [^Undo-Log-Design-20160406.jpg].

In the design, the Undo Log consists of three parts. The first and second parts 
store the latest flushed parity cell and the current flushed parity cell. The 
length of these two parts depends on the EC policy. Each part should be long 
enough to store a full cell and its checksum. The third part of the Undo Log is 
a list of flush records, the same as described in the design document, except 
that a pointer to the latest successfully flushed cell is added. This list 
could be appended to as many times as needed (generally, this would not cause 
the Undo Log to grow too big).

With the third part of the Undo Log, a two-phase commit mechanism could be used 
to ensure data safety.
For example:
   1. The last successfully flushed cell is stored in parity-cell-1 (the second 
part of the Undo Log).
   2. The current flush happens.
   3. The first part of the Undo Log file is updated with the current flushed 
parity cell.
   4. A new record is added to the third part (the record list) of the Undo Log 
file.

If a failure happens during step 3 (updating the first part of the Undo Log 
file), we still have the latest successfully flushed parity cell in the second 
part of the Undo Log, and the last record of the record list still points to 
the second part.
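
The steps above can be sketched in memory as follows. This is only an 
illustration of the two-slot idea under simplifying assumptions: all class and 
method names are hypothetical, the "file" is a pair of byte arrays plus a 
list, and checksums and the on-disk layout are left out.

```java
import java.util.ArrayList;
import java.util.List;

final class UndoLogSketch {
    /** One entry of the record list: which slot holds the committed cell. */
    static final class FlushRecord {
        final int slot;
        final long flushedLength;
        FlushRecord(int slot, long flushedLength) {
            this.slot = slot;
            this.flushedLength = flushedLength;
        }
    }

    private final byte[][] slots = new byte[2][];                 // parts one and two
    private final List<FlushRecord> records = new ArrayList<>();  // part three

    /** Phase 1: write the new cell into the slot the last record does NOT point to. */
    int prepare(byte[] parityCell) {
        int target = records.isEmpty() ? 0 : 1 - lastRecord().slot;
        slots[target] = parityCell.clone();
        return target;
    }

    /** Phase 2: appending the record is the commit point of the flush. */
    void commit(int slot, long flushedLength) {
        records.add(new FlushRecord(slot, flushedLength));
    }

    /** Recovery view: the cell named by the last record, or null if none was committed. */
    byte[] lastFlushedCell() {
        return records.isEmpty() ? null : slots[lastRecord().slot].clone();
    }

    private FlushRecord lastRecord() {
        return records.get(records.size() - 1);
    }

    public static void main(String[] args) {
        UndoLogSketch log = new UndoLogSketch();
        log.commit(log.prepare(new byte[] {1}), 100L);
        log.prepare(new byte[] {2});                  // writer dies before commit
        System.out.println(log.lastFlushedCell()[0]); // prints 1
    }
}
```

Because a slot is only trusted once a record points to it, a failure while 
writing the other slot leaves the previously committed cell untouched.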



[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files

2016-04-05 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227433#comment-15227433
 ] 

Mingliang Liu commented on HDFS-7661:
-

Thanks for the detailed explanation, [~zhz]. I got the idea of avoiding 
overwriting, which may simplify the problem.

I agree that once the stripe is full (for all parity DNs) the cellBuffers for 
the current stripe are useless. I'm not sure I understand your idea of managing 
cellBuffers across stripes. One quick comment: if a parity DN deletes the 
cellBuffers once it finds the current stripe is full, what if the writer fails 
when the full stripe has not been sent to all parity DNs? In this case, some of 
the parity DNs have deleted the cellBuffers so the last flushed data is not 
available any more, while the other parity DNs have not received the parity 
cell for the full stripe. The reader client is not able to read either the last 
flushed data or the full stripe data in this case.



[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files

2016-04-05 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227406#comment-15227406
 ] 

Zhe Zhang commented on HDFS-7661:
-

Thanks for the discussions [~demongaorui], [~jingzhao], [~liuml07].

Addressing Mingliang's comment first:
bq. If we manage all cellBuffers across different stripes in one file, we also 
need some kind of "undo" mechanism for rolling back to last flushed data to 
handle failure. Otherwise, managing those individual files may be nontrivial.
I was proposing to only store non-full cellBuffers on parity DNs. If a stripe 
is full, we always generate the parity cells. At that time the cellBuffers 
stored on parity DNs will be meaningless and should be deleted. So at any given 
time each parity DN will store 6 cellBuffers (for each stored parity block). In 
other words, for any given stripe, we always check if the parity cell is 
available (by checking the length of the stored parity block). If so we always 
use the parity cell itself in decoding. We use the stored cellBuffers only if 
the parity cell is not generated. I should probably draw an example. Will do 
soon.

bq. If the writer fails during the hflush operation, we have to make sure last 
flushed cellBuffer is still available
If the last flush was in the same stripe (as the current flush), the last 
flushed cellBuffers will always be available. If the last flush was in an 
earlier stripe, we'll just use the parity cells. So essentially, all files 
stored on parity DNs (including parity blocks and flushed cellBuffers) will 
only be appended to. They'll never be overwritten.
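
The per-stripe decision described above can be sketched as a length check. 
The names are hypothetical, and the sketch assumes the parity block holds 
exactly one cell per stripe, so the parity cell of stripe i is present once 
the stored parity block is at least (i + 1) cells long.

```java
final class ParitySourceChooser {
    enum Source { PARITY_CELL, CELL_BUFFERS }

    /** Picks what to decode with for one stripe, based on the stored parity block length. */
    static Source choose(long storedParityBlockLength, long stripeIndex, int cellSize) {
        long lengthIfParityPresent = (stripeIndex + 1) * cellSize;
        return storedParityBlockLength >= lengthIfParityPresent
                ? Source.PARITY_CELL
                : Source.CELL_BUFFERS;
    }

    public static void main(String[] args) {
        int cellSize = 64 * 1024;
        // The parity block holds two cells: stripes 0 and 1 have parity, stripe 2 does not.
        System.out.println(choose(2L * cellSize, 1, cellSize)); // prints PARITY_CELL
        System.out.println(choose(2L * cellSize, 2, cellSize)); // prints CELL_BUFFERS
    }
}
```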

The proposal is basically to temporarily turn the blockGroup into replication 
mode.

I'm still trying to fully understand Rui's proposal using 2 versions of parity 
cells. Will post a comment soon. [~demongaorui] If you could provide some more 
details of how to use the 2 versions to handle failures that'd be very helpful.

[~jingzhao] If we store cellBuffers I think the "last time flushed data" is 
always safe because they are not overwritten. If the current flush failed you 
are always guaranteed to have the same amount or more safe (uncorrupt) data 
than the last flush.




[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files

2016-04-05 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227058#comment-15227058
 ] 

Mingliang Liu commented on HDFS-7661:
-

Thanks for proposing solutions to this problem.

I agree with [~jingzhao] that there are still challenges that should be 
addressed even if we keep non-full cellBuffers on the "parity DN". Moreover, 
for different stripes, we have to maintain multiple cellBuffers on the "parity 
DN" side, as the "append" operation only happens within the same stripe. If the 
writer fails during the hflush operation, we have to make sure last flushed 
cellBuffer is still available, which may be in a previous (different) stripe. 
If we manage all cellBuffers across different stripes in one file, we also need 
some kind of "undo" mechanism for rolling back to last flushed data to handle 
failure. Otherwise, managing those individual files may be nontrivial.



[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files

2016-04-04 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15224630#comment-15224630
 ] 

Jing Zhao commented on HDFS-7661:
-

I agree with [~demongaorui]'s comment here. In general, storing data cells will 
not simplify the problem when we have multiple flushes within the same stripe. 
Starting from the 2nd flush, to "keep the last time flushed data" safe during 
the current flush, we need a mechanism similar to the one proposed in the 
current design.



[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files

2016-04-03 Thread GAO Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15223571#comment-15223571
 ] 

GAO Rui commented on HDFS-7661:
---

Hi [~zhz].

The reason for using two versions is that we need to keep both the latest 
flushed parity cells and the current flushed cells in case the current flush 
fails. That way, the latest flushed data stays safe even if the current flush 
operation fails. So the minimum requirement is two partial parity cell 
files. 

I think using two versions on the parity DNs could limit the changes for both 
writing and reading mainly to the datanode source code. For the 
writing/reading client, only minor changes need to be implemented. In contrast, 
storing data cells on parity DNs needs totally different logic in the 
writing/reading client implementation. 





[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files

2016-04-01 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222637#comment-15222637
 ] 

Zhe Zhang commented on HDFS-7661:
-

Thanks for sharing the thoughts [~demongaorui]. Having 2 versions of partial 
parity block sounds a little arbitrary (e.g. why not 1 or 3 versions).

I think the main benefit of storing data cells on parity DNs is that there's no 
risk of returning wrong data. Hence no need to undo and manage versioning. I 
think we can create a mechanism to associate the "data cell files" with the 
parity block (through file naming etc.).



[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files

2016-03-30 Thread GAO Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217583#comment-15217583
 ] 

GAO Rui commented on HDFS-7661:
---

Very creative idea, [~zhz]. {{Without any overwriting}} actually could simplify 
{{hflush/hsync}}. Inspired by your idea, I have come up with some new thoughts.

It may be a little strange to store data cells on parity DNs. Instead, maybe we 
could store the IPB (Internal Parity Block) file as two parts (two separate 
files). The first part is parity data which would not be modified. The second 
part is the flushed parity cell of the stripe being written. For the second 
part, we could keep the latest two versions, for example, 
{{last-flushed-parity-cell-0}} and {{last-flushed-parity-cell-1}}. The 
structure of {{last-flushed-parity-cell-X}} could be: logical block group 
length + parity cell data.

So, for writing, whenever the stripe being written is hflushed/hsynced, we 
replace the older {{last-flushed-parity-cell-X}} file with the newly flushed 
logical block group length and new parity cell data. For reading, the parity DN 
locally chooses one of the two {{last-flushed-parity-cell-X}} files based on 
the read client's request. 

With this kind of design we avoid {{overwriting}} the IPB file, which 
simplifies the implementation as well. We also always keep the last flushed 
data safe by switching between the two file names 
({{last-flushed-parity-cell-0}} and {{last-flushed-parity-cell-1}}).
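
A small in-memory sketch of the two-version idea, with hypothetical names: 
two array slots stand in for the two files. How the parity DN picks a version 
for a reader is not spelled out above, so the rule used here (the newest 
version whose logical block group length does not exceed the length the 
reader asked for) is an assumption made for illustration.

```java
final class TwoVersionParityCell {
    /** Contents of one last-flushed-parity-cell-X file: length + parity cell data. */
    static final class Version {
        final long blockGroupLength;
        final byte[] parityCell;
        Version(long blockGroupLength, byte[] parityCell) {
            this.blockGroupLength = blockGroupLength;
            this.parityCell = parityCell;
        }
    }

    private final Version[] versions = new Version[2]; // stands in for files -0 and -1

    /** Replaces the older version (or an empty slot) and returns the index written. */
    int flush(long blockGroupLength, byte[] parityCell) {
        int target = olderIndex();
        versions[target] = new Version(blockGroupLength, parityCell.clone());
        return target;
    }

    private int olderIndex() {
        if (versions[0] == null) return 0;
        if (versions[1] == null) return 1;
        return versions[0].blockGroupLength <= versions[1].blockGroupLength ? 0 : 1;
    }

    /** Assumed read rule: newest version not longer than the requested length, or null. */
    Version read(long requestedLength) {
        Version best = null;
        for (Version v : versions) {
            if (v != null && v.blockGroupLength <= requestedLength
                    && (best == null || v.blockGroupLength > best.blockGroupLength)) {
                best = v;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TwoVersionParityCell cell = new TwoVersionParityCell();
        cell.flush(100L, new byte[] {1});
        cell.flush(200L, new byte[] {2});
        cell.flush(300L, new byte[] {3});                    // replaces the length-100 version
        System.out.println(cell.read(250L).blockGroupLength); // prints 200
    }
}
```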



[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files

2016-03-29 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217030#comment-15217030
 ] 

Zhe Zhang commented on HDFS-7661:
-

Thanks for the discussions [~demongaorui], [~liuml07], [~szetszwo].

I read through the design doc and agree that overwriting the parity blocks is 
very complex. So here's an alternative thought:
# At a high level, we don't create temporary parity blocks when {{hflush}} is 
called. Instead we can send the actual data cells to the "parity DNs".
# On the client write path, {{DFSStripedOutputStream#cellBuffers}} keeps all 
data cells before the stripe is full. So when {{hflush}} is called, client can 
transfer all {{cellBuffers}} to all parity DNs. Yes this will cause some 
additional data transfers. But the cell size is only 64KB.
# On the "parity DN", we can create special files (details to be discussed), 
each for a temporary data cell. These special files will be appended to for 
future {{hflush}} operations. Parity blocks will be operated *without any 
overwriting*.
# Client read logic needs to be extended to read special "data cell" files when 
needed. I think that means the length of the parity block is shorter than 
expected (calculated from the length of the logical block group). 
Alternatively, "parity DN" can locally apply the "data cell" files through 
encoding, and transfer the longer version of parity block to client reader.
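
For a rough sense of the additional transfer per {{hflush}}, here is a small 
worked calculation. The helper name is hypothetical, and the 6 buffered cells 
and 3 parity DNs are assumptions (matching a 6+3 layout); only the 64KB cell 
size comes from the list above.

```java
final class FlushTransferCost {
    /** Upper bound on extra bytes sent when every buffered cell goes to every parity DN. */
    static long extraBytesPerFlush(int bufferedCells, int cellSize, int parityDns) {
        return (long) bufferedCells * cellSize * parityDns;
    }

    public static void main(String[] args) {
        // Worst case just before the stripe fills: 6 cells x 64KB x 3 parity DNs.
        System.out.println(extraBytesPerFlush(6, 64 * 1024, 3)); // prints 1179648
    }
}
```

So under these assumptions a single flush adds a little over 1MB of traffic in 
the worst case, and less when fewer cells are buffered.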



[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files

2016-03-23 Thread GAO Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209581#comment-15209581
 ] 

GAO Rui commented on HDFS-7661:
---

[~liuml07], thanks for uploading the new design doc 
[^HDFS-EC-file-flush-sync-design-v20160323.pdf]. Will try to add read client 
part and illustration figures soon. 
