[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15234667#comment-15234667 ]

GAO Rui commented on HDFS-7661:
---

Thanks for your comments, [~drankye], [~tlipcon], [~zhz], [~walter.k.su]. I totally agree that we should not let this JIRA block the 3.0 release. We could mark hflush/hsync of EC files as TO-BE-IMPLEMENTED and release 3.0.

In a future phase, along with the new decode/encode library and hardware support for EC, maybe we could use striped EC instead of replication as the default and major file format of HDFS. hflush/hsync would then become necessary, so [~liuml07] and I should probably continue to implement hflush/hsync based on the latest design. We would break this JIRA down into several subtasks and solve them one by one. :D

> [umbrella] support hflush and hsync for erasure coded files
> ---
>
> Key: HDFS-7661
> URL: https://issues.apache.org/jira/browse/HDFS-7661
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: erasure-coding
> Reporter: Tsz Wo Nicholas Sze
> Assignee: GAO Rui
> Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, HDFS-7661-unitTest-wip-trunk.patch, HDFS-7661-wip.01.patch, HDFS-EC-file-flush-sync-design-v20160323.pdf, HDFS-EC-file-flush-sync-design-version1.1.pdf, Undo-Log-Design-20160406.jpg
>
> We also need to support hflush/hsync and visible length.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232103#comment-15232103 ]

Walter Su commented on HDFS-7661:
---

Great design/discussion. Since we have come back to discussing the use cases and "effort vs benefit", I'm thinking that if the use cases are rare, we can provide a simpler workaround. We provide:

1. A fake "flush", which only flushes full stripes and doesn't flush the last partial stripe. It won't make sure every byte is safe, but it helps the recovery logic recover more data.
2. A real "flush". The easiest way to do this is to start a new block group. It makes sure the data written before the "flush" is safe and visible. It saves the user the trouble of closing and appending to the same file. Since we support variable-length blocks, it's totally doable. I need to mention that the implementation of appending to a striped file also utilizes variable-length blocks.

The trouble is creating too many block groups. But if there are too many small blocks, and they are adjacent in the same file, we can concatenate them into a bigger block, although striped block concatenation doesn't seem easy either.
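To make the difference between the two workarounds concrete, here is a minimal toy model in Python. It is not HDFS code: the names `fake_flush_safe_length` and `ToyStripedFile` are hypothetical, and the layout of 6 data cells of 64KB per stripe is an assumption taken from numbers mentioned elsewhere in this thread.

```python
CELL_SIZE = 64 * 1024                 # cell size mentioned elsewhere in the thread
DATA_UNITS = 6                        # assumed number of data cells per stripe
STRIPE_WIDTH = CELL_SIZE * DATA_UNITS


def fake_flush_safe_length(bytes_written):
    """Bytes covered by a 'fake' flush: only whole stripes are counted."""
    return (bytes_written // STRIPE_WIDTH) * STRIPE_WIDTH


class ToyStripedFile:
    """Toy file in which a 'real' flush seals the open block group."""

    def __init__(self):
        self.closed_groups = []       # lengths of sealed block groups
        self.current = 0              # bytes in the open block group

    def write(self, num_bytes):
        self.current += num_bytes

    def real_flush(self):
        """Seal the open block group and return the total safe length."""
        if self.current > 0:
            self.closed_groups.append(self.current)
            self.current = 0
        return sum(self.closed_groups)
```

In this model every real flush of a non-empty block group adds one entry to `closed_groups`, which is exactly the "too many block groups" cost described above; the fake flush never creates a block group but leaves the last partial stripe unprotected.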
[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231367#comment-15231367 ]

Zhe Zhang commented on HDFS-7661:
---

Thanks for the helpful discussion [~drankye], [~tlipcon]. I agree that at least for the HBase case, hflush-on-EC is not very useful.

[~jingzhao], [~szetszwo], [~liuml07], [~demongaorui] Are you aware of other important use cases of hflush-on-EC? If not, I think we should plan on leaving this as an unsupported feature for 3.0. After some production experience with EC I think we'll have more insights, which could lead to a better (simpler and more robust) solution.

I'm also planning to make a pass over the remaining subtasks under HDFS-8031 and identify the "3.0 blockers". For example, HDFS-9869 could add metrics keys, and we are also planning to change some of the EC policy names. Any comments and suggestions on this are very welcome.
[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229509#comment-15229509 ]

Kai Zheng commented on HDFS-7661:
---

Thanks [~tlipcon] for the quick response and insights for us!
[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229490#comment-15229490 ]

Todd Lipcon commented on HDFS-7661:
---

I'm not super active on either HDFS or HBase anymore, but Kai asked me to take a look at the issue, especially with regard to his latest comment.

My (slightly ill-informed) opinion is that he's right -- this sounds like a very complicated feature to get right, and maybe has minimal benefit. For HBase, the use case for hflush is the WALs, which typically make up a small minority of the disk space usage of the cluster. So, the benefits of EC from a space-savings perspective are not so large. The benefits from a throughput perspective due to striping sound enticing at first, but I think that's probably better addressed by the "multi-WAL" feature, which already allows striping at the application level.

So, my gut feeling is that for the first EC-supporting release it might be safest to not include the feature, or to do so only as an experimental feature that has to be enabled by a config (somewhat like dfs.support.append was, way back in the day when it wasn't super stable).
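The "experimental feature behind a config" option could look roughly like the following toy sketch in Python. Everything here is hypothetical: the configuration key `EXPERIMENTAL_KEY` and the class `ToyStripedOutputStream` are invented for illustration and do not correspond to any real HDFS setting or class.

```python
# Hypothetical key, invented for this sketch; not a real HDFS setting.
EXPERIMENTAL_KEY = "toy.ec.hflush.experimental.enabled"


class ToyStripedOutputStream:
    """Toy stream that refuses hflush unless the experimental flag is set."""

    def __init__(self, conf):
        self.enabled = conf.get(EXPERIMENTAL_KEY, "false").lower() == "true"
        self.written_length = 0
        self.flushed_length = 0

    def write(self, data):
        self.written_length += len(data)

    def hflush(self):
        if not self.enabled:
            # The NOT-SUPPORTED behaviour discussed in this thread.
            raise NotImplementedError(
                "hflush is not supported for striped files")
        self.flushed_length = self.written_length
```

With an empty configuration the stream rejects `hflush()`, so applications find out immediately that they cannot rely on it; setting the flag to "true" opts in to the unfinished behaviour.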
[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229446#comment-15229446 ]

Kai Zheng commented on HDFS-7661:
---

Thanks all for the great discussions and hard work. Sorry for interrupting.

The proposed approach of additionally sending all cellBuffers to all parity DNs sounds workable, because all the real data in question (the data to be flushed) is replicated to the parity DNs, so the parity cells can be computed on the DNs on demand. On the other hand, this will complicate the effort to support advanced erasure codecs, because any codec/coder will then need to be aware of the trick played here.

I understand there is no easy solution to this problem. I'm wondering whether the complexity and overhead involved may be too much, because we don't have any solid practical use case that requires hflush/hsync in striping mode. It looks like the HBase workload was mentioned, but I doubt there is any benchmark that proves striping is suitable for HBase. Could any HBase folks share some thoughts here?

How about marking the hflush/hsync API as NOT-SUPPORTED or TO-BE-IMPLEMENTED for striped files? This may resolve a blocker for the 3.0 release, and when practical use cases or requirements for it appear, this effort can be revisited.
[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229367#comment-15229367 ]

Zhe Zhang commented on HDFS-7661:
---

Addressing Mingliang's analysis first:

bq. what if the writer fails when the full stripe has not been sent to all parity DNs? In this case, some of the parity DNs have deleted the cellBuffers so the last flushed data is not available any more, while the other parity DNs have not received the parity cell for the full stripe.

In that case, any parity DN which "have not received the parity cell for the full stripe" will still have the cellBuffers. Note that we don't need all parity DNs to have the full set of cellBuffers. As long as 1 parity DN has the full set of cellBuffers, we can recover to the state of the last successful flush.
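The argument can be checked with a small toy enumeration in Python. This is a simplified model, not HDFS recovery logic: it assumes 3 parity DNs, assumes a parity DN drops its cellBuffers only after it has received the parity cell, and ignores the data DNs entirely. The function names are hypothetical.

```python
NUM_PARITY_DNS = 3   # assumed number of parity DNs


def parity_dn_states(acked):
    """State of each parity DN if the writer dies after `acked` of them
    have received the parity cell for the full stripe."""
    states = []
    for i in range(NUM_PARITY_DNS):
        got_parity = i < acked
        # A DN keeps its cellBuffers until it has the parity cell.
        states.append({"parity_cell": got_parity,
                       "cell_buffers": not got_parity})
    return states


def recovery_outcome(states):
    """What the toy model can recover from the parity DNs alone."""
    if all(s["parity_cell"] for s in states):
        return "full stripe"
    if any(s["cell_buffers"] for s in states):
        return "last flush"
    return "lost"
```

Enumerating `acked` from 0 to 3 never yields "lost" in this model: either at least one parity DN still holds the cellBuffers, or every parity DN already has the parity cell.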
[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227810#comment-15227810 ]

GAO Rui commented on HDFS-7661:
---

Hi [~zhz], [~liuml07] and I have discussed the {{two version cells undo log design}}. I have attached an illustration [^Undo-Log-Design-20160406.jpg].

In the design, the Undo Log consists of three parts. The first and second parts store the latest flushed parity cell and the current flushed parity cell. The length of these two parts depends on the EC policy; each part is long enough to store a full cell and its checksum. The third part of the Undo Log is a list of flush records, the same as described in the design document, except that a pointer to the latest successfully flushed cell is added. This list can be appended to as many times as needed (generally, this would not cause the Undo Log to grow too big).

With the third part of the Undo Log, a two-phase commit mechanism can be used to ensure data safety. For example:

1. The last successfully flushed cell is stored in parity-cell-1 (the second part of the Undo Log).
2. The current flush happens.
3. The first part of the Undo Log file is updated with the current flushed parity cell.
4. A new record is added to the third part (the record list) of the Undo Log file.

If a failure happens during step 3 (while the first part of the Undo Log file is being updated), we still have the latest successfully flushed parity cell in the second part of the Undo Log, and the last record of the record list still points to the second part.
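The two-phase commit described in this comment can be modelled in a few lines of Python. This is only a sketch under assumptions: the class `ToyUndoLog` and its methods are hypothetical, the three parts live in memory rather than in a file, and checksums are omitted for brevity.

```python
class ToyUndoLog:
    """In-memory model of the three-part Undo Log."""

    def __init__(self):
        self.slots = [None, None]   # parts 1 and 2: two parity-cell slots
        self.records = []           # part 3: append-only (slot, length) records

    def last_flushed(self):
        """Parity cell and length named by the newest committed record."""
        if not self.records:
            return None
        slot, length = self.records[-1]
        return self.slots[slot], length

    def flush(self, parity_cell, length, fail_before_commit=False):
        # Phase 1: write into the slot the newest record does NOT point to,
        # so the last successfully flushed cell is never overwritten.
        target = 1 - self.records[-1][0] if self.records else 0
        self.slots[target] = parity_cell
        if fail_before_commit:
            raise OSError("writer failed before the record was appended")
        # Phase 2: commit by appending a record that points to the new slot.
        self.records.append((target, length))
```

A flush that fails between the two phases leaves a stale cell in one slot, but `last_flushed()` still returns the previously committed cell because no record points to the half-written slot.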
[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227433#comment-15227433 ]

Mingliang Liu commented on HDFS-7661:
---

Thanks for the detailed explanation, [~zhz]. I got the idea of avoiding overwriting, which may simplify the problem. I agree that once the stripe is full (for all parity DNs) the cellBuffers for the current stripe are useless. I'm not sure I understand your idea of managing cellBuffers across stripes.

One quick comment: if a parity DN deletes the cellBuffers once it finds the current stripe is full, what if the writer fails when the full stripe has not been sent to all parity DNs? In this case, some of the parity DNs have deleted the cellBuffers so the last flushed data is not available any more, while the other parity DNs have not received the parity cell for the full stripe. The reader client is not able to read either the last flushed data or the full stripe data in this case.
[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227406#comment-15227406 ]

Zhe Zhang commented on HDFS-7661:
---

Thanks for the discussions [~demongaorui], [~jingzhao], [~liuml07].

Addressing Mingliang's comment first:

bq. If we manage all cellBuffers across different stripes in one file, we also need some kind of "undo" mechanism for rolling back to last flushed data to handle failure. Otherwise, managing those individual files may be nontrivial.

I was proposing to only store non-full cellBuffers on parity DNs. If a stripe is full, we always generate the parity cells. At that time the cellBuffers stored on parity DNs become meaningless and should be deleted. So at any given time each parity DN will store 6 cellBuffers (for each stored parity block). In other words, for any given stripe, we always check whether the parity cell is available (by checking the length of the stored parity block). If so, we always use the parity cell itself in decoding. We use the stored cellBuffers only if the parity cell has not been generated. I should probably draw an example. Will do soon.

bq. If the writer fails during the hflush operation, we have to make sure last flushed cellBuffer is still available

If the last flush was in the same stripe (as the current flush), the last flushed cellBuffers will always be available. If the last flush was in an earlier stripe, we'll just use the parity cells.

So essentially, all files stored on parity DNs (including parity blocks and flushed cellBuffers) will only be appended to. They'll never be overwritten. The proposal is basically to temporarily turn the blockGroup into replication mode.

I'm still trying to fully understand Rui's proposal using 2 versions of parity cells. Will post a comment soon. [~demongaorui] If you could provide some more details on how to use the 2 versions to handle failures, that'd be very helpful.

[~jingzhao] If we store cellBuffers, I think the "last time flushed data" is always safe because it is never overwritten. If the current flush fails, you are always guaranteed to have the same amount of safe (uncorrupt) data as after the last flush, or more.
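The per-stripe choice between the parity cell and the stored cellBuffers can be sketched as a single length comparison. This Python fragment is a toy illustration, not HDFS reader code: the function name `parity_source` is hypothetical, and it assumes every completed stripe contributes exactly one full 64KB cell to the parity block.

```python
CELL_SIZE = 64 * 1024   # cell size mentioned elsewhere in the thread


def parity_source(stripe_index, parity_block_len, cell_size=CELL_SIZE):
    """Decide what a reader would use for the given stripe (0-based).

    If the stored parity block already extends over this stripe's cell,
    the parity cell was generated and is used for decoding; otherwise the
    reader falls back to the flushed cellBuffers.
    """
    if parity_block_len >= (stripe_index + 1) * cell_size:
        return "parity cell"
    return "cellBuffers"
```

For a parity block that holds exactly one cell, stripe 0 resolves to the parity cell and stripe 1 (the stripe still being written) resolves to the cellBuffers.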
[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227058#comment-15227058 ]

Mingliang Liu commented on HDFS-7661:
---

Thanks for proposing solutions to this problem. I agree with [~jingzhao] that there are still challenges that should be addressed even if we keep non-full cellBuffers on the "parity DN". Moreover, for different stripes, we have to maintain multiple cellBuffers on the "parity DN" side, as the "append" operation only happens within the same stripe. If the writer fails during the hflush operation, we have to make sure the last flushed cellBuffer is still available, which may be in a previous (different) stripe.

If we manage all cellBuffers across different stripes in one file, we also need some kind of "undo" mechanism for rolling back to the last flushed data to handle failure. Otherwise, managing those individual files may be nontrivial.
[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15224630#comment-15224630 ]

Jing Zhao commented on HDFS-7661:
---

I agree with [~demongaorui]'s comment here. In general, storing data cells will not simplify the problem when we have multiple flushes within the same stripe. Starting from the 2nd flush, to "keep the last time flushed data" safe during the current flush, we need a mechanism similar to the one proposed in the current design.
[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15223571#comment-15223571 ]

GAO Rui commented on HDFS-7661:
---

Hi [~zhz]. The reason for using two versions is that we need to keep both the latest flushed parity cells and the current flushed cells in case the current flush fails. That way, the latest flushed data stays safe even if the current flush operation fails. So the minimum requirement is two partial parity cell files.

I think using two versions on the parity DNs would confine the changes to both the write and read paths mainly to the datanode source code. For the writing/reading client, only minor changes need to be implemented. Storing data cells on parity DNs, in contrast, needs totally different logic in the writing/reading client implementation.
[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222637#comment-15222637 ]

Zhe Zhang commented on HDFS-7661:
---

Thanks for sharing the thoughts [~demongaorui]. Having 2 versions of the partial parity block sounds a little arbitrary (e.g. why not 1 or 3 versions). I think the main benefit of storing data cells on parity DNs is that there's no risk of returning wrong data, hence no need to undo and manage versioning. I think we can create a mechanism to associate the "data cell files" with the parity block (through file naming etc.).
[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217583#comment-15217583 ]

GAO Rui commented on HDFS-7661:
---

Very creative idea, [~zhz]. {{Without any overwriting}} actually could simplify {{hflush/hsync}}. Inspired by your idea, I have come up with some new thoughts.

It may be a little strange to store data cells on parity DNs. Instead, maybe we could store the IPB (Internal Parity Block) file as two parts (two separate files). The first part is parity data which would not be modified. The second part is the flushed parity cell of the stripe being written. For the second part, we could keep the latest two versions, for example {{last-flushed-parity-cell-0}} and {{last-flushed-parity-cell-1}}. The structure of {{last-flushed-parity-cell-X}} could be: logical block group length + parity cell data.

So, for writing, whenever the stripe being written is flushed by hflush/hsync, we replace the older {{last-flushed-parity-cell-X}} file with the newly flushed logical block group length and new parity cell data. For reading, the parity DN locally chooses one of the two {{last-flushed-parity-cell-X}} files based on the read client's request.

With this kind of design we avoid {{overwriting}} the IPB file, which simplifies the code implementation as well. We also always keep the last flushed data safe by switching between the two file names ({{last-flushed-parity-cell-0}} and {{last-flushed-parity-cell-1}}).
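A toy model of the two alternating files, written in Python, may help to see why the last flushed data survives. It is a sketch under assumptions: the class `ToyTwoVersionParity` is hypothetical, the two files are dictionary entries instead of files on a datanode, and each replacement is treated as atomic (a real implementation would need something like write-then-rename, which is not shown).

```python
FILE_NAMES = ["last-flushed-parity-cell-0", "last-flushed-parity-cell-1"]


class ToyTwoVersionParity:
    """Two versions of the flushed parity cell for the open stripe."""

    def __init__(self):
        # file name -> (logical block group length, parity cell data)
        self.files = {}

    def flush(self, logical_len, parity_cell):
        """Replace the missing or older version; the newer one is untouched."""
        target = min(FILE_NAMES,
                     key=lambda name: self.files.get(name, (-1, b""))[0])
        self.files[target] = (logical_len, parity_cell)

    def read(self, requested_len):
        """Return the parity cell matching the length the reader asks for."""
        for logical_len, parity_cell in self.files.values():
            if logical_len == requested_len:
                return parity_cell
        return None
```

After three flushes the model holds only the two most recent versions, and a flush always overwrites the older of the two, so the newest successfully flushed version is never the one being replaced.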
[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217030#comment-15217030 ]

Zhe Zhang commented on HDFS-7661:
---

Thanks for the discussions [~demongaorui], [~liuml07], [~szetszwo]. I read through the design doc and agree that overwriting the parity blocks is very complex. So here's an alternative thought:

# At a high level, we don't create temporary parity blocks when {{hflush}} is called. Instead we can send the actual data cells to the "parity DNs".
# On the client write path, {{DFSStripedOutputStream#cellBuffers}} keeps all data cells before the stripe is full. So when {{hflush}} is called, the client can transfer all {{cellBuffers}} to all parity DNs. Yes, this will cause some additional data transfers. But the cell size is only 64KB.
# On the "parity DN", we can create special files (details to be discussed), each for a temporary data cell. These special files will be appended to on future {{hflush}} operations. Parity blocks will be operated on *without any overwriting*.
# Client read logic needs to be extended to read the special "data cell" files when needed. I think that is the case when the length of the parity block is shorter than expected (calculated from the length of the logical block group). Alternatively, the "parity DN" can locally apply the "data cell" files through encoding, and transfer the longer version of the parity block to the client reader.
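The append-only behaviour on the parity DN can be illustrated with a small Python model. Several things here are assumptions rather than part of the proposal: the class `ToyParityDN` and its methods are hypothetical, the "special files" are in-memory byte strings, and a plain XOR stands in for the real erasure coder (HDFS EC policies use Reed-Solomon style codes, which this sketch does not implement).

```python
def xor_parity(cells, cell_size):
    """XOR stand-in for the real coder; short cells count as zero-padded."""
    parity = bytearray(cell_size)
    for cell in cells:
        for i, byte in enumerate(cell):
            parity[i] ^= byte
    return bytes(parity)


class ToyParityDN:
    """Parity DN that stores flushed data cells without overwriting."""

    def __init__(self, data_units, cell_size):
        self.cell_size = cell_size
        self.parity_block = b""                # only ever appended to
        self.cell_files = [b""] * data_units   # data cells of the open stripe

    def hflush(self, cell_buffers):
        """Append the not-yet-stored suffix of each client cellBuffer."""
        for i, buf in enumerate(cell_buffers):
            self.cell_files[i] += buf[len(self.cell_files[i]):]

    def parity_on_demand(self):
        """Encode the stored data cells locally for a reader."""
        return xor_parity(self.cell_files, self.cell_size)

    def close_stripe(self):
        """When the stripe is full, append its parity cell, drop cell files."""
        assert all(len(c) == self.cell_size for c in self.cell_files)
        self.parity_block += self.parity_on_demand()
        self.cell_files = [b""] * len(self.cell_files)
```

Each `hflush` only extends the stored cell files, and `close_stripe` only extends the parity block, so no previously flushed byte is ever rewritten in this model.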
[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209581#comment-15209581 ]

GAO Rui commented on HDFS-7661:
---

[~liuml07], thanks for uploading the new design doc [^HDFS-EC-file-flush-sync-design-v20160323.pdf]. I will try to add the read client part and illustration figures soon.