[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13490841#comment-13490841 ] Tsz Wo (Nicholas), SZE commented on HDFS-3979:
Thanks for the update, Lars. +1 patch looks good. I will commit it if there are no more comments.

Fix hsync and hflush semantics.
Key: HDFS-3979
URL: https://issues.apache.org/jira/browse/HDFS-3979
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt, hdfs-3979-v3.txt, hdfs-3979-v4.txt

See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver is not on a synchronous path from the DFSClient, hence it is possible that a DN loses data that it has already acknowledged as persisted to a client.
Edit: Spelling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13490887#comment-13490887 ] Luke Lu commented on HDFS-3979:
bq. I think it will decrease the performance for non-sync write.
It would be nice if we could show/quantify the decrease in performance for non-sync writes. It may not be wise to introduce complexity and make hflush less robust if this is a non-issue.
bq. The existing tests: TestFiPipelines and TestFiHFlush do not cover the other scenarios you worry about?
It seems that TestFiHFlush doesn't cover the failure scenarios. All the test cases are positive assertions (the pipeline can recover in spite of disk error exceptions), which does not seem very useful given that the ack is done before the disk error exceptions are triggered. A new TestFiHSync seems necessary, especially for the new patch, where the ack code path has diverged from hflush. Basically, I want to make sure that hsync is guaranteed to get an error if the pipeline cannot be recovered (e.g., because required datanodes ran out of disk space). Anyway, I'm fine with filing another jira for these hflush/hsync improvements.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13490268#comment-13490268 ] Tsz Wo (Nicholas), SZE commented on HDFS-3979:
Sure, will check the patch.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13490285#comment-13490285 ] Tsz Wo (Nicholas), SZE commented on HDFS-3979:
The patch moves ack to the end in order to fix the sync semantics. I think it will decrease the performance for non-sync write. How about keeping enqueue early when syncBlock == false?
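The ordering proposed in the comment above can be sketched as a toy model. This is only an illustration under stated assumptions, not the actual BlockReceiver code: the class `AckOrderingSketch`, the method `receivePacket`, and the step names are hypothetical, and only the `syncBlock` flag comes from the discussion.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a datanode handling one packet; NOT the real BlockReceiver. */
final class AckOrderingSketch {

    /**
     * Returns the ordered steps taken for one packet.
     *
     * @param syncBlock true if the client asked for hsync semantics on this packet
     * @param diskFails true if the simulated local write/sync fails
     */
    static List<String> receivePacket(boolean syncBlock, boolean diskFails) {
        List<String> steps = new ArrayList<>();

        // Non-sync writes keep the early enqueue, so the ack can travel
        // upstream while the local write is still in progress.
        if (!syncBlock) {
            steps.add("enqueue-ack");
        }

        if (diskFails) {
            // Hypothetical failure point: for sync packets no ack was queued yet.
            steps.add("io-error");
            return steps;
        }

        steps.add("write");

        if (syncBlock) {
            steps.add("sync");
            // Sync writes enqueue the ack only after the data was forced out.
            steps.add("enqueue-ack");
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(receivePacket(true, false));
        System.out.println(receivePacket(false, false));
        System.out.println(receivePacket(true, true));
        System.out.println(receivePacket(false, true));
    }
}
```

In this model a failing sync packet never produces an ack, while a failing non-sync packet has already queued one. That second case is the trade-off the early enqueue accepts in exchange for lower latency on non-sync writes.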
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13490435#comment-13490435 ] Kan Zhang commented on HDFS-3979:
bq. I think it will decrease the performance for non-sync write.
I'd welcome some clarity on whether writing to OS buffers is a real concern here.
bq. How about keeping enqueue early when syncBlock == false?
To be on the conservative side, I'm OK with this.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13490445#comment-13490445 ] Lars Hofhansl commented on HDFS-3979:
I'll make that change.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13490069#comment-13490069 ] Kan Zhang commented on HDFS-3979:
bq. This little change makes TestHSync fail most of the time - without the rest of the patch, and never with this patch.
Lars, I don't quite understand your above comment. What's the behavior of TestHSync with and w/o your latest patch?
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13490113#comment-13490113 ] Lars Hofhansl commented on HDFS-3979:
Hi Kan, the only difference between v2 and v3 is that in v3 the fsync metric is updated after the actual sync to the FS (BlockReceiver.flushOrSync). This exposes the race condition we want to fix and makes TestHSync fail almost every run (the client returns from hsync before the datanode can update the metric). With the rest of this patch applied, this race is removed and TestHSync never fails. So now we have a test case for the race condition.
[~vicaya] The existing tests: TestFiPipelines and TestFiHFlush do not cover the other scenarios you worry about?
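The race described above can be illustrated with a deterministic toy model. This is a sketch under assumptions, not the TestHSync code: the class `FsyncMetricRaceSketch` and the method `metricSeenByClient` are hypothetical, and the real race is timing-dependent, whereas this model fixes the unlucky interleaving so that the outcome is reproducible.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Deterministic toy model of the ack-versus-sync ordering; NOT the real test. */
final class FsyncMetricRaceSketch {

    /**
     * Returns the fsync count a client would read immediately after its
     * hsync call returns, i.e. at the moment the ack arrives.
     *
     * @param ackBeforeSync true models the old ordering (ack queued before the
     *                      sync), false models the patched ordering
     */
    static long metricSeenByClient(boolean ackBeforeSync) {
        AtomicLong fsyncCount = new AtomicLong(); // stand-in for the datanode metric
        long seen;
        if (ackBeforeSync) {
            // Ack reaches the client first: hsync returns and the client reads
            // the metric before the datanode has synced and counted.
            seen = fsyncCount.get();
            fsyncCount.incrementAndGet(); // sync and metric update happen too late
        } else {
            // Sync and metric update complete before the ack is queued.
            fsyncCount.incrementAndGet();
            seen = fsyncCount.get();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("old ordering sees " + metricSeenByClient(true));
        System.out.println("patched ordering sees " + metricSeenByClient(false));
    }
}
```

A test asserting that the count is 1 after one hsync would fail under the first ordering and pass under the second, which is the property that turns the metric update into a probe for the race.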
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13490121#comment-13490121 ] Kan Zhang commented on HDFS-3979:
+1 Thanks, Lars. Patch looks good to me. Nicholas, would appreciate if you could also take a look. Thx!
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13489609#comment-13489609 ] Luke Lu commented on HDFS-3979:
The patch lgtm, even though it lacks tests for failure cases for hsync.
bq. This issue is a blocker for HBASE-5954, it would be better to resolve it asap
You can help by testing the patch and showing some results.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13489212#comment-13489212 ] liang xie commented on HDFS-3979:
Are there any objections to it, or more comments? This issue is a blocker for HBASE-5954, it would be better to resolve it asap :)
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475546#comment-13475546 ] Hadoop QA commented on HDFS-3979:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12549002/hdfs-3979-v3.txt against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3327//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3327//console
This message is automatically generated.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470823#comment-13470823 ] Kan Zhang commented on HDFS-3979:
bq. Why API4 is needed for HBase?
API3 or API4, it probably doesn't make a huge difference, IMHO. On the other hand, assuming the performance penalty of going from API3 to API4 is negligible, it's probably not worth complicating the code to support API3 (instead of API4).
bq. Lastly, we can play with this. For example only one of the replicas could sync to disk and the others just guarantee the data in the OS buffers (API4.5 ).
Yes, it would be very interesting to see if it saves to sync only the local replica or acknowledge to the client upon the first successful sync of any replica.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469628#comment-13469628 ] Luke Lu commented on HDFS-3979:
bq. You don't think the existing pipeline tests cover the failure scenarios?
Given the existing hflush/hsync semantics (the ack can reach the client before any pipeline exceptions), I don't think the new semantics are covered by existing tests. I'm worried about the race between the ack and write errors.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469671#comment-13469671 ] Lars Hofhansl commented on HDFS-3979:
I've seen that race when I wrote a test for HDFS-744. I fixed it there by updating the metrics first... Ugh :) I think I can make a test that fails at least with reasonable probability with the current semantics. The race between ack and write errors should be reduced (eliminated) with this patch.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469712#comment-13469712 ] Kan Zhang commented on HDFS-3979:
bq. The race between ack and write errors should be reduced (eliminated) with this patch.
It should be eliminated with this patch. When there is a write error, the ack will not be queued.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469725#comment-13469725 ] Luke Lu commented on HDFS-3979:
bq. It should be eliminated with this patch. When there is a write error, the ack will not be queued.
I think so too, but it would be nice to have a test to cover the case for future maintenance/refactoring.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469827#comment-13469827 ] Lars Hofhansl commented on HDFS-3979:
Thanks Luke and Kan. I'll come up with a test once I get some spare cycles (quite busy with HBase atm).
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469952#comment-13469952 ] Tsz Wo (Nicholas), SZE commented on HDFS-3979:
{quote}
For applications like HBase we'd like API4 as well as API5. (API4 allows a hypothetical kill -9 of all DNs without loss of acknowledged data, API5 allows HW failures of all data nodes - i.e. a DC outage - without loss of acknowledged data)
{quote}
Why API4 is needed for HBase? As everyone knows, there are usually 3 replicas in HDFS. If only one of the datanodes is killed, the data is still available in the other two datanodes. That's why we invented hflush (i.e. API3) in HDFS-265.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469978#comment-13469978 ] Luke Lu commented on HDFS-3979:
bq. Why API4 is needed for HBase?
Many configuration management systems (simplest: pdsh -a hadoop-daemon.sh stop datanode) shut down/restart HDFS by kill -9'ing datanodes in parallel. Having to quiesce any OLTP-like workload is error prone. How about a simple ops error: pdsh -a killall -9 java sent to the wrong window (hence the wrong cluster)? IMO, API4 is not robust enough for HBase. Unless the performance difference is huge ( 20% for hflush), which I doubt, it's not worth the risk, again IMO.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470016#comment-13470016 ] Lars Hofhansl commented on HDFS-3979:
API4 is hflush (with change in OS buffers). That's an interesting discussion by itself.
hsync'ing every edit in HBase is prohibitive. I have some simple numbers in HBASE-5954. Although, I need to do that test again with the sync_file_range changes in HDFS-2465 (that would hopefully do most of the data sync'ing asynchronously and only sync the last changes and metadata synchronously upon client request).
Many applications do not need every edit to be guaranteed on disk, but have sync points. That is what I am aiming for in HBase. The application will know the specific semantics.
What is really important for HBase (IMHO) is that every block is synced to disk when it is closed. HBase constantly rewrites existing data via compactions, so without syncing, arbitrarily old data can be lost during a rack or DC outage.
Lastly, we can play with this. For example only one of the replicas could sync to disk and the others just guarantee the data in the OS buffers (API4.5 :) ).
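The idea of sync points, as opposed to syncing every edit, can be sketched against a local file. This is a loose analogy and not the HDFS or HBase API: it uses `java.nio.channels.FileChannel.force` as a stand-in for hsync, and the class `SyncPointSketch`, the methods `writeEdits` and `runDemo`, and the `syncEvery` parameter are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

/** Local-file analogy for application-chosen sync points; NOT HDFS code. */
final class SyncPointSketch {

    /**
     * Appends each edit as one line. Data is forced to storage only at every
     * syncEvery-th edit and once more before the file is closed.
     *
     * @return the number of force() calls that were issued
     */
    static int writeEdits(Path log, List<String> edits, int syncEvery) {
        int forces = 0;
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            int sinceSync = 0;
            for (String edit : edits) {
                ByteBuffer buf = ByteBuffer.wrap((edit + "\n").getBytes(StandardCharsets.UTF_8));
                while (buf.hasRemaining()) {
                    ch.write(buf); // lands in OS buffers, not necessarily on disk
                }
                if (++sinceSync == syncEvery) {
                    ch.force(true); // sync point: data and metadata forced out
                    forces++;
                    sinceSync = 0;
                }
            }
            ch.force(true); // always sync before close
            forces++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return forces;
    }

    /** Writes five edits with a sync point every two edits; returns {forces, linesWritten}. */
    static int[] runDemo() {
        try {
            Path log = Files.createTempFile("edits", ".log");
            try {
                int forces = writeEdits(log, Arrays.asList("e1", "e2", "e3", "e4", "e5"), 2);
                int lines = Files.readAllLines(log, StandardCharsets.UTF_8).size();
                return new int[] {forces, lines};
            } finally {
                Files.deleteIfExists(log);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = runDemo();
        System.out.println("force() calls: " + result[0] + ", lines written: " + result[1]);
    }
}
```

With five edits and a sync point every two edits, the sketch forces after the second and fourth edit and once more before close, so three syncs cover five writes. The final force mirrors the point made above that data should be synced when a block is closed.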
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13467882#comment-13467882 ] Lars Hofhansl commented on HDFS-3979:
You don't think the existing pipeline tests cover the failure scenarios? I'll see if I can get some performance numbers.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13467270#comment-13467270 ] Lars Hofhansl commented on HDFS-3979:
Do we want this change? Seems to me that HDFS-265 broke hsync/hflush and this would fix it.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13467292#comment-13467292 ] Luke Lu commented on HDFS-3979: --- bq. Do we want the change? I do think that the change is required for the correct hsync semantics (and better hflush guarantee). I'm not too sure if the change is complete without some reasonable test cases for failure scenarios. BTW, do you have any new performance numbers for comparison as well?
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13465915#comment-13465915 ] Lars Hofhansl commented on HDFS-3979: - Enqueuing the seqno at the end seems like the best approach. (Indeed this is done in the 0.20.x code as both of you said). I wonder why this was changed? Will have a new patch momentarily.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13465987#comment-13465987 ] Hadoop QA commented on HDFS-3979: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12547049/hdfs-3979-v2.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3247//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3247//console This message is automatically generated.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13466082#comment-13466082 ] Kan Zhang commented on HDFS-3979: - bq. I wonder why this was changed? My guess is HDFS-265 intends to implement API3 rather than API4. https://issues.apache.org/jira/browse/HDFS-265?focusedCommentId=12710542page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12710542
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13466113#comment-13466113 ] Lars Hofhansl commented on HDFS-3979: - I see. Thanks Kan. So now we have API4 and (with HDFS-744) API5. For applications like HBase we'd like API4 as well as API5. (API4 allows a hypothetical kill -9 of all DNs without loss of acknowledged data, API5 allows HW failures of all data nodes - i.e. a DC outage - without loss of acknowledged data)
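Editor's note: the difference between the two guarantees in the comment above can be illustrated with a local-file analogy. This is only a sketch under stated assumptions, not HDFS code: the class and method names are hypothetical, "flushed" stands in for API4 (data has left the process, so a killed process does not lose it) and "synced" stands in for API5 (data has been forced to the storage device, so a power loss should not lose it).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical local-file analogy; none of these names exist in Hadoop.
class FlushVersusSyncSketch {

    // "API4-like": returns once the bytes have been handed to the operating system.
    // They survive a kill -9 of this process, but not necessarily a power failure.
    static void writeFlushed(FileChannel ch, String record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }

    // "API5-like": additionally asks the OS to force the data (and metadata) to the device
    // before returning, so the caller is only acknowledged after the sync completed.
    static void writeSynced(FileChannel ch, String record) throws IOException {
        writeFlushed(ch, record);
        ch.force(true);
    }

    // Writes one record each way to a temporary file and returns the file content.
    static String roundTrip() {
        try {
            Path p = Files.createTempFile("flush-vs-sync", ".log");
            try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
                writeFlushed(ch, "a");
                writeSynced(ch, "b");
            }
            String content = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
            Files.delete(p);
            return content;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("file content: " + roundTrip());
    }
}
```

Both calls produce the same bytes on a healthy machine; they differ only in which failures the data is expected to survive at the moment the call returns, which is the distinction this issue is about.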
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13465331#comment-13465331 ] Kan Zhang commented on HDFS-3979: - I agree it's probably not a good idea to enqueue in a finally block. The original code before HDFS-265 didn't do that either.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13465386#comment-13465386 ] Lars Hofhansl commented on HDFS-3979: - Should we simply do the enqueue at the end of receivePacket(), then? So just to make sure: In the current code the seqno is already enqueued in the beginning, so if there's an exception later in the code it won't have any effect on the enqueued seqno. The finally just preserves this existing behavior. What happens when there is an exception and the seqno is never enqueued? (and if that is OK, why is it not a problem now.)
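Editor's note: the three placements of the ack enqueue discussed in this thread (at the beginning of receivePacket(), in a finally block, or at the end) can be compared with a small simulation. This is a minimal sketch, not the real BlockReceiver: the class, the Placement enum, the failWrite flag and the ackQueue field are all hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for the datanode's packet receiver; not the real BlockReceiver.
class AckOrderingSketch {

    enum Placement { AT_START, IN_FINALLY, AT_END }

    // Sequence numbers queued here stand for acks that will be sent back upstream.
    final Queue<Long> ackQueue = new ArrayDeque<>();

    // Simulates receiving one packet; failWrite forces the write/sync step to fail.
    void receivePacket(long seqno, Placement placement, boolean failWrite) throws IOException {
        if (placement == Placement.AT_START) {
            ackQueue.add(seqno); // queued before the data has been written
        }
        try {
            writeAndSync(failWrite);
        } finally {
            if (placement == Placement.IN_FINALLY) {
                ackQueue.add(seqno); // queued even when the write failed
            }
        }
        if (placement == Placement.AT_END) {
            ackQueue.add(seqno); // reached only after a successful write
        }
    }

    // Placeholder for writing the packet to disk and flushing or syncing it.
    private void writeAndSync(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("simulated disk error");
        }
    }

    // Returns true if a failed write still left an ack queued for the given placement.
    static boolean acksFailedWrite(Placement placement) {
        AckOrderingSketch receiver = new AckOrderingSketch();
        try {
            receiver.receivePacket(1L, placement, true);
        } catch (IOException expected) {
            // The failure propagates to the caller in all three variants.
        }
        return !receiver.ackQueue.isEmpty();
    }

    public static void main(String[] args) {
        for (Placement p : Placement.values()) {
            System.out.println(p + " acks a failed write: " + acksFailedWrite(p));
        }
    }
}
```

In this simulation only the AT_END placement leaves the queue empty after a failed write, which matches the concern raised in the thread that an ack should not be queued for data that was never persisted.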
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464115#comment-13464115 ] Kan Zhang commented on HDFS-3979: - Thanks, Lars! BTW, you spell my name wrong. :-)
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464133#comment-13464133 ] Lars Hofhansl commented on HDFS-3979: - (and sorry for misspelling your name)
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464359#comment-13464359 ] Hadoop QA commented on HDFS-3979: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12546734/hdfs-3979-sketch.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3239//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3239//console This message is automatically generated.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464402#comment-13464402 ] Luke Lu commented on HDFS-3979: --- TestHSync only tests the success code path, which makes me a bit nervous, as I'm not sure if putting the ack enqueue in the finally block is the right thing to do. I think you want the pipeline to fail and restart if there is an io exception.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464410#comment-13464410 ] Lars Hofhansl commented on HDFS-3979: - I'm not sure either. I am trying not to change the existing behavior. The enqueue used to happen in the beginning of receivePacket(...), so if the latter part of the method fails the ack would already be enqueued.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13463515#comment-13463515 ] Lars Hofhansl commented on HDFS-3979: - Also see my comment here: https://issues.apache.org/jira/browse/HDFS-744?focusedCommentId=13279619page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13279619