[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-11-05 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490841#comment-13490841
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-3979:
--

Thanks for the update, Lars.
+1 patch looks good.  I will commit it if there are no more comments.

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt, 
 hdfs-3979-v3.txt, hdfs-3979-v4.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.
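A minimal single-threaded Java sketch of the ordering problem described above. `PacketOrderSketch`, `enqueueAck` and the event names are hypothetical stand-ins, not the real `BlockReceiver` code. The model only records the order in which a datanode writes, syncs (`flushOrSync`) and queues the ack, assuming the pre-fix code queues the ack before the sync.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one datanode handling one packet; records step order only. */
class PacketOrderSketch {
    final List<String> events = new ArrayList<>();
    private final boolean ackAfterSync;

    PacketOrderSketch(boolean ackAfterSync) {
        this.ackAfterSync = ackAfterSync;
    }

    void receivePacket(boolean syncRequested) {
        if (!ackAfterSync) {
            enqueueAck();               // pre-fix ordering (assumed): the client may see this ack early
        }
        events.add("write");            // write packet data (in this model it only reaches OS buffers)
        if (syncRequested) {
            events.add("flushOrSync");  // force the data to disk for hsync
        }
        if (ackAfterSync) {
            enqueueAck();               // fixed ordering: ack only once the data is persisted
        }
    }

    private void enqueueAck() {
        events.add("ack");
    }

    /** True if an ack was queued before the requested sync had happened. */
    boolean ackedBeforeSync() {
        int ack = events.indexOf("ack");
        int sync = events.indexOf("flushOrSync");
        return sync >= 0 && ack < sync;
    }
}
```

With `new PacketOrderSketch(false)`, `receivePacket(true)` records `ack, write, flushOrSync`; with `true` it records `write, flushOrSync, ack`, so a client waiting on the ack can no longer get ahead of the sync.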

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-11-05 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490887#comment-13490887
 ] 

Luke Lu commented on HDFS-3979:
---

bq. I think it will decrease the performance for non-sync write.

It would be nice if we could show/quantify the decrease in performance for non-sync
writes. It may not be wise to introduce complexity and make hflush less robust
if this is a non-issue.

bq. The existing tests: TestFiPipelines and TestFiHFlush do not cover the other 
scenarios you worry about?

It seems that TestFiHFlush doesn't cover the failure scenarios. All the test
cases are positive assertions (the pipeline can recover in spite of disk error
exceptions), which does not seem very useful given that the ack is sent before
the disk error exceptions are triggered. A new TestFiHSync seems necessary,
especially for the new patch, where the ack code path diverges from hflush.
Basically, I want to make sure that hsync is guaranteed to get an error if the
pipeline cannot be recovered (e.g., because required datanodes ran out of disk
space).
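The property asked for here, sketched as a client-side check in Java. `HsyncOutcome.checkHsync` and the `minReplicas` threshold are hypothetical; the real client handles pipeline recovery in its own way. The point is only that a pipeline that cannot deliver enough synced replicas must surface an `IOException` from hsync rather than return normally.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical client-side rule: hsync must fail if too few replicas confirmed a sync. */
final class HsyncOutcome {
    private HsyncOutcome() {}

    /**
     * @param replicaSynced one entry per datanode left in the pipeline; true if it acked after syncing
     * @param minReplicas   number of synced replicas the caller requires (assumed policy)
     * @throws IOException  if the pipeline could not provide enough synced replicas
     */
    static void checkHsync(List<Boolean> replicaSynced, int minReplicas) throws IOException {
        long synced = replicaSynced.stream().filter(Boolean::booleanValue).count();
        if (synced < minReplicas) {
            // Surface the failure instead of returning as if the data were durable.
            throw new IOException("hsync failed: " + synced + " of " + minReplicas
                    + " required replicas synced");
        }
    }
}
```

A TestFiHSync-style test would inject disk errors until fewer than `minReplicas` datanodes can sync and then assert that hsync throws.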

Anyway, I'm fine with filing another jira for these hflush/hsync improvements. 





[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-11-04 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490268#comment-13490268
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-3979:
--

Sure, will check the patch.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-11-04 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490285#comment-13490285
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-3979:
--

The patch moves the ack to the end in order to fix the sync semantics.  I think it 
will decrease the performance for non-sync write.  How about keeping enqueue 
early when syncBlock == false?
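Nicholas's suggestion, sketched in Java. Only `syncBlock` comes from the comment; `ConditionalAckSketch` and the event names are hypothetical. In this model, packets without a sync request keep the early ack, while a packet that requests a sync is acked only after `flushOrSync`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical datanode packet handler with conditional ack placement. */
class ConditionalAckSketch {
    final List<String> events = new ArrayList<>();

    void receivePacket(boolean syncBlock) {
        if (!syncBlock) {
            events.add("ack");          // no sync requested: enqueue the ack early
        }
        events.add("write");            // write the packet to the block file
        if (syncBlock) {
            events.add("flushOrSync");  // sync requested: persist first ...
            events.add("ack");          // ... then enqueue the ack
        }
    }
}
```

This keeps the hflush path unchanged, so any cost of the reordering is paid only by callers that asked for hsync.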



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-11-04 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490435#comment-13490435
 ] 

Kan Zhang commented on HDFS-3979:
-

bq. I think it will decrease the performance for non-sync write. 

I'd welcome some clarity on whether writing to OS buffers is a real concern 
here.

bq. How about keeping enqueue early when syncBlock == false?

To be on the conservative side, I'm OK with this.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-11-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490445#comment-13490445
 ] 

Lars Hofhansl commented on HDFS-3979:
-

I'll make that change.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-11-03 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490069#comment-13490069
 ] 

Kan Zhang commented on HDFS-3979:
-

bq. This little change makes TestHSync fail most of the time - without the rest 
of the patch, and never with this patch.

Lars, I don't quite understand your comment above. What's the behavior of 
TestHSync with and without your latest patch?



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-11-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490113#comment-13490113
 ] 

Lars Hofhansl commented on HDFS-3979:
-

Hi Kan, the only difference between v2 and v3 is that in v3 the fsync metric 
is updated after the actual sync to the FS (BlockReceiver.flushOrSync).

This exposes the race condition we want to fix and makes TestHSync fail almost 
every run (the client returns from hsync before the datanode has updated the 
metric). With the rest of this patch applied, the race is removed and TestHSync 
never fails.

So now we have a test case for the race condition.
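The race described above, reduced to a deterministic two-thread Java model. `FsyncMetricRaceSketch` and its latches are hypothetical; a `CountDownLatch` stands in for a slow fsync so that the pre-fix ordering loses the race every time instead of "almost every run". The metric is bumped after the simulated sync, as in v3.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical two-thread model of the TestHSync race on the fsync metric. */
class FsyncMetricRaceSketch {
    final AtomicLong fsyncCount = new AtomicLong();    // stand-in for the DN fsync metric
    private final CountDownLatch ack = new CountDownLatch(1);
    private final CountDownLatch diskDone;             // the simulated fsync completes when this opens
    private Thread datanode;

    FsyncMetricRaceSketch(CountDownLatch diskDone) {
        this.diskDone = diskDone;
    }

    /** Datanode side: sync, then update the metric; ack before or after depending on ordering. */
    void startDatanode(boolean ackAfterSync) {
        datanode = new Thread(() -> {
            if (!ackAfterSync) {
                ack.countDown();                       // old ordering: ack first
            }
            try {
                diskDone.await();                      // simulated slow fsync
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            fsyncCount.incrementAndGet();              // metric updated after the sync
            if (ackAfterSync) {
                ack.countDown();                       // fixed ordering: ack last
            }
        });
        datanode.start();
    }

    /** Client side: hsync returns when the ack arrives; report the metric seen at that moment. */
    long hsyncThenReadMetric() {
        try {
            ack.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return fsyncCount.get();
    }

    void joinDatanode() {
        try {
            datanode.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Passing a latch with count 1 and releasing it only after `hsyncThenReadMetric()` returns makes the old ordering report 0; with the ack moved after the metric update (latch count 0), the client always observes 1.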

[~vicaya] The existing tests: TestFiPipelines and TestFiHFlush do not cover the 
other scenarios you worry about?




[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-11-03 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490121#comment-13490121
 ] 

Kan Zhang commented on HDFS-3979:
-

+1

Thanks, Lars. Patch looks good to me.

Nicholas, I would appreciate it if you could also take a look. Thx!



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-11-02 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489609#comment-13489609
 ] 

Luke Lu commented on HDFS-3979:
---

The patch lgtm, even though it lacks tests for failure cases for hsync.

bq. This issue is a blocker for HBASE-5954, it would be better resolve asap

You can help by testing the patch and showing some results.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-11-01 Thread liang xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489212#comment-13489212
 ] 

liang xie commented on HDFS-3979:
-

Are there any objections to it, or more comments?
This issue is a blocker for HBASE-5954, it would be better resolve asap:)



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-10-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475546#comment-13475546
 ] 

Hadoop QA commented on HDFS-3979:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12549002/hdfs-3979-v3.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3327//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3327//console

This message is automatically generated.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-10-05 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470823#comment-13470823
 ] 

Kan Zhang commented on HDFS-3979:
-

bq. Why API4 is needed for HBase?

API3 or API4, it probably doesn't make a huge difference, IMHO. On the other 
hand, assuming the performance penalty of going from API3 to API4 is 
negligible, it's probably not worth complicating the code to support API3 
(instead of API4).

bq. Lastly, we can play with this. For example, only one of the replicas could 
sync to disk and the others just guarantee the data is in the OS buffers 
(API4.5 :) ).

Yes, it would be very interesting to see how much we save by syncing only the 
local replica, or by acknowledging the client upon the first successful sync of 
any replica.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-10-04 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469628#comment-13469628
 ] 

Luke Lu commented on HDFS-3979:
---

bq. You don't think the existing pipeline tests cover the failure scenarios?

Given the existing hflush/hsync semantics (the ack can reach the client before 
any pipeline exceptions), I don't think the new semantics are covered by the 
existing tests. I'm worried about the race between the ack and write errors.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-10-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469671#comment-13469671
 ] 

Lars Hofhansl commented on HDFS-3979:
-

I saw that race when I wrote a test for HDFS-744. I fixed it there by 
updating the metrics first... Ugh :)

I think I can make a test that fails at least with reasonable probability with 
the current semantics.

The race between ack and write errors should be reduced (eliminated) with this 
patch.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-10-04 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469712#comment-13469712
 ] 

Kan Zhang commented on HDFS-3979:
-

bq. The race between ack and write errors should be reduced (eliminated) with 
this patch.

It should be eliminated with this patch. When there is write error, ack will 
not be queued.
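Kan's point in code: if the write or sync throws, control never reaches the enqueue, so no ack exists for that packet. A Java sketch with hypothetical names (`AckOnSuccessSketch`, `ackQueue`, the simulated disk error); reporting the error upstream through the pipeline is not modeled.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical datanode handler that queues an ack only after a successful write and sync. */
class AckOnSuccessSketch {
    final Deque<Long> ackQueue = new ArrayDeque<>();   // pending acks, by packet seqno

    void receivePacket(long seqno, boolean simulateDiskError) throws IOException {
        writeAndSync(simulateDiskError);   // an IOException here skips the enqueue below
        ackQueue.addLast(seqno);
    }

    private void writeAndSync(boolean simulateDiskError) throws IOException {
        if (simulateDiskError) {
            throw new IOException("simulated disk error");
        }
        // a real implementation would write the packet and call flushOrSync here
    }
}
```

Because the enqueue sits after the call that can fail, a failing packet leaves no ack behind, which is exactly what a regression test should pin down for future refactoring.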



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-10-04 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469725#comment-13469725
 ] 

Luke Lu commented on HDFS-3979:
---

bq. It should be eliminated with this patch. When there is write error, ack 
will not be queued.

I think so too, but it'll be nice to have a test to cover the case for future 
maintenance/refactor.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-10-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469827#comment-13469827
 ] 

Lars Hofhansl commented on HDFS-3979:
-

Thanks Luke and Kan. I'll come up with a test once I get some spare cycles 
(quite busy with HBase atm).



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-10-04 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469952#comment-13469952
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-3979:
--

{quote}
For applications like HBase we'd like API4 as well as API5.
(API4 allows a hypothetical kill -9 of all DNs without loss of acknowledged 
data, API5 allows HW failures of all data nodes - i.e. a DC outage - without loss 
of acknowledged data)
{quote}
Why API4 is needed for HBase?

As everyone knows, there are usually 3 replicas in HDFS.  If only one of the 
datanodes is killed, the data is still available in the other two datanodes.  
That's why we invented hflush (i.e. API3) in HDFS-265.
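A compact way to compare the levels quoted above, as a Java sketch. The `DurabilityLevel` and `PipelineFailure` enums are hypothetical and cover only API3 to API5 as characterized in this thread (API3 = hflush, data received by every datanode; API4 = data in every datanode's OS buffers; API5 = data on disk). "Some datanodes killed" assumes at least one replica survives.

```java
/** Failure scenarios discussed in the thread (hypothetical labels). */
enum PipelineFailure {
    SOME_DATANODES_KILLED,     // e.g. one of three replicas' datanodes dies
    ALL_DATANODES_KILLED,      // e.g. kill -9 of every datanode process
    ALL_DATANODES_LOSE_POWER   // e.g. hardware failure of every datanode / DC outage
}

/** Durability levels as characterized in this thread (a sketch, not an HDFS API). */
enum DurabilityLevel {
    API3,   // hflush: every datanode in the pipeline has received the data
    API4,   // data is in the OS buffers of every datanode
    API5;   // hsync: data is on disk on every datanode

    /** Whether acknowledged data survives the given failure at this level. */
    boolean survives(PipelineFailure failure) {
        switch (failure) {
            case SOME_DATANODES_KILLED:
                return true;            // a surviving replica still holds the data
            case ALL_DATANODES_KILLED:
                return this != API3;    // needs the data outside the datanode processes
            case ALL_DATANODES_LOSE_POWER:
                return this == API5;    // needs the data on disk
            default:
                throw new AssertionError(failure);
        }
    }
}
```

Read this way, the question is whether HBase needs `survives(ALL_DATANODES_KILLED)`, which API3 alone does not provide.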



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-10-04 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469978#comment-13469978
 ] 

Luke Lu commented on HDFS-3979:
---

bq. Why API4 is needed for HBase?

Many configuration management systems (simplest: pdsh -a hadoop-daemon.sh stop 
datanode) shut down/restart HDFS by running kill -9 on datanodes in parallel. 
Having to quiesce any OLTP-like workload is error prone. Or consider a simple ops 
error: pdsh -a killall -9 java typed into the wrong window (hence the wrong 
cluster). IMO, API4 is not robust enough for HBase. Unless the performance 
difference is huge ( 20% for hflush), which I doubt, it's not worth the risk, 
again IMO.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-10-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470016#comment-13470016
 ] 

Lars Hofhansl commented on HDFS-3979:
-

API4 is hflush (with change in OS buffers).

That's an interesting discussion by itself. hsync'ing every edit in HBase is 
prohibitively expensive.
I have some simple numbers in HBASE-5954.
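For anyone wanting a rough local feel for the sync cost, independent of HDFS: a Java micro-benchmark that times small writes with and without `FileChannel.force(false)` after each write. It only measures the local disk and OS, not the datanode pipeline; `SyncCostProbe` and its parameters are illustrative choices, and a single run says little without warm-up and repetition.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Rough local probe: small writes with and without forcing each one to the device. */
final class SyncCostProbe {
    private SyncCostProbe() {}

    /**
     * Writes {@code records} zero-filled records of {@code size} bytes to {@code file}
     * (truncating it first) and returns the elapsed wall-clock time in nanoseconds.
     */
    static long timeWrites(Path file, int records, int size, boolean forceEachWrite)
            throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(size);
        long start = System.nanoTime();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int i = 0; i < records; i++) {
                buf.clear();
                while (buf.hasRemaining()) {
                    ch.write(buf);           // lands in OS buffers
                }
                if (forceEachWrite) {
                    ch.force(false);         // ask the OS to push file content to the device
                }
            }
        }
        return System.nanoTime() - start;
    }
}
```

Comparing `timeWrites(f, 1000, 512, false)` with `timeWrites(f, 1000, 512, true)` on the same file gives a local ratio between OS-buffered writes and per-write syncs.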

That said, I need to redo that test with the sync_file_range changes in 
HDFS-2465 (which would hopefully do most of the data sync'ing asynchronously and 
only sync the last changes and metadata synchronously upon client request).

Many applications do not need every edit to be guaranteed on disk, but have 
sync points. That is what I am aiming for in HBase. The application will know 
the specific semantics.
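What per-edit hflush plus application-defined sync points could look like, sketched in Java. `FlushSyncStream` is a hypothetical local interface whose method names mirror `hflush()`/`hsync()` on Hadoop's `Syncable`, so the example compiles without Hadoop; `SyncPointWriter` is illustrative, not HBase's WAL code.

```java
import java.io.IOException;

/** Hypothetical local interface; method names mirror Hadoop's Syncable. */
interface FlushSyncStream {
    void write(byte[] data) throws IOException;
    void hflush() throws IOException;
    void hsync() throws IOException;
}

/** Illustrative log writer: hflush every edit, hsync only at application sync points. */
class SyncPointWriter {
    private final FlushSyncStream out;
    private int unsynced = 0;   // edits flushed to the pipeline but not yet synced

    SyncPointWriter(FlushSyncStream out) {
        this.out = out;
    }

    void append(byte[] edit) throws IOException {
        out.write(edit);
        out.hflush();           // cheaper: replicas have it, not necessarily on disk
        unsynced++;
    }

    /** Called by the application wherever it needs everything so far on disk. */
    void syncPoint() throws IOException {
        if (unsynced > 0) {
            out.hsync();        // expensive: forces the data to disk on the replicas
            unsynced = 0;
        }
    }

    int unsyncedEdits() {
        return unsynced;
    }
}
```

The application, not the filesystem, decides where `syncPoint()` goes, which matches the idea that the application knows its own durability semantics.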

What is really important for HBase (IMHO) is that every block is synced to disk 
when it is closed. HBase constantly rewrites existing data via compactions, so 
without syncing, arbitrarily old data can be lost during a rack or DC outage.
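The close-time requirement as a small Java decision sketch. `SyncDecision.mustSync` and the `syncOnClose` switch are hypothetical names for illustration: a packet is forced to disk if the client asked for hsync, or if it is the last packet of the block and sync-on-close is enabled.

```java
/** Hypothetical datanode-side rule for when a packet must be forced to disk. */
final class SyncDecision {
    private SyncDecision() {}

    /**
     * @param clientRequestedSync the packet carries an hsync request from the client
     * @param lastPacketInBlock   the packet closes the block
     * @param syncOnClose         assumed policy switch: always sync blocks when they close
     */
    static boolean mustSync(boolean clientRequestedSync, boolean lastPacketInBlock,
                            boolean syncOnClose) {
        // Closing a block is a natural sync point: files rewritten by compactions
        // would otherwise sit only in OS buffers until the OS writes them back.
        return clientRequestedSync || (lastPacketInBlock && syncOnClose);
    }
}
```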

Lastly, we can play with this. For example, only one of the replicas could sync 
to disk and the others just guarantee the data is in the OS buffers (API4.5 :) ).
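The "API4.5" idea, plus a looser variant that acks on the first successful sync of any replica, as a Java sketch of ack rules over per-replica states. `ReplicaState` and `AckPolicy` are hypothetical; none of these policies is an existing HDFS setting.

```java
import java.util.List;

/** How far a replica has progressed for a given packet (hypothetical). */
enum ReplicaState { RECEIVED, IN_OS_BUFFERS, ON_DISK }

/** Hypothetical ack rules; none of these is an existing HDFS setting. */
enum AckPolicy {
    ALL_ON_DISK,                 // API5: every replica has synced
    ONE_ON_DISK_REST_BUFFERED,   // "API4.5": one replica synced, all at least in OS buffers
    ANY_ON_DISK;                 // ack as soon as any one replica has synced

    boolean canAck(List<ReplicaState> replicas) {
        if (replicas.isEmpty()) {
            return false;
        }
        long onDisk = replicas.stream().filter(s -> s == ReplicaState.ON_DISK).count();
        boolean allBuffered = replicas.stream().noneMatch(s -> s == ReplicaState.RECEIVED);
        switch (this) {
            case ALL_ON_DISK:
                return onDisk == replicas.size();
            case ONE_ON_DISK_REST_BUFFERED:
                return onDisk >= 1 && allBuffered;
            case ANY_ON_DISK:
                return onDisk >= 1;
            default:
                throw new AssertionError(this);
        }
    }
}
```

For replicas in states `ON_DISK, IN_OS_BUFFERS, IN_OS_BUFFERS`, only `ALL_ON_DISK` refuses to ack; if one replica is merely `RECEIVED`, "API4.5" refuses as well.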




[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-10-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467882#comment-13467882
 ] 

Lars Hofhansl commented on HDFS-3979:
-

You don't think the existing pipeline tests cover the failure scenarios? 
I'll see if I can get some performance numbers.

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-10-01 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13467270#comment-13467270
 ] 

Lars Hofhansl commented on HDFS-3979:
-

Do we want this change?
Seems to me that HDFS-265 broke hsync/hflush and this would fix it.


 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-10-01 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13467292#comment-13467292
 ] 

Luke Lu commented on HDFS-3979:
---

bq. Do we want the change?

I do think that the change is required for the correct hsync semantics (and 
better hflush guarantee). I'm not too sure if the change is complete without 
some reasonable test cases for failure scenarios.

BTW, do you have any new performance numbers for comparison as well?

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-28 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13465915#comment-13465915
 ] 

Lars Hofhansl commented on HDFS-3979:
-

Enqueuing the seqno at the end seems like the best approach. (Indeed, this is what 
the 0.20.x code does, as both of you said.)
I wonder why this was changed? Will have a new patch momentarily.
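A simplified, single-threaded model shows the proposed ordering. The seqno goes to the responder only after the packet has been written, flushed and, if requested, synced. `PacketSink`, `PacketReceiver` and the `LongConsumer` that stands in for the ack queue are hypothetical. The real BlockReceiver also mirrors the packet downstream, and its responder waits for downstream acks. This model omits both.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical stand-in for the block and checksum files a DataNode writes.
interface PacketSink {
    void write(byte[] data) throws IOException;
    void flush() throws IOException; // hand data to the OS buffers
    void sync() throws IOException;  // force data to disk
}

// Simplified receive path that hands the seqno to the responder last.
final class PacketReceiver {
    private final PacketSink sink;
    private final LongConsumer enqueueAck; // stands in for the responder's ack queue

    PacketReceiver(PacketSink sink, LongConsumer enqueueAck) {
        this.sink = sink;
        this.enqueueAck = enqueueAck;
    }

    void receivePacket(long seqno, byte[] data, boolean syncRequested) throws IOException {
        sink.write(data);
        sink.flush();
        if (syncRequested) {
            sink.sync();
        }
        // Reached only if every step above succeeded, so an ack implies this
        // node achieved the requested durability for the packet.
        enqueueAck.accept(seqno);
    }
}

// In-memory sink that records the order of operations.
final class RecordingSink implements PacketSink {
    final List<String> events = new ArrayList<>();
    @Override public void write(byte[] data) { events.add("write"); }
    @Override public void flush() { events.add("flush"); }
    @Override public void sync() { events.add("sync"); }
}
```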


 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13465987#comment-13465987
 ] 

Hadoop QA commented on HDFS-3979:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12547049/hdfs-3979-v2.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3247//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3247//console

This message is automatically generated.

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-28 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13466082#comment-13466082
 ] 

Kan Zhang commented on HDFS-3979:
-

bq. I wonder why this was changed?

My guess is HDFS-265 intends to implement API3 rather than API4. 
https://issues.apache.org/jira/browse/HDFS-265?focusedCommentId=12710542page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12710542

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-28 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13466113#comment-13466113
 ] 

Lars Hofhansl commented on HDFS-3979:
-

I see, thanks Kan. So now we have API4 and (with HDFS-744) API5.

For applications like HBase we'd like API4 as well as API5.
(API4 survives a hypothetical kill -9 of all DNs without loss of acknowledged 
data; API5 survives hardware failure of all DataNodes, i.e. a DC outage, without 
loss of acknowledged data.)
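The difference between the two guarantees can be written down as a small table of which failures acknowledged data survives. The sketch assumes that API4 means the packet is in the OS buffers of every DataNode and that API5 means it has been fsync'ed on every DataNode. The enum names are illustrative, and disk-level failures are out of scope.

```java
// Where an acknowledged packet is guaranteed to reside on every DataNode (assumed mapping).
enum AckGuarantee {
    API4_OS_BUFFERS, // written to the OS page cache on each replica
    API5_ON_DISK     // fsync'ed to stable storage on each replica
}

// Failures that hit all DataNodes in the pipeline at once.
enum ClusterFailure {
    KILL_9_ALL_DATANODES, // processes die, the OS and its page cache keep running
    POWER_LOSS_ALL_NODES  // e.g. a DC outage: page cache contents are lost
}

final class DurabilityModel {
    // True if acknowledged data survives the failure under the given guarantee.
    static boolean survives(AckGuarantee guarantee, ClusterFailure failure) {
        switch (failure) {
            case KILL_9_ALL_DATANODES:
                return true; // both OS buffers and disk outlive the process
            case POWER_LOSS_ALL_NODES:
                return guarantee == AckGuarantee.API5_ON_DISK; // only synced data is on disk
            default:
                throw new IllegalArgumentException("unknown failure: " + failure);
        }
    }
}
```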


 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-27 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13465331#comment-13465331
 ] 

Kan Zhang commented on HDFS-3979:
-

I agree it's probably not a good idea to enqueue in a finally block. The 
original code before HDFS-265 didn't do that either.

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-27 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13465386#comment-13465386
 ] 

Lars Hofhansl commented on HDFS-3979:
-

Should we simply do the enqueue at the end of receivePacket(), then?

So, just to make sure: in the current code the seqno is already enqueued at the 
beginning, so an exception later in the method has no effect on the enqueued 
seqno. The finally block just preserves this existing behavior.

What happens when there is an exception and the seqno is never enqueued? (And 
if that is OK, why is it not a problem now?)
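If a seqno is never enqueued, the client never gets an ack for that packet. In the HDFS client of this era, `DFSOutputStream` keeps sent but unacknowledged packets in an ack queue. During pipeline recovery it moves them back to the front of its data queue, so they are resent. The sketch below is a heavily simplified, hypothetical model of that behavior. `PipelineClient` and its methods are illustrative and are not the client code itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of a writer's data and ack queues.
final class PipelineClient {
    private final Deque<Long> dataQueue = new ArrayDeque<>(); // waiting to be sent
    private final Deque<Long> ackQueue = new ArrayDeque<>();  // sent, not yet acknowledged
    final List<Long> acknowledged = new ArrayList<>();        // client considers these done

    void enqueue(long seqno) { dataQueue.addLast(seqno); }

    // Sends the next packet; it stays in the ack queue until acknowledged.
    Long send() {
        Long seqno = dataQueue.pollFirst();
        if (seqno != null) {
            ackQueue.addLast(seqno);
        }
        return seqno;
    }

    // Acks arrive in order, so an ack must match the head of the ack queue.
    void onAck(long seqno) {
        Long head = ackQueue.pollFirst();
        if (head == null || head != seqno) {
            throw new IllegalStateException("unexpected ack " + seqno);
        }
        acknowledged.add(seqno);
    }

    // Pipeline error: unacknowledged packets go back to the front of the data queue.
    void onPipelineFailure() {
        while (!ackQueue.isEmpty()) {
            dataQueue.addFirst(ackQueue.pollLast()); // preserves the original order
        }
    }

    int pending() { return dataQueue.size() + ackQueue.size(); }
}
```

In this model a missing ack does not lose data, because the packet stays pending until a new pipeline acknowledges it. An ack for a packet that was never persisted, by contrast, removes that packet from the client's queues for good.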


 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-26 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464115#comment-13464115
 ] 

Kan Zhang commented on HDFS-3979:
-

Thanks, Lars! BTW, you spelled my name wrong. :-)

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl

 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a DN 
 loses data that it has already acknowledged as persisted to a client.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-26 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464133#comment-13464133
 ] 

Lars Hofhansl commented on HDFS-3979:
-

(and sorry for misspelling your name)

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a DN 
 loses data that it has already acknowledged as persisted to a client.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464359#comment-13464359
 ] 

Hadoop QA commented on HDFS-3979:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12546734/hdfs-3979-sketch.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3239//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3239//console

This message is automatically generated.

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-26 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464402#comment-13464402
 ] 

Luke Lu commented on HDFS-3979:
---

TestHSync only tests the success code path, which makes me a bit nervous, as 
I'm not sure if putting the ack enqueue in the finally block is the right thing 
to do. I think you want the pipeline to fail and restart if there is an 
IOException.

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-26 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464410#comment-13464410
 ] 

Lars Hofhansl commented on HDFS-3979:
-

I'm not sure either. I am trying not to change the existing behavior.

The enqueue used to happen at the beginning of receivePacket(...), so if the 
latter part of the method fails, the ack is already enqueued.
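The concern about the `finally` block can be made concrete by injecting a sync failure and checking which placements of the enqueue still acknowledge the packet. `EnqueuePlacement`, `FaultyReceiver` and the injected `IOException` are hypothetical. BEGINNING models the older behavior described here, FINALLY the approach discussed for the sketch patch, and END the enqueue-at-the-end approach proposed later in the thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Where the ack seqno is enqueued relative to the write/sync step.
enum EnqueuePlacement { BEGINNING, FINALLY, END }

// Hypothetical receive path with an injectable sync failure.
final class FaultyReceiver {
    final List<Long> acks = new ArrayList<>();
    private final EnqueuePlacement placement;
    private final boolean failSync;

    FaultyReceiver(EnqueuePlacement placement, boolean failSync) {
        this.placement = placement;
        this.failSync = failSync;
    }

    // Stands in for writing, flushing and syncing the packet.
    private void persist() throws IOException {
        if (failSync) {
            throw new IOException("injected sync failure");
        }
    }

    void receivePacket(long seqno) throws IOException {
        if (placement == EnqueuePlacement.BEGINNING) {
            acks.add(seqno); // queued before the data is durable
        }
        try {
            persist();
            if (placement == EnqueuePlacement.END) {
                acks.add(seqno); // reached only if persist() succeeded
            }
        } finally {
            if (placement == EnqueuePlacement.FINALLY) {
                acks.add(seqno); // runs even when persist() throws
            }
        }
    }

    // True if the packet ends up acknowledged although receivePacket failed.
    static boolean ackedDespiteFailure(EnqueuePlacement placement) {
        FaultyReceiver r = new FaultyReceiver(placement, true);
        try {
            r.receivePacket(1L);
            return false; // not reached: persist() always fails here
        } catch (IOException expected) {
            return r.acks.contains(1L);
        }
    }
}
```

In a real DataNode the IOException also tears down the pipeline. Whether an ack queued by BEGINNING or FINALLY actually reaches the client therefore depends on timing. The model only shows which placements make that possible.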


 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.



[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-25 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13463515#comment-13463515
 ] 

Lars Hofhansl commented on HDFS-3979:
-

Also see my comment here: 
https://issues.apache.org/jira/browse/HDFS-744?focusedCommentId=13279619page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13279619

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Hofhansl

 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a DN 
 loses data that it has already acknowledged as persisted to a client.
