Problem with HDFS

2009-12-07 Thread Zuhair Khayyat
Dear sir,

I am running some experiments on Hadoop's MapReduce. I have a problem with 
HDFS: I cannot copy any files to it! The error details are attached. I have 
tried to reformat HDFS and to run the DFS filesystem checking utility, but the 
error still exists. Would you please help me? Thank you

Regards,
Zuhair Khayyat
/usr/bin/hadoop-0.20.1/bin/hadoop dfs -copyFromLocal /home/khayyzy/Data/Data1/ input1
09/12/07 23:00:13 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/root/input1/Data1/lineitem1.tbl could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1267)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

        at org.apache.hadoop.ipc.Client.call(Client.java:739)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy0.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy0.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2904)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2786)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)

09/12/07 23:00:13 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
09/12/07 23:00:13 WARN hdfs.DFSClient: Could not get block locations. Source file /user/root/input1/Data1/lineitem1.tbl - Aborting...
copyFromLocal: java.io.IOException: File /user/root/input1/Data1/lineitem1.tbl could only be replicated to 0 nodes, instead of 1



[jira] Commented: (HDFS-804) New unit tests for concurrent lease recovery

2009-12-07 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786990#action_12786990
 ] 

Konstantin Boudnik commented on HDFS-804:
-

Thanks for the review, Hairong. It shouldn't be difficult to add two more test 
cases. However, I'd expect that in the case of 0 blocks {{FSNamesystem}} will fail 
at line 1934
{noformat}
  curBlock = blocks[nrCompleteBlocks];
{noformat}
with an {{ArrayIndexOutOfBoundsException}}. Do you think we need this test until 
the code is fixed?

 New unit tests for concurrent lease recovery
 

 Key: HDFS-804
 URL: https://issues.apache.org/jira/browse/HDFS-804
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 0.21.0, 0.22.0
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
 Attachments: HDFS-804.patch, HDFS-804.patch


 The {{FSNamesystem}} code that processes concurrent lease recovery isn't tested. 
 We need new test cases to cover these code paths.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-804) New unit tests for concurrent lease recovery

2009-12-07 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787015#action_12787015
 ] 

Hairong Kuang commented on HDFS-804:


bq. Do you think we need this test until the code is fixed?
Yes, please file a JIRA on this. Thanks a lot for addressing my new request.

 New unit tests for concurrent lease recovery
 

 Key: HDFS-804
 URL: https://issues.apache.org/jira/browse/HDFS-804
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 0.21.0, 0.22.0
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
 Attachments: HDFS-804.patch, HDFS-804.patch


 The {{FSNamesystem}} code that processes concurrent lease recovery isn't tested. 
 We need new test cases to cover these code paths.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-101) DFS write pipeline : DFSClient sometimes does not detect second datanode failure

2009-12-07 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787019#action_12787019
 ] 

Hairong Kuang commented on HDFS-101:


bq. Is this the same as HDFS-795?
Yes, the only difference is that HDFS-795 describes the problem in a more 
general way.

 DFS write pipeline : DFSClient sometimes does not detect second datanode 
 failure 
 -

 Key: HDFS-101
 URL: https://issues.apache.org/jira/browse/HDFS-101
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Raghu Angadi
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.21.0


 When the first datanode's write to the second datanode fails or times out, 
 DFSClient ends up marking the first datanode as the bad one and removes it from 
 the pipeline. A similar problem exists on the DataNode as well; it is fixed in 
 HADOOP-3339. From HADOOP-3339: 
 The main issue is that the BlockReceiver thread (and DataStreamer in the case of 
 DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty 
 coarse control. We don't know what state the responder is in, and interrupting 
 has different effects depending on the responder's state. To fix this properly we 
 need to redesign how we handle these interactions.
 When the first datanode closes its socket to DFSClient, DFSClient should 
 properly read all the data left in the socket. Also, the DataNode's closing of 
 the socket should not result in a TCP reset; otherwise I think DFSClient will 
 not be able to read from the socket.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-812) FSNamesystem#internalReleaseLease will throw ArrayIndexOutOfBoundException on an empty file's lease recovery

2009-12-07 Thread Konstantin Boudnik (JIRA)
FSNamesystem#internalReleaseLease will throw ArrayIndexOutOfBoundException on 
an empty file's lease recovery


 Key: HDFS-812
 URL: https://issues.apache.org/jira/browse/HDFS-812
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0, 0.22.0
Reporter: Konstantin Boudnik


{{FSNamesystem.internalReleaseLease()}} uses the result of the 
{{iFile#numBlocks()}} call to get the number of blocks of an under-construction 
file. {{numBlocks()}} can return 0 if the file doesn't have any blocks yet. 
This will cause {{internalReleaseLease()}} to throw an 
{{ArrayIndexOutOfBoundsException}}.
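
A minimal, self-contained sketch of the failure mode and the kind of guard the report implies. This is illustrative only; the {{Block}} class, {{nrCompleteBlocks}} counter, and method names below are stand-ins, not the actual {{FSNamesystem}} code.
{code}
// Illustrative sketch only -- not FSNamesystem code. Shows why indexing
// blocks[nrCompleteBlocks] fails for a file that has no blocks yet, and the
// guard that would let lease recovery simply close such a file.
public class EmptyFileLeaseRecoverySketch {

  static class Block {
    final long id;
    Block(long id) { this.id = id; }
    public String toString() { return "blk_" + id; }
  }

  // Naive version: mirrors "curBlock = blocks[nrCompleteBlocks];" and throws
  // ArrayIndexOutOfBoundsException when the block array is empty.
  static Block lastBlockUnsafe(Block[] blocks, int nrCompleteBlocks) {
    return blocks[nrCompleteBlocks];
  }

  // Guarded version: an empty block list means there is nothing to recover,
  // so the caller can release the lease and close the file.
  static Block lastBlockGuarded(Block[] blocks, int nrCompleteBlocks) {
    if (blocks.length == 0) {
      return null; // nothing to recover: remove the lease, close the file
    }
    return blocks[nrCompleteBlocks];
  }

  public static void main(String[] args) {
    Block[] empty = new Block[0];
    System.out.println("guarded result: " + lastBlockGuarded(empty, 0)); // null
    try {
      lastBlockUnsafe(empty, 0);
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("unsafe access fails as described: " + e);
    }
  }
}
{code}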

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-101) DFS write pipeline : DFSClient sometimes does not detect second datanode failure

2009-12-07 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787030#action_12787030
 ] 

Hairong Kuang commented on HDFS-101:


Assume that there is a pipeline consisting of DN0, ..., DNi, ..., where DN0 is 
the closest to the client. Here is the plan for handling errors detected by DNi:
1. If the error occurs when communicating with DNi+1, send an ack indicating 
that DNi+1 failed and then shut down both the block receiver and the ack responder.
2. If the error is caused by DNi itself, simply shut down both the block receiver 
and the ack responder. Shutting down the block receiver causes the connection to 
DNi-1 to be closed, so DNi-1 will detect immediately that DNi has failed.
3. If the error is caused by DNi-1, handle it the same way as 2.

Errors may be detected by either the block receiver or the ack responder. No 
matter which of the two detects the error, it needs to notify the other so that 
the other can stop and shut itself down.
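
The three cases above reduce to a small decision rule. Here is a hedged, self-contained sketch of that rule; the enum, the method, and the action strings are illustrative names, not Hadoop code.
{code}
// Hypothetical sketch of the plan above, not DataNode code: it only encodes
// which action DNi takes depending on where the error was detected.
public class PipelineErrorPlanSketch {

  enum ErrorSource { DOWNSTREAM /* DNi+1 */, LOCAL /* DNi */, UPSTREAM /* DNi-1 */ }

  /** Returns the action DNi should take for an error from the given source. */
  static String handleError(ErrorSource source) {
    switch (source) {
      case DOWNSTREAM:
        // Case 1: tell the upstream side that DN(i+1) failed, then stop.
        return "send ack marking DN(i+1) failed; shut down block receiver and ack responder";
      case LOCAL:
      case UPSTREAM:
        // Cases 2 and 3: just shut down; closing the connection lets DN(i-1)
        // detect the failure of DNi immediately.
        return "shut down block receiver and ack responder";
      default:
        throw new IllegalArgumentException("unknown source: " + source);
    }
  }

  public static void main(String[] args) {
    for (ErrorSource s : ErrorSource.values()) {
      System.out.println(s + " -> " + handleError(s));
    }
  }
}
{code}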

 DFS write pipeline : DFSClient sometimes does not detect second datanode 
 failure 
 -

 Key: HDFS-101
 URL: https://issues.apache.org/jira/browse/HDFS-101
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Raghu Angadi
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.21.0


 When the first datanode's write to the second datanode fails or times out, 
 DFSClient ends up marking the first datanode as the bad one and removes it from 
 the pipeline. A similar problem exists on the DataNode as well; it is fixed in 
 HADOOP-3339. From HADOOP-3339: 
 The main issue is that the BlockReceiver thread (and DataStreamer in the case of 
 DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty 
 coarse control. We don't know what state the responder is in, and interrupting 
 has different effects depending on the responder's state. To fix this properly we 
 need to redesign how we handle these interactions.
 When the first datanode closes its socket to DFSClient, DFSClient should 
 properly read all the data left in the socket. Also, the DataNode's closing of 
 the socket should not result in a TCP reset; otherwise I think DFSClient will 
 not be able to read from the socket.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Problem with HDFS

2009-12-07 Thread Hairong Kuang
Most likely none of the data nodes is up.

Hairong
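
One quick way to confirm this diagnosis is to ask the namenode how many datanodes have registered, either with {{bin/hadoop dfsadmin -report}} or programmatically. Below is a minimal sketch against the 0.20-era HDFS client API as I recall it ({{DistributedFileSystem#getDataNodeStats()}}); treat it as a sketch under those assumptions, not a verified recipe, and point the configuration at your own cluster.
{code}
// Minimal sketch: list the datanodes the namenode currently knows about.
// Assumes the 0.20-era client API and a Configuration that resolves to HDFS.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class LiveDatanodeCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);
    if (!(fs instanceof DistributedFileSystem)) {
      System.err.println("Default filesystem is not HDFS: " + fs.getUri());
      return;
    }
    DatanodeInfo[] nodes = ((DistributedFileSystem) fs).getDataNodeStats();
    System.out.println("Datanodes reported by the namenode: " + nodes.length);
    for (DatanodeInfo dn : nodes) {
      System.out.println("  " + dn.getName() + " (" + dn.getRemaining() + " bytes free)");
    }
    // If this prints 0 nodes, the "replicated to 0 nodes" error is expected:
    // check the datanode logs and dfs.data.dir before retrying the copy.
  }
}
{code}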

2009/12/7 Zuhair Khayyat zuhair.khay...@kaust.edu.sa

  Dear sir,

 I am running some experiments on Hadoop's MapReduce. I have a problem with
 HDFS: I cannot copy any files to it! The error details are
 attached. I have tried to reformat HDFS and to run the DFS
 filesystem checking utility, but the error still exists. Would you please
 help me? Thank you

 Regards,
 Zuhair Khayyat




[jira] Updated: (HDFS-101) DFS write pipeline : DFSClient sometimes does not detect second datanode failure

2009-12-07 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-101:
-

Affects Version/s: 0.20.1

That plan sounds entirely reasonable. I'm adding 0.20.1 as an affects version 
since I can reproduce this in that version. Feel free to let me know if there's 
anything I can do to help.

 DFS write pipeline : DFSClient sometimes does not detect second datanode 
 failure 
 -

 Key: HDFS-101
 URL: https://issues.apache.org/jira/browse/HDFS-101
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Raghu Angadi
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.21.0


 When the first datanode's write to the second datanode fails or times out, 
 DFSClient ends up marking the first datanode as the bad one and removes it from 
 the pipeline. A similar problem exists on the DataNode as well; it is fixed in 
 HADOOP-3339. From HADOOP-3339: 
 The main issue is that the BlockReceiver thread (and DataStreamer in the case of 
 DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty 
 coarse control. We don't know what state the responder is in, and interrupting 
 has different effects depending on the responder's state. To fix this properly we 
 need to redesign how we handle these interactions.
 When the first datanode closes its socket to DFSClient, DFSClient should 
 properly read all the data left in the socket. Also, the DataNode's closing of 
 the socket should not result in a TCP reset; otherwise I think DFSClient will 
 not be able to read from the socket.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-192) TestBackupNode sometimes fails

2009-12-07 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-192:
-

Status: Patch Available  (was: Open)

 TestBackupNode sometimes fails
 --

 Key: HDFS-192
 URL: https://issues.apache.org/jira/browse/HDFS-192
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Konstantin Shvachko
 Fix For: 0.21.0

 Attachments: HADOOP-5573.patch, NN-EditsBug.patch, NN-EditsBug.patch, 
 NN-EditsBug.patch, TestBNFailure.log


 TestBackupNode may fail for different reasons:
 - Unable to open edit log file 
 .\build\test\data\dfs\name-backup1\current\edits (FSEditLog.java:open(371))
 - NullPointerException at 
 org.apache.hadoop.hdfs.server.namenode.EditLogBackupOutputStream.flushAndSync(EditLogBackupOutputStream.java:163)
 - Fatal Error : All storage directories are inaccessible.
 Will provide more information in the comments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-192) TestBackupNode sometimes fails

2009-12-07 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-192:
-

Status: Open  (was: Patch Available)

 TestBackupNode sometimes fails
 --

 Key: HDFS-192
 URL: https://issues.apache.org/jira/browse/HDFS-192
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Konstantin Shvachko
 Fix For: 0.21.0

 Attachments: HADOOP-5573.patch, NN-EditsBug.patch, NN-EditsBug.patch, 
 NN-EditsBug.patch, TestBNFailure.log


 TestBackupNode may fail for different reasons:
 - Unable to open edit log file 
 .\build\test\data\dfs\name-backup1\current\edits (FSEditLog.java:open(371))
 - NullPointerException at 
 org.apache.hadoop.hdfs.server.namenode.EditLogBackupOutputStream.flushAndSync(EditLogBackupOutputStream.java:163)
 - Fatal Error : All storage directories are inaccessible.
 Will provide more information in the comments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-192) TestBackupNode sometimes fails

2009-12-07 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-192:
-

Attachment: NN-EditsBug.patch

This fixes the Findbugs warning.

 TestBackupNode sometimes fails
 --

 Key: HDFS-192
 URL: https://issues.apache.org/jira/browse/HDFS-192
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Konstantin Shvachko
 Fix For: 0.21.0

 Attachments: HADOOP-5573.patch, NN-EditsBug.patch, NN-EditsBug.patch, 
 NN-EditsBug.patch, TestBNFailure.log


 TestBackupNode may fail for different reasons:
 - Unable to open edit log file 
 .\build\test\data\dfs\name-backup1\current\edits (FSEditLog.java:open(371))
 - NullPointerException at 
 org.apache.hadoop.hdfs.server.namenode.EditLogBackupOutputStream.flushAndSync(EditLogBackupOutputStream.java:163)
 - Fatal Error : All storage directories are inaccessible.
 Will provide more information in the comments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-804) New unit tests for concurrent lease recovery

2009-12-07 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-804:


Attachment: HDFS-804.patch

This new version has all the new test cases except the one affected by HDFS-812. 
I'll create a separate patch for that JIRA.

Thank you, Hairong. I'm going to commit this as soon as the formal +1 is 
received :-)

 New unit tests for concurrent lease recovery
 

 Key: HDFS-804
 URL: https://issues.apache.org/jira/browse/HDFS-804
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 0.21.0, 0.22.0
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
 Attachments: HDFS-804.patch, HDFS-804.patch, HDFS-804.patch, 
 HDFS-804.patch


 The {{FSNamesystem}} code that processes concurrent lease recovery isn't tested. 
 We need new test cases to cover these code paths.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-804) New unit tests for concurrent lease recovery

2009-12-07 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787044#action_12787044
 ] 

Hairong Kuang commented on HDFS-804:


+1

 New unit tests for concurrent lease recovery
 

 Key: HDFS-804
 URL: https://issues.apache.org/jira/browse/HDFS-804
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 0.21.0, 0.22.0
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
 Attachments: HDFS-804.patch, HDFS-804.patch, HDFS-804.patch, 
 HDFS-804.patch


 The {{FSNamesystem}} code that processes concurrent lease recovery isn't tested. 
 We need new test cases to cover these code paths.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-812) FSNamesystem#internalReleaseLease will throw ArrayIndexOutOfBoundException on an empty file's lease recovery

2009-12-07 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-812:


Attachment: HDFS-812.patch

I believe this version of the test better reflects the expected behavior of 
FSNamesystem in the zero-block case: the lease needs to be removed and the file 
closed.

 FSNamesystem#internalReleaseLease will throw ArrayIndexOutOfBoundException on 
 an empty file's lease recovery
 

 Key: HDFS-812
 URL: https://issues.apache.org/jira/browse/HDFS-812
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0, 0.22.0
Reporter: Konstantin Boudnik
 Attachments: HDFS-812.patch, HDFS-812.patch


 {{FSNamesystem.internalReleaseLease()}} uses the result of the 
 {{iFile#numBlocks()}} call to get the number of blocks of an under-construction 
 file. {{numBlocks()}} can return 0 if the file doesn't have any blocks yet. 
 This will cause {{internalReleaseLease()}} to throw an 
 {{ArrayIndexOutOfBoundsException}}.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-804) New unit tests for concurrent lease recovery

2009-12-07 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-804:


   Resolution: Fixed
Fix Version/s: 0.22.0
   0.21.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

I've just committed it and merged it back to branch-0.21.

 New unit tests for concurrent lease recovery
 

 Key: HDFS-804
 URL: https://issues.apache.org/jira/browse/HDFS-804
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 0.21.0, 0.22.0
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
 Fix For: 0.21.0, 0.22.0

 Attachments: HDFS-804.patch, HDFS-804.patch, HDFS-804.patch, 
 HDFS-804.patch


 The {{FSNamesystem}} code that processes concurrent lease recovery isn't tested. 
 We need new test cases to cover these code paths.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-192) TestBackupNode sometimes fails

2009-12-07 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-192:
-

Attachment: NN-EditsBug.patch

 TestBackupNode sometimes fails
 --

 Key: HDFS-192
 URL: https://issues.apache.org/jira/browse/HDFS-192
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Konstantin Shvachko
 Fix For: 0.21.0

 Attachments: HADOOP-5573.patch, NN-EditsBug.patch, NN-EditsBug.patch, 
 NN-EditsBug.patch, NN-EditsBug.patch, TestBNFailure.log


 TestBackupNode may fail for different reasons:
 - Unable to open edit log file 
 .\build\test\data\dfs\name-backup1\current\edits (FSEditLog.java:open(371))
 - NullPointerException at 
 org.apache.hadoop.hdfs.server.namenode.EditLogBackupOutputStream.flushAndSync(EditLogBackupOutputStream.java:163)
 - Fatal Error : All storage directories are inaccessible.
 Will provide more information in the comments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-192) TestBackupNode sometimes fails

2009-12-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787110#action_12787110
 ] 

Hadoop QA commented on HDFS-192:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12427208/NN-EditsBug.patch
  against trunk revision 887413.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/134/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/134/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/134/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/134/console

This message is automatically generated.

 TestBackupNode sometimes fails
 --

 Key: HDFS-192
 URL: https://issues.apache.org/jira/browse/HDFS-192
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Konstantin Shvachko
 Fix For: 0.21.0

 Attachments: HADOOP-5573.patch, NN-EditsBug.patch, NN-EditsBug.patch, 
 NN-EditsBug.patch, NN-EditsBug.patch, TestBNFailure.log


 TestBackupNode may fail for different reasons:
 - Unable to open edit log file 
 .\build\test\data\dfs\name-backup1\current\edits (FSEditLog.java:open(371))
 - NullPointerException at 
 org.apache.hadoop.hdfs.server.namenode.EditLogBackupOutputStream.flushAndSync(EditLogBackupOutputStream.java:163)
 - Fatal Error : All storage directories are inaccessible.
 Will provide more information in the comments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-192) TestBackupNode sometimes fails

2009-12-07 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787117#action_12787117
 ] 

Konstantin Shvachko commented on HDFS-192:
--

Looks like Hudson did not pick up the latest patch.
Here are the test-patch results:
{code}
     [exec] There appear to be 119 release audit warnings before the patch and 
119 release audit warnings after applying the patch.
 [exec] +1 overall.  
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
BUILD SUCCESSFUL
Total time: 14 minutes 35 seconds
{code}

 TestBackupNode sometimes fails
 --

 Key: HDFS-192
 URL: https://issues.apache.org/jira/browse/HDFS-192
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Konstantin Shvachko
 Fix For: 0.21.0

 Attachments: HADOOP-5573.patch, NN-EditsBug.patch, NN-EditsBug.patch, 
 NN-EditsBug.patch, NN-EditsBug.patch, TestBNFailure.log


 TestBackupNode may fail for different reasons:
 - Unable to open edit log file 
 .\build\test\data\dfs\name-backup1\current\edits (FSEditLog.java:open(371))
 - NullPointerException at 
 org.apache.hadoop.hdfs.server.namenode.EditLogBackupOutputStream.flushAndSync(EditLogBackupOutputStream.java:163)
 - Fatal Error : All storage directories are inaccessible.
 Will provide more information in the comments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-101) DFS write pipeline : DFSClient sometimes does not detect second datanode failure

2009-12-07 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787120#action_12787120
 ] 

Hairong Kuang commented on HDFS-101:


bq. When the first datanode closes its socket from DFSClient, DFSClient should properly read all the data left in the socket.
Kan, thanks for pointing this out; it is a very valid point. I think this 
applies to the datanodes in the pipeline as well.

 DFS write pipeline : DFSClient sometimes does not detect second datanode 
 failure 
 -

 Key: HDFS-101
 URL: https://issues.apache.org/jira/browse/HDFS-101
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Raghu Angadi
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.21.0


 When the first datanode's write to the second datanode fails or times out, 
 DFSClient ends up marking the first datanode as the bad one and removes it from 
 the pipeline. A similar problem exists on the DataNode as well; it is fixed in 
 HADOOP-3339. From HADOOP-3339: 
 The main issue is that the BlockReceiver thread (and DataStreamer in the case of 
 DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty 
 coarse control. We don't know what state the responder is in, and interrupting 
 has different effects depending on the responder's state. To fix this properly we 
 need to redesign how we handle these interactions.
 When the first datanode closes its socket to DFSClient, DFSClient should 
 properly read all the data left in the socket. Also, the DataNode's closing of 
 the socket should not result in a TCP reset; otherwise I think DFSClient will 
 not be able to read from the socket.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-192) TestBackupNode sometimes fails

2009-12-07 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-192:
-

Attachment: NN-EditsBug-21.patch

Here is the patch for 0.21.

 TestBackupNode sometimes fails
 --

 Key: HDFS-192
 URL: https://issues.apache.org/jira/browse/HDFS-192
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Konstantin Shvachko
 Fix For: 0.21.0

 Attachments: HADOOP-5573.patch, NN-EditsBug-21.patch, 
 NN-EditsBug.patch, NN-EditsBug.patch, NN-EditsBug.patch, NN-EditsBug.patch, 
 TestBNFailure.log


 TestBackupNode may fail for different reasons:
 - Unable to open edit log file 
 .\build\test\data\dfs\name-backup1\current\edits (FSEditLog.java:open(371))
 - NullPointerException at 
 org.apache.hadoop.hdfs.server.namenode.EditLogBackupOutputStream.flushAndSync(EditLogBackupOutputStream.java:163)
 - Fatal Error : All storage directories are inaccessible.
 Will provide more information in the comments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-101) DFS write pipeline : DFSClient sometimes does not detect second datanode failure

2009-12-07 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787137#action_12787137
 ] 

Hairong Kuang commented on HDFS-101:


Would this work? When DNi detects an error while communicating with DNi+1, it 
sends an ack indicating that DNi+1 failed and continues to run until DNi-1 or 
the client closes the connection.

 DFS write pipeline : DFSClient sometimes does not detect second datanode 
 failure 
 -

 Key: HDFS-101
 URL: https://issues.apache.org/jira/browse/HDFS-101
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Raghu Angadi
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.21.0


 When the first datanode's write to the second datanode fails or times out, 
 DFSClient ends up marking the first datanode as the bad one and removes it from 
 the pipeline. A similar problem exists on the DataNode as well; it is fixed in 
 HADOOP-3339. From HADOOP-3339: 
 The main issue is that the BlockReceiver thread (and DataStreamer in the case of 
 DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty 
 coarse control. We don't know what state the responder is in, and interrupting 
 has different effects depending on the responder's state. To fix this properly we 
 need to redesign how we handle these interactions.
 When the first datanode closes its socket to DFSClient, DFSClient should 
 properly read all the data left in the socket. Also, the DataNode's closing of 
 the socket should not result in a TCP reset; otherwise I think DFSClient will 
 not be able to read from the socket.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-192) TestBackupNode sometimes fails

2009-12-07 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-192:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I just committed this.

 TestBackupNode sometimes fails
 --

 Key: HDFS-192
 URL: https://issues.apache.org/jira/browse/HDFS-192
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Konstantin Shvachko
 Fix For: 0.21.0

 Attachments: HADOOP-5573.patch, NN-EditsBug-21.patch, 
 NN-EditsBug.patch, NN-EditsBug.patch, NN-EditsBug.patch, NN-EditsBug.patch, 
 TestBNFailure.log


 TestBackupNode may fail for different reasons:
 - Unable to open edit log file 
 .\build\test\data\dfs\name-backup1\current\edits (FSEditLog.java:open(371))
 - NullPointerException at 
 org.apache.hadoop.hdfs.server.namenode.EditLogBackupOutputStream.flushAndSync(EditLogBackupOutputStream.java:163)
 - Fatal Error : All storage directories are inaccessible.
 Will provide more information in the comments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-456) Problems with dfs.name.edits.dirs as URI

2009-12-07 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-456:
-

Assignee: Konstantin Shvachko  (was: Luca Telloli)
  Status: Open  (was: Patch Available)

 Problems with dfs.name.edits.dirs as URI
 

 Key: HDFS-456
 URL: https://issues.apache.org/jira/browse/HDFS-456
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
Priority: Blocker
 Fix For: 0.21.0

 Attachments: failing-tests.zip, HDFS-456-clean.patch, HDFS-456.patch, 
 HDFS-456.patch, HDFS-456.patch, HDFS-456.patch, HDFS-456.patch, 
 HDFS-456.patch, HDFS-456.patch, HDFS-456.patch


 There are several problems with the recent commit of HDFS-396.
 # It does not work with the default configuration file:///. Throws 
 {{IllegalArgumentException}}.
 # *ALL* hdfs tests fail on Windows because C:\mypath is treated as an 
 illegal URI. Backward compatibility is not provided (see the sketch after 
 this list).
 # {{IllegalArgumentException}} should not be thrown within hdfs code because 
 it is a {{RuntimeException}}. We should throw {{IOException}} instead. This 
 was recently discussed in another jira.
 # Why do we commit patches without running unit tests and test-patch? This is 
 the minimum requirement for a patch to qualify as committable, right?
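
As an illustration of the path-vs-URI compatibility point in items 1 and 2, here is a small, self-contained sketch; {{toStorageUri}} and its behavior are assumptions for illustration, not the eventual HDFS-456 fix.
{code}
// Illustrative sketch (not the HDFS-456 patch): accept both legacy bare paths
// such as C:\mypath or /data/name and real URIs such as file:///data/name,
// and surface parse failures as IOException rather than IllegalArgumentException.
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

public class EditsDirParseSketch {

  static URI toStorageUri(String value) throws IOException {
    if (value == null || value.trim().isEmpty()) {
      throw new IOException("Cannot parse storage directory: " + value);
    }
    try {
      URI uri = new URI(value);
      if (uri.getScheme() != null) {
        return uri;                 // already a real URI, e.g. file:///data/name
      }
    } catch (URISyntaxException e) {
      // Not a URI (e.g. C:\mypath): fall through and treat it as a local path.
    }
    return new File(value).toURI(); // e.g. file:/C:/mypath on Windows
  }

  public static void main(String[] args) throws IOException {
    for (String v : new String[] {"file:///data/name", "/data/name", "C:\\mypath"}) {
      System.out.println(v + " -> " + toStorageUri(v));
    }
  }
}
{code}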

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-101) DFS write pipeline : DFSClient sometimes does not detect second datanode failure

2009-12-07 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787177#action_12787177
 ] 

Hairong Kuang commented on HDFS-101:


To be more specific, if the block receiver gets an error sending a packet to 
DNi+1, it still queues the packet to the ack queue, but with a flag 
mirrorError set to true, indicating that the packet had an error mirroring 
to DNi+1. The block receiver continues to write the packet to disk and then 
handles the next packet. The packet responder does not exit when it detects 
that DNi+1 has an error.
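
A hedged sketch of that mechanism is below; {{Packet}}, {{mirrorError}}, the queue, and {{writeToDisk()}} are simplified stand-ins, not the actual BlockReceiver/PacketResponder classes.
{code}
// Sketch of the idea above, not DataNode code: a downstream write failure is
// recorded on the packet instead of aborting the receiver.
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MirrorErrorSketch {

  static class Packet {
    final long seqno;
    volatile boolean mirrorError;           // true if forwarding to DN(i+1) failed
    Packet(long seqno) { this.seqno = seqno; }
  }

  private final BlockingQueue<Packet> ackQueue = new LinkedBlockingQueue<Packet>();
  private final OutputStream mirror;        // stream to DN(i+1); may be broken

  MirrorErrorSketch(OutputStream mirror) { this.mirror = mirror; }

  /** Receive one packet: try to forward downstream, always persist and enqueue. */
  void receivePacket(Packet p, byte[] data) {
    try {
      mirror.write(data);                   // forward to DN(i+1)
    } catch (IOException e) {
      p.mirrorError = true;                 // remember the downstream failure...
    }
    writeToDisk(data);                      // ...but keep writing locally
    ackQueue.add(p);                        // and let the responder report it upstream
  }

  private void writeToDisk(byte[] data) {
    // stand-in for the local block file write
  }
}
{code}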



 DFS write pipeline : DFSClient sometimes does not detect second datanode 
 failure 
 -

 Key: HDFS-101
 URL: https://issues.apache.org/jira/browse/HDFS-101
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Raghu Angadi
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.21.0


 When the first datanode's write to the second datanode fails or times out, 
 DFSClient ends up marking the first datanode as the bad one and removes it from 
 the pipeline. A similar problem exists on the DataNode as well; it is fixed in 
 HADOOP-3339. From HADOOP-3339: 
 The main issue is that the BlockReceiver thread (and DataStreamer in the case of 
 DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty 
 coarse control. We don't know what state the responder is in, and interrupting 
 has different effects depending on the responder's state. To fix this properly we 
 need to redesign how we handle these interactions.
 When the first datanode closes its socket to DFSClient, DFSClient should 
 properly read all the data left in the socket. Also, the DataNode's closing of 
 the socket should not result in a TCP reset; otherwise I think DFSClient will 
 not be able to read from the socket.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2009-12-07 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787224#action_12787224
 ] 

Todd Lipcon commented on HDFS-755:
--

I checked Raghu's idea and it's correct: since there's a buffer in BlockReader 
and it does some small reads at the beginning of the block (to get the status 
code, etc.), it misaligns the subsequent reads. I'm now benchmarking various 
values for this buffer with benchmarks similar to those I ran in HADOOP-3205. 
(That JIRA should still be fine to commit.)
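
For readers unfamiliar with the alignment issue, here is a small self-contained illustration (not DFSClient code; the function name, offsets, and buffer sizes are made-up example values) of how a read length can be trimmed so it ends on a checksum-chunk boundary once a few small header reads have shifted the stream position.
{code}
// Illustrative sketch: trim a bulk read so it ends on a checksum-chunk boundary.
public class ChunkAlignmentSketch {

  /** Largest read length <= maxLen that ends exactly on a chunk boundary. */
  static int alignedReadLength(long streamOffset, int maxLen, int bytesPerChecksum) {
    long offsetInChunk = streamOffset % bytesPerChecksum;
    // First bring the position back onto a boundary...
    long toBoundary = (offsetInChunk == 0) ? 0 : bytesPerChecksum - offsetInChunk;
    if (toBoundary >= maxLen) {
      return maxLen;                        // cannot even reach the next boundary
    }
    // ...then read as many whole chunks as fit in the remaining budget.
    long wholeChunks = (maxLen - toBoundary) / bytesPerChecksum;
    return (int) (toBoundary + wholeChunks * bytesPerChecksum);
  }

  public static void main(String[] args) {
    int bpc = 512;                          // typical io.bytes.per.checksum
    // A small 14-byte header read leaves the stream misaligned:
    System.out.println(alignedReadLength(14, 4096, bpc));   // 498 + 7*512 = 4082
    // From an aligned offset the full buffer is usable:
    System.out.println(alignedReadLength(512, 4096, bpc));  // 4096
  }
}
{code}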

 Read multiple checksum chunks at once in DFSInputStream
 ---

 Key: HDFS-755
 URL: https://issues.apache.org/jira/browse/HDFS-755
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, 
 hdfs-755.txt


 HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple 
 checksum chunks in a single call to readChunk. This is the HDFS-side use of 
 that new feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-423) Unbreak FUSE build and fuse_dfs_wrapper.sh

2009-12-07 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-423:


   Resolution: Fixed
Fix Version/s: 0.22.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

I've just committed this. Thanks Todd.

 Unbreak FUSE build and fuse_dfs_wrapper.sh
 --

 Key: HDFS-423
 URL: https://issues.apache.org/jira/browse/HDFS-423
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/fuse-dfs
Reporter: Giridharan Kesavan
Assignee: Eli Collins
 Fix For: 0.22.0

 Attachments: hdfs-423-2.patch, hdfs-423-3.patch, hdfs-423-4.patch, 
 hdfs423.patch, patch-4922.v1.txt


 fuse-dfs depends on libhdfs, and the fuse-dfs build.xml still points to the 
 libhdfs/libhdfs.so location, but libhdfs is now built in a different location. 
 Please take a look at this bug for the location details: 
 https://issues.apache.org/jira/browse/HADOOP-3344
 Thanks,
 Giri

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-793) DataNode should first receive the whole packet ack message before it constructs and sends its own ack message for the packet

2009-12-07 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-793:
---

Attachment: separateSendRcvAck1.patch

This patch has the following changes:
1. Bump up the data transfer protocol version number;
2. Remove the unused method ackReply;
3. Replace the use of a seqno of -1 with PipelineAck.HEART_BEAT.getseqno().

 DataNode should first receive the whole packet ack message before it 
 constructs and sends its own ack message for the packet
 

 Key: HDFS-793
 URL: https://issues.apache.org/jira/browse/HDFS-793
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.20.2, 0.21.0, 0.22.0

 Attachments: separateSendRcvAck.patch, separateSendRcvAck1.patch


 Currently BlockReceiver#PacketResponder interleaves receiving the ack message and 
 sending its own ack message for the same packet. It reads a portion of the message, 
 sends a portion of its ack, and continues like this until it hits the end of 
 the message. The problem is that if it gets an error while receiving the ack, it is 
 not able to send an ack that reflects the source of the error.
 The PacketResponder should receive the whole packet ack message first and 
 then construct and send out its own ack.
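
To make the proposed ordering concrete, here is a minimal sketch; the ack wire layout and the status codes below are invented for illustration and differ from the real DataTransferProtocol.
{code}
// Sketch of the proposed behaviour, not the HDFS-793 patch. The point is the
// ordering: read the COMPLETE ack from DN(i+1) first, then build and send this
// node's ack, so a receive error can still be reported upstream.
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class WholeAckFirstSketch {

  static final short SUCCESS = 0;
  static final short ERROR_DOWNSTREAM = 1;

  /** Read the whole downstream ack, then construct and send our own ack. */
  static void respondToPacket(long seqno, DataInputStream fromDownstream,
                              DataOutputStream toUpstream) throws IOException {
    short downstreamStatus;
    try {
      long ackedSeqno = fromDownstream.readLong();   // read the whole ack first
      downstreamStatus = fromDownstream.readShort();
      if (ackedSeqno != seqno) {
        downstreamStatus = ERROR_DOWNSTREAM;
      }
    } catch (IOException e) {
      // The downstream ack never arrived in full; reflect that in our own ack
      // instead of failing halfway through an interleaved read/write.
      downstreamStatus = ERROR_DOWNSTREAM;
    }
    toUpstream.writeLong(seqno);                     // only now send our ack
    toUpstream.writeShort(SUCCESS);                  // this node's own status
    toUpstream.writeShort(downstreamStatus);         // plus what we heard downstream
    toUpstream.flush();
  }
}
{code}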

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-423) Unbreak FUSE build and fuse_dfs_wrapper.sh

2009-12-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787256#action_12787256
 ] 

Hudson commented on HDFS-423:
-

Integrated in Hadoop-Hdfs-trunk-Commit #133 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/133/])
HDFS-423. Unbreak FUSE build and fuse_dfs_wrapper.sh. Contributed by Todd Lipcon.


 Unbreak FUSE build and fuse_dfs_wrapper.sh
 --

 Key: HDFS-423
 URL: https://issues.apache.org/jira/browse/HDFS-423
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/fuse-dfs
Reporter: Giridharan Kesavan
Assignee: Eli Collins
 Fix For: 0.22.0

 Attachments: hdfs-423-2.patch, hdfs-423-3.patch, hdfs-423-4.patch, 
 hdfs423.patch, patch-4922.v1.txt


 fuse-dfs depends on libhdfs, and the fuse-dfs build.xml still points to the 
 libhdfs/libhdfs.so location, but libhdfs is now built in a different location. 
 Please take a look at this bug for the location details: 
 https://issues.apache.org/jira/browse/HADOOP-3344
 Thanks,
 Giri

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-793) DataNode should first receive the whole packet ack message before it constructs and sends its own ack message for the packet

2009-12-07 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-793:
---

Attachment: separateSendRcvAck1.patch

 DataNode should first receive the whole packet ack message before it 
 constructs and sends its own ack message for the packet
 

 Key: HDFS-793
 URL: https://issues.apache.org/jira/browse/HDFS-793
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.20.2, 0.21.0, 0.22.0

 Attachments: separateSendRcvAck.patch, separateSendRcvAck1.patch


 Currently BlockReceiver#PacketResponder interleaves receiving the ack message and 
 sending its own ack message for the same packet. It reads a portion of the message, 
 sends a portion of its ack, and continues like this until it hits the end of 
 the message. The problem is that if it gets an error while receiving the ack, it is 
 not able to send an ack that reflects the source of the error.
 The PacketResponder should receive the whole packet ack message first and 
 then construct and send out its own ack.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-793) DataNode should first receive the whole packet ack message before it constructs and sends its own ack message for the packet

2009-12-07 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-793:
---

Attachment: (was: separateSendRcvAck1.patch)

 DataNode should first receive the whole packet ack message before it 
 constructs and sends its own ack message for the packet
 

 Key: HDFS-793
 URL: https://issues.apache.org/jira/browse/HDFS-793
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.20.2, 0.21.0, 0.22.0

 Attachments: separateSendRcvAck.patch, separateSendRcvAck1.patch


 Currently BlockReceiver#PacketResponder interleaves receiving the ack message and 
 sending its own ack message for the same packet. It reads a portion of the message, 
 sends a portion of its ack, and continues like this until it hits the end of 
 the message. The problem is that if it gets an error while receiving the ack, it is 
 not able to send an ack that reflects the source of the error.
 The PacketResponder should receive the whole packet ack message first and 
 then construct and send out its own ack.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-813) Enable the append test in TestReadWhileWriting

2009-12-07 Thread Tsz Wo (Nicholas), SZE (JIRA)
Enable the append test in TestReadWhileWriting
--

 Key: HDFS-813
 URL: https://issues.apache.org/jira/browse/HDFS-813
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Priority: Minor
 Fix For: 0.21.0, 0.22.0


The second part of TestReadWhileWriting should be enabled.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-793) DataNode should first receive the whole packet ack message before it constructs and sends its own ack message for the packet

2009-12-07 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-793:
---

Status: Patch Available  (was: Open)

 DataNode should first receive the whole packet ack message before it 
 constructs and sends its own ack message for the packet
 

 Key: HDFS-793
 URL: https://issues.apache.org/jira/browse/HDFS-793
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.20.2, 0.21.0, 0.22.0

 Attachments: separateSendRcvAck.patch, separateSendRcvAck1.patch


 Currently BlockReceiver#PacketResponder interleaves receiving the ack message and 
 sending its own ack message for the same packet. It reads a portion of the message, 
 sends a portion of its ack, and continues like this until it hits the end of 
 the message. The problem is that if it gets an error while receiving the ack, it is 
 not able to send an ack that reflects the source of the error.
 The PacketResponder should receive the whole packet ack message first and 
 then construct and send out its own ack.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-804) New unit tests for concurrent lease recovery

2009-12-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787341#action_12787341
 ] 

Hudson commented on HDFS-804:
-

Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #135 (See 
[http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/135/])
HDFS-804. New unit tests for concurrent lease recovery. Contributed by Konstantin Boudnik.


 New unit tests for concurrent lease recovery
 

 Key: HDFS-804
 URL: https://issues.apache.org/jira/browse/HDFS-804
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 0.21.0, 0.22.0
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
 Fix For: 0.21.0, 0.22.0

 Attachments: HDFS-804.patch, HDFS-804.patch, HDFS-804.patch, 
 HDFS-804.patch


 The {{FSNamesystem}} code that processes concurrent lease recovery isn't tested. 
 We need new test cases to cover these code paths.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-781) Metrics PendingDeletionBlocks is not decremented

2009-12-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787343#action_12787343
 ] 

Hudson commented on HDFS-781:
-

Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #135 (See 
[http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/135/])


 Metrics PendingDeletionBlocks is not decremented
 

 Key: HDFS-781
 URL: https://issues.apache.org/jira/browse/HDFS-781
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.2, 0.21.0, 0.22.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
Priority: Blocker
 Fix For: 0.20.2, 0.21.0, 0.22.0

 Attachments: hdfs-781.1.patch, hdfs-781.2.patch, hdfs-781.3.patch, 
 hdfs-781.4.patch, hdfs-781.patch, hdfs-781.rel20.1.patch, hdfs-781.rel20.patch


 PendingDeletionBlocks is not decremented when blocks pending 
 deletion in {{BlockManager.recentInvalidateSets}} are sent to a datanode for 
 deletion. This results in an invalid value for the metric.
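
As an illustration of the missing step (not the actual hdfs-781 patch; the class and method names below are hypothetical), a pending-deletion gauge that is decremented at dispatch time might look like this:
{code}
// Illustrative sketch: the gauge goes up when a block is queued for deletion
// and back down when the block is handed to a datanode -- the decrement that
// the description says was missing.
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicLong;

public class PendingDeletionGaugeSketch {
  private final AtomicLong pendingDeletionBlocks = new AtomicLong();
  private final Queue<Long> recentInvalidateSets = new ArrayDeque<Long>();

  synchronized void addBlockToInvalidate(long blockId) {
    recentInvalidateSets.add(blockId);
    pendingDeletionBlocks.incrementAndGet();
  }

  /** Hand up to {@code limit} blocks to a datanode and decrement the gauge. */
  synchronized int dispatchToDatanode(int limit) {
    int sent = 0;
    while (sent < limit && !recentInvalidateSets.isEmpty()) {
      recentInvalidateSets.poll();              // block id sent for deletion
      pendingDeletionBlocks.decrementAndGet();  // keep the metric in step
      sent++;
    }
    return sent;
  }

  long getPendingDeletionBlocks() { return pendingDeletionBlocks.get(); }
}
{code}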

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-423) Unbreak FUSE build and fuse_dfs_wrapper.sh

2009-12-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787344#action_12787344
 ] 

Hudson commented on HDFS-423:
-

Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #135 (See 
[http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/135/])
HDFS-423. Unbreak FUSE build and fuse_dfs_wrapper.sh. Contributed by Todd Lipcon.


 Unbreak FUSE build and fuse_dfs_wrapper.sh
 --

 Key: HDFS-423
 URL: https://issues.apache.org/jira/browse/HDFS-423
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/fuse-dfs
Reporter: Giridharan Kesavan
Assignee: Eli Collins
 Fix For: 0.22.0

 Attachments: hdfs-423-2.patch, hdfs-423-3.patch, hdfs-423-4.patch, 
 hdfs423.patch, patch-4922.v1.txt


 fuse-dfs depends on libhdfs, and the fuse-dfs build.xml still points to the 
 libhdfs/libhdfs.so location, but libhdfs is now built in a different location. 
 Please take a look at this bug for the location details: 
 https://issues.apache.org/jira/browse/HADOOP-3344
 Thanks,
 Giri

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-192) TestBackupNode sometimes fails

2009-12-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787342#action_12787342
 ] 

Hudson commented on HDFS-192:
-

Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #135 (See 
[http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/135/])
HDFS-192. Fix TestBackupNode failures. Contributed by Konstantin Shvachko.


 TestBackupNode sometimes fails
 --

 Key: HDFS-192
 URL: https://issues.apache.org/jira/browse/HDFS-192
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Konstantin Shvachko
 Fix For: 0.21.0

 Attachments: HADOOP-5573.patch, NN-EditsBug-21.patch, 
 NN-EditsBug.patch, NN-EditsBug.patch, NN-EditsBug.patch, NN-EditsBug.patch, 
 TestBNFailure.log


 TestBackupNode may fail for different reasons:
 - Unable to open edit log file 
 .\build\test\data\dfs\name-backup1\current\edits (FSEditLog.java:open(371))
 - NullPointerException at 
 org.apache.hadoop.hdfs.server.namenode.EditLogBackupOutputStream.flushAndSync(EditLogBackupOutputStream.java:163)
 - Fatal Error : All storage directories are inaccessible.
 Will provide more information in the comments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.