date:20120828


 [ 
https://issues.apache.org/jira/browse/HDFS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1490:
--

Status: Patch Available  (was: Open)

 TransferFSImage should timeout
 --

 Key: HDFS-1490
 URL: https://issues.apache.org/jira/browse/HDFS-1490
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Dmytro Molkov
Assignee: Dmytro Molkov
Priority: Minor
 Attachments: HDFS-1490.patch, HDFS-1490.patch


 Sometimes when primary crashes during image transfer secondary namenode would 
 hang trying to read the image from HTTP connection forever.
 It would be great to set timeouts on the connection so if something like that 
 happens there is no need to restart the secondary itself.
 In our case restarting components is handled by the set of scripts and since 
 the Secondary as the process is running it would just stay hung until we get 
 an alarm saying the checkpointing doesn't happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-1490) TransferFSImage should timeout

[
https://issues.apache.org/jira/browse/HDFS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443034#comment-13443034
]

Hadoop QA commented on HDFS-1490:
-

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12542723/HDFS-1490.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 1 new or modified test
files.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests in
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestHftpDelegationToken

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/3105//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HDFS-Build/3105//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3105//console

This message is automatically generated.

TransferFSImage should timeout
--

Key: HDFS-1490
URL: https://issues.apache.org/jira/browse/HDFS-1490
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Reporter: Dmytro Molkov
Assignee: Dmytro Molkov
Priority: Minor
Attachments: HDFS-1490.patch, HDFS-1490.patch

Sometimes when primary crashes during image transfer secondary namenode would
hang trying to read the image from HTTP connection forever.
It would be great to set timeouts on the connection so if something like that
happens there is no need to restart the secondary itself.
In our case restarting components is handled by the set of scripts and since
the Secondary as the process is running it would just stay hung until we get
an alarm saying the checkpointing doesn't happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-1490) TransferFSImage should timeout

2012-08-28 Thread Vinay (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443051#comment-13443051
 ] 

Vinay commented on HDFS-1490:
-

{code}  Call to equals() comparing different types in 
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock){code}
Find bug warning unrelated to current patch

{code}Failed tests:   
testHdfsDelegationToken(org.apache.hadoop.hdfs.TestHftpDelegationToken): wrong 
tokens in user expected:2 but was:1{code}
Also test failure is unrelated to current patch

 TransferFSImage should timeout
 --

 Key: HDFS-1490
 URL: https://issues.apache.org/jira/browse/HDFS-1490
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Dmytro Molkov
Assignee: Dmytro Molkov
Priority: Minor
 Attachments: HDFS-1490.patch, HDFS-1490.patch


 Sometimes when primary crashes during image transfer secondary namenode would 
 hang trying to read the image from HTTP connection forever.
 It would be great to set timeouts on the connection so if something like that 
 happens there is no need to restart the secondary itself.
 In our case restarting components is handled by the set of scripts and since 
 the Secondary as the process is running it would just stay hung until we get 
 an alarm saying the checkpointing doesn't happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HDFS-3847) using NFS As a shared storage for NameNode HA , how to ensure that only one write

2012-08-28 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K reassigned HDFS-3847:
---

Assignee: (was: Devaraj K)

 using NFS As a shared storage for NameNode HA , how to ensure that only one 
 write
 -

 Key: HDFS-3847
 URL: https://issues.apache.org/jira/browse/HDFS-3847
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.0.0-alpha, 2.0.1-alpha
Reporter: liaowenrui
Priority: Critical
 Fix For: 2.0.0-alpha




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HDFS-3847) using NFS As a shared storage for NameNode HA , how to ensure that only one write

2012-08-28 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K reassigned HDFS-3847:
---

Assignee: Devaraj K

 using NFS As a shared storage for NameNode HA , how to ensure that only one 
 write
 -

 Key: HDFS-3847
 URL: https://issues.apache.org/jira/browse/HDFS-3847
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.0.0-alpha, 2.0.1-alpha
Reporter: liaowenrui
Assignee: Devaraj K
Priority: Critical
 Fix For: 2.0.0-alpha




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem


[ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443056#comment-13443056
 ] 

Suresh Srinivas commented on HDFS-3860:
---

Jing, nice find. Submitting the patch.

 HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
 -

 Key: HDFS-3860
 URL: https://issues.apache.org/jira/browse/HDFS-3860
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch


 In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
 monitor thread will acquire the write lock of namesystem, and recheck the 
 safemode. If it is in safemode, the monitor thread will return from the 
 heartbeatCheck function without release the write lock. This may cause the 
 monitor thread wrongly holding the write lock forever.
 The attached test case tries to simulate this bad scenario.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem


[ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443058#comment-13443058
 ] 

Suresh Srinivas commented on HDFS-3860:
---

BTW could you please also ensure that this pattern of code is not repeated in 
any other places.

 HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
 -

 Key: HDFS-3860
 URL: https://issues.apache.org/jira/browse/HDFS-3860
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch


 In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
 monitor thread will acquire the write lock of namesystem, and recheck the 
 safemode. If it is in safemode, the monitor thread will return from the 
 heartbeatCheck function without release the write lock. This may cause the 
 monitor thread wrongly holding the write lock forever.
 The attached test case tries to simulate this bad scenario.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem


 [ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-3860:
--

Status: Patch Available  (was: Open)

 HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
 -

 Key: HDFS-3860
 URL: https://issues.apache.org/jira/browse/HDFS-3860
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch


 In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
 monitor thread will acquire the write lock of namesystem, and recheck the 
 safemode. If it is in safemode, the monitor thread will return from the 
 heartbeatCheck function without release the write lock. This may cause the 
 monitor thread wrongly holding the write lock forever.
 The attached test case tries to simulate this bad scenario.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes


[ 
https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443075#comment-13443075
 ] 

Suresh Srinivas commented on HDFS-3791:
---

Uma sorry for the delay in reviewing this. +1 for the patch.

 Backport HDFS-173 to Branch-1 :  Recursively deleting a directory with 
 millions of files makes NameNode unresponsive for other commands until the 
 deletion completes
 

 Key: HDFS-3791
 URL: https://issues.apache.org/jira/browse/HDFS-3791
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HDFS-3791.patch, HDFS-3791.patch


 Backport HDFS-173. 
 see the 
 [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007]
  for more details

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes


 [ 
https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-3791:
--

Attachment: HDFS-3791.patch

Rebased the patch on latest branch-1

 Backport HDFS-173 to Branch-1 :  Recursively deleting a directory with 
 millions of files makes NameNode unresponsive for other commands until the 
 deletion completes
 

 Key: HDFS-3791
 URL: https://issues.apache.org/jira/browse/HDFS-3791
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HDFS-3791.patch, HDFS-3791.patch, HDFS-3791.patch


 Backport HDFS-173. 
 see the 
 [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007]
  for more details

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes


 [ 
https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas resolved HDFS-3791.
---

   Resolution: Fixed
Fix Version/s: 1.2.0
 Hadoop Flags: Reviewed

I committed the patch. Thank you Uma.

 Backport HDFS-173 to Branch-1 :  Recursively deleting a directory with 
 millions of files makes NameNode unresponsive for other commands until the 
 deletion completes
 

 Key: HDFS-3791
 URL: https://issues.apache.org/jira/browse/HDFS-3791
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Fix For: 1.2.0

 Attachments: HDFS-3791.patch, HDFS-3791.patch, HDFS-3791.patch


 Backport HDFS-173. 
 see the 
 [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007]
  for more details

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes

2012-08-28 Thread Uma Maheswara Rao G (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443103#comment-13443103
 ] 

Uma Maheswara Rao G commented on HDFS-3791:
---

Oh, I have just seen the comments.
{quote}
Uma sorry for the delay in reviewing this. +1 for the patch.
{quote}
No problem :-). Thanks a lot, Suresh for the reviews.
Also thanks for rebasing it. I will to get a patch for HDFS-2815 today in some 
time.

 Backport HDFS-173 to Branch-1 :  Recursively deleting a directory with 
 millions of files makes NameNode unresponsive for other commands until the 
 deletion completes
 

 Key: HDFS-3791
 URL: https://issues.apache.org/jira/browse/HDFS-3791
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Fix For: 1.2.0

 Attachments: HDFS-3791.patch, HDFS-3791.patch, HDFS-3791.patch


 Backport HDFS-173. 
 see the 
 [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007]
  for more details

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem

[
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443105#comment-13443105
]

Hadoop QA commented on HDFS-3860:
-

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12542695/HDFS-3860.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests in
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestHftpDelegationToken

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/3106//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HDFS-Build/3106//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3106//console

This message is automatically generated.

HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
-

Key: HDFS-3860
URL: https://issues.apache.org/jira/browse/HDFS-3860
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch

In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the
monitor thread will acquire the write lock of namesystem, and recheck the
safemode. If it is in safemode, the monitor thread will return from the
heartbeatCheck function without release the write lock. This may cause the
monitor thread wrongly holding the write lock forever.
The attached test case tries to simulate this bad scenario.

[jira] [Commented] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning


[ 
https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443113#comment-13443113
 ] 

Suresh Srinivas commented on HDFS-3837:
---

Seems to me the findbugs is not fixed by the new patch or is it Jenkins error. 

Fixing this issue quickly will help. Currently all Jenkins reports have 
findbugs -1 for precommit tests.

{noformat}
Call to equals() comparing different types in 
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock)
Bug type EC_UNRELATED_TYPES (click for details) 
In class org.apache.hadoop.hdfs.server.datanode.DataNode
In method 
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock)
Actual type org.apache.hadoop.hdfs.protocol.DatanodeInfo
Expected org.apache.hadoop.hdfs.server.protocol.DatanodeRegistration
Value loaded from id
Value loaded from bpReg
org.apache.hadoop.hdfs.server.protocol.DatanodeRegistration.equals(Object) used 
to determine equality
At DataNode.java:[line 1869]
{noformat}

 Fix DataNode.recoverBlock findbugs warning
 --

 Key: HDFS-3837
 URL: https://issues.apache.org/jira/browse/HDFS-3837
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hdfs-3837.txt, hdfs-3837.txt


 HDFS-2686 introduced the following findbugs warning:
 {noformat}
 Call to equals() comparing different types in 
 org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock)
 {noformat}
 Both are using DatanodeID#equals but it's a different method because 
 DNR#equals overrides equals for some reason (doesn't change behavior).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3856) TestHDFSServerPorts failure is causing surefire fork failure


[ 
https://issues.apache.org/jira/browse/HDFS-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443123#comment-13443123
 ] 

Hudson commented on HDFS-3856:
--

Integrated in Hadoop-Hdfs-trunk #1148 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1148/])
Fixup CHANGELOG for HDFS-3856. (Revision 1377936)
HDFS-3856. TestHDFSServerPorts failure is causing surefire fork failure. 
Contributed by Colin Patrick McCabe (Revision 1377934)

 Result = FAILURE
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1377936
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1377934
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java


 TestHDFSServerPorts failure is causing surefire fork failure
 

 Key: HDFS-3856
 URL: https://issues.apache.org/jira/browse/HDFS-3856
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.2.0-alpha
Reporter: Thomas Graves
Assignee: Eli Collins
Priority: Blocker
 Fix For: 2.2.0-alpha

 Attachments: hdfs-3856.txt, hdfs-3856.txt


 We have been seeing the hdfs tests on trunk and branch-2 error out with fork 
 failures.  I see the hadoop jenkins trunk build is also seeing these:
 https://builds.apache.org/view/Hadoop/job/Hadoop-trunk/lastCompletedBuild/console

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3856) TestHDFSServerPorts failure is causing surefire fork failure


[ 
https://issues.apache.org/jira/browse/HDFS-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443151#comment-13443151
 ] 

Hudson commented on HDFS-3856:
--

Integrated in Hadoop-Mapreduce-trunk #1179 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1179/])
Fixup CHANGELOG for HDFS-3856. (Revision 1377936)
HDFS-3856. TestHDFSServerPorts failure is causing surefire fork failure. 
Contributed by Colin Patrick McCabe (Revision 1377934)

 Result = FAILURE
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1377936
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1377934
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java


 TestHDFSServerPorts failure is causing surefire fork failure
 

 Key: HDFS-3856
 URL: https://issues.apache.org/jira/browse/HDFS-3856
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.2.0-alpha
Reporter: Thomas Graves
Assignee: Eli Collins
Priority: Blocker
 Fix For: 2.2.0-alpha

 Attachments: hdfs-3856.txt, hdfs-3856.txt


 We have been seeing the hdfs tests on trunk and branch-2 error out with fork 
 failures.  I see the hadoop jenkins trunk build is also seeing these:
 https://builds.apache.org/view/Hadoop/job/Hadoop-trunk/lastCompletedBuild/console

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3852) TestHftpDelegationToken is broken after HADOOP-8225

2012-08-28 Thread Daryn Sharp (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Daryn Sharp updated HDFS-3852:
--

Attachment: HDFS-3852.patch

The test is attempting to insert two tokens with the same service. The UGI's
private creds is a list which happily accepted tokens with duplicate services
and even duplicate tokens. When I changed UGI in HADOOP-8225 to allow
extraction of a {{Credentials}} object from the UGI, it broke the test because
{{Credentials}} uses a map for tokens which naturally doesn't allow for service
dups. The test is really trying to ensure the correct token is retrieved for
htftp so I changed the 2nd token to have a different service to prevent it
replacing the first token.

Arguably, multiple tokens for the same service with different kinds should be
permissible. However in practice that is/was not possible because a
{{Credentials}} (which doesn't allow service dups) is used to build up tokens
to be dumped into the UGI.

TestHftpDelegationToken is broken after HADOOP-8225
---

Key: HDFS-3852
URL: https://issues.apache.org/jira/browse/HDFS-3852
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client, security
Affects Versions: 0.23.3, 2.1.0-alpha
Reporter: Aaron T. Myers
Assignee: Daryn Sharp
Attachments: HDFS-3852.patch

It's been failing in all builds for the last 2 days or so. Git bisect
indicates that it's due to HADOOP-8225.

[jira] [Updated] (HDFS-3852) TestHftpDelegationToken is broken after HADOOP-8225

2012-08-28 Thread Daryn Sharp (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-3852:
--

Status: Patch Available  (was: Open)

 TestHftpDelegationToken is broken after HADOOP-8225
 ---

 Key: HDFS-3852
 URL: https://issues.apache.org/jira/browse/HDFS-3852
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client, security
Affects Versions: 0.23.3, 2.1.0-alpha
Reporter: Aaron T. Myers
Assignee: Daryn Sharp
 Attachments: HDFS-3852.patch


 It's been failing in all builds for the last 2 days or so. Git bisect 
 indicates that it's due to HADOOP-8225.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3852) TestHftpDelegationToken is broken after HADOOP-8225


[ 
https://issues.apache.org/jira/browse/HDFS-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443212#comment-13443212
 ] 

Aaron T. Myers commented on HDFS-3852:
--

Got it. Makes sense. Thanks for the explanation, Daryn, and thanks for looking 
into this issue.

The patch looks good to me. +1 pending Jenkins.

 TestHftpDelegationToken is broken after HADOOP-8225
 ---

 Key: HDFS-3852
 URL: https://issues.apache.org/jira/browse/HDFS-3852
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client, security
Affects Versions: 0.23.3, 2.1.0-alpha
Reporter: Aaron T. Myers
Assignee: Daryn Sharp
 Attachments: HDFS-3852.patch


 It's been failing in all builds for the last 2 days or so. Git bisect 
 indicates that it's due to HADOOP-8225.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0

2012-08-28 Thread Robert Joseph Evans (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443221#comment-13443221
 ] 

Robert Joseph Evans commented on HDFS-3731:
---

Any update on branch-0.23?  Do you want me to look into it?

 2.0 release upgrade must handle blocks being written from 1.0
 -

 Key: HDFS-3731
 URL: https://issues.apache.org/jira/browse/HDFS-3731
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 2.0.0-alpha
Reporter: Suresh Srinivas
Assignee: Colin Patrick McCabe
Priority: Blocker
 Fix For: 2.2.0-alpha

 Attachments: hadoop1-bbw.tgz, HDFS-3731.002.patch, HDFS-3731.003.patch


 Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 
 release. Problem reported by Brahma Reddy.
 The {{DataNode}} will only have one block pool after upgrading from a 1.x 
 release.  (This is because in the 1.x releases, there were no block pools-- 
 or equivalently, everything was in the same block pool).  During the upgrade, 
 we should hardlink the block files from the {{blocksBeingWritten}} directory 
 into the {{rbw}} directory of this block pool.  Similarly, on {{-finalize}}, 
 we should delete the {{blocksBeingWritten}} directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning


 [ 
https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3837:
--

Attachment: hdfs-3837.txt

The findbugs warning seems bogus - This method calls equals(Object) on two 
references of different class types with no common subclasses. Therefore, the 
objects being compared are unlikely to be members of the same class at 
runtime.  Both DatanodeInfo and DatanodeRegistration extend DatanodeID so they 
 both share the equals implementation.

Anyway, I'll put the relevant code back (cast the array) since this fixes the 
findbugs warning is is fine (just more verbose).

{code}
-DatanodeID[] datanodeids = rBlock.getLocations();
+DatanodeInfo[] targets = rBlock.getLocations();
+DatanodeID[] datanodeids = (DatanodeID[])targets;
{code}

Updated patch, includes the comments as well so it's clear both classes are 
using the same equals method.

 Fix DataNode.recoverBlock findbugs warning
 --

 Key: HDFS-3837
 URL: https://issues.apache.org/jira/browse/HDFS-3837
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt


 HDFS-2686 introduced the following findbugs warning:
 {noformat}
 Call to equals() comparing different types in 
 org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock)
 {noformat}
 Both are using DatanodeID#equals but it's a different method because 
 DNR#equals overrides equals for some reason (doesn't change behavior).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes

2012-08-28 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443245#comment-13443245
 ] 

Ted Yu commented on HDFS-3791:
--

Currently small deletion is determined by the constant BLOCK_DELETION_INCREMENT:
{code}
+  deleteNow = collectedBlocks.size() = BLOCK_DELETION_INCREMENT;
{code}
I wonder if there is use case where the increment should be configurable.

 Backport HDFS-173 to Branch-1 :  Recursively deleting a directory with 
 millions of files makes NameNode unresponsive for other commands until the 
 deletion completes
 

 Key: HDFS-3791
 URL: https://issues.apache.org/jira/browse/HDFS-3791
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Fix For: 1.2.0

 Attachments: HDFS-3791.patch, HDFS-3791.patch, HDFS-3791.patch


 Backport HDFS-173. 
 see the 
 [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007]
  for more details

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3852) TestHftpDelegationToken is broken after HADOOP-8225

[
https://issues.apache.org/jira/browse/HDFS-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443264#comment-13443264
]

Hadoop QA commented on HDFS-3852:
-

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12542779/HDFS-3852.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 1 new or modified test
files.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed unit tests in
hadoop-hdfs-project/hadoop-hdfs.

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/3107//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HDFS-Build/3107//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3107//console

This message is automatically generated.

TestHftpDelegationToken is broken after HADOOP-8225
---

It's been failing in all builds for the last 2 days or so. Git bisect
indicates that it's due to HADOOP-8225.

[jira] [Created] (HDFS-3861) Deadlock in DFSClient

Kihwal Lee created HDFS-3861:


 Summary: Deadlock in DFSClient
 Key: HDFS-3861
 URL: https://issues.apache.org/jira/browse/HDFS-3861
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Kihwal Lee
Priority: Blocker
 Fix For: 0.23.4, 3.0.0, 2.2.0-alpha


The deadlock is between DFSOutputStream#close() and DFSClient#close().



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3861) Deadlock in DFSClient


[ 
https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443269#comment-13443269
 ] 

Kihwal Lee commented on HDFS-3861:
--

DFSClient#getLeaseRenewer() doesn't have to be synchronized since 
LeaseManager.Factory methods are synchronized. Multiple callers are still 
guaranteed to get a single live renewer back.


{noformat}
Java stack information for the threads listed above:
===
Thread-28:
at
org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1729)
- waiting to lock 0xb5a05dc8 (a
org.apache.hadoop.hdfs.DFSOutputStream)
at
org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:674)
at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:691)
- locked 0xb5a06ed8 (a org.apache.hadoop.hdfs.DFSClient)
at
org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:539)
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2386)
- locked 0xb44b00e8 (a org.apache.hadoop.fs.FileSystem$Cache)
at
org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2403)
- locked 0xb44b0100 (a
org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer)
at
org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
Thread-1175:
at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:538)
- waiting to lock 0xb5a06ed8 (a org.apache.hadoop.hdfs.DFSClient)
at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:550)
at
org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1757)
- locked 0xb5a05dc8 (a org.apache.hadoop.hdfs.DFSOutputStream)
at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:66)
at
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:99)
at
org.apache.hadoop.hdfs.TestDatanodeDeath$Workload.run(TestDatanodeDeath.java:101)
{noformat}

 Deadlock in DFSClient
 -

 Key: HDFS-3861
 URL: https://issues.apache.org/jira/browse/HDFS-3861
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Kihwal Lee
Priority: Blocker
 Fix For: 0.23.4, 3.0.0, 2.2.0-alpha

 Attachments: hdfs-3861.patch.txt


 The deadlock is between DFSOutputStream#close() and DFSClient#close().

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem


[ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443271#comment-13443271
 ] 

Aaron T. Myers commented on HDFS-3860:
--

Oof, good catch, Jing. Fortunately this case seems like it would be pretty 
tough to hit, since if the NN is in SM then HeartbeatManager#heartbeatCheck 
will return early, so to hit this the NN would have to enter SM in a very short 
window of time. Still certainly worth fixing, though.

The patch looks good to me. The findbugs warning is unrelated and 
TestHftpDelegationToken is known to currently be failing.

+1, I'll commit this momentarily.

 HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
 -

 Key: HDFS-3860
 URL: https://issues.apache.org/jira/browse/HDFS-3860
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch


 In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
 monitor thread will acquire the write lock of namesystem, and recheck the 
 safemode. If it is in safemode, the monitor thread will return from the 
 heartbeatCheck function without release the write lock. This may cause the 
 monitor thread wrongly holding the write lock forever.
 The attached test case tries to simulate this bad scenario.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3861) Deadlock in DFSClient


 [ 
https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-3861:
-

Attachment: hdfs-3861.patch.txt

 Deadlock in DFSClient
 -

 Key: HDFS-3861
 URL: https://issues.apache.org/jira/browse/HDFS-3861
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Kihwal Lee
Priority: Blocker
 Fix For: 0.23.4, 3.0.0, 2.2.0-alpha

 Attachments: hdfs-3861.patch.txt


 The deadlock is between DFSOutputStream#close() and DFSClient#close().

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3861) Deadlock in DFSClient


 [ 
https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-3861:
-

Status: Patch Available  (was: Open)

 Deadlock in DFSClient
 -

 Key: HDFS-3861
 URL: https://issues.apache.org/jira/browse/HDFS-3861
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Kihwal Lee
Priority: Blocker
 Fix For: 0.23.4, 3.0.0, 2.2.0-alpha

 Attachments: hdfs-3861.patch.txt


 The deadlock is between DFSOutputStream#close() and DFSClient#close().

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem


 [ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3860:
-

   Resolution: Fixed
Fix Version/s: 2.2.0-alpha
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2.

Thanks a lot for the contribution, Jing.

 HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
 -

 Key: HDFS-3860
 URL: https://issues.apache.org/jira/browse/HDFS-3860
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 2.2.0-alpha

 Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch


 In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
 monitor thread will acquire the write lock of namesystem, and recheck the 
 safemode. If it is in safemode, the monitor thread will return from the 
 heartbeatCheck function without release the write lock. This may cause the 
 monitor thread wrongly holding the write lock forever.
 The attached test case tries to simulate this bad scenario.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning


[ 
https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443286#comment-13443286
 ] 

Suresh Srinivas commented on HDFS-3837:
---

If this is a findbugs issue, why not just add this to findbugs exclude?

 Fix DataNode.recoverBlock findbugs warning
 --

 Key: HDFS-3837
 URL: https://issues.apache.org/jira/browse/HDFS-3837
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt


 HDFS-2686 introduced the following findbugs warning:
 {noformat}
 Call to equals() comparing different types in 
 org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock)
 {noformat}
 Both are using DatanodeID#equals but it's a different method because 
 DNR#equals overrides equals for some reason (doesn't change behavior).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem


[ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443289#comment-13443289
 ] 

Suresh Srinivas commented on HDFS-3860:
---

Thanks Aaron for committing the patch.

bq. BTW could you please also ensure that this pattern of code is not repeated 
in any other places.
Going back to my previous comment, Jing, if possible can you also see if there 
other such issues.

 HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
 -

 Key: HDFS-3860
 URL: https://issues.apache.org/jira/browse/HDFS-3860
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 2.2.0-alpha

 Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch


 In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
 monitor thread will acquire the write lock of namesystem, and recheck the 
 safemode. If it is in safemode, the monitor thread will return from the 
 heartbeatCheck function without release the write lock. This may cause the 
 monitor thread wrongly holding the write lock forever.
 The attached test case tries to simulate this bad scenario.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem

2012-08-28 Thread Jing Zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443292#comment-13443292
 ] 

Jing Zhao commented on HDFS-3860:
-

I just checked all the invocation of namesystem#writelock / writeunlock, and 
did not find similar problems. I will check other similar code too.

 HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
 -

 Key: HDFS-3860
 URL: https://issues.apache.org/jira/browse/HDFS-3860
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 2.2.0-alpha

 Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch


 In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
 monitor thread will acquire the write lock of namesystem, and recheck the 
 safemode. If it is in safemode, the monitor thread will return from the 
 heartbeatCheck function without release the write lock. This may cause the 
 monitor thread wrongly holding the write lock forever.
 The attached test case tries to simulate this bad scenario.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes


[ 
https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443296#comment-13443296
 ] 

Suresh Srinivas commented on HDFS-3791:
---

when I added this in trunk, I was not sure if there is a usecase. The whole 
idea was to give up lock once deleting some number of blocks. So the number 
currently is arbitrary.

 Backport HDFS-173 to Branch-1 :  Recursively deleting a directory with 
 millions of files makes NameNode unresponsive for other commands until the 
 deletion completes
 

 Key: HDFS-3791
 URL: https://issues.apache.org/jira/browse/HDFS-3791
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Fix For: 1.2.0

 Attachments: HDFS-3791.patch, HDFS-3791.patch, HDFS-3791.patch


 Backport HDFS-173. 
 see the 
 [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007]
  for more details

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HDFS-3861) Deadlock in DFSClient


 [ 
https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee reassigned HDFS-3861:


Assignee: Kihwal Lee

 Deadlock in DFSClient
 -

 Key: HDFS-3861
 URL: https://issues.apache.org/jira/browse/HDFS-3861
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 0.23.4, 3.0.0, 2.2.0-alpha

 Attachments: hdfs-3861.patch.txt


 The deadlock is between DFSOutputStream#close() and DFSClient#close().

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-2815) Namenode is not coming out of safemode when we perform ( NN crash + restart ) . Also FSCK report shows blocks missed.

2012-08-28 Thread Uma Maheswara Rao G (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Uma Maheswara Rao G updated HDFS-2815:
--

Attachment: HDFS-2815-branch-1.patch

Namenode is not coming out of safemode when we perform ( NN crash + restart )
. Also FSCK report shows blocks missed.
--

Key: HDFS-2815
URL: https://issues.apache.org/jira/browse/HDFS-2815
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.22.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Critical
Fix For: 2.0.0-alpha, 3.0.0

Attachments: HDFS-2815-22-branch.patch, HDFS-2815-branch-1.patch,
HDFS-2815-Branch-1.patch, HDFS-2815.patch, HDFS-2815.patch

When tested the HA(internal) with continuous switch with some 5mins gap,
found some *blocks missed* and namenode went into safemode after next switch.

After the analysis, i found that this files already deleted by clients.
But i don't see any delete commands logs namenode log files. But namenode
added that blocks to invalidateSets and DNs deleted the blocks.
When restart of the namenode, it went into safemode and expecting some
more blocks to come out of safemode.
Here the reason could be that, file has been deleted in memory and added
into invalidates after this it is trying to sync the edits into editlog file.
By that time NN asked DNs to delete that blocks. Now namenode shuts down
before persisting to editlogs.( log behind)
Due to this reason, we may not get the INFO logs about delete, and when we
restart the Namenode (in my scenario it is again switch), Namenode expects
this deleted blocks also, as delete request is not persisted into editlog
before.
I reproduced this scenario with bedug points. *I feel, We should not add
the blocks to invalidates before persisting into Editlog*.
Note: for switch, we used kill -9 (force kill)
I am currently in 0.20.2 version. Same verified in 0.23 as well in normal
crash + restart scenario.

[jira] [Updated] (HDFS-3373) FileContext HDFS implementation can leak socket caches

2012-08-28 Thread John George (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John George updated HDFS-3373:
--

Status: Open  (was: Patch Available)

 FileContext HDFS implementation can leak socket caches
 --

 Key: HDFS-3373
 URL: https://issues.apache.org/jira/browse/HDFS-3373
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Todd Lipcon
Assignee: John George
 Attachments: HDFS-3373.branch-23.patch, HDFS-3373.trunk.patch


 As noted by Nicholas in HDFS-3359, FileContext doesn't have a close() method, 
 and thus never calls DFSClient.close(). This means that, until finalizers 
 run, DFSClient will hold on to its SocketCache object and potentially have a 
 lot of outstanding sockets/fds held on to.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3373) FileContext HDFS implementation can leak socket caches

2012-08-28 Thread John George (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John George updated HDFS-3373:
--

Attachment: HDFS-3373.trunk.patch.1

TestConnCache failure is related to this JIRA. I had moved testDisableCache() 
from that test to another test file because now it is not possible to change 
cache config per DFS. 

TestHftpDelegationToken is unrelated to this patch and has been failing in 
other builds as well.

Attaching a patch with testDisableCache() removed from TestConnCache to a new 
file

 FileContext HDFS implementation can leak socket caches
 --

 Key: HDFS-3373
 URL: https://issues.apache.org/jira/browse/HDFS-3373
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Todd Lipcon
Assignee: John George
 Attachments: HDFS-3373.branch-23.patch, HDFS-3373.trunk.patch, 
 HDFS-3373.trunk.patch.1


 As noted by Nicholas in HDFS-3359, FileContext doesn't have a close() method, 
 and thus never calls DFSClient.close(). This means that, until finalizers 
 run, DFSClient will hold on to its SocketCache object and potentially have a 
 lot of outstanding sockets/fds held on to.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3373) FileContext HDFS implementation can leak socket caches

2012-08-28 Thread John George (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John George updated HDFS-3373:
--

Status: Patch Available  (was: Open)

 FileContext HDFS implementation can leak socket caches
 --

 Key: HDFS-3373
 URL: https://issues.apache.org/jira/browse/HDFS-3373
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Todd Lipcon
Assignee: John George
 Attachments: HDFS-3373.branch-23.patch, HDFS-3373.trunk.patch, 
 HDFS-3373.trunk.patch.1


 As noted by Nicholas in HDFS-3359, FileContext doesn't have a close() method, 
 and thus never calls DFSClient.close(). This means that, until finalizers 
 run, DFSClient will hold on to its SocketCache object and potentially have a 
 lot of outstanding sockets/fds held on to.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: recovery-mode.pdf

Here is an updated Recovery Mode design document.

Implement Recovery Mode
---

Key: HDFS-3004
URL: https://issues.apache.org/jira/browse/HDFS-3004
Project: Hadoop HDFS
Issue Type: New Feature
Components: tools
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Fix For: 2.0.0-alpha

Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch,
HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch,
HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch,
HDFS-3004.019.patch, HDFS-3004.020.patch, HDFS-3004.022.patch,
HDFS-3004.023.patch, HDFS-3004.024.patch, HDFS-3004.026.patch,
HDFS-3004.027.patch, HDFS-3004.029.patch, HDFS-3004.030.patch,
HDFS-3004.031.patch, HDFS-3004.032.patch, HDFS-3004.033.patch,
HDFS-3004.034.patch, HDFS-3004.035.patch, HDFS-3004.036.patch,
HDFS-3004.037.patch, HDFS-3004.038.patch, HDFS-3004.039.patch,
HDFS-3004.040.patch, HDFS-3004.041.patch, HDFS-3004.042.patch,
HDFS-3004.042.patch, HDFS-3004.042.patch, HDFS-3004.043.patch,
HDFS-3004__namenode_recovery_tool.txt, recovery-mode.pdf

When the NameNode metadata is corrupt for some reason, we want to be able to
fix it. Obviously, we would prefer never to get in this case. In a perfect
world, we never would. However, bad data on disk can happen from time to
time, because of hardware errors or misconfigurations. In the past we have
had to correct it manually, which is time-consuming and which can result in
downtime.
Recovery mode is initialized by the system administrator. When the NameNode
starts up in Recovery Mode, it will try to load the FSImage file, apply all
the edits from the edits log, and then write out a new image. Then it will
shut down.
Unlike in the normal startup process, the recovery mode startup process will
be interactive. When the NameNode finds something that is inconsistent, it
will prompt the operator as to what it should do. The operator can also
choose to take the first option for all prompts by starting up with the '-f'
flag, or typing 'a' at one of the prompts.
I have reused as much code as possible from the NameNode in this tool.
Hopefully, the effort that was spent developing this will also make the
NameNode editLog and image processing even more robust than it already is.

[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem


[ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443338#comment-13443338
 ] 

Hudson commented on HDFS-3860:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #2680 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2680/])
HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of 
namesystem. Contributed by Jing Zhao. (Revision 1378228)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1378228
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java


 HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
 -

 Key: HDFS-3860
 URL: https://issues.apache.org/jira/browse/HDFS-3860
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 2.2.0-alpha

 Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch


 In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
 monitor thread will acquire the write lock of namesystem, and recheck the 
 safemode. If it is in safemode, the monitor thread will return from the 
 heartbeatCheck function without release the write lock. This may cause the 
 monitor thread wrongly holding the write lock forever.
 The attached test case tries to simulate this bad scenario.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3861) Deadlock in DFSClient

[
https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443351#comment-13443351
]

Hadoop QA commented on HDFS-3861:
-

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12542787/hdfs-3861.patch.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests in
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestHftpDelegationToken

org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/3109//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HDFS-Build/3109//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3109//console

This message is automatically generated.

Deadlock in DFSClient
-

Key: HDFS-3861
URL: https://issues.apache.org/jira/browse/HDFS-3861
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
Fix For: 0.23.4, 3.0.0, 2.2.0-alpha

Attachments: hdfs-3861.patch.txt

The deadlock is between DFSOutputStream#close() and DFSClient#close().

[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem


[ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443353#comment-13443353
 ] 

Hudson commented on HDFS-3860:
--

Integrated in Hadoop-Common-trunk-Commit #2651 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2651/])
HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of 
namesystem. Contributed by Jing Zhao. (Revision 1378228)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1378228
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java


 HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
 -

 Key: HDFS-3860
 URL: https://issues.apache.org/jira/browse/HDFS-3860
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 2.2.0-alpha

 Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch


 In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
 monitor thread will acquire the write lock of namesystem, and recheck the 
 safemode. If it is in safemode, the monitor thread will return from the 
 heartbeatCheck function without release the write lock. This may cause the 
 monitor thread wrongly holding the write lock forever.
 The attached test case tries to simulate this bad scenario.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem


[ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443367#comment-13443367
 ] 

Hudson commented on HDFS-3860:
--

Integrated in Hadoop-Hdfs-trunk-Commit #2715 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2715/])
HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of 
namesystem. Contributed by Jing Zhao. (Revision 1378228)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1378228
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java


 HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
 -

 Key: HDFS-3860
 URL: https://issues.apache.org/jira/browse/HDFS-3860
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 2.2.0-alpha

 Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch


 In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
 monitor thread will acquire the write lock of namesystem, and recheck the 
 safemode. If it is in safemode, the monitor thread will return from the 
 heartbeatCheck function without release the write lock. This may cause the 
 monitor thread wrongly holding the write lock forever.
 The attached test case tries to simulate this bad scenario.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1


[ 
https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443375#comment-13443375
 ] 

Colin Patrick McCabe commented on HDFS-3540:


Hi Nicholas,

Your summary seems reasonable to me overall.  I agree with you that the 
recommended setting for edit log toleration should be disabled.  Is there 
anything left to do for this JIRA?

 Further improvement on recovery mode and edit log toleration in branch-1
 

 Key: HDFS-3540
 URL: https://issues.apache.org/jira/browse/HDFS-3540
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.2.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE

 *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1.  However, the 
 recovery mode feature in branch-1 is dramatically different from the recovery 
 mode in trunk since the edit log implementations in these two branch are 
 different.  For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not 
 in trunk.
 *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy 
 UNCHECKED_REGION_LENGTH and to tolerate edit log corruption.
 There are overlaps between these two features.  We study potential further 
 improvement in this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0


[ 
https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443377#comment-13443377
 ] 

Colin Patrick McCabe commented on HDFS-3731:


bq. Any update on branch-0.23? Do you want me to look into it?

There are some differences in the branch-0.23 BlockManager state machine, such 
that a straight port of the patch doesn't work.  The easiest thing to do would 
probably be to backport some of the BlockManager fixes and improvements to 
branch-0.23.  If you would look into that it would be good.

 2.0 release upgrade must handle blocks being written from 1.0
 -

 Key: HDFS-3731
 URL: https://issues.apache.org/jira/browse/HDFS-3731
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 2.0.0-alpha
Reporter: Suresh Srinivas
Assignee: Colin Patrick McCabe
Priority: Blocker
 Fix For: 2.2.0-alpha

 Attachments: hadoop1-bbw.tgz, HDFS-3731.002.patch, HDFS-3731.003.patch


 Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 
 release. Problem reported by Brahma Reddy.
 The {{DataNode}} will only have one block pool after upgrading from a 1.x 
 release.  (This is because in the 1.x releases, there were no block pools-- 
 or equivalently, everything was in the same block pool).  During the upgrade, 
 we should hardlink the block files from the {{blocksBeingWritten}} directory 
 into the {{rbw}} directory of this block pool.  Similarly, on {{-finalize}}, 
 we should delete the {{blocksBeingWritten}} directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning

[
https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443387#comment-13443387
]

Hadoop QA commented on HDFS-3837:
-

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12542780/hdfs-3837.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests in
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestHftpDelegationToken

org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/3108//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3108//console

This message is automatically generated.

Fix DataNode.recoverBlock findbugs warning
--

Key: HDFS-3837
URL: https://issues.apache.org/jira/browse/HDFS-3837
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
Attachments: hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt

HDFS-2686 introduced the following findbugs warning:
{noformat}
Call to equals() comparing different types in
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock)
{noformat}
Both are using DatanodeID#equals but it's a different method because
DNR#equals overrides equals for some reason (doesn't change behavior).

[jira] [Commented] (HDFS-2815) Namenode is not coming out of safemode when we perform ( NN crash + restart ) . Also FSCK report shows blocks missed.

[
https://issues.apache.org/jira/browse/HDFS-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443394#comment-13443394
]

Hadoop QA commented on HDFS-2815:
-

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12542794/HDFS-2815-branch-1.patch
against trunk revision .

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3111//console

This message is automatically generated.

Namenode is not coming out of safemode when we perform ( NN crash + restart )
. Also FSCK report shows blocks missed.
--

Attachments: HDFS-2815-22-branch.patch, HDFS-2815-branch-1.patch,
HDFS-2815-Branch-1.patch, HDFS-2815.patch, HDFS-2815.patch

When tested the HA(internal) with continuous switch with some 5mins gap,
found some *blocks missed* and namenode went into safemode after next switch.

[jira] [Commented] (HDFS-3861) Deadlock in DFSClient


[ 
https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443401#comment-13443401
 ] 

Kihwal Lee commented on HDFS-3861:
--

- The test failures are not related to this patch.
- No test was added. Existing test case exposed this bug (TestDataNodeDeath).
- The findbugs warning is not caused by this patch.

 Deadlock in DFSClient
 -

 Key: HDFS-3861
 URL: https://issues.apache.org/jira/browse/HDFS-3861
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 0.23.4, 3.0.0, 2.2.0-alpha

 Attachments: hdfs-3861.patch.txt


 The deadlock is between DFSOutputStream#close() and DFSClient#close().

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.


 [ 
https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3849:
---

Attachment: HDFS-3849.003.patch

* don't set DT config

 When re-loading the FSImage, we should clear the existing genStamp and leases.
 --

 Key: HDFS-3849
 URL: https://issues.apache.org/jira/browse/HDFS-3849
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.2.0-alpha
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Critical
 Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch, 
 HDFS-3849.003.patch


 When re-loading the FSImage, we should clear the existing genStamp and leases.
 This is an issue in the 2NN, because it sometimes clears the existing FSImage 
 and reloads a new one in order to get back in sync with the NN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.


[ 
https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443445#comment-13443445
 ] 

Aaron T. Myers commented on HDFS-3849:
--

+1 pending Jenkins.

 When re-loading the FSImage, we should clear the existing genStamp and leases.
 --

 Key: HDFS-3849
 URL: https://issues.apache.org/jira/browse/HDFS-3849
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.2.0-alpha
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Critical
 Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch, 
 HDFS-3849.003.patch


 When re-loading the FSImage, we should clear the existing genStamp and leases.
 This is an issue in the 2NN, because it sometimes clears the existing FSImage 
 and reloads a new one in order to get back in sync with the NN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3861) Deadlock in DFSClient


[ 
https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443463#comment-13443463
 ] 

Colin Patrick McCabe commented on HDFS-3861:


Looks good to me.

 Deadlock in DFSClient
 -

 Key: HDFS-3861
 URL: https://issues.apache.org/jira/browse/HDFS-3861
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 0.23.4, 3.0.0, 2.2.0-alpha

 Attachments: hdfs-3861.patch.txt


 The deadlock is between DFSOutputStream#close() and DFSClient#close().

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3859) QJM: implement md5sum verification

2012-08-28 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443476#comment-13443476
 ] 

Steve Loughran commented on HDFS-3859:
--

Isn't MD5 overkill? Can't a good CRC (like TCP Jumbo Frames uses) suffice?

 QJM: implement md5sum verification
 --

 Key: HDFS-3859
 URL: https://issues.apache.org/jira/browse/HDFS-3859
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon

 When the QJM passes journal segments between nodes, it should use an md5sum 
 field to make sure the data doesn't get corrupted during transit. This also 
 serves as an extra safe-guard to make sure that the data is consistent across 
 all nodes when finalizing a segment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3859) QJM: implement md5sum verification


[ 
https://issues.apache.org/jira/browse/HDFS-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443483#comment-13443483
 ] 

Todd Lipcon commented on HDFS-3859:
---

Sure, it's overkill, but it's not that expensive and we already have an 
implementation of it sitting around. It's also handy because md5sum is 
commonly available on the command line, and we use it for FSImages already as 
well. Performance-wise, my laptop can md5sum at about 500MB/sec, so given that 
log segments under recovery are likely to be much smaller than 500M, I don't 
think we should be concerned about that.

 QJM: implement md5sum verification
 --

 Key: HDFS-3859
 URL: https://issues.apache.org/jira/browse/HDFS-3859
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon

 When the QJM passes journal segments between nodes, it should use an md5sum 
 field to make sure the data doesn't get corrupted during transit. This also 
 serves as an extra safe-guard to make sure that the data is consistent across 
 all nodes when finalizing a segment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HDFS-3862) QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics

Todd Lipcon created HDFS-3862:
-

 Summary: QJM: don't require a fencer to be configured if shared 
storage has built-in single-writer semantics
 Key: HDFS-3862
 URL: https://issues.apache.org/jira/browse/HDFS-3862
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon


Currently, NN HA requires that the administrator configure a fencing method to 
ensure that only a single NameNode may write to the shared storage at a time. 
Some shared edits storage implementations (like QJM) inherently enforce 
single-writer semantics at the storage level, and thus the user should not be 
forced to specify one.

We should extend the JournalManager interface so that the HA code can operate 
without a configured fencer if the JM has such built-in fencing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3862) QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics


[ 
https://issues.apache.org/jira/browse/HDFS-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443518#comment-13443518
 ] 

Todd Lipcon commented on HDFS-3862:
---

I think this might be the case for BookKeeper as well. Any of the folks working 
on BKJM want to take this on? I anticipate we would add a simple API to 
JournalManager like: {{boolean isNativelySingleWriter();}} or {{boolean 
needsExternalFencing();}}. Then the failover code could check the shared 
storage dir to see if this is the case, and if so, not error out if the user 
doesn't specify a fence method.

 QJM: don't require a fencer to be configured if shared storage has built-in 
 single-writer semantics
 ---

 Key: HDFS-3862
 URL: https://issues.apache.org/jira/browse/HDFS-3862
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon

 Currently, NN HA requires that the administrator configure a fencing method 
 to ensure that only a single NameNode may write to the shared storage at a 
 time. Some shared edits storage implementations (like QJM) inherently enforce 
 single-writer semantics at the storage level, and thus the user should not be 
 forced to specify one.
 We should extend the JournalManager interface so that the HA code can operate 
 without a configured fencer if the JM has such built-in fencing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3373) FileContext HDFS implementation can leak socket caches

[
https://issues.apache.org/jira/browse/HDFS-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443524#comment-13443524
]

Hadoop QA commented on HDFS-3373:
-

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12542795/HDFS-3373.trunk.patch.1
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified test
files.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

-1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests in
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestHftpDelegationToken

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/3110//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HDFS-Build/3110//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3110//console

This message is automatically generated.

FileContext HDFS implementation can leak socket caches
--

Key: HDFS-3373
URL: https://issues.apache.org/jira/browse/HDFS-3373
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Todd Lipcon
Assignee: John George
Attachments: HDFS-3373.branch-23.patch, HDFS-3373.trunk.patch,
HDFS-3373.trunk.patch.1

As noted by Nicholas in HDFS-3359, FileContext doesn't have a close() method,
and thus never calls DFSClient.close(). This means that, until finalizers
run, DFSClient will hold on to its SocketCache object and potentially have a
lot of outstanding sockets/fds held on to.

[jira] [Commented] (HDFS-1490) TransferFSImage should timeout

[
https://issues.apache.org/jira/browse/HDFS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443526#comment-13443526
]

Todd Lipcon commented on HDFS-1490:
---

- I dont like reusing the ipc ping interval for this timeout here. It's from an
entirely separate module, and I don't see why one should correlate to the
other. Why not introduce a new config which defaults to something like 1 minute?
- In the test case, shouldn't you somehow notify the servlet to exit? Currently
it waits on itself, but nothing notifies it.

TransferFSImage should timeout
--

[jira] [Commented] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.

[
https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443541#comment-13443541
]

Hadoop QA commented on HDFS-3849:
-

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12542806/HDFS-3849.003.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 2 new or modified test
files.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests in
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestHftpDelegationToken

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/3112//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HDFS-Build/3112//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3112//console

This message is automatically generated.

When re-loading the FSImage, we should clear the existing genStamp and leases.
--

Key: HDFS-3849
URL: https://issues.apache.org/jira/browse/HDFS-3849
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 2.2.0-alpha
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Critical
Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch,
HDFS-3849.003.patch

When re-loading the FSImage, we should clear the existing genStamp and leases.
This is an issue in the 2NN, because it sometimes clears the existing FSImage
and reloads a new one in order to get back in sync with the NN.

[jira] [Created] (HDFS-3863) QJM: track last committed txid

Todd Lipcon created HDFS-3863:
-

 Summary: QJM: track last committed txid
 Key: HDFS-3863
 URL: https://issues.apache.org/jira/browse/HDFS-3863
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon


Per some discussion with [~stepinto] 
[here|https://issues.apache.org/jira/browse/HDFS-3077?focusedCommentId=13422579page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422579],
 we should keep track of the last committed txid on each JournalNode. Then 
during any recovery operation, we can sanity-check that we aren't asked to 
truncate a log to an earlier transaction.

This is also a necessary step if we want to support reading from in-progress 
segments in the future (since we should only allow reads up to the commit point)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0

2012-08-28 Thread Robert Joseph Evans (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443551#comment-13443551
 ] 

Robert Joseph Evans commented on HDFS-3731:
---

Do you have a list of ones you know about?  If not I can start pulling on that 
thread tomorrow.

 2.0 release upgrade must handle blocks being written from 1.0
 -

 Key: HDFS-3731
 URL: https://issues.apache.org/jira/browse/HDFS-3731
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 2.0.0-alpha
Reporter: Suresh Srinivas
Assignee: Colin Patrick McCabe
Priority: Blocker
 Fix For: 2.2.0-alpha

 Attachments: hadoop1-bbw.tgz, HDFS-3731.002.patch, HDFS-3731.003.patch


 Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 
 release. Problem reported by Brahma Reddy.
 The {{DataNode}} will only have one block pool after upgrading from a 1.x 
 release.  (This is because in the 1.x releases, there were no block pools-- 
 or equivalently, everything was in the same block pool).  During the upgrade, 
 we should hardlink the block files from the {{blocksBeingWritten}} directory 
 into the {{rbw}} directory of this block pool.  Similarly, on {{-finalize}}, 
 we should delete the {{blocksBeingWritten}} directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3863) QJM: track last committed txid


[ 
https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443556#comment-13443556
 ] 

Todd Lipcon commented on HDFS-3863:
---

The design here is pretty simple, given the way our journaling protocol works. 
In particular, we only have one outstanding batch of transactions at once. We 
never send a batch of transactions beginning at txid N until the prior batch 
(up through N-1) has been accepted at a quorum of nodes. Thus, any 
{{sendEdits()}} call with {{firstTxId}} N implies a {{commit(N-1)}}.

So, my plan is as follows:

- Introduce a new file inside the journal directory called {{committed-txid}}. 
This would include a single numeric text line, similar to the {{seen_txid}} 
that the NameNode maintains.
- Since this whole feature is not required for correctness, we don't need to 
fsync this file on every update. Instead, we can let the operating system write 
it out to disk whenever it so chooses. If, after a system crash, it reverts to 
an earlier value, this is OK, since our recovery protocol doesn't depend on it 
being up-to-date in any way. Put another way, the invariant is that the file 
contains a value which is a lower bound on the latest committed txn.

The data would be when any sendEdits() call is made -- the call implicitly 
commits all edits prior to the current batch.

This alone is enough for a good sanity check. If we want to also support 
reading the committed transactions while in-progress, it's not quite sufficient 
-- the last batch of transactions will never be readable if the NN stops 
writing new batches for a protracted period of time. To solve this, we can add 
a timer thread to the client which periodically (eg once or twice a second) 
sends an RPC to update the committed-txid on all of the nodes. The periodic 
timer will also have the nice property of causing a NN which has been fenced to 
abort itself even if no write transactions are taking place.

 QJM: track last committed txid
 

 Key: HDFS-3863
 URL: https://issues.apache.org/jira/browse/HDFS-3863
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon

 Per some discussion with [~stepinto] 
 [here|https://issues.apache.org/jira/browse/HDFS-3077?focusedCommentId=13422579page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422579],
  we should keep track of the last committed txid on each JournalNode. Then 
 during any recovery operation, we can sanity-check that we aren't asked to 
 truncate a log to an earlier transaction.
 This is also a necessary step if we want to support reading from in-progress 
 segments in the future (since we should only allow reads up to the commit 
 point)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0


[ 
https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443577#comment-13443577
 ] 

Colin Patrick McCabe commented on HDFS-3731:


bq. Do you have a list of ones you know about? If not I can start pulling on 
that thread tomorrow.

Sorry, I just took a preliminary look, didn't have time to go in depth.

The state machine errors are pretty clear in the test.  You may need to wait a 
while for them to appear since surefire does a lot of buffering.

 2.0 release upgrade must handle blocks being written from 1.0
 -

 Key: HDFS-3731
 URL: https://issues.apache.org/jira/browse/HDFS-3731
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 2.0.0-alpha
Reporter: Suresh Srinivas
Assignee: Colin Patrick McCabe
Priority: Blocker
 Fix For: 2.2.0-alpha

 Attachments: hadoop1-bbw.tgz, HDFS-3731.002.patch, HDFS-3731.003.patch


 Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 
 release. Problem reported by Brahma Reddy.
 The {{DataNode}} will only have one block pool after upgrading from a 1.x 
 release.  (This is because in the 1.x releases, there were no block pools-- 
 or equivalently, everything was in the same block pool).  During the upgrade, 
 we should hardlink the block files from the {{blocksBeingWritten}} directory 
 into the {{rbw}} directory of this block pool.  Similarly, on {{-finalize}}, 
 we should delete the {{blocksBeingWritten}} directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log

Aaron T. Myers created HDFS-3864:


 Summary: NN does not update internal file mtime for OP_CLOSE when 
reading from the edit log
 Key: HDFS-3864
 URL: https://issues.apache.org/jira/browse/HDFS-3864
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


When logging an OP_CLOSE to the edit log, the NN writes out an updated file 
mtime and atime. However, when reading in an OP_CLOSE from the edit log, the NN 
does not apply these values to the in-memory FS data structure. Because of 
this, a file's mtime or atime may appear to go back in time after an NN 
restart, or an HA failover.

Most of the time this will be harmless and folks won't notice, but in the event 
one of these files is being used in the distributed cache of an MR job when an 
HA failover occurs, the job might notice that the mtime of a cache file has 
changed, which in MR2 will cause the job to fail with an exception like the 
following:

{noformat}
java.io.IOException: Resource 
hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
 changed on src filesystem (expected 1342137814599, was 1342137814473
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{noformat}

Credit to Sujay Rau for discovering this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3733) Audit logs should include WebHDFS access

2012-08-28 Thread Andy Isaacson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443584#comment-13443584
 ] 

Andy Isaacson commented on HDFS-3733:
-

OK, backing up -- I think my addition of CurClient just duplicates 
functionality already provided by NamenodeWebHdfsMethods#REMOTE_ADDRESS .  So I 
can drop that new ThreadLocal and just teach NameNodeRpcServer to use 
REMOTE_ADDRESS appropriately.

Or am I missing something?

bq. getRemoteIp should not just return NamenodeWebHdfsMethods#getRemoteAddress

(I assume you are referring to my newly added {{FSNamesystem#getRemoteIp}}.)

Agreed, FSNamesystem should support all remote methods: RPC, WebHdfs ... and 
Hftp?  The {{FSNamesystem#getRemoteIp}} should handle them all.

The helper {{NameNodeRpcServer#getRemoteIp}} implements the WebHdfs portion of 
{{FSNamesystem#getRemoteIp}} just as {{Server#getRemoteIp}} implements the RPC 
portion.

 Audit logs should include WebHDFS access
 

 Key: HDFS-3733
 URL: https://issues.apache.org/jira/browse/HDFS-3733
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
 Attachments: hdfs-3733.txt


 Access via WebHdfs does not result in audit log entries.  It should.
 {noformat}
 % curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS;
 {FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}}
 {noformat}
 and observe that no audit log entry is generated.
 Interestingly, OPEN requests do not generate audit log entries when the NN 
 generates the redirect, but do generate audit log entries when the second 
 phase against the DN is executed.
 {noformat}
 % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN'
 ...
  HTTP/1.1 307 TEMPORARY_REDIRECT
  Location: 
 http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0
 ...
 % curl -v 
 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020'
 ...
  HTTP/1.1 200 OK
  Content-Type: application/octet-stream
  Content-Length: 12
  Server: Jetty(6.1.26.cloudera.1)
  
 hello world
 {noformat}
 This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} 
 thereby triggering the existing {{logAuditEvent}} code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log


 [ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3864:
-

Status: Patch Available  (was: Open)

 NN does not update internal file mtime for OP_CLOSE when reading from the 
 edit log
 --

 Key: HDFS-3864
 URL: https://issues.apache.org/jira/browse/HDFS-3864
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: HDFS-3864.patch


 When logging an OP_CLOSE to the edit log, the NN writes out an updated file 
 mtime and atime. However, when reading in an OP_CLOSE from the edit log, the 
 NN does not apply these values to the in-memory FS data structure. Because of 
 this, a file's mtime or atime may appear to go back in time after an NN 
 restart, or an HA failover.
 Most of the time this will be harmless and folks won't notice, but in the 
 event one of these files is being used in the distributed cache of an MR job 
 when an HA failover occurs, the job might notice that the mtime of a cache 
 file has changed, which in MR2 will cause the job to fail with an exception 
 like the following:
 {noformat}
 java.io.IOException: Resource 
 hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
  changed on src filesystem (expected 1342137814599, was 1342137814473
   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {noformat}
 Credit to Sujay Rau for discovering this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log


 [ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3864:
-

Attachment: HDFS-3864.patch

Here's a patch which addresses the issue. Fortunately, the fix is quite simply 
- just apply the values that we read in from the edit log.

In addition to the automated test provided in the patch, I also tested this 
manually on an HA cluster and confirmed that MR jobs no longer experience the 
:distributed cache object changed errors which caused this issue to be 
discovered.

 NN does not update internal file mtime for OP_CLOSE when reading from the 
 edit log
 --

 Key: HDFS-3864
 URL: https://issues.apache.org/jira/browse/HDFS-3864
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: HDFS-3864.patch


 When logging an OP_CLOSE to the edit log, the NN writes out an updated file 
 mtime and atime. However, when reading in an OP_CLOSE from the edit log, the 
 NN does not apply these values to the in-memory FS data structure. Because of 
 this, a file's mtime or atime may appear to go back in time after an NN 
 restart, or an HA failover.
 Most of the time this will be harmless and folks won't notice, but in the 
 event one of these files is being used in the distributed cache of an MR job 
 when an HA failover occurs, the job might notice that the mtime of a cache 
 file has changed, which in MR2 will cause the job to fail with an exception 
 like the following:
 {noformat}
 java.io.IOException: Resource 
 hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
  changed on src filesystem (expected 1342137814599, was 1342137814473
   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {noformat}
 Credit to Sujay Rau for discovering this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HDFS-3865) TestDistCp is @ignored

Colin Patrick McCabe created HDFS-3865:
--

 Summary: TestDistCp is @ignored
 Key: HDFS-3865
 URL: https://issues.apache.org/jira/browse/HDFS-3865
 Project: Hadoop HDFS
  Issue Type: Test
  Components: tools
Affects Versions: 2.2.0-alpha
Reporter: Colin Patrick McCabe
Priority: Minor


We should fix TestDistCp so that it actually runs, rather than being ignored.

{code}
@ignore
public class TestDistCp {
  private static final Log LOG = LogFactory.getLog(TestDistCp.class);
  private static ListPath pathList = new ArrayListPath();
  ...
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.


 [ 
https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3849:
-

   Resolution: Fixed
Fix Version/s: 2.2.0-alpha
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2.

Thanks a lot for the contribution, Colin.

 When re-loading the FSImage, we should clear the existing genStamp and leases.
 --

 Key: HDFS-3849
 URL: https://issues.apache.org/jira/browse/HDFS-3849
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.2.0-alpha
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Critical
 Fix For: 2.2.0-alpha

 Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch, 
 HDFS-3849.003.patch


 When re-loading the FSImage, we should clear the existing genStamp and leases.
 This is an issue in the 2NN, because it sometimes clears the existing FSImage 
 and reloads a new one in order to get back in sync with the NN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log


 [ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3864:
-

Attachment: HDFS-3864.patch

Thanks a lot for the quick review, Todd.

Here's an updated patch which lowers the sleep time to 10 milliseconds.

 NN does not update internal file mtime for OP_CLOSE when reading from the 
 edit log
 --

 Key: HDFS-3864
 URL: https://issues.apache.org/jira/browse/HDFS-3864
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: HDFS-3864.patch, HDFS-3864.patch


 When logging an OP_CLOSE to the edit log, the NN writes out an updated file 
 mtime and atime. However, when reading in an OP_CLOSE from the edit log, the 
 NN does not apply these values to the in-memory FS data structure. Because of 
 this, a file's mtime or atime may appear to go back in time after an NN 
 restart, or an HA failover.
 Most of the time this will be harmless and folks won't notice, but in the 
 event one of these files is being used in the distributed cache of an MR job 
 when an HA failover occurs, the job might notice that the mtime of a cache 
 file has changed, which in MR2 will cause the job to fail with an exception 
 like the following:
 {noformat}
 java.io.IOException: Resource 
 hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
  changed on src filesystem (expected 1342137814599, was 1342137814473
   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {noformat}
 Credit to Sujay Rau for discovering this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.


[ 
https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443598#comment-13443598
 ] 

Hudson commented on HDFS-3849:
--

Integrated in Hadoop-Hdfs-trunk-Commit #2716 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2716/])
HDFS-3849. When re-loading the FSImage, we should clear the existing 
genStamp and leases. Contributed by Colin Patrick McCabe. (Revision 1378364)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1378364
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java


 When re-loading the FSImage, we should clear the existing genStamp and leases.
 --

 Key: HDFS-3849
 URL: https://issues.apache.org/jira/browse/HDFS-3849
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.2.0-alpha
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Critical
 Fix For: 2.2.0-alpha

 Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch, 
 HDFS-3849.003.patch


 When re-loading the FSImage, we should clear the existing genStamp and leases.
 This is an issue in the 2NN, because it sometimes clears the existing FSImage 
 and reloads a new one in order to get back in sync with the NN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Comment Edited] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log


[ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443586#comment-13443586
 ] 

Aaron T. Myers edited comment on HDFS-3864 at 8/29/12 9:21 AM:
---

Here's a patch which addresses the issue. Fortunately, the fix is quite simple 
- just apply the values that we read in from the edit log.

In addition to the automated test provided in the patch, I also tested this 
manually on an HA cluster and confirmed that MR jobs no longer experience the 
distributed cache object changed errors which caused this issue to be 
discovered.

  was (Author: atm):
Here's a patch which addresses the issue. Fortunately, the fix is quite 
simply - just apply the values that we read in from the edit log.

In addition to the automated test provided in the patch, I also tested this 
manually on an HA cluster and confirmed that MR jobs no longer experience the 
:distributed cache object changed errors which caused this issue to be 
discovered.
  
 NN does not update internal file mtime for OP_CLOSE when reading from the 
 edit log
 --

 Key: HDFS-3864
 URL: https://issues.apache.org/jira/browse/HDFS-3864
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: HDFS-3864.patch, HDFS-3864.patch


 When logging an OP_CLOSE to the edit log, the NN writes out an updated file 
 mtime and atime. However, when reading in an OP_CLOSE from the edit log, the 
 NN does not apply these values to the in-memory FS data structure. Because of 
 this, a file's mtime or atime may appear to go back in time after an NN 
 restart, or an HA failover.
 Most of the time this will be harmless and folks won't notice, but in the 
 event one of these files is being used in the distributed cache of an MR job 
 when an HA failover occurs, the job might notice that the mtime of a cache 
 file has changed, which in MR2 will cause the job to fail with an exception 
 like the following:
 {noformat}
 java.io.IOException: Resource 
 hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
  changed on src filesystem (expected 1342137814599, was 1342137814473
   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {noformat}
 Credit to Sujay Rau for discovering this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-2264) NamenodeProtocol has the wrong value for clientPrincipal in KerberosInfo annotation

[
https://issues.apache.org/jira/browse/HDFS-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443610#comment-13443610
]

Aaron T. Myers commented on HDFS-2264:
--

Hey Jitendra, sorry for forgetting about this JIRA for so long (almost exactly
a year!)

I just encountered this issue again in a user's cluster. My new thinking is
that we should just remove the expected client principal from the
NamenodeProtocol entirely. I think this makes sense the 2NN, SBN, BN, and
balancer all potentially use this interface, so there's no single client
principal that could reasonably be expected. The balancer, in particular,
should be able to be run from any node, even one not running a daemon at all.

I think to do what I propose here all we have to do is remove the
clientPrincipal parameter from the SecurityInfo annotation on the
NamenodeProtocol, and make sure that all of the methods exposed by this
interface definitely check for super user privileges. I think most of them do,
but we should ensure that they all do.

How does this sound to you?

NamenodeProtocol has the wrong value for clientPrincipal in KerberosInfo
annotation
---

Key: HDFS-2264
URL: https://issues.apache.org/jira/browse/HDFS-2264
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.23.0
Reporter: Aaron T. Myers
Assignee: Harsh J
Fix For: 0.24.0

Attachments: HDFS-2264.r1.diff

The {{@KerberosInfo}} annotation specifies the expected server and client
principals for a given protocol in order to look up the correct principal
name from the config. The {{NamenodeProtocol}} has the wrong value for the
client config key. This wasn't noticed because most setups actually use the
same *value* for for both the NN and 2NN principals ({{hdfs/_HOST@REALM}}),
in which the {{_HOST}} part gets replaced at run-time. This bug therefore
only manifests itself on secure setups which explicitly specify the NN and
2NN principals.

[jira] [Comment Edited] (HDFS-2264) NamenodeProtocol has the wrong value for clientPrincipal in KerberosInfo annotation

[
https://issues.apache.org/jira/browse/HDFS-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443610#comment-13443610
]

Aaron T. Myers edited comment on HDFS-2264 at 8/29/12 9:45 AM:
---

Hey Jitendra, sorry for forgetting about this JIRA for so long (almost exactly
a year!)

I just encountered this issue again in a user's cluster. My new thinking is
that we should just remove the expected client principal from the
NamenodeProtocol entirely. I think this makes sense since the 2NN, SBN, BN, and
balancer all potentially use this interface, so there's no single client
principal that could reasonably be expected. The balancer, in particular,
should be able to be run from any node, even one not running a daemon at all.

How does this sound to you?

was (Author: atm):
Hey Jitendra, sorry for forgetting about this JIRA for so long (almost
exactly a year!)

How does this sound to you?

NamenodeProtocol has the wrong value for clientPrincipal in KerberosInfo
annotation
---

Attachments: HDFS-2264.r1.diff

[jira] [Commented] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.


[ 
https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443624#comment-13443624
 ] 

Hudson commented on HDFS-3849:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #2682 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2682/])
HDFS-3849. When re-loading the FSImage, we should clear the existing 
genStamp and leases. Contributed by Colin Patrick McCabe. (Revision 1378364)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1378364
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java


 When re-loading the FSImage, we should clear the existing genStamp and leases.
 --

 Key: HDFS-3849
 URL: https://issues.apache.org/jira/browse/HDFS-3849
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.2.0-alpha
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Critical
 Fix For: 2.2.0-alpha

 Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch, 
 HDFS-3849.003.patch


 When re-loading the FSImage, we should clear the existing genStamp and leases.
 This is an issue in the 2NN, because it sometimes clears the existing FSImage 
 and reloads a new one in order to get back in sync with the NN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file


 [ 
https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HDFS-3466:


Attachment: hdfs-3466-b1-2.patch

Here's a patch that incorporates Eli's feedback.

 The SPNEGO filter for the NameNode should come out of the web keytab file
 -

 Key: HDFS-3466
 URL: https://issues.apache.org/jira/browse/HDFS-3466
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node, security
Affects Versions: 1.1.0, 2.0.0-alpha
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hdfs-3466-b1-2.patch, hdfs-3466-b1.patch, 
 hdfs-3466-trunk.patch


 Currently, the spnego filter uses the DFS_NAMENODE_KEYTAB_FILE_KEY to find 
 the keytab. It should use the DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY to 
 do it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file


 [ 
https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HDFS-3466:


Attachment: hdfs-3466-trunk.patch

 The SPNEGO filter for the NameNode should come out of the web keytab file
 -

 Key: HDFS-3466
 URL: https://issues.apache.org/jira/browse/HDFS-3466
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node, security
Affects Versions: 1.1.0, 2.0.0-alpha
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hdfs-3466-b1-2.patch, hdfs-3466-b1.patch, 
 hdfs-3466-trunk-2.patch


 Currently, the spnego filter uses the DFS_NAMENODE_KEYTAB_FILE_KEY to find 
 the keytab. It should use the DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY to 
 do it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file


 [ 
https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HDFS-3466:


Attachment: (was: hdfs-3466-trunk.patch)

 The SPNEGO filter for the NameNode should come out of the web keytab file
 -

 Key: HDFS-3466
 URL: https://issues.apache.org/jira/browse/HDFS-3466
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node, security
Affects Versions: 1.1.0, 2.0.0-alpha
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hdfs-3466-b1-2.patch, hdfs-3466-b1.patch, 
 hdfs-3466-trunk-2.patch


 Currently, the spnego filter uses the DFS_NAMENODE_KEYTAB_FILE_KEY to find 
 the keytab. It should use the DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY to 
 do it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file


 [ 
https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HDFS-3466:


Attachment: (was: hdfs-3466-trunk.patch)

 The SPNEGO filter for the NameNode should come out of the web keytab file
 -

 Key: HDFS-3466
 URL: https://issues.apache.org/jira/browse/HDFS-3466
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node, security
Affects Versions: 1.1.0, 2.0.0-alpha
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hdfs-3466-b1-2.patch, hdfs-3466-b1.patch, 
 hdfs-3466-trunk-2.patch


 Currently, the spnego filter uses the DFS_NAMENODE_KEYTAB_FILE_KEY to find 
 the keytab. It should use the DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY to 
 do it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file


 [ 
https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HDFS-3466:


Attachment: hdfs-3466-trunk-2.patch

 The SPNEGO filter for the NameNode should come out of the web keytab file
 -

 Key: HDFS-3466
 URL: https://issues.apache.org/jira/browse/HDFS-3466
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node, security
Affects Versions: 1.1.0, 2.0.0-alpha
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hdfs-3466-b1-2.patch, hdfs-3466-b1.patch, 
 hdfs-3466-trunk-2.patch


 Currently, the spnego filter uses the DFS_NAMENODE_KEYTAB_FILE_KEY to find 
 the keytab. It should use the DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY to 
 do it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file

[
https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443645#comment-13443645
]

Hadoop QA commented on HDFS-3466:
-

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12542858/hdfs-3466-trunk-2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 javac. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3115//console

This message is automatically generated.

The SPNEGO filter for the NameNode should come out of the web keytab file
-

Key: HDFS-3466
URL: https://issues.apache.org/jira/browse/HDFS-3466
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node, security
Affects Versions: 1.1.0, 2.0.0-alpha
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Attachments: hdfs-3466-b1-2.patch, hdfs-3466-b1.patch,
hdfs-3466-trunk-2.patch

Currently, the spnego filter uses the DFS_NAMENODE_KEYTAB_FILE_KEY to find
the keytab. It should use the DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY to
do it.

[jira] [Commented] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file

[
https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443654#comment-13443654
]

Hadoop QA commented on HDFS-3466:
-

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12542858/hdfs-3466-trunk-2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 javac. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3116//console

This message is automatically generated.

The SPNEGO filter for the NameNode should come out of the web keytab file
-

Currently, the spnego filter uses the DFS_NAMENODE_KEYTAB_FILE_KEY to find
the keytab. It should use the DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY to
do it.

[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log

[
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443657#comment-13443657
]

Hadoop QA commented on HDFS-3864:
-

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12542840/HDFS-3864.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 1 new or modified test
files.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests in
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestHftpDelegationToken

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/3113//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HDFS-Build/3113//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3113//console

This message is automatically generated.

NN does not update internal file mtime for OP_CLOSE when reading from the
edit log
--

Key: HDFS-3864
URL: https://issues.apache.org/jira/browse/HDFS-3864
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Attachments: HDFS-3864.patch, HDFS-3864.patch

When logging an OP_CLOSE to the edit log, the NN writes out an updated file
mtime and atime. However, when reading in an OP_CLOSE from the edit log, the
NN does not apply these values to the in-memory FS data structure. Because of
this, a file's mtime or atime may appear to go back in time after an NN
restart, or an HA failover.
Most of the time this will be harmless and folks won't notice, but in the
event one of these files is being used in the distributed cache of an MR job
when an HA failover occurs, the job might notice that the mtime of a cache
file has changed, which in MR2 will cause the job to fail with an exception
like the following:
{noformat}
java.io.IOException: Resource
hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
changed on src filesystem (expected 1342137814599, was 1342137814473
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{noformat}
Credit to Sujay Rau for discovering this issue.

[jira] [Updated] (HDFS-3855) Replace hardcoded strings with the already defined config keys in DataNode.java

2012-08-28 Thread Brandon Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-3855:
-

Description: Replace hardcoded strings with the already defined config keys 
in DataNode.java 

 Replace hardcoded strings with the already defined config keys in 
 DataNode.java 
 

 Key: HDFS-3855
 URL: https://issues.apache.org/jira/browse/HDFS-3855
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 1.2.0
Reporter: Brandon Li
Assignee: Brandon Li
Priority: Trivial
 Attachments: HDFS-3855.branch-1.patch


 Replace hardcoded strings with the already defined config keys in 
 DataNode.java 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3135) Build a war file for HttpFS instead of packaging the server (tomcat) along with the application.

2012-08-28 Thread Ryan Hennig (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443710#comment-13443710
]

Ryan Hennig commented on HDFS-3135:
---

I'm troubleshooting a broken build that fails on the Tomcat download, because
our Jenkins server doesn't have internet access (by design). Rather, all
components are supposed to be fetched from our internal Maven Repository
(Artifactory). So while I don't need the war file change, I do think this
direct download should be removed.

Build a war file for HttpFS instead of packaging the server (tomcat) along
with the application.

Key: HDFS-3135
URL: https://issues.apache.org/jira/browse/HDFS-3135
Project: Hadoop HDFS
Issue Type: Improvement
Components: build
Affects Versions: 0.23.2
Reporter: Ravi Prakash
Labels: build

There are several reason why web applications should not be packaged along
with the server that is expected to serve them. For one not all organisations
use vanilla tomcat. There are other reasons I won't go into.
I'm filing this bug because some of our builds failed in trying to download
the tomcat.tar.gz file. We then had to manually wget the file and place it in
downloads/ to make the build pass. I suspect the download failed because of
an overloaded server (Frankly, I don't really know). If someone has ideas,
please share them.

[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log

[
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443712#comment-13443712
]

Hadoop QA commented on HDFS-3864:
-

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12542846/HDFS-3864.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 1 new or modified test
files.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests in
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestHftpDelegationToken
org.apache.hadoop.hdfs.web.TestWebHDFS
org.apache.hadoop.hdfs.server.datanode.TestBPOfferService

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/3114//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HDFS-Build/3114//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3114//console

This message is automatically generated.

NN does not update internal file mtime for OP_CLOSE when reading from the
edit log
--

[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log


[ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443715#comment-13443715
 ] 

Aaron T. Myers commented on HDFS-3864:
--

The findbugs warning is unrelated and I'm confident that the test failures are 
unrelated as well.

I'm going to commit this patch momentarily.

 NN does not update internal file mtime for OP_CLOSE when reading from the 
 edit log
 --

 Key: HDFS-3864
 URL: https://issues.apache.org/jira/browse/HDFS-3864
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: HDFS-3864.patch, HDFS-3864.patch


 When logging an OP_CLOSE to the edit log, the NN writes out an updated file 
 mtime and atime. However, when reading in an OP_CLOSE from the edit log, the 
 NN does not apply these values to the in-memory FS data structure. Because of 
 this, a file's mtime or atime may appear to go back in time after an NN 
 restart, or an HA failover.
 Most of the time this will be harmless and folks won't notice, but in the 
 event one of these files is being used in the distributed cache of an MR job 
 when an HA failover occurs, the job might notice that the mtime of a cache 
 file has changed, which in MR2 will cause the job to fail with an exception 
 like the following:
 {noformat}
 java.io.IOException: Resource 
 hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
  changed on src filesystem (expected 1342137814599, was 1342137814473
   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {noformat}
 Credit to Sujay Rau for discovering this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log


 [ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3864:
-

   Resolution: Fixed
Fix Version/s: 2.2.0-alpha
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2. Thanks a lot for the review, 
Todd.

 NN does not update internal file mtime for OP_CLOSE when reading from the 
 edit log
 --

 Key: HDFS-3864
 URL: https://issues.apache.org/jira/browse/HDFS-3864
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: 2.2.0-alpha

 Attachments: HDFS-3864.patch, HDFS-3864.patch


 When logging an OP_CLOSE to the edit log, the NN writes out an updated file 
 mtime and atime. However, when reading in an OP_CLOSE from the edit log, the 
 NN does not apply these values to the in-memory FS data structure. Because of 
 this, a file's mtime or atime may appear to go back in time after an NN 
 restart, or an HA failover.
 Most of the time this will be harmless and folks won't notice, but in the 
 event one of these files is being used in the distributed cache of an MR job 
 when an HA failover occurs, the job might notice that the mtime of a cache 
 file has changed, which in MR2 will cause the job to fail with an exception 
 like the following:
 {noformat}
 java.io.IOException: Resource 
 hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
  changed on src filesystem (expected 1342137814599, was 1342137814473
   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {noformat}
 Credit to Sujay Rau for discovering this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log


[ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443737#comment-13443737
 ] 

Hudson commented on HDFS-3864:
--

Integrated in Hadoop-Hdfs-trunk-Commit #2717 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2717/])
HDFS-3864. NN does not update internal file mtime for OP_CLOSE when reading 
from the edit log. Contributed by Aaron T. Myers. (Revision 1378413)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1378413
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestModTime.java


 NN does not update internal file mtime for OP_CLOSE when reading from the 
 edit log
 --

 Key: HDFS-3864
 URL: https://issues.apache.org/jira/browse/HDFS-3864
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: 2.2.0-alpha

 Attachments: HDFS-3864.patch, HDFS-3864.patch


 When logging an OP_CLOSE to the edit log, the NN writes out an updated file 
 mtime and atime. However, when reading in an OP_CLOSE from the edit log, the 
 NN does not apply these values to the in-memory FS data structure. Because of 
 this, a file's mtime or atime may appear to go back in time after an NN 
 restart, or an HA failover.
 Most of the time this will be harmless and folks won't notice, but in the 
 event one of these files is being used in the distributed cache of an MR job 
 when an HA failover occurs, the job might notice that the mtime of a cache 
 file has changed, which in MR2 will cause the job to fail with an exception 
 like the following:
 {noformat}
 java.io.IOException: Resource 
 hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
  changed on src filesystem (expected 1342137814599, was 1342137814473
   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {noformat}
 Credit to Sujay Rau for discovering this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log


[ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443738#comment-13443738
 ] 

Hudson commented on HDFS-3864:
--

Integrated in Hadoop-Common-trunk-Commit #2654 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2654/])
HDFS-3864. NN does not update internal file mtime for OP_CLOSE when reading 
from the edit log. Contributed by Aaron T. Myers. (Revision 1378413)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1378413
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestModTime.java


 NN does not update internal file mtime for OP_CLOSE when reading from the 
 edit log
 --

 Key: HDFS-3864
 URL: https://issues.apache.org/jira/browse/HDFS-3864
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: 2.2.0-alpha

 Attachments: HDFS-3864.patch, HDFS-3864.patch


 When logging an OP_CLOSE to the edit log, the NN writes out an updated file 
 mtime and atime. However, when reading in an OP_CLOSE from the edit log, the 
 NN does not apply these values to the in-memory FS data structure. Because of 
 this, a file's mtime or atime may appear to go back in time after an NN 
 restart, or an HA failover.
 Most of the time this will be harmless and folks won't notice, but in the 
 event one of these files is being used in the distributed cache of an MR job 
 when an HA failover occurs, the job might notice that the mtime of a cache 
 file has changed, which in MR2 will cause the job to fail with an exception 
 like the following:
 {noformat}
 java.io.IOException: Resource 
 hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
  changed on src filesystem (expected 1342137814599, was 1342137814473
   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {noformat}
 Credit to Sujay Rau for discovering this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log


[ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443752#comment-13443752
 ] 

Hudson commented on HDFS-3864:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #2683 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2683/])
HDFS-3864. NN does not update internal file mtime for OP_CLOSE when reading 
from the edit log. Contributed by Aaron T. Myers. (Revision 1378413)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1378413
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestModTime.java


 NN does not update internal file mtime for OP_CLOSE when reading from the 
 edit log
 --

 Key: HDFS-3864
 URL: https://issues.apache.org/jira/browse/HDFS-3864
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: 2.2.0-alpha

 Attachments: HDFS-3864.patch, HDFS-3864.patch


 When logging an OP_CLOSE to the edit log, the NN writes out an updated file 
 mtime and atime. However, when reading in an OP_CLOSE from the edit log, the 
 NN does not apply these values to the in-memory FS data structure. Because of 
 this, a file's mtime or atime may appear to go back in time after an NN 
 restart, or an HA failover.
 Most of the time this will be harmless and folks won't notice, but in the 
 event one of these files is being used in the distributed cache of an MR job 
 when an HA failover occurs, the job might notice that the mtime of a cache 
 file has changed, which in MR2 will cause the job to fail with an exception 
 like the following:
 {noformat}
 java.io.IOException: Resource 
 hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
  changed on src filesystem (expected 1342137814599, was 1342137814473
   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {noformat}
 Credit to Sujay Rau for discovering this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-1490) TransferFSImage should timeout

2012-08-28 Thread Vinay (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443791#comment-13443791
]

Vinay commented on HDFS-1490:
-

{quote}Why not introduce a new config which defaults to something like 1
minute?{quote}
Ok, agree. Will introduce new config for this.
{quote}In the test case, shouldn't you somehow notify the servlet to exit?
Currently it waits on itself, but nothing notifies it.{quote}
That was just added make the client call get timeout. Ideally while stopping
the server, that will be interrupted. Anyway I will add a timeout for that also.

Thanks todd, for comments. I will post new patch in sometime.

TransferFSImage should timeout
--

[jira] [Commented] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file


[ 
https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443810#comment-13443810
 ] 

Eli Collins commented on HDFS-3466:
---

Hey Owen, I think you meant to remove the 2nd initialization of httpKeytab.
 
{code}
+String httpKeytab = conf.get(
+  DFSConfigKeys.DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY);
+if (httpKeytab == null) {
+  httpKeytab = conf.get(DFSConfigKeys.DFS_NAMENODE_KEYTAB_FILE_KEY);
+}
 String httpKeytab = conf
   .get(DFSConfigKeys.DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY);
{code}

 The SPNEGO filter for the NameNode should come out of the web keytab file
 -

 Key: HDFS-3466
 URL: https://issues.apache.org/jira/browse/HDFS-3466
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node, security
Affects Versions: 1.1.0, 2.0.0-alpha
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hdfs-3466-b1-2.patch, hdfs-3466-b1.patch, 
 hdfs-3466-trunk-2.patch


 Currently, the spnego filter uses the DFS_NAMENODE_KEYTAB_FILE_KEY to find 
 the keytab. It should use the DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY to 
 do it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3865) TestDistCp is @ignored


[ 
https://issues.apache.org/jira/browse/HDFS-3865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443812#comment-13443812
 ] 

Eli Collins commented on HDFS-3865:
---

Looks like some of the tests are commented out as well (eg 
testUniformSizeDistCp).


 TestDistCp is @ignored
 --

 Key: HDFS-3865
 URL: https://issues.apache.org/jira/browse/HDFS-3865
 Project: Hadoop HDFS
  Issue Type: Test
  Components: tools
Affects Versions: 2.2.0-alpha
Reporter: Colin Patrick McCabe
Priority: Minor

 We should fix TestDistCp so that it actually runs, rather than being ignored.
 {code}
 @ignore
 public class TestDistCp {
   private static final Log LOG = LogFactory.getLog(TestDistCp.class);
   private static ListPath pathList = new ArrayListPath();
   ...
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (HDFS-282) Serialize ipcPort in DatanodeID instead of DatanodeRegistration and DatanodeInfo


 [ 
https://issues.apache.org/jira/browse/HDFS-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HDFS-282.
--

Resolution: Not A Problem

No longer an issue now that the writable methods have been removed.

 Serialize ipcPort in DatanodeID instead of DatanodeRegistration and 
 DatanodeInfo
 

 Key: HDFS-282
 URL: https://issues.apache.org/jira/browse/HDFS-282
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz Wo (Nicholas), SZE

 The field DatanodeID.ipcPort is currently serialized in DatanodeRegistration 
 and DatanodeInfo.  Once HADOOP-2797 (remove the codes for handling old layout 
 ) is committed, DatanodeID.ipcPort should be serialized in DatanodeID.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning


[ 
https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443827#comment-13443827
 ] 

Eli Collins commented on HDFS-3837:
---

I investigated some more and confirmed findbugs isn't searching back far enough 
for the common subclass. Eg if I swap variables in the equals call I get:

{noformat}
org.apache.hadoop.hdfs.protocol.DatanodeInfo.equals(Object) used to determine 
equality
org.apache.hadoop.hdfs.server.common.JspHelper$NodeRecord.equals(Object) used 
to determine equality
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.equals(Object) 
used to determine equality
At DataNode.java:[line 1871]
{noformat}

It stops at DatanodeDescriptor#equals even though this calls super.equals 
(DatanodeInfo) which calls super.equals (DatanodeID). Just like the current 
warning stops at DatanodeRegistration#equals which calls super.equals 
(DatanodeID).

It would be better (and findbugs wouldn't choke) if the various classes that 
extend DatanodeID have a member instead. I looked at this for HDFS-3237 and it 
required a ton of changes that probably aren't worth it.

Given this I'll update the patch per your suggestion Surresh to ignore the 
warning in DataNode#recoverBlock.

 Fix DataNode.recoverBlock findbugs warning
 --

 Key: HDFS-3837
 URL: https://issues.apache.org/jira/browse/HDFS-3837
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt, 
 hdfs-3837.txt


 HDFS-2686 introduced the following findbugs warning:
 {noformat}
 Call to equals() comparing different types in 
 org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock)
 {noformat}
 Both are using DatanodeID#equals but it's a different method because 
 DNR#equals overrides equals for some reason (doesn't change behavior).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning