[jira] [Commented] (HDFS-6518) TestCacheDirectives#testExceedsCapacity fails intermittently

2014-06-13 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030330#comment-14030330
 ] 

Yongjun Zhang commented on HDFS-6518:
-

Hi [~andrew.wang], thanks for looking into this issue. Since I found this 
problem while working on HDFS-6475, a review of my patch for HDFS-6475 would 
be very much appreciated too :-) Thanks.
 
 

> TestCacheDirectives#testExceedsCapacity fails intermittently
> 
>
> Key: HDFS-6518
> URL: https://issues.apache.org/jira/browse/HDFS-6518
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Yongjun Zhang
>Assignee: Andrew Wang
> Attachments: HDFS-6518.001.patch
>
>
> Observed from 
> https://builds.apache.org/job/PreCommit-HDFS-Build/7080//testReport/
> Test 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity
> fails intermittently
> {code}
> Failing for the past 1 build (Since Failed#7080 )
> Took 7.3 sec.
> Stacktrace
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1416)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1437)
> {code}
> A second run with the same code was successful:
> https://builds.apache.org/job/PreCommit-HDFS-Build/7082//testReport/
> Running it locally is also successful.
> HDFS-6257 mentioned a possible race; maybe the issue is still there.
> Thanks.
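
One way to harden an assertion like checkPendingCachedEmpty() against such a 
race (a sketch only, not the attached patch; the helper chain and the 
poll/timeout values are assumptions about the test context) is to poll until 
the pending-cached lists drain instead of asserting once:

{code}
// Poll instead of asserting once; GenericTestUtils.waitFor retries the check.
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    for (DatanodeDescriptor dn : cluster.getNamesystem().getBlockManager()
        .getDatanodeManager().getDatanodeListForReport(DatanodeReportType.ALL)) {
      if (!dn.getPendingCached().isEmpty()) {
        return false;  // a DN still has pending-cached blocks; keep waiting
      }
    }
    return true;
  }
}, 100, 10000);  // check every 100 ms, give up after 10 s
{code}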



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better

2014-06-13 Thread Zesheng Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6507:
-

Tags: dfsadmin
Target Version/s: 3.0.0, 2.5.0  (was: 3.0.0)
  Status: Patch Available  (was: Open)

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category are generally implemented by calling the 
> corresponding DFSClient method, which ultimately invokes the corresponding 
> remote implementation on the NN side. On the NN side, all these operations 
> are classified into five categories: UNCHECKED, READ, WRITE, CHECKPOINT, 
> JOURNAL. The Active NN allows all operations, while the Standby NN allows 
> only UNCHECKED operations. In the current implementation, DFSClient connects 
> to one NN first; if that NN is not Active and the operation is not allowed, 
> it fails over to the second NN. So here is the problem: some of the DFSAdmin 
> commands (setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, 
> setBalancerBandwidth, metaSave) are classified as UNCHECKED operations, so 
> when they are executed from the DFSAdmin command line they are sent to one 
> definite NN, regardless of whether it is Active or Standby. This may result 
> in two problems: 
> a. If the first NN tried is the Standby, the operation takes effect only on 
> the Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NNs but takes effect on 
> only one, there may be problems later when an NN failover occurs.
> Here I propose the following improvements:
> a. If a command can be classified as one of the READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If a command cannot be classified as one of the above four operations, or 
> if it needs to take effect on both NNs, we should send the request to both 
> the Active and the Standby NN.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to a remote NN. In the current implementation, these 
> requests are sent to one definite NN, regardless of whether it is Active or 
> Standby. Here I propose that we send these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly; no improvement is needed.
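
To make proposal 2 concrete, below is a minimal sketch of sending one of the 
refresh calls to every NN of every nameservice instead of to a single 
resolved NN. It assumes the existing DFSUtil and NameNodeProxies helpers; 
error handling and the remaining refresh protocols are elided:

{code}
// Sketch: resolve all NN RPC addresses and refresh each one directly,
// so the command takes effect on Active and Standby alike.
Map<String, Map<String, InetSocketAddress>> nnMap =
    DFSUtil.getHaNnRpcAddresses(getConf());
for (Map<String, InetSocketAddress> nns : nnMap.values()) {
  for (InetSocketAddress addr : nns.values()) {
    RefreshAuthorizationPolicyProtocol proxy =
        NameNodeProxies.createNonHAProxy(getConf(), addr,
            RefreshAuthorizationPolicyProtocol.class,
            UserGroupInformation.getCurrentUser(), false).getProxy();
    proxy.refreshServiceAcl();
  }
}
{code}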



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better

2014-06-13 Thread Zesheng Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6507:
-

Attachment: HDFS-6507.1.patch

Attached initial version of the implementation.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category are generally implemented by calling the 
> corresponding DFSClient method, which ultimately invokes the corresponding 
> remote implementation on the NN side. On the NN side, all these operations 
> are classified into five categories: UNCHECKED, READ, WRITE, CHECKPOINT, 
> JOURNAL. The Active NN allows all operations, while the Standby NN allows 
> only UNCHECKED operations. In the current implementation, DFSClient connects 
> to one NN first; if that NN is not Active and the operation is not allowed, 
> it fails over to the second NN. So here is the problem: some of the DFSAdmin 
> commands (setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, 
> setBalancerBandwidth, metaSave) are classified as UNCHECKED operations, so 
> when they are executed from the DFSAdmin command line they are sent to one 
> definite NN, regardless of whether it is Active or Standby. This may result 
> in two problems: 
> a. If the first NN tried is the Standby, the operation takes effect only on 
> the Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NNs but takes effect on 
> only one, there may be problems later when an NN failover occurs.
> Here I propose the following improvements:
> a. If a command can be classified as one of the READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If a command cannot be classified as one of the above four operations, or 
> if it needs to take effect on both NNs, we should send the request to both 
> the Active and the Standby NN.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to a remote NN. In the current implementation, these 
> requests are sent to one definite NN, regardless of whether it is Active or 
> Standby. Here I propose that we send these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly; no improvement is needed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-4667) Capture renamed files/directories in snapshot diff report

2014-06-13 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4667:


Attachment: HDFS-4667.003.patch

Updated the patch to add several new unit tests and to fix a bug in capturing 
the correct source/target names of rename operations.

> Capture renamed files/directories in snapshot diff report
> -
>
> Key: HDFS-4667
> URL: https://issues.apache.org/jira/browse/HDFS-4667
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Jing Zhao
>Assignee: Binglin Chang
> Attachments: HDFS-4667.002.patch, HDFS-4667.002.patch, 
> HDFS-4667.003.patch, HDFS-4667.demo.patch, HDFS-4667.v1.patch, 
> getfullname-snapshot-support.patch
>
>
> Currently in the diff report we only show file/dir creation, deletion and 
> modification. After rename with snapshots is supported, renamed file/dir 
> should also be captured in the diff report.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6525) FsShell supports HDFS TTL

2014-06-13 Thread Zesheng Wu (JIRA)
Zesheng Wu created HDFS-6525:


 Summary: FsShell supports HDFS TTL
 Key: HDFS-6525
 URL: https://issues.apache.org/jira/browse/HDFS-6525
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, tools
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu


This issue tracks the development of HDFS TTL support for FsShell; for 
details see HDFS-6382.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6526) Implement HDFS TtlManager

2014-06-13 Thread Zesheng Wu (JIRA)
Zesheng Wu created HDFS-6526:


 Summary: Implement HDFS TtlManager
 Key: HDFS-6526
 URL: https://issues.apache.org/jira/browse/HDFS-6526
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu


This issue is used to track development of HDFS TtlManager, for details see 
HDFS -6382.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL

2014-06-13 Thread Zesheng Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030366#comment-14030366
 ] 

Zesheng Wu commented on HDFS-6382:
--

I filed two sub-tasks to track the development of this feature.

> HDFS File/Directory TTL
> ---
>
> Key: HDFS-6382
> URL: https://issues.apache.org/jira/browse/HDFS-6382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-TTL-Design -2.pdf, HDFS-TTL-Design.pdf
>
>
> In production environments we often have a scenario like this: we want to 
> back up files on HDFS for some period and then delete them automatically. 
> For example, we keep only 1 day's logs on local disk due to limited disk 
> space, but we need to keep about 1 month's logs in order to debug program 
> bugs, so we keep all the logs on HDFS and delete logs that are older than 1 
> month. This is a typical scenario for HDFS TTL, so here we propose that HDFS 
> support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL expires
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL expires
> 4. A child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to control whether deleted 
> files/directories go to the trash
> 6. A global configuration is needed to control whether a directory with a 
> TTL should be deleted when the TTL mechanism empties it.
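
As a rough illustration of points 1-4 (a sketch under assumed names only: the 
design doc does not fix a storage format, and a "user.ttl" xattr holding an 
absolute expiry time in milliseconds is purely hypothetical), a TTL sweep 
could look like:

{code}
// Hypothetical TTL sweep: delete paths whose "user.ttl" xattr has expired.
// A child's own TTL takes precedence over its parent's (point 4).
void sweep(FileSystem fs, Path dir, long now) throws IOException {
  for (FileStatus st : fs.listStatus(dir)) {
    byte[] ttl = fs.getXAttrs(st.getPath()).get("user.ttl");
    if (ttl != null
        && now >= Long.parseLong(new String(ttl, StandardCharsets.UTF_8))) {
      fs.delete(st.getPath(), true);  // trash behavior would follow point 5
    } else if (st.isDirectory()) {
      sweep(fs, st.getPath(), now);
    }
  }
}
{code}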



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6526) Implement HDFS TtlManager

2014-06-13 Thread Zesheng Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6526:
-

Description: This issue is used to track development of HDFS TtlManager, 
for details see HDFS-6382.  (was: This issue is used to track development of 
HDFS TtlManager, for details see HDFS -6382.)

> Implement HDFS TtlManager
> -
>
> Key: HDFS-6526
> URL: https://issues.apache.org/jira/browse/HDFS-6526
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> This issue is used to track development of HDFS TtlManager, for details see 
> HDFS-6382.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6524) chooseDataNode decides retry times considering with block replica number

2014-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030374#comment-14030374
 ] 

Hadoop QA commented on HDFS-6524:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650227/HDFS-6524.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7106//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7106//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7106//console

This message is automatically generated.

> chooseDataNode decides retry times considering with block replica number
> 
>
> Key: HDFS-6524
> URL: https://issues.apache.org/jira/browse/HDFS-6524
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
>Priority: Minor
> Attachments: HDFS-6524.txt
>
>
> Currently chooseDataNode() retries according to the setting 
> dfsClientConf.maxBlockAcquireFailures, which defaults to 3 
> (DFS_CLIENT_MAX_BLOCK_ACQUIRE_FAILURES_DEFAULT = 3). It would be better to 
> have another option based on the block replication factor, e.g. for a 
> cluster configured with only two block replicas, or a Reed-Solomon encoding 
> solution with a replication factor of one. This helps reduce the long-tail 
> latency.
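
In other words (an illustrative sketch using the DFSInputStream names from 
the description, not the attached patch), the retry budget would also be 
bounded by the replica count:

{code}
// Cap refetch attempts by the block's replica count: with two replicas the
// third default round is pointless, and with one replica (e.g. data stored
// for Reed-Solomon at replication 1) the client can give up even sooner.
int replicaCount = block.getLocations().length;
int maxAttempts = Math.min(dfsClientConf.maxBlockAcquireFailures, replicaCount);
{code}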



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6518) TestCacheDirectives#testExceedsCapacity fails intermittently

2014-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030412#comment-14030412
 ] 

Hadoop QA commented on HDFS-6518:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650232/HDFS-6518.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7107//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7107//console

This message is automatically generated.

> TestCacheDirectives#testExceedsCapacity fails intermittently
> 
>
> Key: HDFS-6518
> URL: https://issues.apache.org/jira/browse/HDFS-6518
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Yongjun Zhang
>Assignee: Andrew Wang
> Attachments: HDFS-6518.001.patch
>
>
> Observed from 
> https://builds.apache.org/job/PreCommit-HDFS-Build/7080//testReport/
> Test 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity
> fails intermittently
> {code}
> Failing for the past 1 build (Since Failed#7080 )
> Took 7.3 sec.
> Stacktrace
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1416)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1437)
> {code}
> A second run with the same code was successful:
> https://builds.apache.org/job/PreCommit-HDFS-Build/7082//testReport/
> Running it locally is also successful.
> HDFS-6257 mentioned a possible race; maybe the issue is still there.
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6524) chooseDataNode decides retry times considering with block replica number

2014-06-13 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-6524:


Attachment: HDFS-6524.txt

> chooseDataNode decides retry times considering with block replica number
> 
>
> Key: HDFS-6524
> URL: https://issues.apache.org/jira/browse/HDFS-6524
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
>Priority: Minor
> Attachments: HDFS-6524.txt
>
>
> Currently chooseDataNode() retries according to the setting 
> dfsClientConf.maxBlockAcquireFailures, which defaults to 3 
> (DFS_CLIENT_MAX_BLOCK_ACQUIRE_FAILURES_DEFAULT = 3). It would be better to 
> have another option based on the block replication factor, e.g. for a 
> cluster configured with only two block replicas, or a Reed-Solomon encoding 
> solution with a replication factor of one. This helps reduce the long-tail 
> latency.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6524) chooseDataNode decides retry times considering with block replica number

2014-06-13 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-6524:


Attachment: (was: HDFS-6524.txt)

> chooseDataNode decides retry times considering with block replica number
> 
>
> Key: HDFS-6524
> URL: https://issues.apache.org/jira/browse/HDFS-6524
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
>Priority: Minor
> Attachments: HDFS-6524.txt
>
>
> Currently chooseDataNode() retries according to the setting 
> dfsClientConf.maxBlockAcquireFailures, which defaults to 3 
> (DFS_CLIENT_MAX_BLOCK_ACQUIRE_FAILURES_DEFAULT = 3). It would be better to 
> have another option based on the block replication factor, e.g. for a 
> cluster configured with only two block replicas, or a Reed-Solomon encoding 
> solution with a replication factor of one. This helps reduce the long-tail 
> latency.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better

2014-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030481#comment-14030481
 ] 

Hadoop QA commented on HDFS-6507:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650246/HDFS-6507.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestStorageRestore
  org.apache.hadoop.tools.TestTools
  org.apache.hadoop.hdfs.TestSnapshotCommands
  org.apache.hadoop.cli.TestHDFSCLI
  
org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7108//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7108//console

This message is automatically generated.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category are generally implemented by calling the 
> corresponding DFSClient method, which ultimately invokes the corresponding 
> remote implementation on the NN side. On the NN side, all these operations 
> are classified into five categories: UNCHECKED, READ, WRITE, CHECKPOINT, 
> JOURNAL. The Active NN allows all operations, while the Standby NN allows 
> only UNCHECKED operations. In the current implementation, DFSClient connects 
> to one NN first; if that NN is not Active and the operation is not allowed, 
> it fails over to the second NN. So here is the problem: some of the DFSAdmin 
> commands (setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, 
> setBalancerBandwidth, metaSave) are classified as UNCHECKED operations, so 
> when they are executed from the DFSAdmin command line they are sent to one 
> definite NN, regardless of whether it is Active or Standby. This may result 
> in two problems: 
> a. If the first NN tried is the Standby, the operation takes effect only on 
> the Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NNs but takes effect on 
> only one, there may be problems later when an NN failover occurs.
> Here I propose the following improvements:
> a. If a command can be classified as one of the READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If a command cannot be classified as one of the above four operations, or 
> if it needs to take effect on both NNs, we should send the request to both 
> the Active and the Standby NN.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to a remote NN. In the current implementation, these 
> requests are sent to one definite NN, regardless of whether it is Active or 
> Standby. Here I propose that we send these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly; no improvement is needed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better

2014-06-13 Thread Zesheng Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030507#comment-14030507
 ] 

Zesheng Wu commented on HDFS-6507:
--

Mmm, it seems that some tests failed; I will figure it out soon.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category are generally implemented by calling the 
> corresponding DFSClient method, which ultimately invokes the corresponding 
> remote implementation on the NN side. On the NN side, all these operations 
> are classified into five categories: UNCHECKED, READ, WRITE, CHECKPOINT, 
> JOURNAL. The Active NN allows all operations, while the Standby NN allows 
> only UNCHECKED operations. In the current implementation, DFSClient connects 
> to one NN first; if that NN is not Active and the operation is not allowed, 
> it fails over to the second NN. So here is the problem: some of the DFSAdmin 
> commands (setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, 
> setBalancerBandwidth, metaSave) are classified as UNCHECKED operations, so 
> when they are executed from the DFSAdmin command line they are sent to one 
> definite NN, regardless of whether it is Active or Standby. This may result 
> in two problems: 
> a. If the first NN tried is the Standby, the operation takes effect only on 
> the Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NNs but takes effect on 
> only one, there may be problems later when an NN failover occurs.
> Here I propose the following improvements:
> a. If a command can be classified as one of the READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If a command cannot be classified as one of the above four operations, or 
> if it needs to take effect on both NNs, we should send the request to both 
> the Active and the Standby NN.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to a remote NN. In the current implementation, these 
> requests are sent to one definite NN, regardless of whether it is Active or 
> Standby. Here I propose that we send these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly; no improvement is needed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4667) Capture renamed files/directories in snapshot diff report

2014-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030511#comment-14030511
 ] 

Hadoop QA commented on HDFS-4667:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650248/HDFS-4667.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7109//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7109//console

This message is automatically generated.

> Capture renamed files/directories in snapshot diff report
> -
>
> Key: HDFS-4667
> URL: https://issues.apache.org/jira/browse/HDFS-4667
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Jing Zhao
>Assignee: Binglin Chang
> Attachments: HDFS-4667.002.patch, HDFS-4667.002.patch, 
> HDFS-4667.003.patch, HDFS-4667.demo.patch, HDFS-4667.v1.patch, 
> getfullname-snapshot-support.patch
>
>
> Currently in the diff report we only show file/dir creation, deletion and 
> modification. After rename with snapshots is supported, renamed file/dir 
> should also be captured in the diff report.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6395) Skip checking xattr limits for non-user-visible namespaces

2014-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030523#comment-14030523
 ] 

Hudson commented on HDFS-6395:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #582 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/582/])
HDFS-6395. Skip checking xattr limits for non-user-visible namespaces. 
Contributed by Yi Liu. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602288)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSDirectory.java


> Skip checking xattr limits for non-user-visible namespaces
> --
>
> Key: HDFS-6395
> URL: https://issues.apache.org/jira/browse/HDFS-6395
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Yi Liu
> Fix For: 2.5.0
>
> Attachments: HDFS-6395.1.patch, HDFS-6395.patch
>
>
> It'd be nice to print messages during fsimage and editlog loading if we hit 
> either the # of xattrs per inode or the xattr size limits.
> We should also consider making the # of xattrs limit only apply to the user 
> namespace, or to each namespace separately, to prevent users from locking out 
> access to other namespaces.
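
A namespace-aware limit check could look like the sketch below (illustrative 
only; the field and variable names are assumptions, not the committed 
FSDirectory code): only user-namespace xattrs count against the per-inode 
limit, so system or security xattrs cannot lock users out:

{code}
// Count only user-visible xattrs against the per-inode limit.
int userVisible = 0;
for (XAttr x : xAttrs) {
  if (x.getNameSpace() == XAttr.NameSpace.USER) {
    userVisible++;
  }
}
Preconditions.checkState(userVisible <= inodeXAttrsLimit,
    "Cannot add more than %s user xattrs per inode", inodeXAttrsLimit);
{code}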



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-3493) Invalidate excess corrupted blocks as long as minimum replication is satisfied

2014-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030529#comment-14030529
 ] 

Hudson commented on HDFS-3493:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #582 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/582/])
Fixup CHANGES.txt message for HDFS-3493 (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602292)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
HDFS-3493. Invalidate corrupted blocks as long as minimum replication is 
satisfied. Contributed by Juan Yu and Vinayakumar B. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602291)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplication.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/BlockReportTestBase.java


> Invalidate excess corrupted blocks as long as minimum replication is satisfied
> --
>
> Key: HDFS-3493
> URL: https://issues.apache.org/jira/browse/HDFS-3493
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha, 2.0.5-alpha
>Reporter: J.Andreina
>Assignee: Juan Yu
> Fix For: 2.5.0
>
> Attachments: HDFS-3493.002.patch, HDFS-3493.003.patch, 
> HDFS-3493.004.patch, HDFS-3493.patch
>
>
> replication factor = 3, block report interval = 1 min; start NN and 3 DNs
> Step 1: Write a file without closing it and do hflush (DN1, DN2, DN3 have blk_ts1)
> Step 2: Stop DN3
> Step 3: Recovery happens and the timestamp is updated (blk_ts2)
> Step 4: Close the file
> Step 5: blk_ts2 is finalized and available on DN1 and DN2
> Step 6: Now restart DN3 (which has blk_ts1 in rbw)
> From the NN side, no command is issued to DN3 to delete blk_ts1; DN3 is only 
> asked to mark the block as corrupt.
> Replication of blk_ts2 to DN3 does not happen.
> NN logs:
> 
> {noformat}
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: duplicate requested for 
> blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by 
> /XX.XX.XX.XX because reported RWR replica with genstamp 1007 does not match 
> COMPLETE block's genstamp in block map 1008
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from 
> DatanodeRegistration(XX.XX.XX.XX, 
> storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, 
> ipcPort=50277, 
> storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0),
>  blocks: 2, processing time: 1 msecs
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* Removing block 
> blk_3927215081484173742_1008 from neededReplications as it has enough 
> replicas.
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: duplicate requested for 
> blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by 
> /XX.XX.XX.XX because reported RWR replica with genstamp 1007 does not match 
> COMPLETE block's genstamp in block map 1008
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from 
> DatanodeRegistration(XX.XX.XX.XX, 
> storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, 
> ipcPort=50277, 
> storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0),
>  blocks: 2, processing time: 1 msecs
> WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not 
> able to place enough replicas, still in need of 1 to reach 1
> For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {noformat}
> fsck Report
> ===
> {noformat}
> /file21:  Under replicated 
> BP-1008469586-XX.XX.XX.XX-1336829603103:blk_3927215081484173742_1008. Target 
> Replicas is 3 but found 2 replica(s).
> .Status: HEALTHY
>  Total size:  495 B
>  Total dirs:  1
>  Total files: 3
>  Total blocks (validated):3 (avg. block size 165 B)
>  Minimally replicated blocks: 3 (100.0 %)
>  Over-replicated blocks:  0 (0.0 %)
>  Under-replicated blocks: 1 (33.32 %)
>  Mis-replicated blocks:   0 (0.0 %)
>  Default replication factor:  1
>  Average block replication:   2.0
>  Corrupt blocks:  0
>  Missing replicas:1 (14.285714 %)
>  Number of data-nodes:3
>  Number of racks: 1
> FSCK ended at Sun May 13 09:49:05 IST 2012 in 9 milliseconds
> The filesystem under path '/' is HEALTHY
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6524) chooseDataNode decides retry times considering with block replica number

2014-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030587#comment-14030587
 ] 

Hadoop QA commented on HDFS-6524:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650259/HDFS-6524.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7110//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7110//console

This message is automatically generated.

> chooseDataNode decides retry times considering with block replica number
> 
>
> Key: HDFS-6524
> URL: https://issues.apache.org/jira/browse/HDFS-6524
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
>Priority: Minor
> Attachments: HDFS-6524.txt
>
>
> Currently chooseDataNode() retries according to the setting 
> dfsClientConf.maxBlockAcquireFailures, which defaults to 3 
> (DFS_CLIENT_MAX_BLOCK_ACQUIRE_FAILURES_DEFAULT = 3). It would be better to 
> have another option based on the block replication factor, e.g. for a 
> cluster configured with only two block replicas, or a Reed-Solomon encoding 
> solution with a replication factor of one. This helps reduce the long-tail 
> latency.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-4629) Using com.sun.org.apache.xml.internal.serialize.* in XmlEditsVisitor.java is JVM vendor specific. Breaks IBM JAVA

2014-06-13 Thread pascal oliva (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pascal oliva updated HDFS-4629:
---

Attachment: HDFS-4629-1.patch

Here is the new patch, generated from trunk.
I used OpenJDK 1.7: java version "1.7.0_55"

The patch was created with:
git diff --no-prefix trunk > ../hadoop-patches/HDFS-4629-1.patch





> Using com.sun.org.apache.xml.internal.serialize.* in XmlEditsVisitor.java is 
> JVM vendor specific. Breaks IBM JAVA
> -
>
> Key: HDFS-4629
> URL: https://issues.apache.org/jira/browse/HDFS-4629
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.0.3-alpha
> Environment: OS:fedora and RHEL (64 bit)
> Platform: x86, POWER, and SystemZ
> JVM Vendor = IBM
>Reporter: Amir Sanjar
> Attachments: HDFS-4629-1.patch, HDFS-4629.patch
>
>
> Porting to a non-JVM vendor solution by replacing:
> import com.sun.org.apache.xml.internal.serialize.OutputFormat;
> import com.sun.org.apache.xml.internal.serialize.XMLSerializer;
> with 
> import org.apache.xml.serialize.OutputFormat;
> import org.apache.xml.serialize.XMLSerializer;



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-13 Thread Kihwal Lee (JIRA)
Kihwal Lee created HDFS-6527:


 Summary: Edit log corruption due to defered INode removal
 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Priority: Blocker


We have seen a SBN crashing with the following error:
{panel}
\[Edit log tailer\] ERROR namenode.FSEditLogLoader:
Encountered exception on operation AddBlockOp
[path=/xxx,
penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
RpcCallId=-2]
java.io.FileNotFoundException: File does not exist: /xxx
{panel}

This was caused by the deferred removal of deleted inodes from the inode map. 
Since startFile() acquires the FSN read lock and then the write lock, a 
deletion can happen in between. Because the deferred inode removal happens 
outside the FSN write lock, startFile() can get the deleted inode from the 
inode map with the FSN write lock held. This allows a block to be added to a 
deleted file.

As a result, the edit log will contain OP_ADD, OP_DELETE, followed by 
OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up 
or the SBN crashes.
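
A sketch of the kind of guard this implies (illustrative only; 
isFileDeleted() is a hypothetical helper, not necessarily what the attached 
patches do): re-validate the inode after the FSN write lock is acquired, 
before allocating the block:

{code}
// Re-check under the write lock: a concurrent delete may have run between
// the read-locked analysis phase of startFile()/addBlock() and this point.
fsn.writeLock();
try {
  if (isFileDeleted(file)) {  // e.g. confirm the inode is still in the inode map
    throw new FileNotFoundException("File does not exist: " + src);
  }
  // Safe to allocate the new block and log OP_ADD_BLOCK here.
} finally {
  fsn.writeUnlock();
}
{code}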




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4629) Using com.sun.org.apache.xml.internal.serialize.* in XmlEditsVisitor.java is JVM vendor specific. Breaks IBM JAVA

2014-06-13 Thread pascal oliva (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030633#comment-14030633
 ] 

pascal oliva commented on HDFS-4629:


I added to the patch HDFS-4626-1 an update for 
hadoop-hdfs-project/hadoop-hdfs/pom.xml
to add a dependency that provides the package org.apache.xml.serialize:

+    <dependency>
+      <groupId>xerces</groupId>
+      <artifactId>xercesImpl</artifactId>
+      <version>2.9.0</version>
+    </dependency>



> Using com.sun.org.apache.xml.internal.serialize.* in XmlEditsVisitor.java is 
> JVM vendor specific. Breaks IBM JAVA
> -
>
> Key: HDFS-4629
> URL: https://issues.apache.org/jira/browse/HDFS-4629
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.0.3-alpha
> Environment: OS:fedora and RHEL (64 bit)
> Platform: x86, POWER, and SystemZ
> JVM Vendor = IBM
>Reporter: Amir Sanjar
> Attachments: HDFS-4629-1.patch, HDFS-4629.patch
>
>
> Porting to a non-JVM vendor solution by replacing:
> import com.sun.org.apache.xml.internal.serialize.OutputFormat;
> import com.sun.org.apache.xml.internal.serialize.XMLSerializer;
> with 
> import org.apache.xml.serialize.OutputFormat;
> import org.apache.xml.serialize.XMLSerializer;



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-13 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6527:
-

Status: Patch Available  (was: Open)

> Edit log corruption due to defered INode removal
> 
>
> Key: HDFS-6527
> URL: https://issues.apache.org/jira/browse/HDFS-6527
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch
>
>
> We have seen a SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode 
> map. Since startFile() acquires the FSN read lock and then the write lock, 
> a deletion can happen in between. Because the deferred inode removal happens 
> outside the FSN write lock, startFile() can get the deleted inode from the 
> inode map with the FSN write lock held. This allows a block to be added to 
> a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by 
> OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up 
> or the SBN crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-13 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee reassigned HDFS-6527:


Assignee: Kihwal Lee

> Edit log corruption due to defered INode removal
> 
>
> Key: HDFS-6527
> URL: https://issues.apache.org/jira/browse/HDFS-6527
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch
>
>
> We have seen a SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode 
> map. Since startFile() acquires the FSN read lock and then the write lock, 
> a deletion can happen in between. Because the deferred inode removal happens 
> outside the FSN write lock, startFile() can get the deleted inode from the 
> inode map with the FSN write lock held. This allows a block to be added to 
> a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by 
> OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up 
> or the SBN crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-13 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6527:
-

Attachment: HDFS-6527.trunk.patch
HDFS-6527.branch-2.4.patch

> Edit log corruption due to defered INode removal
> 
>
> Key: HDFS-6527
> URL: https://issues.apache.org/jira/browse/HDFS-6527
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch
>
>
> We have seen a SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode 
> map. Since startFile() acquires the FSN read lock and then the write lock, 
> a deletion can happen in between. Because the deferred inode removal happens 
> outside the FSN write lock, startFile() can get the deleted inode from the 
> inode map with the FSN write lock held. This allows a block to be added to 
> a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by 
> OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up 
> or the SBN crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-13 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6527:
-

Affects Version/s: 2.3.0

> Edit log corruption due to defered INode removal
> 
>
> Key: HDFS-6527
> URL: https://issues.apache.org/jira/browse/HDFS-6527
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch
>
>
> We have seen a SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode 
> map. Since startFile() acquires the FSN read lock and then the write lock, 
> a deletion can happen in between. Because the deferred inode removal happens 
> outside the FSN write lock, startFile() can get the deleted inode from the 
> inode map with the FSN write lock held. This allows a block to be added to 
> a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by 
> OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up 
> or the SBN crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4629) Using com.sun.org.apache.xml.internal.serialize.* in XmlEditsVisitor.java is JVM vendor specific. Breaks IBM JAVA

2014-06-13 Thread pascal oliva (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030640#comment-14030640
 ] 

pascal oliva commented on HDFS-4629:


Update to the previous comment:
I added to the patch HDFS-4629-1 an update for 
hadoop-hdfs-project/hadoop-hdfs/pom.xml
to add a dependency that provides the package org.apache.xml.serialize:

+    <dependency>
+      <groupId>xerces</groupId>
+      <artifactId>xercesImpl</artifactId>
+      <version>2.9.0</version>
+    </dependency>


> Using com.sun.org.apache.xml.internal.serialize.* in XmlEditsVisitor.java is 
> JVM vendor specific. Breaks IBM JAVA
> -
>
> Key: HDFS-4629
> URL: https://issues.apache.org/jira/browse/HDFS-4629
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.0.3-alpha
> Environment: OS:fedora and RHEL (64 bit)
> Platform: x86, POWER, and SystemZ
> JVM Vendor = IBM
>Reporter: Amir Sanjar
> Attachments: HDFS-4629-1.patch, HDFS-4629.patch
>
>
> Porting to a non-JVM vendor solution by replacing:
> import com.sun.org.apache.xml.internal.serialize.OutputFormat;
> import com.sun.org.apache.xml.internal.serialize.XMLSerializer;
> with 
> import org.apache.xml.serialize.OutputFormat;
> import org.apache.xml.serialize.XMLSerializer;



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-13 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6527:
-

Affects Version/s: (was: 2.3.0)

> Edit log corruption due to defered INode removal
> 
>
> Key: HDFS-6527
> URL: https://issues.apache.org/jira/browse/HDFS-6527
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch
>
>
> We have seen a SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode 
> map. Since startFile() acquires the FSN read lock and then the write lock, 
> a deletion can happen in between. Because the deferred inode removal happens 
> outside the FSN write lock, startFile() can get the deleted inode from the 
> inode map with the FSN write lock held. This allows a block to be added to 
> a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by 
> OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up 
> or the SBN crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6395) Skip checking xattr limits for non-user-visible namespaces

2014-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030650#comment-14030650
 ] 

Hudson commented on HDFS-6395:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1773 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1773/])
HDFS-6395. Skip checking xattr limits for non-user-visible namespaces. 
Contributed by Yi Liu. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602288)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSDirectory.java


> Skip checking xattr limits for non-user-visible namespaces
> --
>
> Key: HDFS-6395
> URL: https://issues.apache.org/jira/browse/HDFS-6395
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Yi Liu
> Fix For: 2.5.0
>
> Attachments: HDFS-6395.1.patch, HDFS-6395.patch
>
>
> It'd be nice to print messages during fsimage and editlog loading if we hit 
> either the # of xattrs per inode or the xattr size limits.
> We should also consider making the # of xattrs limit only apply to the user 
> namespace, or to each namespace separately, to prevent users from locking out 
> access to other namespaces.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-3493) Invalidate excess corrupted blocks as long as minimum replication is satisfied

2014-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030656#comment-14030656
 ] 

Hudson commented on HDFS-3493:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1773 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1773/])
Fixup CHANGES.txt message for HDFS-3493 (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602292)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
HDFS-3493. Invalidate corrupted blocks as long as minimum replication is 
satisfied. Contributed by Juan Yu and Vinayakumar B. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602291)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplication.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/BlockReportTestBase.java


> Invalidate excess corrupted blocks as long as minimum replication is satisfied
> --
>
> Key: HDFS-3493
> URL: https://issues.apache.org/jira/browse/HDFS-3493
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha, 2.0.5-alpha
>Reporter: J.Andreina
>Assignee: Juan Yu
> Fix For: 2.5.0
>
> Attachments: HDFS-3493.002.patch, HDFS-3493.003.patch, 
> HDFS-3493.004.patch, HDFS-3493.patch
>
>
> replication factor = 3, block report interval = 1 min; start NN and 3 DNs
> Step 1: Write a file without closing it and do hflush (DN1, DN2, DN3 have blk_ts1)
> Step 2: Stop DN3
> Step 3: Recovery happens and the timestamp is updated (blk_ts2)
> Step 4: Close the file
> Step 5: blk_ts2 is finalized and available on DN1 and DN2
> Step 6: Now restart DN3 (which has blk_ts1 in rbw)
> From the NN side, no command is issued to DN3 to delete blk_ts1; DN3 is only 
> asked to mark the block as corrupt.
> Replication of blk_ts2 to DN3 does not happen.
> NN logs:
> 
> {noformat}
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: duplicate requested for 
> blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by 
> /XX.XX.XX.XX because reported RWR replica with genstamp 1007 does not match 
> COMPLETE block's genstamp in block map 1008
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from 
> DatanodeRegistration(XX.XX.XX.XX, 
> storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, 
> ipcPort=50277, 
> storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0),
>  blocks: 2, processing time: 1 msecs
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* Removing block 
> blk_3927215081484173742_1008 from neededReplications as it has enough 
> replicas.
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: duplicate requested for 
> blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by 
> /XX.XX.XX.XX because reported RWR replica with genstamp 1007 does not match 
> COMPLETE block's genstamp in block map 1008
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from 
> DatanodeRegistration(XX.XX.XX.XX, 
> storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, 
> ipcPort=50277, 
> storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0),
>  blocks: 2, processing time: 1 msecs
> WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not 
> able to place enough replicas, still in need of 1 to reach 1
> For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {noformat}
> fsck Report
> ===
> {noformat}
> /file21:  Under replicated 
> BP-1008469586-XX.XX.XX.XX-1336829603103:blk_3927215081484173742_1008. Target 
> Replicas is 3 but found 2 replica(s).
> .Status: HEALTHY
>  Total size:  495 B
>  Total dirs:  1
>  Total files: 3
>  Total blocks (validated):3 (avg. block size 165 B)
>  Minimally replicated blocks: 3 (100.0 %)
>  Over-replicated blocks:  0 (0.0 %)
>  Under-replicated blocks: 1 (33.32 %)
>  Mis-replicated blocks:   0 (0.0 %)
>  Default replication factor:  1
>  Average block replication:   2.0
>  Corrupt blocks:  0
>  Missing replicas:1 (14.285714 %)
>  Number of data-nodes:3
>  Number of racks: 1
> FSCK ended at Sun May 13 09:49:05 IST 2012 in 9 milliseconds
> The filesystem under path '/' is HEALTHY
> {noformat}
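
For illustration, the decision in the title amounts to something like the
sketch below (hand-written against BlockManager-style names, not the
committed patch):

{code}
// Sketch only: a corrupt or stale replica may be invalidated as soon
// as enough live replicas exist to satisfy minimum replication, even
// if the block is still under-replicated relative to its target.
boolean mayInvalidateCorruptReplica(BlockInfo block) {
  NumberReplicas num = countNodes(block);  // live/corrupt/excess counts
  return num.liveReplicas() >= minReplication;
}
{code}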



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to deferred INode removal

2014-06-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030657#comment-14030657
 ] 

Kihwal Lee commented on HDFS-6527:
--

This bug has been there since 2.1.0-beta. It involves two client threads 
creating and deleting the same file.

> Edit log corruption due to deferred INode removal
> 
>
> Key: HDFS-6527
> URL: https://issues.apache.org/jira/browse/HDFS-6527
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch
>
>
> We have seen an SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode map.
> Since startFile() acquires the FSN read lock and then the write lock, a
> deletion can happen in between. Because the inode removal is deferred outside
> the FSN write lock, startFile() can get the deleted inode from the inode map
> with the FSN write lock held. This allows the addition of a block to a
> deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
> OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up
> or the SBN crashes.
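
For illustration, the lock window described above looks roughly like this
(names simplified, not actual FSNamesystem code):

{code}
// Simplified illustration of the race; not actual FSNamesystem code.
readLock();
try {
  // analyze the request: validate the lease, choose targets, ...
} finally {
  readUnlock();
}
// <-- window: a concurrent delete() can unlink the file here, but the
//     inode's removal from the inode map is deferred
writeLock();
try {
  INodeFile file = (INodeFile) getInodeFromMap(fileId); // still found!
  // a block is allocated for a file that no longer exists, so the
  // edit log records OP_ADD_BLOCK after OP_DELETE
} finally {
  writeUnlock();
}
{code}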



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to deferred INode removal

2014-06-13 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030701#comment-14030701
 ] 

Daryn Sharp commented on HDFS-6527:
---

Is it possible to avoid having the fsdir up-call into the fsn (apparently to
clear leases), and instead have the fsdir remove the inodes from the map while
the fsn clears the leases?

> Edit log corruption due to deferred INode removal
> 
>
> Key: HDFS-6527
> URL: https://issues.apache.org/jira/browse/HDFS-6527
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch
>
>
> We have seen an SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode map.
> Since getAdditionalBlock() acquires the FSN read lock and then the write
> lock, a deletion can happen in between. Because the inode removal is deferred
> outside the FSN write lock, getAdditionalBlock() can get the deleted inode
> from the inode map with the FSN write lock held. This allows the addition of
> a block to a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
> OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up
> or the SBN crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6527) Edit log corruption due to deferred INode removal

2014-06-13 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6527:
-

Description: 
We have seen an SBN crashing with the following error:
{panel}
\[Edit log tailer\] ERROR namenode.FSEditLogLoader:
Encountered exception on operation AddBlockOp
[path=/xxx,
penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
RpcCallId=-2]
java.io.FileNotFoundException: File does not exist: /xxx
{panel}

This was caused by the deferred removal of deleted inodes from the inode map.
Since getAdditionalBlock() acquires the FSN read lock and then the write lock,
a deletion can happen in between. Because the inode removal is deferred
outside the FSN write lock, getAdditionalBlock() can get the deleted inode
from the inode map with the FSN write lock held. This allows the addition of a
block to a deleted file.

As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up or
the SBN crashes.


  was:
We have seen an SBN crashing with the following error:
{panel}
\[Edit log tailer\] ERROR namenode.FSEditLogLoader:
Encountered exception on operation AddBlockOp
[path=/xxx,
penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
RpcCallId=-2]
java.io.FileNotFoundException: File does not exist: /xxx
{panel}

This was caused by the deferred removal of deleted inodes from the inode map.
Since startFile() acquires the FSN read lock and then the write lock, a
deletion can happen in between. Because the inode removal is deferred outside
the FSN write lock, startFile() can get the deleted inode from the inode map
with the FSN write lock held. This allows the addition of a block to a deleted
file.

As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up or
the SBN crashes.



> Edit log corruption due to deferred INode removal
> 
>
> Key: HDFS-6527
> URL: https://issues.apache.org/jira/browse/HDFS-6527
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch
>
>
> We have seen an SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode map.
> Since getAdditionalBlock() acquires the FSN read lock and then the write
> lock, a deletion can happen in between. Because the inode removal is deferred
> outside the FSN write lock, getAdditionalBlock() can get the deleted inode
> from the inode map with the FSN write lock held. This allows the addition of
> a block to a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
> OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up
> or the SBN crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4629) Using com.sun.org.apache.xml.internal.serialize.* in XmlEditsVisitor.java is JVM vendor specific. Breaks IBM JAVA

2014-06-13 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030702#comment-14030702
 ] 

Steve Loughran commented on HDFS-4629:
--

There wasn't a dependency on Xerces already?

> Using com.sun.org.apache.xml.internal.serialize.* in XmlEditsVisitor.java is 
> JVM vendor specific. Breaks IBM JAVA
> -
>
> Key: HDFS-4629
> URL: https://issues.apache.org/jira/browse/HDFS-4629
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.0.3-alpha
> Environment: OS:fedora and RHEL (64 bit)
> Platform: x86, POWER, and SystemZ
> JVM Vendor = IBM
>Reporter: Amir Sanjar
> Attachments: HDFS-4629-1.patch, HDFS-4629.patch
>
>
> Porting to a JVM-vendor-neutral solution by replacing:
> import com.sun.org.apache.xml.internal.serialize.OutputFormat;
> import com.sun.org.apache.xml.internal.serialize.XMLSerializer;
> with 
> import org.apache.xml.serialize.OutputFormat;
> import org.apache.xml.serialize.XMLSerializer;
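
For illustration, the vendor-neutral setup would look something like this
sketch (the actual XmlEditsVisitor wiring may differ):

{code}
import java.io.IOException;
import java.io.OutputStream;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.xml.sax.ContentHandler;

// Sketch: build a SAX ContentHandler from the Xerces serializer
// instead of the com.sun.* internal classes.
ContentHandler createHandler(OutputStream out) throws IOException {
  OutputFormat format = new OutputFormat("XML", "UTF-8", true);
  format.setIndenting(true);
  XMLSerializer serializer = new XMLSerializer(out, format);
  return serializer.asContentHandler();
}
{code}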



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to deferred INode removal

2014-06-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030707#comment-14030707
 ] 

Kihwal Lee commented on HDFS-6527:
--

Instead of moving the inode removal inside the lock, we could have
getAdditionalBlock() do an additional check after acquiring the FSN write
lock.  One possibility is to have it acquire the FSDirectory read lock and
check the parent of the inode; if the file was deleted, the parent should be
null.  Or we could make checkLease() do more than just check the clientName
recorded in the inode.
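
For illustration, the checkLease() variant could look like the sketch below
(simplified from the real signature; the message text is made up):

{code}
// Sketch only: a concurrently deleted inode has a null parent even
// while its removal from the inode map is still deferred.
private INodeFile checkLease(String src, String holder, INode inode)
    throws LeaseExpiredException {
  if (inode == null || inode.getParent() == null) {
    throw new LeaseExpiredException("No lease on " + src
        + ": file does not exist.");
  }
  // ... the existing clientName/holder checks would follow here ...
  return (INodeFile) inode;
}
{code}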

> Edit log corruption due to deferred INode removal
> 
>
> Key: HDFS-6527
> URL: https://issues.apache.org/jira/browse/HDFS-6527
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch
>
>
> We have seen an SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode map.
> Since getAdditionalBlock() acquires the FSN read lock and then the write
> lock, a deletion can happen in between. Because the inode removal is deferred
> outside the FSN write lock, getAdditionalBlock() can get the deleted inode
> from the inode map with the FSN write lock held. This allows the addition of
> a block to a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
> OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up
> or the SBN crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6527) Edit log corruption due to deferred INode removal

2014-06-13 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6527:
-

Status: Open  (was: Patch Available)

> Edit log corruption due to deferred INode removal
> 
>
> Key: HDFS-6527
> URL: https://issues.apache.org/jira/browse/HDFS-6527
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch
>
>
> We have seen an SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode map.
> Since getAdditionalBlock() acquires the FSN read lock and then the write
> lock, a deletion can happen in between. Because the inode removal is deferred
> outside the FSN write lock, getAdditionalBlock() can get the deleted inode
> from the inode map with the FSN write lock held. This allows the addition of
> a block to a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
> OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up
> or the SBN crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6395) Skip checking xattr limits for non-user-visible namespaces

2014-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030720#comment-14030720
 ] 

Hudson commented on HDFS-6395:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1800 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1800/])
HDFS-6395. Skip checking xattr limits for non-user-visible namespaces. 
Contributed by Yi Liu. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602288)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSDirectory.java


> Skip checking xattr limits for non-user-visible namespaces
> --
>
> Key: HDFS-6395
> URL: https://issues.apache.org/jira/browse/HDFS-6395
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Yi Liu
> Fix For: 2.5.0
>
> Attachments: HDFS-6395.1.patch, HDFS-6395.patch
>
>
> It'd be nice to print messages during fsimage and editlog loading if we hit
> either the per-inode # of xattrs limit or the xattr size limit.
> We should also consider making the # of xattrs limit only apply to the user 
> namespace, or to each namespace separately, to prevent users from locking out 
> access to other namespaces.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-3493) Invalidate excess corrupted blocks as long as minimum replication is satisfied

2014-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030726#comment-14030726
 ] 

Hudson commented on HDFS-3493:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1800 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1800/])
Fixup CHANGES.txt message for HDFS-3493 (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602292)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
HDFS-3493. Invalidate corrupted blocks as long as minimum replication is 
satisfied. Contributed by Juan Yu and Vinayakumar B. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602291)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplication.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/BlockReportTestBase.java


> Invalidate excess corrupted blocks as long as minimum replication is satisfied
> --
>
> Key: HDFS-3493
> URL: https://issues.apache.org/jira/browse/HDFS-3493
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha, 2.0.5-alpha
>Reporter: J.Andreina
>Assignee: Juan Yu
> Fix For: 2.5.0
>
> Attachments: HDFS-3493.002.patch, HDFS-3493.003.patch, 
> HDFS-3493.004.patch, HDFS-3493.patch
>
>
> Replication factor = 3, block report interval = 1 min; start the NN and 3 DNs.
> Step 1: Write a file without closing it and call hflush (DN1, DN2, DN3 have blk_ts1).
> Step 2: Stop DN3.
> Step 3: Recovery happens and the timestamp is updated (blk_ts2).
> Step 4: Close the file.
> Step 5: blk_ts2 is finalized and available on DN1 and DN2.
> Step 6: Restart DN3 (which still has blk_ts1 in rbw).
> From the NN side, no command is issued to DN3 to delete blk_ts1; DN3 is only
> asked to mark the block as corrupt. Replication of blk_ts2 to DN3 never
> happens.
> NN logs:
> 
> {noformat}
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: duplicate requested for 
> blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by 
> /XX.XX.XX.XX because reported RWR replica with genstamp 1007 does not match 
> COMPLETE block's genstamp in block map 1008
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from 
> DatanodeRegistration(XX.XX.XX.XX, 
> storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, 
> ipcPort=50277, 
> storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0),
>  blocks: 2, processing time: 1 msecs
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* Removing block 
> blk_3927215081484173742_1008 from neededReplications as it has enough 
> replicas.
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: duplicate requested for 
> blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by 
> /XX.XX.XX.XX because reported RWR replica with genstamp 1007 does not match 
> COMPLETE block's genstamp in block map 1008
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from 
> DatanodeRegistration(XX.XX.XX.XX, 
> storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, 
> ipcPort=50277, 
> storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0),
>  blocks: 2, processing time: 1 msecs
> WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not 
> able to place enough replicas, still in need of 1 to reach 1
> For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {noformat}
> fsck Report
> ===
> {noformat}
> /file21:  Under replicated 
> BP-1008469586-XX.XX.XX.XX-1336829603103:blk_3927215081484173742_1008. Target 
> Replicas is 3 but found 2 replica(s).
> .Status: HEALTHY
>  Total size:  495 B
>  Total dirs:  1
>  Total files: 3
>  Total blocks (validated):3 (avg. block size 165 B)
>  Minimally replicated blocks: 3 (100.0 %)
>  Over-replicated blocks:  0 (0.0 %)
>  Under-replicated blocks: 1 (33.32 %)
>  Mis-replicated blocks:   0 (0.0 %)
>  Default replication factor:  1
>  Average block replication:   2.0
>  Corrupt blocks:  0
>  Missing replicas:1 (14.285714 %)
>  Number of data-nodes:3
>  Number of racks: 1
> FSCK ended at Sun May 13 09:49:05 IST 2012 in 9 milliseconds
> The filesystem under path '/' is HEALTHY
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better

2014-06-13 Thread Zesheng Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6507:
-

Attachment: HDFS-6507.2.patch

Fix broken tests.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category are generally implemented by calling the
> corresponding function of the DFSClient class, which ultimately invokes the
> corresponding remote implementation on the NN side. At the NN side, all these
> operations are classified into five categories: UNCHECKED, READ, WRITE,
> CHECKPOINT, JOURNAL. The Active NN allows all operations, while the Standby
> NN only allows UNCHECKED operations. The current DFSClient implementation
> connects to one NN first; if that NN is not Active and the operation is not
> allowed, it fails over to the second NN. Here is the problem: some of the
> commands (setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes,
> setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED
> operations, so when these commands are executed from the DFSAdmin command
> line, they are sent to one fixed NN, regardless of whether it is Active or
> Standby. This may result in two problems:
> a. If the first NN tried is the Standby, the operation takes effect only on
> the Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NNs but takes effect on
> only one, problems may surface later when an NN failover occurs.
> Here I propose the following improvements:
> a. If the command can be classified as one of the READ/WRITE/CHECKPOINT/JOURNAL
> operations, we should classify it clearly.
> b. If the command cannot be classified as one of the above four operations,
> or if the command needs to take effect on both NNs, we should send the
> request to both the Active and Standby NNs (see the sketch after this
> description).
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol,
> RefreshUserMappingsProtocol, RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl,
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and
> sending the request to the remote NN. In the current implementation, these
> requests are sent to one fixed NN, regardless of whether it is Active or
> Standby. Here I propose that we send these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are already handled correctly; no improvement is
> needed.
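
For illustration, the "send to both NNs" idea for the refresh protocols could
be sketched as below (the helper name is hypothetical; error handling is
simplified):

{code}
// Hypothetical sketch: run a refresh on every NN of an HA
// nameservice instead of on a single resolved address.
void refreshOnAllNamenodes(List<RefreshUserMappingsProtocol> proxies)
    throws IOException {
  IOException first = null;
  for (RefreshUserMappingsProtocol proxy : proxies) {
    try {
      proxy.refreshUserToGroupsMappings();
    } catch (IOException e) {
      if (first == null) {
        first = e;  // keep refreshing the remaining NNs, report later
      }
    }
  }
  if (first != null) {
    throw first;
  }
}
{code}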



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4629) Using com.sun.org.apache.xml.internal.serialize.* in XmlEditsVisitor.java is JVM vendor specific. Breaks IBM JAVA

2014-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030815#comment-14030815
 ] 

Hadoop QA commented on HDFS-4629:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650308/HDFS-4629-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7111//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7111//console

This message is automatically generated.

> Using com.sun.org.apache.xml.internal.serialize.* in XmlEditsVisitor.java is 
> JVM vendor specific. Breaks IBM JAVA
> -
>
> Key: HDFS-4629
> URL: https://issues.apache.org/jira/browse/HDFS-4629
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.0.3-alpha
> Environment: OS:fedora and RHEL (64 bit)
> Platform: x86, POWER, and SystemZ
> JVM Vendor = IBM
>Reporter: Amir Sanjar
> Attachments: HDFS-4629-1.patch, HDFS-4629.patch
>
>
> Porting to a JVM-vendor-neutral solution by replacing:
> import com.sun.org.apache.xml.internal.serialize.OutputFormat;
> import com.sun.org.apache.xml.internal.serialize.XMLSerializer;
> with 
> import org.apache.xml.serialize.OutputFormat;
> import org.apache.xml.serialize.XMLSerializer;



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6528) Add XAttrs to TestOfflineImageViewer

2014-06-13 Thread Stephen Chu (JIRA)
Stephen Chu created HDFS-6528:
-

 Summary: Add XAttrs to TestOfflineImageViewer
 Key: HDFS-6528
 URL: https://issues.apache.org/jira/browse/HDFS-6528
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Affects Versions: 3.0.0, 2.5.0
Reporter: Stephen Chu
Assignee: Stephen Chu
Priority: Minor


We should test that the OfflineImageViewer can run successfully against an 
fsimage with the new XAttr ops.

In this patch, we set and remove XAttrs when preparing the fsimage in 
TestOfflineImageViewer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6528) Add XAttrs to TestOfflineImageViewer

2014-06-13 Thread Stephen Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Chu updated HDFS-6528:
--

Attachment: HDFS-6528.001.patch

> Add XAttrs to TestOfflineImageViewer
> 
>
> Key: HDFS-6528
> URL: https://issues.apache.org/jira/browse/HDFS-6528
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Stephen Chu
>Assignee: Stephen Chu
>Priority: Minor
> Fix For: 3.0.0, 2.5.0
>
> Attachments: HDFS-6528.001.patch
>
>
> We should test that the OfflineImageViewer can run successfully against an 
> fsimage with the new XAttr ops.
> In this patch, we set and remove XAttrs when preparing the fsimage in 
> TestOfflineImageViewer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6528) Add XAttrs to TestOfflineImageViewer

2014-06-13 Thread Stephen Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Chu updated HDFS-6528:
--

Status: Patch Available  (was: Open)

Small patch to make sure the fsimage tested in TestOfflineImageViewer contains 
XAttr ops.

Ran this test locally successfully for multiple iterations.

> Add XAttrs to TestOfflineImageViewer
> 
>
> Key: HDFS-6528
> URL: https://issues.apache.org/jira/browse/HDFS-6528
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Stephen Chu
>Assignee: Stephen Chu
>Priority: Minor
> Fix For: 3.0.0, 2.5.0
>
> Attachments: HDFS-6528.001.patch
>
>
> We should test that the OfflineImageViewer can run successfully against an 
> fsimage with the new XAttr ops.
> In this patch, we set and remove XAttrs when preparing the fsimage in 
> TestOfflineImageViewer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6527) Edit log corruption due to deferred INode removal

2014-06-13 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6527:
-

Status: Patch Available  (was: Open)

> Edit log corruption due to deferred INode removal
> 
>
> Key: HDFS-6527
> URL: https://issues.apache.org/jira/browse/HDFS-6527
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
> HDFS-6527.v2.patch
>
>
> We have seen an SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode map.
> Since getAdditionalBlock() acquires the FSN read lock and then the write
> lock, a deletion can happen in between. Because the inode removal is deferred
> outside the FSN write lock, getAdditionalBlock() can get the deleted inode
> from the inode map with the FSN write lock held. This allows the addition of
> a block to a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
> OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up
> or the SBN crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6528) Add XAttrs to TestOfflineImageViewer

2014-06-13 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030899#comment-14030899
 ] 

Andrew Wang commented on HDFS-6528:
---

Hey Stephen, correct me if I'm wrong, but doesn't deleting the file at the end 
wipe out the xattrs? As is, I think this would exercise the edit log, but not 
the fsimage.

> Add XAttrs to TestOfflineImageViewer
> 
>
> Key: HDFS-6528
> URL: https://issues.apache.org/jira/browse/HDFS-6528
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Stephen Chu
>Assignee: Stephen Chu
>Priority: Minor
> Fix For: 3.0.0, 2.5.0
>
> Attachments: HDFS-6528.001.patch
>
>
> We should test that the OfflineImageViewer can run successfully against an 
> fsimage with the new XAttr ops.
> In this patch, we set and remove XAttrs when preparing the fsimage in 
> TestOfflineImageViewer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6527) Edit log corruption due to deferred INode removal

2014-06-13 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6527:
-

Attachment: HDFS-6527.v2.patch

The new patch simply checks the inode's parent against null.  This is done in
checkLease(), which is called by getAdditionalBlock() after acquiring the FSN
write lock.

Also added is a new test case that reproduces the race between delete() and
getAdditionalBlock(). Without the change in checkLease(), the test case fails:
getAdditionalBlock() succeeds even after delete(), which produces the
problematic edit log sequence.
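
For illustration, the shape of such a test is roughly as follows (a hand-wavy
sketch against MiniDFSCluster, not the committed test):

{code}
// Rough sketch of the race-test idea; not the committed test code.
DistributedFileSystem fs = cluster.getFileSystem();
final Path path = new Path("/test/file");
FSDataOutputStream out = fs.create(path);
out.write(data);
out.hflush();

// delete the file while the writer still holds the stream open
fs.delete(path, false);

try {
  // assuming this write crosses a block boundary, it triggers
  // getAdditionalBlock() on the NN
  out.write(moreData);
  out.hflush();
  Assert.fail("should not be able to add a block to a deleted file");
} catch (FileNotFoundException expected) {
  // with the fix, the NN rejects the addBlock instead of logging
  // OP_ADD_BLOCK after OP_DELETE
}
{code}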

> Edit log corruption due to deferred INode removal
> 
>
> Key: HDFS-6527
> URL: https://issues.apache.org/jira/browse/HDFS-6527
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
> HDFS-6527.v2.patch
>
>
> We have seen an SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode map.
> Since getAdditionalBlock() acquires the FSN read lock and then the write
> lock, a deletion can happen in between. Because the inode removal is deferred
> outside the FSN write lock, getAdditionalBlock() can get the deleted inode
> from the inode map with the FSN write lock held. This allows the addition of
> a block to a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
> OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up
> or the SBN crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6527) Edit log corruption due to deferred INode removal

2014-06-13 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6527:
-

Status: Open  (was: Patch Available)

> Edit log corruption due to deferred INode removal
> 
>
> Key: HDFS-6527
> URL: https://issues.apache.org/jira/browse/HDFS-6527
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
> HDFS-6527.v2.patch
>
>
> We have seen an SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode map.
> Since getAdditionalBlock() acquires the FSN read lock and then the write
> lock, a deletion can happen in between. Because the inode removal is deferred
> outside the FSN write lock, getAdditionalBlock() can get the deleted inode
> from the inode map with the FSN write lock held. This allows the addition of
> a block to a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
> OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up
> or the SBN crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6527) Edit log corruption due to deferred INode removal

2014-06-13 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6527:
-

Status: Patch Available  (was: Open)

> Edit log corruption due to deferred INode removal
> 
>
> Key: HDFS-6527
> URL: https://issues.apache.org/jira/browse/HDFS-6527
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
> HDFS-6527.v2.patch
>
>
> We have seen an SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode map.
> Since getAdditionalBlock() acquires the FSN read lock and then the write
> lock, a deletion can happen in between. Because the inode removal is deferred
> outside the FSN write lock, getAdditionalBlock() can get the deleted inode
> from the inode map with the FSN write lock held. This allows the addition of
> a block to a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
> OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up
> or the SBN crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6470) TestBPOfferService.testBPInitErrorHandling is flaky

2014-06-13 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-6470:
--

Assignee: Ming Ma

> TestBPOfferService.testBPInitErrorHandling is flaky
> ---
>
> Key: HDFS-6470
> URL: https://issues.apache.org/jira/browse/HDFS-6470
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Ming Ma
> Attachments: HDFS-6470.patch
>
>
> Saw some test flakage in a test-patch run, stacktrace:
> {code}
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBPInitErrorHandling(TestBPOfferService.java:334)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6480) Move waitForReady() from FSDirectory to FSNamesystem

2014-06-13 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030924#comment-14030924
 ] 

Haohui Mai commented on HDFS-6480:
--

The v1 patch is rebased onto the current trunk.

> Move waitForReady() from FSDirectory to FSNamesystem
> 
>
> Key: HDFS-6480
> URL: https://issues.apache.org/jira/browse/HDFS-6480
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-6480.000.patch, HDFS-6480.001.patch
>
>
> Currently FSDirectory implements a barrier in {{waitForReady()}} /
> {{setReady()}} so that it only serves requests once the FSImage is fully
> loaded.
> As a part of the effort to evolve {{FSDirectory}} to a class which focuses on 
> implementing the data structure of the namespace, this jira proposes to move 
> the barrier one level higher to {{FSNamesystem}}.
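
For reference, the barrier in question is essentially this wait/notify
pattern (a simplified form; the real FSDirectory code is equivalent in
spirit):

{code}
// Simplified form of the waitForReady()/setReady() barrier.
private volatile boolean ready = false;

synchronized void setReady() {
  ready = true;           // FSImage is fully loaded
  notifyAll();
}

synchronized void waitForReady() {
  while (!ready) {
    try {
      wait(5000);         // block requests until the image is loaded
    } catch (InterruptedException ignored) {
    }
  }
}
{code}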



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6480) Move waitForReady() from FSDirectory to FSNamesystem

2014-06-13 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-6480:
-

Status: Patch Available  (was: Open)

> Move waitForReady() from FSDirectory to FSNamesystem
> 
>
> Key: HDFS-6480
> URL: https://issues.apache.org/jira/browse/HDFS-6480
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-6480.000.patch, HDFS-6480.001.patch
>
>
> Currently FSDirectory implements a barrier in {{waitForReady()}} /
> {{setReady()}} so that it only serves requests once the FSImage is fully
> loaded.
> As a part of the effort to evolve {{FSDirectory}} to a class which focuses on 
> implementing the data structure of the namespace, this jira proposes to move 
> the barrier one level higher to {{FSNamesystem}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6480) Move waitForReady() from FSDirectory to FSNamesystem

2014-06-13 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-6480:
-

Attachment: HDFS-6480.001.patch

> Move waitForReady() from FSDirectory to FSNamesystem
> 
>
> Key: HDFS-6480
> URL: https://issues.apache.org/jira/browse/HDFS-6480
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-6480.000.patch, HDFS-6480.001.patch
>
>
> Currently FSDirectory implements a barrier in {{waitForReady()}} /
> {{setReady()}} so that it only serves requests once the FSImage is fully
> loaded.
> As a part of the effort to evolve {{FSDirectory}} to a class which focuses on 
> implementing the data structure of the namespace, this jira proposes to move 
> the barrier one level higher to {{FSNamesystem}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6330) Move mkdirs() to FSNamesystem

2014-06-13 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-6330:
-

   Resolution: Fixed
Fix Version/s: 2.5.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've committed the patch to trunk and branch-2. Thanks [~jingzhao] for the 
contribution.

> Move mkdirs() to FSNamesystem
> -
>
> Key: HDFS-6330
> URL: https://issues.apache.org/jira/browse/HDFS-6330
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.5.0
>
> Attachments: HDFS-6330.000.patch, HDFS-6330.001.patch, 
> HDFS-6330.002.patch, HDFS-6330.003.patch
>
>
> Currently mkdir() automatically creates all ancestors for a directory. This 
> is implemented in FSDirectory, by calling unprotectedMkdir() along the path. 
> This jira proposes to move the function to FSNamesystem to simplify the 
> primitive that FSDirectory needs to provide.
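
For illustration, the ancestor-creation behavior being moved looks roughly
like this (names are illustrative, not the actual FSDirectory signatures):

{code}
// Illustrative sketch of mkdirs creating missing ancestors in turn.
void mkdirsRecursively(String path) throws IOException {
  StringBuilder cur = new StringBuilder();
  for (String component : path.split(Path.SEPARATOR)) {
    if (component.isEmpty()) {
      continue;
    }
    cur.append(Path.SEPARATOR).append(component);
    if (!exists(cur.toString())) {
      unprotectedMkdir(cur.toString());  // create this missing ancestor
    }
  }
}
{code}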



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6528) Add XAttrs to TestOfflineImageViewer

2014-06-13 Thread Stephen Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Chu updated HDFS-6528:
--

Attachment: HDFS-6528.002.patch

Thanks, Andrew. You're right.

Uploaded a new patch that doesn't remove the xattr directory, so after saving
the namespace the xattrs should be in the fsimage.

Adjusted the number of total dirs checked in testFileDistributionCalculator and 
testWebImageViewer.
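
For reference, getting the xattrs into the fsimage (rather than only the edit
log) boils down to something like this sketch (variable names illustrative):

{code}
// Sketch: persist xattr state into the fsimage via saveNamespace().
hdfs.mkdirs(xattrDir);
hdfs.setXAttr(xattrDir, "user.attr1", "value1".getBytes());

hdfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_ENTER);
hdfs.saveNamespace();   // the xattr is now in the checkpointed fsimage
hdfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_LEAVE);
// do NOT delete xattrDir afterwards, or the fsimage under test would
// no longer contain the xattr
{code}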

> Add XAttrs to TestOfflineImageViewer
> 
>
> Key: HDFS-6528
> URL: https://issues.apache.org/jira/browse/HDFS-6528
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Stephen Chu
>Assignee: Stephen Chu
>Priority: Minor
> Fix For: 3.0.0, 2.5.0
>
> Attachments: HDFS-6528.001.patch, HDFS-6528.002.patch
>
>
> We should test that the OfflineImageViewer can run successfully against an 
> fsimage with the new XAttr ops.
> In this patch, we set and remove XAttrs when preparing the fsimage in 
> TestOfflineImageViewer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to deferred INode removal

2014-06-13 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030944#comment-14030944
 ] 

Jing Zhao commented on HDFS-6527:
-

Thanks for the fix, [~kihwal]! 

bq. The new patch simply checks the parent of inode against null
Currently, if the inode is in a snapshot, its parent will not be set to null
after deletion. In that case, can we run into a scenario where a block is
added to a deleted file that is in a read-only snapshot? Maybe we also need to
check FileWithSnapshotFeature#isCurrentFileDeleted?

> Edit log corruption due to deferred INode removal
> 
>
> Key: HDFS-6527
> URL: https://issues.apache.org/jira/browse/HDFS-6527
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
> HDFS-6527.v2.patch
>
>
> We have seen an SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode map.
> Since getAdditionalBlock() acquires the FSN read lock and then the write
> lock, a deletion can happen in between. Because the inode removal is deferred
> outside the FSN write lock, getAdditionalBlock() can get the deleted inode
> from the inode map with the FSN write lock held. This allows the addition of
> a block to a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
> OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up
> or the SBN crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6528) Add XAttrs to TestOfflineImageViewer

2014-06-13 Thread Stephen Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Chu updated HDFS-6528:
--

Attachment: HDFS-6528.003.patch

Sorry, one last thing. This patch takes out the removal of an XAttr, as that
step is relevant to the edit log but not to the fsimage.

> Add XAttrs to TestOfflineImageViewer
> 
>
> Key: HDFS-6528
> URL: https://issues.apache.org/jira/browse/HDFS-6528
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Stephen Chu
>Assignee: Stephen Chu
>Priority: Minor
> Fix For: 3.0.0, 2.5.0
>
> Attachments: HDFS-6528.001.patch, HDFS-6528.002.patch, 
> HDFS-6528.003.patch
>
>
> We should test that the OfflineImageViewer can run successfully against an 
> fsimage with the new XAttr ops.
> In this patch, we set and remove XAttrs when preparing the fsimage in 
> TestOfflineImageViewer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HDFS-6330) Move mkdirs() to FSNamesystem

2014-06-13 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030925#comment-14030925
 ] 

Jing Zhao edited comment on HDFS-6330 at 6/13/14 6:17 PM:
--

I've committed the patch to trunk and branch-2. Thanks [~jingzhao] for the 
review.


was (Author: wheat9):
I've committed the patch to trunk and branch-2. Thanks [~jingzhao] for the 
contribution.

> Move mkdirs() to FSNamesystem
> -
>
> Key: HDFS-6330
> URL: https://issues.apache.org/jira/browse/HDFS-6330
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.5.0
>
> Attachments: HDFS-6330.000.patch, HDFS-6330.001.patch, 
> HDFS-6330.002.patch, HDFS-6330.003.patch
>
>
> Currently mkdir() automatically creates all ancestors for a directory. This 
> is implemented in FSDirectory, by calling unprotectedMkdir() along the path. 
> This jira proposes to move the function to FSNamesystem to simplify the 
> primitive that FSDirectory needs to provide.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6470) TestBPOfferService.testBPInitErrorHandling is flaky

2014-06-13 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030960#comment-14030960
 ] 

Andrew Wang commented on HDFS-6470:
---

+1 makes sense to me, thanks Ming. Will commit shortly.

> TestBPOfferService.testBPInitErrorHandling is flaky
> ---
>
> Key: HDFS-6470
> URL: https://issues.apache.org/jira/browse/HDFS-6470
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Ming Ma
> Attachments: HDFS-6470.patch
>
>
> Saw some test flakage in a test-patch run, stacktrace:
> {code}
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBPInitErrorHandling(TestBPOfferService.java:334)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better

2014-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030978#comment-14030978
 ] 

Hadoop QA commented on HDFS-6507:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650323/HDFS-6507.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7113//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7113//console

This message is automatically generated.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category are generally implemented by calling the
> corresponding function of the DFSClient class, which ultimately invokes the
> corresponding remote implementation on the NN side. At the NN side, all these
> operations are classified into five categories: UNCHECKED, READ, WRITE,
> CHECKPOINT, JOURNAL. The Active NN allows all operations, while the Standby
> NN only allows UNCHECKED operations. The current DFSClient implementation
> connects to one NN first; if that NN is not Active and the operation is not
> allowed, it fails over to the second NN. Here is the problem: some of the
> commands (setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes,
> setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED
> operations, so when these commands are executed from the DFSAdmin command
> line, they are sent to one fixed NN, regardless of whether it is Active or
> Standby. This may result in two problems:
> a. If the first NN tried is the Standby, the operation takes effect only on
> the Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NNs but takes effect on
> only one, problems may surface later when an NN failover occurs.
> Here I propose the following improvements:
> a. If the command can be classified as one of the READ/WRITE/CHECKPOINT/JOURNAL
> operations, we should classify it clearly.
> b. If the command cannot be classified as one of the above four operations,
> or if the command needs to take effect on both NNs, we should send the
> request to both the Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol,
> RefreshUserMappingsProtocol, RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl,
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and
> sending the request to the remote NN. In the current implementation, these
> requests are sent to one fixed NN, regardless of whether it is Active or
> Standby. Here I propose that we send these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are already handled correctly; no improvement is
> needed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6330) Move mkdirs() to FSNamesystem

2014-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030988#comment-14030988
 ] 

Hudson commented on HDFS-6330:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5702 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5702/])
HDFS-6330. Move mkdirs() to FSNamesystem. Contributed by Haohui Mai. (wheat9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602484)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsLimits.java


> Move mkdirs() to FSNamesystem
> -
>
> Key: HDFS-6330
> URL: https://issues.apache.org/jira/browse/HDFS-6330
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.5.0
>
> Attachments: HDFS-6330.000.patch, HDFS-6330.001.patch, 
> HDFS-6330.002.patch, HDFS-6330.003.patch
>
>
> Currently mkdir() automatically creates all ancestors for a directory. This 
> is implemented in FSDirectory, by calling unprotectedMkdir() along the path. 
> This jira proposes to move the function to FSNamesystem to simplify the 
> primitive that FSDirectory needs to provide.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6480) Move waitForReady() from FSDirectory to FSNamesystem

2014-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030994#comment-14030994
 ] 

Hadoop QA commented on HDFS-6480:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650353/HDFS-6480.001.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7116//console

This message is automatically generated.

> Move waitForReady() from FSDirectory to FSNamesystem
> 
>
> Key: HDFS-6480
> URL: https://issues.apache.org/jira/browse/HDFS-6480
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-6480.000.patch, HDFS-6480.001.patch
>
>
> Currently FSDirectory implements a barrier in {{waitForReady()}} /
> {{setReady()}} so that it only serves requests once the FSImage is fully
> loaded.
> As a part of the effort to evolve {{FSDirectory}} to a class which focuses on 
> implementing the data structure of the namespace, this jira proposes to move 
> the barrier one level higher to {{FSNamesystem}}.
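
For readers unfamiliar with the pattern, a minimal standalone sketch of such a 
barrier, reduced to plain wait/notify (not the actual FSDirectory code):

{code}
public class ReadyBarrierSketch {
  private boolean ready = false;

  // Called once the FSImage has been fully loaded.
  public synchronized void setReady() {
    ready = true;
    notifyAll(); // wake every request thread blocked below
  }

  // Request-handling threads park here until the namespace is usable.
  public synchronized void waitForReady() throws InterruptedException {
    while (!ready) {
      wait();
    }
  }
}
{code}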



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6470) TestBPOfferService.testBPInitErrorHandling is flaky

2014-06-13 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-6470:
--

   Resolution: Fixed
Fix Version/s: 2.5.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2, thanks Ming!

> TestBPOfferService.testBPInitErrorHandling is flaky
> ---
>
> Key: HDFS-6470
> URL: https://issues.apache.org/jira/browse/HDFS-6470
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Ming Ma
> Fix For: 2.5.0
>
> Attachments: HDFS-6470.patch
>
>
> Saw some test flakage in a test-patch run, stacktrace:
> {code}
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBPInitErrorHandling(TestBPOfferService.java:334)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL

2014-06-13 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031018#comment-14031018
 ] 

Steve Loughran commented on HDFS-6382:
--

My comments:

# This can be done as an MR job. 
# If you are worried about excessive load, start exactly one mapper, and 
consider throttling requests. As some object stores throttle heavy load & 
reject on a very high DELETE rate, throttling is going to be needed for 
anything that works against them.
# You can then use Oozie as the scheduler.
# MR restart handles failures: you just re-enumerate the directories, and 
deleted files don't show up.
# If you really, really can't do it as MR, write it as a one-node YARN app, for 
which I'd recommend Apache Twill as the starting point. In fact, this project 
would make for a nice example.

Don't rush to write a new service here for an intermittent job; that just adds 
a new cost: "a service to install and monitor". Especially when you consider 
that this new service will need:
# a launcher entry point
# tests
# commitment from the HDFS team to maintain it

{quote}
We can implement TTL within a MapReduce job that is similar with DistCp. We 
could run this MapReduce job over and over again or nightly or weekly to delete 
the expired files and directories.
{quote}

Yes, and schedule it with Oozie.
{quote}
 (1) Advantages:
The major advantage of the MapReduce framework is concurrency control, if we 
want to run multiple tasks concurrently, choose a MapReduce approach will ease 
of concurrency control.
{quote}

There are other advantages:
# The MR job will be simple to write and can be submitted remotely. 
# It's trivial to test and therefore maintain. 
# No need to wait for a new version of Hadoop. You can evolve it locally.
# Different users, submitting jobs with different Kerberos tickets, can work on 
their own files securely.
# There's no need to install and maintain a new service.

{quote}
(2) Disadvantages:
For implementing the TTL functionality, one task is enough, multiple tasks will 
give too much race and load to the NameNode. 
{quote}

# Demonstrate this by writing an MR job and assessing its load when you have a 
throttled executor.
{quote}

On another hand, use a MapReduce job will introduce additional dependencies and 
have additional overheads.
{quote}

# Additional dependencies? In a cluster with MapReduce installed? The only 
additional dependency is the JAR with the mapper and the reducer.
# What "additional overheads"? Are they really any worse than running another 
service in your cluster, with its own classpath, failure modes, and security 
needs?
 
My recommendation, before writing a single line of a new service, is to write 
it as an MR job. You will find it easy to write and maintain; server load is 
handled by making sleep time a configurable parameter. 

If you can then actually demonstrate that this is inadequate on a large 
cluster, consider a service. But start with MapReduce first. If you 
haven't written an MR job before, don't worry - it doesn't take that long to 
learn, and having done it you'll understand your users' workflow better.
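
To make that concrete, here is a rough sketch of such a throttled, 
single-mapper delete job. The config keys {{ttl.ms}} and 
{{ttl.delete.sleep.ms}} and the flat directory scan are illustrative 
assumptions, not a worked design:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TtlDeleteMapper
    extends Mapper<Object, Text, NullWritable, NullWritable> {
  @Override
  protected void map(Object key, Text dirToScan, Context ctx)
      throws IOException, InterruptedException {
    FileSystem fs = FileSystem.get(ctx.getConfiguration());
    long ttlMs = ctx.getConfiguration()
        .getLong("ttl.ms", 30L * 24 * 60 * 60 * 1000); // default: ~1 month
    long sleepMs = ctx.getConfiguration().getLong("ttl.delete.sleep.ms", 100);
    long cutoff = System.currentTimeMillis() - ttlMs;
    for (FileStatus st : fs.listStatus(new Path(dirToScan.toString()))) {
      if (!st.isDirectory() && st.getModificationTime() < cutoff) {
        fs.delete(st.getPath(), false); // expired: remove it
        Thread.sleep(sleepMs);          // throttle the DELETE rate on the NN
      }
    }
  }
}
{code}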

> HDFS File/Directory TTL
> ---
>
> Key: HDFS-6382
> URL: https://issues.apache.org/jira/browse/HDFS-6382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-TTL-Design -2.pdf, HDFS-TTL-Design.pdf
>
>
> In production environments, we often have a scenario like this: we want to 
> back up files on HDFS for some time and then delete these files 
> automatically. For example, we keep only 1 day's logs on local disk due to 
> limited disk space, but we need to keep about 1 month's logs in order to 
> debug program bugs, so we keep all the logs on HDFS and delete logs which are 
> older than 1 month. This is a typical scenario for HDFS TTL. So here we 
> propose that HDFS support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL expires
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL expires
> 4. The child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to specify whether the deleted 
> files/directories should go to the trash or not
> 6. A global configuration is needed to specify whether a directory 
> with a TTL should be deleted when it is emptied by the TTL mechanism.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6528) Add XAttrs to TestOfflineImageViewer

2014-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031077#comment-14031077
 ] 

Hadoop QA commented on HDFS-6528:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650334/HDFS-6528.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.qjournal.TestNNWithQJM

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7114//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7114//console

This message is automatically generated.

> Add XAttrs to TestOfflineImageViewer
> 
>
> Key: HDFS-6528
> URL: https://issues.apache.org/jira/browse/HDFS-6528
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Stephen Chu
>Assignee: Stephen Chu
>Priority: Minor
> Fix For: 3.0.0, 2.5.0
>
> Attachments: HDFS-6528.001.patch, HDFS-6528.002.patch, 
> HDFS-6528.003.patch
>
>
> We should test that the OfflineImageViewer can run successfully against an 
> fsimage with the new XAttr ops.
> In this patch, we set and remove XAttrs when preparing the fsimage in 
> TestOfflineImageViewer.
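
For illustration, a minimal sketch of that preparation step; the path and 
attribute names here are made up for the example:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class XAttrImagePrepSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dir = new Path("/xattr-test");
    fs.mkdirs(dir);
    // Two set ops and one remove op, so the saved fsimage and edit log
    // exercise both kinds of XAttr records.
    fs.setXAttr(dir, "user.attr1", "value1".getBytes("UTF-8"));
    fs.setXAttr(dir, "user.attr2", "value2".getBytes("UTF-8"));
    fs.removeXAttr(dir, "user.attr2");
  }
}
{code}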



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6470) TestBPOfferService.testBPInitErrorHandling is flaky

2014-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031086#comment-14031086
 ] 

Hudson commented on HDFS-6470:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5703 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5703/])
HDFS-6470. TestBPOfferService.testBPInitErrorHandling is flaky. Contributed by 
Ming Ma. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602490)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java


> TestBPOfferService.testBPInitErrorHandling is flaky
> ---
>
> Key: HDFS-6470
> URL: https://issues.apache.org/jira/browse/HDFS-6470
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Ming Ma
> Fix For: 2.5.0
>
> Attachments: HDFS-6470.patch
>
>
> Saw some test flakage in a test-patch run, stacktrace:
> {code}
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBPInitErrorHandling(TestBPOfferService.java:334)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031114#comment-14031114
 ] 

Kihwal Lee commented on HDFS-6527:
--

Thanks for the comment, [~jingzhao]. We can null out the client name while 
deleting files. Then the lease check is guaranteed to fail.

In {{INodeFile#destroyAndCollectBlocks()}}, we can delete the client name.
{code}
 if (sf != null) {
   sf.clearDiffs();
 }
+
+// Delete the client name if under construction. This destroys half of
+// the lease. The other half will be removed later from LeaseManager.
+FileUnderConstructionFeature uc = getFileUnderConstructionFeature();
+if (uc != null) {
+  uc.setClientName(null);
+}
   }
{code}

And in {{FSNamesystem#checkLease()}}, we can have the following check instead 
of the parent == null check.
{code}
 String clientName = file.getFileUnderConstructionFeature().getClientName();
+if (clientName == null) {
+  // clientName is removed when the file is deleted.
+  throw new FileNotFoundException(src);
+}
{code}

This will make lease checks fail once the "real" file is deleted, whether it 
is in a snapshot or not. Do you think it is reasonable?
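
To illustrate the intended effect, here is a toy model of the guard; the stub 
type below is a simplified stand-in, not the real 
FileUnderConstructionFeature:

{code}
import java.io.FileNotFoundException;

class UnderConstructionStub {
  private String clientName;
  UnderConstructionStub(String clientName) { this.clientName = clientName; }
  String getClientName() { return clientName; }
  void setClientName(String name) { clientName = name; }
}

public class LeaseCheckSketch {
  // Mirrors the proposed guard: once delete has nulled the client name,
  // any later lease check fails fast with FileNotFoundException.
  static void checkLease(String src, UnderConstructionStub uc)
      throws FileNotFoundException {
    if (uc.getClientName() == null) {
      throw new FileNotFoundException(src);
    }
  }

  public static void main(String[] args) throws Exception {
    UnderConstructionStub uc = new UnderConstructionStub("DFSClient_1");
    checkLease("/xxx", uc);   // passes while the file is live
    uc.setClientName(null);   // what destroyAndCollectBlocks() would do
    try {
      checkLease("/xxx", uc); // now rejected: no block can be added
    } catch (FileNotFoundException e) {
      System.out.println("addBlock rejected: " + e.getMessage());
    }
  }
}
{code}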

> Edit log corruption due to defered INode removal
> 
>
> Key: HDFS-6527
> URL: https://issues.apache.org/jira/browse/HDFS-6527
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
> HDFS-6527.v2.patch
>
>
> We have seen a SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode map. 
> Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
> deletion can happen in between. Because of deferred inode removal outside FSN 
> write lock, getAdditionalBlock() can get the deleted inode from the inode map 
> with FSN write lock held. This allows the addition of a block to a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
>  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
> crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6529) Debug logging for RemoteBlockReader2 to identify remote datanode and file being read

2014-06-13 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created HDFS-6529:
---

 Summary: Debug logging for RemoteBlockReader2 to identify remote 
datanode and file being read
 Key: HDFS-6529
 URL: https://issues.apache.org/jira/browse/HDFS-6529
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Reporter: Anubhav Dhoot
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031142#comment-14031142
 ] 

Hadoop QA commented on HDFS-6527:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650347/HDFS-6527.v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1260 javac 
compiler warnings (more than the trunk's current 1259 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7115//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7115//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7115//console

This message is automatically generated.

> Edit log corruption due to defered INode removal
> 
>
> Key: HDFS-6527
> URL: https://issues.apache.org/jira/browse/HDFS-6527
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
> HDFS-6527.v2.patch
>
>
> We have seen a SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode map. 
> Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
> deletion can happen in between. Because of deferred inode removal outside FSN 
> write lock, getAdditionalBlock() can get the deleted inode from the inode map 
> with FSN write lock held. This allows the addition of a block to a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
>  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
> crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6529) Debug logging for RemoteBlockReader2 to identify remote datanode and file being read

2014-06-13 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated HDFS-6529:


Description: 
The scenario is that a download is stuck and we don't know which datanode the 
client is waiting on, or for which path.
By turning on this trace logging, we will get a running trace of which 
requests have started and which have completed.
If trace is not enabled, there is no overhead other than checking whether 
trace is enabled.
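
The pattern described is the standard guarded-logging idiom; a minimal sketch 
(the class and message text are illustrative, not the actual 
RemoteBlockReader2 code):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class TraceLoggingSketch {
  private static final Log LOG = LogFactory.getLog(TraceLoggingSketch.class);

  static void readBlock(String datanode, String path, long blockId) {
    // When trace is off, the only cost is this boolean check; the string
    // concatenation below never happens.
    if (LOG.isTraceEnabled()) {
      LOG.trace("Starting read of block " + blockId + " from " + datanode
          + " for " + path);
    }
    // ... perform the actual read ...
    if (LOG.isTraceEnabled()) {
      LOG.trace("Completed read of block " + blockId + " from " + datanode);
    }
  }
}
{code}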

> Debug logging for RemoteBlockReader2 to identify remote datanode and file 
> being read
> 
>
> Key: HDFS-6529
> URL: https://issues.apache.org/jira/browse/HDFS-6529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Anubhav Dhoot
>Priority: Minor
>
> The scenario is that a download is stuck and we don't know which datanode 
> the client is waiting on, or for which path.
> By turning on this trace logging, we will get a running trace of which 
> requests have started and which have completed.
> If trace is not enabled, there is no overhead other than checking whether 
> trace is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-13 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6527:
-

Attachment: HDFS-6527.v3.patch

The new v3 patch implements what I suggested above. It nulls out the client 
name field.  Any further client actions against the file will be rejected.  
Also fixed the javac warning caused by the use of the deprecated delete() 
method in the new test case.

> Edit log corruption due to defered INode removal
> 
>
> Key: HDFS-6527
> URL: https://issues.apache.org/jira/browse/HDFS-6527
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
> HDFS-6527.v2.patch, HDFS-6527.v3.patch
>
>
> We have seen a SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode map. 
> Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
> deletion can happen in between. Because of deferred inode removal outside FSN 
> write lock, getAdditionalBlock() can get the deleted inode from the inode map 
> with FSN write lock held. This allows the addition of a block to a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
>  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
> crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6529) Debug logging for RemoteBlockReader2 to identify remote datanode and file being read

2014-06-13 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated HDFS-6529:


Status: Patch Available  (was: Open)

> Debug logging for RemoteBlockReader2 to identify remote datanode and file 
> being read
> 
>
> Key: HDFS-6529
> URL: https://issues.apache.org/jira/browse/HDFS-6529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Anubhav Dhoot
>Priority: Minor
> Attachments: HDFS-6529.patch
>
>
> The scenario is that a download is stuck and we don't know which datanode 
> the client is waiting on, or for which path.
> By turning on this trace logging, we will get a running trace of which 
> requests have started and which have completed.
> If trace is not enabled, there is no overhead other than checking whether 
> trace is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6529) Debug logging for RemoteBlockReader2 to identify remote datanode and file being read

2014-06-13 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated HDFS-6529:


Attachment: HDFS-6529.patch

Verified it compiles fine and ran some sample jobs that show logging happening 
when turned on.

> Debug logging for RemoteBlockReader2 to identify remote datanode and file 
> being read
> 
>
> Key: HDFS-6529
> URL: https://issues.apache.org/jira/browse/HDFS-6529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Anubhav Dhoot
>Priority: Minor
> Attachments: HDFS-6529.patch
>
>
> The scenario is that a download is stuck and we don't know which datanode 
> the client is waiting on, or for which path.
> By turning on this trace logging, we will get a running trace of which 
> requests have started and which have completed.
> If trace is not enabled, there is no overhead other than checking whether 
> trace is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-6529) Debug logging for RemoteBlockReader2 to identify remote datanode and file being read

2014-06-13 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reassigned HDFS-6529:


Assignee: Anubhav Dhoot

> Debug logging for RemoteBlockReader2 to identify remote datanode and file 
> being read
> 
>
> Key: HDFS-6529
> URL: https://issues.apache.org/jira/browse/HDFS-6529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Minor
> Attachments: HDFS-6529.patch
>
>
> The scenario is that a download is stuck and we don't know which datanode 
> the client is waiting on, or for which path.
> By turning on this trace logging, we will get a running trace of which 
> requests have started and which have completed.
> If trace is not enabled, there is no overhead other than checking whether 
> trace is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6529) Debug logging for RemoteBlockReader2 to identify remote datanode and file being read

2014-06-13 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-6529:
-

 Target Version/s: 2.5.0
Affects Version/s: 2.4.0

> Debug logging for RemoteBlockReader2 to identify remote datanode and file 
> being read
> 
>
> Key: HDFS-6529
> URL: https://issues.apache.org/jira/browse/HDFS-6529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Minor
> Attachments: HDFS-6529.patch
>
>
> The scenario is that a download is stuck and we don't know which datanode 
> the client is waiting on, or for which path.
> By turning on this trace logging, we will get a running trace of which 
> requests have started and which have completed.
> If trace is not enabled, there is no overhead other than checking whether 
> trace is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6528) Add XAttrs to TestOfflineImageViewer

2014-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031199#comment-14031199
 ] 

Hadoop QA commented on HDFS-6528:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650355/HDFS-6528.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7117//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7117//console

This message is automatically generated.

> Add XAttrs to TestOfflineImageViewer
> 
>
> Key: HDFS-6528
> URL: https://issues.apache.org/jira/browse/HDFS-6528
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Stephen Chu
>Assignee: Stephen Chu
>Priority: Minor
> Fix For: 3.0.0, 2.5.0
>
> Attachments: HDFS-6528.001.patch, HDFS-6528.002.patch, 
> HDFS-6528.003.patch
>
>
> We should test that the OfflineImageViewer can run successfully against an 
> fsimage with the new XAttr ops.
> In this patch, we set and remove XAttrs when preparing the fsimage in 
> TestOfflineImageViewer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6528) Add XAttrs to TestOfflineImageViewer

2014-06-13 Thread Stephen Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031237#comment-14031237
 ] 

Stephen Chu commented on HDFS-6528:
---

The first test run was on the first rev of the patch. We only modify 
TestOfflineImageViewer, so the TestNNWithQJM failure is unrelated. The next 
test run, against the latest rev, passed successfully.

> Add XAttrs to TestOfflineImageViewer
> 
>
> Key: HDFS-6528
> URL: https://issues.apache.org/jira/browse/HDFS-6528
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Stephen Chu
>Assignee: Stephen Chu
>Priority: Minor
> Fix For: 3.0.0, 2.5.0
>
> Attachments: HDFS-6528.001.patch, HDFS-6528.002.patch, 
> HDFS-6528.003.patch
>
>
> We should test that the OfflineImageViewer can run successfully against an 
> fsimage with the new XAttr ops.
> In this patch, we set and remove XAttrs when preparing the fsimage in 
> TestOfflineImageViewer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6530) Fix Balancer documentation

2014-06-13 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6530:
--

Attachment: h6530_20140613.patch

h6530_20140613.patch: 1st patch.

> Fix Balancer documentation
> --
>
> Key: HDFS-6530
> URL: https://issues.apache.org/jira/browse/HDFS-6530
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: h6530_20140613.patch
>
>
> In 
> http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/Federation.html#Balancer
> - Typo:  Policy "node" should be "datanode"
> - Typo: "Balander" should be "Balancer".
> In 
> http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Rebalancer
> - Change the name "rebalancer" to "balancer".
> http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/CommandsManual.html#balancer
> - Add -policy argument.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6530) Fix Balancer documentation

2014-06-13 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-6530:
-

 Summary: Fix Balancer documentation
 Key: HDFS-6530
 URL: https://issues.apache.org/jira/browse/HDFS-6530
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h6530_20140613.patch

In 
http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/Federation.html#Balancer
- Typo:  Policy "node" should be "datanode"
- Typo: "Balander" should be "Balancer".

In 
http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Rebalancer
- Change the name "rebalancer" to "balancer".

http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/CommandsManual.html#balancer
- Add -policy argument.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6530) Fix Balancer documentation

2014-06-13 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031280#comment-14031280
 ] 

Arpit Agarwal commented on HDFS-6530:
-

+1 for the patch.

> Fix Balancer documentation
> --
>
> Key: HDFS-6530
> URL: https://issues.apache.org/jira/browse/HDFS-6530
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: h6530_20140613.patch
>
>
> In 
> http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/Federation.html#Balancer
> - Typo:  Policy "node" should be "datanode"
> - Typo: "Balander" should be "Balancer".
> In 
> http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Rebalancer
> - Change the name "rebalancer" to "balancer".
> http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/CommandsManual.html#balancer
> - Add -policy argument.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6480) Move waitForReady() from FSDirectory to FSNamesystem

2014-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031284#comment-14031284
 ] 

Hadoop QA commented on HDFS-6480:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650353/HDFS-6480.001.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7120//console

This message is automatically generated.

> Move waitForReady() from FSDirectory to FSNamesystem
> 
>
> Key: HDFS-6480
> URL: https://issues.apache.org/jira/browse/HDFS-6480
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-6480.000.patch, HDFS-6480.001.patch
>
>
> Currently FSDirectory implements a barrier in {{waitForReady()}} / 
> {{setReady()}} so that it only serves requests once the FSImage is fully 
> loaded.
> As a part of the effort to evolve {{FSDirectory}} to a class which focuses on 
> implementing the data structure of the namespace, this jira proposes to move 
> the barrier one level higher to {{FSNamesystem}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed

2014-06-13 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-5546:


Attachment: HDFS-5546.2.000.patch

[~cmccabe] This new patch just catches the FNF exception right in 
{{getDirectoryContent()}} for the {{ls/lsr}} command. 

Could you take a look at it? Thanks!
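
The idea in code, as a hedged sketch against the generic FileSystem API (not 
the actual PathData/Ls shell code):

{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LsrSketch {
  static void lsr(FileSystem fs, Path dir) throws IOException {
    FileStatus[] children;
    try {
      children = fs.listStatus(dir);
    } catch (FileNotFoundException e) {
      // The directory was moved or removed after we saw it;
      // skip it instead of letting the whole -R listing crash.
      return;
    }
    for (FileStatus child : children) {
      System.out.println(child.getPath());
      if (child.isDirectory()) {
        lsr(fs, child.getPath());
      }
    }
  }
}
{code}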

> race condition crashes "hadoop ls -R" when directories are moved/removed
> 
>
> Key: HDFS-5546
> URL: https://issues.apache.org/jira/browse/HDFS-5546
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Colin Patrick McCabe
>Assignee: Kousuke Saruta
>Priority: Minor
> Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch
>
>
> This seems to be a rare race condition where we have a sequence of events 
> like this:
> 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
> 2. someone deletes or moves directory D
> 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
> calls DFS#listStatus(D). This throws FileNotFoundException.
> 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6386) HDFS Encryption Zones

2014-06-13 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-6386:
---

Attachment: HDFS-6386.8.patch

Andrew,

Thanks for the detailed preliminary review. I agree with you that we should 
split out create/delete/list EZ, and that's what the .8 patch does.


> HDFS Encryption Zones
> -
>
> Key: HDFS-6386
> URL: https://issues.apache.org/jira/browse/HDFS-6386
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode, security
>Reporter: Alejandro Abdelnur
>Assignee: Charles Lamb
> Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)
>
> Attachments: HDFS-6386.4.patch, HDFS-6386.5.patch, HDFS-6386.6.patch, 
> HDFS-6386.8.patch
>
>
> Define the required security xAttributes for directories and files within an 
> encryption zone and how they propagate to children. Implement the logic to 
> create/delete encryption zones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6529) Debug logging for RemoteBlockReader2 to identify remote datanode and file being read

2014-06-13 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031345#comment-14031345
 ] 

Aaron T. Myers commented on HDFS-6529:
--

Patch looks fine to me. +1 pending Jenkins.

> Debug logging for RemoteBlockReader2 to identify remote datanode and file 
> being read
> 
>
> Key: HDFS-6529
> URL: https://issues.apache.org/jira/browse/HDFS-6529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Minor
> Attachments: HDFS-6529.patch
>
>
> The scenario is that a download is stuck and we don't know which datanode 
> the client is waiting on, or for which path.
> By turning on this trace logging, we will get a running trace of which 
> requests have started and which have completed.
> If trace is not enabled, there is no overhead other than checking whether 
> trace is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031346#comment-14031346
 ] 

Hadoop QA commented on HDFS-6527:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650386/HDFS-6527.v3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7118//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7118//console

This message is automatically generated.

> Edit log corruption due to defered INode removal
> 
>
> Key: HDFS-6527
> URL: https://issues.apache.org/jira/browse/HDFS-6527
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
> HDFS-6527.v2.patch, HDFS-6527.v3.patch
>
>
> We have seen a SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode map. 
> Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
> deletion can happen in between. Because of deferred inode removal outside FSN 
> write lock, getAdditionalBlock() can get the deleted inode from the inode map 
> with FSN write lock held. This allows the addition of a block to a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
>  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
> crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6529) Debug logging for RemoteBlockReader2 to identify remote datanode and file being read

2014-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031348#comment-14031348
 ] 

Hadoop QA commented on HDFS-6529:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650387/HDFS-6529.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7119//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7119//console

This message is automatically generated.

> Debug logging for RemoteBlockReader2 to identify remote datanode and file 
> being read
> 
>
> Key: HDFS-6529
> URL: https://issues.apache.org/jira/browse/HDFS-6529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Minor
> Attachments: HDFS-6529.patch
>
>
> The scenario is that a download is stuck and we don't know which datanode 
> the client is waiting on, or for which path.
> By turning on this trace logging, we will get a running trace of which 
> requests have started and which have completed.
> If trace is not enabled, there is no overhead other than checking whether 
> trace is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6530) Fix Balancer documentation

2014-06-13 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6530:
--

Attachment: h6530_20140613b.patch

h6530_20140613b.patch: slightly improves the doc.

> Fix Balancer documentation
> --
>
> Key: HDFS-6530
> URL: https://issues.apache.org/jira/browse/HDFS-6530
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: h6530_20140613.patch, h6530_20140613b.patch
>
>
> In 
> http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/Federation.html#Balancer
> - Typo:  Policy "node" should be "datanode"
> - Typo: "Balander" should be "Balancer".
> In 
> http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Rebalancer
> - Change the name "rebalancer" to "balancer".
> http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/CommandsManual.html#balancer
> - Add -policy argument.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6529) Debug logging for RemoteBlockReader2 to identify remote datanode and file being read

2014-06-13 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031375#comment-14031375
 ] 

Aaron T. Myers commented on HDFS-6529:
--

No tests are needed since this is a logging-only change. I'm going to commit 
this momentarily.

> Debug logging for RemoteBlockReader2 to identify remote datanode and file 
> being read
> 
>
> Key: HDFS-6529
> URL: https://issues.apache.org/jira/browse/HDFS-6529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Minor
> Attachments: HDFS-6529.patch
>
>
> The scenario is some download is stuck and we dont know which data node is 
> the client waiting and for which path.
> By turning this file trace logging we will get a running trace of which 
> requests have started and which have completed.
> If trace is not enabled, there is no overhead other than checking if trace is 
> enabled



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6529) Trace logging for RemoteBlockReader2 to identify remote datanode and file being read

2014-06-13 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-6529:
-

Summary: Trace logging for RemoteBlockReader2 to identify remote datanode 
and file being read  (was: Debug logging for RemoteBlockReader2 to identify 
remote datanode and file being read)

> Trace logging for RemoteBlockReader2 to identify remote datanode and file 
> being read
> 
>
> Key: HDFS-6529
> URL: https://issues.apache.org/jira/browse/HDFS-6529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Minor
> Attachments: HDFS-6529.patch
>
>
> The scenario is that a download is stuck and we don't know which datanode 
> the client is waiting on, or for which path.
> By turning on this trace logging, we will get a running trace of which 
> requests have started and which have completed.
> If trace is not enabled, there is no overhead other than checking whether 
> trace is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6529) Trace logging for RemoteBlockReader2 to identify remote datanode and file being read

2014-06-13 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-6529:
-

   Resolution: Fixed
Fix Version/s: 2.5.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2.

Thanks a lot for the contribution, Anubhav.

> Trace logging for RemoteBlockReader2 to identify remote datanode and file 
> being read
> 
>
> Key: HDFS-6529
> URL: https://issues.apache.org/jira/browse/HDFS-6529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Minor
> Fix For: 2.5.0
>
> Attachments: HDFS-6529.patch
>
>
> The scenario is that a download is stuck and we don't know which datanode 
> the client is waiting on, or for which path.
> By turning on this trace logging, we will get a running trace of which 
> requests have started and which have completed.
> If trace is not enabled, there is no overhead other than checking whether 
> trace is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6529) Trace logging for RemoteBlockReader2 to identify remote datanode and file being read

2014-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031382#comment-14031382
 ] 

Hudson commented on HDFS-6529:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5705 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5705/])
HDFS-6529. Trace logging for RemoteBlockReader2 to identify remote datanode and 
file being read. Contributed by Anubhav Dhoot. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602538)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java


> Trace logging for RemoteBlockReader2 to identify remote datanode and file 
> being read
> 
>
> Key: HDFS-6529
> URL: https://issues.apache.org/jira/browse/HDFS-6529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Minor
> Fix For: 2.5.0
>
> Attachments: HDFS-6529.patch
>
>
> The scenario is that a download is stuck and we don't know which datanode 
> the client is waiting on, or for which path.
> By turning on this trace logging, we will get a running trace of which 
> requests have started and which have completed.
> If trace is not enabled, there is no overhead other than checking whether 
> trace is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6499) can't tell why FileJournalManager's call to java.io.File.renameTo() fails

2014-06-13 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031387#comment-14031387
 ] 

Aaron T. Myers commented on HDFS-6499:
--

The latest patch looks pretty good to me. One small comment:

{code}
+if (dst.exists()) {
+  if (!dst.delete()) {
+throw new IOException("Couldn't delete " + dst);
+  }
 }
+NativeIO.renameTo(src, dst);
{code}

Though this patch has now solved the problem of the rename operation failing 
without any helpful info, it's still got the problem that the File#delete() 
call could fail without providing any helpful info.

I think this patch is still a net improvement despite this issue, so I'm fine 
committing it as-is and we can file a follow-up JIRA to improve it further if 
you'd like. Let me know what you'd like to do, [~yzhangal].
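
For contrast, a minimal sketch of the two styles; the method names are 
illustrative, while {{NativeIO.renameTo()}} is the call the patch introduces:

{code}
import java.io.File;
import java.io.IOException;
import org.apache.hadoop.io.nativeio.NativeIO;

public class RenameSketch {
  static void renameOldStyle(File src, File dst) throws IOException {
    if (!src.renameTo(dst)) {
      // All we can report is that it failed: no errno, no cause.
      throw new IOException("Couldn't rename " + src + " to " + dst);
    }
  }

  static void renameNewStyle(File src, File dst) throws IOException {
    // Throws an IOException whose message includes the underlying OS error.
    NativeIO.renameTo(src, dst);
  }
}
{code}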

> can't tell why FileJournalManager's call to java.io.File.renameTo() fails
> -
>
> Key: HDFS-6499
> URL: https://issues.apache.org/jira/browse/HDFS-6499
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-6499.001.patch, HDFS-6499.002.patch
>
>
> java.io.File's method renameTo() returns a boolean (true for success and false 
> for failure). If any call to this method fails, the caller can't tell why it 
> failed.
> Filing this jira to address this issue in FileJournalManager by using the 
> Hadoop NativeIO alternative.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6403) Add metrics for log warnings reported by JVM pauses

2014-06-13 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-6403:
-

Target Version/s: 2.5.0
 Summary: Add metrics for log warnings reported by JVM pauses  
(was: Add metrics for log warnings reported by HADOOP-9618)

> Add metrics for log warnings reported by JVM pauses
> ---
>
> Key: HDFS-6403
> URL: https://issues.apache.org/jira/browse/HDFS-6403
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 2.4.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-6403.001.patch, HDFS-6403.002.patch
>
>
> HADOOP-9618 logs warnings when there are long GC pauses. If these pauses are 
> exposed as a metric, they can be monitored.
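
One possible shape for such a metric, sketched with the metrics2 library; the 
class, field, and registration names are illustrative, not what a patch would 
necessarily use:

{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

@Metrics(about = "JVM pause monitoring", context = "jvm")
public class JvmPauseMetricsSketch {
  @Metric("Number of GC pauses longer than the warn threshold")
  MutableCounterLong numGcWarnThresholdExceeded;

  // The @Metric field is instantiated when the source is registered.
  public static JvmPauseMetricsSketch create() {
    return DefaultMetricsSystem.instance().register(
        "JvmPauseSketch", "JVM pause metrics", new JvmPauseMetricsSketch());
  }

  // Called by the pause detector each time it logs a warning.
  void onWarnPause() {
    numGcWarnThresholdExceeded.incr();
  }
}
{code}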



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6499) can't tell why FileJournalManager's call to java.io.File.renameTo() fails

2014-06-13 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031399#comment-14031399
 ] 

Yongjun Zhang commented on HDFS-6499:
-

Thanks a lot, ATM! I'd prefer committing it now, and I will follow up to 
address the issue you pointed out.



> can't tell why FileJournalManager's call to java.io.File.renameTo() fails
> -
>
> Key: HDFS-6499
> URL: https://issues.apache.org/jira/browse/HDFS-6499
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-6499.001.patch, HDFS-6499.002.patch
>
>
> java.io.File's method renameTo() returns a boolean (true for success and false 
> for failure). If any call to this method fails, the caller can't tell why it 
> failed.
> Filing this jira to address this issue in FileJournalManager by using the 
> Hadoop NativeIO alternative.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6499) can't tell why FileJournalManager's call to java.io.File.renameTo() fails

2014-06-13 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031405#comment-14031405
 ] 

Aaron T. Myers commented on HDFS-6499:
--

OK then, I'll commit this momentarily.

> can't tell why FileJournalManager's call to java.io.File.renameTo() fails
> -
>
> Key: HDFS-6499
> URL: https://issues.apache.org/jira/browse/HDFS-6499
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-6499.001.patch, HDFS-6499.002.patch
>
>
> java.io.File's method renameTo() returns a boolean (true for success and false 
> for failure). If any call to this method fails, the caller can't tell why it 
> failed.
> Filing this jira to address this issue in FileJournalManager by using the 
> Hadoop NativeIO alternative.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6499) Use NativeIO#renameTo instead of File#renameTo in FileJournalManager

2014-06-13 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-6499:
-

Summary: Use NativeIO#renameTo instead of File#renameTo in 
FileJournalManager  (was: can't tell why FileJournalManager's call to 
java.io.File.renameTo() fails)

> Use NativeIO#renameTo instead of File#renameTo in FileJournalManager
> 
>
> Key: HDFS-6499
> URL: https://issues.apache.org/jira/browse/HDFS-6499
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-6499.001.patch, HDFS-6499.002.patch
>
>
> java.io.File's method renameTo() returns a boolean (true for success and false 
> for failure). If any call to this method fails, the caller can't tell why it 
> failed.
> Filing this jira to address this issue in FileJournalManager by using the 
> Hadoop NativeIO alternative.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6499) Use NativeIO#renameTo instead of File#renameTo in FileJournalManager

2014-06-13 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-6499:
-

   Resolution: Fixed
Fix Version/s: 2.5.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2.

Thanks a lot for the contribution, Yongjun.

> Use NativeIO#renameTo instead of File#renameTo in FileJournalManager
> 
>
> Key: HDFS-6499
> URL: https://issues.apache.org/jira/browse/HDFS-6499
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Fix For: 2.5.0
>
> Attachments: HDFS-6499.001.patch, HDFS-6499.002.patch
>
>
> java.io.File's method renameTo() returns a boolean (true for success and false 
> for failure). If any call to this method fails, the caller can't tell why it 
> failed.
> Filing this jira to address this issue in FileJournalManager by using the 
> Hadoop NativeIO alternative.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6499) Use NativeIO#renameTo instead of File#renameTo in FileJournalManager

2014-06-13 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031411#comment-14031411
 ] 

Yongjun Zhang commented on HDFS-6499:
-

Many thanks, ATM! I just filed HDFS-6531 as a follow-up.


> Use NativeIO#renameTo instead of File#renameTo in FileJournalManager
> 
>
> Key: HDFS-6499
> URL: https://issues.apache.org/jira/browse/HDFS-6499
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Fix For: 2.5.0
>
> Attachments: HDFS-6499.001.patch, HDFS-6499.002.patch
>
>
> java.io.File's method renameTo() returns a boolean (true for success and false 
> for failure). If any call to this method fails, the caller can't tell why it 
> failed.
> Filing this jira to address this issue in FileJournalManager by using the 
> Hadoop NativeIO alternative.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6531) Create a native api to delete file, like the renameTo method in NativeIO, for better error reporting

2014-06-13 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-6531:
---

 Summary: Create a native api to delete file, like the renameTo 
method in NativeIO, for better error reporting
 Key: HDFS-6531
 URL: https://issues.apache.org/jira/browse/HDFS-6531
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang


File.delete() returns a boolean to indicate success or failure. 

As a follow-up to HDFS-6499, filing this jira to provide a native API to delete 
a file, like the renameTo method in NativeIO, so the caller can better tell the 
reason for a failure.

And fix FileJournalManager and other such places for better error reporting.

Thanks [~atm] for reviewing the fix of HDFS-6499 and pointing out this issue.
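A rough sketch of what such an API might look like (purely an assumption for illustration; the class, method names, and JNI details below are hypothetical and the eventual patch may differ):

{code}
import java.io.File;
import java.io.IOException;

public class NativeDeleteSketch {
  // Hypothetical wrapper mirroring NativeIO#renameTo: fail with an
  // exception that carries the OS error instead of a bare boolean.
  public static void delete(File f) throws IOException {
    delete0(f.getAbsolutePath());
  }

  // Hypothetical JNI method: would call unlink(2) and, on failure, throw
  // an IOException built from errno (e.g. EACCES, EBUSY, ENOENT).
  private static native void delete0(String path) throws IOException;
}
{code}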




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6499) Use NativeIO#renameTo instead of File#renameTo in FileJournalManager

2014-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031413#comment-14031413
 ] 

Hudson commented on HDFS-6499:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5706 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5706/])
HDFS-6499. Use NativeIO#renameTo instead of File#renameTo in 
FileJournalManager. Contributed by Yongjun Zhang. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602542)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java


> Use NativeIO#renameTo instead of File#renameTo in FileJournalManager
> 
>
> Key: HDFS-6499
> URL: https://issues.apache.org/jira/browse/HDFS-6499
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Fix For: 2.5.0
>
> Attachments: HDFS-6499.001.patch, HDFS-6499.002.patch
>
>
> java.io.File's renameTo() method returns a boolean (true for success, false 
> for failure). When a call to this method fails, the caller can't tell why it 
> failed.
> Filing this jira to address the issue in FileJournalManager by using the 
> Hadoop NativeIO alternative.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException

2014-06-13 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031415#comment-14031415
 ] 

Aaron T. Myers commented on HDFS-6475:
--

The latest patch looks pretty good to me. Two very small comments:

# I think the method comment for 
{{testDelegationTokenStandbyNNAppearFirst}} is a bit misleading. It seems to 
imply that the Standby NN now throws a different exception, when in fact I 
believe the thrown exception is not changed by this patch; rather, the 
client-side unwrapping of the exception is changed.
# There should be no need to restore the state of the standby/active NNs at the 
end of the test, since the cluster is always shut down at the end of every test 
in this class.

+1 from me once the above are addressed.

[~daryn] and [~jingzhao] - does the latest patch look OK to you?

> WebHdfs clients fail without retry because incorrect handling of 
> StandbyException
> -
>
> Key: HDFS-6475
> URL: https://issues.apache.org/jira/browse/HDFS-6475
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, webhdfs
>Affects Versions: 2.4.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, 
> HDFS-6475.003.patch, HDFS-6475.003.patch
>
>
> With WebHdfs clients connected to an HA HDFS service, the delegation token is 
> initially obtained from the active NN.
> When a client issues a request, the NNs it can contact are stored in a map 
> returned by DFSUtil.getNNServiceRpcAddresses(conf), and the client contacts 
> the NNs in that order, so the first one it reaches is likely the Standby NN. 
> If the Standby NN doesn't have the updated client credential, it will throw a 
> SecurityException that wraps a StandbyException.
> The client is expected to retry another NN, but due to the insufficient 
> handling of the SecurityException described above, it fails.
> Example message:
> {code}
> {RemoteException={message=Failed to obtain user group information: 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: 
> StandbyException, javaCl
> assName=java.lang.SecurityException, exception=SecurityException}}
> org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to 
> obtain user group information: 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException
> at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696)
> at kclient1.kclient$1.run(kclient.java:64)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:356)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
> at kclient1.kclient.main(kclient.java:58)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException

2014-06-13 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031461#comment-14031461
 ] 

Yongjun Zhang commented on HDFS-6475:
-

Hi ATM,

Thanks a lot for the review! Sorry I didn't make it clear earlier. The change 
in ExceptionHandler does happen on the server side. Basically the 
ExceptionHandler class processes the original exception thrown at the server 
side and passes a possibly revised exception to the client. The original 
exception thrown at the server side is a SecurityException (from the 
UserProvider class) whose cause is an InvalidToken, which in turn has a 
StandbyException as its cause. The ExceptionHandler unwraps this chain and 
passes the StandbyException to the client.

I'm uploading a revised version to address both of your comments. Hopefully the 
revised comments make it clearer. Thanks in advance for reviewing the new 
revision!
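To make the chain concrete, a minimal sketch of the unwrapping idea (an illustration of the approach described above, not the actual patch; the method name is made up):

{code}
import org.apache.hadoop.ipc.StandbyException;

public class UnwrapSketch {
  // Walk the cause chain (SecurityException -> InvalidToken ->
  // StandbyException) and surface the StandbyException if present, so the
  // client sees a retriable error and fails over to another NN.
  static Throwable unwrapForClient(Throwable t) {
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      if (cur instanceof StandbyException) {
        return cur;
      }
    }
    return t; // no StandbyException in the chain; keep the original
  }
}
{code}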


> WebHdfs clients fail without retry because incorrect handling of 
> StandbyException
> -
>
> Key: HDFS-6475
> URL: https://issues.apache.org/jira/browse/HDFS-6475
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, webhdfs
>Affects Versions: 2.4.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, 
> HDFS-6475.003.patch, HDFS-6475.003.patch
>
>
> With WebHdfs clients connected to an HA HDFS service, the delegation token is 
> initially obtained from the active NN.
> When a client issues a request, the NNs it can contact are stored in a map 
> returned by DFSUtil.getNNServiceRpcAddresses(conf), and the client contacts 
> the NNs in that order, so the first one it reaches is likely the Standby NN. 
> If the Standby NN doesn't have the updated client credential, it will throw a 
> SecurityException that wraps a StandbyException.
> The client is expected to retry another NN, but due to the insufficient 
> handling of the SecurityException described above, it fails.
> Example message:
> {code}
> {RemoteException={message=Failed to obtain user group information: 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: 
> StandbyException, javaCl
> assName=java.lang.SecurityException, exception=SecurityException}}
> org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to 
> obtain user group information: 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException
> at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696)
> at kclient1.kclient$1.run(kclient.java:64)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:356)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
> at kclient1.kclient.main(kclient.java:58)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

