[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction

2014-06-10 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026152#comment-14026152
 ] 

Vinayakumar B commented on HDFS-5723:
-

Hi Stanley,
If the issue is different, then you can file a separate Jira for that.

 Append failed FINALIZED replica should not be accepted as valid when that 
 block is underconstruction
 

 Key: HDFS-5723
 URL: https://issues.apache.org/jira/browse/HDFS-5723
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.2.0
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Attachments: HDFS-5723.patch, HDFS-5723.patch


 Scenario:
 1. A 3-node cluster with 
 dfs.client.block.write.replace-datanode-on-failure.enable set to false.
 2. A file is written with 3 replicas -- blk_id_gs1.
 3. One of the datanodes, DN1, goes down.
 4. The file is opened for append, more data is added to it and synced (to 
 only the 2 live nodes DN2 and DN3) -- blk_id_gs2.
 5. DN1 is now restarted.
 6. In its block report, DN1 reports the FINALIZED block blk_id_gs1, which 
 should be marked corrupt.
 However, because the NN holds the appended block in the UnderConstruction 
 state, it does not detect this replica as corrupt and adds it to the valid 
 block locations.
 As long as that namenode stays alive, this datanode is also considered a 
 valid replica location, and reads/appends hitting that datanode will fail.
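 A minimal sketch, in plain Java, of the generation-stamp check implied above 
 (illustrative only; the class and method names are assumptions, not the actual 
 BlockManager code):
 {code}
 // Hedged illustration: a FINALIZED replica reported with an older generation
 // stamp should be marked corrupt even while the NameNode still tracks the
 // appended block as UnderConstruction.
 final class StaleReplicaCheckSketch {
   /** True if the reported FINALIZED replica is older than the stored block. */
   static boolean shouldMarkCorrupt(long reportedGenStamp, long storedGenStamp) {
     return reportedGenStamp < storedGenStamp;
   }

   public static void main(String[] args) {
     long gs1 = 1001L; // genstamp of the stale replica DN1 reports after restart
     long gs2 = 1002L; // genstamp after the append succeeded on DN2 and DN3
     System.out.println("mark corrupt? " + shouldMarkCorrupt(gs1, gs2)); // true
   }
 }
 {code}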



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6494) In some case, the hedged read will lead to client infinite wait.

2014-06-10 Thread LiuLei (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LiuLei updated HDFS-6494:
-

Attachment: hedged-read-test-case.patch

One test case based on the HDFS-6231 patch.

 In some case, the  hedged read will lead to client  infinite wait.
 --

 Key: HDFS-6494
 URL: https://issues.apache.org/jira/browse/HDFS-6494
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.4.0
Reporter: LiuLei
Assignee: Liang Xie
 Attachments: hedged-read-bug.patch, hedged-read-test-case.patch


 When I use hedged read and there is only one live datanode, if the read from 
 that datanode throws TimeoutException and ChecksumException, the client 
 will wait indefinitely.
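 A simplified, generic sketch of the hedged-read wait pattern (not the actual 
 DFSInputStream code; the class name and structure are assumptions), showing how 
 a loop that only returns on success can block forever once every attempt has 
 failed, unless failures are counted:
 {code}
 import java.util.List;
 import java.util.concurrent.*;

 // If every submitted read attempt fails and the loop below had no failure
 // bound, cs.take() would eventually block forever waiting for a success.
 public class HedgedWaitSketch {
   static byte[] hedgedRead(ExecutorService pool, List<Callable<byte[]>> attempts)
       throws InterruptedException {
     CompletionService<byte[]> cs = new ExecutorCompletionService<>(pool);
     for (Callable<byte[]> attempt : attempts) {
       cs.submit(attempt);
     }
     int failures = 0;
     while (failures < attempts.size()) { // without this bound: infinite wait
       Future<byte[]> done = cs.take();   // blocks until some attempt finishes
       try {
         return done.get();               // first successful attempt wins
       } catch (ExecutionException e) {
         failures++;                      // attempt failed; keep waiting
       }
     }
     throw new IllegalStateException("all hedged read attempts failed");
   }
 }
 {code}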



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-6506) Newly moved block replica been invalidated and deleted

2014-06-10 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang reassigned HDFS-6506:
---

Assignee: Binglin Chang

 Newly moved block replica been invalidated and deleted
 --

 Key: HDFS-6506
 URL: https://issues.apache.org/jira/browse/HDFS-6506
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang

 TestBalancerWithNodeGroup#testBalancerWithNodeGroup has been failing recently:
 https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/
 From the error log, the cause seems to be that newly moved block replicas 
 are invalidated and deleted, so some of the balancer's work is reversed.
 {noformat}
 2014-06-06 18:15:51,681 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,682 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,701 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741829_1005 with size=100 fr
 2014-06-06 18:15:54,706 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to 
 invalidated blocks set
 2014-06-06 18:15:54,709 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to 
 invalidated blocks set
 2014-06-06 18:15:56,421 INFO  BlockStateChange 
 (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010]
 2014-06-06 18:15:57,717 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,720 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,721 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,722 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,723 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to 
 invalidated blocks set
 2014-06-06 18:15:59,422 INFO  BlockStateChange 
 (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, 
 blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008]
 2014-06-06 18:16:02,423 INFO  BlockStateChange 
 (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
 127.0.0.1:55468 to delete [blk_1073741845_1021]
 {noformat}
 Normally this should not happen: when moving a block from src to dest, the 
 replica on src should be invalidated, not the one on dest, so there is likely 
 a bug in the related logic. 
 I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup itself caused 
 this. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException

2014-06-10 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-6475:


Attachment: HDFS-6475.002.patch

 WebHdfs clients fail without retry because incorrect handling of 
 StandbyException
 -

 Key: HDFS-6475
 URL: https://issues.apache.org/jira/browse/HDFS-6475
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, webhdfs
Affects Versions: 2.4.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch


 With WebHdfs clients connected to an HA HDFS service, the delegation token is 
 initialized beforehand with the active NN.
 When a client issues a request, the NNs it can contact are stored in a map 
 returned by DFSUtil.getNNServiceRpcAddresses(conf), and the client contacts 
 the NNs in that order, so the first one it hits is likely the Standby NN. 
 If the Standby NN doesn't have the updated client credential, it throws a 
 SecurityException that wraps a StandbyException.
 The client is expected to retry against another NN, but because of the 
 insufficient handling of SecurityException described above, it fails instead.
 Example message:
 {code}
 {RemoteException={message=Failed to obtain user group information: 
 org.apache.hadoop.security.token.SecretManager$InvalidToken: 
 StandbyException, javaCl
 assName=java.lang.SecurityException, exception=SecurityException}}
 org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to 
 obtain user group information: 
 org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException
 at 
 org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696)
 at kclient1.kclient$1.run(kclient.java:64)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:356)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
 at kclient1.kclient.main(kclient.java:58)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted

2014-06-10 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026188#comment-14026188
 ] 

Binglin Chang commented on HDFS-6506:
-

Looking at the log and code more thoroughly, the reason some block replicas get 
invalidated is:
1. Balancer round 1: move blk0 from dn0 to dn1; at this point the block map 
hasn't been updated yet (so dn0 still appears to have blk0).
2. Balancer round 2 starts and tries to move blk0 from dn0 to dn2.
3. dn2 copies the data from dn0.
4. dn0 heartbeats and gets the command to delete blk0.
5. When the move of blk0 from dn0 to dn2 is processed, the replica on dn0 can no 
longer be found, but an excess replica still has to be deleted, so the one on 
dn1 is deleted.

To prevent this, the balancer needs to wait long enough between rounds to make 
sure the block movements of the previous round are fully committed; otherwise 
those movements may be undone.



 Newly moved block replica been invalidated and deleted
 --

 Key: HDFS-6506
 URL: https://issues.apache.org/jira/browse/HDFS-6506
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang

 TestBalancerWithNodeGroup#testBalancerWithNodeGroup has been failing recently:
 https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/
 From the error log, the cause seems to be that newly moved block replicas 
 are invalidated and deleted, so some of the balancer's work is reversed.
 {noformat}
 2014-06-06 18:15:51,681 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,682 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,701 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741829_1005 with size=100 fr
 2014-06-06 18:15:54,706 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to 
 invalidated blocks set
 2014-06-06 18:15:54,709 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to 
 invalidated blocks set
 2014-06-06 18:15:56,421 INFO  BlockStateChange 
 (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010]
 2014-06-06 18:15:57,717 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,720 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,721 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,722 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,723 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to 
 invalidated blocks set
 2014-06-06 18:15:59,422 INFO  BlockStateChange 
 (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, 
 blk_1073741830_1006, blk_1073741831_1007, 

[jira] [Created] (HDFS-6507) Improve DFSAdmin to support HA cluster better

2014-06-10 Thread Zesheng Wu (JIRA)
Zesheng Wu created HDFS-6507:


 Summary: Improve DFSAdmin to support HA cluster better
 Key: HDFS-6507
 URL: https://issues.apache.org/jira/browse/HDFS-6507
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu


Currently, the commands supported in DFSAdmin can be classified into three 
categories according to the protocol they use:
1. ClientProtocol
Commands in this category are generally implemented by calling the corresponding 
method of the DFSClient class, which in turn invokes the corresponding remote 
implementation on the NN side. At the NN side, all these operations are 
classified into five categories: UNCHECKED, READ, WRITE, CHECKPOINT, JOURNAL. 
The Active NN allows all operations, while the Standby NN only allows UNCHECKED 
operations. In the current implementation of DFSClient, it connects to one NN 
first; if that NN is not Active and the operation is not allowed, it fails over 
to the second NN. The problem is that some of the commands (setSafeMode, 
saveNamespace, restoreFailedStorage, refreshNodes, setBalancerBandwidth, 
metaSave) in DFSAdmin are classified as UNCHECKED operations, so when they are 
executed from the DFSAdmin command line they are sent to one particular NN, 
regardless of whether it is Active or Standby. This may result in two problems: 
a. If the first NN tried is the Standby, the operation takes effect only on the 
Standby NN, which is not the expected result.
b. If the operation needs to take effect on both NNs but takes effect on only 
one, there may be problems after a later NN failover.
Here I propose the following improvements:
a. If a command can be classified as one of the READ/WRITE/CHECKPOINT/JOURNAL 
operations, we should classify it clearly.
b. If a command cannot be classified as one of the above four operations, or if 
it needs to take effect on both NNs, we should send the request to both the 
Active and Standby NNs.

2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
RefreshUserMappingsProtocol, RefreshCallQueueProtocol
Commands in this category, including refreshServiceAcl, 
refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
refreshCallQueue, are implemented by creating the corresponding RPC proxy and 
sending the request to a remote NN. In the current implementation, these 
requests are sent to one particular NN, regardless of whether it is Active or 
Standby. Here I propose that we send these requests to both NNs (see the sketch 
after this list).

3. ClientDatanodeProtocol
Commands in this category are already handled correctly; no improvement is 
needed.
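A minimal sketch of the "send to both NNs" idea for the refresh-style commands, 
assuming the existing DFSUtil.getNNServiceRpcAddresses(conf) map; the 
refreshOnOneNameNode helper is a hypothetical placeholder, not an existing API:
{code}
import java.net.InetSocketAddress;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSUtil;

// Sketch: iterate over every configured NN (Active and Standby alike) and send
// the refresh request to each, instead of to a single, arbitrarily chosen NN.
public class RefreshBothNameNodesSketch {
  public static void refreshAll(Configuration conf) throws Exception {
    Map<String, Map<String, InetSocketAddress>> nns =
        DFSUtil.getNNServiceRpcAddresses(conf);
    for (Map<String, InetSocketAddress> perNameservice : nns.values()) {
      for (InetSocketAddress nnAddr : perNameservice.values()) {
        refreshOnOneNameNode(conf, nnAddr); // issue the refresh on each NN
      }
    }
  }

  // Placeholder: build the relevant refresh protocol proxy for addr and invoke
  // the refresh method (e.g. refreshServiceAcl); details omitted here.
  private static void refreshOnOneNameNode(Configuration conf, InetSocketAddress addr)
      throws Exception {
    System.out.println("would send refresh request to NN at " + addr);
  }
}
{code}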



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException

2014-06-10 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026198#comment-14026198
 ] 

Yongjun Zhang commented on HDFS-6475:
-

Hello [~jingzhao], 

Thanks for your earlier suggestion, and sorry for the slight delay in taking 
care of it. 

I just uploaded a patch. I was able to verify on a real cluster, where I had 
seen the problem, that the patch fixes the issue. However, I was not able to 
create a testcase for it. Since this new patch reuses the method getTrueCause() 
in Server.java, the remaining thing to be checked by a unit test would be the 
change I made in ExceptionHandler. 

The change in ExceptionHandler is: for ContainerException and 
SecurityException, call getTrueCause() to find the real exception by following 
the cause chain of the ContainerException/SecurityException. The original code 
in ExceptionHandler only does one level of cause-seeking for ContainerException.
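A rough, self-contained sketch of the cause-chain walking described above 
(illustrative only, not the actual patch; the class name and the stand-in 
exceptions are assumptions):
{code}
// Walk the cause chain to its deepest element instead of unwrapping only one
// level; this mirrors the idea of reusing getTrueCause() for both
// ContainerException and SecurityException.
final class CauseUnwrapSketch {
  static Throwable findDeepestCause(Throwable t) {
    Throwable current = t;
    while (current.getCause() != null) {
      current = current.getCause();
    }
    return current;
  }

  public static void main(String[] args) {
    // A SecurityException wrapping a stand-in for the StandbyException case.
    Throwable wrapped =
        new SecurityException(new RuntimeException("StandbyException (stand-in)"));
    System.out.println(findDeepestCause(wrapped).getMessage());
  }
}
{code}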

Would you please take a look at the patch and see whether it can be committed 
without a unit testcase, or let me know if you have any other advice?

Thanks a lot.



 WebHdfs clients fail without retry because incorrect handling of 
 StandbyException
 -

 Key: HDFS-6475
 URL: https://issues.apache.org/jira/browse/HDFS-6475
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, webhdfs
Affects Versions: 2.4.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch


 With WebHdfs clients connected to an HA HDFS service, the delegation token is 
 initialized beforehand with the active NN.
 When a client issues a request, the NNs it can contact are stored in a map 
 returned by DFSUtil.getNNServiceRpcAddresses(conf), and the client contacts 
 the NNs in that order, so the first one it hits is likely the Standby NN. 
 If the Standby NN doesn't have the updated client credential, it throws a 
 SecurityException that wraps a StandbyException.
 The client is expected to retry against another NN, but because of the 
 insufficient handling of SecurityException described above, it fails instead.
 Example message:
 {code}
 {RemoteException={message=Failed to obtain user group information: 
 org.apache.hadoop.security.token.SecretManager$InvalidToken: 
 StandbyException, javaCl
 assName=java.lang.SecurityException, exception=SecurityException}}
 org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to 
 obtain user group information: 
 org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException
 at 
 org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696)
 at kclient1.kclient$1.run(kclient.java:64)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:356)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
 at kclient1.kclient.main(kclient.java:58)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted

2014-06-10 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026215#comment-14026215
 ] 

Binglin Chang commented on HDFS-6506:
-

The balancer already sleeps 2*DFS_HEARTBEAT_INTERVAL seconds between rounds, but 
TestBalancer.java sets:
{code}
conf.setLong(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 1L);
{code}
Replica state update speed also depends on DFS_NAMENODE_REPLICATION_INTERVAL, 
which is 3 by default.
TestBalancer only changes the heartbeat interval (which changes both the 
heartbeat interval and the balancer iteration sleep time) but doesn't change the 
ReplicationMonitor check interval, so the sleep time is too small for the 
movements to get committed.
The other thing is that 2*DFS_HEARTBEAT_INTERVAL by itself still seems a little 
dangerous; maybe change it to 2*DFS_HEARTBEAT_INTERVAL + DFS_NAMENODE_REPLICATION_INTERVAL.
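A small sketch of the suggestion above (illustrative only; the test scaffolding 
around it is assumed): shrink both intervals in the test configuration and wait 
2 * heartbeat + replication-monitor interval between balancer rounds, so the 
previous round's moves are committed before the next round starts.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;

public class BalancerWaitSketch {
  // Proposed wait between rounds: 2 * heartbeat + replication monitor interval.
  public static long plannedSleepMillis(Configuration conf) {
    long heartbeatSec = conf.getLong(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 3L);
    long replIntervalSec =
        conf.getLong(DFSConfigKeys.DFS_NAMENODE_REPLICATION_INTERVAL_KEY, 3L);
    return (2 * heartbeatSec + replIntervalSec) * 1000L;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // As in TestBalancer, shrink the heartbeat; also shrink the monitor interval.
    conf.setLong(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 1L);
    conf.setLong(DFSConfigKeys.DFS_NAMENODE_REPLICATION_INTERVAL_KEY, 1L);
    System.out.println("sleep between rounds (ms): " + plannedSleepMillis(conf));
  }
}
{code}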


 Newly moved block replica been invalidated and deleted
 --

 Key: HDFS-6506
 URL: https://issues.apache.org/jira/browse/HDFS-6506
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang

 TestBalancerWithNodeGroup#testBalancerWithNodeGroup has been failing recently:
 https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/
 From the error log, the cause seems to be that newly moved block replicas 
 are invalidated and deleted, so some of the balancer's work is reversed.
 {noformat}
 2014-06-06 18:15:51,681 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,682 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,701 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741829_1005 with size=100 fr
 2014-06-06 18:15:54,706 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to 
 invalidated blocks set
 2014-06-06 18:15:54,709 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to 
 invalidated blocks set
 2014-06-06 18:15:56,421 INFO  BlockStateChange 
 (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010]
 2014-06-06 18:15:57,717 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,720 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,721 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,722 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,723 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to 
 invalidated blocks set
 2014-06-06 18:15:59,422 INFO  BlockStateChange 
 (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
 127.0.0.1:55468 to delete [blk_1073741827_1003, 

[jira] [Updated] (HDFS-6506) Newly moved block replica been invalidated and deleted

2014-06-10 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HDFS-6506:


Attachment: HDFS-6506.v1.patch

 Newly moved block replica been invalidated and deleted
 --

 Key: HDFS-6506
 URL: https://issues.apache.org/jira/browse/HDFS-6506
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: HDFS-6506.v1.patch


 TestBalancerWithNodeGroup#testBalancerWithNodeGroup has been failing recently:
 https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/
 From the error log, the cause seems to be that newly moved block replicas 
 are invalidated and deleted, so some of the balancer's work is reversed.
 {noformat}
 2014-06-06 18:15:51,681 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,682 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,701 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741829_1005 with size=100 fr
 2014-06-06 18:15:54,706 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to 
 invalidated blocks set
 2014-06-06 18:15:54,709 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to 
 invalidated blocks set
 2014-06-06 18:15:56,421 INFO  BlockStateChange 
 (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010]
 2014-06-06 18:15:57,717 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,720 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,721 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,722 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,723 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to 
 invalidated blocks set
 2014-06-06 18:15:59,422 INFO  BlockStateChange 
 (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, 
 blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008]
 2014-06-06 18:16:02,423 INFO  BlockStateChange 
 (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
 127.0.0.1:55468 to delete [blk_1073741845_1021]
 {noformat}
 Normally this should not happen: when moving a block from src to dest, the 
 replica on src should be invalidated, not the one on dest, so there is likely 
 a bug in the related logic. 
 I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup itself caused 
 this. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6506) Newly moved block replica been invalidated and deleted

2014-06-10 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HDFS-6506:


Status: Patch Available  (was: Open)

 Newly moved block replica been invalidated and deleted
 --

 Key: HDFS-6506
 URL: https://issues.apache.org/jira/browse/HDFS-6506
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: HDFS-6506.v1.patch


 TestBalancerWithNodeGroup#testBalancerWithNodeGroup has been failing recently:
 https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/
 From the error log, the cause seems to be that newly moved block replicas 
 are invalidated and deleted, so some of the balancer's work is reversed.
 {noformat}
 2014-06-06 18:15:51,681 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,682 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,701 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741829_1005 with size=100 fr
 2014-06-06 18:15:54,706 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to 
 invalidated blocks set
 2014-06-06 18:15:54,709 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to 
 invalidated blocks set
 2014-06-06 18:15:56,421 INFO  BlockStateChange 
 (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010]
 2014-06-06 18:15:57,717 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,720 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,721 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,722 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,723 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to 
 invalidated blocks set
 2014-06-06 18:15:59,422 INFO  BlockStateChange 
 (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, 
 blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008]
 2014-06-06 18:16:02,423 INFO  BlockStateChange 
 (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
 127.0.0.1:55468 to delete [blk_1073741845_1021]
 {noformat}
 Normally this should not happen: when moving a block from src to dest, the 
 replica on src should be invalidated, not the one on dest, so there is likely 
 a bug in the related logic. 
 I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup itself caused 
 this. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better

2014-06-10 Thread Zesheng Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6507:
-

Description: 
Currently, the commands supported in DFSAdmin can be classified into three 
categories according to the protocol used:
# ClientProtocol
Commands in this category generally implement by calling the corresponding 
function of the DFSClient class, and will call the corresponding remote 
implementation function at the NN side finally. At the NN side, all these 
operations are classified into five categories: UNCHECKED, READ, WRITE, 
CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
allows UNCHECKED operations. In the current implementation of DFSClient, it 
will connect one NN first, if the first NN is not Active and the operation is 
not allowed, it will failover to the second NN. So here comes the problem, some 
of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, 
setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED 
operations, and when executing these commands in the DFSAdmin command line, 
they will be sent to a definite NN, no matter it is Active or Standby. This may 
result in two problems: 
#* If the first tried NN is standby, and the operation takes effect only on 
Standby NN, which is not the expected result.
#* If the operation needs to take effect on both NN, but it takes effect on 
only one NN. In the future, when there is a NN failover, there may have 
problems.
Here I propose the following improvements:
a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
operations, we should classify it clearly.
b. If the command can not be classified as one of the above four operations, or 
if the command needs to take effect on both NN, we should send the request to 
both Active and Standby NNs.

# Refresh protocols: RefreshAuthorizationPolicyProtocol, 
RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
RefreshCallQueueProtocol
Commands in this category, including refreshServiceAcl, 
refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
sending the request to remote NN. In the current implementation, these requests 
will be sent to a definite NN, no matter it is Active or Standby. Here I 
propose that we sent these requests to both NNs.

# ClientDatanodeProtocol
Commands in this category are handled correctly, no need to improve.

  was:
Currently, the commands supported in DFSAdmin can be classified into three 
categories according to the protocol used:
1.ClientProtocol
Commands in this category generally implement by calling the corresponding 
function of the DFSClient class, and will call the corresponding remote 
implementation function at the NN side finally. At the NN side, all these 
operations are classified into five categories: UNCHECKED, READ, WRITE, 
CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
allows UNCHECKED operations. In the current implementation of DFSClient, it 
will connect one NN first, if the first NN is not Active and the operation is 
not allowed, it will failover to the second NN. So here comes the problem, some 
of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, 
setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED 
operations, and when executing these commands in the DFSAdmin command line, 
they will be sent to a definite NN, no matter it is Active or Standby. This may 
result in two problems: 
a. If the first tried NN is standby, and the operation takes effect only on 
Standby NN, which is not the expected result.
b. If the operation needs to take effect on both NN, but it takes effect on 
only one NN. In the future, when there is a NN failover, there may have 
problems.
Here I propose the following improvements:
a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
operations, we should classify it clearly.
b. If the command can not be classified as one of the above four operations, or 
if the command needs to take effect on both NN, we should send the request to 
both Active and Standby NNs.

2.Refresh protocols: RefreshAuthorizationPolicyProtocol, 
RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
RefreshCallQueueProtocol
Commands in this category, including refreshServiceAcl, 
refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
sending the request to remote NN. In the current implementation, these requests 
will be sent to a definite NN, no matter it is Active or Standby. Here I 
propose that we sent these requests to both NNs.

3.ClientDatanodeProtocol
Commands in this category are handled correctly, no need to improve.


 Improve DFSAdmin to support HA cluster better
 

[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better

2014-06-10 Thread Zesheng Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6507:
-

Description: 
Currently, the commands supported in DFSAdmin can be classified into three 
categories according to the protocol used:
1. ClientProtocol
Commands in this category generally implement by calling the corresponding 
function of the DFSClient class, and will call the corresponding remote 
implementation function at the NN side finally. At the NN side, all these 
operations are classified into five categories: UNCHECKED, READ, WRITE, 
CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
allows UNCHECKED operations. In the current implementation of DFSClient, it 
will connect one NN first, if the first NN is not Active and the operation is 
not allowed, it will failover to the second NN. So here comes the problem, some 
of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, 
setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED 
operations, and when executing these commands in the DFSAdmin command line, 
they will be sent to a definite NN, no matter it is Active or Standby. This may 
result in two problems: 
a. If the first tried NN is standby, and the operation takes effect only on 
Standby NN, which is not the expected result.
b. If the operation needs to take effect on both NN, but it takes effect on 
only one NN. In the future, when there is a NN failover, there may have 
problems.

Here I propose the following improvements:
a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
operations, we should classify it clearly.
b. If the command can not be classified as one of the above four operations, or 
if the command needs to take effect on both NN, we should send the request to 
both Active and Standby NNs.

2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
RefreshCallQueueProtocol
Commands in this category, including refreshServiceAcl, 
refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
sending the request to remote NN. In the current implementation, these requests 
will be sent to a definite NN, no matter it is Active or Standby. Here I 
propose that we sent these requests to both NNs.

3. ClientDatanodeProtocol
Commands in this category are handled correctly, no need to improve.

  was:
Currently, the commands supported in DFSAdmin can be classified into three 
categories according to the protocol used:
# ClientProtocol
Commands in this category generally implement by calling the corresponding 
function of the DFSClient class, and will call the corresponding remote 
implementation function at the NN side finally. At the NN side, all these 
operations are classified into five categories: UNCHECKED, READ, WRITE, 
CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
allows UNCHECKED operations. In the current implementation of DFSClient, it 
will connect one NN first, if the first NN is not Active and the operation is 
not allowed, it will failover to the second NN. So here comes the problem, some 
of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, 
setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED 
operations, and when executing these commands in the DFSAdmin command line, 
they will be sent to a definite NN, no matter it is Active or Standby. This may 
result in two problems: 
#* If the first tried NN is standby, and the operation takes effect only on 
Standby NN, which is not the expected result.
#* If the operation needs to take effect on both NN, but it takes effect on 
only one NN. In the future, when there is a NN failover, there may have 
problems.
Here I propose the following improvements:
a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
operations, we should classify it clearly.
b. If the command can not be classified as one of the above four operations, or 
if the command needs to take effect on both NN, we should send the request to 
both Active and Standby NNs.

# Refresh protocols: RefreshAuthorizationPolicyProtocol, 
RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
RefreshCallQueueProtocol
Commands in this category, including refreshServiceAcl, 
refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
sending the request to remote NN. In the current implementation, these requests 
will be sent to a definite NN, no matter it is Active or Standby. Here I 
propose that we sent these requests to both NNs.

# ClientDatanodeProtocol
Commands in this category are handled correctly, no need to improve.


 Improve DFSAdmin to support HA cluster better
 

[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException

2014-06-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026288#comment-14026288
 ] 

Hadoop QA commented on HDFS-6475:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12649538/HDFS-6475.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestWebHDFS

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7071//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7071//console

This message is automatically generated.

 WebHdfs clients fail without retry because incorrect handling of 
 StandbyException
 -

 Key: HDFS-6475
 URL: https://issues.apache.org/jira/browse/HDFS-6475
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, webhdfs
Affects Versions: 2.4.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch


 With WebHdfs clients connected to an HA HDFS service, the delegation token is 
 initialized beforehand with the active NN.
 When a client issues a request, the NNs it can contact are stored in a map 
 returned by DFSUtil.getNNServiceRpcAddresses(conf), and the client contacts 
 the NNs in that order, so the first one it hits is likely the Standby NN. 
 If the Standby NN doesn't have the updated client credential, it throws a 
 SecurityException that wraps a StandbyException.
 The client is expected to retry against another NN, but because of the 
 insufficient handling of SecurityException described above, it fails instead.
 Example message:
 {code}
 {RemoteException={message=Failed to obtain user group information: 
 org.apache.hadoop.security.token.SecretManager$InvalidToken: 
 StandbyException, javaCl
 assName=java.lang.SecurityException, exception=SecurityException}}
 org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to 
 obtain user group information: 
 org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException
 at 
 org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696)
 at kclient1.kclient$1.run(kclient.java:64)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:356)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
 at kclient1.kclient.main(kclient.java:58)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {code}



--
This message was sent by Atlassian 

[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted

2014-06-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026314#comment-14026314
 ] 

Hadoop QA commented on HDFS-6506:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12649548/HDFS-6506.v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.datanode.TestBPOfferService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7072//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7072//console

This message is automatically generated.

 Newly moved block replica been invalidated and deleted
 --

 Key: HDFS-6506
 URL: https://issues.apache.org/jira/browse/HDFS-6506
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: HDFS-6506.v1.patch


 TestBalancerWithNodeGroup#testBalancerWithNodeGroup has been failing recently:
 https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/
 From the error log, the cause seems to be that newly moved block replicas 
 are invalidated and deleted, so some of the balancer's work is reversed.
 {noformat}
 2014-06-06 18:15:51,681 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,682 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,701 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741829_1005 with size=100 fr
 2014-06-06 18:15:54,706 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to 
 invalidated blocks set
 2014-06-06 18:15:54,709 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to 
 invalidated blocks set
 2014-06-06 18:15:56,421 INFO  BlockStateChange 
 (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010]
 2014-06-06 18:15:57,717 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,720 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,721 INFO  BlockStateChange 
 

[jira] [Commented] (HDFS-6257) TestCacheDirectives#testExceedsCapacity fails occasionally

2014-06-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026340#comment-14026340
 ] 

Hudson commented on HDFS-6257:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #579 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/579/])
HDFS-6257. TestCacheDirectives#testExceedsCapacity fails occasionally (cmccabe) 
(cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1601473)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java


 TestCacheDirectives#testExceedsCapacity fails occasionally
 --

 Key: HDFS-6257
 URL: https://issues.apache.org/jira/browse/HDFS-6257
 Project: Hadoop HDFS
  Issue Type: Test
  Components: caching
Affects Versions: 2.4.0
Reporter: Ted Yu
Assignee: Colin Patrick McCabe
Priority: Minor
 Fix For: 2.5.0

 Attachments: HDFS-6257.001.patch


 From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1736/ :
 REGRESSION:  
 org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity
 {code}
 Error Message:
 Namenode should not send extra CACHE commands expected:<0> but was:<2>
 Stack Trace:
 java.lang.AssertionError: Namenode should not send extra CACHE commands 
 expected:<0> but was:<2>
 at org.junit.Assert.fail(Assert.java:93)
 at org.junit.Assert.failNotEquals(Assert.java:647)
 at org.junit.Assert.assertEquals(Assert.java:128)
 at org.junit.Assert.assertEquals(Assert.java:472)
 at 
 org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1419)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6399) Add note about setfacl in HDFS permissions guide

2014-06-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026338#comment-14026338
 ] 

Hudson commented on HDFS-6399:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #579 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/579/])
HDFS-6399. Add note about setfacl in HDFS permissions guide. Contributed by 
Chris Nauroth. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1601476)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsPermissionsGuide.apt.vm


 Add note about setfacl in HDFS permissions guide
 

 Key: HDFS-6399
 URL: https://issues.apache.org/jira/browse/HDFS-6399
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation, namenode
Affects Versions: 2.4.0
Reporter: Charles Lamb
Assignee: Chris Nauroth
Priority: Minor
 Fix For: 2.5.0

 Attachments: HDFS-6399.1.patch, HDFS-6399.2.patch


 The ACL operations in FSNamesystem don't currently check isPermissionEnabled 
 before calling checkOwner(). This patch corrects that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6460) Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance

2014-06-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026339#comment-14026339
 ] 

Hudson commented on HDFS-6460:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #579 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/579/])
HDFS-6460. Ignore stale and decommissioned nodes in 
NetworkTopology#sortByDistance. Contributed by Yongjun Zhang. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1601535)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopology.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlock.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopology.java


 Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance
 ---

 Key: HDFS-6460
 URL: https://issues.apache.org/jira/browse/HDFS-6460
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
Priority: Minor
 Attachments: HDFS-6460-branch2.001.patch, HDFS-6460.001.patch, 
 HDFS-6460.002.patch


 Per discussion in HDFS-6268, filing this jira as a follow-up so that we can 
 improve the sorting result and save a bit of runtime.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6508) Add an XAttr to specify the cipher mode

2014-06-10 Thread Charles Lamb (JIRA)
Charles Lamb created HDFS-6508:
--

 Summary: Add an XAttr to specify the cipher mode
 Key: HDFS-6508
 URL: https://issues.apache.org/jira/browse/HDFS-6508
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode, security
Reporter: Charles Lamb
Assignee: Charles Lamb


We should specify the cipher mode in the xattrs for compatibility's sake. Crypto 
changes over time and we need to prepare for that.




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-6509) distcp vs Data At Rest Encryption

2014-06-10 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb reassigned HDFS-6509:
--

Assignee: Charles Lamb

 distcp vs Data At Rest Encryption
 -

 Key: HDFS-6509
 URL: https://issues.apache.org/jira/browse/HDFS-6509
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: security
Reporter: Charles Lamb
Assignee: Charles Lamb

 distcp needs to work with Data At Rest Encryption



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6509) distcp vs Data At Rest Encryption

2014-06-10 Thread Charles Lamb (JIRA)
Charles Lamb created HDFS-6509:
--

 Summary: distcp vs Data At Rest Encryption
 Key: HDFS-6509
 URL: https://issues.apache.org/jira/browse/HDFS-6509
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Charles Lamb


distcp needs to work with Data At Rest Encryption



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException

2014-06-10 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026440#comment-14026440
 ] 

Daryn Sharp commented on HDFS-6475:
---

I had to hack around this problem in HDFS-6222...  I'm a bit uneasy about tying 
the webhdfs servlet to the IPC server.  I'd rather see the logic contained 
within webhdfs.  I think {{UserProvider}} should throw a different exception 
that its {{ExceptionHandler}} specifically knows how to unwrap.
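
A minimal sketch of that shape, with hypothetical class and method names rather 
than the actual webhdfs or jersey types:

{code}
import java.io.IOException;

// Hypothetical names throughout; not the actual webhdfs or jersey classes.
class StandbyRetryException extends RuntimeException {
  StandbyRetryException(IOException cause) {
    super(cause);                 // wraps the original StandbyException-bearing IOException
  }
}

class WebHdfsExceptionHandlerSketch {
  /** Unwraps the marker exception so the client sees the original cause and retries the other NN. */
  IOException unwrap(Throwable t) {
    if (t instanceof StandbyRetryException) {
      return (IOException) t.getCause();
    }
    return new IOException(t);
  }
}
{code}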

 WebHdfs clients fail without retry because incorrect handling of 
 StandbyException
 -

 Key: HDFS-6475
 URL: https://issues.apache.org/jira/browse/HDFS-6475
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, webhdfs
Affects Versions: 2.4.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch


 With WebHdfs clients connected to a HA HDFS service, the delegation token is 
 previously initialized with the active NN.
 When clients try to issue request, the NN it contacts is stored in a map 
 returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact 
 the NN based on the order, so likely the first one it runs into is StandbyNN. 
 If the StandbyNN doesn't have the updated client crediential, it will throw a 
 s SecurityException that wraps StandbyException.
 The client is expected to retry another NN, but due to the insufficient 
 handling of SecurityException mentioned above, it failed.
 Example message:
 {code}
 {RemoteException={message=Failed to obtain user group information: 
 org.apache.hadoop.security.token.SecretManager$InvalidToken: 
 StandbyException, javaCl
 assName=java.lang.SecurityException, exception=SecurityException}}
 org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to 
 obtain user group information: 
 org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException
 at 
 org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696)
 at kclient1.kclient$1.run(kclient.java:64)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:356)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
 at kclient1.kclient.main(kclient.java:58)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6222) Remove background token renewer from webhdfs

2014-06-10 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026441#comment-14026441
 ] 

Daryn Sharp commented on HDFS-6222:
---

The test failure is unrelated.

 Remove background token renewer from webhdfs
 

 Key: HDFS-6222
 URL: https://issues.apache.org/jira/browse/HDFS-6222
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: HDFS-6222.branch-2.patch, HDFS-6222.branch-2.patch, 
 HDFS-6222.trunk.patch, HDFS-6222.trunk.patch


 The background token renewer is a source of problems for long-running 
 daemons.  Webhdfs should lazy fetch a new token when it receives an 
 InvalidToken exception.
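
A minimal sketch of the lazy-fetch idea, assuming a hypothetical token-fetching 
callback; SecurityException stands in for the real SecretManager.InvalidToken:

{code}
import java.util.function.Supplier;

// Sketch only: fetch a delegation token lazily when an operation fails with an invalid token.
class LazyTokenClientSketch {
  interface Op<T> {
    T run(String token) throws Exception;
  }

  private final Supplier<String> fetchToken;   // hypothetical call that asks the NN for a token
  private String token;                        // may start out null; filled on demand

  LazyTokenClientSketch(Supplier<String> fetchToken) {
    this.fetchToken = fetchToken;
  }

  <T> T execute(Op<T> op) throws Exception {
    try {
      return op.run(token);
    } catch (SecurityException invalidToken) {  // stand-in for SecretManager.InvalidToken
      token = fetchToken.get();                 // re-fetch on demand instead of a renewer thread
      return op.run(token);
    }
  }
}
{code}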



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6460) Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance

2014-06-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026456#comment-14026456
 ] 

Hudson commented on HDFS-6460:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1770 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1770/])
HDFS-6460. Ignore stale and decommissioned nodes in 
NetworkTopology#sortByDistance. Contributed by Yongjun Zhang. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1601535)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopology.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlock.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopology.java


 Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance
 ---

 Key: HDFS-6460
 URL: https://issues.apache.org/jira/browse/HDFS-6460
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
Priority: Minor
 Attachments: HDFS-6460-branch2.001.patch, HDFS-6460.001.patch, 
 HDFS-6460.002.patch


 Per discussion in HDFS-6268, filing this jira as a follow-up so that we can 
 improve the sorting result and save a bit of runtime.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6399) Add note about setfacl in HDFS permissions guide

2014-06-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026455#comment-14026455
 ] 

Hudson commented on HDFS-6399:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1770 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1770/])
HDFS-6399. Add note about setfacl in HDFS permissions guide. Contributed by 
Chris Nauroth. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1601476)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsPermissionsGuide.apt.vm


 Add note about setfacl in HDFS permissions guide
 

 Key: HDFS-6399
 URL: https://issues.apache.org/jira/browse/HDFS-6399
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation, namenode
Affects Versions: 2.4.0
Reporter: Charles Lamb
Assignee: Chris Nauroth
Priority: Minor
 Fix For: 2.5.0

 Attachments: HDFS-6399.1.patch, HDFS-6399.2.patch


 The ACL operations in FSNamesystem don't currently check isPermissionEnabled 
 before calling checkOwner(). This patch corrects that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6257) TestCacheDirectives#testExceedsCapacity fails occasionally

2014-06-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026457#comment-14026457
 ] 

Hudson commented on HDFS-6257:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1770 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1770/])
HDFS-6257. TestCacheDirectives#testExceedsCapacity fails occasionally (cmccabe) 
(cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1601473)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java


 TestCacheDirectives#testExceedsCapacity fails occasionally
 --

 Key: HDFS-6257
 URL: https://issues.apache.org/jira/browse/HDFS-6257
 Project: Hadoop HDFS
  Issue Type: Test
  Components: caching
Affects Versions: 2.4.0
Reporter: Ted Yu
Assignee: Colin Patrick McCabe
Priority: Minor
 Fix For: 2.5.0

 Attachments: HDFS-6257.001.patch


 From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1736/ :
 REGRESSION:  
 org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity
 {code}
 Error Message:
 Namenode should not send extra CACHE commands expected:<0> but was:<2>
 Stack Trace:
 java.lang.AssertionError: Namenode should not send extra CACHE commands 
 expected:<0> but was:<2>
 at org.junit.Assert.fail(Assert.java:93)
 at org.junit.Assert.failNotEquals(Assert.java:647)
 at org.junit.Assert.assertEquals(Assert.java:128)
 at org.junit.Assert.assertEquals(Assert.java:472)
 at 
 org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1419)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted

2014-06-10 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026512#comment-14026512
 ] 

Binglin Chang commented on HDFS-6506:
-

The failed test is unrelated and is tracked in HDFS-3930; a recent 
build also failed because of it:
https://builds.apache.org/job/Hadoop-Hdfs-trunk/1770/consoleText

 Newly moved block replica been invalidated and deleted
 --

 Key: HDFS-6506
 URL: https://issues.apache.org/jira/browse/HDFS-6506
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: HDFS-6506.v1.patch


 TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently
 https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/
 From the error log, the reason seems to be that newly moved block replicas 
 have been invalidated and deleted, so some of the balancer's work is reversed.
 {noformat}
 2014-06-06 18:15:51,681 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,682 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,701 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741829_1005 with size=100 fr
 2014-06-06 18:15:54,706 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to 
 invalidated blocks set
 2014-06-06 18:15:54,709 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to 
 invalidated blocks set
 2014-06-06 18:15:56,421 INFO  BlockStateChange 
 (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010]
 2014-06-06 18:15:57,717 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,720 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,721 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,722 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,723 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to 
 invalidated blocks set
 2014-06-06 18:15:59,422 INFO  BlockStateChange 
 (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, 
 blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008]
 2014-06-06 18:16:02,423 INFO  BlockStateChange 
 (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
 127.0.0.1:55468 to delete [blk_1073741845_1021]
 {noformat}
 Normally this should not happen: when moving a block from src to dest, the 
 replica on src should be invalidated, not the one on dest, so there is likely 
 a bug in the related logic. 
 I don't think 

[jira] [Commented] (HDFS-6493) Propose to change dfs.namenode.startup.delay.block.deletion to second instead of millisecond

2014-06-10 Thread Juan Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026522#comment-14026522
 ] 

Juan Yu commented on HDFS-6493:
---

Thanks for the suggestion.

 Propose to change dfs.namenode.startup.delay.block.deletion to second 
 instead of millisecond
 --

 Key: HDFS-6493
 URL: https://issues.apache.org/jira/browse/HDFS-6493
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Juan Yu
Assignee: Juan Yu
Priority: Trivial

 Based on the discussion in https://issues.apache.org/jira/browse/HDFS-6186, 
 the delay will be at least 30 minutes or even hours. It's not very 
 user-friendly to use milliseconds when the value is likely measured in hours.
 I suggest making the following changes:
 1. change the unit of this config to seconds
 2. rename the config key from dfs.namenode.startup.delay.block.deletion.ms 
 to dfs.namenode.startup.delay.block.deletion.sec
 3. add the default value to hdfs-default.xml; what's a reasonable value, 30 
 minutes, one hour?
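
For illustration, a sketch of how code could read the proposed seconds-based 
key; the key name follows the proposal above, while the default value and class 
name are only placeholders:

{code}
import org.apache.hadoop.conf.Configuration;

// Sketch of reading the proposed seconds-based key; the default below is only a placeholder.
class StartupDelaySketch {
  static final String KEY = "dfs.namenode.startup.delay.block.deletion.sec";
  static final long DEFAULT_SEC = 3600L;   // "30 minutes, one hour?" is still an open question

  static long delayMillis(Configuration conf) {
    long seconds = conf.getLong(KEY, DEFAULT_SEC);
    return seconds * 1000L;                // the rest of the code keeps working in milliseconds
  }
}
{code}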



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6460) Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance

2014-06-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026558#comment-14026558
 ] 

Hudson commented on HDFS-6460:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1797 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1797/])
HDFS-6460. Ignore stale and decommissioned nodes in 
NetworkTopology#sortByDistance. Contributed by Yongjun Zhang. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1601535)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopology.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlock.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopology.java


 Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance
 ---

 Key: HDFS-6460
 URL: https://issues.apache.org/jira/browse/HDFS-6460
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
Priority: Minor
 Attachments: HDFS-6460-branch2.001.patch, HDFS-6460.001.patch, 
 HDFS-6460.002.patch


 Per discussion in HDFS-6268, filing this jira as a follow-up so that we can 
 improve the sorting result and save a bit of runtime.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6493) Propose to change dfs.namenode.startup.delay.block.deletion to second instead of millisecond

2014-06-10 Thread Juan Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Yu updated HDFS-6493:
--

Attachment: HDFS-6493.001.patch

 Propose to change dfs.namenode.startup.delay.block.deletion to second 
 instead of millisecond
 --

 Key: HDFS-6493
 URL: https://issues.apache.org/jira/browse/HDFS-6493
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Juan Yu
Assignee: Juan Yu
Priority: Trivial
 Attachments: HDFS-6493.001.patch


 Based on the discussion in https://issues.apache.org/jira/browse/HDFS-6186, 
 the delay will be at least 30 minutes or even hours. It's not very 
 user-friendly to use milliseconds when the value is likely measured in hours.
 I suggest making the following changes:
 1. change the unit of this config to seconds
 2. rename the config key from dfs.namenode.startup.delay.block.deletion.ms 
 to dfs.namenode.startup.delay.block.deletion.sec
 3. add the default value to hdfs-default.xml; what's a reasonable value, 30 
 minutes, one hour?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6399) Add note about setfacl in HDFS permissions guide

2014-06-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026557#comment-14026557
 ] 

Hudson commented on HDFS-6399:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1797 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1797/])
HDFS-6399. Add note about setfacl in HDFS permissions guide. Contributed by 
Chris Nauroth. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1601476)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsPermissionsGuide.apt.vm


 Add note about setfacl in HDFS permissions guide
 

 Key: HDFS-6399
 URL: https://issues.apache.org/jira/browse/HDFS-6399
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation, namenode
Affects Versions: 2.4.0
Reporter: Charles Lamb
Assignee: Chris Nauroth
Priority: Minor
 Fix For: 2.5.0

 Attachments: HDFS-6399.1.patch, HDFS-6399.2.patch


 The ACL operations in FSNamesystem don't currently check isPermissionEnabled 
 before calling checkOwner(). This patch corrects that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6257) TestCacheDirectives#testExceedsCapacity fails occasionally

2014-06-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026559#comment-14026559
 ] 

Hudson commented on HDFS-6257:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1797 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1797/])
HDFS-6257. TestCacheDirectives#testExceedsCapacity fails occasionally (cmccabe) 
(cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1601473)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java


 TestCacheDirectives#testExceedsCapacity fails occasionally
 --

 Key: HDFS-6257
 URL: https://issues.apache.org/jira/browse/HDFS-6257
 Project: Hadoop HDFS
  Issue Type: Test
  Components: caching
Affects Versions: 2.4.0
Reporter: Ted Yu
Assignee: Colin Patrick McCabe
Priority: Minor
 Fix For: 2.5.0

 Attachments: HDFS-6257.001.patch


 From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1736/ :
 REGRESSION:  
 org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity
 {code}
 Error Message:
 Namenode should not send extra CACHE commands expected:<0> but was:<2>
 Stack Trace:
 java.lang.AssertionError: Namenode should not send extra CACHE commands 
 expected:<0> but was:<2>
 at org.junit.Assert.fail(Assert.java:93)
 at org.junit.Assert.failNotEquals(Assert.java:647)
 at org.junit.Assert.assertEquals(Assert.java:128)
 at org.junit.Assert.assertEquals(Assert.java:472)
 at 
 org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1419)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6508) Add an XAttr to specify the cipher mode

2014-06-10 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026554#comment-14026554
 ] 

Charles Lamb commented on HDFS-6508:


[~tucu00] says:

Our current work implements standard AES-CTR streaming; this is pluggable so we 
can support multiple implementations (pure Java, the current one; OpenSSL 
backed, based on Diceros; etc.).

We should store the encryption mode enum, for now simply AES-CTR.

To support future impls and backward/forward compatibility we should do 
something like:

* On create/open RPC request, the client sends the set of supported encryption 
modes to the NN.
* On create RPC, if the NN does support any of the modes specified by the 
client, EXCEPTION.
* On open RPC, if the NN determines the client does not support the encryption 
mode the file was encrypted with, EXCEPTION.
* On create RPC response, the NN sends back encryption initialization data (i.e. 
key, IV) plus the encryption mode the client must use.
* On open RPC response, the NN sends back encryption initialization data (i.e. 
key, IV) plus the encryption mode the client must use (the data has been 
encrypted with this mode).
* The client would have a switch/case statement on the encryption mode to wrap 
the data streams with the right impl. At the moment there is only one choice.

Note that the implementation we use for a given encryption mode (first 
paragraph of this email) is independent of the encryption mode selection logic 
just described.
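
A rough sketch of that client-side selection step, with hypothetical names and 
only the AES-CTR case filled in; adding a new mode would then only mean adding 
an enum constant and a case:

{code}
import java.io.FilterInputStream;
import java.io.InputStream;

// Hypothetical names; not the actual DFSClient code. Only the AES-CTR case exists today.
enum EncryptionMode { AES_CTR }

class CryptoStreamFactorySketch {
  /** Wraps the raw stream with the implementation matching the mode returned by the NN. */
  static InputStream wrap(EncryptionMode mode, InputStream raw, byte[] key, byte[] iv) {
    switch (mode) {
      case AES_CTR:
        return new AesCtrInputStreamSketch(raw, key, iv);   // pure-Java or OpenSSL-backed impl
      default:
        throw new IllegalArgumentException("Unsupported encryption mode: " + mode);
    }
  }
}

/** Placeholder for whichever AES-CTR implementation is plugged in. */
class AesCtrInputStreamSketch extends FilterInputStream {
  AesCtrInputStreamSketch(InputStream in, byte[] key, byte[] iv) {
    super(in);   // real decryption elided; this only marks where it would plug in
  }
}
{code}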

 Add an XAttr to specify the cipher mode
 ---

 Key: HDFS-6508
 URL: https://issues.apache.org/jira/browse/HDFS-6508
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode, security
Reporter: Charles Lamb
Assignee: Charles Lamb

 We should specify the cipher mode in the xattrs for compatibility's sake. 
 Crypto changes over time and we need to prepare for that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6493) Propose to change dfs.namenode.startup.delay.block.deletion to second instead of millisecond

2014-06-10 Thread Juan Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Yu updated HDFS-6493:
--

Status: Patch Available  (was: In Progress)

 Propose to change dfs.namenode.startup.delay.block.deletion to second 
 instead of millisecond
 --

 Key: HDFS-6493
 URL: https://issues.apache.org/jira/browse/HDFS-6493
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Juan Yu
Assignee: Juan Yu
Priority: Trivial
 Attachments: HDFS-6493.001.patch


 Based on the discussion in https://issues.apache.org/jira/browse/HDFS-6186, 
 the delay will be at least 30 minutes or even hours. It's not very 
 user-friendly to use milliseconds when the value is likely measured in hours.
 I suggest making the following changes:
 1. change the unit of this config to seconds
 2. rename the config key from dfs.namenode.startup.delay.block.deletion.ms 
 to dfs.namenode.startup.delay.block.deletion.sec
 3. add the default value to hdfs-default.xml; what's a reasonable value, 30 
 minutes, one hour?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list

2014-06-10 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-742:
---

Attachment: HDFS-742.patch

 A down DataNode makes Balancer to hang on repeatingly asking NameNode its 
 partial block list
 

 Key: HDFS-742
 URL: https://issues.apache.org/jira/browse/HDFS-742
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Reporter: Hairong Kuang
Assignee: Mit Desai
 Attachments: HDFS-742.patch


 We had a balancer that had not made any progress for a long time. It turned 
 out it was repeatedly asking the NameNode for a partial block list of one 
 datanode, which had gone down while the balancer was running.
 The NameNode should notify the Balancer that the datanode is not available, and 
 the Balancer should stop asking for that datanode's block list.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-2006) ability to support storing extended attributes per file

2014-06-10 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2006:
--

Attachment: HDFS-2006-Branch-2-Merge.patch

Sure. Thanks a lot, Andrew and Chris, for your opinions. I have created a 
branch-2 merge patch and attached it here for reference.
Andrew, please use this patch if you plan to run Jenkins on it. I am running 
tests locally on it and will do some basic testing.

Please note that this patch contains only the HDFS-2006 subtasks. The remaining 
top-level jiras will be merged under their respective jiras so they are tracked 
correctly.

 ability to support storing extended attributes per file
 ---

 Key: HDFS-2006
 URL: https://issues.apache.org/jira/browse/HDFS-2006
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: HDFS XAttrs (HDFS-2006)
Reporter: dhruba borthakur
Assignee: Yi Liu
 Fix For: 3.0.0

 Attachments: ExtendedAttributes.html, HDFS-2006-Branch-2-Merge.patch, 
 HDFS-2006-Merge-1.patch, HDFS-2006-Merge-2.patch, HDFS-XAttrs-Design-1.pdf, 
 HDFS-XAttrs-Design-2.pdf, HDFS-XAttrs-Design-3.pdf, 
 Test-Plan-for-Extended-Attributes-1.pdf, xattrs.1.patch, xattrs.patch


 It would be nice if HDFS provides a feature to store extended attributes for 
 files, similar to the one described here: 
 http://en.wikipedia.org/wiki/Extended_file_attributes. 
 The challenge is that it has to be done in such a way that a site not using 
 this feature does not waste precious memory resources in the namenode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6508) Add an XAttr to specify the cipher mode

2014-06-10 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026592#comment-14026592
 ] 

Charles Lamb commented on HDFS-6508:


bq. On create RPC, if the NN does support any of the modes specified by the 
client, EXCEPTION.

s/does support/does not support/


 Add an XAttr to specify the cipher mode
 ---

 Key: HDFS-6508
 URL: https://issues.apache.org/jira/browse/HDFS-6508
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode, security
Reporter: Charles Lamb
Assignee: Charles Lamb

 We should specify the cipher mode in the xattrs for compatibility's sake. 
 Crypto changes over time and we need to prepare for that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-2006) ability to support storing extended attributes per file

2014-06-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026606#comment-14026606
 ] 

Hadoop QA commented on HDFS-2006:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12649598/HDFS-2006-Branch-2-Merge.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7074//console

This message is automatically generated.

 ability to support storing extended attributes per file
 ---

 Key: HDFS-2006
 URL: https://issues.apache.org/jira/browse/HDFS-2006
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: HDFS XAttrs (HDFS-2006)
Reporter: dhruba borthakur
Assignee: Yi Liu
 Fix For: 3.0.0

 Attachments: ExtendedAttributes.html, HDFS-2006-Branch-2-Merge.patch, 
 HDFS-2006-Merge-1.patch, HDFS-2006-Merge-2.patch, HDFS-XAttrs-Design-1.pdf, 
 HDFS-XAttrs-Design-2.pdf, HDFS-XAttrs-Design-3.pdf, 
 Test-Plan-for-Extended-Attributes-1.pdf, xattrs.1.patch, xattrs.patch


 It would be nice if HDFS provides a feature to store extended attributes for 
 files, similar to the one described here: 
 http://en.wikipedia.org/wiki/Extended_file_attributes. 
 The challenge is that it has to be done in such a way that a site not using 
 this feature does not waste precious memory resources in the namenode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list

2014-06-10 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026612#comment-14026612
 ] 

Mit Desai commented on HDFS-742:


Attaching the patch. Unfortunately I do not have a way to reproduce the issue, 
so I'm unable to add a test to verify the change.
Here is an explanation of the part of the Balancer code that makes it hang forever.

In the following while loop in Balancer.java, when the Balancer figures out 
that it should fetch more blocks, it gets the block list and decrements 
{{blocksToReceive}} by that many blocks, then starts again from the top of the 
loop.

{code}
 while(!isTimeUp && getScheduledSize() > 0 &&
  (!srcBlockList.isEmpty() || blocksToReceive > 0)) {
   
## SOME LINES OMITTED ##

filterMovedBlocks(); // filter already moved blocks
if (shouldFetchMoreBlocks()) {
  // fetch new blocks
  try {
blocksToReceive -= getBlockList();
continue;
  } catch (IOException e) {

## SOME LINES OMITTED ##

// check if time is up or not
if (Time.now()-startTime > MAX_ITERATION_TIME) {
  isTimeUp = true;
  continue;
}
## SOME LINES OMITTED ##

 }
{code}

The problem here is that if the datanode is decommissioned, the {{getBlockList()}} 
method will not return anything and {{blocksToReceive}} will not change. The loop 
keeps doing this indefinitely because {{blocksToReceive}} always stays greater 
than 0, and {{isTimeUp}} is never set to true because execution never reaches 
that part of the code. In the submitted patch, the time-up condition is moved to 
the top of the loop, so each iteration first checks whether {{isTimeUp}} is set 
and proceeds only if the time is not up.
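
A minimal sketch of that reordering, with stubbed fields and methods rather than 
the actual Balancer code:

{code}
// Sketch only, not the exact patch: the time check moves to the top of the loop so a
// source DN that never returns any blocks cannot keep the balancer spinning forever.
class BalancerSourceSketch {
  long startTime = System.currentTimeMillis();
  long maxIterationTime = 1000L;               // small illustrative budget for the sketch
  long scheduledSize = 1;
  long blocksToReceive = 1;
  java.util.List<Object> srcBlockList = new java.util.ArrayList<>();
  boolean isTimeUp = false;

  long getBlockList() { return 0; }            // stub: returns 0 for a down/decommissioned DN
  boolean shouldFetchMoreBlocks() { return blocksToReceive > 0; }
  void filterMovedBlocks() { }                 // filter already moved blocks

  void dispatchBlocks() {
    while (!isTimeUp && scheduledSize > 0
        && (!srcBlockList.isEmpty() || blocksToReceive > 0)) {
      if (System.currentTimeMillis() - startTime > maxIterationTime) {
        isTimeUp = true;                       // bail out instead of retrying getBlockList()
        continue;
      }
      filterMovedBlocks();
      if (shouldFetchMoreBlocks()) {
        blocksToReceive -= getBlockList();
        continue;
      }
      scheduledSize--;                         // stand-in for scheduling an actual block move
    }
  }
}
{code}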

 A down DataNode makes Balancer to hang on repeatingly asking NameNode its 
 partial block list
 

 Key: HDFS-742
 URL: https://issues.apache.org/jira/browse/HDFS-742
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Reporter: Hairong Kuang
Assignee: Mit Desai
 Attachments: HDFS-742.patch


 We had a balancer that had not made any progress for a long time. It turned 
 out it was repeatedly asking the NameNode for a partial block list of one 
 datanode, which had gone down while the balancer was running.
 The NameNode should notify the Balancer that the datanode is not available, and 
 the Balancer should stop asking for that datanode's block list.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException

2014-06-10 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026642#comment-14026642
 ] 

Yongjun Zhang commented on HDFS-6475:
-

Hi [~daryn],

Thanks a lot for the review and comments. I attempted earlier to have the 
UserProvider class throw a different exception, but found that it inherits from 
classes in the jersey package, whose interface spec we are not able to change. 
I could make a duplicated copy of the IPC server code in the ExceptionHandler 
area so they are not tied to each other. I'm open to that.






 WebHdfs clients fail without retry because incorrect handling of 
 StandbyException
 -

 Key: HDFS-6475
 URL: https://issues.apache.org/jira/browse/HDFS-6475
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, webhdfs
Affects Versions: 2.4.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch


 With WebHdfs clients connected to an HA HDFS service, the delegation token is 
 initialized with the active NN beforehand.
 When a client issues a request, the NNs it can contact are stored in a map 
 returned by DFSUtil.getNNServiceRpcAddresses(conf), and the client contacts 
 them in that order, so the first one it reaches is likely the standby NN. 
 If the standby NN doesn't have the updated client credential, it throws a 
 SecurityException that wraps a StandbyException.
 The client is expected to retry the other NN, but due to the insufficient 
 handling of SecurityException mentioned above, it fails.
 Example message:
 {code}
 {RemoteException={message=Failed to obtain user group information: 
 org.apache.hadoop.security.token.SecretManager$InvalidToken: 
 StandbyException, javaCl
 assName=java.lang.SecurityException, exception=SecurityException}}
 org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to 
 obtain user group information: 
 org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException
 at 
 org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696)
 at kclient1.kclient$1.run(kclient.java:64)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:356)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
 at kclient1.kclient.main(kclient.java:58)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6510) WebHdfs clients clears the delegation token on retry (for HA), thus failing retry requests

2014-06-10 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-6510:
---

 Summary: WebHdfs clients clears the delegation token on retry (for 
HA), thus failing retry requests
 Key: HDFS-6510
 URL: https://issues.apache.org/jira/browse/HDFS-6510
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang


In WebHdfs clients connected to an HA HDFS service, when a failure (which is 
inferred to be a failover) happens and a retry is attempted, the previously 
initialized delegation token is cleared. For token-based auth, this causes all 
subsequent retries to fail with auth errors.
See delegationToken = null in the method WebHdfsFileSystem.resetStateToFailOver.
This issue not only shows up on failover; it happens more commonly when the 
first NN specified in the config is not reachable.
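
For illustration only, one possible shape of a fix, with hypothetical names (not 
the actual WebHdfsFileSystem code): keep the token across the failover and clear 
it only when the server reports it invalid.

{code}
// Hypothetical sketch of one possible fix shape, not the committed change: keep the
// delegation token across a failover retry and drop it only when it is reported invalid.
class WebHdfsFailoverStateSketch {
  private String delegationToken;   // stand-in for the real Token<DelegationTokenIdentifier>
  private int currentNNIndex;

  void setDelegationToken(String token) { delegationToken = token; }
  String getDelegationToken() { return delegationToken; }

  void resetStateToFailOver(int nnCount) {
    currentNNIndex = (currentNNIndex + 1) % nnCount;
    // The token is intentionally NOT cleared here; clearing it would force every
    // subsequent retry to re-authenticate, which fails for token-only callers.
  }

  void onInvalidToken() {
    delegationToken = null;         // clear only when the server reports the token invalid
  }
}
{code}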




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6502) incorrect description in distcp2 document

2014-06-10 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026658#comment-14026658
 ] 

Yongjun Zhang commented on HDFS-6502:
-

Hi [~ajisakaa], thanks a lot for the quick patch. I will try to review the 
changes as soon as I can.

 incorrect description in distcp2 document
 -

 Key: HDFS-6502
 URL: https://issues.apache.org/jira/browse/HDFS-6502
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 1.2.1, 2.5.0
Reporter: Yongjun Zhang
Assignee: Akira AJISAKA
  Labels: newbie
 Attachments: HDFS-6502.patch


 In http://hadoop.apache.org/docs/r1.2.1/distcp2.html#UpdateAndOverwrite
 The first statement of the Update and Overwrite section says:
 {quote}
 -update is used to copy files from source that don't exist at the target, or 
 have different contents. -overwrite overwrites target-files even if they 
 exist at the source, or have the same contents.
 {quote}
 The Command Line Options table says :
 {quote}
   -overwrite: Overwrite destination
   -update: Overwrite if src size different from dst size
 {quote}
 Based on the implementation, making the following modification would be more 
 accurate:
 The first statement of the Update and Overwrite section:
 {code}
 -update is used to copy files from source that don't exist at the target, or 
 have different contents. -overwrite overwrites target-files if they exist at 
 the target.
 {code}
 The Command Line Options table:
 {code}
   -overwrite: Overwrite destination
   -update: Overwrite destination if source and destination have different 
 contents
 {code}
 Thanks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6511) BlockManager#computeInvalidateWork() could do nothing

2014-06-10 Thread Juan Yu (JIRA)
Juan Yu created HDFS-6511:
-

 Summary: BlockManager#computeInvalidateWork() could do nothing
 Key: HDFS-6511
 URL: https://issues.apache.org/jira/browse/HDFS-6511
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Juan Yu
Assignee: Juan Yu
Priority: Minor


BlockManager#computeInvalidateWork() uses a for loop to check a certain number 
of DNs for invalidation work, but it's possible that a DN has nothing to 
invalidate.
computeInvalidateWork() should loop until it has really scheduled invalidation 
work for that number of DNs.
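
An illustrative sketch of that loop, with hypothetical types (not the actual 
BlockManager code): only DNs that actually schedule work count toward the target.

{code}
import java.util.List;

// Illustrative only; not the actual BlockManager implementation.
class InvalidateWorkSketch {
  interface DatanodeWork {
    int invalidateWork();                      // number of blocks scheduled; may be 0
  }

  /** Keep going until nodesToProcess DNs actually had work, or the candidates run out. */
  static int computeInvalidateWork(List<DatanodeWork> candidates, int nodesToProcess) {
    int blockCount = 0;
    int nodesWithWork = 0;
    for (DatanodeWork dn : candidates) {
      if (nodesWithWork >= nodesToProcess) {
        break;
      }
      int scheduled = dn.invalidateWork();
      if (scheduled > 0) {                     // a DN with nothing to invalidate does not count
        nodesWithWork++;
        blockCount += scheduled;
      }
    }
    return blockCount;
  }
}
{code}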



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6395) Assorted improvements to xattr limit checking

2014-06-10 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026678#comment-14026678
 ] 

Chris Nauroth commented on HDFS-6395:
-

{code}
if (xAttr.getNameSpace() == XAttr.NameSpace.USER || 
xAttr.getNameSpace() == XAttr.NameSpace.TRUSTED) {
{code}

Minor nit-pick: this piece of logic is duplicated.  It might be worthwhile to 
put this in a helper method or possibly add an {{isUserVisible}} method on the 
{{NameSpace}} enum to document the intent.

I agree with Andrew's feedback about removing these prints.  If it's too tricky 
to log warnings during loading right now, then let's proceed with the other 
valuable changes in this patch.  We can always revisit the logging later if 
needed.

Thank you for working on this, Yi.
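
A sketch of the suggested helper, using a stand-in enum rather than the actual 
XAttr.NameSpace:

{code}
// Stand-in enum, only to show the shape of the suggested helper; not the actual XAttr.NameSpace.
enum NameSpace {
  USER, TRUSTED, SYSTEM, SECURITY;

  /** True for namespaces whose xattrs are visible to users and counted against user limits. */
  boolean isUserVisible() {
    return this == USER || this == TRUSTED;
  }
}
{code}

Both duplicated call sites could then read {{if (xAttr.getNameSpace().isUserVisible())}}.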

 Assorted improvements to xattr limit checking
 -

 Key: HDFS-6395
 URL: https://issues.apache.org/jira/browse/HDFS-6395
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0
Reporter: Andrew Wang
Assignee: Yi Liu
 Attachments: HDFS-6395.patch


 It'd be nice to print messages during fsimage and editlog loading if we hit 
 either the # of xattrs per inode or the xattr size limits.
 We should also consider making the # of xattrs limit only apply to the user 
 namespace, or to each namespace separately, to prevent users from locking out 
 access to other namespaces.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6510) WebHdfs clients clears the delegation token on retry (for HA), thus failing retry requests

2014-06-10 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026694#comment-14026694
 ] 

Haohui Mai commented on HDFS-6510:
--

This should be a duplicate of HDFS-6312.

 WebHdfs clients clears the delegation token on retry (for HA), thus failing 
 retry requests
 --

 Key: HDFS-6510
 URL: https://issues.apache.org/jira/browse/HDFS-6510
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang

 In WebHdfs clients connected to an HA HDFS service, when a failure (which is 
 inferred to be a failover) happens and a retry is attempted, the previously 
 initialized delegation token is cleared. For token-based auth, this causes all 
 subsequent retries to fail with auth errors.
 See delegationToken = null in the method WebHdfsFileSystem.resetStateToFailOver.
 This issue not only shows up on failover; it happens more commonly when the 
 first NN specified in the config is not reachable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6488) HDFS superuser unable to access user's Trash files using NFSv3 mount

2014-06-10 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026716#comment-14026716
 ] 

Colin Patrick McCabe commented on HDFS-6488:


I think the issue is that on your local Linux, the {{hdfs}} user doesn't have 
any special permissions attached to it.  So Linux sees a file owned by a 
different user ({{schu}}) with mode {{0700}} and thinks that you just don't 
have permission to read it.

I don't know if there is a good resolution for this, since Linux's behavior 
probably can't be changed.  You're basically asking {{schu}} to behave like 
root inside the NFS mount, but not elsewhere, and that would require kernel 
changes to implement.  Maybe I'm missing something, but I don't see how we can 
implement that...

What's the behavior with the actual {{root}} user?  Do we implement 
{{root_squash}}?

 HDFS superuser unable to access user's Trash files using NFSv3 mount
 

 Key: HDFS-6488
 URL: https://issues.apache.org/jira/browse/HDFS-6488
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.3.0
Reporter: Stephen Chu

 As hdfs superuser on the NFS mount, I cannot cd or ls the 
 /user/schu/.Trash directory:
 {code}
 bash-4.1$ cd .Trash/
 bash: cd: .Trash/: Permission denied
 bash-4.1$ ls -la
 total 2
 drwxr-xr-x 4 schu 2584148964 128 Jan  7 10:42 .
 drwxr-xr-x 4 hdfs 2584148964 128 Jan  6 16:59 ..
 drwx------ 2 schu 2584148964  64 Jan  7 10:45 .Trash
 drwxr-xr-x 2 hdfs hdfs        64 Jan  7 10:42 tt
 bash-4.1$ ls .Trash
 ls: cannot open directory .Trash: Permission denied
 bash-4.1$
 {code}
 When using FsShell as hdfs superuser, I have superuser permissions to schu's 
 .Trash contents:
 {code}
 bash-4.1$ hdfs dfs -ls -R /user/schu/.Trash
 drwx------   - schu supergroup  0 2014-01-07 10:48 
 /user/schu/.Trash/Current
 drwx------   - schu supergroup  0 2014-01-07 10:48 
 /user/schu/.Trash/Current/user
 drwx------   - schu supergroup  0 2014-01-07 10:48 
 /user/schu/.Trash/Current/user/schu
 -rw-r--r--   1 schu supergroup  4 2014-01-07 10:48 
 /user/schu/.Trash/Current/user/schu/tf1
 {code}
 The NFSv3 logs don't produce any error when superuser tries to access schu 
 Trash contents. However, for other permission errors (e.g. schu tries to 
 delete a directory owned by hdfs), there will be a permission error in the 
 logs.
 I think this is perhaps not specific to the .Trash directory.
 I created a /user/schu/dir1 which has the same permissions as .Trash (700). 
 When I try cd'ing into the directory from the NFSv3 mount as hdfs superuser, 
 I get the same permission denied.
 {code}
 [schu@hdfs-nfs ~]$ hdfs dfs -ls
 Found 4 items
 drwx------   - schu supergroup  0 2014-01-07 10:57 .Trash
 drwx------   - schu supergroup  0 2014-01-07 11:05 dir1
 -rw-r--r--   1 schu supergroup  4 2014-01-07 11:05 tf1
 drwxr-xr-x   - hdfs hdfs0 2014-01-07 10:42 tt
 bash-4.1$ whoami
 hdfs
 bash-4.1$ pwd
 /hdfs_nfs_mount/user/schu
 bash-4.1$ cd dir1
 bash: cd: dir1: Permission denied
 bash-4.1$
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6391) Get the Key/IV from the NameNode for encrypted files in DFSClient

2014-06-10 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026721#comment-14026721
 ] 

Charles Lamb commented on HDFS-6391:


This has been subsumed by HDFS-6386. I will close this Jira when HDFS-6386 gets 
committed.

 Get the Key/IV from the NameNode for encrypted files in DFSClient
 -

 Key: HDFS-6391
 URL: https://issues.apache.org/jira/browse/HDFS-6391
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode, security
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6391.1.patch


 When creating/opening an encrypted file, the DFSClient should get the 
 encryption key material and the IV for the file in the create/open RPC call.
 HDFS admin users would never get key material/IV on encrypted files 
 create/open.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6474) Namenode needs to get the actual keys and iv from the KeyProvider

2014-06-10 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026722#comment-14026722
 ] 

Charles Lamb commented on HDFS-6474:


This will be subsumed by HDFS-6386. I will close this when HDFS-6386 is 
committed.


 Namenode needs to get the actual keys and iv from the KeyProvider
 -

 Key: HDFS-6474
 URL: https://issues.apache.org/jira/browse/HDFS-6474
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode, security
Reporter: Charles Lamb
Assignee: Charles Lamb

 The Namenode has code to connect to the KeyProvider, but it needs to actually 
 get the keys and return them to the client at the right time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6492) Support create-time xattrs and atomically setting multiple xattrs

2014-06-10 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026736#comment-14026736
 ] 

Colin Patrick McCabe commented on HDFS-6492:


Technically it could still be atomic if we create all the separate edit log ops 
under the FSN lock.  The only cases I can see that might be problematic are 
where the edit log got truncated (how?) or there was corruption.  But it's 
simple enough to add this to the ops, and perhaps it will be helpful when 
making the FSN lock finer grained...

I agree that we don't have to do any client-side API work here, since this is 
intended for encryption at the moment.
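
A minimal sketch of the atomicity point, with hypothetical types (not 
FSNamesystem code): all the per-xattr ops are logged while the write lock is 
held, so readers never observe a partial batch.

{code}
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the atomicity argument above; not FSNamesystem code.
class XAttrBatchSketch {
  interface EditLog { void logSetXAttr(String name, byte[] value); }

  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();

  /** Every per-xattr op is logged while the write lock is held, so no partial batch is visible. */
  void setXAttrs(List<String> names, List<byte[]> values, EditLog log) {
    fsnLock.writeLock().lock();
    try {
      for (int i = 0; i < names.size(); i++) {
        log.logSetXAttr(names.get(i), values.get(i));   // one edit log op per xattr
      }
    } finally {
      fsnLock.writeLock().unlock();
    }
  }
}
{code}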

 Support create-time xattrs and atomically setting multiple xattrs
 -

 Key: HDFS-6492
 URL: https://issues.apache.org/jira/browse/HDFS-6492
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 3.0.0
Reporter: Andrew Wang
Assignee: Andrew Wang

 Ongoing work in HDFS-6134 requires being able to set system namespace 
 extended attributes at create and mkdir time, as well as being able to 
 atomically set multiple xattrs at once. There's currently no need to expose 
 this functionality in the client API, so let's not unless we have to.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not

2014-06-10 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6439:
-

Attachment: HDFS-6439.003.patch

 NFS should not reject NFS requests to the NULL procedure whether port 
 monitoring is enabled or not
 --

 Key: HDFS-6439
 URL: https://issues.apache.org/jira/browse/HDFS-6439
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.4.0
Reporter: Brandon Li
Assignee: Aaron T. Myers
 Attachments: HDFS-6439.003.patch, HDFS-6439.patch, HDFS-6439.patch, 
 linux-nfs-disallow-request-from-nonsecure-port.pcapng, 
 mount-nfs-requests.pcapng


 As discussed in HDFS-6406, this JIRA is to track the following updates:
 1. Port monitoring is the feature name used with traditional NFS servers, and we may want to make the config property (along with the related variable allowInsecurePorts) something like dfs.nfs.port.monitoring.
 2. According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt):
 {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote}
 I do notice that NFS clients (most of the time) send mount NULL and nfs NULL from non-privileged ports. If we deny the NULL call in mountd or the nfs server, the client can't mount the export even as user root.
 3. It would be nice to have the user guide updated for the port monitoring feature.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6312) WebHdfs HA failover is broken on secure clusters

2014-06-10 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026764#comment-14026764
 ] 

Yongjun Zhang commented on HDFS-6312:
-

HI [~daryn],

I filed HDFS-6510 for the same issue and [~wheat9] sent me here (thanks 
Haohui).  I wonder if we can dedicate this jira to this particular issue 
instead of bundling it with other fixes?

Thanks.


 WebHdfs HA failover is broken on secure clusters
 

 Key: HDFS-6312
 URL: https://issues.apache.org/jira/browse/HDFS-6312
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 3.0.0, 2.4.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Blocker

 When webhdfs does a failover, it blanks out the delegation token.  This will 
 cause subsequent operations against the other NN to acquire a new token.  
 Tasks cannot acquire a token (no kerberos credentials) so jobs will fail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6510) WebHdfs clients clears the delegation token on retry (for HA), thus failing retry requests

2014-06-10 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026763#comment-14026763
 ] 

Yongjun Zhang commented on HDFS-6510:
-

Thanks [~wheat9]!


 WebHdfs clients clears the delegation token on retry (for HA), thus failing 
 retry requests
 --

 Key: HDFS-6510
 URL: https://issues.apache.org/jira/browse/HDFS-6510
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang

 In WebHdfs clients connected to a HA HDFS service, when a failure (that is 
 inferred as a failover) happens and retry is attempted, the delegation token 
 previously initialized is cleared. For token based auth, this causes all the 
 subsequent retries to fail due to auth errors.
 See delegationToken = null in method WebHdfsFileSystem.resetStateToFailOver.
 This issue would not only show up on failover, but happens more commonly when 
 the first NN specified in the config is not reachable.
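
As a rough illustration of the fix being discussed (only the field and method names mentioned in this report are borrowed; the rest is hypothetical and is not the actual WebHdfsFileSystem code), the failover reset could switch namenodes without discarding the token:

{code}
// Hypothetical sketch: on failover, move to the other NN address but keep the
// previously acquired delegation token so retried requests still authenticate.
class FailoverSketch {
  private final String[] namenodes;
  private int index;
  private String currentNamenode;
  private String delegationToken;   // acquired once, before the first failure

  FailoverSketch(String[] namenodes) {
    this.namenodes = namenodes;
    this.currentNamenode = namenodes[0];
  }

  void resetStateToFailOver() {
    index = (index + 1) % namenodes.length;
    currentNamenode = namenodes[index];
    // Note: delegationToken is intentionally NOT set to null here.
  }

  String currentToken() { return delegationToken; }
}
{code}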



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6482) Use block ID-based block layout on datanodes

2014-06-10 Thread James Thomas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Thomas updated HDFS-6482:
---

Attachment: HDFS-6482.3.patch

Made changes suggested by Arpit. I don't think that deletion of empty 
directories is necessary -- it was not done in the previous scheme and the 
benefit in terms of faster directory listings and lookups seems marginal (and 
there is some chance that the directory will be recreated at a later time). I 
have added a third subdir level (with the 25th to 32nd bits of the block ID) to 
further reduce the likelihood of directory blowup in large clusters. For a 
cluster with N blocks (to clarify, this means that N blocks have been created 
over the lifetime of the cluster, but some may have been deleted), the upper 
bound on the number of files in any DN directory is now N/2^24, so even for 
clusters with 2^30 (~1 billion) blocks created over their lifetimes we should 
have fairly small directories. I don't think there's any need to implement 
further logic to prevent a directory from exceeding 256 entries, since this 
can't happen anyway with clusters with fewer than 2^32 blocks created, and even 
then the probability is very small.
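
A hedged sketch of the bit arithmetic described above (the directory names and exact slices are illustrative, not necessarily the layout in the patch): each 8-bit slice of the block ID selects one of 256 subdirectories per level, which is what bounds the directory sizes.

{code}
// Illustrative only: derive a three-level subdirectory path from a block ID by
// taking three consecutive 8-bit slices, giving at most 256 entries per level.
class BlockIdPathSketch {
  static String subdirPath(long blockId) {
    int d1 = (int) ((blockId >> 24) & 0xFF);  // bits 32..25
    int d2 = (int) ((blockId >> 16) & 0xFF);  // bits 24..17
    int d3 = (int) ((blockId >> 8)  & 0xFF);  // bits 16..9
    return String.format("subdir%d/subdir%d/subdir%d", d1, d2, d3);
  }

  public static void main(String[] args) {
    // For block ID 1073741834 this prints subdir64/subdir0/subdir0.
    System.out.println(subdirPath(1073741834L));
  }
}
{code}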

 Use block ID-based block layout on datanodes
 

 Key: HDFS-6482
 URL: https://issues.apache.org/jira/browse/HDFS-6482
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.5.0
Reporter: James Thomas
Assignee: James Thomas
 Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.3.patch, 
 HDFS-6482.patch


 Right now blocks are placed into directories that are split into many 
 subdirectories when capacity is reached. Instead we can use a block's ID to 
 determine the path it should go in. This eliminates the need for the LDir 
 data structure that facilitates the splitting of directories when they reach 
 capacity as well as fields in ReplicaInfo that keep track of a replica's 
 location.
 An extension of the work in HDFS-3290.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6493) Propose to change dfs.namenode.startup.delay.block.deletion to second instead of millisecond

2014-06-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026774#comment-14026774
 ] 

Hadoop QA commented on HDFS-6493:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12649593/HDFS-6493.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7073//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7073//console

This message is automatically generated.

 Propose to change dfs.namenode.startup.delay.block.deletion to second 
 instead of millisecond
 --

 Key: HDFS-6493
 URL: https://issues.apache.org/jira/browse/HDFS-6493
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Juan Yu
Assignee: Juan Yu
Priority: Trivial
 Attachments: HDFS-6493.001.patch


 Based on the discussion in https://issues.apache.org/jira/browse/HDFS-6186, the delay will be at least 30 minutes or even hours. It's not very user-friendly to use milliseconds when the delay is likely measured in hours.
 I suggest making the following changes:
 1. Change the unit of this config to seconds.
 2. Rename the config key from dfs.namenode.startup.delay.block.deletion.ms to dfs.namenode.startup.delay.block.deletion.sec.
 3. Add the default value to hdfs-default.xml; what's a reasonable value, 30 minutes, one hour?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL

2014-06-10 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026772#comment-14026772
 ] 

Colin Patrick McCabe commented on HDFS-6382:


bq. You mean that we scan the whole namespace at first and then split it into 5 
pieces according to hash of the path, why do we just complete the work during 
the first scanning process? If I misunderstand your meaning, please point out.

You need to make one RPC for each file or directory you delete.  In contrast, 
when listing a directory you make only one RPC for every {{dfs.ls.limit}} 
elements (by default 1000).  So if you have 5 workers all listing all 
directories, but only calling delete on some of the files, you still might come 
out ahead in terms of number of RPCs, provided you had a high ratio of files to 
directories.
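
To make the RPC accounting concrete, a minimal sketch of one such worker using the standard FileSystem API (listStatus batches entries server-side, while each delete is its own RPC); treating the TTL as a fixed argument here is a simplification, since the proposal stores it per file/directory:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative TTL worker: one listStatus call covers many entries, but every
// expired file still costs one delete RPC, so the file/directory ratio matters.
public class TtlScanSketch {
  static void scan(FileSystem fs, Path dir, long ttlMillis) throws Exception {
    long now = System.currentTimeMillis();
    for (FileStatus st : fs.listStatus(dir)) {       // batched listing RPC(s)
      if (st.isDirectory()) {
        scan(fs, st.getPath(), ttlMillis);           // recurse into subdirectories
      } else if (now - st.getModificationTime() > ttlMillis) {
        fs.delete(st.getPath(), false);              // one RPC per expired file
      }
    }
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    scan(fs, new Path("/backup/logs"), 30L * 24 * 60 * 60 * 1000); // ~30 days
  }
}
{code}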

There are other ways to partition the namespace which are smarter, but rely on 
some knowledge of what is in it, which you'd have to keep track of.

A single-node design will work for now, though, especially considering that you 
probably want rate-limiting anyway.

bq. For the simplicity purpose, in the initial version, we will use logs to 
record which file/directory is deleted by TTL, and errors during the deleting 
process.

Even if it's not implemented at first, we should think about the configuration 
required here.  I think we want the ability to email the admins when things go 
wrong.  Possibly the notifier could be pluggable or have several policies.  
There was nothing in the doc about configuration in general, which I think we 
need to fix.  For example, how is rate limiting configurable?  How do we notify 
admins that the rate is too slow to finish in the time given?

bq. It doesn't need to be an administrator command, user only can setTtl on 
file/directory that they have write permission, and can getTtl on 
file/directory that they have read permission.

You can't delete a file in HDFS unless you have write permission on the 
containing directory.  Whether you have write permission on the file itself is 
not relevant.  So I would expect the same semantics here (probably enforced by 
setfacl itself).

 HDFS File/Directory TTL
 ---

 Key: HDFS-6382
 URL: https://issues.apache.org/jira/browse/HDFS-6382
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client, namenode
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu
 Attachments: HDFS-TTL-Design.pdf


 In production environments we often have a scenario like this: we want to back up files on hdfs for some time and then have those files deleted automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's logs in order to debug program bugs, so we keep all the logs on hdfs and delete logs which are older than 1 month. This is a typical HDFS TTL scenario, so here we propose that hdfs support TTL.
 Following are some details of this proposal:
 1. HDFS can support TTL on a specified file or directory
 2. If a TTL is set on a file, the file will be deleted automatically after the TTL expires
 3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL expires
 4. The child file/directory's TTL configuration should override its parent directory's
 5. A global configuration is needed to specify whether the deleted files/directories should go to the trash or not
 6. A global configuration is needed to specify whether a directory with a TTL should be deleted when it is emptied by the TTL mechanism or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6364) Incorrect check for unknown datanode in Balancer

2014-06-10 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated HDFS-6364:
---

Attachment: HDFS-6364.patch

Attaching the newer patch

 Incorrect check for unknown datanode in Balancer
 

 Key: HDFS-6364
 URL: https://issues.apache.org/jira/browse/HDFS-6364
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Affects Versions: 2.4.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: HDFS-6364-6441.patch, HDFS-6364.patch, HDFS-6364.patch


 The Balancer makes a check to see if a block's location is a known datanode, but the variable it uses for the check is wrong.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6488) HDFS superuser unable to access user's Trash files using NFSv3 mount

2014-06-10 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026781#comment-14026781
 ] 

Colin Patrick McCabe commented on HDFS-6488:


HDFS-6498 might offer a way to do this

 HDFS superuser unable to access user's Trash files using NFSv3 mount
 

 Key: HDFS-6488
 URL: https://issues.apache.org/jira/browse/HDFS-6488
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.3.0
Reporter: Stephen Chu

 As the hdfs superuser on the NFS mount, I cannot cd or ls the /user/schu/.Trash directory:
 {code}
 bash-4.1$ cd .Trash/
 bash: cd: .Trash/: Permission denied
 bash-4.1$ ls -la
 total 2
 drwxr-xr-x 4 schu 2584148964 128 Jan  7 10:42 .
 drwxr-xr-x 4 hdfs 2584148964 128 Jan  6 16:59 ..
 drwx-- 2 schu 2584148964  64 Jan  7 10:45 .Trash
 drwxr-xr-x 2 hdfs hdfs64 Jan  7 10:42 tt
 bash-4.1$ ls .Trash
 ls: cannot open directory .Trash: Permission denied
 bash-4.1$
 {code}
 When using FsShell as hdfs superuser, I have superuser permissions to schu's 
 .Trash contents:
 {code}
 bash-4.1$ hdfs dfs -ls -R /user/schu/.Trash
 drwx--   - schu supergroup  0 2014-01-07 10:48 
 /user/schu/.Trash/Current
 drwx--   - schu supergroup  0 2014-01-07 10:48 
 /user/schu/.Trash/Current/user
 drwx--   - schu supergroup  0 2014-01-07 10:48 
 /user/schu/.Trash/Current/user/schu
 -rw-r--r--   1 schu supergroup  4 2014-01-07 10:48 
 /user/schu/.Trash/Current/user/schu/tf1
 {code}
 The NFSv3 logs don't produce any error when the superuser tries to access schu's Trash contents. However, for other permission errors (e.g. schu tries to delete a directory owned by hdfs), there will be a permission error in the logs.
 I think this is perhaps not specific to the .Trash directory.
 I created a /user/schu/dir1 which has the same permissions as .Trash (700). 
 When I try cd'ing into the directory from the NFSv3 mount as hdfs superuser, 
 I get the same permission denied.
 {code}
 [schu@hdfs-nfs ~]$ hdfs dfs -ls
 Found 4 items
 drwx--   - schu supergroup  0 2014-01-07 10:57 .Trash
 drwx--   - schu supergroup  0 2014-01-07 11:05 dir1
 -rw-r--r--   1 schu supergroup  4 2014-01-07 11:05 tf1
 drwxr-xr-x   - hdfs hdfs0 2014-01-07 10:42 tt
 bash-4.1$ whoami
 hdfs
 bash-4.1$ pwd
 /hdfs_nfs_mount/user/schu
 bash-4.1$ cd dir1
 bash: cd: dir1: Permission denied
 bash-4.1$
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not

2014-06-10 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026784#comment-14026784
 ] 

Brandon Li commented on HDFS-6439:
--

Thank you, [~atm]. I've rebased your patch and uploaded a new one.
Basically, the new patch does the port monitoring in each NFS handler. If we deny the request at the RPC level, some NFS clients might keep sending the same NFS request (e.g., GETATTR). For mountd, it only does the check for the MNT request, since some utilities (e.g., showmount) send EXPORT requests from non-privileged ports, which we don't want to fail.
I also used the opportunity to do a cleanup of the NFS3Interface.

Port monitoring is disabled by default to make the gateway easier to use, especially for Windows/MacOS NFS clients and developers.

Please review.
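
A rough sketch of the per-handler check being described, with hypothetical names (the real patch wires this into the individual NFS/mountd handlers): requests from non-privileged ports are rejected only when port monitoring is enabled, and the NULL procedure is always allowed.

{code}
// Illustrative only: always accept the NULL procedure regardless of the client
// port, and reject other procedures from non-privileged ports only when port
// monitoring is enabled.
class PortMonitorSketch {
  private static final int NULL_PROC = 0;          // RFC 2623: never reject NULL
  private final boolean portMonitoringEnabled;

  PortMonitorSketch(boolean portMonitoringEnabled) {
    this.portMonitoringEnabled = portMonitoringEnabled;
  }

  boolean shouldAccept(int procedure, int remotePort) {
    if (procedure == NULL_PROC) {
      return true;
    }
    return !portMonitoringEnabled || remotePort < 1024;
  }
}
{code}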


 NFS should not reject NFS requests to the NULL procedure whether port 
 monitoring is enabled or not
 --

 Key: HDFS-6439
 URL: https://issues.apache.org/jira/browse/HDFS-6439
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.4.0
Reporter: Brandon Li
Assignee: Aaron T. Myers
 Attachments: HDFS-6439.003.patch, HDFS-6439.patch, HDFS-6439.patch, 
 linux-nfs-disallow-request-from-nonsecure-port.pcapng, 
 mount-nfs-requests.pcapng


 As discussed in HDFS-6406, this JIRA is to track the following updates:
 1. Port monitoring is the feature name used with traditional NFS servers, and we may want to make the config property (along with the related variable allowInsecurePorts) something like dfs.nfs.port.monitoring.
 2. According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt):
 {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote}
 I do notice that NFS clients (most of the time) send mount NULL and nfs NULL from non-privileged ports. If we deny the NULL call in mountd or the nfs server, the client can't mount the export even as user root.
 3. It would be nice to have the user guide updated for the port monitoring feature.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6364) Incorrect check for unknown datanode in Balancer

2014-06-10 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026788#comment-14026788
 ] 

Arpit Agarwal commented on HDFS-6364:
-

+1 pending Jenkins.

Per discussion with Benoy, he will include the test case in a separate Jira due 
to the dependency on HDFS-6441.

 Incorrect check for unknown datanode in Balancer
 

 Key: HDFS-6364
 URL: https://issues.apache.org/jira/browse/HDFS-6364
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Affects Versions: 2.4.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: HDFS-6364-6441.patch, HDFS-6364.patch, HDFS-6364.patch


 The Balancer makes a check to see if a block's location is a known datanode, but the variable it uses for the check is wrong.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-6349) Bound checks missing in FSEditLogOp.AclEditLogUtil

2014-06-10 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-6349.
-

Resolution: Duplicate

Hi, [~kihwal].  This same issue is tracked in HDFS-5995, so I'm resolving 
HDFS-6349 as duplicate.  There is a lot of discussion and a proposed patch on 
the other issue, but no concrete resolution yet.

 Bound checks missing in FSEditLogOp.AclEditLogUtil
 --

 Key: HDFS-6349
 URL: https://issues.apache.org/jira/browse/HDFS-6349
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Kihwal Lee

 AclEditLogUtil.read() can throw an OutOfMemoryError when it encounters a certain corrupt entry. This is because it reads the size and blindly tries to allocate an ArrayList of that size.
 Because of this, a test case always dumps the heap. The test doesn't fail since edit log loading catches and handles the error.
 {panel}
 Running org.apache.hadoop.hdfs.server.namenode.TestNameNodeRecovery
 java.lang.OutOfMemoryError: Java heap space
 Dumping heap to java_pid18667.hprof ...
 Heap dump file created \[43870820 bytes in 0.154 secs\]
 {panel}
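
The usual fix pattern for this kind of failure (a hedged sketch, not the actual AclEditLogUtil code; the limit below is an arbitrary illustration) is to validate the length field read from the stream before using it to size any collection:

{code}
import java.io.DataInput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: cap the size read from a possibly corrupt edit log entry
// before using it as an ArrayList capacity, so corruption fails fast instead
// of triggering a huge allocation.
class BoundedReadSketch {
  private static final int MAX_ENTRIES = 16384;  // arbitrary illustrative limit

  static List<String> readEntries(DataInput in) throws IOException {
    int size = in.readInt();
    if (size < 0 || size > MAX_ENTRIES) {
      throw new IOException("Corrupt entry: unreasonable size " + size);
    }
    List<String> entries = new ArrayList<>(size);
    for (int i = 0; i < size; i++) {
      entries.add(in.readUTF());
    }
    return entries;
  }
}
{code}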



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6315) Decouple recording edit logs from FSDirectory

2014-06-10 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026835#comment-14026835
 ] 

Haohui Mai commented on HDFS-6315:
--

[~daryn], any updates? The patch is ready to go in. There are several patches 
depending on this one therefore I'd like to keep moving forward. Thanks.

 Decouple recording edit logs from FSDirectory
 -

 Key: HDFS-6315
 URL: https://issues.apache.org/jira/browse/HDFS-6315
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-6315.000.patch, HDFS-6315.001.patch, 
 HDFS-6315.002.patch, HDFS-6315.003.patch, HDFS-6315.004.patch, 
 HDFS-6315.005.patch


 Currently both FSNamesystem and FSDirectory record edit logs. This design 
 requires both FSNamesystem and FSDirectory to be tightly coupled together to 
 implement a durable namespace.
 This jira proposes to separate the responsibility of implementing the 
 namespace and providing durability with edit logs. Specifically, FSDirectory 
 implements the namespace (which should have no edit log operations), and 
 FSNamesystem implements durability by recording the edit logs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6512) Unnecessary synchronization on collection used for test

2014-06-10 Thread Benoy Antony (JIRA)
Benoy Antony created HDFS-6512:
--

 Summary: Unnecessary synchronization on collection used for test
 Key: HDFS-6512
 URL: https://issues.apache.org/jira/browse/HDFS-6512
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Reporter: Benoy Antony
Assignee: Benoy Antony
Priority: Minor


The function _SecurityUtil.getKerberosInfo()_ is used during authentication and 
authorization.

It has two synchronized blocks, one of them on testProviders. This is an 
unnecessary lock given that testProviders is empty in real scenarios.
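
One common way to avoid that cost (a hedged sketch, not the actual SecurityUtil code) is to skip the synchronized block entirely when the test-only collection is empty; since testProviders is only populated from tests, the unsynchronized emptiness check is safe in production:

{code}
import java.util.ArrayList;
import java.util.List;

// Illustrative only: check the (normally empty) test-only provider list before
// taking its lock, so production lookups skip the synchronization entirely.
class KerberosInfoLookupSketch {
  private static final List<Object> testProviders = new ArrayList<Object>();

  static Object getKerberosInfo(Class<?> protocol) {
    if (!testProviders.isEmpty()) {          // true only when tests registered providers
      synchronized (testProviders) {
        // consult the test providers here
      }
    }
    // fall through to the normal lookup path
    return null;
  }
}
{code}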



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6323) Compress files in transit for fs -get and -put operations

2014-06-10 Thread Tom Panning (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026863#comment-14026863
 ] 

Tom Panning commented on HDFS-6323:
---

Hi Andrew,

It could be related to those two issues, depending on how they are implemented. 
If files are compressed transparently, and they remain compressed until the 
-get and -put commands uncompress them on the local machine, that would solve 
my problem. But if the files are transparently uncompressed as they are read off 
the HDFS disk, then that wouldn't.

Allowing webhdfs to use compression would also solve my problem.

 Compress files in transit for fs -get and -put operations
 -

 Key: HDFS-6323
 URL: https://issues.apache.org/jira/browse/HDFS-6323
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Reporter: Tom Panning
Priority: Minor

 For the {{hadoop fs -get}} and {{hadoop fs -put}} commands, it would be nice 
 if there was an option to compress the file(s) in transit. For some people, 
 the Hadoop cluster is far away (in terms of the network) or must be accessed 
 through a VPN, and many files that are put on or retrieved from the cluster 
 are very large compared to the available bandwidth.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6482) Use block ID-based block layout on datanodes

2014-06-10 Thread James Thomas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Thomas updated HDFS-6482:
---

Release Note: The directory structure for finalized replicas on DNs has 
been changed. Now, the directory that a finalized replica goes in is determined 
uniquely by its ID. Specifically, we use a three-level directory structure, 
with the 32nd through 25th bits of a block ID identifying the correct directory 
at the first level, the 24th through 17th bits identifying the correct 
directory at the second level, and the 16th through 9th bits identifying the correct 
directory at the third level.
  Status: Patch Available  (was: Open)

 Use block ID-based block layout on datanodes
 

 Key: HDFS-6482
 URL: https://issues.apache.org/jira/browse/HDFS-6482
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.5.0
Reporter: James Thomas
Assignee: James Thomas
 Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.3.patch, 
 HDFS-6482.patch


 Right now blocks are placed into directories that are split into many 
 subdirectories when capacity is reached. Instead we can use a block's ID to 
 determine the path it should go in. This eliminates the need for the LDir 
 data structure that facilitates the splitting of directories when they reach 
 capacity as well as fields in ReplicaInfo that keep track of a replica's 
 location.
 An extension of the work in HDFS-3290.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6364) Incorrect check for unknown datanode in Balancer

2014-06-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026985#comment-14026985
 ] 

Hadoop QA commented on HDFS-6364:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12649629/HDFS-6364.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7076//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7076//console

This message is automatically generated.

 Incorrect check for unknown datanode in Balancer
 

 Key: HDFS-6364
 URL: https://issues.apache.org/jira/browse/HDFS-6364
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Affects Versions: 2.4.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: HDFS-6364-6441.patch, HDFS-6364.patch, HDFS-6364.patch


 The Balancer makes a check to see if a block's location is a known datanode, but the variable it uses for the check is wrong.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not

2014-06-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027018#comment-14027018
 ] 

Hadoop QA commented on HDFS-6439:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12649625/HDFS-6439.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs 
hadoop-hdfs-project/hadoop-hdfs-nfs:

  org.apache.hadoop.fs.TestHdfsNativeCodeLoader

  The following test timeouts occurred in 
hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs 
hadoop-hdfs-project/hadoop-hdfs-nfs:

org.apache.hadoop.oncrpc.TestFrameDecoder

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7075//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7075//console

This message is automatically generated.

 NFS should not reject NFS requests to the NULL procedure whether port 
 monitoring is enabled or not
 --

 Key: HDFS-6439
 URL: https://issues.apache.org/jira/browse/HDFS-6439
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.4.0
Reporter: Brandon Li
Assignee: Aaron T. Myers
 Attachments: HDFS-6439.003.patch, HDFS-6439.patch, HDFS-6439.patch, 
 linux-nfs-disallow-request-from-nonsecure-port.pcapng, 
 mount-nfs-requests.pcapng


 As discussed in HDFS-6406, this JIRA is to track the following updates:
 1. Port monitoring is the feature name used with traditional NFS servers, and we may want to make the config property (along with the related variable allowInsecurePorts) something like dfs.nfs.port.monitoring.
 2. According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt):
 {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote}
 I do notice that NFS clients (most of the time) send mount NULL and nfs NULL from non-privileged ports. If we deny the NULL call in mountd or the nfs server, the client can't mount the export even as user root.
 3. It would be nice to have the user guide updated for the port monitoring feature.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6364) Incorrect check for unknown datanode in Balancer

2014-06-10 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6364:


  Resolution: Fixed
   Fix Version/s: 2.5.0
  3.0.0
Target Version/s: 2.5.0
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

I committed this to trunk and branch-2. Thanks for finding and fixing this 
[~benoyantony]!

 Incorrect check for unknown datanode in Balancer
 

 Key: HDFS-6364
 URL: https://issues.apache.org/jira/browse/HDFS-6364
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Affects Versions: 2.4.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Fix For: 3.0.0, 2.5.0

 Attachments: HDFS-6364-6441.patch, HDFS-6364.patch, HDFS-6364.patch


 The Balancer makes a check to see if a block's location is a known datanode, but the variable it uses for the check is wrong.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6364) Incorrect check for unknown datanode in Balancer

2014-06-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027037#comment-14027037
 ] 

Hudson commented on HDFS-6364:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5678 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5678/])
HDFS-6364. Incorrect check for unknown datanode in Balancer. (Contributed by 
Benoy Antony) (arp: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1601771)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java


 Incorrect check for unknown datanode in Balancer
 

 Key: HDFS-6364
 URL: https://issues.apache.org/jira/browse/HDFS-6364
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Affects Versions: 2.4.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Fix For: 3.0.0, 2.5.0

 Attachments: HDFS-6364-6441.patch, HDFS-6364.patch, HDFS-6364.patch


 The Balancer makes a check to see if a block's location is a known datanode, but the variable it uses for the check is wrong.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5995) TestFSEditLogLoader#testValidateEditLogWithCorruptBody gets OutOfMemoryError and dumps heap.

2014-06-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027038#comment-14027038
 ] 

Hadoop QA commented on HDFS-5995:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12630406/HDFS-5995.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7077//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7077//console

This message is automatically generated.

 TestFSEditLogLoader#testValidateEditLogWithCorruptBody gets OutOfMemoryError 
 and dumps heap.
 

 Key: HDFS-5995
 URL: https://issues.apache.org/jira/browse/HDFS-5995
 Project: Hadoop HDFS
  Issue Type: Test
  Components: namenode, test
Affects Versions: 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Attachments: HDFS-5995.1.patch


 {{TestFSEditLogLoader#testValidateEditLogWithCorruptBody}} is experiencing 
 {{OutOfMemoryError}} and dumping heap since the merge of HDFS-4685.  This 
 doesn't actually cause the test to fail, because it's a failure test that 
 corrupts an edit log intentionally.  Still, this might cause confusion if 
 someone reviews the build logs and thinks this is a more serious problem.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6315) Decouple recording edit logs from FSDirectory

2014-06-10 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027172#comment-14027172
 ] 

Jing Zhao commented on HDFS-6315:
-

+1 for the latest patch. Any other comments [~daryn]?

 Decouple recording edit logs from FSDirectory
 -

 Key: HDFS-6315
 URL: https://issues.apache.org/jira/browse/HDFS-6315
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-6315.000.patch, HDFS-6315.001.patch, 
 HDFS-6315.002.patch, HDFS-6315.003.patch, HDFS-6315.004.patch, 
 HDFS-6315.005.patch


 Currently both FSNamesystem and FSDirectory record edit logs. This design 
 requires both FSNamesystem and FSDirectory to be tightly coupled together to 
 implement a durable namespace.
 This jira proposes to separate the responsibility of implementing the 
 namespace and providing durability with edit logs. Specifically, FSDirectory 
 implements the namespace (which should have no edit log operations), and 
 FSNamesystem implements durability by recording the edit logs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode

2014-06-10 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027179#comment-14027179
 ] 

Konstantin Shvachko commented on HDFS-6469:
---

h4.  Coordinated reads
The motivation for a configurable design based on file names is to make 
coordinated reads available to other applications without changing them.
An alternative is to provide a new option (parameter) for read operations 
specifying whether the read should be coordinated or not. Then the application 
developers rather than administrators will be in full control of what they 
coordinate.

* Journals everywhere*

If I follow your logic correctly  

 Coordinated replication of the namespace using ConsensusNode
 

 Key: HDFS-6469
 URL: https://issues.apache.org/jira/browse/HDFS-6469
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: CNodeDesign.pdf


 This is a proposal to introduce ConsensusNode - an evolution of the NameNode, 
 which enables replication of the namespace on multiple nodes of an HDFS 
 cluster by means of a Coordination Engine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode

2014-06-10 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027179#comment-14027179
 ] 

Konstantin Shvachko edited comment on HDFS-6469 at 6/10/14 11:04 PM:
-

Sorry hit the submit button too early. Will repost shortly.


was (Author: shv):
h4.  Coordinated reads
The motivation for a configurable design based on file names is to make 
coordinated reads available to other applications without changing them.
An alternative is to provide a new option (parameter) for read operations 
specifying whether the read should be coordinated or not. Then the application 
developers rather than administrators will be in full control of what they 
coordinate.

* Journals everywhere*

If I follow your logic correctly  

 Coordinated replication of the namespace using ConsensusNode
 

 Key: HDFS-6469
 URL: https://issues.apache.org/jira/browse/HDFS-6469
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: CNodeDesign.pdf


 This is a proposal to introduce ConsensusNode - an evolution of the NameNode, 
 which enables replication of the namespace on multiple nodes of an HDFS 
 cluster by means of a Coordination Engine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6386) HDFS Encryption Zones

2014-06-10 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-6386:
---

Status: Patch Available  (was: Reopened)

The .4 patch implements encryption zones on the server side. Included in this 
are (1) setting the xattr for an EZ and validating that the directory for an EZ 
being created or deleted exists, is empty, and is the root of an EZ, (2) setting 
the appropriate xattr for any files created within an EZ, (3) on the client side, 
determining whether a file is encrypted and, if so, setting up the right 
Crypto{Input,Output}Streams for encrypting/decrypting the data, (4) removing 
the earlier (temporary) KEY and IV constants, and (5) adding several unit tests 
for the above.

This patch allows us to demonstrate end-to-end encryption.
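
For readers unfamiliar with the client-side flow, a self-contained sketch of the idea using plain javax.crypto rather than the Hadoop Crypto{Input,Output}Stream classes themselves (so this is illustrative only): the key and IV obtained for a file parameterize a CTR cipher that wraps the raw stream.

{code}
import java.io.InputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Illustrative only: decrypt a file's bytes with AES/CTR given the key and IV
// obtained for that file; Hadoop's CryptoInputStream plays an analogous role
// around the DFS input stream.
class DecryptingReadSketch {
  static InputStream wrap(InputStream raw, byte[] key, byte[] iv) throws Exception {
    Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
    cipher.init(Cipher.DECRYPT_MODE,
        new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
    return new CipherInputStream(raw, cipher);
  }
}
{code}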

 HDFS Encryption Zones
 -

 Key: HDFS-6386
 URL: https://issues.apache.org/jira/browse/HDFS-6386
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode, security
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)


 Define the required security xAttributes for directories and files within an 
 encryption zone and how they propagate to children. Implement the logic to 
 create/delete encryption zones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6379) HTTPFS - Implement ACLs support

2014-06-10 Thread Mike Yoder (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Yoder updated HDFS-6379:
-

Status: In Progress  (was: Patch Available)

 HTTPFS - Implement ACLs support
 ---

 Key: HDFS-6379
 URL: https://issues.apache.org/jira/browse/HDFS-6379
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Alejandro Abdelnur
Assignee: Mike Yoder
 Fix For: 2.4.0

 Attachments: jira-HDFS-6379.patch


 HDFS-4685 added ACL support to WebHDFS but missed adding it to HttpFS. This JIRA adds that support to HttpFS.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6379) HTTPFS - Implement ACLs support

2014-06-10 Thread Mike Yoder (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Yoder updated HDFS-6379:
-

Attachment: (was: jira-HDFS-6379.patch)

 HTTPFS - Implement ACLs support
 ---

 Key: HDFS-6379
 URL: https://issues.apache.org/jira/browse/HDFS-6379
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Alejandro Abdelnur
Assignee: Mike Yoder
 Fix For: 2.4.0

 Attachments: jira-HDFS-6379.patch


 HDFS-4685 added ACL support to WebHDFS but missed adding it to HttpFS. This JIRA adds that support to HttpFS.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6379) HTTPFS - Implement ACLs support

2014-06-10 Thread Mike Yoder (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Yoder updated HDFS-6379:
-

Attachment: jira-HDFS-6379.patch

 HTTPFS - Implement ACLs support
 ---

 Key: HDFS-6379
 URL: https://issues.apache.org/jira/browse/HDFS-6379
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Alejandro Abdelnur
Assignee: Mike Yoder
 Fix For: 2.4.0

 Attachments: jira-HDFS-6379.patch


 HDFS-4685 added ACL support to WebHDFS but missed adding it to HttpFS. This JIRA adds that support to HttpFS.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6508) Add an XAttr to specify the cipher mode

2014-06-10 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027213#comment-14027213
 ] 

Charles Lamb commented on HDFS-6508:


[~tucu00] also says: The IV length may depend on the algorithm being used 
(AES-GCM allows arbitrary lengths starting at 16 bytes; if 16, it uses the same 
logic as AES-CTR mode, i.e. the first 8 bytes as the counter; if greater than 16, 
it does a hash computation of the IV to get 16 bytes and then applies the counter 
logic).

Building on my previous email about the enum for the encryption mode, we could 
put the length of the IV in the encryption-mode enum itself. Then we can remove 
it from the CryptoCodec and use the constant itself.
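
A minimal sketch of that enum idea (the names are hypothetical, not a committed class): each mode carries its own transformation string and IV length, so callers no longer need to pull the IV length from the codec.

{code}
// Hypothetical sketch: tie the IV length to the cipher mode itself so the codec
// does not need to expose it separately.
enum CipherModeSketch {
  AES_CTR_NOPADDING("AES/CTR/NoPadding", 16),
  AES_GCM_NOPADDING("AES/GCM/NoPadding", 16);  // per the note above, a 16-byte
                                               // GCM IV reuses the CTR counter logic

  private final String transformation;
  private final int ivLengthBytes;

  CipherModeSketch(String transformation, int ivLengthBytes) {
    this.transformation = transformation;
    this.ivLengthBytes = ivLengthBytes;
  }

  String getTransformation() { return transformation; }
  int getIvLength() { return ivLengthBytes; }
}
{code}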

 Add an XAttr to specify the cipher mode
 ---

 Key: HDFS-6508
 URL: https://issues.apache.org/jira/browse/HDFS-6508
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode, security
Reporter: Charles Lamb
Assignee: Charles Lamb

 We should specify the cipher mode in the xattrs for compatibility's sake. Crypto changes over time and we need to prepare for that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6379) HTTPFS - Implement ACLs support

2014-06-10 Thread Mike Yoder (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Yoder updated HDFS-6379:
-

Attachment: jira-HDFS-6379.patch

 HTTPFS - Implement ACLs support
 ---

 Key: HDFS-6379
 URL: https://issues.apache.org/jira/browse/HDFS-6379
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Alejandro Abdelnur
Assignee: Mike Yoder
 Fix For: 2.4.0

 Attachments: jira-HDFS-6379.patch, jira-HDFS-6379.patch


 HDFS-4685 added ACL support to WebHDFS but missed adding it to HttpFS. This JIRA adds that support to HttpFS.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6379) HTTPFS - Implement ACLs support

2014-06-10 Thread Mike Yoder (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Yoder updated HDFS-6379:
-

Status: Patch Available  (was: In Progress)

Updated patch with test case for ACLs turned off.

 HTTPFS - Implement ACLs support
 ---

 Key: HDFS-6379
 URL: https://issues.apache.org/jira/browse/HDFS-6379
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Alejandro Abdelnur
Assignee: Mike Yoder
 Fix For: 2.4.0

 Attachments: jira-HDFS-6379.patch, jira-HDFS-6379.patch


 HDFS-4685 added ACL support to WebHDFS but missed adding it to HttpFS. This JIRA adds that support to HttpFS.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-06-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027226#comment-14027226
 ] 

Hadoop QA commented on HDFS-6482:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12649627/HDFS-6482.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestBlockMissingException
  org.apache.hadoop.hdfs.TestCrcCorruption
  org.apache.hadoop.hdfs.TestMissingBlocksAlert
  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
  org.apache.hadoop.hdfs.TestBlockReaderLocalLegacy
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages
  org.apache.hadoop.hdfs.protocol.TestLayoutVersion
  
org.apache.hadoop.hdfs.server.namenode.TestProcessCorruptBlocks
  
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestDatanodeRestart
  org.apache.hadoop.hdfs.server.namenode.TestXAttrConfigFlag
  org.apache.hadoop.hdfs.server.namenode.TestFSEditLogLoader
  org.apache.hadoop.hdfs.TestFileCorruption
  
org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks
  org.apache.hadoop.hdfs.TestBlockReaderLocal
  
org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks
  org.apache.hadoop.hdfs.server.datanode.TestCachingStrategy
  org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead
  org.apache.hadoop.hdfs.TestDFSClientRetries
  
org.apache.hadoop.hdfs.server.blockmanagement.TestOverReplicatedBlocks

  The following test timeouts occurred in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool
org.apache.hadoop.hdfs.server.namenode.TestFsck
org.apache.hadoop.hdfs.TestReplication
org.apache.hadoop.hdfs.TestDatanodeBlockScanner

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7078//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7078//console

This message is automatically generated.

 Use block ID-based block layout on datanodes
 

 Key: HDFS-6482
 URL: https://issues.apache.org/jira/browse/HDFS-6482
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.5.0
Reporter: James Thomas
Assignee: James Thomas
 Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.3.patch, 
 HDFS-6482.patch


 Right now blocks are placed into directories that are split into many 
 subdirectories when capacity is reached. Instead we can use a block's ID to 
 determine the path it should go in. This eliminates the need for the LDir 
 data structure that facilitates the splitting of directories when they reach 
 capacity as well as fields in ReplicaInfo that keep track of a replica's 
 location.
 An extension of the work in HDFS-3290.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-3493) Replication is not happened for the block (which is recovered and in finalized) to the Datanode which has got the same block with old generation timestamp in RBW

2014-06-10 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027252#comment-14027252
 ] 

Andrew Wang commented on HDFS-3493:
---

Thanks for taking this on Juan, fix looks good. Just a few nitty comments:

* some whitespace only changes in BlockManager
* some lines longer than 80 chars
* Maybe fold {{minReplicationSatisfied }} into {{corruptedDuringWrite}} like 
how you assign {{hasMoreCorruptReplicas}} for parity.
* In the test, do we need that sleep(1)? I'm always wary of sleeps, since 
they lead to test flakiness.
* I think the comment should also read something like DNs will detect new 
dummy blocks on restart. Would also be good to drop a comment about what 
you're doing with creating dummy blocks.
* I like to put nice conservative timeouts on my tests, e.g. 
{{@Test(timeout=12}}.

+1 pending these though. [~vinayrpet], maybe you'd like to take a look too?

 Replication is not happened for the block (which is recovered and in 
 finalized) to the Datanode which has got the same block with old generation 
 timestamp in RBW
 -

 Key: HDFS-3493
 URL: https://issues.apache.org/jira/browse/HDFS-3493
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha, 2.0.5-alpha
Reporter: J.Andreina
Assignee: Juan Yu
 Attachments: HDFS-3493.002.patch, HDFS-3493.003.patch, HDFS-3493.patch


 replication factor= 3, block report interval= 1min and start NN and 3DN
 Step 1:Write a file without close and do hflush (Dn1,DN2,DN3 has blk_ts1)
 Step 2:Stopped DN3
 Step 3:recovery happens and time stamp updated(blk_ts2)
 Step 4:close the file
 Step 5:blk_ts2 is finalized and available in DN1 and Dn2
 Step 6:now restarted DN3(which has got blk_ts1 in rbw)
 From the NN side there is no command issued to DN3 to delete blk_ts1, but DN3 is asked to mark the block as corrupt.
 Replication of blk_ts2 to DN3 does not happen.
 NN logs:
 
 {noformat}
 INFO org.apache.hadoop.hdfs.StateChange: BLOCK 
 NameSystem.addToCorruptReplicasMap: duplicate requested for 
 blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by 
 /XX.XX.XX.XX because reported RWR replica with genstamp 1007 does not match 
 COMPLETE block's genstamp in block map 1008
 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from 
 DatanodeRegistration(XX.XX.XX.XX, 
 storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, 
 ipcPort=50277, 
 storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0),
  blocks: 2, processing time: 1 msecs
 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* Removing block 
 blk_3927215081484173742_1008 from neededReplications as it has enough 
 replicas.
 INFO org.apache.hadoop.hdfs.StateChange: BLOCK 
 NameSystem.addToCorruptReplicasMap: duplicate requested for 
 blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by 
 /XX.XX.XX.XX because reported RWR replica with genstamp 1007 does not match 
 COMPLETE block's genstamp in block map 1008
 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from 
 DatanodeRegistration(XX.XX.XX.XX, 
 storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, 
 ipcPort=50277, 
 storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0),
  blocks: 2, processing time: 1 msecs
 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not 
 able to place enough replicas, still in need of 1 to reach 1
 For more information, please enable DEBUG log level on 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
 {noformat}
 fsck Report
 ===
 {noformat}
 /file21:  Under replicated 
 BP-1008469586-XX.XX.XX.XX-1336829603103:blk_3927215081484173742_1008. Target 
 Replicas is 3 but found 2 replica(s).
 .Status: HEALTHY
  Total size:  495 B
  Total dirs:  1
  Total files: 3
  Total blocks (validated):3 (avg. block size 165 B)
  Minimally replicated blocks: 3 (100.0 %)
  Over-replicated blocks:  0 (0.0 %)
  Under-replicated blocks: 1 (33.32 %)
  Mis-replicated blocks:   0 (0.0 %)
  Default replication factor:  1
  Average block replication:   2.0
  Corrupt blocks:  0
  Missing replicas:1 (14.285714 %)
  Number of data-nodes:3
  Number of racks: 1
 FSCK ended at Sun May 13 09:49:05 IST 2012 in 9 milliseconds
 The filesystem under path '/' is HEALTHY
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6379) HTTPFS - Implement ACLs support

2014-06-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027265#comment-14027265
 ] 

Hadoop QA commented on HDFS-6379:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12649701/jira-HDFS-6379.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-httpfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7079//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7079//console

This message is automatically generated.

 HTTPFS - Implement ACLs support
 ---

 Key: HDFS-6379
 URL: https://issues.apache.org/jira/browse/HDFS-6379
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Alejandro Abdelnur
Assignee: Mike Yoder
 Fix For: 2.4.0

 Attachments: jira-HDFS-6379.patch, jira-HDFS-6379.patch


 HDFS-4685 added ACLs support to WebHDFS but missed adding them to HttpFS.
 This JIRA tracks adding that support to HttpFS.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode

2014-06-10 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027179#comment-14027179
 ] 

Konstantin Shvachko edited comment on HDFS-6469 at 6/11/14 12:32 AM:
-

Todd, interesting points and you actually answered most of them yourself in 
your comments.

h4.  Coordinated reads
The motivation for a configurable design based on file names is to make 
coordinated reads available to other applications without changing them.
An alternative is to provide a new option (parameter) for read operations 
specifying whether the read should be coordinated or not. Then application 
developers, rather than administrators, will be in full control of what they 
coordinate.
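A minimal sketch of what such a per-call option could look like; ReadOption and the open() overload shown here are hypothetical and do not exist in the HDFS client API:
{noformat}
import java.io.InputStream;
import java.util.EnumSet;

// Hypothetical API sketch only: ReadOption and this open() overload do not
// exist in HDFS. They illustrate letting the application, rather than the
// administrator, decide per call whether a read must be coordinated.
public interface CoordinatedReadClient {
  enum ReadOption { COORDINATED, STALE_OK }

  InputStream open(String path, EnumSet<ReadOption> options);
}
{noformat}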

h4.  Journals everywhere
If I follow your logic correctly, QJM, being Paxos-based, uses a journal by 
itself, so we are not increasing journaling here. Looking at the bigger 
picture, we see more journals around: HBase uses a WAL along with the NN 
edits, which are themselves persisted in ext4, a journaling file system.
As you said, if you need to separate them, one can use different drives or SSDs.
Besides, as I said in a previous comment, one can choose to eliminate the 
CNode edits completely.

h4.  Determinism
Determinism is not as hard as it may seem, and no harder than multi-grain 
locking. Say, you need to ensure that incremental counters such as genStamp 
are advanced only in agreements and never in proposals, similar to how you 
ensure that object locks are acquired in the right order. Most of that is 
already in the NameNode, thanks to the StandbyNode implementation.
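As a hypothetical illustration of that rule (advance genStamp only while applying an agreement, never while forming a proposal); none of these classes exist in HDFS:
{noformat}
// Hypothetical sketch: GenerationStampSource is not an HDFS class. It models
// the rule that the counter may advance only while an agreement is being
// applied, never while a proposal is being built, so all CNodes stay
// deterministic with respect to each other.
public class GenerationStampSource {
  private long genStamp = 1000;
  private boolean applyingAgreement = false;

  public synchronized void startAgreement()  { applyingAgreement = true; }
  public synchronized void finishAgreement() { applyingAgreement = false; }

  public synchronized long nextGenerationStamp() {
    if (!applyingAgreement) {
      // Proposal-side callers would make replicas diverge, so fail fast.
      throw new IllegalStateException("genStamp may only advance in an agreement");
    }
    return ++genStamp;
  }
}
{noformat}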

h4.  AA vs AS HA
There are several advantages of the AA approach over the evolution of the 
current one outlined in your comment:
* All CNodes are writeable, while in the current approach only the Active NN is.
* If a GC pause hits the active NN, service is interrupted. With CNodes, service 
can continue, since other nodes can process writes.
* Reads from an SBN are always stale, and in order to write everybody has to go 
to the active, so they will be writing to a namespace from the future, so to 
speak. I think that with mixed workloads all clients will end up working with 
the active NN only, and there won't be enough load balancing.

As you mentioned, your design will be addressing the same problems as 
ConsensusNode. The question is what you would rather have in the end: 
active-active or active-read-only-standby.

h4.  Why does this design make it any easier to implement a distributed namespace?
I meant the advantage of introducing a coordination engine in general, not 
the CNode itself.
Making coordinated updates that involve different parts of a distributed 
namespace is rather trivial with a coordination engine. An example is an 
atomic rename, that is, moving a file from one parent to another when the 
parents are on different partitions.

h6.  Locking
~This should really belong in a different issue. I just want to mention here 
that I don't see that one contradicts the other; on the contrary, they can be 
very much in sync. I remember we talked about an optimization scheme for a 
coordinated namespace where different parts of the namespace are coordinated 
independently (and in parallel) under different state machines.~


was (Author: shv):
Sorry hit the submit button too early. Will repost shortly.

 Coordinated replication of the namespace using ConsensusNode
 

 Key: HDFS-6469
 URL: https://issues.apache.org/jira/browse/HDFS-6469
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: CNodeDesign.pdf


 This is a proposal to introduce ConsensusNode - an evolution of the NameNode, 
 which enables replication of the namespace on multiple nodes of an HDFS 
 cluster by means of a Coordination Engine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL

2014-06-10 Thread Zesheng Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027325#comment-14027325
 ] 

Zesheng Wu commented on HDFS-6382:
--

bq. Even if it's not implemented at first, we should think about the 
configuration required here. I think we want the ability to email the admins 
when things go wrong. Possibly the notifier could be pluggable or have several 
policies. There was nothing in the doc about configuration in general, which I 
think we need to fix. For example, how is rate limiting configurable? How do we 
notify admins that the rate is too slow to finish in the time given?
OK, I will update the document and post a new version soon.
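Purely as an illustration of the kind of knobs the document could spell out; none of these property names exist in HDFS, they only show the shape of rate-limiting and notification configuration:
{noformat}
import org.apache.hadoop.conf.Configuration;

// Illustrative only: none of these property names exist in HDFS today. They
// sketch the configuration surface discussed above (scan interval, delete
// rate limiting, and a pluggable admin notifier).
public class TtlConfigSketch {
  static final String SCAN_INTERVAL_KEY  = "dfs.ttl.scan.interval.ms";
  static final String DELETE_RATE_KEY    = "dfs.ttl.max.deletes.per.second";
  static final String NOTIFIER_CLASS_KEY = "dfs.ttl.notifier.class";

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    long scanIntervalMs   = conf.getLong(SCAN_INTERVAL_KEY, 3600000L);
    long maxDeletesPerSec = conf.getLong(DELETE_RATE_KEY, 100L);
    // A pluggable notifier (e.g. an email notifier) could be loaded by class name.
    String notifier = conf.get(NOTIFIER_CLASS_KEY, "log-only");
    System.out.printf("scan=%dms rate=%d/s notifier=%s%n",
        scanIntervalMs, maxDeletesPerSec, notifier);
  }
}
{noformat}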

bq. You can't delete a file in HDFS unless you have write permission on the 
containing directory. Whether you have write permission on the file itself is 
not relevant. So I would expect the same semantics here (probably enforced by 
setfacl itself).
That's reasonable; I'll spell it out clearly in the document.

 HDFS File/Directory TTL
 ---

 Key: HDFS-6382
 URL: https://issues.apache.org/jira/browse/HDFS-6382
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client, namenode
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu
 Attachments: HDFS-TTL-Design.pdf


 In a production environment, we often have a scenario like this: we want to 
 back up files on HDFS for some time and then delete these files automatically. 
 For example, we keep only 1 day's logs on local disk due to limited disk 
 space, but we need to keep about 1 month's logs in order to debug program 
 bugs, so we keep all the logs on HDFS and delete logs that are older than 1 
 month. This is a typical HDFS TTL scenario, so here we propose that HDFS 
 support TTL.
 Following are some details of this proposal:
 1. HDFS can support TTL on a specified file or directory
 2. If a TTL is set on a file, the file will be deleted automatically after 
 the TTL expires
 3. If a TTL is set on a directory, its child files and directories will be 
 deleted automatically after the TTL expires
 4. A child file/directory's TTL configuration should override its parent 
 directory's
 5. A global configuration option is needed to control whether deleted 
 files/directories should go to the trash or not
 6. A global configuration option is needed to control whether a directory 
 with a TTL should be deleted when the TTL mechanism empties it
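A minimal sketch of the expiry rule in points 2-4 above (the nearest TTL on the path wins); the class and methods are illustrative only and not taken from any patch:
{noformat}
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only, not from any HDFS patch: resolves the effective
// TTL of a path, letting a child's TTL override its parent directory's, and
// checks whether an entry has expired.
public class TtlRuleSketch {
  // Path -> TTL in milliseconds; absent means "no TTL set on this path".
  private final Map<String, Long> ttlByPath = new HashMap<>();

  public void setTtl(String path, long ttlMs) { ttlByPath.put(path, ttlMs); }

  // Walk up from the path towards the root; the nearest TTL wins (point 4).
  public Long effectiveTtl(String path) {
    for (String p = path; !p.isEmpty(); p = parentOf(p)) {
      Long ttl = ttlByPath.get(p);
      if (ttl != null) {
        return ttl;
      }
    }
    return ttlByPath.get("/");
  }

  // Points 2 and 3: the entry is deleted once its TTL has expired.
  public boolean isExpired(String path, long mtimeMs, long nowMs) {
    Long ttl = effectiveTtl(path);
    return ttl != null && nowMs - mtimeMs > ttl;
  }

  private static String parentOf(String path) {
    int idx = path.lastIndexOf('/');
    return idx <= 0 ? "" : path.substring(0, idx);
  }
}
{noformat}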



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6382) HDFS File/Directory TTL

2014-06-10 Thread Zesheng Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6382:
-

Attachment: HDFS-TTL-Design -2.pdf

Updated the document to address Colin's suggestions.
Thanks, Colin, for your valuable suggestions :)

 HDFS File/Directory TTL
 ---

 Key: HDFS-6382
 URL: https://issues.apache.org/jira/browse/HDFS-6382
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client, namenode
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu
 Attachments: HDFS-TTL-Design -2.pdf, HDFS-TTL-Design.pdf


 In a production environment, we often have a scenario like this: we want to 
 back up files on HDFS for some time and then delete these files automatically. 
 For example, we keep only 1 day's logs on local disk due to limited disk 
 space, but we need to keep about 1 month's logs in order to debug program 
 bugs, so we keep all the logs on HDFS and delete logs that are older than 1 
 month. This is a typical HDFS TTL scenario, so here we propose that HDFS 
 support TTL.
 Following are some details of this proposal:
 1. HDFS can support TTL on a specified file or directory
 2. If a TTL is set on a file, the file will be deleted automatically after 
 the TTL expires
 3. If a TTL is set on a directory, its child files and directories will be 
 deleted automatically after the TTL expires
 4. A child file/directory's TTL configuration should override its parent 
 directory's
 5. A global configuration option is needed to control whether deleted 
 files/directories should go to the trash or not
 6. A global configuration option is needed to control whether a directory 
 with a TTL should be deleted when the TTL mechanism empties it



--
This message was sent by Atlassian JIRA
(v6.2#6252)