[jira] [Updated] (HDFS-6379) HTTPFS - Implement ACLs support
[ https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Yoder updated HDFS-6379: - Status: Patch Available (was: In Progress) HTTPFS - Implement ACLs support --- Key: HDFS-6379 URL: https://issues.apache.org/jira/browse/HDFS-6379 Project: Hadoop HDFS Issue Type: Bug Reporter: Alejandro Abdelnur Assignee: Mike Yoder Fix For: 2.4.0 Attachments: jira-HDFS-6379.patch HDFS-4685 added ACLs support to WebHDFS but missed adding it to HttpFS. This JIRA adds that support to HttpFS. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6379) HTTPFS - Implement ACLs support
[ https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Yoder updated HDFS-6379: - Attachment: jira-HDFS-6379.patch
[jira] [Commented] (HDFS-6379) HTTPFS - Implement ACLs support
[ https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021718#comment-14021718 ] Hadoop QA commented on HDFS-6379: -
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12648919/jira-HDFS-6379.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:red}-1 javac{color}. The applied patch generated 1283 javac compiler warnings (more than the trunk's current 1277 warnings).
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-httpfs: org.apache.hadoop.fs.http.client.TestHttpFSFileSystemLocalFileSystem
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7059//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7059//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7059//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7059//console
This message is automatically generated.
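For context on what "ACLs support" means at the REST layer: the WebHDFS ACL operations (which HttpFS mirrors) take an aclspec query parameter of comma-separated entries such as user::rwx,user:bob:r-x,group::r--,other::---. A minimal, hypothetical Java sketch of parsing such a spec is below; the class and method names are illustrative and are not taken from the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

public class AclSpec {
    /** One ACL entry: type, optional name, and permission triple. */
    public static final class Entry {
        public final String type;   // "user", "group", "other", or "mask"
        public final String name;   // empty for the owning user/group entry
        public final String perms;  // e.g. "rwx", "r-x", "---"
        Entry(String type, String name, String perms) {
            this.type = type; this.name = name; this.perms = perms;
        }
    }

    /** Splits a comma-separated aclspec into entries, validating each triple. */
    public static List<Entry> parse(String spec) {
        List<Entry> entries = new ArrayList<>();
        for (String part : spec.split(",")) {
            String[] f = part.split(":", -1);   // -1 keeps the empty name field
            if (f.length != 3 || !f[2].matches("[r-][w-][x-]")) {
                throw new IllegalArgumentException("bad ACL entry: " + part);
            }
            entries.add(new Entry(f[0], f[1], f[2]));
        }
        return entries;
    }

    public static void main(String[] args) {
        List<Entry> e = parse("user::rwx,user:bob:r-x,group::r--,other::---");
        System.out.println(e.size() + " ACL entries parsed");
    }
}
```

In the real feature the parsed entries would map to HDFS AclEntry objects; this sketch only shows the wire-format shape of the parameter.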
[jira] [Updated] (HDFS-6494) In some case, the hedged read will lead to client infinite wait.
[ https://issues.apache.org/jira/browse/HDFS-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LiuLei updated HDFS-6494: - Attachment: hedged-read-bug.patch In some case, the hedged read will lead to client infinite wait. -- Key: HDFS-6494 URL: https://issues.apache.org/jira/browse/HDFS-6494 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.4.0 Reporter: LiuLei Assignee: Liang Xie Attachments: hedged-read-bug.patch When I use hedged read and there is only one live datanode, if the read from that datanode throws TimeoutException and ChecksumException, the client will wait forever.
[jira] [Commented] (HDFS-6494) In some case, the hedged read will lead to client infinite wait.
[ https://issues.apache.org/jira/browse/HDFS-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021729#comment-14021729 ] LiuLei commented on HDFS-6494: -- Hi Liang, I uploaded a patch; I hope it is helpful for you.
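The hang described above comes from waiting indefinitely on a hedged request that can never succeed. As an illustration of the general fix idea (plain JDK concurrency utilities; this is a sketch, not the actual DFSClient code), a timed poll() on the completion service bounds the wait where an unconditional take() could block forever:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HedgedReadSketch {
    // Bounds the wait for a (possibly lone) read attempt. The hazard in the
    // report is an unbounded wait when the only attempt fails; poll() with a
    // timeout instead of take() guarantees the caller eventually returns.
    public static byte[] readWithDeadline(ExecutorService pool,
            Callable<byte[]> read, long timeoutMs) throws Exception {
        CompletionService<byte[]> cs = new ExecutorCompletionService<>(pool);
        cs.submit(read);
        Future<byte[]> f = cs.poll(timeoutMs, TimeUnit.MILLISECONDS);
        if (f == null) {
            throw new TimeoutException("no datanode answered within " + timeoutMs + " ms");
        }
        return f.get();  // rethrows the attempt's failure instead of hanging
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        byte[] data = readWithDeadline(pool, () -> new byte[]{42}, 1000);
        System.out.println("read " + data.length + " byte(s)");
        pool.shutdown();
    }
}
```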
[jira] [Updated] (HDFS-5442) Zero loss HDFS data replication for multiple datacenters
[ https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated HDFS-5442: -- Attachment: Disaster Recovery Solution for Hadoop.pdf Updated the design doc, adding some detailed implementation notes. Zero loss HDFS data replication for multiple datacenters Key: HDFS-5442 URL: https://issues.apache.org/jira/browse/HDFS-5442 Project: Hadoop HDFS Issue Type: Improvement Reporter: Avik Dey Assignee: Dian Fu Attachments: Disaster Recovery Solution for Hadoop.pdf, Disaster Recovery Solution for Hadoop.pdf, Disaster Recovery Solution for Hadoop.pdf Hadoop is architected to operate efficiently at scale for normal hardware failures within a datacenter. Hadoop is not designed today to handle datacenter failures. Although HDFS is not designed for, nor deployed in, configurations spanning multiple datacenters, replicating data from one location to another is common practice for disaster recovery and global service availability. There are current solutions available for batch replication using data copy/export tools. However, while providing some backup capability for HDFS data, they do not provide the capability to recover all your HDFS data from a datacenter failure and be up and running again with a fully operational Hadoop cluster in another datacenter in a matter of minutes. For disaster recovery from a datacenter failure, we should provide a fully distributed, zero data loss, low latency, high throughput and secure HDFS data replication solution for multiple datacenter setups. Design and code for Phase-1 to follow soon.
[jira] [Commented] (HDFS-6465) Enable the configuration of multiple clusters
[ https://issues.apache.org/jira/browse/HDFS-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021786#comment-14021786 ] Dian Fu commented on HDFS-6465: --- Updating some design details about configuration.
Requirements:
1. Existing deployments must be able to use the existing configuration without any change.
2. The configurations for different clusters should be the same as far as possible; the special configuration required per cluster should be minimal.
Configurations added:
• DFS_REGION_ID (dfs.region.id): the region id of the current cluster
• DFS_REGIONS (dfs.regions): the region ids of all clusters, including both the primary cluster and mirror clusters
• DFS_REGION_PRIMARY (dfs.region.primary): the region id of the primary cluster
Configurations that must be suffixed with regionId: DFS_NAMENODE_RPC_ADDRESS_KEY, DFS_NAMENODE_SERVICE_RPC_ADDRESS_KEY, DFS_NAMENODE_HTTP_ADDRESS_KEY, DFS_NAMENODE_HTTPS_ADDRESS_KEY, DFS_NAMENODE_SECONDARY_HTTP_ADDRESS_KEY and DFS_NAMENODE_BACKUP_ADDRESS_KEY
Configurations could be suffixed with regionId or not.
These include all the configurations in NameNode.NAMENODE_SPECIFIC_KEYS and NameNode.NAMESERVICE_SPECIFIC_KEYS except the above configurations which must be suffixed with regionId: DFS_NAMENODE_RPC_BIND_HOST_KEY, DFS_NAMENODE_NAME_DIR_KEY, DFS_NAMENODE_EDITS_DIR_KEY, DFS_NAMENODE_SHARED_EDITS_DIR_KEY, DFS_NAMENODE_CHECKPOINT_DIR_KEY, DFS_NAMENODE_CHECKPOINT_EDITS_DIR_KEY, DFS_NAMENODE_SERVICE_RPC_BIND_HOST_KEY, DFS_NAMENODE_HTTP_BIND_HOST_KEY, DFS_NAMENODE_HTTPS_BIND_HOST_KEY, DFS_NAMENODE_KEYTAB_FILE_KEY, DFS_NAMENODE_SECONDARY_HTTPS_ADDRESS_KEY, DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY, DFS_NAMENODE_BACKUP_HTTP_ADDRESS_KEY, DFS_NAMENODE_BACKUP_SERVICE_RPC_ADDRESS_KEY, DFS_NAMENODE_KERBEROS_PRINCIPAL_KEY, DFS_NAMENODE_KERBEROS_INTERNAL_SPNEGO_PRINCIPAL_KEY, DFS_HA_FENCE_METHODS_KEY, DFS_HA_ZKFC_PORT_KEY and DFS_HA_AUTO_FAILOVER_ENABLED_KEY
These configurations can be configured in the following format to distinguish between clusters: configuration key.nameservice id.namenode id.region id
If a configuration with the region id as suffix cannot be found, the configuration without the region id suffix will be used instead. All other configurations not mentioned here should not be suffixed with regionId.
Enable the configuration of multiple clusters - Key: HDFS-6465 URL: https://issues.apache.org/jira/browse/HDFS-6465 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Dian Fu Assignee: Dian Fu Attachments: HDFS-6465.1.patch, HDFS-6465.2.patch, HDFS-6465.patch
Tracks the changes required for configuring DR.
Configurations added:
• DFS_REGION_ID (dfs.region.id): the region id of the current cluster
• DFS_REGIONS (dfs.regions): the region ids of all clusters, including both the primary cluster and mirror clusters
• DFS_REGION_PRIMARY (dfs.region.primary): the region id of the primary cluster
Configurations modified:
The configurations in NameNode.NAMENODE_SPECIFIC_KEYS can be configured in the following format to distinguish between clusters: configuration key.nameservice id.namenode id.region id
The configurations in NameNode.NAMESERVICE_SPECIFIC_KEYS can be configured in the following format to distinguish between clusters: configuration key.nameservice id.region id
If a configuration with the region id as suffix cannot be found, the configuration without the region id suffix will be used instead.
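The suffix-with-fallback lookup described above can be sketched with a toy resolver over a plain map (the class and method names are illustrative, not from the patch; the key names follow the usual Hadoop format):

```java
import java.util.HashMap;
import java.util.Map;

public class RegionConfSketch {
    // Try "key.ns.nn.regionId" first, then fall back to the suffix-free
    // "key.ns.nn", mirroring the fallback rule in the design comment. The
    // fallback is what lets existing deployments keep working unchanged.
    public static String get(Map<String, String> conf, String key,
            String ns, String nn, String regionId) {
        String base = key + "." + ns + "." + nn;
        String withRegion = base + "." + regionId;
        if (conf.containsKey(withRegion)) {
            return conf.get(withRegion);
        }
        return conf.get(base);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("dfs.namenode.rpc-address.ns1.nn1", "nn1.dc1:8020");
        conf.put("dfs.namenode.rpc-address.ns1.nn1.region2", "nn1.dc2:8020");
        // region1 has no suffixed entry, so it falls back to the base key;
        // region2 gets its region-specific value.
        System.out.println(get(conf, "dfs.namenode.rpc-address", "ns1", "nn1", "region1"));
        System.out.println(get(conf, "dfs.namenode.rpc-address", "ns1", "nn1", "region2"));
    }
}
```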
[jira] [Updated] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zesheng Wu updated HDFS-6382: - Attachment: HDFS-TTL-Design.pdf An initial version of the design doc. HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-TTL-Design.pdf In production environments we often have a scenario like this: we want to back up files on HDFS for some time and then have them deleted automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's logs in order to debug program bugs, so we keep all the logs on HDFS and delete logs older than 1 month. This is a typical scenario for an HDFS TTL. So here we propose that HDFS support TTL. Some details of this proposal:
1. HDFS can support a TTL on a specified file or directory
2. If a TTL is set on a file, the file will be deleted automatically after the TTL expires
3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL expires
4. A child file/directory's TTL configuration should override its parent directory's
5. A global configuration is needed to control whether deleted files/directories should go to the trash
6. A global configuration is needed to control whether a directory with a TTL should itself be deleted when it is emptied by the TTL mechanism
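The override rule in the proposal (a child's TTL takes precedence over its parent's) can be illustrated with a toy resolver that walks from a path up to the root and returns the nearest TTL. The class name, the day-based unit, and the -1 "never expire" convention are assumptions of this sketch, not part of the design doc:

```java
import java.util.HashMap;
import java.util.Map;

public class TtlSketch {
    // Walks from an absolute path up to the root and returns the nearest
    // TTL (in days), so a TTL on a child overrides one on any ancestor.
    // Returns -1 when no TTL is set anywhere on the path ("never expire").
    public static long effectiveTtl(Map<String, Long> ttls, String path) {
        for (String p = path; ; ) {
            Long ttl = ttls.get(p);
            if (ttl != null) {
                return ttl;
            }
            if (p.equals("/")) {
                return -1;  // reached the root without finding a TTL
            }
            int slash = p.lastIndexOf('/');
            p = (slash == 0) ? "/" : p.substring(0, slash);
        }
    }

    public static void main(String[] args) {
        Map<String, Long> ttls = new HashMap<>();
        ttls.put("/logs", 30L);      // keep most logs for a month
        ttls.put("/logs/app", 7L);   // but this app's logs only a week
        System.out.println(effectiveTtl(ttls, "/logs/app/x.log"));
    }
}
```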
[jira] [Created] (HDFS-6503) Fix typo of DFSAdmin restoreFailedStorage
Zesheng Wu created HDFS-6503: Summary: Fix typo of DFSAdmin restoreFailedStorage Key: HDFS-6503 URL: https://issues.apache.org/jira/browse/HDFS-6503 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Priority: Minor Fix typo: restoreFaileStorage should be restoreFailedStorage
[jira] [Updated] (HDFS-6503) Fix typo of DFSAdmin restoreFailedStorage
[ https://issues.apache.org/jira/browse/HDFS-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zesheng Wu updated HDFS-6503: - Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-6503) Fix typo of DFSAdmin restoreFailedStorage
[ https://issues.apache.org/jira/browse/HDFS-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zesheng Wu updated HDFS-6503: - Attachment: HDFS-6503.patch
[jira] [Updated] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs
[ https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6481: - Assignee: Ted Yu DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs --- Key: HDFS-6481 URL: https://issues.apache.org/jira/browse/HDFS-6481 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Ted Yu Assignee: Ted Yu Attachments: hdfs-6481-v1.txt Ian Brooks reported the following stack trace:
{code}
2014-06-03 13:05:03,915 WARN [DataStreamer for file /user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200 block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException): 0
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
    at org.apache.hadoop.ipc.Client.call(Client.java:1347)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266)
    at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1031)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:823)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:475)
2014-06-03 13:05:48,489 ERROR [RpcServer.handler=22,port=16020] wal.FSHLog: syncer encountered error, will retry. txid=211
org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException): 0
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779)
    at
{code}
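A defensive check of the kind the summary asks for might look like the following sketch (class and method names are illustrative, not from the eventual patch). Note that such a check only turns the bare ArrayIndexOutOfBoundsException into a clear error message; the underlying cause of the short storageIDs array still needs to be found:

```java
public class StorageIdCheck {
    // Hypothetical sanity check for the scenario in the stack trace: the
    // ArrayIndexOutOfBoundsException ("0") suggests the storageIDs array
    // can be shorter than the datanode list, so verify lengths up front.
    public static void checkLengths(String[] datanodeIDs, String[] storageIDs) {
        if (storageIDs.length < datanodeIDs.length) {
            throw new IllegalArgumentException("expected at least "
                + datanodeIDs.length + " storage IDs, got " + storageIDs.length);
        }
    }

    public static void main(String[] args) {
        checkLengths(new String[]{"dn1", "dn2"}, new String[]{"s1", "s2"});
        System.out.println("lengths consistent");
    }
}
```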
[jira] [Updated] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs
[ https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6481: - Target Version/s: 2.5.0
[jira] [Commented] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs
[ https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025216#comment-14025216 ] Kihwal Lee commented on HDFS-6481: -- We can add sanity checks, but this should not happen unless we have a bug somewhere. The root cause needs to be addressed.
[jira] [Commented] (HDFS-6503) Fix typo of DFSAdmin restoreFailedStorage
[ https://issues.apache.org/jira/browse/HDFS-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025281#comment-14025281 ] Hadoop QA commented on HDFS-6503: -
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12649235/HDFS-6503.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7060//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7060//console
This message is automatically generated.
[jira] [Commented] (HDFS-6503) Fix typo of DFSAdmin restoreFailedStorage
[ https://issues.apache.org/jira/browse/HDFS-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025285#comment-14025285 ] Zesheng Wu commented on HDFS-6503: -- It is just a typo fix, so no new tests are needed.
[jira] [Commented] (HDFS-6403) Add metrics for log warnings reported by HADOOP-9618
[ https://issues.apache.org/jira/browse/HDFS-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025326#comment-14025326 ] Yongjun Zhang commented on HDFS-6403: - Hi [~tlipcon], as we chatted earlier, I'd appreciate it if you could help review the patch. Thanks. Add metrics for log warnings reported by HADOOP-9618 Key: HDFS-6403 URL: https://issues.apache.org/jira/browse/HDFS-6403 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6403.001.patch, HDFS-6403.002.patch HADOOP-9618 logs warnings when there are long GC pauses. If this is exposed as a metric, then they can be monitored.
[jira] [Commented] (HDFS-6159) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block missing after balancer success
[ https://issues.apache.org/jira/browse/HDFS-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025373#comment-14025373 ] Arpit Agarwal commented on HDFS-6159: - Hi [~djp], this looks unrelated to HDFS-6362 from a quick look. I also took a quick look at HDFS-6424 and it appears unrelated. Please feel free to file a separate Jira for the test failure and attach the logs/analysis. TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block missing after balancer success -- Key: HDFS-6159 URL: https://issues.apache.org/jira/browse/HDFS-6159 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.3.0 Reporter: Chen He Assignee: Chen He Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6159-v2.patch, HDFS-6159-v2.patch, HDFS-6159.patch, logs.txt TestBalancerWithNodeGroup.testBalancerWithNodeGroup will report a false failure if one or more data blocks are lost after the balancer finishes successfully.
[jira] [Commented] (HDFS-2006) ability to support storing extended attributes per file
[ https://issues.apache.org/jira/browse/HDFS-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025391#comment-14025391 ] Uma Maheswara Rao G commented on HDFS-2006: --- XAttr support for DistCp (MAPREDUCE-5898) is now committed to trunk. So, I plan to merge this to branch-2. Do we need a separate vote for this? What do you say [~cnauroth] and [~andrew.wang] ? ability to support storing extended attributes per file --- Key: HDFS-2006 URL: https://issues.apache.org/jira/browse/HDFS-2006 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: HDFS XAttrs (HDFS-2006) Reporter: dhruba borthakur Assignee: Yi Liu Fix For: 3.0.0 Attachments: ExtendedAttributes.html, HDFS-2006-Merge-1.patch, HDFS-2006-Merge-2.patch, HDFS-XAttrs-Design-1.pdf, HDFS-XAttrs-Design-2.pdf, HDFS-XAttrs-Design-3.pdf, Test-Plan-for-Extended-Attributes-1.pdf, xattrs.1.patch, xattrs.patch It would be nice if HDFS provides a feature to store extended attributes for files, similar to the one described here: http://en.wikipedia.org/wiki/Extended_file_attributes. The challenge is that it has to be done in such a way that a site not using this feature does not waste precious memory resources in the namenode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6503) Fix typo of DFSAdmin restoreFailedStorage
[ https://issues.apache.org/jira/browse/HDFS-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025418#comment-14025418 ] Akira AJISAKA commented on HDFS-6503: - +1 (non-binding). Fix typo of DFSAdmin restoreFailedStorage - Key: HDFS-6503 URL: https://issues.apache.org/jira/browse/HDFS-6503 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Priority: Minor Attachments: HDFS-6503.patch Fix typo: restoreFaileStorage should be restoreFailedStorage -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6257) TestCacheDirectives#testExceedsCapacity fails occasionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025438#comment-14025438 ] Andrew Wang commented on HDFS-6257: --- Idea looks good; the current check definitely seems racy. Only question: maybe we should try to check more deterministically, e.g. pause DN cache reports and wait for a few refresh intervals (1s each) before doing the check. TestCacheDirectives#testExceedsCapacity fails occasionally in trunk --- Key: HDFS-6257 URL: https://issues.apache.org/jira/browse/HDFS-6257 Project: Hadoop HDFS Issue Type: Test Components: caching Affects Versions: 2.4.0 Reporter: Ted Yu Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-6257.001.patch From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1736/ : REGRESSION: org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity {code} Error Message: Namenode should not send extra CACHE commands expected:0 but was:2 Stack Trace: java.lang.AssertionError: Namenode should not send extra CACHE commands expected:0 but was:2 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1419) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-2006) ability to support storing extended attributes per file
[ https://issues.apache.org/jira/browse/HDFS-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025453#comment-14025453 ] Andrew Wang commented on HDFS-2006: --- Hey Uma, we haven't done a vote for previous branch-2 merges (e.g. caching, ACLs). If you post a patch or a link to a branch, I'd be happy to review. Unless you already plan to do something similar, I can also do a full branch-2 test run on our internal jenkins. ability to support storing extended attributes per file --- Key: HDFS-2006 URL: https://issues.apache.org/jira/browse/HDFS-2006 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: HDFS XAttrs (HDFS-2006) Reporter: dhruba borthakur Assignee: Yi Liu Fix For: 3.0.0 Attachments: ExtendedAttributes.html, HDFS-2006-Merge-1.patch, HDFS-2006-Merge-2.patch, HDFS-XAttrs-Design-1.pdf, HDFS-XAttrs-Design-2.pdf, HDFS-XAttrs-Design-3.pdf, Test-Plan-for-Extended-Attributes-1.pdf, xattrs.1.patch, xattrs.patch It would be nice if HDFS provides a feature to store extended attributes for files, similar to the one described here: http://en.wikipedia.org/wiki/Extended_file_attributes. The challenge is that it has to be done in such a way that a site not using this feature does not waste precious memory resources in the namenode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-2856) Fix block protocol so that Datanodes don't require root or jsvc
[ https://issues.apache.org/jira/browse/HDFS-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025455#comment-14025455 ] Daryn Sharp commented on HDFS-2856: --- Chris asked that I take a look, so I'll try to review this week. Fix block protocol so that Datanodes don't require root or jsvc --- Key: HDFS-2856 URL: https://issues.apache.org/jira/browse/HDFS-2856 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, security Affects Versions: 3.0.0, 2.4.0 Reporter: Owen O'Malley Assignee: Chris Nauroth Attachments: Datanode-Security-Design.pdf, Datanode-Security-Design.pdf, Datanode-Security-Design.pdf, HDFS-2856.1.patch, HDFS-2856.prototype.patch Since we send the block tokens unencrypted to the datanode, we currently start the datanode as root using jsvc and get a secure (< 1024) port. If we have the datanode generate a nonce and send it on the connection, and the client sends an HMAC of the nonce back instead of the block token, it won't reveal any secrets. Thus, we wouldn't require a secure port and would not require root or jsvc. -- This message was sent by Atlassian JIRA (v6.2#6252)
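The challenge-response idea in the HDFS-2856 description can be sketched as follows. This is a toy illustration of the general technique (datanode sends a nonce, client answers with an HMAC keyed by a shared secret), not the actual protocol or wire format; all names here are hypothetical.

```java
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Toy sketch of a nonce/HMAC handshake; names and wire format are hypothetical. */
public class NonceHandshake {
    /** Datanode side: generate a random nonce to send to the client. */
    public static byte[] newNonce() {
        byte[] nonce = new byte[16];
        new SecureRandom().nextBytes(nonce);
        return nonce;
    }

    /** Both sides: HMAC the nonce with the shared block-token secret. */
    public static byte[] hmac(byte[] sharedSecret, byte[] nonce) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(sharedSecret, "HmacSHA256"));
        return mac.doFinal(nonce);
    }

    /** Datanode side: accept the connection iff the client's answer matches. */
    public static boolean verify(byte[] sharedSecret, byte[] nonce, byte[] answer)
            throws Exception {
        // Constant-time comparison to avoid leaking information via timing.
        return MessageDigest.isEqual(hmac(sharedSecret, nonce), answer);
    }
}
```

Since only the HMAC of a fresh nonce crosses the wire, the secret itself is never revealed, which is what removes the need for a privileged (< 1024) port.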
[jira] [Commented] (HDFS-6460) To ignore stale/decommissioned nodes in NetworkTopology#pseudoSortByDistance
[ https://issues.apache.org/jira/browse/HDFS-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025472#comment-14025472 ] Andrew Wang commented on HDFS-6460: --- Hey Yongjun, thanks for working on this. Just one review comment: two of the DNs have the same IP of 11.11.11.11. Otherwise +1 pending Jenkins. To ignore stale/decommissioned nodes in NetworkTopology#pseudoSortByDistance Key: HDFS-6460 URL: https://issues.apache.org/jira/browse/HDFS-6460 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Minor Attachments: HDFS-6460.001.patch Per discussion in HDFS-6268, filing this jira as a follow-up, so that we can improve the sorting result and save a bit of runtime. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6257) TestCacheDirectives#testExceedsCapacity fails occasionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025495#comment-14025495 ] Colin Patrick McCabe commented on HDFS-6257: The current check should always succeed if the code being tested is correct, so it's not racy in that sense. We could wait for more DN cache reports, but since the DNs are full they shouldn't change. Since we test the cache reports elsewhere, I think it's probably fine as-is, what do you think? TestCacheDirectives#testExceedsCapacity fails occasionally in trunk --- Key: HDFS-6257 URL: https://issues.apache.org/jira/browse/HDFS-6257 Project: Hadoop HDFS Issue Type: Test Components: caching Affects Versions: 2.4.0 Reporter: Ted Yu Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-6257.001.patch From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1736/ : REGRESSION: org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity {code} Error Message: Namenode should not send extra CACHE commands expected:0 but was:2 Stack Trace: java.lang.AssertionError: Namenode should not send extra CACHE commands expected:0 but was:2 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1419) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
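The "wait for a few refresh intervals" idea discussed in this thread amounts to asserting that a condition stays true over a window of time, not just at one instant. A generic helper for that might look like the sketch below; it is not Hadoop test code, and `assertStableFor` is a hypothetical name.

```java
import java.util.function.BooleanSupplier;

/** Sketch: assert a condition holds at every check across several intervals. */
public class StabilityCheck {
    public static void assertStableFor(BooleanSupplier condition,
                                       long intervalMs, int intervals)
            throws InterruptedException {
        for (int i = 0; i < intervals; i++) {
            if (!condition.getAsBoolean()) {
                throw new AssertionError("condition failed at interval " + i);
            }
            Thread.sleep(intervalMs);
        }
    }
}
```

A test could then write something like `assertStableFor(() -> extraCacheCommands() == 0, 1000, 3)` (where `extraCacheCommands` is a hypothetical accessor) to check that no extra CACHE commands appear across a few 1s refresh intervals.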
[jira] [Commented] (HDFS-6315) Decouple recording edit logs from FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025499#comment-14025499 ] Jing Zhao commented on HDFS-6315: - The patch looks good to me in general. Some comments: # After moving persistBlocks/persistNewBlocks/closeFile from FSDirectory to FSNamesystem, we may no longer need to add DIR* FSDirectory into the log information. # Looks like FSNamesystem#persistBlocks(INodeFile, boolean) can be removed. We can just call persistBlocks(String, INodeFile, boolean) instead. # In FSNamesystem#setQuota, logSync cannot be called inside of the write lock: {code} + INodeDirectory changed = dir.setQuota(path, nsQuota, dsQuota); + if (changed != null) { +final Quota.Counts q = changed.getQuotaCounts(); +getEditLog().logSetQuota(path, +q.get(Quota.NAMESPACE), q.get(Quota.DISKSPACE)); +getEditLog().logSync(); + } } finally { writeUnlock(); } -getEditLog().logSync(); {code} # A typo in the java comment: {code} - // if src indicates a snapshot file, we need to make sure the returned + // if src inSicates a snapshot file, we need to make sure the returned {code} Decouple recording edit logs from FSDirectory - Key: HDFS-6315 URL: https://issues.apache.org/jira/browse/HDFS-6315 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6315.000.patch, HDFS-6315.001.patch, HDFS-6315.002.patch, HDFS-6315.003.patch, HDFS-6315.004.patch Currently both FSNamesystem and FSDirectory record edit logs. This design requires both FSNamesystem and FSDirectory to be tightly coupled together to implement a durable namespace. This jira proposes to separate the responsibility of implementing the namespace and providing durability with edit logs. Specifically, FSDirectory implements the namespace (which should have no edit log operations), and FSNamesystem implements durability by recording the edit logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
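Jing's point about logSync in setQuota follows a common namenode pattern: record the edit while holding the write lock (cheap, in-memory), but flush it with logSync only after releasing the lock, so a slow disk sync never blocks other namespace operations. A minimal sketch of that ordering, using stand-in objects rather than the real Hadoop classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch of the "log the edit under the lock, sync after unlock" pattern. */
public class EditLogPattern {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    final List<String> events = new ArrayList<>();  // records call order for illustration

    public void setQuota() {
        boolean changed;
        lock.writeLock().lock();
        try {
            changed = true;              // mutate the namespace; note whether anything changed
            events.add("logSetQuota");   // append the edit (in-memory, cheap)
        } finally {
            lock.writeLock().unlock();
            events.add("unlock");
        }
        if (changed) {
            events.add("logSync");       // durable flush happens outside the lock
        }
    }
}
```

The quoted diff does the opposite (logSync before writeUnlock), which is why the review flags it.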
[jira] [Updated] (HDFS-6379) HTTPFS - Implement ACLs support
[ https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Yoder updated HDFS-6379: - Attachment: jira-HDFS-6379.patch HTTPFS - Implement ACLs support --- Key: HDFS-6379 URL: https://issues.apache.org/jira/browse/HDFS-6379 Project: Hadoop HDFS Issue Type: Bug Reporter: Alejandro Abdelnur Assignee: Mike Yoder Fix For: 2.4.0 Attachments: jira-HDFS-6379.patch, jira-HDFS-6379.patch HDFS-4685 added ACLs support to WebHDFS but missed adding them to HttpFS. This JIRA is for such. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6379) HTTPFS - Implement ACLs support
[ https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Yoder updated HDFS-6379: - Status: In Progress (was: Patch Available) HTTPFS - Implement ACLs support --- Key: HDFS-6379 URL: https://issues.apache.org/jira/browse/HDFS-6379 Project: Hadoop HDFS Issue Type: Bug Reporter: Alejandro Abdelnur Assignee: Mike Yoder Fix For: 2.4.0 Attachments: jira-HDFS-6379.patch HDFS-4685 added ACLs support to WebHDFS but missed adding them to HttpFS. This JIRA is for such. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6379) HTTPFS - Implement ACLs support
[ https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Yoder updated HDFS-6379: - Attachment: (was: jira-HDFS-6379.patch) HTTPFS - Implement ACLs support --- Key: HDFS-6379 URL: https://issues.apache.org/jira/browse/HDFS-6379 Project: Hadoop HDFS Issue Type: Bug Reporter: Alejandro Abdelnur Assignee: Mike Yoder Fix For: 2.4.0 HDFS-4685 added ACLs support to WebHDFS but missed adding them to HttpFS. This JIRA is for such. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6379) HTTPFS - Implement ACLs support
[ https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Yoder updated HDFS-6379: - Attachment: (was: jira-HDFS-6379.patch) HTTPFS - Implement ACLs support --- Key: HDFS-6379 URL: https://issues.apache.org/jira/browse/HDFS-6379 Project: Hadoop HDFS Issue Type: Bug Reporter: Alejandro Abdelnur Assignee: Mike Yoder Fix For: 2.4.0 HDFS-4685 added ACLs support to WebHDFS but missed adding them to HttpFS. This JIRA is for such. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6257) TestCacheDirectives#testExceedsCapacity fails occasionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025502#comment-14025502 ] Andrew Wang commented on HDFS-6257: --- Hmm, I guess good enough. +1 thanks colin. TestCacheDirectives#testExceedsCapacity fails occasionally in trunk --- Key: HDFS-6257 URL: https://issues.apache.org/jira/browse/HDFS-6257 Project: Hadoop HDFS Issue Type: Test Components: caching Affects Versions: 2.4.0 Reporter: Ted Yu Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-6257.001.patch From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1736/ : REGRESSION: org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity {code} Error Message: Namenode should not send extra CACHE commands expected:0 but was:2 Stack Trace: java.lang.AssertionError: Namenode should not send extra CACHE commands expected:0 but was:2 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1419) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6379) HTTPFS - Implement ACLs support
[ https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Yoder updated HDFS-6379: - Attachment: jira-HDFS-6379.patch HTTPFS - Implement ACLs support --- Key: HDFS-6379 URL: https://issues.apache.org/jira/browse/HDFS-6379 Project: Hadoop HDFS Issue Type: Bug Reporter: Alejandro Abdelnur Assignee: Mike Yoder Fix For: 2.4.0 HDFS-4685 added ACLs support to WebHDFS but missed adding them to HttpFS. This JIRA is for such. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6379) HTTPFS - Implement ACLs support
[ https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Yoder updated HDFS-6379: - Attachment: (was: jira-HDFS-6379.patch) HTTPFS - Implement ACLs support --- Key: HDFS-6379 URL: https://issues.apache.org/jira/browse/HDFS-6379 Project: Hadoop HDFS Issue Type: Bug Reporter: Alejandro Abdelnur Assignee: Mike Yoder Fix For: 2.4.0 HDFS-4685 added ACLs support to WebHDFS but missed adding them to HttpFS. This JIRA is for such. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6379) HTTPFS - Implement ACLs support
[ https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Yoder updated HDFS-6379: - Status: Patch Available (was: In Progress) HTTPFS - Implement ACLs support --- Key: HDFS-6379 URL: https://issues.apache.org/jira/browse/HDFS-6379 Project: Hadoop HDFS Issue Type: Bug Reporter: Alejandro Abdelnur Assignee: Mike Yoder Fix For: 2.4.0 Attachments: jira-HDFS-6379.patch HDFS-4685 added ACLs support to WebHDFS but missed adding them to HttpFS. This JIRA is for such. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6257) TestCacheDirectives#testExceedsCapacity fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-6257: --- Summary: TestCacheDirectives#testExceedsCapacity fails occasionally (was: TestCacheDirectives#testExceedsCapacity fails occasionally in trunk) TestCacheDirectives#testExceedsCapacity fails occasionally -- Key: HDFS-6257 URL: https://issues.apache.org/jira/browse/HDFS-6257 Project: Hadoop HDFS Issue Type: Test Components: caching Affects Versions: 2.4.0 Reporter: Ted Yu Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-6257.001.patch From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1736/ : REGRESSION: org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity {code} Error Message: Namenode should not send extra CACHE commands expected:0 but was:2 Stack Trace: java.lang.AssertionError: Namenode should not send extra CACHE commands expected:0 but was:2 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1419) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6379) HTTPFS - Implement ACLs support
[ https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Yoder updated HDFS-6379: - Attachment: jira-HDFS-6379.patch HTTPFS - Implement ACLs support --- Key: HDFS-6379 URL: https://issues.apache.org/jira/browse/HDFS-6379 Project: Hadoop HDFS Issue Type: Bug Reporter: Alejandro Abdelnur Assignee: Mike Yoder Fix For: 2.4.0 Attachments: jira-HDFS-6379.patch HDFS-4685 added ACLs support to WebHDFS but missed adding them to HttpFS. This JIRA is for such. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6493) Propose to change dfs.namenode.startup.delay.block.deletion to second instead of millisecond
[ https://issues.apache.org/jira/browse/HDFS-6493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025504#comment-14025504 ] Andrew Wang commented on HDFS-6493: --- I think we should keep the default value as disabled, but the property and this value should be documented in hdfs-default.xml. It'd also be nice (if not true already) to pretty-print this value on NN startup, e.g. 30 minutes rather than 1800 seconds. It'd actually be nice follow-on work to look for similarly unfriendly values in the logs and pretty-print them. There are some time-related functions in DFSUtil (e.g. durationToString, dateToIso8601String), but feel free to write your own functions too. Propose to change dfs.namenode.startup.delay.block.deletion to second instead of millisecond -- Key: HDFS-6493 URL: https://issues.apache.org/jira/browse/HDFS-6493 Project: Hadoop HDFS Issue Type: Bug Reporter: Juan Yu Assignee: Juan Yu Priority: Trivial Based on the discussion in https://issues.apache.org/jira/browse/HDFS-6186, the delay will be at least 30 minutes or even hours. It's not very user friendly to use milliseconds when it's likely measured in hours. I suggest making the following changes: 1. change the unit of this config to second 2. rename the config key from dfs.namenode.startup.delay.block.deletion.ms to dfs.namenode.startup.delay.block.deletion.sec 3. add the default value to hdfs-default.xml; what's a reasonable value, 30 minutes, one hour? -- This message was sent by Atlassian JIRA (v6.2#6252)
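The pretty-printing Andrew describes (e.g. "30 minutes" rather than "1800 seconds") can be done with a small helper like the sketch below; `prettyDuration` is a hypothetical name, not the DFSUtil API.

```java
/** Sketch: render a duration in seconds as a human-friendly string. */
public class DurationFormat {
    public static String prettyDuration(long seconds) {
        // Prefer the largest unit that divides the duration evenly.
        if (seconds != 0 && seconds % 3600 == 0) {
            long h = seconds / 3600;
            return h + (h == 1 ? " hour" : " hours");
        }
        if (seconds != 0 && seconds % 60 == 0) {
            long m = seconds / 60;
            return m + (m == 1 ? " minute" : " minutes");
        }
        return seconds + (seconds == 1 ? " second" : " seconds");
    }
}
```

On startup the NN log line could then read "block deletion delayed for 30 minutes" instead of echoing the raw config value.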
[jira] [Updated] (HDFS-6257) TestCacheDirectives#testExceedsCapacity fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-6257: --- Resolution: Fixed Fix Version/s: 2.5.0 Status: Resolved (was: Patch Available) committed, thanks TestCacheDirectives#testExceedsCapacity fails occasionally -- Key: HDFS-6257 URL: https://issues.apache.org/jira/browse/HDFS-6257 Project: Hadoop HDFS Issue Type: Test Components: caching Affects Versions: 2.4.0 Reporter: Ted Yu Assignee: Colin Patrick McCabe Priority: Minor Fix For: 2.5.0 Attachments: HDFS-6257.001.patch From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1736/ : REGRESSION: org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity {code} Error Message: Namenode should not send extra CACHE commands expected:0 but was:2 Stack Trace: java.lang.AssertionError: Namenode should not send extra CACHE commands expected:0 but was:2 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1419) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6399) FSNamesystem ACL operations should check isPermissionEnabled
[ https://issues.apache.org/jira/browse/HDFS-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang reassigned HDFS-6399: - Assignee: Chris Nauroth (was: Charles Lamb) FSNamesystem ACL operations should check isPermissionEnabled Key: HDFS-6399 URL: https://issues.apache.org/jira/browse/HDFS-6399 Project: Hadoop HDFS Issue Type: Bug Components: documentation, namenode Affects Versions: 2.4.0 Reporter: Charles Lamb Assignee: Chris Nauroth Priority: Minor Attachments: HDFS-6399.1.patch, HDFS-6399.2.patch The ACL operations in FSNamesystem don't currently check isPermissionEnabled before calling checkOwner(). This patch corrects that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6330) Move mkdirs() to FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6330: Summary: Move mkdirs() to FSNamesystem (was: Move mkdir() to FSNamesystem) Move mkdirs() to FSNamesystem - Key: HDFS-6330 URL: https://issues.apache.org/jira/browse/HDFS-6330 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6330.000.patch, HDFS-6330.001.patch Currently mkdir() automatically creates all ancestors for a directory. This is implemented in FSDirectory, by calling unprotectedMkdir() along the path. This jira proposes to move the function to FSNamesystem to simplify the primitive that FSDirectory needs to provide. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6315) Decouple recording edit logs from FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6315: - Attachment: HDFS-6315.005.patch Decouple recording edit logs from FSDirectory - Key: HDFS-6315 URL: https://issues.apache.org/jira/browse/HDFS-6315 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6315.000.patch, HDFS-6315.001.patch, HDFS-6315.002.patch, HDFS-6315.003.patch, HDFS-6315.004.patch Currently both FSNamesystem and FSDirectory record edit logs. This design requires both FSNamesystem and FSDirectory to be tightly coupled together to implement a durable namespace. This jira proposes to separate the responsibility of implementing the namespace and providing durability with edit logs. Specifically, FSDirectory implements the namespace (which should have no edit log operations), and FSNamesystem implement durability by recording the edit logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6478) RemoteException can't be retried properly for non-HA scenario
[ https://issues.apache.org/jira/browse/HDFS-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-6478: -- Attachment: HDFS-6478.patch The patch has the following: 1. Modify the proxy chain order for NamenodeProtocol and ClientProtocol so that NamenodeProtocolTranslatorPB/ClientNamenodeProtocolTranslatorPB directly call NamenodeProtocolPB and ClientNamenodeProtocolPB for the non-HA case. 2. Update unit test TestFileCreation to verify retry count. This depends on HADOOP-10673, thus the patch also includes HADOOP-10673 so that it can be submitted to run unit tests. 3. Simplify the remoteException policy setup in NameNodeProxies. 4. Remove unnecessary retry policy for method create in DatanodeProtocolClientSideTranslatorPB. 5. DatanodeProtocolClientSideTranslatorPB still has the old proxy order. Leave it as it is, given DataNodeProtocol doesn't do retry. We can open a separate jira for DataNodeProtocol retry if that is necessary. RemoteException can't be retried properly for non-HA scenario - Key: HDFS-6478 URL: https://issues.apache.org/jira/browse/HDFS-6478 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-6478.patch For the HA case, the call stack is DFSClient - RetryInvocationHandler - ClientNamenodeProtocolTranslatorPB - ProtobufRpcEngine. ProtobufRpcEngine throws ServiceException and expects the caller to unwrap it; ClientNamenodeProtocolTranslatorPB is the component that takes care of that. 
{noformat} at org.apache.hadoop.ipc.Client.call at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke at com.sun.proxy.$Proxy26.getFileInfo at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo at sun.reflect.GeneratedMethodAccessor24.invoke at sun.reflect.DelegatingMethodAccessorImpl.invoke at java.lang.reflect.Method.invoke at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke at com.sun.proxy.$Proxy27.getFileInfo at org.apache.hadoop.hdfs.DFSClient.getFileInfo at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus {noformat} However, for the non-HA case, the call stack is DFSClient - ClientNamenodeProtocolTranslatorPB - RetryInvocationHandler - ProtobufRpcEngine. RetryInvocationHandler gets ServiceException and can't retry properly. {noformat} at org.apache.hadoop.ipc.Client.call at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke at com.sun.proxy.$Proxy9.getListing at sun.reflect.NativeMethodAccessorImpl.invoke0 at sun.reflect.NativeMethodAccessorImpl.invoke at sun.reflect.DelegatingMethodAccessorImpl.invoke at java.lang.reflect.Method.invoke at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke at com.sun.proxy.$Proxy9.getListing at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing at org.apache.hadoop.hdfs.DFSClient.listPaths {noformat} Perhaps, we can fix it by having NN wrap RetryInvocationHandler around ClientNamenodeProtocolTranslatorPB and other PBs, instead of the current wrap order. -- This message was sent by Atlassian JIRA (v6.2#6252)
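The ordering problem in the description can be reduced to a small decorator model: the retry layer only recognizes IOException, so it must sit above the translator that unwraps ServiceException, not below it. This is an illustrative sketch, not the actual NameNodeProxies code; the classes below are hypothetical stand-ins for the real layers.

```java
import java.io.IOException;
import java.net.ConnectException;

public class RetryOrderDemo {
    interface ClientProtocol { String getFileInfo(String path) throws IOException; }

    /** Stands in for ServiceException from the RPC engine (unchecked wrapper). */
    static class ServiceException extends RuntimeException {
        ServiceException(IOException cause) { super(cause); }
    }

    /** RPC layer: fails once with a wrapped ConnectException, then succeeds. */
    static class RpcEngine implements ClientProtocol {
        int calls = 0;
        public String getFileInfo(String path) {
            if (++calls == 1) throw new ServiceException(new ConnectException("NN down"));
            return "status:" + path;
        }
    }

    /** Translator layer: unwraps ServiceException back into IOException. */
    static class Translator implements ClientProtocol {
        final RpcEngine engine;
        Translator(RpcEngine engine) { this.engine = engine; }
        public String getFileInfo(String path) throws IOException {
            try {
                return engine.getFileInfo(path);
            } catch (ServiceException e) {
                throw (IOException) e.getCause();
            }
        }
    }

    /** Retry layer: retries only on IOException, so it must wrap the Translator.
     *  If it wrapped the RpcEngine directly, it would see ServiceException,
     *  which its policy does not match, and the call would fail immediately. */
    static class Retry implements ClientProtocol {
        final ClientProtocol inner;
        Retry(ClientProtocol inner) { this.inner = inner; }
        public String getFileInfo(String path) throws IOException {
            for (int attempt = 0; ; attempt++) {
                try {
                    return inner.getFileInfo(path);
                } catch (IOException e) {
                    if (attempt >= 2) throw e;  // give up after a few tries
                }
            }
        }
    }
}
```

Composing `new Retry(new Translator(engine))` mirrors the HA wiring that works; the non-HA bug corresponds to composing the layers the other way around.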
[jira] [Updated] (HDFS-6478) RemoteException can't be retried properly for non-HA scenario
[ https://issues.apache.org/jira/browse/HDFS-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-6478: -- Status: Patch Available (was: Open) RemoteException can't be retried properly for non-HA scenario - Key: HDFS-6478 URL: https://issues.apache.org/jira/browse/HDFS-6478 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-6478.patch For the HA case, the call stack is DFSClient - RetryInvocationHandler - ClientNamenodeProtocolTranslatorPB - ProtobufRpcEngine. ProtobufRpcEngine throws ServiceException and expects the caller to unwrap it; ClientNamenodeProtocolTranslatorPB is the component that takes care of that. {noformat} at org.apache.hadoop.ipc.Client.call at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke at com.sun.proxy.$Proxy26.getFileInfo at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo at sun.reflect.GeneratedMethodAccessor24.invoke at sun.reflect.DelegatingMethodAccessorImpl.invoke at java.lang.reflect.Method.invoke at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke at com.sun.proxy.$Proxy27.getFileInfo at org.apache.hadoop.hdfs.DFSClient.getFileInfo at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus {noformat} However, for the non-HA case, the call stack is DFSClient - ClientNamenodeProtocolTranslatorPB - RetryInvocationHandler - ProtobufRpcEngine. RetryInvocationHandler gets ServiceException and can't retry properly. 
{noformat} at org.apache.hadoop.ipc.Client.call at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke at com.sun.proxy.$Proxy9.getListing at sun.reflect.NativeMethodAccessorImpl.invoke0 at sun.reflect.NativeMethodAccessorImpl.invoke at sun.reflect.DelegatingMethodAccessorImpl.invoke at java.lang.reflect.Method.invoke at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke at com.sun.proxy.$Proxy9.getListing at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing at org.apache.hadoop.hdfs.DFSClient.listPaths {noformat} Perhaps, we can fix it by having NN wrap RetryInvocationHandler around ClientNamenodeProtocolTranslatorPB and other PBs, instead of the current wrap order. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6399) Add note about setfacl in HDFS permissions guide
[ https://issues.apache.org/jira/browse/HDFS-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6399: -- Summary: Add note about setfacl in HDFS permissions guide (was: FSNamesystem ACL operations should check isPermissionEnabled) Add note about setfacl in HDFS permissions guide Key: HDFS-6399 URL: https://issues.apache.org/jira/browse/HDFS-6399 Project: Hadoop HDFS Issue Type: Bug Components: documentation, namenode Affects Versions: 2.4.0 Reporter: Charles Lamb Assignee: Chris Nauroth Priority: Minor Attachments: HDFS-6399.1.patch, HDFS-6399.2.patch The ACL operations in FSNamesystem don't currently check isPermissionEnabled before calling checkOwner(). This patch corrects that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6315) Decouple recording edit logs from FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6315: - Attachment: (was: HDFS-6315.005.patch) Decouple recording edit logs from FSDirectory - Key: HDFS-6315 URL: https://issues.apache.org/jira/browse/HDFS-6315 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6315.000.patch, HDFS-6315.001.patch, HDFS-6315.002.patch, HDFS-6315.003.patch, HDFS-6315.004.patch Currently both FSNamesystem and FSDirectory record edit logs. This design requires both FSNamesystem and FSDirectory to be tightly coupled together to implement a durable namespace. This jira proposes to separate the responsibility of implementing the namespace and providing durability with edit logs. Specifically, FSDirectory implements the namespace (which should have no edit log operations), and FSNamesystem implement durability by recording the edit logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6399) Add note about setfacl in HDFS permissions guide
[ https://issues.apache.org/jira/browse/HDFS-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6399: -- Resolution: Fixed Fix Version/s: 2.5.0 Status: Resolved (was: Patch Available) Committed to trunk and branch-2, thanks Chris Add note about setfacl in HDFS permissions guide Key: HDFS-6399 URL: https://issues.apache.org/jira/browse/HDFS-6399 Project: Hadoop HDFS Issue Type: Bug Components: documentation, namenode Affects Versions: 2.4.0 Reporter: Charles Lamb Assignee: Chris Nauroth Priority: Minor Fix For: 2.5.0 Attachments: HDFS-6399.1.patch, HDFS-6399.2.patch The ACL operations in FSNamesystem don't currently check isPermissionEnabled before calling checkOwner(). This patch corrects that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6399) FSNamesystem ACL operations should check isPermissionEnabled
[ https://issues.apache.org/jira/browse/HDFS-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025509#comment-14025509 ] Andrew Wang commented on HDFS-6399: --- +1 thanks chris, will commit shortly. FSNamesystem ACL operations should check isPermissionEnabled Key: HDFS-6399 URL: https://issues.apache.org/jira/browse/HDFS-6399 Project: Hadoop HDFS Issue Type: Bug Components: documentation, namenode Affects Versions: 2.4.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-6399.1.patch, HDFS-6399.2.patch The ACL operations in FSNamesystem don't currently check isPermissionEnabled before calling checkOwner(). This patch corrects that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6257) TestCacheDirectives#testExceedsCapacity fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025541#comment-14025541 ] Hudson commented on HDFS-6257: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5668 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5668/]) HDFS-6257. TestCacheDirectives#testExceedsCapacity fails occasionally (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1601473) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java TestCacheDirectives#testExceedsCapacity fails occasionally -- Key: HDFS-6257 URL: https://issues.apache.org/jira/browse/HDFS-6257 Project: Hadoop HDFS Issue Type: Test Components: caching Affects Versions: 2.4.0 Reporter: Ted Yu Assignee: Colin Patrick McCabe Priority: Minor Fix For: 2.5.0 Attachments: HDFS-6257.001.patch From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1736/ : REGRESSION: org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity {code} Error Message: Namenode should not send extra CACHE commands expected:0 but was:2 Stack Trace: java.lang.AssertionError: Namenode should not send extra CACHE commands expected:0 but was:2 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1419) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6399) Add note about setfacl in HDFS permissions guide
[ https://issues.apache.org/jira/browse/HDFS-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025540#comment-14025540 ] Chris Nauroth commented on HDFS-6399: - Andrew, thank you for reviewing and committing. Add note about setfacl in HDFS permissions guide Key: HDFS-6399 URL: https://issues.apache.org/jira/browse/HDFS-6399 Project: Hadoop HDFS Issue Type: Bug Components: documentation, namenode Affects Versions: 2.4.0 Reporter: Charles Lamb Assignee: Chris Nauroth Priority: Minor Fix For: 2.5.0 Attachments: HDFS-6399.1.patch, HDFS-6399.2.patch The ACL operations in FSNamesystem don't currently check isPermissionEnabled before calling checkOwner(). This patch corrects that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6315) Decouple recording edit logs from FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025516#comment-14025516 ] Haohui Mai commented on HDFS-6315: -- Thanks Jing for the review. I've uploaded the v5 patch to address Jing's comments. Decouple recording edit logs from FSDirectory - Key: HDFS-6315 URL: https://issues.apache.org/jira/browse/HDFS-6315 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6315.000.patch, HDFS-6315.001.patch, HDFS-6315.002.patch, HDFS-6315.003.patch, HDFS-6315.004.patch, HDFS-6315.005.patch Currently both FSNamesystem and FSDirectory record edit logs. This design requires both FSNamesystem and FSDirectory to be tightly coupled together to implement a durable namespace. This jira proposes to separate the responsibility of implementing the namespace and providing durability with edit logs. Specifically, FSDirectory implements the namespace (which should have no edit log operations), and FSNamesystem implement durability by recording the edit logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6399) Add note about setfacl in HDFS permissions guide
[ https://issues.apache.org/jira/browse/HDFS-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1402#comment-1402 ] Hudson commented on HDFS-6399: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5669 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5669/]) HDFS-6399. Add note about setfacl in HDFS permissions guide. Contributed by Chris Nauroth. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1601476) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsPermissionsGuide.apt.vm Add note about setfacl in HDFS permissions guide Key: HDFS-6399 URL: https://issues.apache.org/jira/browse/HDFS-6399 Project: Hadoop HDFS Issue Type: Bug Components: documentation, namenode Affects Versions: 2.4.0 Reporter: Charles Lamb Assignee: Chris Nauroth Priority: Minor Fix For: 2.5.0 Attachments: HDFS-6399.1.patch, HDFS-6399.2.patch The ACL operations in FSNamesystem don't currently check isPermissionEnabled before calling checkOwner(). This patch corrects that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025559#comment-14025559 ] Arpit Agarwal commented on HDFS-6482: - {{DFS_DATANODE_NUMBLOCKS_DEFAULT}} is currently 64. I am not sure why the default was set so low. It would be good to know the reason before we change the behavior. It was quite possibly an arbitrary choice. After ~4 million blocks we would start putting more than 256 blocks in each leaf subdirectory. With every 4M blocks, we'd add 256 files to each leaf. I think this is fine since 4 million blocks itself is going to be very unlikely. I recall as late as Vista NTFS directory listings would get noticeably slow with thousands of files per directory. Is there any performance loss with always having three levels of subdirectories, restricting each to 256 children at the most? - Who removes empty subdirectories when blocks are deleted? - Let's avoid suffixing hex numerals to subdir for consistency with the existing naming convention. - StringBuilder looks unnecessary in {{idToBlockDir}}. - We should add a release note stating that {{DFS_DATANODE_NUMBLOCKS_DEFAULT}} is obsolete. The approach looks good and a big +1 for removing LDir. Use block ID-based block layout on datanodes Key: HDFS-6482 URL: https://issues.apache.org/jira/browse/HDFS-6482 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.0 Reporter: James Thomas Assignee: James Thomas Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.patch Right now blocks are placed into directories that are split into many subdirectories when capacity is reached. Instead we can use a block's ID to determine the path it should go in. This eliminates the need for the LDir data structure that facilitates the splitting of directories when they reach capacity as well as fields in ReplicaInfo that keep track of a replica's location. An extension of the work in HDFS-3290. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
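The {{idToBlockDir}} mapping under review can be sketched as follows. This is a hypothetical helper assuming two directory levels of 8 bits each (up to 256 x 256 leaf directories); the actual patch may choose different constants:

```java
// Hypothetical sketch of an idToBlockDir-style mapping: two directory
// levels derived from bits 8-23 of the block ID. Sequential block IDs
// differ only in the low 8 bits, so runs of up to 256 consecutive IDs
// land in the same leaf directory.
public class BlockDirLayout {
  public static String idToBlockDir(long blockId) {
    int level1 = (int) ((blockId >> 16) & 0xFF);
    int level2 = (int) ((blockId >> 8) & 0xFF);
    return "subdir" + level1 + "/subdir" + level2;
  }
}
```

Under a scheme like this the datanode never splits or merges directories: a replica's path is a pure function of its block ID, which is what lets the LDir machinery go away.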
[jira] [Updated] (HDFS-6460) To ignore stale/decommissioned nodes in NetworkTopology#pseudoSortByDistance
[ https://issues.apache.org/jira/browse/HDFS-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6460: Attachment: HDFS-6460.002.patch Hi Andrew, Thanks a lot for the review and the good catch. I'm uploading new revision to address it. To ignore stale/decommissioned nodes in NetworkTopology#pseudoSortByDistance Key: HDFS-6460 URL: https://issues.apache.org/jira/browse/HDFS-6460 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Minor Attachments: HDFS-6460.001.patch, HDFS-6460.002.patch Per discussion in HDFS-6268, filing this jira as a follow-up, so that we can improve the sorting result and save a bit runtime. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6315) Decouple recording edit logs from FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6315: - Attachment: HDFS-6315.005.patch Decouple recording edit logs from FSDirectory - Key: HDFS-6315 URL: https://issues.apache.org/jira/browse/HDFS-6315 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6315.000.patch, HDFS-6315.001.patch, HDFS-6315.002.patch, HDFS-6315.003.patch, HDFS-6315.004.patch, HDFS-6315.005.patch Currently both FSNamesystem and FSDirectory record edit logs. This design requires both FSNamesystem and FSDirectory to be tightly coupled together to implement a durable namespace. This jira proposes to separate the responsibility of implementing the namespace and providing durability with edit logs. Specifically, FSDirectory implements the namespace (which should have no edit log operations), and FSNamesystem implement durability by recording the edit logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025585#comment-14025585 ] Colin Patrick McCabe commented on HDFS-6382: For the MR strategy, it seems like this could be parallelized fairly easily. For example, if you have 5 MR tasks, you can calculate the hash of each path, and then task 1 can do all the paths that are 0 mod 5, task 2 can do all the paths that are 1 mod 5, and so forth. MR also doesn't introduce extra dependencies since HDFS and MR are packaged together. I don't understand what you mean by "the mapreduce strategy will have additional overheads." What overheads are you foreseeing? It is true that you need to avoid overloading the NameNode. But this is a concern with any approach, not just the MR one. It would be good to see a section on this. I think the simplest way to do it is to rate-limit RPCs to the NameNode to a configurable rate. bq. \[for the standalone daemon\] The major advantage of this approach is that we don’t need any extra work to finish the TTL work, all will be done in the daemon automatically. I don't understand what you mean by this. What will be done automatically? How are you going to implement HA for the standalone daemon? I suppose if all the state is kept in HDFS, you can simply restart it when it fails. However, it seems like you need to checkpoint how far along in the FS you are, so that if you die and later get restarted, you don't have to redo the whole FS scan. This implies reading directories in alphabetical order, or similar. You also need to somehow record when the last scan was, perhaps in a file in HDFS. I don't see a lot of discussion of logging and monitoring in general. How is the user going to become aware that a file was deleted because of a TTL? Or if there is an error during the delete, how will the user know? Logging is one choice here. Creating a file in HDFS is another. The setTtl command seems reasonable. 
Does this need to be an administrator command? HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-TTL-Design.pdf In production environment, we always have scenario like this, we want to backup files on hdfs for some time and then hope to delete these files automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's logs in order to debug program bugs, so we keep all the logs on hdfs and delete logs which are older than 1 month. This is a typical scenario of HDFS TTL. So here we propose that hdfs can support TTL. Following are some details of this proposal: 1. HDFS can support TTL on a specified file or directory 2. If a TTL is set on a file, the file will be deleted automatically after the TTL is expired 3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL is expired 4. The child file/directory's TTL configuration should override its parent directory's 5. A global configuration is needed to configure that whether the deleted files/directories should go to the trash or not 6. A global configuration is needed to configure that whether a directory with TTL should be deleted when it is emptied by TTL mechanism or not. -- This message was sent by Atlassian JIRA (v6.2#6252)
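The hash-partitioning idea sketched in the comment above (task i handles the paths whose hash is i mod N) could look like this; the helper names are illustrative only:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of splitting a TTL scan across N tasks by path
// hash, as suggested above. Each path maps to exactly one task, so the
// tasks cover the namespace without coordinating with each other.
public class TtlPartitioner {
  public static int taskFor(String path, int numTasks) {
    // floorMod keeps the result in [0, numTasks) even when
    // hashCode() is negative.
    return Math.floorMod(path.hashCode(), numTasks);
  }

  // Groups paths by owning task; useful for checking the assignment.
  public static Map<Integer, List<String>> partition(List<String> paths,
                                                     int numTasks) {
    Map<Integer, List<String>> buckets = new HashMap<>();
    for (String p : paths) {
      buckets.computeIfAbsent(taskFor(p, numTasks),
          k -> new ArrayList<>()).add(p);
    }
    return buckets;
  }
}
```

A rate limiter in front of the NameNode RPCs, as suggested above, would sit inside each task's scan loop; it is orthogonal to how paths are assigned to tasks.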
[jira] [Commented] (HDFS-6379) HTTPFS - Implement ACLs support
[ https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025608#comment-14025608 ] Hadoop QA commented on HDFS-6379: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12649411/jira-HDFS-6379.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-httpfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7062//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7062//console This message is automatically generated. HTTPFS - Implement ACLs support --- Key: HDFS-6379 URL: https://issues.apache.org/jira/browse/HDFS-6379 Project: Hadoop HDFS Issue Type: Bug Reporter: Alejandro Abdelnur Assignee: Mike Yoder Fix For: 2.4.0 Attachments: jira-HDFS-6379.patch HDFS-4685 added ACLs support to WebHDFS but missed adding them to HttpFS. This JIRA is for such. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025693#comment-14025693 ] Colin Patrick McCabe commented on HDFS-6482: bq. DFS_DATANODE_NUMBLOCKS_DEFAULT is currently 64. I am not sure why the default was set so low. It would be good to know the reason before we change the behavior. It was quite possibly an arbitrary choice. So, back in the really old days (think ext2), there were performance issues for directories with a large number of files (10,000+). See wikipedia's page on ext2 here: http://en.wikipedia.org/wiki/Ext2. The LDir subdirectory mechanism was intended to alleviate this. More recent filesystems like ext4 (and recent revisions of ext3) have what's called directory indices. This basically means that there is an index which allows you to look up a particular entry in a directory in less than O(N) time. This makes having directories with a huge number of entries possible. It's still nice to have multiple directories to avoid overloading {{readdir}} (when we have to do that-- for example, to find a metadata file without knowing its genstamp) and to make inspecting things easier. Plus, it allows us to stay compatible with systems that don't handle giant directories well. bq. After ~4 million blocks we would start putting more than 256 blocks in each leaf subdirectory. With every 4M blocks, we'd add 256 files to each leaf. I think this is fine since 4 million blocks itself is going to be very unlikely. I recall as late as Vista NTFS directory listings would get noticeably slow with thousands of files per directory. Is there any performance loss with always having three levels of subdirectories, restricting each to 256 children at the most? It's an interesting idea, but after all, as you pointed out, even to get to 1,024 blocks per subdirectory (which still isn't thousands but is a single thousand) under James' scheme would require 16 million blocks. 
At that point, it seems like there will be other problems. We can always evolve the directory and metadata naming structure again once 16 million blocks is on the horizon (and we probably will have to do other things too, like investigate off-heap memory storage) Use block ID-based block layout on datanodes Key: HDFS-6482 URL: https://issues.apache.org/jira/browse/HDFS-6482 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.0 Reporter: James Thomas Assignee: James Thomas Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.patch Right now blocks are placed into directories that are split into many subdirectories when capacity is reached. Instead we can use a block's ID to determine the path it should go in. This eliminates the need for the LDir data structure that facilitates the splitting of directories when they reach capacity as well as fields in ReplicaInfo that keep track of a replica's location. An extension of the work in HDFS-3290. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6315) Decouple recording edit logs from FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025713#comment-14025713 ] Daryn Sharp commented on HDFS-6315: --- Catching up from summit, will look at this soon. It's sadly conflicting with the single path resolution patch I keep working on. Decouple recording edit logs from FSDirectory - Key: HDFS-6315 URL: https://issues.apache.org/jira/browse/HDFS-6315 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6315.000.patch, HDFS-6315.001.patch, HDFS-6315.002.patch, HDFS-6315.003.patch, HDFS-6315.004.patch, HDFS-6315.005.patch Currently both FSNamesystem and FSDirectory record edit logs. This design requires both FSNamesystem and FSDirectory to be tightly coupled together to implement a durable namespace. This jira proposes to separate the responsibility of implementing the namespace and providing durability with edit logs. Specifically, FSDirectory implements the namespace (which should have no edit log operations), and FSNamesystem implement durability by recording the edit logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6315) Decouple recording edit logs from FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025740#comment-14025740 ] Jing Zhao commented on HDFS-6315: - bq. It's sadly conflicting with the single path resolution patch I keep working on. Thanks for the comments, [~daryn]. This patch only makes limited changes in FSDirectory. Most changes just move the FSEditLog#logxxx call into FSNamesystem. Thus the rebase should not be complicated I guess. Decouple recording edit logs from FSDirectory - Key: HDFS-6315 URL: https://issues.apache.org/jira/browse/HDFS-6315 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6315.000.patch, HDFS-6315.001.patch, HDFS-6315.002.patch, HDFS-6315.003.patch, HDFS-6315.004.patch, HDFS-6315.005.patch Currently both FSNamesystem and FSDirectory record edit logs. This design requires both FSNamesystem and FSDirectory to be tightly coupled together to implement a durable namespace. This jira proposes to separate the responsibility of implementing the namespace and providing durability with edit logs. Specifically, FSDirectory implements the namespace (which should have no edit log operations), and FSNamesystem implement durability by recording the edit logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6330) Move mkdirs() to FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025730#comment-14025730 ] Jing Zhao commented on HDFS-6330: - The patch looks good to me. Some minors:
# Let's use this chance to remove the empty javadoc of FSDirectory#normalizePath
# The following change may be unnecessary?
{code}
-    blockManager.getDatanodeManager().clearPendingCachingCommands();
-    blockManager.getDatanodeManager().setShouldSendCachingCommands(false);
-    // Don't want to keep replication queues when not in Active.
-    blockManager.clearQueues();
+    if (blockManager != null) {
+      blockManager.getDatanodeManager().clearPendingCachingCommands();
+      blockManager.getDatanodeManager().setShouldSendCachingCommands(false);
+      // Don't want to keep replication queues when not in Active.
+      blockManager.clearQueues();
+    }
{code}
# Nit: Some lines exceed the 80 character limit (e.g., mkdirsRecursively and addSymlink).
# We may need to update the log information in mkdirsRecursively since it's no longer a FSDirectory call.
Move mkdirs() to FSNamesystem - Key: HDFS-6330 URL: https://issues.apache.org/jira/browse/HDFS-6330 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6330.000.patch, HDFS-6330.001.patch Currently mkdir() automatically creates all ancestors for a directory. This is implemented in FSDirectory, by calling unprotectedMkdir() along the path. This jira proposes to move the function to FSNamesystem to simplify the primitive that FSDirectory needs to provide. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025761#comment-14025761 ] Kihwal Lee commented on HDFS-6482: -- BlockIDs are sequential nowadays. With the proposed block distribution method, leaf dirs can get severely unbalanced, especially in smaller clusters. Besides the cost of looking up entries in a directory, directory lock contention can become high and hurt performance if many files are created and read from a small set of directories. I think limiting the number to 64 kind of imposed a cap on how contentious it can be. We might do better by more evenly distributing blocks. Use block ID-based block layout on datanodes Key: HDFS-6482 URL: https://issues.apache.org/jira/browse/HDFS-6482 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.0 Reporter: James Thomas Assignee: James Thomas Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.patch Right now blocks are placed into directories that are split into many subdirectories when capacity is reached. Instead we can use a block's ID to determine the path it should go in. This eliminates the need for the LDir data structure that facilitates the splitting of directories when they reach capacity as well as fields in ReplicaInfo that keep track of a replica's location. An extension of the work in HDFS-3290. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6315) Decouple recording edit logs from FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025773#comment-14025773 ] Hadoop QA commented on HDFS-6315: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12649417/HDFS-6315.005.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7061//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7061//console This message is automatically generated. Decouple recording edit logs from FSDirectory - Key: HDFS-6315 URL: https://issues.apache.org/jira/browse/HDFS-6315 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6315.000.patch, HDFS-6315.001.patch, HDFS-6315.002.patch, HDFS-6315.003.patch, HDFS-6315.004.patch, HDFS-6315.005.patch Currently both FSNamesystem and FSDirectory record edit logs. This design requires both FSNamesystem and FSDirectory to be tightly coupled together to implement a durable namespace. 
This jira proposes to separate the responsibility of implementing the namespace and providing durability with edit logs. Specifically, FSDirectory implements the namespace (which should have no edit log operations), and FSNamesystem implement durability by recording the edit logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6315) Decouple recording edit logs from FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025783#comment-14025783 ] Daryn Sharp commented on HDFS-6315: --- Maybe it's ok, but I'll apply the patch and comment in the morning. Decouple recording edit logs from FSDirectory - Key: HDFS-6315 URL: https://issues.apache.org/jira/browse/HDFS-6315 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6315.000.patch, HDFS-6315.001.patch, HDFS-6315.002.patch, HDFS-6315.003.patch, HDFS-6315.004.patch, HDFS-6315.005.patch Currently both FSNamesystem and FSDirectory record edit logs. This design requires both FSNamesystem and FSDirectory to be tightly coupled together to implement a durable namespace. This jira proposes to separate the responsibility of implementing the namespace and providing durability with edit logs. Specifically, FSDirectory implements the namespace (which should have no edit log operations), and FSNamesystem implement durability by recording the edit logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025786#comment-14025786 ] James Thomas commented on HDFS-6482: Thanks for the review, Arpit, and thanks for the follow-up, Colin. I want to clarify one thing -- the numbers 4 million and 16 million that both of you mention are, as far as I understand, actually numbers of blocks for the ENTIRE cluster, not just a single DN. Suppose we had a cluster of 16 million blocks (with sequential block IDs), we could in theory have a single DN with a directory as large as 1024 entries, if we got unlucky with the assignment of blocks to DNs. Assuming uniform distribution of blocks across the DNs available in the cluster and a maximum # of blocks per DN of 2^24, we have an expected # of blocks per directory of 256. I don't know how accurate this assumption is. Use block ID-based block layout on datanodes Key: HDFS-6482 URL: https://issues.apache.org/jira/browse/HDFS-6482 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.0 Reporter: James Thomas Assignee: James Thomas Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.patch Right now blocks are placed into directories that are split into many subdirectories when capacity is reached. Instead we can use a block's ID to determine the path it should go in. This eliminates the need for the LDir data structure that facilitates the splitting of directories when they reach capacity as well as fields in ReplicaInfo that keep track of a replica's location. An extension of the work in HDFS-3290. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025802#comment-14025802 ] James Thomas commented on HDFS-6482: Kihwal, we were considering using some sort of deterministic probing (as in hash tables) to find less full directories if the initial directory for a block is full. Do you think the cost (and additional complexity) of this sort of scheme is justified given the relatively low probability (given the uniform block distribution assumption, at least) of directory blowup? Additionally, I want to note that if the total number of blocks in the cluster is N, N/2^16 is a strict upper bound on the number of blocks in a single directory on any DN, assuming completely sequential block IDs. So for a small cluster we can't see any blowup. Use block ID-based block layout on datanodes Key: HDFS-6482 URL: https://issues.apache.org/jira/browse/HDFS-6482 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.0 Reporter: James Thomas Assignee: James Thomas Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.patch Right now blocks are placed into directories that are split into many subdirectories when capacity is reached. Instead we can use a block's ID to determine the path it should go in. This eliminates the need for the LDir data structure that facilitates the splitting of directories when they reach capacity as well as fields in ReplicaInfo that keep track of a replica's location. An extension of the work in HDFS-3290. -- This message was sent by Atlassian JIRA (v6.2#6252)
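The N/2^16 bound in the comment above is easy to check numerically, assuming the two-level, 8-bits-per-level layout (2^16 leaf directories) that the bound implies; the helper names are hypothetical:

```java
// Quick arithmetic for the discussion above: with 2^16 leaf
// directories and fully sequential block IDs, N total blocks put at
// most ceil(N / 2^16) blocks into any one leaf directory.
public class LayoutMath {
  static final long LEAF_DIRS = 1L << 16;  // 256 * 256 leaves

  public static long maxBlocksPerLeaf(long totalBlocks) {
    // Ceiling division: sequential IDs fill the low bits first, so
    // leaves fill up round-robin and differ by at most one block.
    return (totalBlocks + LEAF_DIRS - 1) / LEAF_DIRS;
  }
}
```

For a DN holding 2^24 blocks this gives 256 blocks per leaf, matching the expected value James quotes under the uniform-distribution assumption.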
[jira] [Commented] (HDFS-6315) Decouple recording edit logs from FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025803#comment-14025803 ] Hadoop QA commented on HDFS-6315: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12649417/HDFS-6315.005.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7063//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7063//console This message is automatically generated. Decouple recording edit logs from FSDirectory - Key: HDFS-6315 URL: https://issues.apache.org/jira/browse/HDFS-6315 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6315.000.patch, HDFS-6315.001.patch, HDFS-6315.002.patch, HDFS-6315.003.patch, HDFS-6315.004.patch, HDFS-6315.005.patch Currently both FSNamesystem and FSDirectory record edit logs. This design requires both FSNamesystem and FSDirectory to be tightly coupled together to implement a durable namespace. 
This jira proposes to separate the responsibility of implementing the namespace from that of providing durability with edit logs. Specifically, FSDirectory implements the namespace (which should have no edit log operations), and FSNamesystem implements durability by recording the edit logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-2006) ability to support storing extended attributes per file
[ https://issues.apache.org/jira/browse/HDFS-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025823#comment-14025823 ] Chris Nauroth commented on HDFS-2006: - I agree with Andrew on the plan for merging to branch-2. Thank you, Uma. ability to support storing extended attributes per file --- Key: HDFS-2006 URL: https://issues.apache.org/jira/browse/HDFS-2006 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: HDFS XAttrs (HDFS-2006) Reporter: dhruba borthakur Assignee: Yi Liu Fix For: 3.0.0 Attachments: ExtendedAttributes.html, HDFS-2006-Merge-1.patch, HDFS-2006-Merge-2.patch, HDFS-XAttrs-Design-1.pdf, HDFS-XAttrs-Design-2.pdf, HDFS-XAttrs-Design-3.pdf, Test-Plan-for-Extended-Attributes-1.pdf, xattrs.1.patch, xattrs.patch It would be nice if HDFS provides a feature to store extended attributes for files, similar to the one described here: http://en.wikipedia.org/wiki/Extended_file_attributes. The challenge is that it has to be done in such a way that a site not using this feature does not waste precious memory resources in the namenode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6478) RemoteException can't be retried properly for non-HA scenario
[ https://issues.apache.org/jira/browse/HDFS-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025843#comment-14025843 ] Hadoop QA commented on HDFS-6478: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12649415/HDFS-6478.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7064//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7064//console This message is automatically generated. RemoteException can't be retried properly for non-HA scenario - Key: HDFS-6478 URL: https://issues.apache.org/jira/browse/HDFS-6478 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-6478.patch For the HA case, the call stack is DFSClient - RetryInvocationHandler - ClientNamenodeProtocolTranslatorPB - ProtobufRpcEngine. ProtobufRpcEngine throws ServiceException and expects the caller to unwrap it; ClientNamenodeProtocolTranslatorPB is the component that takes care of that. 
{noformat}
at org.apache.hadoop.ipc.Client.call
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke
at com.sun.proxy.$Proxy26.getFileInfo
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo
at sun.reflect.GeneratedMethodAccessor24.invoke
at sun.reflect.DelegatingMethodAccessorImpl.invoke
at java.lang.reflect.Method.invoke
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke
at com.sun.proxy.$Proxy27.getFileInfo
at org.apache.hadoop.hdfs.DFSClient.getFileInfo
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus
{noformat}
However, for the non-HA case, the call stack is DFSClient - ClientNamenodeProtocolTranslatorPB - RetryInvocationHandler - ProtobufRpcEngine. RetryInvocationHandler gets ServiceException and can't retry properly.
{noformat}
at org.apache.hadoop.ipc.Client.call
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke
at com.sun.proxy.$Proxy9.getListing
at sun.reflect.NativeMethodAccessorImpl.invoke0
at sun.reflect.NativeMethodAccessorImpl.invoke
at sun.reflect.DelegatingMethodAccessorImpl.invoke
at java.lang.reflect.Method.invoke
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke
at com.sun.proxy.$Proxy9.getListing
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing
at org.apache.hadoop.hdfs.DFSClient.listPaths
{noformat}
Perhaps we can fix it by having NN wrap RetryInvocationHandler around ClientNamenodeProtocolTranslatorPB and other PBs, instead of the current wrap order. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6395) Assorted improvements to xattr limit checking
[ https://issues.apache.org/jira/browse/HDFS-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025855#comment-14025855 ] Andrew Wang commented on HDFS-6395: --- I should have realized this earlier, considering I worked on something pretty similar with the fs-limits and the edit log before. I agree that it's difficult to do this without some serious code gymnastics, so let's just table the entire thing for now. Please resolve this if you agree, thanks again [~hitliuyi]. Assorted improvements to xattr limit checking - Key: HDFS-6395 URL: https://issues.apache.org/jira/browse/HDFS-6395 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Yi Liu Attachments: HDFS-6395.patch It'd be nice to print messages during fsimage and editlog loading if we hit either the # of xattrs per inode or the xattr size limits. We should also consider making the # of xattrs limit only apply to the user namespace, or to each namespace separately, to prevent users from locking out access to other namespaces. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6379) HTTPFS - Implement ACLs support
[ https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025943#comment-14025943 ] Alejandro Abdelnur commented on HDFS-6379: -- [~michaelbyoder], nice work. Would you mind adding a testcase where ACLs are disabled in HDFS to verify that being disabled does not break file status and list status? After that I think it is ready to go. HTTPFS - Implement ACLs support --- Key: HDFS-6379 URL: https://issues.apache.org/jira/browse/HDFS-6379 Project: Hadoop HDFS Issue Type: Bug Reporter: Alejandro Abdelnur Assignee: Mike Yoder Fix For: 2.4.0 Attachments: jira-HDFS-6379.patch HDFS-4685 added ACLs support to WebHDFS but missed adding them to HttpFS. This JIRA is for such. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6460) To ignore stale/decommissioned nodes in NetworkTopology#pseudoSortByDistance
[ https://issues.apache.org/jira/browse/HDFS-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025944#comment-14025944 ] Hadoop QA commented on HDFS-6460: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12649427/HDFS-6460.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.ha.TestZKFailoverControllerStress {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7066//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7066//console This message is automatically generated. To ignore stale/decommissioned nodes in NetworkTopology#pseudoSortByDistance Key: HDFS-6460 URL: https://issues.apache.org/jira/browse/HDFS-6460 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Minor Attachments: HDFS-6460.001.patch, HDFS-6460.002.patch Per discussion in HDFS-6268, filing this jira as a follow-up, so that we can improve the sorting result and save a bit runtime. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6379) HTTPFS - Implement ACLs support
[ https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025956#comment-14025956 ] Mike Yoder commented on HDFS-6379: -- Looks like there's another me out there! I'm [~yoderme], not that other...uh...guy with my name. :-) [~tucu00], I totally agree with that test case, but can you send a quick pointer as to how to do that in an automated fashion? All the test cases I've seen fire up the server part once at the start and leave it running for all tests. Any way to change the server conf dynamically? Thanks, -Mike HTTPFS - Implement ACLs support --- Key: HDFS-6379 URL: https://issues.apache.org/jira/browse/HDFS-6379 Project: Hadoop HDFS Issue Type: Bug Reporter: Alejandro Abdelnur Assignee: Mike Yoder Fix For: 2.4.0 Attachments: jira-HDFS-6379.patch HDFS-4685 added ACLs support to WebHDFS but missed adding them to HttpFS. This JIRA is for such. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6504) NFS: invalid Keytab/principal entry should shutdown nfs server
Yesha Vora created HDFS-6504: Summary: NFS: invalid Keytab/principal entry should shutdown nfs server Key: HDFS-6504 URL: https://issues.apache.org/jira/browse/HDFS-6504 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Yesha Vora An invalid value in 'dfs.nfs.keytab.file' or 'dfs.nfs.kerberos.principal' should shut down the NFS server. Currently NFS does not throw any error or shut down if an invalid value is entered in either of these properties. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6460) To ignore stale/decommissioned nodes in NetworkTopology#pseudoSortByDistance
[ https://issues.apache.org/jira/browse/HDFS-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025964#comment-14025964 ] Yongjun Zhang commented on HDFS-6460: - The failed test is irrelevant, and it was reported as HADOOP-10668. To ignore stale/decommissioned nodes in NetworkTopology#pseudoSortByDistance Key: HDFS-6460 URL: https://issues.apache.org/jira/browse/HDFS-6460 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Minor Attachments: HDFS-6460.001.patch, HDFS-6460.002.patch Per discussion in HDFS-6268, filing this jira as a follow-up, so that we can improve the sorting result and save a bit runtime. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6460) To ignore stale/decommissioned nodes in NetworkTopology#pseudoSortByDistance
[ https://issues.apache.org/jira/browse/HDFS-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025970#comment-14025970 ] Andrew Wang commented on HDFS-6460: --- +1 will commit shortly, thanks Yongjun To ignore stale/decommissioned nodes in NetworkTopology#pseudoSortByDistance Key: HDFS-6460 URL: https://issues.apache.org/jira/browse/HDFS-6460 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Minor Attachments: HDFS-6460.001.patch, HDFS-6460.002.patch Per discussion in HDFS-6268, filing this jira as a follow-up, so that we can improve the sorting result and save a bit runtime. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not
[ https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025972#comment-14025972 ] Brandon Li commented on HDFS-6439: -- [~atm], are you still working on this? If you are distracted by other tasks, I will upload a new patch based on yours. NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not -- Key: HDFS-6439 URL: https://issues.apache.org/jira/browse/HDFS-6439 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Brandon Li Assignee: Aaron T. Myers Attachments: HDFS-6439.patch, HDFS-6439.patch, linux-nfs-disallow-request-from-nonsecure-port.pcapng, mount-nfs-requests.pcapng As discussed in HDFS-6406, this JIRA is to track the following updates: 1. Port monitoring is the feature name with the traditional NFS server, and we may want to make the config property (along with the related variable allowInsecurePorts) something like dfs.nfs.port.monitoring. 2. According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt): {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote} I do notice that NFS clients (most of the time) send mount NULL and nfs NULL from a non-privileged port. If we deny the NULL call in mountd or the nfs server, the client can't mount the export even as user root. 3. It would be nice to have the user guide updated for the port monitoring feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not
[ https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025977#comment-14025977 ] Hadoop QA commented on HDFS-6439: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646408/linux-nfs-disallow-request-from-nonsecure-port.pcapng against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7067//console This message is automatically generated. NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not -- Key: HDFS-6439 URL: https://issues.apache.org/jira/browse/HDFS-6439 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Brandon Li Assignee: Aaron T. Myers Attachments: HDFS-6439.patch, HDFS-6439.patch, linux-nfs-disallow-request-from-nonsecure-port.pcapng, mount-nfs-requests.pcapng As discussed in HDFS-6406, this JIRA is to track the following updates: 1. Port monitoring is the feature name with the traditional NFS server, and we may want to make the config property (along with the related variable allowInsecurePorts) something like dfs.nfs.port.monitoring. 2. According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt): {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote} I do notice that NFS clients (most of the time) send mount NULL and nfs NULL from a non-privileged port. If we deny the NULL call in mountd or the nfs server, the client can't mount the export even as user root. 3. It would be nice to have the user guide updated for the port monitoring feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6460) Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance
[ https://issues.apache.org/jira/browse/HDFS-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6460: -- Summary: Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance (was: To ignore stale/decommissioned nodes in NetworkTopology#pseudoSortByDistance) Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance --- Key: HDFS-6460 URL: https://issues.apache.org/jira/browse/HDFS-6460 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Minor Attachments: HDFS-6460.001.patch, HDFS-6460.002.patch Per discussion in HDFS-6268, filing this jira as a follow-up, so that we can improve the sorting result and save a bit runtime. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025980#comment-14025980 ] Colin Patrick McCabe commented on HDFS-6482:
bq. Suppose we had a cluster of 16 million blocks (with sequential block IDs); then we could in theory have a single DN with a directory as large as 1024 entries, if we got unlucky with the assignment of blocks to DNs.
I don't think this calculation is right. Even if all the blocks end up on a single DN (maximally unbalanced), in a 16 million block cluster, you have (16 * 1024 * 1024) / (256 * 256) = 256 entries per directory. To confirm this calculation, I ran this test program:
{code}
#include <inttypes.h>
#include <stdio.h>

#define MAX_A 256
#define MAX_B 256

uint64_t dir_entries[MAX_A][MAX_B];

int main(void)
{
  uint64_t i, j, l, a, b, c;
  uint64_t max = (16LL * 1024LL * 1024LL);

  for (i = 0; i < max; i++) {
    l = (i & 0x00ffLL);
    a = (i & 0xff00LL) >> 8LL;
    b = (i & 0x00ff0000LL) >> 16LL;
    c = (i & 0xff000000LL) >> 16LL;
    c |= l;
    //printf("%02" PRIx64 "/%02" PRIx64 "/%012" PRIx64 "\n", a, b, c);
    dir_entries[a][b]++;
  }
  max = 0;
  for (i = 0; i < MAX_A; i++) {
    for (j = 0; j < MAX_B; j++) {
      if (max < dir_entries[i][j]) {
        max = dir_entries[i][j];
      }
    }
  }
  printf("max entries per directory = %" PRId64 "\n", max);
  return 0;
}
{code}
bq. we were considering using some sort of deterministic probing (as in hash tables) to find less full directories if the initial directory for a block is full...
I don't think probing is a good idea. It's going to slow things down in the common case when we're reading a block. Maybe we should add another layer in the hierarchy so that we know we won't get big directories even on huge clusters. 
Use block ID-based block layout on datanodes Key: HDFS-6482 URL: https://issues.apache.org/jira/browse/HDFS-6482 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.0 Reporter: James Thomas Assignee: James Thomas Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.patch Right now blocks are placed into directories that are split into many subdirectories when capacity is reached. Instead we can use a block's ID to determine the path it should go in. This eliminates the need for the LDir data structure that facilitates the splitting of directories when they reach capacity as well as fields in ReplicaInfo that keep track of a replica's location. An extension of the work in HDFS-3290. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6460) Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance
[ https://issues.apache.org/jira/browse/HDFS-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025981#comment-14025981 ] Andrew Wang commented on HDFS-6460: --- Committed this to trunk. Yongjun, do you mind prepping a branch-2 patch too? There's another test that needs to be updated. Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance --- Key: HDFS-6460 URL: https://issues.apache.org/jira/browse/HDFS-6460 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Minor Attachments: HDFS-6460.001.patch, HDFS-6460.002.patch Per discussion in HDFS-6268, filing this jira as a follow-up, so that we can improve the sorting result and save a bit runtime. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6460) Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance
[ https://issues.apache.org/jira/browse/HDFS-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025992#comment-14025992 ] Hudson commented on HDFS-6460: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5671 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5671/]) HDFS-6460. Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance. Contributed by Yongjun Zhang. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1601535) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopology.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlock.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopology.java Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance --- Key: HDFS-6460 URL: https://issues.apache.org/jira/browse/HDFS-6460 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Minor Attachments: HDFS-6460.001.patch, HDFS-6460.002.patch Per discussion in HDFS-6268, filing this jira as a follow-up, so that we can improve the sorting result and save a bit runtime. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6470) TestBPOfferService.testBPInitErrorHandling is flaky
[ https://issues.apache.org/jira/browse/HDFS-6470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-6470: -- Attachment: HDFS-6470.patch It seems the test has the following issues. 1. It asserts that the size of BPServiceActor is 2 after BPOfferService started. One of the BPServiceActors could have shut down due to initBlockPool failure by the time the assert is called. 2. It assumes the first BPServiceActor is healthy and uses that for blockReport verification. It is possible the second BPServiceActor is healthy. The patch moves the size check before BPOfferService starts. In addition, as long as one of the BPServiceActors can send a blockReport, the test is considered passed. TestBPOfferService.testBPInitErrorHandling is flaky --- Key: HDFS-6470 URL: https://issues.apache.org/jira/browse/HDFS-6470 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Andrew Wang Attachments: HDFS-6470.patch Saw some test flakage in a test-patch run, stacktrace:
{code}
java.lang.AssertionError: expected:<2> but was:<1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBPInitErrorHandling(TestBPOfferService.java:334)
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6470) TestBPOfferService.testBPInitErrorHandling is flaky
[ https://issues.apache.org/jira/browse/HDFS-6470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-6470: -- Status: Patch Available (was: Open) TestBPOfferService.testBPInitErrorHandling is flaky --- Key: HDFS-6470 URL: https://issues.apache.org/jira/browse/HDFS-6470 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Andrew Wang Attachments: HDFS-6470.patch Saw some test flakage in a test-patch run, stacktrace:
{code}
java.lang.AssertionError: expected:<2> but was:<1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBPInitErrorHandling(TestBPOfferService.java:334)
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Work started] (HDFS-6493) Propose to change dfs.namenode.startup.delay.block.deletion to second instead of millisecond
[ https://issues.apache.org/jira/browse/HDFS-6493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-6493 started by Juan Yu. Propose to change dfs.namenode.startup.delay.block.deletion to second instead of millisecond -- Key: HDFS-6493 URL: https://issues.apache.org/jira/browse/HDFS-6493 Project: Hadoop HDFS Issue Type: Bug Reporter: Juan Yu Assignee: Juan Yu Priority: Trivial Attachments: HDFS-6493.001.patch Based on the discussion in https://issues.apache.org/jira/browse/HDFS-6186, the delay will be at least 30 minutes or even hours. It's not very user-friendly to use milliseconds when it's likely measured in hours. I suggest making the following changes: 1. change the unit of this config to seconds 2. rename the config key from dfs.namenode.startup.delay.block.deletion.ms to dfs.namenode.startup.delay.block.deletion.sec 3. add the default value to hdfs-default.xml; what's a reasonable value: 30 minutes, or one hour? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6493) Propose to change dfs.namenode.startup.delay.block.deletion to second instead of millisecond
[ https://issues.apache.org/jira/browse/HDFS-6493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juan Yu updated HDFS-6493: -- Attachment: HDFS-6493.001.patch Propose to change dfs.namenode.startup.delay.block.deletion to second instead of millisecond -- Key: HDFS-6493 URL: https://issues.apache.org/jira/browse/HDFS-6493 Project: Hadoop HDFS Issue Type: Bug Reporter: Juan Yu Assignee: Juan Yu Priority: Trivial Attachments: HDFS-6493.001.patch Based on the discussion in https://issues.apache.org/jira/browse/HDFS-6186, the delay will be at least 30 minutes or even hours. It's not very user-friendly to use milliseconds when it's likely measured in hours. I suggest making the following changes: 1. change the unit of this config to seconds 2. rename the config key from dfs.namenode.startup.delay.block.deletion.ms to dfs.namenode.startup.delay.block.deletion.sec 3. add the default value to hdfs-default.xml; what's a reasonable value: 30 minutes, or one hour? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6493) Propose to change dfs.namenode.startup.delay.block.deletion to second instead of millisecond
[ https://issues.apache.org/jira/browse/HDFS-6493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juan Yu updated HDFS-6493: -- Attachment: (was: HDFS-6493.001.patch) Propose to change dfs.namenode.startup.delay.block.deletion to second instead of millisecond -- Key: HDFS-6493 URL: https://issues.apache.org/jira/browse/HDFS-6493 Project: Hadoop HDFS Issue Type: Bug Reporter: Juan Yu Assignee: Juan Yu Priority: Trivial Based on the discussion in https://issues.apache.org/jira/browse/HDFS-6186, the delay will be at least 30 minutes or even hours. It's not very user-friendly to use milliseconds when it's likely measured in hours. I suggest making the following changes: 1. change the unit of this config to seconds 2. rename the config key from dfs.namenode.startup.delay.block.deletion.ms to dfs.namenode.startup.delay.block.deletion.sec 3. add the default value to hdfs-default.xml; what's a reasonable value: 30 minutes, or one hour? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6502) incorrect description in distcp2 document
[ https://issues.apache.org/jira/browse/HDFS-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA reassigned HDFS-6502: --- Assignee: Akira AJISAKA incorrect description in distcp2 document - Key: HDFS-6502 URL: https://issues.apache.org/jira/browse/HDFS-6502 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Akira AJISAKA In http://hadoop.apache.org/docs/r1.2.1/distcp2.html#UpdateAndOverwrite The first statement of the Update and Overwrite section says: {quote} -update is used to copy files from source that don't exist at the target, or have different contents. -overwrite overwrites target-files even if they exist at the source, or have the same contents. {quote} The Command Line Options table says : {quote} -overwrite: Overwrite destination -update: Overwrite if src size different from dst size {quote} Based on the implementation, making the following modification would be more accurate: The first statement of the Update and Overwrite section: {code} -update is used to copy files from source that don't exist at the target, or have different contents. -overwrite overwrites target-files if they exist at the target. {code} The Command Line Options table: {code} -overwrite: Overwrite destination -update: Overwrite destination if source and destination have different contents {code} Thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6502) incorrect description in distcp2 document
[ https://issues.apache.org/jira/browse/HDFS-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-6502: Attachment: HDFS-6502.patch Thanks [~yzhangal] for the report. Attaching a patch for trunk and branch-2. incorrect description in distcp2 document - Key: HDFS-6502 URL: https://issues.apache.org/jira/browse/HDFS-6502 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Akira AJISAKA Attachments: HDFS-6502.patch In http://hadoop.apache.org/docs/r1.2.1/distcp2.html#UpdateAndOverwrite The first statement of the Update and Overwrite section says: {quote} -update is used to copy files from source that don't exist at the target, or have different contents. -overwrite overwrites target-files even if they exist at the source, or have the same contents. {quote} The Command Line Options table says : {quote} -overwrite: Overwrite destination -update: Overwrite if src size different from dst size {quote} Based on the implementation, making the following modification would be more accurate: The first statement of the Update and Overwrite section: {code} -update is used to copy files from source that don't exist at the target, or have different contents. -overwrite overwrites target-files if they exist at the target. {code} The Command Line Options table: {code} -overwrite: Overwrite destination -update: Overwrite destination if source and destination have different contents {code} Thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6460) Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance
[ https://issues.apache.org/jira/browse/HDFS-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6460: Attachment: HDFS-6460-branch2.001.patch Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance --- Key: HDFS-6460 URL: https://issues.apache.org/jira/browse/HDFS-6460 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Minor Attachments: HDFS-6460-branch2.001.patch, HDFS-6460.001.patch, HDFS-6460.002.patch Per discussion in HDFS-6268, filing this jira as a follow-up, so that we can improve the sorting result and save a bit runtime. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6502) incorrect description in distcp2 document
[ https://issues.apache.org/jira/browse/HDFS-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-6502: Labels: newbie (was: ) Target Version/s: 2.5.0 Affects Version/s: (was: 2.4.0) 2.5.0 1.2.1 Status: Patch Available (was: Open) incorrect description in distcp2 document - Key: HDFS-6502 URL: https://issues.apache.org/jira/browse/HDFS-6502 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 1.2.1, 2.5.0 Reporter: Yongjun Zhang Assignee: Akira AJISAKA Labels: newbie Attachments: HDFS-6502.patch In http://hadoop.apache.org/docs/r1.2.1/distcp2.html#UpdateAndOverwrite The first statement of the Update and Overwrite section says: {quote} -update is used to copy files from source that don't exist at the target, or have different contents. -overwrite overwrites target-files even if they exist at the source, or have the same contents. {quote} The Command Line Options table says : {quote} -overwrite: Overwrite destination -update: Overwrite if src size different from dst size {quote} Based on the implementation, making the following modification would be more accurate: The first statement of the Update and Overwrite section: {code} -update is used to copy files from source that don't exist at the target, or have different contents. -overwrite overwrites target-files if they exist at the target. {code} The Command Line Options table: {code} -overwrite: Overwrite destination -update: Overwrite destination if source and destination have different contents {code} Thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6460) Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance
[ https://issues.apache.org/jira/browse/HDFS-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026024#comment-14026024 ] Yongjun Zhang commented on HDFS-6460: - Many thanks Andrew! Just uploaded a patch for branch-2, the change is in TestHdfsNetworkTopologyWithNodeGroup.java as you mentioned. Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance --- Key: HDFS-6460 URL: https://issues.apache.org/jira/browse/HDFS-6460 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Minor Attachments: HDFS-6460-branch2.001.patch, HDFS-6460.001.patch, HDFS-6460.002.patch Per discussion in HDFS-6268, filing this jira as a follow-up, so that we can improve the sorting result and save a bit runtime. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not
[ https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026027#comment-14026027 ] Aaron T. Myers commented on HDFS-6439: -- Definitely don't let me hold you up if you'd like to work on a patch, [~brandonli]. It'd be much appreciated, and I'd be happy to review it. NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not -- Key: HDFS-6439 URL: https://issues.apache.org/jira/browse/HDFS-6439 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Brandon Li Assignee: Aaron T. Myers Attachments: HDFS-6439.patch, HDFS-6439.patch, linux-nfs-disallow-request-from-nonsecure-port.pcapng, mount-nfs-requests.pcapng As discussed in HDFS-6406, this JIRA is to track the following updates: 1. Port monitoring is the feature name with traditional NFS servers, and we may want to make the config property (along with the related variable allowInsecurePorts) something like dfs.nfs.port.monitoring. 2. According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt): {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote} I do notice that NFS clients (most of the time) send mount NULL and nfs NULL from a non-privileged port. If we deny NULL calls in mountd or the nfs server, the client can't mount the export even as user root. 3. It would be nice to have the user guide updated for the port monitoring feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6395) Assorted improvements to xattr limit checking
[ https://issues.apache.org/jira/browse/HDFS-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026032#comment-14026032 ] Andrew Wang commented on HDFS-6395: --- Woops, my bad, I forgot that this patch also fixes the # limit to not apply to the non-user namespaces. I had a few comments: - Would be nice to test that the system namespace isn't affected by these limits, I guess reach into FSNamesystem or FSDirectory via @VisibleForTesting methods. - Let's remove the prints when the limits hit their max, since I think that was a misunderstanding of Chris' comment about printing. Thanks Yi! Assorted improvements to xattr limit checking - Key: HDFS-6395 URL: https://issues.apache.org/jira/browse/HDFS-6395 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Yi Liu Attachments: HDFS-6395.patch It'd be nice to print messages during fsimage and editlog loading if we hit either the # of xattrs per inode or the xattr size limits. We should also consider making the # of xattrs limit only apply to the user namespace, or to each namespace separately, to prevent users from locking out access to other namespaces. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6460) Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance
[ https://issues.apache.org/jira/browse/HDFS-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026039#comment-14026039 ] Hadoop QA commented on HDFS-6460: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12649510/HDFS-6460-branch2.001.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7069//console This message is automatically generated. Ignore stale and decommissioned nodes in NetworkTopology#sortByDistance --- Key: HDFS-6460 URL: https://issues.apache.org/jira/browse/HDFS-6460 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Minor Attachments: HDFS-6460-branch2.001.patch, HDFS-6460.001.patch, HDFS-6460.002.patch Per discussion in HDFS-6268, filing this jira as a follow-up, so that we can improve the sorting result and save a bit runtime. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026047#comment-14026047 ] Zesheng Wu commented on HDFS-6382: -- Thanks [~cmccabe] for your feedback. bq. For the MR strategy, it seems like this could be parallelized fairly easily. For example, if you have 5 MR tasks, you can calculate the hash of each path, and then task 1 can do all the paths that are 0 mod 5, task 2 can do all the paths that are 1 mod 5, and so forth. MR also doesn't introduce extra dependencies since HDFS and MR are packaged together. Do you mean that we scan the whole namespace first and then split it into 5 pieces according to the hash of each path? Why don't we just complete the work during the first scan? If I misunderstand your meaning, please point it out. bq. I don't understand what you mean by the mapreduce strategy will have additional overheads. What overheads are you foreseeing? Possible overheads: starting a mapreduce job needs to split the input, start an AppMaster, and collect results from random machines (perhaps 'overheads' is not the proper word here). bq. I don't understand what you mean by this. What will be done automatically? Here automatically means we do not have to rely on external tools; the daemon itself can manage the work well. bq. How are you going to implement HA for the standalone daemon? Good point. As you suggested, one approach is to save the state in HDFS and simply restart the daemon when it fails. But managing the state is complex work, and I am considering how to simplify this. One possible simpler approach is to consider the daemon stateless and simply restart it when it fails. We needn't checkpoint, and can just scan from the beginning when it restarts. Because we can require that the work the daemon does is idempotent, starting from the beginning will be harmless. 
Possible drawbacks of the latter approach are that it may waste some time and may delay the work, but these are acceptable. bq. I don't see a lot of discussion of logging and monitoring in general. How is the user going to become aware that a file was deleted because of a TTL? Or if there is an error during the delete, how will the user know? For simplicity, in the initial version we will use logs to record which files/directories are deleted by TTL, and any errors during the deletion process. bq. Does this need to be an administrator command? It doesn't need to be an administrator command; users can only setTtl on files/directories they have write permission on, and can only getTtl on files/directories they have read permission on. HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-TTL-Design.pdf In production environments we often have scenarios like this: we want to back up files on HDFS for some time and then delete them automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's logs in order to debug program bugs, so we keep all the logs on HDFS and delete logs which are older than 1 month. This is a typical scenario for HDFS TTL. So here we propose that HDFS support TTL. Following are some details of this proposal: 1. HDFS can support TTL on a specified file or directory 2. If a TTL is set on a file, the file will be deleted automatically after the TTL is expired 3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL is expired 4. The child file/directory's TTL configuration should override its parent directory's 5. 
A global configuration is needed to configure that whether the deleted files/directories should go to the trash or not 6. A global configuration is needed to configure that whether a directory with TTL should be deleted when it is emptied by TTL mechanism or not. -- This message was sent by Atlassian JIRA (v6.2#6252)
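The hash-mod-N parallelization quoted in the comment above (each of N tasks handles the paths whose hash is congruent to its index mod N) can be sketched as follows; the class and method names are illustrative, not from the HDFS-6382 design doc:

```java
// Hedged sketch of splitting TTL scanning work across N workers by
// hashing each path. Names are assumptions for illustration only.
public class TtlPartitioner {
    // Returns the index of the worker responsible for this path.
    static int workerFor(String path, int numWorkers) {
        // Math.floorMod keeps the result in [0, numWorkers) even when
        // hashCode() is negative, so every path maps to exactly one worker.
        return Math.floorMod(path.hashCode(), numWorkers);
    }
}
```

Because the assignment is a pure function of the path, a restarted (stateless) worker recomputes the same partition, which fits the idempotent-restart HA approach discussed above.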
[jira] [Commented] (HDFS-6489) DFS Used space is not correct computed on frequent append operations
[ https://issues.apache.org/jira/browse/HDFS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026068#comment-14026068 ] Guo Ruijing commented on HDFS-6489: --- For example, existing behavior: 1. create a 60M file with preferred block size 64M. 2. append 10 bytes (disk utilization is increased by 60M + 10 bytes, totally 120M + 10 bytes) 3. append 10 bytes (disk utilization is increased by 60M + 20 bytes, totally 120M + 30 bytes) 4. append 10 bytes (disk utilization is increased by 60M + 30 bytes, totally 180M + 60bytes) expected behavior: 1. create a 60M file with preferred block size 64M. 2. append 10 bytes (disk utilization is increased 10 bytes, totally 60M + 10 bytes) 3. append 10 bytes (disk utilization is increased 10 bytes, totally 60M + 20 bytes) 4. append 10 bytes (disk utilization is increased 10 bytes, totally 60M + 30 bytes) DFS Used space is not correct computed on frequent append operations Key: HDFS-6489 URL: https://issues.apache.org/jira/browse/HDFS-6489 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: stanley shi The current implementation of the Datanode will increase the DFS used space on each block write operation. This is correct in most scenarios (creating a new file), but sometimes it behaves incorrectly (appending small data to a large block). For example, I have a file with only one block (say, 60M). Then I try to append to it very frequently, but each time I append only 10 bytes. Then on each append, dfs used will be increased by the length of the block (60M), not the actual data length (10 bytes). Consider a scenario where I use many clients to append concurrently to a large number of files (1000+); assume the block size is 32M (half of the default value), then the dfs used will be increased 1000*32M = 32G on each append to the files, but actually I only write 10K bytes; this will cause the datanode to report insufficient disk space on data write. 
{quote}2014-06-04 15:27:34,719 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1649188734-10.37.7.142-1398844098971:blk_1073742834_45306 received exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Insufficient space for appending to FinalizedReplica, blk_1073742834_45306, FINALIZED{quote} But the actual disk usage: {quote} [root@hdsh143 ~]# df -h FilesystemSize Used Avail Use% Mounted on /dev/sda3 16G 2.9G 13G 20% / tmpfs 1.9G 72K 1.9G 1% /dev/shm /dev/sda1 97M 32M 61M 35% /boot {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6502) incorrect description in distcp2 document
[ https://issues.apache.org/jira/browse/HDFS-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026069#comment-14026069 ] Hadoop QA commented on HDFS-6502: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12649511/HDFS-6502.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7070//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7070//console This message is automatically generated. 
incorrect description in distcp2 document - Key: HDFS-6502 URL: https://issues.apache.org/jira/browse/HDFS-6502 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 1.2.1, 2.5.0 Reporter: Yongjun Zhang Assignee: Akira AJISAKA Labels: newbie Attachments: HDFS-6502.patch In http://hadoop.apache.org/docs/r1.2.1/distcp2.html#UpdateAndOverwrite The first statement of the Update and Overwrite section says: {quote} -update is used to copy files from source that don't exist at the target, or have different contents. -overwrite overwrites target-files even if they exist at the source, or have the same contents. {quote} The Command Line Options table says : {quote} -overwrite: Overwrite destination -update: Overwrite if src size different from dst size {quote} Based on the implementation, making the following modification would be more accurate: The first statement of the Update and Overwrite section: {code} -update is used to copy files from source that don't exist at the target, or have different contents. -overwrite overwrites target-files if they exist at the target. {code} The Command Line Options table: {code} -overwrite: Overwrite destination -update: Overwrite destination if source and destination have different contents {code} Thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6505) Can not close file due to last block is marked as corrupt
Gordon Wang created HDFS-6505: - Summary: Can not close file due to last block is marked as corrupt Key: HDFS-6505 URL: https://issues.apache.org/jira/browse/HDFS-6505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Gordon Wang After appending to a file, the client could not close it, because the namenode could not complete the last block of the file. The UC status of the last block remained COMMITTED and never changed. The namenode log was like this. {code} INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* checkFileProgress: blk_1073741920_13948{blockUCState=COMMITTED, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[172.28.1.2:50010|RBW]]} has not reached minimal replication 1 {code} After going through the namenode log, I found an entry like this {code} INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1073741920 added as corrupt on 172.28.1.2:50010 by sdw3/172.28.1.3 because client machine reported it {code} But actually, the last block was finished successfully on the datanode, because I could find this log on the datanode {code} INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer: Transmitted BP-649434182-172.28.1.251-1401432753616:blk_1073741920_13808 (numBytes=50120352) to /172.28.1.3:50010 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.2:36860, dest: /172.28.1.2:50010, bytes: 51686616, op: HDFS_WRITE, cliID: libhdfs3_client_random_741511239_count_1_pid_215802_tid_140085714196576, offset: 0, srvID: DS-2074102060-172.28.1.2-50010-1401432768690, blockid: BP-649434182-172.28.1.251-1401432753616:blk_1073741920_13948, duration: 189226453336 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-649434182-172.28.1.251-1401432753616:blk_1073741920_13948, type=LAST_IN_PIPELINE, downstreams=0:[] terminating {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6505) Can not close file due to last block is marked as corrupt
[ https://issues.apache.org/jira/browse/HDFS-6505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026085#comment-14026085 ] Gordon Wang commented on HDFS-6505: --- This issue causes the last block to be marked missing and the file to be corrupted, but actually, the data on the DataNode is correct. I went through the code, and I think a safety check is missing when the namenode receives a bad block report from a datanode. See the following code snippet in the namenode's BlockManager {code} public void findAndMarkBlockAsCorrupt(final ExtendedBlock blk, final DatanodeInfo dn, String storageID, String reason) throws IOException { assert namesystem.hasWriteLock(); final BlockInfo storedBlock = getStoredBlock(blk.getLocalBlock()); if (storedBlock == null) { // Check if the replica is in the blockMap, if not // ignore the request for now. This could happen when BlockScanner // thread of Datanode reports bad block before Block reports are sent // by the Datanode on startup blockLog.info("BLOCK* findAndMarkBlockAsCorrupt: " + blk + " not found"); return; } markBlockAsCorrupt(new BlockToMarkCorrupt(storedBlock, reason, Reason.CORRUPTION_REPORTED), dn, storageID); } {code} We should check the timestamp (generation stamp) of the reported block against the stored block. If the reported block has a smaller timestamp, it should not be marked as corrupt. It is possible for the reported block to have a smaller timestamp when the client has done some pipeline recovery work. Can not close file due to last block is marked as corrupt - Key: HDFS-6505 URL: https://issues.apache.org/jira/browse/HDFS-6505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Gordon Wang After appending to a file, the client could not close it, because the namenode could not complete the last block of the file. The UC status of the last block remained COMMITTED and never changed. The namenode log was like this. 
{code} INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* checkFileProgress: blk_1073741920_13948{blockUCState=COMMITTED, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[172.28.1.2:50010|RBW]]} has not reached minimal replication 1 {code} After going through the namenode log, I found an entry like this {code} INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1073741920 added as corrupt on 172.28.1.2:50010 by sdw3/172.28.1.3 because client machine reported it {code} But actually, the last block was finished successfully on the datanode, because I could find this log on the datanode {code} INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer: Transmitted BP-649434182-172.28.1.251-1401432753616:blk_1073741920_13808 (numBytes=50120352) to /172.28.1.3:50010 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.2:36860, dest: /172.28.1.2:50010, bytes: 51686616, op: HDFS_WRITE, cliID: libhdfs3_client_random_741511239_count_1_pid_215802_tid_140085714196576, offset: 0, srvID: DS-2074102060-172.28.1.2-50010-1401432768690, blockid: BP-649434182-172.28.1.251-1401432753616:blk_1073741920_13948, duration: 189226453336 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-649434182-172.28.1.251-1401432753616:blk_1073741920_13948, type=LAST_IN_PIPELINE, downstreams=0:[] terminating {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
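The safety check proposed in the comment above, ignoring corrupt-block reports that carry an older generation stamp than the stored block, can be sketched like this; the class and method are assumptions for illustration, not the actual BlockManager API:

```java
// Hedged sketch of the staleness check proposed for HDFS-6505.
// Names are illustrative; the real fix would live in BlockManager.
public class CorruptReportCheck {
    // Returns true if the corrupt-block report should be ignored:
    // a smaller generation stamp means the reporter saw a pre-recovery
    // replica, so marking the current block corrupt would be wrong.
    static boolean isStaleReport(long reportedGenStamp, long storedGenStamp) {
        return reportedGenStamp < storedGenStamp;
    }
}
```

In the logs above, the transferred replica has generation stamp 13808 while the committed block is blk_1073741920_13948, exactly the stale-report case this check would filter out.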
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026090#comment-14026090 ] stanley shi commented on HDFS-5723: --- Hi Vinay, it seems my steps produce a different error but with the same error log. Do you want to fix it in this ticket, or would you prefer that I file another one? Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction Key: HDFS-5723 URL: https://issues.apache.org/jira/browse/HDFS-5723 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.2.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-5723.patch, HDFS-5723.patch Scenario: 1. 3 node cluster with dfs.client.block.write.replace-datanode-on-failure.enable set to false. 2. One file is written with 3 replicas, blk_id_gs1 3. One of the datanodes, DN1, is down. 4. File was opened with append and some more data is added to the file and synced. (to only 2 live nodes DN2 and DN3)-- blk_id_gs2 5. Now DN1 is restarted 6. In its block report, DN1 reported FINALIZED block blk_id_gs1; this should be marked corrupt, but since the NN has the appended block's state as UnderConstruction, it does not detect this block as corrupt and adds it to the valid block locations. As long as the namenode is alive, this datanode will be considered a valid replica, and read/append will fail on that datanode. -- This message was sent by Atlassian JIRA (v6.2#6252)