[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages
[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989348#comment-13989348 ]

Haohui Mai commented on HDFS-6293:
----------------------------------

Just to recap:
# The requirement is to build an offline tool that can process PB-based fsimages.
# The namespace is mostly a hierarchical structure, plus snapshots. That is exactly why the PB-based fsimage moved to record-based storage.
# It is useful to recover the hierarchical structure of the namespace in some use cases. Given the design choice in (2), any in-memory processing algorithm requires Theta(n) memory, where n is the number of inodes. That requires too many resources.
# Various solutions that leverage the resources of the SNN have been proposed.

Here are my two cents:
# Although the current set of tools loads the whole fsimage into memory and processes it there, there is no reason every offline tool has to be implemented that way. For example, building an index can cover the above use cases.
# With snapshots, the namespace is no longer a tree. Forcing it into a hierarchical structure sometimes means fitting square pegs into round holes.
# Note that the goal of the SBN / SNN is to improve the reliability of the system. The simpler the code, the more thoroughly it can be reasoned about, and the more reliable it becomes. Personally, I don't like any solution that adds complexity to the SBN / SNN to serve the use case of the offline image viewer. It doesn't seem right to solve an offline problem using an online machine that is accountable for the reliability of the system.

Issues with OIV processing PB-based fsimages
Key: HDFS-6293
URL: https://issues.apache.org/jira/browse/HDFS-6293
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Priority: Blocker
Attachments: HDFS-6293.000.patch, Heap Histogram.html

There are issues with OIV when processing fsimages in protobuf. Due to the internal layout changes introduced by the protobuf-based fsimage, OIV consumes an excessive amount of memory. We tested with an fsimage containing about 140M files/directories. The peak heap usage when processing this image in the pre-protobuf (i.e. pre-2.4.0) format was about 350MB. After converting the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of heap (max new size was 1GB). It should be possible to process any image with the default heap size of 1.5GB.

Another issue is the complete change of format/content in OIV's XML output. I also noticed that the secret manager section has no tokens, while there were unexpired tokens in the original (pre-2.4.0) image. I did not check whether they were also missing in the new PB fsimage.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6293) Issues with OIV processing PB-based fsimages
[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haohui Mai updated HDFS-6293:
-----------------------------
Attachment: HDFS-6293.000.patch

Issues with OIV processing PB-based fsimages
Key: HDFS-6293
URL: https://issues.apache.org/jira/browse/HDFS-6293
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Priority: Blocker
Attachments: HDFS-6293.000.patch, Heap Histogram.html
[jira] [Updated] (HDFS-6293) Issues with OIV processing PB-based fsimages
[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haohui Mai updated HDFS-6293:
-----------------------------
Assignee: Haohui Mai
Status: Patch Available (was: Open)

To demonstrate my points, I'm attaching a patch which stores the current PB-based fsimage into a LevelDB and performs lsr on top of the LevelDB. The conversion tool reads the whole {{INODE_DIR}} section into memory, then stores the JSON representation of each inode under the key {{IN || parent_id || localName}}. That way all children of a particular inode are co-located, so operations like lsr are efficient. The conversion tool takes 16 bytes * (number of inodes) to convert the fsimage; for an fsimage with 400M inodes, that is around 6.4G of memory, which can run on a commodity machine. The lsr tool itself only requires O(1) memory.

Issues with OIV processing PB-based fsimages
Key: HDFS-6293
URL: https://issues.apache.org/jira/browse/HDFS-6293
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Haohui Mai
Priority: Blocker
Attachments: HDFS-6293.000.patch, Heap Histogram.html
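The key layout described in the patch comment can be sketched in plain Java, with a sorted map standing in for LevelDB's lexicographically ordered key space. The key encoding, class name, and zero-padding scheme below are illustrative assumptions, not the patch's exact format:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: each inode is stored under "IN || parent_id || localName", so the
// sorted key order keeps all children of a directory adjacent, and an "lsr"
// of one directory becomes a single range scan.
public class InodeKeyLayout {
  // Zero-pad the parent id so lexicographic order matches numeric order
  // (an assumption here; a real tool would use a fixed-width binary encoding).
  static String key(long parentId, String localName) {
    return String.format("IN%020d/%s", parentId, localName);
  }

  public static void main(String[] args) {
    TreeMap<String, String> db = new TreeMap<>(); // stand-in for LevelDB
    db.put(key(1, "dirA"), "{...json...}");
    db.put(key(1, "file1"), "{...json...}");
    db.put(key(2, "nested"), "{...json...}");

    // Range scan over directory 1's prefix: only the iterator's state is
    // held in memory, which is why the lsr pass needs O(1) memory.
    String lo = key(1, ""), hi = key(1, "\uffff");
    for (Map.Entry<String, String> e : db.subMap(lo, true, hi, false).entrySet()) {
      System.out.println(e.getKey());
    }
  }
}
```

Only the conversion pass pays the in-memory cost: at the quoted 16 bytes of bookkeeping per inode, 400M inodes come to roughly 6.4GB, while the scan afterwards streams through the sorted keys.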
[jira] [Commented] (HDFS-6337) Setfacl testcase is failing due to dash character in username in TestAclCLI
[ https://issues.apache.org/jira/browse/HDFS-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989379#comment-13989379 ]

Vinayakumar B commented on HDFS-6337:
-------------------------------------
Patch looks good. +1

Setfacl testcase is failing due to dash character in username in TestAclCLI
Key: HDFS-6337
URL: https://issues.apache.org/jira/browse/HDFS-6337
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Attachments: HDFS-6337.patch

TestHDFSCLI is failing due to a '-' in the username. A similar fix was done in HDFS-5821, so the same fix should be applied to the setfacl case as well.
[jira] [Commented] (HDFS-5168) BlockPlacementPolicy does not work for cross node group dependencies
[ https://issues.apache.org/jira/browse/HDFS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989425#comment-13989425 ]

Hudson commented on HDFS-5168:
------------------------------
FAILURE: Integrated in Hadoop-Yarn-trunk #558 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/558/])
HDFS-5168. Add cross node dependency support to BlockPlacementPolicy. Contributed by Nikola Vujic (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1592179)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/DNSToSwitchMappingWithDependency.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/ScriptBasedMapping.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/ScriptBasedMappingWithDependency.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/TestScriptBasedMappingWithDependency.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/DatanodeInfo.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicy.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/Host2NodesMap.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithNodeGroup.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java

BlockPlacementPolicy does not work for cross node group dependencies
Key: HDFS-5168
URL: https://issues.apache.org/jira/browse/HDFS-5168
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Reporter: Nikola Vujic
Assignee: Nikola Vujic
Priority: Critical
Fix For: 2.5.0
Attachments: HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch

Block placement policies do not work for cross rack/node group dependencies. In practice, this is needed when compute servers and storage fall into two independent fault domains; in that case neither BlockPlacementPolicyDefault nor BlockPlacementPolicyWithNodeGroup is able to provide proper block placement. Suppose we have a Hadoop cluster with one rack and two servers, running 2 VMs per server. The node group topology for this cluster would be:

server1-vm1 - /d1/r1/n1
server1-vm2 - /d1/r1/n1
server2-vm1 - /d1/r1/n2
server2-vm2 - /d1/r1/n2

This works fine as long as server and storage fall into the same fault domain, but if storage is in a different fault domain from the server, we will not be able to handle that. For example, if the storage of server1-vm1 is in the same fault domain as the storage of server2-vm1, then we must not place two replicas on these two nodes even though they are in different node groups.

Two possible approaches:
- One approach would be to define cross rack/node group dependencies and use them when excluding nodes from the search space. This looks like the cleanest way to fix this, as it requires minor changes in the BlockPlacementPolicy classes.
- The other approach would be to allow nodes to fall into more than one node group. When we chose a node to hold a replica
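The first approach above (growing the exclusion set with cross-node-group dependencies) can be sketched as follows. The class and method names, and the host-to-dependents map, are hypothetical illustrations, not the HDFS-5168 API:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: given already-chosen replica locations and a dependency map
// (host -> hosts whose storage shares its fault domain), compute the set of
// nodes that must be excluded from the placement search space.
public class DependencyAwarePlacement {
  static Set<String> excludedFor(List<String> chosen, Map<String, Set<String>> deps) {
    Set<String> excluded = new HashSet<>(chosen);
    for (String node : chosen) {
      // Exclude every node dependent on an already-chosen node, even when
      // it sits in a different rack or node group.
      excluded.addAll(deps.getOrDefault(node, Collections.emptySet()));
    }
    return excluded;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> deps = new HashMap<>();
    // Per the example: server1-vm1's storage shares a fault domain with
    // server2-vm1, although they are in different node groups.
    deps.put("server1-vm1", Set.of("server2-vm1"));
    deps.put("server2-vm1", Set.of("server1-vm1"));

    Set<String> excluded = excludedFor(List.of("server1-vm1"), deps);
    System.out.println(excluded.contains("server2-vm1"));
  }
}
```

With the first replica on server1-vm1, server2-vm1 is excluded despite being in node group /d1/r1/n2, which is exactly the behavior the default policies cannot express.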
[jira] [Commented] (HDFS-6295) Add decommissioning state and node state filtering to dfsadmin
[ https://issues.apache.org/jira/browse/HDFS-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989422#comment-13989422 ]

Hudson commented on HDFS-6295:
------------------------------
FAILURE: Integrated in Hadoop-Yarn-trunk #558 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/558/])
HDFS-6295. Add decommissioning state and node state filtering to dfsadmin. Contributed by Andrew Wang. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1592438)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml

Add decommissioning state and node state filtering to dfsadmin
Key: HDFS-6295
URL: https://issues.apache.org/jira/browse/HDFS-6295
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Andrew Wang
Assignee: Andrew Wang
Fix For: 2.5.0
Attachments: hdfs-6295-1.patch, hdfs-6295-2.patch, hdfs-6295-3.patch

One of the few admin-friendly ways of viewing the list of decommissioning nodes is via hdfs dfsadmin -report. However, this lists *all* the datanodes in the cluster, which is prohibitive for large clusters, and also requires manual parsing to find the decommissioning status. It would be nicer if we could fetch and display only decommissioning nodes (or just live and dead nodes, for that matter).
[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages
[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989427#comment-13989427 ]

Hadoop QA commented on HDFS-6293:
---------------------------------
{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12643332/HDFS-6293.000.patch
against trunk revision.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6812//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6812//console

This message is automatically generated.

Issues with OIV processing PB-based fsimages
Key: HDFS-6293
URL: https://issues.apache.org/jira/browse/HDFS-6293
Reporter: Kihwal Lee
Assignee: Haohui Mai
Priority: Blocker
Attachments: HDFS-6293.000.patch, Heap Histogram.html
[jira] [Updated] (HDFS-6337) Setfacl testcase is failing due to dash character in username in TestAclCLI
[ https://issues.apache.org/jira/browse/HDFS-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated HDFS-6337:
--------------------------------------
Resolution: Fixed
Fix Version/s: 2.5.0
               3.0.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Thanks Vinay for the review! I have just committed this to trunk and branch-2.

Setfacl testcase is failing due to dash character in username in TestAclCLI
Key: HDFS-6337
URL: https://issues.apache.org/jira/browse/HDFS-6337
Fix For: 3.0.0, 2.5.0
Attachments: HDFS-6337.patch
[jira] [Commented] (HDFS-5168) BlockPlacementPolicy does not work for cross node group dependencies
[ https://issues.apache.org/jira/browse/HDFS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989452#comment-13989452 ]

Hudson commented on HDFS-5168:
------------------------------
FAILURE: Integrated in Hadoop-Hdfs-trunk #1749 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1749/])
HDFS-5168. Add cross node dependency support to BlockPlacementPolicy. Contributed by Nikola Vujic (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1592179)

BlockPlacementPolicy does not work for cross node group dependencies
Key: HDFS-5168
URL: https://issues.apache.org/jira/browse/HDFS-5168
Reporter: Nikola Vujic
Assignee: Nikola Vujic
Priority: Critical
Fix For: 2.5.0
[jira] [Commented] (HDFS-6295) Add decommissioning state and node state filtering to dfsadmin
[ https://issues.apache.org/jira/browse/HDFS-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989449#comment-13989449 ]

Hudson commented on HDFS-6295:
------------------------------
FAILURE: Integrated in Hadoop-Hdfs-trunk #1749 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1749/])
HDFS-6295. Add decommissioning state and node state filtering to dfsadmin. Contributed by Andrew Wang. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1592438)

Add decommissioning state and node state filtering to dfsadmin
Key: HDFS-6295
URL: https://issues.apache.org/jira/browse/HDFS-6295
Reporter: Andrew Wang
Assignee: Andrew Wang
Fix For: 2.5.0
[jira] [Updated] (HDFS-6319) Various syntax and style cleanups
[ https://issues.apache.org/jira/browse/HDFS-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charles Lamb updated HDFS-6319:
-------------------------------
Attachment: HDFS-6319.7.patch

Reattaching the patch file to try to get Jenkins to run the tests.

Various syntax and style cleanups
Key: HDFS-6319
URL: https://issues.apache.org/jira/browse/HDFS-6319
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Charles Lamb
Assignee: Charles Lamb
Attachments: HDFS-6319.1.patch, HDFS-6319.2.patch, HDFS-6319.3.patch, HDFS-6319.4.patch, HDFS-6319.6.patch, HDFS-6319.7.patch

Fix various style issues, e.g.:
* {{if(}}, {{while(}} (lack of a space after the keyword)
* extra whitespace and newlines
* {{if (...) return ...}} (lack of {}'s)
[jira] [Commented] (HDFS-6319) Various syntax and style cleanups
[ https://issues.apache.org/jira/browse/HDFS-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989478#comment-13989478 ]

Hadoop QA commented on HDFS-6319:
---------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12643357/HDFS-6319.7.patch
against trunk revision.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6814//console

This message is automatically generated.

Various syntax and style cleanups
Key: HDFS-6319
URL: https://issues.apache.org/jira/browse/HDFS-6319
Reporter: Charles Lamb
Assignee: Charles Lamb
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989495#comment-13989495 ]

Binglin Chang commented on HDFS-6250:
-------------------------------------
Thanks for the analysis and patch [~airbots]. The fix makes sense; here are some additional concerns:

bq. HDFS creates a /system/balancer.id file (30B) to track the balancer

It looks like the file contains the hostname, whose size is not fixed. I see you increased the block size and capacity to minimize the impact of the file, but it seems the risk is still there.

testBalancerWithRackLocality tests that the balancer does not perform cross-rack block movements in the test scenario; here are the related balancer logs:

{code}
2014-04-15 18:29:48,649 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: []
2014-04-15 18:29:48,650 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:54333, utilization=30.0], Source[127.0.0.1:46174, utilization=30.0]]
2014-04-15 18:29:48,650 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: []
2014-04-15 18:29:48,650 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=0.0]]
2014-04-15 18:29:51,722 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: []
2014-04-15 18:29:51,722 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:54333, utilization=30.168], Source[127.0.0.1:46174, utilization=30.332]]
2014-04-15 18:29:51,722 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: []
2014-04-15 18:29:51,722 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=1.8333]]
2014-04-15 18:29:54,820 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: []
2014-04-15 18:29:54,820 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:54333, utilization=28.5], Source[127.0.0.1:46174, utilization=30.332]]
2014-04-15 18:29:54,820 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: []
2014-04-15 18:29:54,820 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=5.0]]
2014-04-15 18:29:57,898 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: []
2014-04-15 18:29:57,898 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:46174, utilization=30.332], Source[127.0.0.1:54333, utilization=25.332]]
2014-04-15 18:29:57,899 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: []
2014-04-15 18:29:57,899 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=7.667]]
2014-04-15 18:30:00,933 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: []
2014-04-15 18:30:00,933 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:54333, utilization=22.668], Source[127.0.0.1:46174, utilization=30.332]]
2014-04-15 18:30:00,933 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: []
2014-04-15 18:30:00,933 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=10.5]]
2014-04-15 18:30:03,989 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: []
2014-04-15 18:30:03,989 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 above-average: [Source[127.0.0.1:46174, utilization=30.332]]
2014-04-15 18:30:03,989 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 below-average: [BalancerDatanode[127.0.0.1:54333, utilization=19.832], BalancerDatanode[127.0.0.1:48293, utilization=12.0]]
2014-04-15 18:30:03,989 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 underutilized: []
{code}

I guess the test intended to keep /rack0/NODEGROUP0/dn above-average (=30%) but not over-utilized (>30%, considering avg utilization=20%), so blocks on rack0 never move to rack1; but another balancer.id file may break that assumption. So there is a problem inherent in the test, not just a race condition or timeout issue. We may need to change the test (e.g. file size, utilization rate, validation method) to prevent those corner cases.

TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
Key: HDFS-6250
URL: https://issues.apache.org/jira/browse/HDFS-6250
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Chen He
Attachments:
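The bucket labels in the log above come from comparing each datanode's utilization against the cluster average plus or minus a threshold (10% by default). A minimal sketch of that classification, using the utilizations at the start of the test scenario (two nodes at 30%, one at 0%, so the average is 20%); the method names and exact boundary handling are illustrative, not the Balancer's code:

```java
import java.util.Arrays;

// Sketch: classify each datanode's utilization relative to the cluster
// average, the way the balancer's logNodes output buckets them.
public class BalancerBuckets {
  static String classify(double util, double avg, double threshold) {
    if (util > avg + threshold) return "over-utilized";
    if (util > avg) return "above-average";
    if (util >= avg - threshold) return "below-average";
    return "underutilized";
  }

  public static void main(String[] args) {
    double[] utils = {30.0, 30.0, 0.0};
    double avg = Arrays.stream(utils).average().orElse(0); // 20.0 here
    double threshold = 10.0; // balancer default threshold
    for (double u : utils) {
      System.out.println(u + " -> " + classify(u, avg, threshold));
    }
  }
}
```

With avg=20% and threshold=10%, a node at exactly 30% is above-average but not over-utilized, so no block ever has to leave rack0; any extra bytes (such as a larger balancer.id file) can push a node past 30% and break that delicate balance, which is the inherent fragility the comment points out.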
[jira] [Updated] (HDFS-6319) Various syntax and style cleanups
[ https://issues.apache.org/jira/browse/HDFS-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charles Lamb updated HDFS-6319:
-------------------------------
Attachment: HDFS-6319.8.patch

Rebased.

Various syntax and style cleanups
Key: HDFS-6319
URL: https://issues.apache.org/jira/browse/HDFS-6319
Attachments: HDFS-6319.1.patch, HDFS-6319.2.patch, HDFS-6319.3.patch, HDFS-6319.4.patch, HDFS-6319.6.patch, HDFS-6319.7.patch, HDFS-6319.8.patch
[jira] [Commented] (HDFS-6301) NameNode: persist XAttrs in fsimage and record XAttrs modifications to edit log.
[ https://issues.apache.org/jira/browse/HDFS-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989520#comment-13989520 ] Uma Maheswara Rao G commented on HDFS-6301: --- +1 on the latest patch. Thanks for the reviews Andrew and Charles. I will file a separate JIRA for the OP_SET_XATTRS optimization mentioned above. NameNode: persist XAttrs in fsimage and record XAttrs modifications to edit log. Key: HDFS-6301 URL: https://issues.apache.org/jira/browse/HDFS-6301 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS XAttrs (HDFS-2006) Reporter: Yi Liu Assignee: Yi Liu Fix For: HDFS XAttrs (HDFS-2006) Attachments: HDFS-6301.1.patch, HDFS-6301.patch Store XAttrs in fsimage so that XAttrs are retained across NameNode restarts. Implement a new edit log opcode, {{OP_SET_XATTRS}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-6301) NameNode: persist XAttrs in fsimage and record XAttrs modifications to edit log.
[ https://issues.apache.org/jira/browse/HDFS-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-6301. --- Resolution: Fixed Hadoop Flags: Reviewed I have committed the patch to branch! NameNode: persist XAttrs in fsimage and record XAttrs modifications to edit log. Key: HDFS-6301 URL: https://issues.apache.org/jira/browse/HDFS-6301 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS XAttrs (HDFS-2006) Reporter: Yi Liu Assignee: Yi Liu Fix For: HDFS XAttrs (HDFS-2006) Attachments: HDFS-6301.1.patch, HDFS-6301.patch Store XAttrs in fsimage so that XAttrs are retained across NameNode restarts. Implement a new edit log opcode, {{OP_SET_XATTRS}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6340) DN can't finalize upgrade
[ https://issues.apache.org/jira/browse/HDFS-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6340: - Priority: Blocker (was: Major) Target Version/s: 2.4.1 DN can't finalize upgrade - Key: HDFS-6340 URL: https://issues.apache.org/jira/browse/HDFS-6340 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: Rahul Singhal Priority: Blocker Attachments: HDFS-6340-branch-2.4.0.patch I upgraded a (NN) HA cluster from 2.2.0 to 2.4.0. After I issued the '-finalizeUpgrade' command, NN was able to finalize the upgrade but DN couldn't (I waited for the next block report). I think I have found the problem to be due to HDFS-5153. I will attach a proposed fix. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages
[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989540#comment-13989540 ] Kihwal Lee commented on HDFS-6293: -- bq. To demonstrate my points, I'm attaching a patch which stores the current PB-based fsimage into a LevelDB, and perform lsr on top of the LevelDB. That was the first thing I thought about doing, but the processing time matters too. Issues with OIV processing PB-based fsimages Key: HDFS-6293 URL: https://issues.apache.org/jira/browse/HDFS-6293 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Haohui Mai Priority: Blocker Attachments: HDFS-6293.000.patch, Heap Histogram.html There are issues with OIV when processing fsimages in protobuf. Due to the internal layout changes introduced by the protobuf-based fsimage, OIV consumes an excessive amount of memory. We have tested with an fsimage with about 140M files/directories. The peak heap usage when processing this image in pre-protobuf (i.e. pre-2.4.0) format was about 350MB. After converting the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of heap (max new size was 1GB). It should be possible to process any image with the default heap size of 1.5GB. Another issue is the complete change of format/content in OIV's XML output. I also noticed that the secret manager section has no tokens while there were unexpired tokens in the original image (pre-2.4.0). I did not check whether they were also missing in the new pb fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
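The index-based alternative discussed in the thread (dump the inode records into a sorted store, then recover the hierarchy by range scans instead of materializing the whole tree on the heap) can be sketched roughly as follows. This is a hypothetical illustration: a TreeMap stands in for an on-disk store such as LevelDB, and the (parentId, name) key layout is assumed for the example, not the actual fsimage schema.

```java
import java.util.*;

class FsImageIndex {
  static final long ROOT_ID = 1;
  // Sorted (parentId, name) -> child inode id. With zero-padded parent ids,
  // all children of one directory form a contiguous key range; on disk this
  // would be a LevelDB range scan instead of an in-heap TreeMap.
  private final TreeMap<String, Long> byParent = new TreeMap<>();

  private static String key(long parentId, String name) {
    return String.format("%019d/%s", parentId, name);
  }

  void addInode(long parentId, String name, long id) {
    byParent.put(key(parentId, name), id);
  }

  // Depth-first listing (an "lsr"); with an external store the working set
  // is proportional to directory depth, not to the number of inodes.
  List<String> lsr() {
    List<String> out = new ArrayList<>();
    walk(ROOT_ID, "", out);
    return out;
  }

  private void walk(long dirId, String path, List<String> out) {
    String lo = String.format("%019d/", dirId);
    for (Map.Entry<String, Long> e
         : byParent.subMap(lo, lo + Character.MAX_VALUE).entrySet()) {
      String child = path + "/" + e.getKey().substring(lo.length());
      out.add(child);
      walk(e.getValue(), child, out);
    }
  }
}
```

The processing-time concern raised in the comment is the trade-off: the scan-based lsr pays for on-disk lookups what the in-memory approach pays for in heap.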
[jira] [Commented] (HDFS-6295) Add decommissioning state and node state filtering to dfsadmin
[ https://issues.apache.org/jira/browse/HDFS-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989569#comment-13989569 ] Hudson commented on HDFS-6295: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1775 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1775/]) HDFS-6295. Add decommissioning state and node state filtering to dfsadmin. Contributed by Andrew Wang. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1592438) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml Add decommissioning state and node state filtering to dfsadmin Key: HDFS-6295 URL: https://issues.apache.org/jira/browse/HDFS-6295 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Fix For: 2.5.0 Attachments: hdfs-6295-1.patch, hdfs-6295-2.patch, hdfs-6295-3.patch One of the few admin-friendly ways of viewing the list of decommissioning nodes is via hdfs dfsadmin -report. However, this lists *all* the datanodes on the cluster, which is prohibitive for large clusters, and also requires manual parsing to look at the decom status. 
It'd be nicer if we could fetch and display only decommissioning nodes (or just live and dead nodes for that matter). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5168) BlockPlacementPolicy does not work for cross node group dependencies
[ https://issues.apache.org/jira/browse/HDFS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989572#comment-13989572 ] Hudson commented on HDFS-5168: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1775 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1775/]) HDFS-5168. Add cross node dependency support to BlockPlacementPolicy. Contributed by Nikola Vujic (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1592179) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/DNSToSwitchMappingWithDependency.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/ScriptBasedMapping.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/ScriptBasedMappingWithDependency.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/TestScriptBasedMappingWithDependency.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/DatanodeInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicy.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/Host2NodesMap.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithNodeGroup.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java BlockPlacementPolicy does not work for cross node group dependencies Key: HDFS-5168 URL: https://issues.apache.org/jira/browse/HDFS-5168 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Nikola Vujic Assignee: Nikola Vujic Priority: Critical Fix For: 2.5.0 Attachments: HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch Block placement policies do not work for cross rack/node group dependencies. In reality this is needed when compute servers and storage fall in two independent fault domains, then both BlockPlacementPolicyDefault and BlockPlacementPolicyWithNodeGroup are not able to provide proper block placement. Let's suppose that we have Hadoop cluster with one rack with two servers, and we run 2 VMs per server. 
Node group topology for this cluster would be: server1-vm1 - /d1/r1/n1 server1-vm2 - /d1/r1/n1 server2-vm1 - /d1/r1/n2 server2-vm2 - /d1/r1/n2 This is working fine as long as server and storage fall into the same fault domain, but if storage is in a different fault domain from the server, we will not be able to handle that. For example, if storage of server1-vm1 is in the same fault domain as storage of server2-vm1, then we must not place two replicas on these two nodes although they are in different node groups. Two possible approaches: - One approach would be to define cross rack/node group dependencies and to use them when excluding nodes from the search space. This looks like the cleanest way to fix this, as it requires only minor changes in the BlockPlacementPolicy classes. - The other approach would be to allow nodes to fall into more than one node group. When we choose a node to hold
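The first approach above — declare cross node group dependencies and use them to prune the search space — might look roughly like this. The symmetric map-based topology, class name and node names are illustrative only, not the HDFS-5168 implementation.

```java
import java.util.*;

// Sketch of the dependency-exclusion idea: once a replica lands on a node,
// exclude not only that node but every node whose storage shares a fault
// domain with it (e.g. server1-vm1 and server2-vm1 in the example above).
class DependencyExcluder {
  private final Map<String, Set<String>> deps = new HashMap<>();

  // Dependencies are symmetric: if a's storage shares a fault domain with
  // b's, neither may receive a replica once the other holds one.
  void addDependency(String a, String b) {
    deps.computeIfAbsent(a, k -> new HashSet<>()).add(b);
    deps.computeIfAbsent(b, k -> new HashSet<>()).add(a);
  }

  // Nodes to remove from the placement search space after choosing 'node'.
  Set<String> excludedAfterChoosing(String node) {
    Set<String> out = new HashSet<>(deps.getOrDefault(node, Collections.emptySet()));
    out.add(node);
    return out;
  }
}
```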
[jira] [Commented] (HDFS-6337) Setfacl testcase is failing due to dash character in username in TestAclCLI
[ https://issues.apache.org/jira/browse/HDFS-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989574#comment-13989574 ] Hudson commented on HDFS-6337: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1775 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1775/]) HDFS-6337. Setfacl testcase is failing due to dash character in username in TestAclCLI. Contributed by Uma Maheswara Rao G. (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1592489) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testAclCLI.xml Setfacl testcase is failing due to dash character in username in TestAclCLI --- Key: HDFS-6337 URL: https://issues.apache.org/jira/browse/HDFS-6337 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6337.patch TestHDFSCLI is failing due to a '-' in username. I have seen the similar fix done in HDFS-5821. So, same fix should be done for setfacl case as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6339) DN, SNN &amp; JN can't rollback data
[ https://issues.apache.org/jira/browse/HDFS-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989578#comment-13989578 ] Kihwal Lee commented on HDFS-6339: -- This is expected. Upgrade/rollback with HA was not supported until 2.4. So if you are rolling back from 2.4 HA to a previous version, some manual steps are needed. In this case, rollback needs to be done with HA off. It means that the shared edits need to be copied to non-HA name.dir or edits.dir depending on your config. If the HA NN was configured to also store edits locally, it is a bit easier. After successfully rolling back, HA can be re-enabled by initializing shared edits dir and bootstrapping standby. 2NN does not have to persist any state, so you can safely delete the temporary files. bq. I fixed this by deleting the JN data directory. I assume the NN had all edits locally. Otherwise there can be data loss. Other than this, your procedure seems okay. In the future, please use mailing lists for inquiries of this kind. DN, SNN &amp; JN can't rollback data Key: HDFS-6339 URL: https://issues.apache.org/jira/browse/HDFS-6339 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Affects Versions: 2.2.0 Reporter: Rahul Singhal I tried rollback from 2.4.0 to 2.2.0 and noticed that DN, SNN and JN couldn't perform rollback. I started with a (NN) HA cluster on 2.2.0 and upgraded it to 2.4.0 with HA enabled. Then attempted a rollback to 2.2.0. I first configured my cluster to non-HA and started it on 2.2.0. I started NN and DN with the '-rollback' startup option. (There is no explicit startup option for SNN and JN like NN and DN). Only NN was able to rollback correctly. My fixes: I fixed the DN rollback problem by cherry-picking the fix from HDFS-5526. I fixed the SNN rollback problem by starting it with the '-format' option. I then proceeded to convert the non-HA cluster to an HA cluster. The first step after the configuration change was to start the JNs.
But they also couldn't rollback. My fix: I fixed this by deleting the JN data directory. (deleting the 'current' directory and renaming 'previous' to 'current' didn't fix the rollback) My purpose for filing this bug is to: 1. Ask if these problems are known and intended to be fixed in any future releases. If yes, which one? DN rollback was fixed in 2.3.0 but what about 2.2.x series? JN rollback seems (not confirmed) to have been fixed in 2.4.0. 2. Confirm that my fixes are correct. If not, please help me with an appropriate fix. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HDFS-6339) DN, SNN &amp; JN can't rollback data
[ https://issues.apache.org/jira/browse/HDFS-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989578#comment-13989578 ] Kihwal Lee edited comment on HDFS-6339 at 5/5/14 2:54 PM: -- This is expected. Upgrade/rollback with HA was not supported until 2.4. So If you are rolling back from 2.4 HA to a previous version, some manual steps are needed. In this case, rollback needs to be done with HA off. It means that the shared edits need to be copied to non-HA name.dir or edits.dir depending on your config. If the HA NN was configured to also store edits locally, it is a bit easier. After successfully rolling back, HA can be re-enabled by initializing shared edits dir and bootstrapping standby. 2NN does not have to persist any state, so you can safely delete the temporary files. bq. I fixed this by deleting the JN data directory. I assume the NN had all edits locally. Otherwise there can be data loss. Other than this, your procedure seems okay. In the future, please use mailing lists for the inquiries of this kind. was (Author: kihwal): This is expected. Upgrade/rollback with HA was not supported until 2.4. So If you are rolling back from 2.4 HA to a previous version, some manual steps are needed. In this case, rollback needs to be done with HA off. It means that the shared edits need to be copied to non-HA name.dir or edits.dir depending on your config. If the HA NN was configured to also store edits locally, it is a bit easier. After successfully rolling back, HA can be re-enabled by initializing shared edits dir and bootstrapping standby. 2NN does not have to persist any state, so you can safely delete the temporary files. bq. I fixed this by deleting the JN data directory. I assume the NN had all edits locally. Otherwise there ca be data loss. Other than this, your procedure seems okay. In the future, please use mailing lists for the inquiries of this kind. 
DN, SNN JN can't rollback data Key: HDFS-6339 URL: https://issues.apache.org/jira/browse/HDFS-6339 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Affects Versions: 2.2.0 Reporter: Rahul Singhal I tried rollback from 2.4.0 to 2.2.0 and noticed that DN, SNN and JN couldn't perform rollback. I started with a (NN) HA cluster on 2.2.0 and upgraded it to 2.4.0 with HA enabled. Then attempted a rollback to 2.2.0. I first configured my cluster to non-HA and started it on 2.2.0. I started NN DN with the '-rollback' startup option. (There is no explicit startup option for SNN JN like NN DN). Only NN was able to rollback correctly. My fixes: I fixed the DN rollback problem by cherry-picking the fix from HDFS-5526. I fixed the SNN rollback problem by starting it with '-format' option. I then proceeded to converting the non-HA cluster to a HA cluster. The first step after configuration change was to start the JNs. But they also couldn't rollback. My fix: I fixed this by deleting the JN data directory. (deleting the 'current' directory and renaming 'previous' to 'current' didn't fix the rollback) My purpose for filing this bug is to: 1. Ask if these problems are known and intended to be fixed in any future releases. If yes, which one? DN rollback was fixed in 2.3.0 but what about 2.2.x series? JN rollback seems (not confirmed) to have been fixed in 2.4.0. 2. Confirm that my fixes are correct. If not, please help me with an appropriate fix. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-6339) DN, SNN &amp; JN can't rollback data
[ https://issues.apache.org/jira/browse/HDFS-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee resolved HDFS-6339. -- Resolution: Done DN, SNN JN can't rollback data Key: HDFS-6339 URL: https://issues.apache.org/jira/browse/HDFS-6339 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Affects Versions: 2.2.0 Reporter: Rahul Singhal I tried rollback from 2.4.0 to 2.2.0 and noticed that DN, SNN and JN couldn't perform rollback. I started with a (NN) HA cluster on 2.2.0 and upgraded it to 2.4.0 with HA enabled. Then attempted a rollback to 2.2.0. I first configured my cluster to non-HA and started it on 2.2.0. I started NN DN with the '-rollback' startup option. (There is no explicit startup option for SNN JN like NN DN). Only NN was able to rollback correctly. My fixes: I fixed the DN rollback problem by cherry-picking the fix from HDFS-5526. I fixed the SNN rollback problem by starting it with '-format' option. I then proceeded to converting the non-HA cluster to a HA cluster. The first step after configuration change was to start the JNs. But they also couldn't rollback. My fix: I fixed this by deleting the JN data directory. (deleting the 'current' directory and renaming 'previous' to 'current' didn't fix the rollback) My purpose for filing this bug is to: 1. Ask if these problems are known and intended to be fixed in any future releases. If yes, which one? DN rollback was fixed in 2.3.0 but what about 2.2.x series? JN rollback seems (not confirmed) to have been fixed in 2.4.0. 2. Confirm that my fixes are correct. If not, please help me with an appropriate fix. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6335) TestOfflineEditsViewer for XAttr
[ https://issues.apache.org/jira/browse/HDFS-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-6335: - Attachment: HDFS-6335.1.patch Thanks Charles for your review. I just updated the patch. TestOfflineEditsViewer for XAttr Key: HDFS-6335 URL: https://issues.apache.org/jira/browse/HDFS-6335 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: HDFS XAttrs (HDFS-2006) Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Fix For: HDFS XAttrs (HDFS-2006) Attachments: HDFS-6335.1.patch, HDFS-6335.patch, editsStored TestOfflineEditsViewer for XAttr, and also need update for editsStored. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6342) TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge
Chen He created HDFS-6342: - Summary: TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge Key: HDFS-6342 URL: https://issues.apache.org/jira/browse/HDFS-6342 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen He Assignee: Chen He The testBalancerWithRackLocality method tests the balancer moving data blocks with rack locality taken into account. It creates a two-node cluster: one node belongs to rack0/nodeGroup0, the other node belongs to rack1/nodeGroup1. In this two-datanode minicluster, the block size is 10B and the total cluster capacity is 6000B (3000B on each datanode). It creates 180 data blocks with replication factor 2. Then a new datanode is created (in rack1/nodeGroup2) and the balancer starts balancing the cluster. The test expects data blocks to move only within rack1, and after the balancer is done it assumes the data size on both racks is the same. This breaks if the balancer.id file is huge and there is inter-rack data block movement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989598#comment-13989598 ] Chen He commented on HDFS-6250: --- Thank you for your comments, [~decster]. I agree with you that the balancer.id file can bring more problems. There are two ways to reduce the side-effect of the balancer.id file in this test method. 1) increasing the block size to reduce the impact of balancer.id file( this is what I did); 2) introduce two new nodes, one in rack0 and one in rack1. The failure in this JIRA is because of the balancer.id file's block (blk_181) that should be deleted. We have to wait until that block is deleted. I created a sub-task HDFS-6342 to redesign this test method. TestBalancerWithNodeGroup.testBalancerWithRackLocality fails Key: HDFS-6250 URL: https://issues.apache.org/jira/browse/HDFS-6250 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6250-v2.patch, HDFS-6250.patch, test_log.txt It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ {panel} java.lang.AssertionError: expected:1800 but was:1810 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
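The numbers in the assertion failure line up with the test's parameters. The following is a back-of-the-envelope reconstruction of that arithmetic (assumed from the descriptions in this thread, not the actual test code):

```java
// Hypothetical reconstruction of the arithmetic behind "expected:1800 but
// was:1810": 180 blocks x 10 B each with replication factor 2 and exactly
// one replica per rack puts 1800 B on each rack; a single stray 10 B block
// (the balancer.id file, blk_181) moved across racks yields 1810.
class RackBytes {
  static long bytesPerRack(int blocks, int blockSizeBytes) {
    // one replica of every block lands on each of the two racks
    return (long) blocks * blockSizeBytes;
  }
}
```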
[jira] [Commented] (HDFS-6338) Add a RPC method to allow administrator to delete the file lease.
[ https://issues.apache.org/jira/browse/HDFS-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989603#comment-13989603 ] Kihwal Lee commented on HDFS-6338: -- I think you can achieve what you want with recoverLease(). Leases cannot be simply deleted without actually finalizing the last block replicas and closing the file. NN does it for you, but it usually takes several seconds for recovering the last block. If you don't care about the old content and want to create a new file with the same name, simply delete the old file. The lease will be deleted along with the file. Add a RPC method to allow administrator to delete the file lease. - Key: HDFS-6338 URL: https://issues.apache.org/jira/browse/HDFS-6338 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 2.4.0 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor We have to wait for the file lease to expire after an unexpected interrupt during an HDFS write, so I want to add an RPC method to allow the administrator to delete the file lease. Please leave comments here; I am working on the patch now. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6342) TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge
[ https://issues.apache.org/jira/browse/HDFS-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989604#comment-13989604 ] Chen He commented on HDFS-6342: --- If we create two new nodes, one in rack0 and one in rack1, it can avoid inter-rack data transfer even if the balancer.id file is huge. TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge --- Key: HDFS-6342 URL: https://issues.apache.org/jira/browse/HDFS-6342 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen He Assignee: Chen He The testBalancerWithRackLocality method tests the balancer moving data blocks with rack locality taken into account. It creates a two-node cluster: one node belongs to rack0/nodeGroup0, the other node belongs to rack1/nodeGroup1. In this two-datanode minicluster, the block size is 10B and the total cluster capacity is 6000B (3000B on each datanode). It creates 180 data blocks with replication factor 2. Then a new datanode is created (in rack1/nodeGroup2) and the balancer starts balancing the cluster. The test expects data blocks to move only within rack1, and after the balancer is done it assumes the data size on both racks is the same. This breaks if the balancer.id file is huge and there is inter-rack data block movement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6335) TestOfflineEditsViewer for XAttr
[ https://issues.apache.org/jira/browse/HDFS-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989610#comment-13989610 ] Uma Maheswara Rao G commented on HDFS-6335: --- +1 on the latest patch. Thanks for the review Charles. I have run the tests in my env and they passed: {noformat} --- T E S T S --- Running org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.038 sec - in org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer Results : Tests run: 3, Failures: 0, Errors: 0, Skipped: 0 {noformat} TestOfflineEditsViewer for XAttr Key: HDFS-6335 URL: https://issues.apache.org/jira/browse/HDFS-6335 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: HDFS XAttrs (HDFS-2006) Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Fix For: HDFS XAttrs (HDFS-2006) Attachments: HDFS-6335.1.patch, HDFS-6335.patch, editsStored TestOfflineEditsViewer for XAttr, and also need update for editsStored. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-6335) TestOfflineEditsViewer for XAttr
[ https://issues.apache.org/jira/browse/HDFS-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-6335. --- Resolution: Fixed Hadoop Flags: Reviewed I have just committed this to branch TestOfflineEditsViewer for XAttr Key: HDFS-6335 URL: https://issues.apache.org/jira/browse/HDFS-6335 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: HDFS XAttrs (HDFS-2006) Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Fix For: HDFS XAttrs (HDFS-2006) Attachments: HDFS-6335.1.patch, HDFS-6335.patch, editsStored TestOfflineEditsViewer for XAttr, and also need update for editsStored. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6298) XML based End-to-End test for getfattr and setfattr commands
[ https://issues.apache.org/jira/browse/HDFS-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-6298: - Attachment: HDFS-6298.2.patch Thanks Uma for review. The new patch includes update for your comments. XML based End-to-End test for getfattr and setfattr commands Key: HDFS-6298 URL: https://issues.apache.org/jira/browse/HDFS-6298 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: HDFS XAttrs (HDFS-2006) Reporter: Uma Maheswara Rao G Assignee: Yi Liu Fix For: HDFS XAttrs (HDFS-2006) Attachments: HDFS-6298.1.patch, HDFS-6298.2.patch, HDFS-6298.patch This JIRA to add test cases with CLI -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6298) XML based End-to-End test for getfattr and setfattr commands
[ https://issues.apache.org/jira/browse/HDFS-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989616#comment-13989616 ] Yi Liu commented on HDFS-6298: -- New end to end test is added: setfattr : Add an xattr which has wrong prefix XML based End-to-End test for getfattr and setfattr commands Key: HDFS-6298 URL: https://issues.apache.org/jira/browse/HDFS-6298 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: HDFS XAttrs (HDFS-2006) Reporter: Uma Maheswara Rao G Assignee: Yi Liu Fix For: HDFS XAttrs (HDFS-2006) Attachments: HDFS-6298.1.patch, HDFS-6298.2.patch, HDFS-6298.patch This JIRA to add test cases with CLI -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6298) XML based End-to-End test for getfattr and setfattr commands
[ https://issues.apache.org/jira/browse/HDFS-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989662#comment-13989662 ] Uma Maheswara Rao G commented on HDFS-6298: --- Thanks a lot Yi for the update on patch. Actually my intention on test cases is for dir/file permissions based test cases. Ex: If a user does not have permission for a file, then he should not be able to set the xattrs for that file. Also I suggest to add tests for each namespace specified by user API. We allow only 2 namespaces from user API. XML based End-to-End test for getfattr and setfattr commands Key: HDFS-6298 URL: https://issues.apache.org/jira/browse/HDFS-6298 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: HDFS XAttrs (HDFS-2006) Reporter: Uma Maheswara Rao G Assignee: Yi Liu Fix For: HDFS XAttrs (HDFS-2006) Attachments: HDFS-6298.1.patch, HDFS-6298.2.patch, HDFS-6298.patch This JIRA to add test cases with CLI -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5522) Datanode disk error check may be incorrectly skipped
[ https://issues.apache.org/jira/browse/HDFS-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-5522: - Attachment: HDFS-5522.patch In this patch, I have created a new thread which checks for disk errors every 5 seconds whenever a disk error check has been requested. Datanode disk error check may be incorrectly skipped Key: HDFS-5522 URL: https://issues.apache.org/jira/browse/HDFS-5522 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.9, 2.2.0 Reporter: Kihwal Lee Assignee: Rushabh S Shah Attachments: HDFS-5522.patch After HDFS-4581 and HDFS-4699, {{checkDiskError()}} is not called when network errors occur during processing data node requests. This appears to create problems when a disk is having problems, but not failing I/O soon. If I/O hangs for a long time, network read/write may timeout first and the peer may close the connection. Although the error was caused by a faulty local disk, disk check is not being carried out in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
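The coalescing behaviour described in the patch summary — request-driven checks, at most one actual scan per interval — can be sketched as follows. This is a hypothetical illustration of the idea, not the attached HDFS-5522.patch; the class and method names are made up.

```java
// Hypothetical sketch: request paths that hit I/O errors flag that a disk
// check is wanted, and a dedicated checker thread polls this object,
// running at most one real scan per interval so bursts of errors coalesce
// into a single disk check.
class DiskCheckScheduler {
  private final long intervalMs;
  private long lastCheckMs = -1;    // -1: no check has run yet
  private boolean requested = false;

  DiskCheckScheduler(long intervalMs) { this.intervalMs = intervalMs; }

  // Called from request-handling paths when an I/O error is observed.
  synchronized void requestCheck() { requested = true; }

  // Polled by the checker thread; true means "scan the disks now".
  synchronized boolean shouldCheckNow(long nowMs) {
    if (!requested) return false;
    if (lastCheckMs >= 0 && nowMs - lastCheckMs < intervalMs) return false;
    requested = false;
    lastCheckMs = nowMs;
    return true;
  }
}
```

Decoupling the check from the request path keeps a slow disk scan from blocking DataXceiver threads, which is the motivation for moving the work into its own thread.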
[jira] [Updated] (HDFS-5522) Datanode disk error check may be incorrectly skipped
[ https://issues.apache.org/jira/browse/HDFS-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-5522: - Target Version/s: 2.5.0 Status: Patch Available (was: Open) Datanode disk error check may be incorrectly skipped Key: HDFS-5522 URL: https://issues.apache.org/jira/browse/HDFS-5522 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0, 0.23.9 Reporter: Kihwal Lee Assignee: Rushabh S Shah Attachments: HDFS-5522.patch After HDFS-4581 and HDFS-4699, {{checkDiskError()}} is not called when network errors occur during processing data node requests. This appears to create problems when a disk is having problems, but not failing I/O soon. If I/O hangs for a long time, network read/write may timeout first and the peer may close the connection. Although the error was caused by a faulty local disk, disk check is not being carried out in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6334) Client failover proxy provider for IP failover based NN HA
[ https://issues.apache.org/jira/browse/HDFS-6334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6334: - Attachment: HDFS-6334.patch Client failover proxy provider for IP failover based NN HA -- Key: HDFS-6334 URL: https://issues.apache.org/jira/browse/HDFS-6334 Project: Hadoop HDFS Issue Type: Improvement Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-6334.patch With RPCv9 and improvements in the SPNEGO auth handling, it is possible to set up a pair of HA namenodes utilizing IP failover as client-request fencing mechanism. This jira will make it possible for HA to be configured without requiring use of logical URI and provide a simple IP failover proxy provider. The change will allow any old implementation of {{FailoverProxyProvider}} to continue to work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6334) Client failover proxy provider for IP failover based NN HA
[ https://issues.apache.org/jira/browse/HDFS-6334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6334: - Status: Patch Available (was: Open) Client failover proxy provider for IP failover based NN HA -- Key: HDFS-6334 URL: https://issues.apache.org/jira/browse/HDFS-6334 Project: Hadoop HDFS Issue Type: Improvement Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-6334.patch With RPCv9 and improvements in the SPNEGO auth handling, it is possible to set up a pair of HA namenodes utilizing IP failover as client-request fencing mechanism. This jira will make it possible for HA to be configured without requiring use of logical URI and provide a simple IP failover proxy provider. The change will allow any old implementation of {{FailoverProxyProvider}} to continue to work. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6342) TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge
[ https://issues.apache.org/jira/browse/HDFS-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HDFS-6342: -- Target Version/s: 3.0.0 Status: Patch Available (was: Open) TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge --- Key: HDFS-6342 URL: https://issues.apache.org/jira/browse/HDFS-6342 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen He Assignee: Chen He Attachments: HDFS-6342.patch The testBalancerWithRackLocality method tests the balancer moving data blocks with rack locality taken into account. It creates a two-node cluster: one node belongs to rack0/nodeGroup0, the other node belongs to rack1/nodeGroup1. In this 2-datanode minicluster, the block size is 10B and the total cluster capacity is 6000B (3000B on each datanode). It creates 180 data blocks with replication factor 2. Then, a new datanode is created (in rack1/nodeGroup2) and the balancer starts balancing the cluster. It expects data blocks to move only within rack1. After the balancer is done, it assumes the data size on both racks is the same. This will break if the balancer.id file is huge and there is inter-rack data block movement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6342) TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge
[ https://issues.apache.org/jira/browse/HDFS-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HDFS-6342: -- Attachment: HDFS-6342.patch Ran the test 40 times with no errors reported. TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge --- Key: HDFS-6342 URL: https://issues.apache.org/jira/browse/HDFS-6342 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen He Assignee: Chen He Attachments: HDFS-6342.patch The testBalancerWithRackLocality method tests the balancer moving data blocks with rack locality taken into account. It creates a two-node cluster: one node belongs to rack0/nodeGroup0, the other node belongs to rack1/nodeGroup1. In this 2-datanode minicluster, the block size is 10B and the total cluster capacity is 6000B (3000B on each datanode). It creates 180 data blocks with replication factor 2. Then, a new datanode is created (in rack1/nodeGroup2) and the balancer starts balancing the cluster. It expects data blocks to move only within rack1. After the balancer is done, it assumes the data size on both racks is the same. This will break if the balancer.id file is huge and there is inter-rack data block movement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI
[ https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6230: Attachment: HDFS-6230-UpgradeInProgress.jpg HDFS-6230-NoUpgradesInProgress.png Expose upgrade status through NameNode web UI - Key: HDFS-6230 URL: https://issues.apache.org/jira/browse/HDFS-6230 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Mit Desai Attachments: HDFS-6230-NoUpgradesInProgress.png, HDFS-6230-UpgradeInProgress.jpg The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check the upgrade status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI
[ https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6230: Status: Patch Available (was: Open) Expose upgrade status through NameNode web UI - Key: HDFS-6230 URL: https://issues.apache.org/jira/browse/HDFS-6230 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Mit Desai Attachments: HDFS-6230-NoUpgradesInProgress.png, HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check the upgrade status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6336) Cannot download file via webhdfs when wildcard is enabled
[ https://issues.apache.org/jira/browse/HDFS-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6336: Attachment: HDFS-6336.001.patch Cannot download file via webhdfs when wildcard is enabled - Key: HDFS-6336 URL: https://issues.apache.org/jira/browse/HDFS-6336 Project: Hadoop HDFS Issue Type: Bug Components: namenode, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6336.001.patch, HDFS-6336.001.patch, HDFS-6336.001.patch With wildcard enabled, issuing a webhdfs command like {code} http://yjztvm2.private:50070/webhdfs/v1/tmp?op=OPEN {code} would give {code} http://yjztvm3.private:50075/webhdfs/v1/tmp?op=OPEN&namenoderpcaddress=0.0.0.0:8020&offset=0 {"RemoteException":{"exception":"ConnectException","javaClassName":"java.net.ConnectException","message":"Call From yjztvm3.private/192.168.142.230 to 0.0.0.0:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused"}} {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6336) Cannot download file via webhdfs when wildcard is enabled
[ https://issues.apache.org/jira/browse/HDFS-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989702#comment-13989702 ] Yongjun Zhang commented on HDFS-6336: - Somehow the test was still not triggered; uploaded the same patch to try again. Cannot download file via webhdfs when wildcard is enabled - Key: HDFS-6336 URL: https://issues.apache.org/jira/browse/HDFS-6336 Project: Hadoop HDFS Issue Type: Bug Components: namenode, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6336.001.patch, HDFS-6336.001.patch, HDFS-6336.001.patch With wildcard enabled, issuing a webhdfs command like {code} http://yjztvm2.private:50070/webhdfs/v1/tmp?op=OPEN {code} would give {code} http://yjztvm3.private:50075/webhdfs/v1/tmp?op=OPEN&namenoderpcaddress=0.0.0.0:8020&offset=0 {"RemoteException":{"exception":"ConnectException","javaClassName":"java.net.ConnectException","message":"Call From yjztvm3.private/192.168.142.230 to 0.0.0.0:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused"}} {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6340) DN can't finalize upgrade
[ https://issues.apache.org/jira/browse/HDFS-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989705#comment-13989705 ] Kihwal Lee commented on HDFS-6340: -- Good catch! I think we can start with {{false}} as the initial value and use a simple assignment instead of the AND operation. After all, the last result must be up-to-date. But there is another problem: {{nn.isStandbyState()}} is not protected from HA state transitions. We could create a {{FSNamesystem}} method that acquires its read lock and checks the datanode storage staleness (calling down to {{BlockManager}}) and the HA state. This is preferred since we want to avoid making {{BlockManager}} lock {{FSNamesystem}}. If we do this, we don't have to check the individual results from {{processReport()}}. DN can't finalize upgrade - Key: HDFS-6340 URL: https://issues.apache.org/jira/browse/HDFS-6340 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: Rahul Singhal Priority: Blocker Attachments: HDFS-6340-branch-2.4.0.patch I upgraded a (NN) HA cluster from 2.2.0 to 2.4.0. After I issued the '-finalizeUpgrade' command, the NN was able to finalize the upgrade but the DN couldn't (I waited for the next block report). I think I have found the problem to be due to HDFS-5153. I will attach a proposed fix. -- This message was sent by Atlassian JIRA (v6.2#6252)
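The locking pattern Kihwal describes — taking the namesystem's own read lock once and checking both the HA state and the storage staleness under it — can be illustrated with a self-contained sketch. All names below are stand-ins, not actual HDFS code:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Self-contained illustration of the suggested pattern: the namesystem
// exposes one method that takes its own read lock and evaluates both
// conditions, so an HA transition cannot interleave between the two
// checks, and BlockManager never has to lock the namesystem itself.
class NamesystemSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile boolean standby = true;       // stand-in for nn.isStandbyState()
    private volatile boolean storageStale = true;  // stand-in for BlockManager state

    boolean shouldPostponeBlockReportProcessing() {
        lock.readLock().lock();
        try {
            // Both facts are observed atomically under one lock.
            return standby || storageStale;
        } finally {
            lock.readLock().unlock();
        }
    }

    void transitionToActive() {
        lock.writeLock().lock();
        try {
            standby = false;
            storageStale = false;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The design point is lock ordering: lower layers (here, the block manager) should not acquire higher-layer locks, so the combined check lives in the higher layer.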
[jira] [Commented] (HDFS-6339) DN, SNN & JN can't rollback data
[ https://issues.apache.org/jira/browse/HDFS-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989711#comment-13989711 ] Rahul Singhal commented on HDFS-6339: - Thanks a lot for your reply [~kihwal]. I was testing both cases/paths: (non-HA, 2.2.0) -> (non-HA, 2.4.0) -> (non-HA, 2.2.0) (HA, 2.2.0) -> (HA, 2.4.0) -> (non-HA, 2.2.0) -> (HA, 2.2.0) The issue with the 2NN was noticed in case 1. I guess I was mainly confused by the fact that start-dfs.sh does not format the 2NN during rollback. Thanks for confirming my procedure. And I will use the mailing list for future questions, but since you have the context here, I was hoping you could answer one more question: in what cases will the NN not have all edits locally? Should they be available at edits.dir? DN, SNN & JN can't rollback data Key: HDFS-6339 URL: https://issues.apache.org/jira/browse/HDFS-6339 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Affects Versions: 2.2.0 Reporter: Rahul Singhal I tried rollback from 2.4.0 to 2.2.0 and noticed that the DN, SNN and JN couldn't perform rollback. I started with a (NN) HA cluster on 2.2.0 and upgraded it to 2.4.0 with HA enabled. Then I attempted a rollback to 2.2.0. I first configured my cluster to non-HA and started it on 2.2.0. I started the NN & DN with the '-rollback' startup option. (There is no explicit startup option for the SNN & JN like there is for the NN & DN.) Only the NN was able to roll back correctly. My fixes: I fixed the DN rollback problem by cherry-picking the fix from HDFS-5526. I fixed the SNN rollback problem by starting it with the '-format' option. I then proceeded to convert the non-HA cluster to a HA cluster. The first step after the configuration change was to start the JNs. But they also couldn't roll back. My fix: I fixed this by deleting the JN data directory. (Deleting the 'current' directory and renaming 'previous' to 'current' didn't fix the rollback.) My purpose for filing this bug is to: 1.
Ask if these problems are known and intended to be fixed in any future releases. If yes, which one? DN rollback was fixed in 2.3.0, but what about the 2.2.x series? JN rollback seems (not confirmed) to have been fixed in 2.4.0. 2. Confirm that my fixes are correct. If not, please help me with an appropriate fix. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6313) WebHdfs may use the wrong NN when configured for multiple HA NNs
[ https://issues.apache.org/jira/browse/HDFS-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6313: - Priority: Blocker (was: Major) Target Version/s: 2.4.1 (was: 2.5.0) WebHdfs may use the wrong NN when configured for multiple HA NNs Key: HDFS-6313 URL: https://issues.apache.org/jira/browse/HDFS-6313 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 3.0.0, 2.4.0 Reporter: Daryn Sharp Priority: Blocker WebHdfs resolveNNAddr will return a union of addresses for all HA configured NNs. The client may access the wrong NN. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6330) Move mkdir() to FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989750#comment-13989750 ] Hadoop QA commented on HDFS-6330: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643151/HDFS-6330.000.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6818//console This message is automatically generated. Move mkdir() to FSNamesystem Key: HDFS-6330 URL: https://issues.apache.org/jira/browse/HDFS-6330 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6330.000.patch Currently mkdir() automatically creates all ancestors for a directory. This is implemented in FSDirectory, by calling unprotectedMkdir() along the path. This jira proposes to move the function to FSNamesystem to simplify the primitive that FSDirectory needs to provide. -- This message was sent by Atlassian JIRA (v6.2#6252)
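The ancestor-creating behavior described above — mkdir() creating every missing directory along the path — can be modeled with a small self-contained sketch. This is an in-memory stand-in, not FSDirectory code, and all names are illustrative:

```java
import java.util.HashSet;
import java.util.Set;

// Minimal in-memory model of "mkdirs with ancestors": walk the path
// components and create each missing directory in turn, analogous to the
// described behavior of calling unprotectedMkdir() along the path.
class MkdirSketch {
    final Set<String> dirs = new HashSet<>();

    MkdirSketch() {
        dirs.add("/");                  // the root always exists
    }

    void mkdirs(String path) {
        StringBuilder current = new StringBuilder();
        for (String component : path.split("/")) {
            if (component.isEmpty()) {
                continue;               // skip the leading empty component
            }
            current.append('/').append(component);
            dirs.add(current.toString());   // no-op if it already exists
        }
    }
}
```

Moving this loop up into FSNamesystem, as the JIRA proposes, would let FSDirectory offer only the single-directory primitive.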
[jira] [Commented] (HDFS-6313) WebHdfs may use the wrong NN when configured for multiple HA NNs
[ https://issues.apache.org/jira/browse/HDFS-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989761#comment-13989761 ] Kihwal Lee commented on HDFS-6313: -- In 2.4.0 it is {{DFSUtil.resolveWebHdfsUri()}}. In trunk, it is {{WebHdfsFileSystem#resolveNNAddr()}}. They all obtain NN addresses by calling {{DFSUtil.getAddresses()}}, which gets all NN http addresses from all known name services. If multiple name services are configured, {{WebHdfsFileSystem}} can use a wrong NN. WebHdfs may use the wrong NN when configured for multiple HA NNs Key: HDFS-6313 URL: https://issues.apache.org/jira/browse/HDFS-6313 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 3.0.0, 2.4.0 Reporter: Daryn Sharp Priority: Blocker WebHdfs resolveNNAddr will return a union of addresses for all HA configured NNs. The client may access the wrong NN. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6287) Add vecsum test of libhdfs read access times
[ https://issues.apache.org/jira/browse/HDFS-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989791#comment-13989791 ] Colin Patrick McCabe commented on HDFS-6287: bq. Hi, Colin. Thanks for posting this. Did you find that you needed to use SSE to get the addition fast enough so that the benchmark highlights read throughput instead of sum computation? IOW, could we potentially simplify this patch to not use SSE at all and still have a valid benchmark? Without that optimization, the benchmark quickly becomes CPU-bound and you don't get true numbers for ZCR and other fast read methods. I just benchmarked 1.5 GB/s for the un-optimized version versus 5.7 GB/s for the optimized. bq. I think it would be helpful to add a comment with a high-level summary of what vecsum does, maybe right before the main. Added bq. I have one minor comment on the code itself so far. I think you can remove the hdfsFreeBuilder call. hdfsBuilderConnect always frees the builder, whether it succeeds or fails. The only time you would need to call hdfsFreeBuilder directly is if you allocated a builder but then never attempted to connect with it. I don't see any way for that to happen in the libhdfs_data_create code. Yeah, that is deadcode. Let me remove that Add vecsum test of libhdfs read access times Key: HDFS-6287 URL: https://issues.apache.org/jira/browse/HDFS-6287 Project: Hadoop HDFS Issue Type: Test Components: libhdfs, test Affects Versions: 2.5.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-6282.001.patch, HDFS-6287.002.patch, HDFS-6287.003.patch, HDFS-6287.004.patch, HDFS-6287.005.patch Add vecsum, a benchmark that tests libhdfs access times. This includes short-circuit, zero-copy, and standard libhdfs access modes. It also has a local filesystem mode for comparison. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6287) Add vecsum test of libhdfs read access times
[ https://issues.apache.org/jira/browse/HDFS-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-6287: --- Attachment: HDFS-6287.005.patch Add vecsum test of libhdfs read access times Key: HDFS-6287 URL: https://issues.apache.org/jira/browse/HDFS-6287 Project: Hadoop HDFS Issue Type: Test Components: libhdfs, test Affects Versions: 2.5.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-6282.001.patch, HDFS-6287.002.patch, HDFS-6287.003.patch, HDFS-6287.004.patch, HDFS-6287.005.patch Add vecsum, a benchmark that tests libhdfs access times. This includes short-circuit, zero-copy, and standard libhdfs access modes. It also has a local filesystem mode for comparison. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6326) WebHdfs ACL compatibility is broken
[ https://issues.apache.org/jira/browse/HDFS-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-6326: Attachment: HDFS-6326.2.patch Here is patch v2, changing the exception handling in ls. I took a very defensive approach and just caught {{Exception}}. This fixes the immediate problem and also anticipates any future problems related to custom {{FileSystem}} implementations. Of course, it's not generally a good idea to do a blanket catch of {{Exception}}. In this case though, the worst thing that can happen is that we skip displaying the '+', which I think is preferable over causing the ls command to fail if there are other unanticipated failures related to {{getAclStatus}}. In addition to running the ACL-related unit tests, I also did some manual testing. I tested ls using URLs with the webhdfs scheme against a 2.3.0 cluster, and it worked. I also tested against a trunk cluster and confirmed that I was still getting the '+' appended. WebHdfs ACL compatibility is broken --- Key: HDFS-6326 URL: https://issues.apache.org/jira/browse/HDFS-6326 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 3.0.0, 2.4.0 Reporter: Daryn Sharp Assignee: Chris Nauroth Priority: Blocker Attachments: HDFS-6326.1.patch, HDFS-6326.2.patch 2.4 ACL support is completely incompatible with 2.4 webhdfs servers. The NN throws an {{IllegalArgumentException}} exception. {code} hadoop fs -ls webhdfs://nn/ Found 21 items ls: Invalid value for webhdfs parameter op: No enum constant org.apache.hadoop.hdfs.web.resources.GetOpParam.Op.GETACLSTATUS [... 20 more times...] {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
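The defensive approach described in the comment — appending '+' only when the ACL query succeeds, and never letting a failure break the listing — looks roughly like this. This is a self-contained sketch; the nested interface is a stand-in for the real FileSystem#getAclStatus call:

```java
// Self-contained sketch of the defensive ls behavior: if querying ACL
// status fails for any reason (e.g. an older NameNode that does not know
// GETACLSTATUS), skip the '+' marker instead of failing the whole listing.
class AclAwareLs {
    interface AclQuery {
        boolean hasAcl(String path) throws Exception;  // stand-in for getAclStatus
    }

    static String permissions(String perms, String path, AclQuery fs) {
        try {
            if (fs.hasAcl(path)) {
                return perms + "+";
            }
        } catch (Exception e) {
            // Worst case we omit the '+'; preferable to failing ls.
        }
        return perms;
    }
}
```

The blanket catch is deliberate here: the '+' is cosmetic, so any unanticipated failure mode degrades gracefully.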
[jira] [Commented] (HDFS-5522) Datanode disk error check may be incorrectly skipped
[ https://issues.apache.org/jira/browse/HDFS-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989848#comment-13989848 ] Hadoop QA commented on HDFS-5522: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643378/HDFS-5522.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6817//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6817//console This message is automatically generated. Datanode disk error check may be incorrectly skipped Key: HDFS-5522 URL: https://issues.apache.org/jira/browse/HDFS-5522 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.9, 2.2.0 Reporter: Kihwal Lee Assignee: Rushabh S Shah Attachments: HDFS-5522.patch After HDFS-4581 and HDFS-4699, {{checkDiskError()}} is not called when network errors occur during processing data node requests. This appears to create problems when a disk is having problems, but not failing I/O soon. 
If I/O hangs for a long time, network read/write may timeout first and the peer may close the connection. Although the error was caused by a faulty local disk, disk check is not being carried out in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6331) ClientProtocol#setXattr should not be annotated idempotent
[ https://issues.apache.org/jira/browse/HDFS-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989872#comment-13989872 ] Chris Nauroth commented on HDFS-6331: - Hi, [~andrew.wang]. I see this patch is already committed, but I just want to confirm that I agree with your earlier statements. Because of how the flags work, a retry may cause the 2nd application of the operation to throw an exception, even though it should have been a valid call from the client's perspective. Therefore, we need {{AtMostOnce}} semantics. As you said, this differs from the behavior of {{setAcl}}, which we can classify as {{Idempotent}}. Thanks everyone for catching the issue and fixing it. ClientProtocol#setXattr should not be annotated idempotent -- Key: HDFS-6331 URL: https://issues.apache.org/jira/browse/HDFS-6331 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS XAttrs (HDFS-2006) Reporter: Andrew Wang Assignee: Uma Maheswara Rao G Fix For: HDFS XAttrs (HDFS-2006) Attachments: HDFS-6331.patch ClientProtocol#setXAttr is annotated @Idempotent, but this is incorrect since subsequent retries need to throw different exceptions based on the passed flags (e.g. CREATE, REPLACE). -- This message was sent by Atlassian JIRA (v6.2#6252)
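Why setXAttr cannot be {{Idempotent}} can be seen with a tiny self-contained model of the flag semantics (illustrative only, not HDFS code): the first CREATE call succeeds, while an identical retry fails, which is exactly why the server must remember and replay the first result ({{AtMostOnce}}).

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of the CREATE/REPLACE flag semantics that make a retried
// setXAttr observable: repeating the same CREATE call throws, even though
// the original call was valid from the client's perspective.
class XAttrStoreSketch {
    private final Map<String, String> xattrs = new HashMap<>();

    void setXAttr(String name, String value, boolean create, boolean replace) {
        boolean exists = xattrs.containsKey(name);
        if (create && !replace && exists) {
            throw new IllegalStateException("xattr already exists: " + name);
        }
        if (replace && !create && !exists) {
            throw new IllegalStateException("xattr does not exist: " + name);
        }
        xattrs.put(name, value);
    }
}
```

By contrast, setAcl with both flags clear (the "set" semantics) yields the same state and same outcome on a retry, which is why it can stay {{Idempotent}}.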
[jira] [Commented] (HDFS-6328) Simplify code in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989931#comment-13989931 ] Hadoop QA commented on HDFS-6328: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643135/HDFS-6328.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.datanode.TestBPOfferService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6819//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6819//console This message is automatically generated. Simplify code in FSDirectory Key: HDFS-6328 URL: https://issues.apache.org/jira/browse/HDFS-6328 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6328.000.patch This jira proposes: # Cleaning up dead code in FSDirectory. # Simplify the control flows that IntelliJ flags as warnings. 
# Move functions related to resolving paths into one place. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6165) hdfs dfs -rm -r and hdfs -rmdir commands can't remove empty directory
[ https://issues.apache.org/jira/browse/HDFS-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989946#comment-13989946 ] Yongjun Zhang commented on HDFS-6165: - Hi [~daryn], I looked into the callers of checkPermission; there are two other places in addition to the delete operation: - FSNamesystem.getContentSummary method calls checkPermission with FsAction.READ_EXECUTE passed to subAccess. - FSNamesystem.checkSubtreeReadPermission method calls checkPermission with FsAction.READ passed to subAccess. So it looks like we do need the additional parameter. About the RemoteException, thanks for pointing out that FsShell won't work correctly with other filesystems with the patch. Since there are so many filesystems, the scope of the change to address the mkdir issue will be much wider. I'm thinking about handling this in a separate JIRA. What do you guys think? With rmr fixed, it can serve as a workaround for the rmdir issue. Thanks. hdfs dfs -rm -r and hdfs -rmdir commands can't remove empty directory -- Key: HDFS-6165 URL: https://issues.apache.org/jira/browse/HDFS-6165 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Minor Attachments: HDFS-6165.001.patch, HDFS-6165.002.patch, HDFS-6165.003.patch, HDFS-6165.004.patch, HDFS-6165.004.patch, HDFS-6165.005.patch, HDFS-6165.006.patch, HDFS-6165.006.patch Given a directory owned by user A with WRITE permission containing an empty directory owned by user B, it is not possible to delete user B's empty directory with either hdfs dfs -rm -r or hdfs dfs -rmdir, because the current implementation requires FULL permission on the empty directory and throws an exception.
On the other hand, on Linux, the rm -r and rmdir commands can remove an empty directory as long as the parent directory has WRITE permission (and the prefix components of the path have EXECUTE permission). For the tested OSes, some prompt the user for confirmation, some don't. Here's a reproduction: {code} [root@vm01 ~]# hdfs dfs -ls /user/ Found 4 items drwxr-xr-x - userabc users 0 2013-05-03 01:55 /user/userabc drwxr-xr-x - hdfs supergroup 0 2013-05-03 00:28 /user/hdfs drwxrwxrwx - mapred hadoop 0 2013-05-03 00:13 /user/history drwxr-xr-x - hdfs supergroup 0 2013-04-14 16:46 /user/hive [root@vm01 ~]# hdfs dfs -ls /user/userabc Found 8 items drwx------ - userabc users 0 2013-05-02 17:00 /user/userabc/.Trash drwxr-xr-x - userabc users 0 2013-05-03 01:34 /user/userabc/.cm drwx------ - userabc users 0 2013-05-03 01:06 /user/userabc/.staging drwxr-xr-x - userabc users 0 2013-04-14 18:31 /user/userabc/apps drwxr-xr-x - userabc users 0 2013-04-30 18:05 /user/userabc/ds drwxr-xr-x - hdfs users 0 2013-05-03 01:54 /user/userabc/foo drwxr-xr-x - userabc users 0 2013-04-30 16:18 /user/userabc/maven_source drwxr-xr-x - hdfs users 0 2013-05-03 01:40 /user/userabc/test-restore [root@vm01 ~]# hdfs dfs -ls /user/userabc/foo/ [root@vm01 ~]# sudo -u userabc hdfs dfs -rm -r -skipTrash /user/userabc/foo rm: Permission denied: user=userabc, access=ALL, inode=/user/userabc/foo:hdfs:users:drwxr-xr-x {code} The super user can delete the directory. {code} [root@vm01 ~]# sudo -u hdfs hdfs dfs -rm -r -skipTrash /user/userabc/foo Deleted /user/userabc/foo {code} The same is not true for files, however; they have the correct behavior.
{code} [root@vm01 ~]# sudo -u hdfs hdfs dfs -touchz /user/userabc/foo-file [root@vm01 ~]# hdfs dfs -ls /user/userabc/ Found 8 items drwx------ - userabc users 0 2013-05-02 17:00 /user/userabc/.Trash drwxr-xr-x - userabc users 0 2013-05-03 01:34 /user/userabc/.cm drwx------ - userabc users 0 2013-05-03 01:06 /user/userabc/.staging drwxr-xr-x - userabc users 0 2013-04-14 18:31 /user/userabc/apps drwxr-xr-x - userabc users 0 2013-04-30 18:05 /user/userabc/ds -rw-r--r-- 1 hdfs users 0 2013-05-03 02:11 /user/userabc/foo-file drwxr-xr-x - userabc users 0 2013-04-30 16:18 /user/userabc/maven_source drwxr-xr-x - hdfs users 0 2013-05-03 01:40 /user/userabc/test-restore [root@vm01 ~]# sudo -u userabc hdfs dfs -rm -skipTrash /user/userabc/foo-file Deleted /user/userabc/foo-file {code} Using the hdfs dfs -rmdir command: {code} bash-4.1$ hadoop fs -lsr / lsr: DEPRECATED: Please use 'ls -R' instead. drwxr-xr-x - hdfs supergroup 0 2014-03-25 16:29 /user drwxr-xr-x - hdfs supergroup 0 2014-03-25 16:28
[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page
[ https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989948#comment-13989948 ] Siqi Li commented on HDFS-5928: --- [~wheat9] for 2.3, if the JSP UI is no longer the default UI, what is the default UI? show namespace and namenode ID on NN dfshealth page --- Key: HDFS-5928 URL: https://issues.apache.org/jira/browse/HDFS-5928 Project: Hadoop HDFS Issue Type: Improvement Reporter: Siqi Li Assignee: Siqi Li Attachments: HDFS-5928.v1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6294) Use INode IDs to avoid conflicts when a file open for write is renamed
[ https://issues.apache.org/jira/browse/HDFS-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989955#comment-13989955 ] Hadoop QA commented on HDFS-6294: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642882/HDFS-6294.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6820//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6820//console This message is automatically generated. Use INode IDs to avoid conflicts when a file open for write is renamed -- Key: HDFS-6294 URL: https://issues.apache.org/jira/browse/HDFS-6294 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.20.1 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-6294.001.patch, HDFS-6294.002.patch Now that we have a unique INode ID for each INode, clients with files that are open for write can use this unique ID rather than a file path when they are requesting more blocks or closing the open file. 
This will avoid conflicts when a file which is open for write is renamed, and another file with that name is created. -- This message was sent by Atlassian JIRA (v6.2#6252)
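The mechanism can be sketched with a toy namespace model (all names here are hypothetical, not the actual NameNode code): the client keeps the immutable inode ID, so a rename plus re-creation of the old path cannot redirect its subsequent block requests to the wrong file:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: resolving an open file by its immutable inode ID instead of
// its path, so a concurrent rename plus re-create of the old path cannot
// hijack the open-file handle.
public class InodeIdDemo {
    static long nextId = 1;
    static final Map<Long, String> inodes = new HashMap<>();    // id -> current path
    static final Map<String, Long> namespace = new HashMap<>(); // path -> id

    static long create(String path) {
        long id = nextId++;
        inodes.put(id, path);
        namespace.put(path, id);
        return id;
    }

    static void rename(String src, String dst) {
        long id = namespace.remove(src);
        namespace.put(dst, id);
        inodes.put(id, dst);
    }

    static String resolve(long id) {
        return inodes.get(id);
    }

    public static void main(String[] args) {
        long openId = create("/user/a/file");   // client opens for write, keeps the ID
        rename("/user/a/file", "/user/b/file"); // file is renamed while still open
        create("/user/a/file");                 // a brand-new file reuses the old path
        // A path-based lookup now finds the new file; the ID-based lookup is stable.
        System.out.println(resolve(openId));    // /user/b/file
    }
}
```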
[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page
[ https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989956#comment-13989956 ] Haohui Mai commented on HDFS-5928: -- Since 2.3, HDFS has moved towards an HTML5-based UI. Please see HDFS-5333 for more details. show namespace and namenode ID on NN dfshealth page --- Key: HDFS-5928 URL: https://issues.apache.org/jira/browse/HDFS-5928 Project: Hadoop HDFS Issue Type: Improvement Reporter: Siqi Li Assignee: Siqi Li Attachments: HDFS-5928.v1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6313) WebHdfs may use the wrong NN when configured for multiple HA NNs
[ https://issues.apache.org/jira/browse/HDFS-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6313: - Status: Patch Available (was: Open) WebHdfs may use the wrong NN when configured for multiple HA NNs Key: HDFS-6313 URL: https://issues.apache.org/jira/browse/HDFS-6313 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.4.0, 3.0.0 Reporter: Daryn Sharp Priority: Blocker Attachments: HDFS-6313.branch-2.4.patch, HDFS-6313.patch WebHdfs resolveNNAddr will return a union of addresses for all HA configured NNs. The client may access the wrong NN. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6313) WebHdfs may use the wrong NN when configured for multiple HA NNs
[ https://issues.apache.org/jira/browse/HDFS-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee reassigned HDFS-6313: Assignee: Kihwal Lee WebHdfs may use the wrong NN when configured for multiple HA NNs Key: HDFS-6313 URL: https://issues.apache.org/jira/browse/HDFS-6313 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 3.0.0, 2.4.0 Reporter: Daryn Sharp Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6313.branch-2.4.patch, HDFS-6313.patch WebHdfs resolveNNAddr will return a union of addresses for all HA configured NNs. The client may access the wrong NN. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6313) WebHdfs may use the wrong NN when configured for multiple HA NNs
[ https://issues.apache.org/jira/browse/HDFS-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6313: - Attachment: HDFS-6313.patch HDFS-6313.branch-2.4.patch The patch makes WebHdfsFileSystem extract the only entry that matches the logical name. The new test case demonstrates the bug. WebHdfs may use the wrong NN when configured for multiple HA NNs Key: HDFS-6313 URL: https://issues.apache.org/jira/browse/HDFS-6313 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 3.0.0, 2.4.0 Reporter: Daryn Sharp Priority: Blocker Attachments: HDFS-6313.branch-2.4.patch, HDFS-6313.patch WebHdfs resolveNNAddr will return a union of addresses for all HA configured NNs. The client may access the wrong NN. -- This message was sent by Atlassian JIRA (v6.2#6252)
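A minimal sketch of the idea behind the fix (hypothetical names, not the actual WebHdfsFileSystem code): given the per-nameservice address maps, keep only the entry registered under the URI's logical name instead of flattening every nameservice's addresses together:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of selecting only the addresses registered under the URI's
// logical name, rather than returning a union across all nameservices.
public class LogicalNameFilter {
    static Map<String, List<String>> addressesFor(
            Map<String, List<String>> allNameservices, String logicalName) {
        Map<String, List<String>> result = new HashMap<>();
        List<String> addrs = allNameservices.get(logicalName);
        if (addrs != null) {
            result.put(logicalName, addrs);  // keep only the matching entry
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> conf = new HashMap<>();
        conf.put("ns1", Arrays.asList("nn1.example.com:50070", "nn2.example.com:50070"));
        conf.put("ns2", Arrays.asList("nn3.example.com:50070", "nn4.example.com:50070"));
        // A client for hdfs://ns1 must never try ns2's namenodes.
        System.out.println(addressesFor(conf, "ns1").keySet()); // [ns1]
    }
}
```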
[jira] [Commented] (HDFS-6328) Simplify code in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989971#comment-13989971 ] Haohui Mai commented on HDFS-6328: -- The test failure is unrelated. I ran the test locally and it passed. Simplify code in FSDirectory Key: HDFS-6328 URL: https://issues.apache.org/jira/browse/HDFS-6328 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6328.000.patch This jira proposes: # Cleaning up dead code in FSDirectory. # Simplify the control flows that IntelliJ flags as warnings. # Move functions related to resolving paths into one place. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6328) Simplify code in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989982#comment-13989982 ] Jing Zhao commented on HDFS-6328: - The patch looks pretty good to me. Thanks for the cleanup. Some minor comments: # The changes on imports seem unnecessary. # The following change may need to be reverted:
{code}
-    Preconditions.checkArgument(
-        src.endsWith(HdfsConstants.SEPARATOR_DOT_SNAPSHOT_DIR),
-        "%s does not end with %s", src, HdfsConstants.SEPARATOR_DOT_SNAPSHOT_DIR);
+    Preconditions.checkArgument(src.endsWith(HdfsConstants.SEPARATOR_DOT_SNAPSHOT_DIR), "%s does not end with %s", src, HdfsConstants.SEPARATOR_DOT_SNAPSHOT_DIR);
{code}
# The following line exceeds 80 characters:
{code}
+    return srcs.startsWith("/") && !srcs.endsWith("/") && getINode4Write(srcs, false) == null;
{code}
# Let's add {} for the while loop:
{code}
+    while (src[i] == dst[i])
+      i++;
{code}
+1 after addressing the comments. Simplify code in FSDirectory Key: HDFS-6328 URL: https://issues.apache.org/jira/browse/HDFS-6328 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6328.000.patch This jira proposes: # Cleaning up dead code in FSDirectory. # Simplify the control flows that IntelliJ flags as warnings. # Move functions related to resolving paths into one place. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6328) Simplify code in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6328: - Attachment: HDFS-6328.001.patch Simplify code in FSDirectory Key: HDFS-6328 URL: https://issues.apache.org/jira/browse/HDFS-6328 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6328.000.patch, HDFS-6328.001.patch This jira proposes: # Cleaning up dead code in FSDirectory. # Simplify the control flows that IntelliJ flags as warnings. # Move functions related to resolving paths into one place. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6193) HftpFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990007#comment-13990007 ] Tsuyoshi OZAWA commented on HDFS-6193: -- Let's wait for review by HDFS experts. HftpFileSystem open should throw FileNotFoundException for non-existing paths - Key: HDFS-6193 URL: https://issues.apache.org/jira/browse/HDFS-6193 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: HDFS-6193-branch-2.4.0.v01.patch, HDFS-6193-branch-2.4.v02.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handle non-existing paths. - 'open' does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound; it's deferred until the next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-existing paths. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6314) Test cases for XAttrs
[ https://issues.apache.org/jira/browse/HDFS-6314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990013#comment-13990013 ] Chris Nauroth commented on HDFS-6314: - Hi, Yi. Thanks for writing all of these tests. I'd like to suggest 2 more test cases: # 1) set xattrs on a file, 2) remove the xattrs from that file, 3) restart NN and 4) set xattrs again on that same file. Do this test twice: once saving a checkpoint before the restart and again without saving a checkpoint. The idea here is to make sure that we don't accidentally leave behind a lingering empty {{XAttrFeature}} attached to the inode after removal of the xattrs. That would leave the inode in a bad state where future attempts to add xattrs would fail due to the precondition check in {{INodeWithAdditionalFields#addXAttrFeature}}. (We had a bug like this on the ACLs feature branch at one time.) # In {{testXAttrSymlinks}}, let's also do a {{setXAttr}} on the link, and then do a {{getXAttrs}} on the target and assert that the xattrs previously set through the link are now visible when querying on the target. Test cases for XAttrs - Key: HDFS-6314 URL: https://issues.apache.org/jira/browse/HDFS-6314 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: HDFS XAttrs (HDFS-2006) Reporter: Yi Liu Assignee: Yi Liu Fix For: HDFS XAttrs (HDFS-2006) Attachments: HDFS-6314.1.patch, HDFS-6314.patch Tests NameNode interaction for all XAttr APIs, covers restarting NN, saving new checkpoint. Tests XAttr for Snapshot, symlinks. Tests XAttr for HA failover. And more... -- This message was sent by Atlassian JIRA (v6.2#6252)
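The first scenario can be modeled in isolation. The toy class below (hypothetical names; it only mimics the described precondition, not the real INodeWithAdditionalFields) shows why a lingering empty feature object left behind after removing the last xattr would make later xattr operations fail:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the failure mode described above: if removing the last
// xattr leaves an empty feature attached to the inode, a later
// addXAttrFeature-style precondition check rejects new xattrs.
public class XAttrFeatureModel {
    Map<String, String> feature; // null == no XAttrFeature attached

    void setXAttr(String name, String value) {
        if (feature == null) {
            feature = new HashMap<>(); // attach the feature on first xattr
        }
        feature.put(name, value);
    }

    void removeXAttr(String name) {
        feature.remove(name);
        if (feature.isEmpty()) {
            feature = null; // the buggy variant would skip this detach step
        }
    }

    void addFeatureChecked() {
        // mirrors the precondition: adding a feature that already exists fails
        if (feature != null) {
            throw new IllegalStateException("Duplicated XAttrFeature");
        }
        feature = new HashMap<>();
    }

    public static void main(String[] args) {
        XAttrFeatureModel inode = new XAttrFeatureModel();
        inode.setXAttr("user.a", "1");
        inode.removeXAttr("user.a");
        inode.addFeatureChecked(); // must not throw once the empty feature is detached
        System.out.println("ok");
    }
}
```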
[jira] [Updated] (HDFS-6340) DN can't finalize upgrade
[ https://issues.apache.org/jira/browse/HDFS-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6340: Status: Patch Available (was: Open) DN can't finalize upgrade - Key: HDFS-6340 URL: https://issues.apache.org/jira/browse/HDFS-6340 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: Rahul Singhal Priority: Blocker Attachments: HDFS-6340-branch-2.4.0.patch I upgraded a (NN) HA cluster from 2.2.0 to 2.4.0. After I issued the '-finalizeUpgrade' command, the NN was able to finalize the upgrade but the DN couldn't (I waited for the next block report). I think I have found the problem to be due to HDFS-5153. I will attach a proposed fix. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6340) DN can't finalize upgrade
[ https://issues.apache.org/jira/browse/HDFS-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990055#comment-13990055 ] Arpit Agarwal commented on HDFS-6340: - Yes, good catch [~rahulsinghal.iitd]. The change looks fine to me but the patch won't apply in trunk. The {{nn.isStandbyState()}} bug appears to have been there for a while. You could fix it here as Kihwal suggested, or file a separate JIRA for it and just fix the immediate regression here. DN can't finalize upgrade - Key: HDFS-6340 URL: https://issues.apache.org/jira/browse/HDFS-6340 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: Rahul Singhal Priority: Blocker Attachments: HDFS-6340-branch-2.4.0.patch I upgraded a (NN) HA cluster from 2.2.0 to 2.4.0. After I issued the '-finalizeUpgrade' command, the NN was able to finalize the upgrade but the DN couldn't (I waited for the next block report). I think I have found the problem to be due to HDFS-5153. I will attach a proposed fix. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6328) Simplify code in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990058#comment-13990058 ] Daryn Sharp commented on HDFS-6328: --- As a general statement, I'm not sure there's a lot of value added in changes like altering whitespace and moving methods. Mixing functional changes and cosmetic changes makes it a bit harder to see what actually changed. Please understand it does make life harder for those of us also working in the code who will encounter merge conflicts... Is there a reason why this loop needed to become more complicated? At this point I believe it's guaranteed that the src & dest are not identical, nor is the src a subdir of the dest?
{code}
-    for(; src[i] == dst[i]; i++); // src[i - 1] is the last common ancestor.
+    while (src[i] == dst[i] && i < src.length && i < dst.length) {
+      i++;
+    }
{code}
Simplify code in FSDirectory Key: HDFS-6328 URL: https://issues.apache.org/jira/browse/HDFS-6328 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6328.000.patch, HDFS-6328.001.patch This jira proposes: # Cleaning up dead code in FSDirectory. # Simplify the control flows that IntelliJ flags as warnings. # Move functions related to resolving paths into one place. -- This message was sent by Atlassian JIRA (v6.2#6252)
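The hazard being discussed comes from &&'s short-circuit evaluation: the bounds checks must run before the element comparison, otherwise the comparison can read past the end of the shorter array. A minimal standalone illustration (not the actual FSDirectory code):

```java
// With && short-circuiting, putting the length checks first guarantees
// src[i] and dst[i] are only evaluated for in-range indices.
public class CommonAncestorLoop {
    static int commonPrefix(int[] src, int[] dst) {
        int i = 0;
        // Correct clause order: check i against both lengths before indexing.
        while (i < src.length && i < dst.length && src[i] == dst[i]) {
            i++;
        }
        return i; // src[i - 1] is the last common ancestor when i > 0
    }

    public static void main(String[] args) {
        System.out.println(commonPrefix(new int[]{1, 2, 3}, new int[]{1, 2})); // 2
        // One path being a prefix of the other no longer overruns the array.
        System.out.println(commonPrefix(new int[]{1, 2}, new int[]{1, 2}));    // 2
    }
}
```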
[jira] [Commented] (HDFS-6315) Decouple recording edit logs from FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990064#comment-13990064 ] Daryn Sharp commented on HDFS-6315: --- I am also working towards the goal of removing or minimizing the use of the FSD lock, but I recall it's being used to protect non-threadsafe data structures (like the inode map and snapshot manager). It's spurred by the work to add fine-grained locking to the namesystem - which has been derailed by other pressing issues. Do keep in mind that hopefully in the next few months there will not be a globally held FSN lock, so don't entirely remove the FSD lock believing the FSN lock will cover for it. bq. The change can be reverted when removing the lock of FSDirectory. I'm curious what you have in mind. HDFS-5693 appears to be a valuable change. I thought deletes used to do something similar while collecting blocks, but that whole region of code has been changed. Decouple recording edit logs from FSDirectory - Key: HDFS-6315 URL: https://issues.apache.org/jira/browse/HDFS-6315 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6315.000.patch, HDFS-6315.001.patch Currently both FSNamesystem and FSDirectory record edit logs. This design requires both FSNamesystem and FSDirectory to be tightly coupled together to implement a durable namespace. This jira proposes to separate the responsibility of implementing the namespace from providing durability with edit logs. Specifically, FSDirectory implements the namespace (which should have no edit log operations), and FSNamesystem implements durability by recording the edit logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6328) Simplify code in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990066#comment-13990066 ] Haohui Mai commented on HDFS-6328: -- The main motivation of this jira is to perform identical code cleanup to make the reviews of HDFS-6330 and HDFS-6315 easier. The original motivation was to make sure there is no OutOfBoundsException in the loop locally without going through all the traces, but it looks like the order of the clauses is wrong. I'll fix it in another patch. Simplify code in FSDirectory Key: HDFS-6328 URL: https://issues.apache.org/jira/browse/HDFS-6328 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6328.000.patch, HDFS-6328.001.patch This jira proposes: # Cleaning up dead code in FSDirectory. # Simplify the control flows that IntelliJ flags as warnings. # Move functions related to resolving paths into one place. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6317) Add snapshot quota
[ https://issues.apache.org/jira/browse/HDFS-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990073#comment-13990073 ] Alex Shafer commented on HDFS-6317: --- That would be most appreciated. Add snapshot quota -- Key: HDFS-6317 URL: https://issues.apache.org/jira/browse/HDFS-6317 Project: Hadoop HDFS Issue Type: Improvement Reporter: Alex Shafer Either allow the 65k snapshot limit to be set with a configuration option or add a per-directory snapshot quota settable with the `hdfs dfsadmin` CLI and viewable by appending fields to `hdfs dfs -count -q` output. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6056) Clean up NFS config settings
[ https://issues.apache.org/jira/browse/HDFS-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6056: - Attachment: HDFS-6056.003.patch Clean up NFS config settings Key: HDFS-6056 URL: https://issues.apache.org/jira/browse/HDFS-6056 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.3.0 Reporter: Aaron T. Myers Assignee: Brandon Li Attachments: HDFS-6056.001.patch, HDFS-6056.002.patch, HDFS-6056.003.patch As discussed on HDFS-6050, there's a few opportunities to improve the config settings related to NFS. This JIRA is to implement those changes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6315) Decouple recording edit logs from FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990076#comment-13990076 ] Haohui Mai commented on HDFS-6315: -- bq. Do keep in mind that hopefully in the next few months there will not be a globally held FSN so don't entirely remove the FSD lock believing the FSN lock will cover for it. The ultimate goal here is to let FSD implement only the mappings between names and inodes. That way locking is an implementation detail and not part of the interface. FSD can be implemented with lock-free data structures, which require no locking at all. Making the FSN lock more fine-grained is definitely useful, but it is orthogonal. bq. I'm curious what you have in mind. HDFS-5693 appears to be a valuable change. I thought deletes used to do something similar while collecting blocks, but that whole region of code has been changed. Based on my initial surveys, in the majority (90%) of cases both the FSD lock and the FSN lock are held together. They can be combined with little performance loss in today's codebase. In the longer term FSD might be lock-free, as I mentioned above. Decouple recording edit logs from FSDirectory - Key: HDFS-6315 URL: https://issues.apache.org/jira/browse/HDFS-6315 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6315.000.patch, HDFS-6315.001.patch Currently both FSNamesystem and FSDirectory record edit logs. This design requires both FSNamesystem and FSDirectory to be tightly coupled together to implement a durable namespace. This jira proposes to separate the responsibility of implementing the namespace from providing durability with edit logs. Specifically, FSDirectory implements the namespace (which should have no edit log operations), and FSNamesystem implements durability by recording the edit logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
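The direction described above can be sketched with a standard concurrent map (a hypothetical toy, not HDFS code): if the directory layer exposes only atomic map operations from path to inode, callers need no directory-wide lock, and locking becomes an implementation detail of the map rather than part of the interface:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: a directory layer reduced to a concurrent path -> inode mapping.
// Readers and writers rely on the map's atomic operations instead of an
// external directory-wide lock.
public class LockFreeNamespace {
    private final ConcurrentMap<String, Long> pathToInode = new ConcurrentHashMap<>();

    Long lookup(String path) {
        return pathToInode.get(path); // no external lock taken
    }

    boolean createIfAbsent(String path, long inodeId) {
        // atomic check-and-insert; fails if the path already exists
        return pathToInode.putIfAbsent(path, inodeId) == null;
    }

    public static void main(String[] args) {
        LockFreeNamespace fsd = new LockFreeNamespace();
        System.out.println(fsd.createIfAbsent("/a", 1L)); // true
        System.out.println(fsd.createIfAbsent("/a", 2L)); // false: already present
        System.out.println(fsd.lookup("/a"));             // 1
    }
}
```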
[jira] [Commented] (HDFS-6336) Cannot download file via webhdfs when wildcard is enabled
[ https://issues.apache.org/jira/browse/HDFS-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990114#comment-13990114 ] Hadoop QA commented on HDFS-6336: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643390/HDFS-6336.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6822//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6822//console This message is automatically generated. 
Cannot download file via webhdfs when wildcard is enabled - Key: HDFS-6336 URL: https://issues.apache.org/jira/browse/HDFS-6336 Project: Hadoop HDFS Issue Type: Bug Components: namenode, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6336.001.patch, HDFS-6336.001.patch, HDFS-6336.001.patch With wildcard enabled, issuing a webhdfs command like
{code}
http://yjztvm2.private:50070/webhdfs/v1/tmp?op=OPEN
{code}
would give
{code}
http://yjztvm3.private:50075/webhdfs/v1/tmp?op=OPEN&namenoderpcaddress=0.0.0.0:8020&offset=0
{"RemoteException":{"exception":"ConnectException","javaClassName":"java.net.ConnectException","message":"Call From yjztvm3.private/192.168.142.230 to 0.0.0.0:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused"}}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6336) Cannot download file via webhdfs when wildcard is enabled
[ https://issues.apache.org/jira/browse/HDFS-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990198#comment-13990198 ] Haohui Mai commented on HDFS-6336: -- Since it passes the IP / port around, it looks to me that the patch does not allow the DN to fail over when HA is enabled. Cannot download file via webhdfs when wildcard is enabled - Key: HDFS-6336 URL: https://issues.apache.org/jira/browse/HDFS-6336 Project: Hadoop HDFS Issue Type: Bug Components: namenode, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6336.001.patch, HDFS-6336.001.patch, HDFS-6336.001.patch With wildcard enabled, issuing a webhdfs command like
{code}
http://yjztvm2.private:50070/webhdfs/v1/tmp?op=OPEN
{code}
would give
{code}
http://yjztvm3.private:50075/webhdfs/v1/tmp?op=OPEN&namenoderpcaddress=0.0.0.0:8020&offset=0
{"RemoteException":{"exception":"ConnectException","javaClassName":"java.net.ConnectException","message":"Call From yjztvm3.private/192.168.142.230 to 0.0.0.0:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused"}}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages
[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990204#comment-13990204 ] Haohui Mai commented on HDFS-6293: -- bq. There are existing apps that use a custom Visitor similar to lsr. It outputs directory entries with full path and list of blocks for files. [~kihwal], can you please elaborate on it? If you're talking about use cases like hdfs-du, there is no need to construct the whole namespace from the bottom up. Scanning through the records would be sufficient. bq. That was the first thing I thought about doing, but the processing time matters too. It might not be as bad as you thought. I ran an experiment to see how much time is required to convert an fsimage to a LevelDB on an 8-core Xeon E5530 CPU @ 2.4GHz, 24G memory, 2TB SATA 3 drive @ 7200 rpm. The machine is running RHEL 6.2, Java 1.6. The numbers reported below are comparable to the numbers reported in HDFS-5698.
|Size in Old|512M|1G|2G|4G|8G|
|Size in PB|469M|950M|1.9G|3.7G|7.0G|
|Converting to LevelDB (ms)|30505|56531|121579|373108|1047121|
The additional latency for an 8G fsimage is around 15 minutes, which looks reasonable to me for the use cases of an offline tool. Issues with OIV processing PB-based fsimages Key: HDFS-6293 URL: https://issues.apache.org/jira/browse/HDFS-6293 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Haohui Mai Priority: Blocker Attachments: HDFS-6293.000.patch, Heap Histogram.html There are issues with OIV when processing fsimages in protobuf. Due to the internal layout changes introduced by the protobuf-based fsimage, OIV consumes an excessive amount of memory. We have tested with an fsimage with about 140M files/directories. The peak heap usage when processing this image in pre-protobuf (i.e. pre-2.4.0) format was about 350MB. After converting the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of heap (max new size was 1GB).
It should be possible to process any image with the default heap size of 1.5GB. Another issue is the complete change of format/content in OIV's XML output. I also noticed that the secret manager section has no tokens while there were unexpired tokens in the original image (pre-2.4.0). I did not check whether they were also missing in the new pb fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
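The index-based approach mentioned in the comments can be sketched as follows. This is a hedged illustration: a real tool would stream inode records into LevelDB, while here a sorted in-memory map stands in for the on-disk index to show the access pattern: one sequential pass to build the index, then range scans to list a directory's children without materializing the whole namespace as an in-memory tree:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch: build a sorted index keyed by "<parentId>/<childName>" in a single
// pass over inode records, then answer directory listings with range scans.
// (A sorted on-disk store such as LevelDB would replace this TreeMap.)
public class OivIndexSketch {
    static final TreeMap<String, Long> index = new TreeMap<>();

    static SortedMap<String, Long> children(long parentId) {
        String lo = parentId + "/";
        // '\uffff' sorts after every character that can appear in a name,
        // so [lo, lo + '\uffff') covers exactly this parent's entries.
        return index.subMap(lo, true, lo + '\uffff', false);
    }

    public static void main(String[] args) {
        // One sequential pass over the records populates the index.
        index.put("1/user", 2L);
        index.put("2/alice", 3L);
        index.put("2/bob", 4L);
        index.put("3/data", 5L);
        // List the children of inode 2 with a single range scan.
        System.out.println(children(2).keySet()); // [2/alice, 2/bob]
    }
}
```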
[jira] [Updated] (HDFS-6293) Issues with OIV processing PB-based fsimages
[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6293: - Attachment: HDFS-6293.001.patch Issues with OIV processing PB-based fsimages Key: HDFS-6293 URL: https://issues.apache.org/jira/browse/HDFS-6293 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Haohui Mai Priority: Blocker Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, Heap Histogram.html There are issues with OIV when processing fsimages in protobuf. Due to the internal layout changes introduced by the protobuf-based fsimage, OIV consumes excessive amount of memory. We have tested with a fsimage with about 140M files/directories. The peak heap usage when processing this image in pre-protobuf (i.e. pre-2.4.0) format was about 350MB. After converting the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of heap (max new size was 1GB). It should be possible to process any image with the default heap size of 1.5GB. Another issue is the complete change of format/content in OIV's XML output. I also noticed that the secret manager section has no tokens while there were unexpired tokens in the original image (pre-2.4.0). I did not check whether they were also missing in the new pb fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6342) TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge
[ https://issues.apache.org/jira/browse/HDFS-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990208#comment-13990208 ] Binglin Chang commented on HDFS-6342: - That the rack capacities are equal doesn't mean there is no cross-rack block movement, so I don't think simply adding a new datanode works, right? Maybe we can make more changes and in the meantime reduce the timeout if possible; 80 seconds for a test is a bit long. TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge --- Key: HDFS-6342 URL: https://issues.apache.org/jira/browse/HDFS-6342 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen He Assignee: Chen He Attachments: HDFS-6342.patch The testBalancerWithRackLocality method tests the balancer moving data blocks with rack locality taken into consideration. It creates a two-node cluster. One node belongs to rack0/nodeGroup0, the other node belongs to rack1/nodeGroup1. In this 2-datanode minicluster, the block size is 10B and the total cluster capacity is 6000B (3000B on each datanode). It creates 180 data blocks with replication factor 2. Then, a new datanode is created (in rack1/nodeGroup2) and the balancer starts balancing the cluster. It expects only data blocks moving within rack1. After the balancer is done, it assumes the data size on both racks is the same. It will break if the balancer.id file is huge and there is inter-rack data block movement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6342) TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge
[ https://issues.apache.org/jira/browse/HDFS-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990211#comment-13990211 ] Binglin Chang commented on HDFS-6342: - As for the fix, I see the need to write a balancer id file, but filling it with the hostname doesn't seem necessary (since it is never used anywhere). So if we modify the balancer to write the balancer file but without any content, it should have no side effects on the balancer or the test check code, and we may be able to skip the timeout (need to confirm). TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge --- Key: HDFS-6342 URL: https://issues.apache.org/jira/browse/HDFS-6342 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen He Assignee: Chen He Attachments: HDFS-6342.patch The testBalancerWithRackLocality method tests the balancer moving data blocks with rack locality taken into consideration. It creates a two-node cluster. One node belongs to rack0/nodeGroup0, the other node belongs to rack1/nodeGroup1. In this 2-datanode minicluster, the block size is 10B and the total cluster capacity is 6000B (3000B on each datanode). It creates 180 data blocks with replication factor 2. Then, a new datanode is created (in rack1/nodeGroup2) and the balancer starts balancing the cluster. It expects only data blocks moving within rack1. After the balancer is done, it assumes the data size on both racks is the same. It will break if the balancer.id file is huge and there is inter-rack data block movement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6337) Setfacl testcase is failing due to dash character in username in TestAclCLI
[ https://issues.apache.org/jira/browse/HDFS-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990230#comment-13990230 ] Hudson commented on HDFS-6337: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5593 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5593/]) HDFS-6337. Setfacl testcase is failing due to dash character in username in TestAclCLI. Contributed by Uma Maheswara Rao G. (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1592489) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testAclCLI.xml Setfacl testcase is failing due to dash character in username in TestAclCLI --- Key: HDFS-6337 URL: https://issues.apache.org/jira/browse/HDFS-6337 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6337.patch TestHDFSCLI is failing due to a '-' in the username. I have seen a similar fix done in HDFS-5821, so the same fix should be done for the setfacl case as well.
[jira] [Commented] (HDFS-6340) DN can't finalize upgrade
[ https://issues.apache.org/jira/browse/HDFS-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990269#comment-13990269 ] Hadoop QA commented on HDFS-6340: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643277/HDFS-6340-branch-2.4.0.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6828//console This message is automatically generated. DN can't finalize upgrade - Key: HDFS-6340 URL: https://issues.apache.org/jira/browse/HDFS-6340 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: Rahul Singhal Priority: Blocker Attachments: HDFS-6340-branch-2.4.0.patch I upgraded an (NN) HA cluster from 2.2.0 to 2.4.0. After I issued the '-finalizeUpgrade' command, the NN was able to finalize the upgrade but the DN couldn't (I waited for the next block report). I think I have found the problem to be due to HDFS-5153. I will attach a proposed fix.
[jira] [Comment Edited] (HDFS-6293) Issues with OIV processing PB-based fsimages
[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990204#comment-13990204 ] Haohui Mai edited comment on HDFS-6293 at 5/6/14 4:57 AM: --
bq. There is existing apps that use a custom Visitor similar to lsr. It outputs directory entries with full path and list of blocks for files.
[~kihwal], can you please elaborate? If you're talking about use cases like hdfs-du, scanning through the records might be sufficient.
bq. That was the first thing I thought about doing, but the processing time matters too.
It might not be as bad as you thought. I ran an experiment to see how much time is required to convert an fsimage to a LevelDB on an 8-core Xeon E5530 CPU @ 2.4GHz, 24G memory, 2TB SATA 3 drive @ 7200 rpm. The machine is running RHEL 6.2, Java 1.6. The numbers reported below are comparable to the numbers reported in HDFS-5698.
|Size in Old|512M|1G|2G|4G|8G|
|Size in PB|469M|950M|1.9G|3.7G|7.0G|
|Converting to LevelDB (ms)|30505|56531|121579|373108|1047121|
The additional latency for an 8G fsimage is around 15 minutes.
was (Author: wheat9):
bq. There is existing apps that use a custom Visitor similar to lsr. It outputs directory entries with full path and list of blocks for files.
[~kihwal], can you please elaborate? If you're talking about use cases like hdfs-du, there is no need to construct the whole namespace from the bottom up; scanning through the records would be sufficient.
bq. That was the first thing I thought about doing, but the processing time matters too.
It might not be as bad as you thought. I ran an experiment to see how much time is required to convert an fsimage to a LevelDB on an 8-core Xeon E5530 CPU @ 2.4GHz, 24G memory, 2TB SATA 3 drive @ 7200 rpm. The machine is running RHEL 6.2, Java 1.6. The numbers reported below are comparable to the numbers reported in HDFS-5698.
|Size in Old|512M|1G|2G|4G|8G|
|Size in PB|469M|950M|1.9G|3.7G|7.0G|
|Converting to LevelDB (ms)|30505|56531|121579|373108|1047121|
The additional latency for an 8G fsimage is around 15 minutes, which looks reasonable to me for the use cases of an offline tool.
Issues with OIV processing PB-based fsimages Key: HDFS-6293 URL: https://issues.apache.org/jira/browse/HDFS-6293 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Haohui Mai Priority: Blocker Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, Heap Histogram.html There are issues with OIV when processing fsimages in protobuf. Due to the internal layout changes introduced by the protobuf-based fsimage, OIV consumes an excessive amount of memory. We have tested with an fsimage with about 140M files/directories. The peak heap usage when processing this image in the pre-protobuf (i.e. pre-2.4.0) format was about 350MB. After converting the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of heap (max new size was 1GB). It should be possible to process any image with the default heap size of 1.5GB. Another issue is the complete change of format/content in OIV's XML output. I also noticed that the secret manager section has no tokens while there were unexpired tokens in the original image (pre-2.4.0). I did not check whether they were also missing in the new pb fsimage.
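For scale, the largest row of the timing table works out to a sustained conversion rate of roughly 7 MB/s on this hardware. A quick arithmetic sketch (assumes 1G = 1024M; the class below is illustrative, not part of OIV):

```java
// Back-of-the-envelope throughput for the LevelDB conversion numbers
// in the table above: PB-format image size divided by conversion time.
public class ConversionThroughput {
    // MB/s given an image size in MB and a duration in milliseconds.
    static double mbPerSec(double sizeMb, long millis) {
        return sizeMb / (millis / 1000.0);
    }

    public static void main(String[] args) {
        // 7.0G PB-format image converted in 1,047,121 ms.
        double rate = mbPerSec(7.0 * 1024, 1047121L);
        System.out.printf("%.1f MB/s%n", rate); // prints 6.8 MB/s
    }
}
```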