[jira] [Commented] (HDFS-6292) Display HDFS per user and per group usage on the webUI
[ https://issues.apache.org/jira/browse/HDFS-6292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982786#comment-13982786 ] Vinayakumar B commented on HDFS-6292: - Good one Ravi. I think calculating on the Secondary NN side is OK, but making the user navigate to the SNN page just to see these statistics is not a good idea. How about tracking these on the NameNode side from the start, updating the statistics (like other metrics) on every operation that modifies them? That avoids recalculating the whole statistics and holding the namesystem lock for a long time. Display HDFS per user and per group usage on the webUI -- Key: HDFS-6292 URL: https://issues.apache.org/jira/browse/HDFS-6292 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-6292.patch, HDFS-6292.png It would be nice to show HDFS usage per user and per group on a web UI. -- This message was sent by Atlassian JIRA (v6.2#6252)
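To make the suggestion above concrete, here is a minimal sketch of incrementally maintained per-user usage counters, assuming a hypothetical tracker class (PerUserUsageTracker is illustrative, not an actual HDFS class); the same pattern would apply per group:

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch only; names are illustrative, not HDFS classes.
class PerUserUsageTracker {
  private final ConcurrentHashMap<String, AtomicLong> bytesByUser =
      new ConcurrentHashMap<String, AtomicLong>();

  // Called from each namespace operation that adds or removes bytes, so
  // the webUI can read totals without a full tree walk under the lock.
  void update(String user, long deltaBytes) {
    AtomicLong counter = bytesByUser.get(user);
    if (counter == null) {
      AtomicLong fresh = new AtomicLong();
      counter = bytesByUser.putIfAbsent(user, fresh);
      if (counter == null) {
        counter = fresh;
      }
    }
    counter.addAndGet(deltaBytes);
  }

  long usageOf(String user) {
    AtomicLong counter = bytesByUser.get(user);
    return counter == null ? 0L : counter.get();
  }
}
{code}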
[jira] [Commented] (HDFS-6291) FSImage may be left unclosed in BootstrapStandby#doRun()
[ https://issues.apache.org/jira/browse/HDFS-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982788#comment-13982788 ] Vinayakumar B commented on HDFS-6291: - Yes! You are right. image.close() should be in a finally clause. FSImage may be left unclosed in BootstrapStandby#doRun() Key: HDFS-6291 URL: https://issues.apache.org/jira/browse/HDFS-6291 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Priority: Minor At around line 203:
{code}
if (!checkLogsAvailableForRead(image, imageTxId, curTxId)) {
  return ERR_CODE_LOGS_UNAVAILABLE;
}
{code}
If we return following the above check, the image is not closed. -- This message was sent by Atlassian JIRA (v6.2#6252)
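A minimal sketch of the fix being discussed, with the surrounding doRun() logic elided; closing via a finally clause (here using Hadoop's IOUtils.cleanup) covers the early return and exception paths alike:

{code}
// Simplified sketch; the real BootstrapStandby#doRun contains more logic.
FSImage image = new FSImage(conf);
try {
  if (!checkLogsAvailableForRead(image, imageTxId, curTxId)) {
    return ERR_CODE_LOGS_UNAVAILABLE;
  }
  // ... download and save the checkpoint ...
  return 0;
} finally {
  // Runs on the early return above as well as on exceptions.
  IOUtils.cleanup(LOG, image);
}
{code}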
[jira] [Updated] (HDFS-5168) BlockPlacementPolicy does not work for cross node group dependencies
[ https://issues.apache.org/jira/browse/HDFS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikola Vujic updated HDFS-5168: --- Attachment: HDFS-5168.patch Attaching original patch again. BlockPlacementPolicy does not work for cross node group dependencies Key: HDFS-5168 URL: https://issues.apache.org/jira/browse/HDFS-5168 Project: Hadoop HDFS Issue Type: Improvement Reporter: Nikola Vujic Assignee: Nikola Vujic Priority: Critical Attachments: HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch Block placement policies do not work for cross rack/node group dependencies. In reality this is needed when compute servers and storage fall in two independent fault domains; in that case, neither BlockPlacementPolicyDefault nor BlockPlacementPolicyWithNodeGroup is able to provide proper block placement. Suppose we have a Hadoop cluster with one rack with two servers, and we run 2 VMs per server. The node group topology for this cluster would be:
server1-vm1 - /d1/r1/n1
server1-vm2 - /d1/r1/n1
server2-vm1 - /d1/r1/n2
server2-vm2 - /d1/r1/n2
This works fine as long as server and storage fall into the same fault domain, but if the storage is in a different fault domain from the server, we will not be able to handle that. For example, if the storage of server1-vm1 is in the same fault domain as the storage of server2-vm1, then we must not place two replicas on those two nodes even though they are in different node groups. Two possible approaches:
- One approach would be to define cross rack/node group dependencies and to use them when excluding nodes from the search space. This looks like the cleanest way to fix this, as it requires only minor changes in the BlockPlacementPolicy classes.
- The other approach would be to allow nodes to fall in more than one node group. When we choose a node to hold a replica, we have to exclude from the search space all nodes from the node groups to which the chosen node belongs. This approach may require major changes in NetworkTopology.
-- This message was sent by Atlassian JIRA (v6.2#6252)
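For readers unfamiliar with how the node group layer is fed, the example topology above would come from a resolver along these lines. Real deployments normally use a topology script, and the exact set of methods on DNSToSwitchMapping varies across Hadoop versions, so treat this hard-coded class as an illustrative sketch only:

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.net.DNSToSwitchMapping;

// Illustrative resolver for the two-server / four-VM example above.
public class ExampleNodeGroupMapping implements DNSToSwitchMapping {
  @Override
  public List<String> resolve(List<String> names) {
    List<String> paths = new ArrayList<String>(names.size());
    for (String name : names) {
      if (name.startsWith("server1")) {
        paths.add("/d1/r1/n1");   // both server1 VMs share node group n1
      } else {
        paths.add("/d1/r1/n2");   // both server2 VMs share node group n2
      }
    }
    return paths;
  }

  @Override
  public void reloadCachedMappings() { }          // nothing cached here

  @Override
  public void reloadCachedMappings(List<String> names) { }
}
{code}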
[jira] [Commented] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982798#comment-13982798 ] Hadoop QA commented on HDFS-6261: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642186/HDFS-6261.v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDistributedFileSystem {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6753//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6753//console This message is automatically generated. Add document for enabling node group layer in HDFS -- Key: HDFS-6261 URL: https://issues.apache.org/jira/browse/HDFS-6261 Project: Hadoop HDFS Issue Type: Task Components: documentation Reporter: Wenwu Peng Assignee: Binglin Chang Labels: documentation Attachments: 2-layer-topology.png, 3-layer-topology.png, 3layer-topology.png, 4layer-topology.png, HDFS-6261.v1.patch, HDFS-6261.v1.patch, HDFS-6261.v2.patch Most of the patches from umbrella JIRA HADOOP-8468 have been committed; however, there is no site documentation introducing NodeGroup awareness (Hadoop Virtualization Extensions) or how to configure it, so we need to document it. 1. Document NodeGroup awareness in http://hadoop.apache.org/docs/current 2. Document NodeGroup-aware properties in core-default.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5168) BlockPlacementPolicy does not work for cross node group dependencies
[ https://issues.apache.org/jira/browse/HDFS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982933#comment-13982933 ] Hadoop QA commented on HDFS-5168: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642193/HDFS-5168.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6754//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6754//console This message is automatically generated. BlockPlacementPolicy does not work for cross node group dependencies Key: HDFS-5168 URL: https://issues.apache.org/jira/browse/HDFS-5168 Project: Hadoop HDFS Issue Type: Improvement Reporter: Nikola Vujic Assignee: Nikola Vujic Priority: Critical Attachments: HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch Block placement policies do not work for cross rack/node group dependencies. In reality this is needed when compute servers and storage fall in two independent fault domains; in that case, neither BlockPlacementPolicyDefault nor BlockPlacementPolicyWithNodeGroup is able to provide proper block placement. Suppose we have a Hadoop cluster with one rack with two servers, and we run 2 VMs per server. The node group topology for this cluster would be:
server1-vm1 - /d1/r1/n1
server1-vm2 - /d1/r1/n1
server2-vm1 - /d1/r1/n2
server2-vm2 - /d1/r1/n2
This works fine as long as server and storage fall into the same fault domain, but if the storage is in a different fault domain from the server, we will not be able to handle that. For example, if the storage of server1-vm1 is in the same fault domain as the storage of server2-vm1, then we must not place two replicas on those two nodes even though they are in different node groups. Two possible approaches:
- One approach would be to define cross rack/node group dependencies and to use them when excluding nodes from the search space. This looks like the cleanest way to fix this, as it requires only minor changes in the BlockPlacementPolicy classes.
- The other approach would be to allow nodes to fall in more than one node group. When we choose a node to hold a replica, we have to exclude from the search space all nodes from the node groups to which the chosen node belongs. This approach may require major changes in NetworkTopology.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6258) Support XAttrs from NameNode and implements XAttr APIs for DistributedFileSystem
[ https://issues.apache.org/jira/browse/HDFS-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983007#comment-13983007 ] Charles Lamb commented on HDFS-6258: - Hi Yi, Here are a few minor things that I picked up on as well as some javadoc fixups. Charles
{code}
Index: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/XAttr.java
===================================================================
--- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/XAttr.java (revision 0)
+++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/XAttr.java (working copy)
@@ -0,0 +1,138 @@
+/**
+ * XAttr is POSIX Extended Attribute model, similar to the one in traditional Operating Systems.
+ * Extended Attribute consists of a name and associated data, and 4 namespaces are defined: user,
+ * trusted, security and system.
+ * 1). USER namespace extended attribute may be assigned for storing arbitrary additional
+ * information, and its access permissions are defined by file/directory permission bits.
+ * 2). TRUSTED namespace extended attribute are visible and accessible only to privilege user
+ * (file/directory owner or fs admin), and it is available from both user space (filesystem
+ * API) and fs kernel.
+ * 3). SYSTEM namespace extended attribute is used by fs kernel to store system objects,
+ * and only available in fs kernel. It's not visible to users.
+ * 4). SECURITY namespace extended attribute is used by fs kernel for security features, and
+ * it's not visible to users.
{code}
Suggested rewording: XAttr is the POSIX Extended Attribute model similar to that found in traditional Operating Systems. Extended Attributes consist of one or more name/value pairs associated with a file or directory. Four namespaces are defined: user, trusted, security and system. 1) USER namespace attributes may be used by any user to store arbitrary information. Access permissions in this namespace are defined by a file directory's permission bits. 2) TRUSTED namespace attributes are only visible and accessible to privileged users (a file or directory's owner or the fs admin). This namespace is available from both user space (filesystem API) and fs kernel. 3) SYSTEM namespace attributes are used by the fs kernel to store system objects. This namespace is only available in the fs kernel. It is not visible to users. 4) SECURITY namespace attributes are used by the fs kernel for security features. It is not visible to users.
{code}
Index: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
===================================================================
--- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java (revision 1589028)
+++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java (working copy)
@@ -109,6 +109,8 @@
 import org.apache.hadoop.fs.MD5MD5CRC32FileChecksum;
 import org.apache.hadoop.fs.MD5MD5CRC32GzipFileChecksum;
 import org.apache.hadoop.fs.Options;
+import org.apache.hadoop.fs.XAttr;
+import org.apache.hadoop.fs.XAttrSetFlag;
 import org.apache.hadoop.fs.Options.ChecksumOpt;
 import org.apache.hadoop.fs.ParentNotDirectoryException;
 import org.apache.hadoop.fs.Path;
@@ -191,6 +193,8 @@
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Joiner;
 import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
 import com.google.common.net.InetAddresses;
@@ -2757,6 +2761,127 @@
+  XAttr constructXAttr(String name, byte[] value) {
+    if (name == null) {
+      throw new NullPointerException("XAttr name can not be null.");
+    }
+
+    int prefixIndex = name.indexOf(".");
+    if (prefixIndex == -1) {
+      throw new IllegalArgumentException("XAttr name must be prefixed with user/trusted/security/system which followed by '.'");
+    }
+
+    XAttr.NameSpace ns;
+    String prefix = name.substring(0, prefixIndex).toUpperCase();
+    if (prefix.equals(XAttr.NameSpace.USER.toString())) {
+      ns = XAttr.NameSpace.USER;
+    } else if (prefix.equals(XAttr.NameSpace.TRUSTED.toString())) {
+      ns = XAttr.NameSpace.TRUSTED;
+    } else if (prefix.equals(XAttr.NameSpace.SECURITY.toString())) {
+      ns = XAttr.NameSpace.SECURITY;
+    } else if (prefix.equals(XAttr.NameSpace.SYSTEM.toString())) {
+      ns = XAttr.NameSpace.SYSTEM;
+    } else {
+      throw new IllegalArgumentException("XAttr name must be prefixed with user/trusted/security/system which followed by '.'");
+    }
{code}
In both IllegalArgumentException messages: s/which/and/. I'm unclear as to whether namespaces are case-sensitive or insensitive (I believe they are case-insensitive). The
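As a hedged illustration of the case-insensitivity point above (not code from the patch, and assuming an XAttr.Builder like the one the patch introduces), the if/else chain could be collapsed by deriving the namespace with NameSpace.valueOf and treating the prefix case-insensitively:

{code}
// Illustrative sketch only; assumes XAttr.NameSpace and XAttr.Builder
// as defined by the patch under review.
static XAttr buildXAttr(String name, byte[] value) {
  if (name == null) {
    throw new NullPointerException("XAttr name can not be null.");
  }
  int prefixIndex = name.indexOf('.');
  if (prefixIndex < 1) {
    throw new IllegalArgumentException("XAttr name must be prefixed with "
        + "user/trusted/security/system, followed by a '.'");
  }
  XAttr.NameSpace ns;
  try {
    // toUpperCase() makes the prefix case-insensitive.
    ns = XAttr.NameSpace.valueOf(name.substring(0, prefixIndex).toUpperCase());
  } catch (IllegalArgumentException e) {
    throw new IllegalArgumentException("XAttr name must be prefixed with "
        + "user/trusted/security/system, followed by a '.'");
  }
  return new XAttr.Builder().setNameSpace(ns)
      .setName(name.substring(prefixIndex + 1)).setValue(value).build();
}
{code}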
[jira] [Updated] (HDFS-6218) Audit log should use true client IP for proxied webhdfs operations
[ https://issues.apache.org/jira/browse/HDFS-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6218: - Resolution: Fixed Fix Version/s: 2.5.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2. Thanks for the fix, Daryn. Audit log should use true client IP for proxied webhdfs operations -- Key: HDFS-6218 URL: https://issues.apache.org/jira/browse/HDFS-6218 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, webhdfs Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6218.patch When using an HTTP proxy, it's not very useful for the audit log to contain the proxy's IP address. Similar to proxy superusers, the NN should allow configuration of trusted proxy servers and use the X-Forwarded-For header when logging the client request. -- This message was sent by Atlassian JIRA (v6.2#6252)
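A hedged sketch of the idea in this JIRA: only trust X-Forwarded-For when the request actually came from a configured trusted proxy. The names here (trustedProxies, getClientAddr) are illustrative; see JspHelper and NamenodeWebHdfsMethods in the committed patch for the real implementation:

{code}
import java.util.Set;
import javax.servlet.http.HttpServletRequest;

// Illustrative only; not the code committed for HDFS-6218.
class ClientAddr {
  static String getClientAddr(HttpServletRequest req,
      Set<String> trustedProxies) {
    String remote = req.getRemoteAddr();
    if (trustedProxies.contains(remote)) {
      String forwarded = req.getHeader("X-Forwarded-For");
      if (forwarded != null && !forwarded.isEmpty()) {
        // The header may carry a comma-separated chain of addresses;
        // the first entry is the originating client.
        return forwarded.split(",")[0].trim();
      }
    }
    return remote;  // untrusted source: log the peer address as before
  }
}
{code}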
[jira] [Updated] (HDFS-6269) NameNode Audit Log should differentiate between webHDFS open and HDFS open.
[ https://issues.apache.org/jira/browse/HDFS-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6269: - Target Version/s: 2.5.0 (was: 2.4.1) NameNode Audit Log should differentiate between webHDFS open and HDFS open. --- Key: HDFS-6269 URL: https://issues.apache.org/jira/browse/HDFS-6269 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, webhdfs Affects Versions: 2.4.0 Reporter: Eric Payne Assignee: Eric Payne Attachments: HDFS-6269-AuditLogWebOpen.txt, HDFS-6269-AuditLogWebOpen.txt, HDFS-6269-AuditLogWebOpen.txt To enhance traceability, the NameNode audit log should use a different string for open in the cmd= part of the audit entry. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6269) NameNode Audit Log should differentiate between webHDFS open and HDFS open.
[ https://issues.apache.org/jira/browse/HDFS-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983024#comment-13983024 ] Kihwal Lee commented on HDFS-6269: -- The test failure is due to HDFS-6250. NameNode Audit Log should differentiate between webHDFS open and HDFS open. --- Key: HDFS-6269 URL: https://issues.apache.org/jira/browse/HDFS-6269 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, webhdfs Affects Versions: 2.4.0 Reporter: Eric Payne Assignee: Eric Payne Attachments: HDFS-6269-AuditLogWebOpen.txt, HDFS-6269-AuditLogWebOpen.txt, HDFS-6269-AuditLogWebOpen.txt To enhance traceability, the NameNode audit log should use a different string for open in the cmd= part of the audit entry. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6218) Audit log should use true client IP for proxied webhdfs operations
[ https://issues.apache.org/jira/browse/HDFS-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983030#comment-13983030 ] Hudson commented on HDFS-6218: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5579 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5579/]) HDFS-6218. Audit log should use true client IP for proxied webhdfs operations. Contributed by Daryn Sharp. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1590640) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/JspHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAuditLogger.java Audit log should use true client IP for proxied webhdfs operations -- Key: HDFS-6218 URL: https://issues.apache.org/jira/browse/HDFS-6218 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, webhdfs Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6218.patch When using an HTTP proxy, it's not very useful for the audit log to contain the proxy's IP address. Similar to proxy superusers, the NN should allow configuration of trusted proxy servers and use the X-Forwarded-For header when logging the client request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6269) NameNode Audit Log should differentiate between webHDFS open and HDFS open.
[ https://issues.apache.org/jira/browse/HDFS-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983031#comment-13983031 ] Eric Payne commented on HDFS-6269: -- The patch for this JIRA did not cause the unit test failure (see https://builds.apache.org/job/PreCommit-HDFS-Build/6742/, which also has this same failure). I ran the specific unit test (org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup) in my build environment and it passes successfully, both with and without the patch in this JIRA. NameNode Audit Log should differentiate between webHDFS open and HDFS open. --- Key: HDFS-6269 URL: https://issues.apache.org/jira/browse/HDFS-6269 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, webhdfs Affects Versions: 2.4.0 Reporter: Eric Payne Assignee: Eric Payne Attachments: HDFS-6269-AuditLogWebOpen.txt, HDFS-6269-AuditLogWebOpen.txt, HDFS-6269-AuditLogWebOpen.txt To enhance traceability, the NameNode audit log should use a different string for open in the cmd= part of the audit entry. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6293) Issues with OIV processing PB-based fsimages
Kihwal Lee created HDFS-6293: Summary: Issues with OIV processing PB-based fsimages Key: HDFS-6293 URL: https://issues.apache.org/jira/browse/HDFS-6293 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker There are issues with OIV when processing fsimages in protobuf format. Due to the internal layout changes introduced by the protobuf-based fsimage, OIV consumes an excessive amount of memory. We have tested with an fsimage with about 140M files/directories. The peak heap usage when processing this image in the pre-protobuf (i.e. pre-2.4.0) format was about 350MB. After converting the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of heap (max new size was 1GB). It should be possible to process any image with the default heap size of 1.5GB. Another issue is the complete change of format/content in OIV's XML output. I also noticed that the secret manager section has no tokens while there were unexpired tokens in the original image (pre-2.4.0). I did not check whether they were also missing in the new pb fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6176) Remove assignments to method arguments
[ https://issues.apache.org/jira/browse/HDFS-6176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983066#comment-13983066 ] Charles Lamb commented on HDFS-6176: Hi Suresh, Thanks. I've been tied up with other things. I'll try to generate a patch in a while. There are several hundred places where the code does this. Charles Remove assignments to method arguments -- Key: HDFS-6176 URL: https://issues.apache.org/jira/browse/HDFS-6176 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Charles Lamb Priority: Minor There are many places in the code where assignments are made to method arguments. Eclipse is quite happy to flag this if the appropriate warning is enabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages
[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983095#comment-13983095 ] Kihwal Lee commented on HDFS-6293: -- Outputting in the new XML format is fast and consumes little memory because it essentially dumps what is in the image, in order. But it does not provide readily usable directory/file information as it did in pre-2.4/protobuf versions. Using something like the ls -l format, or any custom visitor that dumps the file system tree, will require loading all inodes upfront and linking them afterwards. This requires a considerably larger amount of memory; the smallest footprint will be similar to the NN's without triplets. That is clearly unacceptable. Reducing memory consumption at the price of considerably longer processing time is also unacceptable. Issues with OIV processing PB-based fsimages Key: HDFS-6293 URL: https://issues.apache.org/jira/browse/HDFS-6293 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker There are issues with OIV when processing fsimages in protobuf format. Due to the internal layout changes introduced by the protobuf-based fsimage, OIV consumes an excessive amount of memory. We have tested with an fsimage with about 140M files/directories. The peak heap usage when processing this image in the pre-protobuf (i.e. pre-2.4.0) format was about 350MB. After converting the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of heap (max new size was 1GB). It should be possible to process any image with the default heap size of 1.5GB. Another issue is the complete change of format/content in OIV's XML output. I also noticed that the secret manager section has no tokens while there were unexpired tokens in the original image (pre-2.4.0). I did not check whether they were also missing in the new pb fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6293) Issues with OIV processing PB-based fsimages
[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6293: - Attachment: Heap Histogram.html Attaching heap histogram of OIV. The max heap was set to 2GB to make it go out of heap early and dump the heap. I only loaded up about 3M files/dirs before crashing. If we optimize the PB inefficiencies, we might be able to make it work with 50% of the heap. But that will still be too much. Issues with OIV processing PB-based fsimages Key: HDFS-6293 URL: https://issues.apache.org/jira/browse/HDFS-6293 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker Attachments: Heap Histogram.html There are issues with OIV when processing fsimages in protobuf format. Due to the internal layout changes introduced by the protobuf-based fsimage, OIV consumes an excessive amount of memory. We have tested with an fsimage with about 140M files/directories. The peak heap usage when processing this image in the pre-protobuf (i.e. pre-2.4.0) format was about 350MB. After converting the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of heap (max new size was 1GB). It should be possible to process any image with the default heap size of 1.5GB. Another issue is the complete change of format/content in OIV's XML output. I also noticed that the secret manager section has no tokens while there were unexpired tokens in the original image (pre-2.4.0). I did not check whether they were also missing in the new pb fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HDFS-6293) Issues with OIV processing PB-based fsimages
[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983105#comment-13983105 ] Kihwal Lee edited comment on HDFS-6293 at 4/28/14 3:33 PM: --- Attaching heap histogram of OIV. The max heap was set to 2GB to make it go out of heap early and dump the heap. It only loaded about 3M files/dirs before crashing. If we optimize the PB inefficiencies, we might be able to make it work with 50% of the heap. But that will still be too much. was (Author: kihwal): Attaching heap histogram of OIV. The max heap was set to 2GB to make it go out of heap early and dump the heap. I only loaded up about 3M files/dirs before crashing. If we optimize the PB inefficiencies, we might be able to make it work with 50% of the heap. But that will still be too much. Issues with OIV processing PB-based fsimages Key: HDFS-6293 URL: https://issues.apache.org/jira/browse/HDFS-6293 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker Attachments: Heap Histogram.html There are issues with OIV when processing fsimages in protobuf format. Due to the internal layout changes introduced by the protobuf-based fsimage, OIV consumes an excessive amount of memory. We have tested with an fsimage with about 140M files/directories. The peak heap usage when processing this image in the pre-protobuf (i.e. pre-2.4.0) format was about 350MB. After converting the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of heap (max new size was 1GB). It should be possible to process any image with the default heap size of 1.5GB. Another issue is the complete change of format/content in OIV's XML output. I also noticed that the secret manager section has no tokens while there were unexpired tokens in the original image (pre-2.4.0). I did not check whether they were also missing in the new pb fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages
[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983131#comment-13983131 ] Kihwal Lee commented on HDFS-6293: -- The 2.4.0 pb-fsimage does contain tokens, but OIV does not show any tokens. Issues with OIV processing PB-based fsimages Key: HDFS-6293 URL: https://issues.apache.org/jira/browse/HDFS-6293 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker Attachments: Heap Histogram.html There are issues with OIV when processing fsimages in protobuf format. Due to the internal layout changes introduced by the protobuf-based fsimage, OIV consumes an excessive amount of memory. We have tested with an fsimage with about 140M files/directories. The peak heap usage when processing this image in the pre-protobuf (i.e. pre-2.4.0) format was about 350MB. After converting the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of heap (max new size was 1GB). It should be possible to process any image with the default heap size of 1.5GB. Another issue is the complete change of format/content in OIV's XML output. I also noticed that the secret manager section has no tokens while there were unexpired tokens in the original image (pre-2.4.0). I did not check whether they were also missing in the new pb fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-4793) uploading file larger than the spaceQuota limit should not create 0 byte file
[ https://issues.apache.org/jira/browse/HDFS-4793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-4793. --- Resolution: Duplicate Duplicate of HDFS-172. uploading file larger than the spaceQuota limit should not create 0 byte file - Key: HDFS-4793 URL: https://issues.apache.org/jira/browse/HDFS-4793 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0 Reporter: Yesha Vora Set the spaceQuota size for Dir A = 64MB. Try to upload a large file of 1GB into Dir A. The copyFromLocal operation fails but creates a 0 byte file on HDFS. [User1@Machine1]$ hadoop fs -ls /A Found 1 items -rwx-- 1 User1 User1 0 2013-05-02 22:29 /A/1GB Expected behavior: it should not create any 0 byte file. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6287) Add vecsum test of libhdfs read access times
[ https://issues.apache.org/jira/browse/HDFS-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983196#comment-13983196 ] Colin Patrick McCabe commented on HDFS-6287: OK, time to implement auto-detection of SSE, I guess... Add vecsum test of libhdfs read access times Key: HDFS-6287 URL: https://issues.apache.org/jira/browse/HDFS-6287 Project: Hadoop HDFS Issue Type: Test Components: libhdfs, test Affects Versions: 2.5.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-6282.001.patch, HDFS-6287.002.patch Add vecsum, a benchmark that tests libhdfs access times. This includes short-circuit, zero-copy, and standard libhdfs access modes. It also has a local filesystem mode for comparison. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6286) adding a timeout setting for local read io
[ https://issues.apache.org/jira/browse/HDFS-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983194#comment-13983194 ] Colin Patrick McCabe commented on HDFS-6286: It seems like enabling hedged reads (which has been merged as HDFS-5776) is a better solution to the problem of high-latency local reads. bq. Per my knowledge, there's no good mechanism to cancel a running read io(Please correct me if it's wrong), You are correct that there is no mechanism for userspace to cancel a synchronous I/O operation in the kernel. bq. my opinion is adding a future around the read request, and we could set a timeout there, if the threshold reached, we can add the local node into deadnode probably... Any thought? We can't afford to construct a future on each read. Reads are often quite small and that would generate too much garbage. We could potentially calculate the time each read took, by calling {{System.nanoTime}} or similar. (On most Linux variants, this is a low-cost call which doesn't need to transition to kernel space.) But setting a timeout is going to be very problematic. For one thing, if the client gets a GC, all of its local reads might then shut down due to the timeout, which would just make performance worse. I've seen perfectly good disks become slow when under heavy load, but only occasionally. I think it's better just to use hedged reads when latency is a concern (such as in HBase.) This gets you all the same benefits, and doesn't require any code changes. It also benefits you when you are doing non-local reads, which this change would not. adding a timeout setting for local read io -- Key: HDFS-6286 URL: https://issues.apache.org/jira/browse/HDFS-6286 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0, 2.4.0 Reporter: Liang Xie Assignee: Liang Xie Currently, if a write or remote read is issued to a sick disk, DFSClient.hdfsTimeout can give the caller a guaranteed bound on how long it takes to return. But it doesn't work for local reads. Take an HBase scan for example: DFSInputStream.read -> readWithStrategy -> readBuffer -> BlockReaderLocal.read -> dataIn.read -> FileChannelImpl.read. If it hits a bad disk, the slow read I/O probably takes tens of seconds, and what's worse, DFSInputStream.read always holds a lock. Per my knowledge, there's no good mechanism to cancel a running read I/O (please correct me if that's wrong), so my opinion is to add a future around the read request, and we could set a timeout there; if the threshold is reached, we can probably add the local node into the dead nodes... Any thoughts? -- This message was sent by Atlassian JIRA (v6.2#6252)
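To illustrate the lighter-weight alternative Colin describes (measuring each read with System.nanoTime() rather than wrapping it in a Future), here is a minimal sketch; the stream, buffer, and threshold are stand-ins, not HDFS APIs:

{code}
import java.io.IOException;
import java.io.InputStream;

// Illustrative only: time a single read without allocating per-read objects.
class TimedRead {
  static int timedRead(InputStream in, byte[] buf, int off, int len,
      long thresholdNanos) throws IOException {
    long start = System.nanoTime();
    int nread = in.read(buf, off, len);   // the potentially slow local I/O
    long elapsed = System.nanoTime() - start;
    if (elapsed > thresholdNanos) {
      // React to the slow disk, e.g. count it against the local node.
      System.err.println("slow read: " + (elapsed / 1000000L) + " ms");
    }
    return nread;
  }
}
{code}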
[jira] [Commented] (HDFS-6288) DFSInputStream Pread doesn't update ReadStatistics
[ https://issues.apache.org/jira/browse/HDFS-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983198#comment-13983198 ] Colin Patrick McCabe commented on HDFS-6288: +1. Thanks, Juan. DFSInputStream Pread doesn't update ReadStatistics -- Key: HDFS-6288 URL: https://issues.apache.org/jira/browse/HDFS-6288 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Juan Yu Assignee: Juan Yu Priority: Minor Attachments: HDFS-6288.1.patch DFSInputStream Pread doesn't update ReadStatistics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6288) DFSInputStream Pread doesn't update ReadStatistics
[ https://issues.apache.org/jira/browse/HDFS-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983206#comment-13983206 ] Colin Patrick McCabe commented on HDFS-6288: Oops, there is a test failure that needs to be addressed. The test failure in {{TestPread}} is because of this new addition:
{code}
private void doPread(FSDataInputStream stm, long position, byte[] buffer,
    int offset, int length) throws IOException {
  int nread = 0;
  if (!(stm.getWrappedStream() instanceof DFSInputStream)) {
    throw new IOException("not DFSInputStream");
  }
  ...
{code}
We need to support non-{{DFSInputStream}} objects here so that we can test {{LocalFS}} and so forth. DFSInputStream Pread doesn't update ReadStatistics -- Key: HDFS-6288 URL: https://issues.apache.org/jira/browse/HDFS-6288 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Juan Yu Assignee: Juan Yu Priority: Minor Attachments: HDFS-6288.002.patch, HDFS-6288.1.patch DFSInputStream Pread doesn't update ReadStatistics. -- This message was sent by Atlassian JIRA (v6.2#6252)
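One way the test could support non-DFS streams, sketched against the snippet above (JUnit asserts and the DFSInputStream ReadStatistics accessor are assumed from the surrounding test; this is not necessarily what the v2 patch does):

{code}
// Sketch: skip the DFS-specific statistics checks when the wrapped
// stream isn't a DFSInputStream (e.g. when running against LocalFS).
private void doPread(FSDataInputStream stm, long position, byte[] buffer,
    int offset, int length) throws IOException {
  int nread = 0;
  long totalRead = 0;
  DFSInputStream dfsIn = (stm.getWrappedStream() instanceof DFSInputStream)
      ? (DFSInputStream) stm.getWrappedStream() : null;
  if (dfsIn != null) {
    totalRead = dfsIn.getReadStatistics().getTotalBytesRead();
  }
  while (nread < length) {
    int nbytes =
        stm.read(position + nread, buffer, offset + nread, length - nread);
    assertTrue("Error in pread", nbytes > 0);
    nread += nbytes;
  }
  if (dfsIn != null) {
    assertEquals("Expected read statistics to be updated", length,
        dfsIn.getReadStatistics().getTotalBytesRead() - totalRead);
  }
}
{code}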
[jira] [Updated] (HDFS-6288) DFSInputStream Pread doesn't update ReadStatistics
[ https://issues.apache.org/jira/browse/HDFS-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juan Yu updated HDFS-6288: -- Attachment: HDFS-6288.002.patch Fix unit test testPreadLocalFS DFSInputStream Pread doesn't update ReadStatistics -- Key: HDFS-6288 URL: https://issues.apache.org/jira/browse/HDFS-6288 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Juan Yu Assignee: Juan Yu Priority: Minor Attachments: HDFS-6288.002.patch, HDFS-6288.1.patch DFSInputStream Pread doesn't update ReadStatistics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6288) DFSInputStream Pread doesn't update ReadStatistics
[ https://issues.apache.org/jira/browse/HDFS-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983215#comment-13983215 ] Andrew Wang commented on HDFS-6288: --- I think Juan's new patch handles this case, so I'm +1 pending Jenkins. DFSInputStream Pread doesn't update ReadStatistics -- Key: HDFS-6288 URL: https://issues.apache.org/jira/browse/HDFS-6288 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Juan Yu Assignee: Juan Yu Priority: Minor Attachments: HDFS-6288.002.patch, HDFS-6288.1.patch DFSInputStream Pread doesn't update ReadStatistics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6287) Add vecsum test of libhdfs read access times
[ https://issues.apache.org/jira/browse/HDFS-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-6287: --- Attachment: HDFS-6287.003.patch Here's a version that tries to compile with SSE intrinsics, and falls back on a simple cross-platform loop if that fails. Add vecsum test of libhdfs read access times Key: HDFS-6287 URL: https://issues.apache.org/jira/browse/HDFS-6287 Project: Hadoop HDFS Issue Type: Test Components: libhdfs, test Affects Versions: 2.5.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-6282.001.patch, HDFS-6287.002.patch, HDFS-6287.003.patch Add vecsum, a benchmark that tests libhdfs access times. This includes short-circuit, zero-copy, and standard libhdfs access modes. It also has a local filesystem mode for comparison. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-1309) FileSystem.rename will fail silently
[ https://issues.apache.org/jira/browse/HDFS-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983259#comment-13983259 ] Colin Patrick McCabe commented on HDFS-1309: Is this still an issue in branch 2? It looks like {{DistributedFileSystem#rename}} throws {{IOException}} if something goes wrong, and has for a while. Unless I'm missing something, rename is not a filesystem operation that fails silently. FileSystem.rename will fail silently Key: HDFS-1309 URL: https://issues.apache.org/jira/browse/HDFS-1309 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 0.20.2 Environment: Linux version 2.6.31-302-ec2 (buildd@yellow) (gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu7) ) #7-Ubuntu SMP Tue Oct 13 19:55:22 UTC 2009 Reporter: Kris Nuttycombe Priority: Minor Some filesystem operations (such as rename) will fail silently. In the attached example, a failure message will be written to the hadoop log, but it would be much better if the operation were to fail fast by throwing a checked exception and forcing the caller to handle the problem; failing to do so can easily lead to inadvertent data corruption.
{code}
val coalesceBasePath = new Path(eventLog.basePath, coalesceTo)
val backupBasePath = new Path(eventLog.basePath, relocateTo)
eventLog.fs.mkdirs(backupBasePath)
for (path <- coalesced; time <- HDFSEventLog.timePart(path, eventType)) {
  val backupPath = HDFSEventLog.path(backupBasePath, eventType, time)
  log.info("Relocating " + path + " to " + backupPath)
  eventLog.fs.rename(path, backupPath)
}
{code}
INF [20100715-16:11:20.727] reporting: Relocating hdfs://localhost:9000/test-batchEventLog/metrics/metrics_1279226067707 to hdfs://localhost:9000/test-batchEventLog/pre-coalesce/metrics/metrics_1279226067707 INF [20100715-16:11:20.752] reporting: Relocating hdfs://localhost:9000/test-batchEventLog/metrics/metrics_1279226077707 to hdfs://localhost:9000/test-batchEventLog/pre-coalesce/metrics/metrics_1279226077707 INF [20100715-16:11:20.754] reporting: Relocating hdfs://localhost:9000/test-batchEventLog/metrics/metrics_1279226457707 to hdfs://localhost:9000/test-batchEventLog/pre-coalesce/metrics/metrics_1279226457707 INF [20100715-16:11:20.757] reporting: Relocating hdfs://localhost:9000/test-batchEventLog/metrics/metrics_1279229126727 to hdfs://localhost:9000/test-batchEventLog/pre-coalesce/metrics/metrics_1279229126727 Complete.
[knuttycombe@floorshow reporting (reporting-coalesce)]$ hadoop fs -ls hdfs://localhost:9000/test-batchEventLog/ Found 3 items drwxr-xr-x - knuttycombe supergroup 0 2010-07-15 14:54 /test-batchEventLog/coalesced drwxr-xr-x - knuttycombe supergroup 0 2010-07-15 14:35 /test-batchEventLog/metrics drwxr-xr-x - knuttycombe supergroup 0 2010-07-15 16:11 /test-batchEventLog/pre-coalesce [knuttycombe@floorshow reporting (reporting-coalesce)]$ hadoop fs -ls hdfs://localhost:9000/test-batchEventLog/metrics Found 4 items -rw-r--r-- 3 knuttycombe supergroup2017122 2010-07-15 14:34 /test-batchEventLog/metrics/metrics_1279226067707 -rw-r--r-- 3 knuttycombe supergroup4122951 2010-07-15 14:34 /test-batchEventLog/metrics/metrics_1279226077707 -rw-r--r-- 3 knuttycombe supergroup512 2010-07-15 14:35 /test-batchEventLog/metrics/metrics_1279226457707 -rw-r--r-- 3 knuttycombe supergroup8638301 2010-07-15 14:26 /test-batchEventLog/metrics/metrics_1279229126727 [knuttycombe@floorshow reporting (reporting-coalesce)]$ hadoop fs -ls hdfs://localhost:9000/test-batchEventLog/pre-coalesce [knuttycombe@floorshow reporting (reporting-coalesce)]$ -- This message was sent by Atlassian JIRA (v6.2#6252)
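Since FileSystem.rename(Path, Path) is documented to return a boolean rather than always throwing, a caller wanting the fail-fast behavior requested in this issue can enforce it directly; a minimal sketch:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Fail fast by checking rename's boolean result ourselves.
class RenameUtil {
  static void renameOrDie(FileSystem fs, Path src, Path dst)
      throws IOException {
    if (!fs.rename(src, dst)) {
      throw new IOException("rename failed: " + src + " -> " + dst);
    }
  }
}
{code}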
[jira] [Commented] (HDFS-6269) NameNode Audit Log should differentiate between webHDFS open and HDFS open.
[ https://issues.apache.org/jira/browse/HDFS-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983265#comment-13983265 ] Daryn Sharp commented on HDFS-6269: --- +1 Looks good to me. NameNode Audit Log should differentiate between webHDFS open and HDFS open. --- Key: HDFS-6269 URL: https://issues.apache.org/jira/browse/HDFS-6269 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, webhdfs Affects Versions: 2.4.0 Reporter: Eric Payne Assignee: Eric Payne Attachments: HDFS-6269-AuditLogWebOpen.txt, HDFS-6269-AuditLogWebOpen.txt, HDFS-6269-AuditLogWebOpen.txt To enhance traceability, the NameNode audit log should use a different string for open in the cmd= part of the audit entry. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6294) Use INode IDs to avoid conflicts when a file open for write is renamed
Colin Patrick McCabe created HDFS-6294: -- Summary: Use INode IDs to avoid conflicts when a file open for write is renamed Key: HDFS-6294 URL: https://issues.apache.org/jira/browse/HDFS-6294 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.20.1 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Now that we have a unique INode ID for each INode, clients with files that are open for write can use this unique ID rather than a file path when they are requesting more blocks or closing the open file. This will avoid conflicts when a file which is open for write is renamed, and another file with that name is created. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6294) Use INode IDs to avoid conflicts when a file open for write is renamed
[ https://issues.apache.org/jira/browse/HDFS-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-6294: --- Attachment: HDFS-6294.001.patch Use INode IDs to avoid conflicts when a file open for write is renamed -- Key: HDFS-6294 URL: https://issues.apache.org/jira/browse/HDFS-6294 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.20.1 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-6294.001.patch Now that we have a unique INode ID for each INode, clients with files that are open for write can use this unique ID rather than a file path when they are requesting more blocks or closing the open file. This will avoid conflicts when a file which is open for write is renamed, and another file with that name is created. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6294) Use INode IDs to avoid conflicts when a file open for write is renamed
[ https://issues.apache.org/jira/browse/HDFS-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983282#comment-13983282 ] Colin Patrick McCabe commented on HDFS-6294: One reason why this comes up is because of the NFS gateway. Since the gateway keeps files open for about 10 minutes after the last packet arrives from the client (by default, at least), there are a lot of times when someone copies a file to NFS via the gateway and then moves it. My approach was just to use inode IDs for all the operations done on a file open for write: {{complete}}, {{addBlock}}, {{fsync}}, {{abandonBlock}}, and {{getAdditionalDataNodes}}. In the cases where an inode ID was not being passed over the wire, I added one to the protobuf. This is backwards compatible because the new protobuf fields are optional. If the inode ID is not present, we fall back on the old behavior of using the full path instead. Use INode IDs to avoid conflicts when a file open for write is renamed -- Key: HDFS-6294 URL: https://issues.apache.org/jira/browse/HDFS-6294 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.20.1 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-6294.001.patch Now that we have a unique INode ID for each INode, clients with files that are open for write can use this unique ID rather than a file path when they are requesting more blocks or closing the open file. This will avoid conflicts when a file which is open for write is renamed, and another file with that name is created. -- This message was sent by Atlassian JIRA (v6.2#6252)
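A hedged sketch of the backwards-compatible lookup Colin describes: prefer the inode ID when the optional protobuf field was set, and fall back to the path for old clients. All names here (OpenFileResolver, INVALID_INODE_ID, the maps) are hypothetical stand-ins, not the patch's code:

{code}
import java.io.IOException;
import java.util.Map;

// Hypothetical stand-ins; not actual NameNode classes.
class OpenFileResolver {
  static final long INVALID_INODE_ID = -1;   // "not set" marker for old clients
  private final Map<Long, String> inodeMap;  // inode ID -> open file
  private final Map<String, String> pathMap; // full path -> open file

  OpenFileResolver(Map<Long, String> inodeMap, Map<String, String> pathMap) {
    this.inodeMap = inodeMap;
    this.pathMap = pathMap;
  }

  // Keying by inode ID makes the open file immune to rename/re-create
  // races; the path lookup preserves the old behavior.
  String resolve(long fileId, String src) throws IOException {
    String f = (fileId != INVALID_INODE_ID) ? inodeMap.get(fileId)
                                            : pathMap.get(src);
    if (f == null) {
      throw new IOException("No lease on " + src);
    }
    return f;
  }
}
{code}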
[jira] [Commented] (HDFS-6294) Use INode IDs to avoid conflicts when a file open for write is renamed
[ https://issues.apache.org/jira/browse/HDFS-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983286#comment-13983286 ] Steve Loughran commented on HDFS-6294: -- There's a test for this in HADOOP-9361 which attempts to [rename a file being appended to|https://github.com/steveloughran/hadoop-trunk/blob/stevel/HADOOP-9361-filesystem-contract/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractAppendContractTest.java#L113]. Presumably that test will pass once this patch has gone through? Use INode IDs to avoid conflicts when a file open for write is renamed -- Key: HDFS-6294 URL: https://issues.apache.org/jira/browse/HDFS-6294 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.20.1 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-6294.001.patch Now that we have a unique INode ID for each INode, clients with files that are open for write can use this unique ID rather than a file path when they are requesting more blocks or closing the open file. This will avoid conflicts when a file which is open for write is renamed, and another file with that name is created. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6287) Add vecsum test of libhdfs read access times
[ https://issues.apache.org/jira/browse/HDFS-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983287#comment-13983287 ] Hadoop QA commented on HDFS-6287: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642283/HDFS-6287.003.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6756//console This message is automatically generated. Add vecsum test of libhdfs read access times Key: HDFS-6287 URL: https://issues.apache.org/jira/browse/HDFS-6287 Project: Hadoop HDFS Issue Type: Test Components: libhdfs, test Affects Versions: 2.5.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-6282.001.patch, HDFS-6287.002.patch, HDFS-6287.003.patch Add vecsum, a benchmark that tests libhdfs access times. This includes short-circuit, zero-copy, and standard libhdfs access modes. It also has a local filesystem mode for comparison. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6294) Use INode IDs to avoid conflicts when a file open for write is renamed
[ https://issues.apache.org/jira/browse/HDFS-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-6294: --- Attachment: (was: HDFS-6294.001.patch) Use INode IDs to avoid conflicts when a file open for write is renamed -- Key: HDFS-6294 URL: https://issues.apache.org/jira/browse/HDFS-6294 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.20.1 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-6294.001.patch Now that we have a unique INode ID for each INode, clients with files that are open for write can use this unique ID rather than a file path when they are requesting more blocks or closing the open file. This will avoid conflicts when a file which is open for write is renamed, and another file with that name is created. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages
[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983288#comment-13983288 ] Marcelo Vanzin commented on HDFS-6293: -- Hi Kihwal, We have developed some code internally that mitigates (but does not eliminate) some of these problems. For an image with 140M entries it would need in the ballpark of 7-8GB of heap space, from my pencil-and-napkin calculations. Also, it does not generate entries in order like LsrPBImage does, and it's tailored for the use case of listing the contents of the file system (so it completely ignores things like snapshots). (The reason it still requires a lot of memory is, as you note, that it needs to load information about all inodes in memory; our code is just a little smarter about what information it loads. I don't think it's possible to make it much better without changing the data in the fsimage itself.) If people are ok with those limitations, we could clean up our code and post it as a patch. Issues with OIV processing PB-based fsimages Key: HDFS-6293 URL: https://issues.apache.org/jira/browse/HDFS-6293 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker Attachments: Heap Histogram.html There are issues with OIV when processing fsimages in protobuf format. Due to the internal layout changes introduced by the protobuf-based fsimage, OIV consumes an excessive amount of memory. We have tested with an fsimage with about 140M files/directories. The peak heap usage when processing this image in the pre-protobuf (i.e. pre-2.4.0) format was about 350MB. After converting the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of heap (max new size was 1GB). It should be possible to process any image with the default heap size of 1.5GB. Another issue is the complete change of format/content in OIV's XML output. I also noticed that the secret manager section has no tokens while there were unexpired tokens in the original image (pre-2.4.0). I did not check whether they were also missing in the new pb fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6294) Use INode IDs to avoid conflicts when a file open for write is renamed
[ https://issues.apache.org/jira/browse/HDFS-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-6294: --- Attachment: HDFS-6294.001.patch Use INode IDs to avoid conflicts when a file open for write is renamed -- Key: HDFS-6294 URL: https://issues.apache.org/jira/browse/HDFS-6294 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.20.1 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-6294.001.patch Now that we have a unique INode ID for each INode, clients with files that are open for write can use this unique ID rather than a file path when they are requesting more blocks or closing the open file. This will avoid conflicts when a file which is open for write is renamed, and another file with that name is created. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6294) Use INode IDs to avoid conflicts when a file open for write is renamed
[ https://issues.apache.org/jira/browse/HDFS-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-6294: --- Status: Patch Available (was: Open) Use INode IDs to avoid conflicts when a file open for write is renamed -- Key: HDFS-6294 URL: https://issues.apache.org/jira/browse/HDFS-6294 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.20.1 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-6294.001.patch Now that we have a unique INode ID for each INode, clients with files that are open for write can use this unique ID rather than a file path when they are requesting more blocks or closing the open file. This will avoid conflicts when a file which is open for write is renamed, and another file with that name is created. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6287) Add vecsum test of libhdfs read access times
[ https://issues.apache.org/jira/browse/HDFS-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-6287: --- Attachment: HDFS-6287.004.patch Looks like on older glibc versions like the one our jenkins machines are using, you needed to link with librt to use {{clock_gettime}}. Added. I also fixed a warning message in {{test_libhdfs_threaded}} Add vecsum test of libhdfs read access times Key: HDFS-6287 URL: https://issues.apache.org/jira/browse/HDFS-6287 Project: Hadoop HDFS Issue Type: Test Components: libhdfs, test Affects Versions: 2.5.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-6282.001.patch, HDFS-6287.002.patch, HDFS-6287.003.patch, HDFS-6287.004.patch Add vecsum, a benchmark that tests libhdfs access times. This includes short-circuit, zero-copy, and standard libhdfs access modes. It also has a local filesystem mode for comparison. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6252) Namenode old webUI should be deprecated
[ https://issues.apache.org/jira/browse/HDFS-6252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983314#comment-13983314 ] Jing Zhao commented on HDFS-6252: - The current patch looks good to me. The failed tests should be unrelated. +1 Maybe we should wait for a couple of days to let others who are still using the old WebUI review the patch as well. We may also need to open new jiras to add features that exist only in the old UI to the new UI. Namenode old webUI should be deprecated --- Key: HDFS-6252 URL: https://issues.apache.org/jira/browse/HDFS-6252 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Fengdong Yu Assignee: Haohui Mai Priority: Minor Attachments: HDFS-6252.000.patch, HDFS-6252.001.patch, HDFS-6252.002.patch, HDFS-6252.003.patch, HDFS-6252.004.patch, HDFS-6252.005.patch, HDFS-6252.006.patch We've deprecated hftp and hsftp in HDFS-5570, so if we download a file using the "download this file" link on browseDirectory.jsp, it will throw an error: Problem accessing /streamFile/*** because the streamFile servlet was deleted in HDFS-5570. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6252) Namenode old webUI should be deprecated
[ https://issues.apache.org/jira/browse/HDFS-6252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6252: Target Version/s: 3.0.0 Namenode old webUI should be deprecated --- Key: HDFS-6252 URL: https://issues.apache.org/jira/browse/HDFS-6252 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Fengdong Yu Assignee: Haohui Mai Priority: Minor Attachments: HDFS-6252.000.patch, HDFS-6252.001.patch, HDFS-6252.002.patch, HDFS-6252.003.patch, HDFS-6252.004.patch, HDFS-6252.005.patch, HDFS-6252.006.patch We've deprecated hftp and hsftp in HDFS-5570, so if we download a file via the Download this file link on browseDirectory.jsp, it will throw an error: Problem accessing /streamFile/***, because the streamFile servlet was deleted in HDFS-5570. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6294) Use INode IDs to avoid conflicts when a file open for write is renamed
[ https://issues.apache.org/jira/browse/HDFS-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983334#comment-13983334 ] Colin Patrick McCabe commented on HDFS-6294: bq. There's a test for this in HADOOP-9361 which attempts to rename a file being appended to. Presumably that test will pass once this patch has gone through? Yeah, I believe that will pass after this patch. There is also a test included as part of this patch which is HDFS-specific, called {{testLeaseAfterRenameAndRecreate}}, which tests a similar thing. The HDFS test also looks at some HDFS-specific stuff like leases. Use INode IDs to avoid conflicts when a file open for write is renamed -- Key: HDFS-6294 URL: https://issues.apache.org/jira/browse/HDFS-6294 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.20.1 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-6294.001.patch Now that we have a unique INode ID for each INode, clients with files that are open for write can use this unique ID rather than a file path when they are requesting more blocks or closing the open file. This will avoid conflicts when a file which is open for write is renamed, and another file with that name is created. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages
[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983361#comment-13983361 ] Haohui Mai commented on HDFS-6293: -- bq. Another issue is the complete change of format/content in OIV's XML output. The XML format in both the legacy and the PB-based code is intended to match the physical layout of the FSImage for fast processing. The layout of the FSImage is totally private, which means that there are very few compatibility guarantees that you can rely on. We should have clarified this early on. bq. It does not provide readily usable directory/file information as it used to in pre-2.4/protobuf versions. This is by design. A format based on records instead of a hierarchical structure is more robust (especially with snapshots), and it allows parallel processing. The rationale has been articulated in the document attached on HDFS-5698. With an FSImage as big as yours, I suggest parsing the protobuf records directly and importing them into hive / pig for more efficient queries. This has been articulated in HDFS-5952. Issues with OIV processing PB-based fsimages Key: HDFS-6293 URL: https://issues.apache.org/jira/browse/HDFS-6293 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker Attachments: Heap Histogram.html There are issues with OIV when processing fsimages in protobuf. Due to the internal layout changes introduced by the protobuf-based fsimage, OIV consumes an excessive amount of memory. We have tested with an fsimage with about 140M files/directories. The peak heap usage when processing this image in pre-protobuf (i.e. pre-2.4.0) format was about 350MB. After converting the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of heap (max new size was 1GB). It should be possible to process any image with the default heap size of 1.5GB. Another issue is the complete change of format/content in OIV's XML output. I also noticed that the secret manager section has no tokens while there were unexpired tokens in the original image (pre-2.4.0). I did not check whether they were also missing in the new pb fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
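The record-oriented suggestion above can be sketched as a streaming pass, assuming length-delimited protobuf records; {{INodeRecord}} is a hypothetical stand-in for the generated fsimage message class, and {{emitRow}} is an invented sink:
{code}
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

// Process one record at a time instead of materializing the whole tree,
// keeping heap usage flat regardless of namespace size.
try (InputStream in = new BufferedInputStream(new FileInputStream("fsimage"))) {
  INodeRecord rec;
  while ((rec = INodeRecord.parseDelimitedFrom(in)) != null) {  // null at EOF
    emitRow(rec.getId(), rec.getName());  // flat rows suit hive / pig import
  }
}
{code}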
[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages
[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983362#comment-13983362 ] Kihwal Lee commented on HDFS-6293: -- [~va...@rededc.com.br]: Thanks for sharing your experience. That's certainly an improvement, but that's still too big, and 140M is not the largest namespace we have to deal with. Issues with OIV processing PB-based fsimages Key: HDFS-6293 URL: https://issues.apache.org/jira/browse/HDFS-6293 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker Attachments: Heap Histogram.html There are issues with OIV when processing fsimages in protobuf. Due to the internal layout changes introduced by the protobuf-based fsimage, OIV consumes an excessive amount of memory. We have tested with an fsimage with about 140M files/directories. The peak heap usage when processing this image in pre-protobuf (i.e. pre-2.4.0) format was about 350MB. After converting the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of heap (max new size was 1GB). It should be possible to process any image with the default heap size of 1.5GB. Another issue is the complete change of format/content in OIV's XML output. I also noticed that the secret manager section has no tokens while there were unexpired tokens in the original image (pre-2.4.0). I did not check whether they were also missing in the new pb fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages
[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983412#comment-13983412 ] Kihwal Lee commented on HDFS-6293: -- bq. This is by design. I understand that it has merits over the old way. But you cannot simply ignore existing use cases. Issues with OIV processing PB-based fsimages Key: HDFS-6293 URL: https://issues.apache.org/jira/browse/HDFS-6293 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker Attachments: Heap Histogram.html There are issues with OIV when processing fsimages in protobuf. Due to the internal layout changes introduced by the protobuf-based fsimage, OIV consumes an excessive amount of memory. We have tested with an fsimage with about 140M files/directories. The peak heap usage when processing this image in pre-protobuf (i.e. pre-2.4.0) format was about 350MB. After converting the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of heap (max new size was 1GB). It should be possible to process any image with the default heap size of 1.5GB. Another issue is the complete change of format/content in OIV's XML output. I also noticed that the secret manager section has no tokens while there were unexpired tokens in the original image (pre-2.4.0). I did not check whether they were also missing in the new pb fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6258) Support XAttrs from NameNode and implements XAttr APIs for DistributedFileSystem
[ https://issues.apache.org/jira/browse/HDFS-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983413#comment-13983413 ] Charles Lamb commented on HDFS-6258: Yi, Probably the easiest thing to do is for you to commit your patch and then I'll generate a patch with my comments. Charles Support XAttrs from NameNode and implements XAttr APIs for DistributedFileSystem Key: HDFS-6258 URL: https://issues.apache.org/jira/browse/HDFS-6258 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS XAttrs (HDFS-2006) Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-6258.1.patch, HDFS-6258.2.patch, HDFS-6258.3.patch, HDFS-6258.patch This JIRA is to implement extended attributes in HDFS: support XAttrs from the NameNode, implement XAttr APIs for DistributedFileSystem, and so on. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6288) DFSInputStream Pread doesn't update ReadStatistics
[ https://issues.apache.org/jira/browse/HDFS-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983459#comment-13983459 ] Hadoop QA commented on HDFS-6288: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642270/HDFS-6288.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6755//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6755//console This message is automatically generated. DFSInputStream Pread doesn't update ReadStatistics -- Key: HDFS-6288 URL: https://issues.apache.org/jira/browse/HDFS-6288 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Juan Yu Assignee: Juan Yu Priority: Minor Attachments: HDFS-6288.002.patch, HDFS-6288.1.patch DFSInputStream Pread doesn't update ReadStatistics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6288) DFSInputStream Pread doesn't update ReadStatistics
[ https://issues.apache.org/jira/browse/HDFS-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6288: -- Resolution: Fixed Fix Version/s: 2.5.0 Status: Resolved (was: Patch Available) Committed to trunk and branch-2, thanks for the contribution Juan! DFSInputStream Pread doesn't update ReadStatistics -- Key: HDFS-6288 URL: https://issues.apache.org/jira/browse/HDFS-6288 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Juan Yu Assignee: Juan Yu Priority: Minor Fix For: 2.5.0 Attachments: HDFS-6288.002.patch, HDFS-6288.1.patch DFSInputStream Pread doesn't update ReadStatistics. -- This message was sent by Atlassian JIRA (v6.2#6252)
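For reference, a hedged sketch of the behavior this fix covers (the path and buffer size are invented): a positional read should now show up in the stream's ReadStatistics.
{code}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataInputStream;

// DistributedFileSystem returns an HdfsDataInputStream, which exposes
// per-stream read statistics.
HdfsDataInputStream in = (HdfsDataInputStream) fs.open(new Path("/tmp/f"));
byte[] buf = new byte[1024];
int n = in.read(0L, buf, 0, buf.length);                 // pread at offset 0
long total = in.getReadStatistics().getTotalBytesRead();
// before the fix, total stayed 0 after a pread; now it reflects n
{code}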
[jira] [Commented] (HDFS-6288) DFSInputStream Pread doesn't update ReadStatistics
[ https://issues.apache.org/jira/browse/HDFS-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983508#comment-13983508 ] Hudson commented on HDFS-6288: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5581 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5581/]) HDFS-6288. DFSInputStream Pread doesn't update ReadStatistics. Contributed by Juan Yu. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1590776) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java DFSInputStream Pread doesn't update ReadStatistics -- Key: HDFS-6288 URL: https://issues.apache.org/jira/browse/HDFS-6288 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Juan Yu Assignee: Juan Yu Priority: Minor Fix For: 2.5.0 Attachments: HDFS-6288.002.patch, HDFS-6288.1.patch DFSInputStream Pread doesn't update ReadStatistics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6288) DFSInputStream Pread doesn't update ReadStatistics
[ https://issues.apache.org/jira/browse/HDFS-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983511#comment-13983511 ] Juan Yu commented on HDFS-6288: --- cool, thanks. DFSInputStream Pread doesn't update ReadStatistics -- Key: HDFS-6288 URL: https://issues.apache.org/jira/browse/HDFS-6288 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Juan Yu Assignee: Juan Yu Priority: Minor Fix For: 2.5.0 Attachments: HDFS-6288.002.patch, HDFS-6288.1.patch DFSInputStream Pread doesn't update ReadStatistics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6287) Add vecsum test of libhdfs read access times
[ https://issues.apache.org/jira/browse/HDFS-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983547#comment-13983547 ] Hadoop QA commented on HDFS-6287: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642293/HDFS-6287.004.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6757//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6757//console This message is automatically generated. Add vecsum test of libhdfs read access times Key: HDFS-6287 URL: https://issues.apache.org/jira/browse/HDFS-6287 Project: Hadoop HDFS Issue Type: Test Components: libhdfs, test Affects Versions: 2.5.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-6282.001.patch, HDFS-6287.002.patch, HDFS-6287.003.patch, HDFS-6287.004.patch Add vecsum, a benchmark that tests libhdfs access times. This includes short-circuit, zero-copy, and standard libhdfs access modes. It also has a local filesystem mode for comparison. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HDFS-5851) Support memory as a storage medium
[ https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13981608#comment-13981608 ] Sanjay Radia edited comment on HDFS-5851 at 4/28/14 10:23 PM: -- Added a comparison to Tachyon in the doc. There is also an implementation difference that I don't cover (Tachyon, I believe, uses a RamFs rather than memory that is mapped to an HDFS file -- but I need to verify that). I have reproduced the text from the updated doc here for convenience: Recently, Spark has added an RDD implementation called Tachyon [4]. Tachyon is outside the address space of an application and allows sharing RDDs across applications. Both Tachyon and DDMs use memory-mapped files and lazy writing to reduce the need to recompute. Tachyon, since it is an RDD implementation, records the computation in order to regenerate the data in case of loss, whereas DDMs rely on the application to regenerate. Tachyon and RDDs do not have a notion of discardability, which is fundamental to DDMs, where data can be discarded when it is under memory and/or backing store pressure. DDMs are closest to virtual memory/anti-caching in that they virtualize memory, with the twist that data can be discarded. was (Author: sanjay.radia): Added a comparison to Tachyon in the doc. There is also an implementation difference that I don't cover (Tachyon, I believe, uses a RamFs rather than memory that is mapped to an HDFS file -- but I need to verify that). I have reproduced the text from the updated doc here for convenience: Recently, Spark has added an RDD implementation called Tachyon [4]. Tachyon is outside the address space of an application and allows sharing RDDs across applications. Both Tachyon and DDMs use memory-mapped files and lazy writing to reduce the need to recompute. Tachyon, since it is an RDD implementation, records the computation in order to regenerate the data in case of loss, whereas DDMs rely on the application to regenerate. Tachyon and RDDs do not have a notion of discardability, which is fundamental to DDMs, where data can be discarded when it is under memory and/or backing store pressure. Support memory as a storage medium -- Key: HDFS-5851 URL: https://issues.apache.org/jira/browse/HDFS-5851 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf Memory can be used as a storage medium for smaller/transient files for fast write throughput. More information/design will be added later. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6289) HA failover can fail if there are pending DN messages for DNs which no longer exist
[ https://issues.apache.org/jira/browse/HDFS-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983679#comment-13983679 ] Aaron T. Myers commented on HDFS-6289: -- I feel confident that the TestBalancerWithNodeGroup failure is spurious. It passes fine on my box, isn't really related to this code, and has been flaky off and on for a long time. HA failover can fail if there are pending DN messages for DNs which no longer exist --- Key: HDFS-6289 URL: https://issues.apache.org/jira/browse/HDFS-6289 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Critical Attachments: HDFS-6289.patch In an HA setup, the standby NN may receive messages from DNs for blocks which the standby NN is not yet aware of. It queues up these messages and replays them when it next reads from the edit log or fails over. On a failover, all of these pending DN messages must be processed successfully in order for the failover to succeed. If one of these pending DN messages refers to a DN storageId that no longer exists (because the DN with that transfer address has been reformatted and has re-registered with the same transfer address) then on transition to active the NN will not be able to process this DN message and will suicide with an error like the following: {noformat} 2014-04-25 14:23:17,922 FATAL namenode.NameNode (NameNode.java:doImmediateShutdown(1525)) - Error encountered requiring NN shutdown. Shutting down immediately. java.io.IOException: Cannot mark blk_1073741825_900(stored=blk_1073741825_1001) as corrupt because datanode 127.0.0.1:33324 does not exist {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6294) Use INode IDs to avoid conflicts when a file open for write is renamed
[ https://issues.apache.org/jira/browse/HDFS-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983692#comment-13983692 ] Hadoop QA commented on HDFS-6294: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642289/HDFS-6294.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.fs.TestSymlinkHdfsFileContext org.apache.hadoop.hdfs.server.namenode.TestNamenodeRetryCache org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotReplication org.apache.hadoop.hdfs.server.namenode.TestProcessCorruptBlocks org.apache.hadoop.hdfs.server.namenode.TestMetaSave org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics org.apache.hadoop.hdfs.TestFileCreation org.apache.hadoop.hdfs.server.namenode.TestFSDirectory org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyIsHot org.apache.hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots org.apache.hadoop.hdfs.TestQuota org.apache.hadoop.hdfs.TestFileAppend3 org.apache.hadoop.fs.TestHDFSFileContextMainOperations org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA org.apache.hadoop.fs.TestSymlinkHdfsFileSystem org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.web.TestFSMainOperationsWebHdfs org.apache.hadoop.hdfs.server.namenode.snapshot.TestAclWithSnapshot org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots org.apache.hadoop.hdfs.server.namenode.TestINodeFile org.apache.hadoop.hdfs.server.namenode.TestHDFSConcat org.apache.hadoop.hdfs.server.namenode.TestFileLimit org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestOverReplicatedBlocks org.apache.hadoop.hdfs.TestSetrepDecreasing {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6758//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/6758//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6758//console This message is automatically generated. 
Use INode IDs to avoid conflicts when a file open for write is renamed -- Key: HDFS-6294 URL: https://issues.apache.org/jira/browse/HDFS-6294 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.20.1 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-6294.001.patch Now that we have a unique INode ID for each INode, clients with files that are open for write can use this unique ID rather than a file path when they are requesting more blocks or closing the open file. This will avoid conflicts when a file which is open for write is renamed, and another file with that name is created. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages
[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983734#comment-13983734 ] Suresh Srinivas commented on HDFS-6293: --- OfflineImageViewer just dumps the fsimage in a readable format. In the past, given the hierarchical nature of the fsimage, the information printed was readily consumable. Now it is no longer so. One solution would be to add an option to print directory tree information (along the lines of ls -R) that works against the fsimage. Given that the information printed would no longer depend on the fsimage structure itself, this can be backward-compatible output (with the caveat that tools have to deal with extra information for newly added features such as ACLs). Once this is in place, we can set backward compatibility expectations on it. What do you guys think? We could also consider either building a tool that works efficiently in memory or reorganizing the fsimage to make that possible (I hope we do not have to change the fsimage, due to incompatibility issues). [~kihwal], can you please provide the use cases you are using OIV for? Issues with OIV processing PB-based fsimages Key: HDFS-6293 URL: https://issues.apache.org/jira/browse/HDFS-6293 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker Attachments: Heap Histogram.html There are issues with OIV when processing fsimages in protobuf. Due to the internal layout changes introduced by the protobuf-based fsimage, OIV consumes an excessive amount of memory. We have tested with an fsimage with about 140M files/directories. The peak heap usage when processing this image in pre-protobuf (i.e. pre-2.4.0) format was about 350MB. After converting the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of heap (max new size was 1GB). It should be possible to process any image with the default heap size of 1.5GB. Another issue is the complete change of format/content in OIV's XML output. I also noticed that the secret manager section has no tokens while there were unexpired tokens in the original image (pre-2.4.0). I did not check whether they were also missing in the new pb fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6289) HA failover can fail if there are pending DN messages for DNs which no longer exist
[ https://issues.apache.org/jira/browse/HDFS-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983802#comment-13983802 ] Todd Lipcon commented on HDFS-6289: --- {code} +// TODO(atm): This should be s/storedBlock/block, since we should be +// postponing the info of the reported block, not the stored block, +// though that actually exacerbates the bug, doesn't fix it. {code} Out of context, this comment won't make much sense -- what's the bug it's referring to? Maybe you should file a separate follow-up JIRA here for this second issue, since you aren't fixing it here? Otherwise lgtm. HA failover can fail if there are pending DN messages for DNs which no longer exist --- Key: HDFS-6289 URL: https://issues.apache.org/jira/browse/HDFS-6289 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Critical Attachments: HDFS-6289.patch In an HA setup, the standby NN may receive messages from DNs for blocks which the standby NN is not yet aware of. It queues up these messages and replays them when it next reads from the edit log or fails over. On a failover, all of these pending DN messages must be processed successfully in order for the failover to succeed. If one of these pending DN messages refers to a DN storageId that no longer exists (because the DN with that transfer address has been reformatted and has re-registered with the same transfer address) then on transition to active the NN will not be able to process this DN message and will suicide with an error like the following: {noformat} 2014-04-25 14:23:17,922 FATAL namenode.NameNode (NameNode.java:doImmediateShutdown(1525)) - Error encountered requiring NN shutdown. Shutting down immediately. java.io.IOException: Cannot mark blk_1073741825_900(stored=blk_1073741825_1001) as corrupt because datanode 127.0.0.1:33324 does not exist {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6289) HA failover can fail if there are pending DN messages for DNs which no longer exist
[ https://issues.apache.org/jira/browse/HDFS-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983805#comment-13983805 ] Aaron T. Myers commented on HDFS-6289: -- Thanks for the review, Todd. bq. Maybe you should file a separate follow-up JIRA here for this second issue, since you aren't fixing it here? I could also just fix it here. It seems pretty transparently obvious that we should make that change. Do you agree? If so, I'll just post a patch fixing that as well. HA failover can fail if there are pending DN messages for DNs which no longer exist --- Key: HDFS-6289 URL: https://issues.apache.org/jira/browse/HDFS-6289 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Critical Attachments: HDFS-6289.patch In an HA setup, the standby NN may receive messages from DNs for blocks which the standby NN is not yet aware of. It queues up these messages and replays them when it next reads from the edit log or fails over. On a failover, all of these pending DN messages must be processed successfully in order for the failover to succeed. If one of these pending DN messages refers to a DN storageId that no longer exists (because the DN with that transfer address has been reformatted and has re-registered with the same transfer address) then on transition to active the NN will not be able to process this DN message and will suicide with an error like the following: {noformat} 2014-04-25 14:23:17,922 FATAL namenode.NameNode (NameNode.java:doImmediateShutdown(1525)) - Error encountered requiring NN shutdown. Shutting down immediately. java.io.IOException: Cannot mark blk_1073741825_900(stored=blk_1073741825_1001) as corrupt because datanode 127.0.0.1:33324 does not exist {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6289) HA failover can fail if there are pending DN messages for DNs which no longer exist
[ https://issues.apache.org/jira/browse/HDFS-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983825#comment-13983825 ] Todd Lipcon commented on HDFS-6289: --- Is there any test you could write to show that bug? I agree with your logic, but I'm surprised that there isn't some bug that it causes. Given that the current test isn't a regression test for that bug, maybe we should tackle it separately? HA failover can fail if there are pending DN messages for DNs which no longer exist --- Key: HDFS-6289 URL: https://issues.apache.org/jira/browse/HDFS-6289 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Critical Attachments: HDFS-6289.patch In an HA setup, the standby NN may receive messages from DNs for blocks which the standby NN is not yet aware of. It queues up these messages and replays them when it next reads from the edit log or fails over. On a failover, all of these pending DN messages must be processed successfully in order for the failover to succeed. If one of these pending DN messages refers to a DN storageId that no longer exists (because the DN with that transfer address has been reformatted and has re-registered with the same transfer address) then on transition to active the NN will not be able to process this DN message and will suicide with an error like the following: {noformat} 2014-04-25 14:23:17,922 FATAL namenode.NameNode (NameNode.java:doImmediateShutdown(1525)) - Error encountered requiring NN shutdown. Shutting down immediately. java.io.IOException: Cannot mark blk_1073741825_900(stored=blk_1073741825_1001) as corrupt because datanode 127.0.0.1:33324 does not exist {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5851) Support memory as a storage medium
[ https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983835#comment-13983835 ] Arpit Agarwal commented on HDFS-5851: - I scheduled a Google+ hangout for 4/30 3-4pm PDT - [link here|https://plus.google.com/events/ckvo7ui46qihd6cfq0sqptrhogo?authkey=CMvgrcTOv9n12wE]. Let me know if you are unable to access it. Support memory as a storage medium -- Key: HDFS-5851 URL: https://issues.apache.org/jira/browse/HDFS-5851 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf Memory can be used as a storage medium for smaller/transient files for fast write throughput. More information/design will be added later. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6289) HA failover can fail if there are pending DN messages for DNs which no longer exist
[ https://issues.apache.org/jira/browse/HDFS-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-6289: - Attachment: HDFS-6289.patch You're right, we should probably take care of this separately. I only incidentally discovered it while looking into this issue, but it really is a separate bug. I'll file another JIRA once this one's committed. The latest patch makes the TODO clearer when read out of context, and addresses Yongjun's feedback. Todd, does this look OK to you? HA failover can fail if there are pending DN messages for DNs which no longer exist --- Key: HDFS-6289 URL: https://issues.apache.org/jira/browse/HDFS-6289 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Critical Attachments: HDFS-6289.patch, HDFS-6289.patch In an HA setup, the standby NN may receive messages from DNs for blocks which the standby NN is not yet aware of. It queues up these messages and replays them when it next reads from the edit log or fails over. On a failover, all of these pending DN messages must be processed successfully in order for the failover to succeed. If one of these pending DN messages refers to a DN storageId that no longer exists (because the DN with that transfer address has been reformatted and has re-registered with the same transfer address) then on transition to active the NN will not be able to process this DN message and will suicide with an error like the following: {noformat} 2014-04-25 14:23:17,922 FATAL namenode.NameNode (NameNode.java:doImmediateShutdown(1525)) - Error encountered requiring NN shutdown. Shutting down immediately. java.io.IOException: Cannot mark blk_1073741825_900(stored=blk_1073741825_1001) as corrupt because datanode 127.0.0.1:33324 does not exist {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
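The shape of the fix can be sketched as follows (illustrative only, not the attached patch; names such as {{ReportedBlockInfo}} and {{process}} are loose assumptions): when draining queued DN messages on failover, a message whose datanode has been reformatted and re-registered no longer resolves, so it should be dropped as stale rather than letting the NN shut down.
{code}
for (ReportedBlockInfo rbi : queuedMessages) {
  DatanodeDescriptor dn = datanodeManager.getDatanode(rbi.getNode());
  if (dn == null) {
    // The DN was reformatted/re-registered; the queued message is stale.
    LOG.warn("Dropping queued message for unknown datanode " + rbi.getNode());
    continue;
  }
  process(rbi, dn);  // safe: the datanode still exists
}
{code}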
[jira] [Created] (HDFS-6295) Add decommissioning state and node state filtering to dfsadmin
Andrew Wang created HDFS-6295: - Summary: Add decommissioning state and node state filtering to dfsadmin Key: HDFS-6295 URL: https://issues.apache.org/jira/browse/HDFS-6295 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang One of the few admin-friendly ways of viewing the list of decommissioning nodes is via hdfs dfsadmin -report. However, this lists *all* the datanodes on the cluster, which is prohibitive for large clusters, and also requires manual parsing to look at the decom status. It'd be nicer if we could fetch and display only decommissioning nodes (or just live and dead nodes for that matter). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6295) Add decommissioning state and node state filtering to dfsadmin
[ https://issues.apache.org/jira/browse/HDFS-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6295: -- Attachment: hdfs-6295-1.patch Patch attached. This adds a new DN state, decommissioning, and lets users query just certain states through dfsadmin. This meant changing the output format of -report slightly. I also took the opportunity to fix up the whitespace in the report function (it had an extra indent), and improved the whitespace for dfsadmin's usage/help text related to -report. Add decommissioning state and node state filtering to dfsadmin Key: HDFS-6295 URL: https://issues.apache.org/jira/browse/HDFS-6295 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-6295-1.patch One of the few admin-friendly ways of viewing the list of decommissioning nodes is via hdfs dfsadmin -report. However, this lists *all* the datanodes on the cluster, which is prohibitive for large clusters, and also requires manual parsing to look at the decom status. It'd be nicer if we could fetch and display only decommissioning nodes (or just live and dead nodes for that matter). -- This message was sent by Atlassian JIRA (v6.2#6252)
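A hedged sketch of the client-side query this enables, assuming the patch adds a DECOMMISSIONING value to DatanodeReportType:
{code}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType;

// Fetch only the decommissioning nodes rather than the full -report dump.
DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
for (DatanodeInfo dn : dfs.getDataNodeStats(DatanodeReportType.DECOMMISSIONING)) {
  System.out.println(dn.getHostName() + " " + dn.getAdminState());
}
{code}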
[jira] [Updated] (HDFS-6295) Add decommissioning state and node state filtering to dfsadmin
[ https://issues.apache.org/jira/browse/HDFS-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6295: -- Status: Patch Available (was: Open) Add decommissioning state and node state filtering to dfsadmin Key: HDFS-6295 URL: https://issues.apache.org/jira/browse/HDFS-6295 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-6295-1.patch One of the few admin-friendly ways of viewing the list of decommissioning nodes is via hdfs dfsadmin -report. However, this lists *all* the datanodes on the cluster, which is prohibitive for large clusters, and also requires manual parsing to look at the decom status. It'd be nicer if we could fetch and display only decommissioning nodes (or just live and dead nodes for that matter). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-2882) DN continues to start up, even if block pool fails to initialize
[ https://issues.apache.org/jira/browse/HDFS-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983870#comment-13983870 ] Tsz Wo Nicholas Sze commented on HDFS-2882: --- Vinay, the patch cannot be applied anymore. Could you update it? DN continues to start up, even if block pool fails to initialize Key: HDFS-2882 URL: https://issues.apache.org/jira/browse/HDFS-2882 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.0.2-alpha Reporter: Todd Lipcon Assignee: Vinayakumar B Attachments: HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, hdfs-2882.txt I started a DN on a machine that was completely out of space on one of its drives. I saw the following: 2012-02-02 09:56:50,499 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-448349972-172.29.5.192-1323816762969 (storage id DS-507718931-172.29.5.194-11072-12978 42002148) service to styx01.sf.cloudera.com/172.29.5.192:8021 java.io.IOException: Mkdirs failed to create /data/1/scratch/todd/styx-datadir/current/BP-448349972-172.29.5.192-1323816762969/tmp at org.apache.hadoop.hdfs.server.datanode.FSDataset$BlockPoolSlice.init(FSDataset.java:335) but the DN continued to run, spewing NPEs when it tried to do block reports, etc. This was on the HDFS-1623 branch but may affect trunk as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-4211) failed volume causes DataNode#getVolumeInfo NPEs on multi-BP DN
[ https://issues.apache.org/jira/browse/HDFS-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-4211. --- Resolution: Duplicate Resolving this as a duplicate of HDFS-2882. failed volume causes DataNode#getVolumeInfo NPEs on multi-BP DN --- Key: HDFS-4211 URL: https://issues.apache.org/jira/browse/HDFS-4211 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.0.2-alpha Reporter: Andy Isaacson Assignee: Andy Isaacson On a DN with {{failed.volumes.tolerated=0}} a disk went bad. After restarting the DN, the following backtrace was observed when accessing {{/jmx}}: {code} 2012-06-12 16:21:43,248 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute VolumeInfo of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:856) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:869) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:670) at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638) at org.apache.hadoop.jmx.JMXJsonServlet.writeAttribute(JMXJsonServlet.java:315) at org.apache.hadoop.jmx.JMXJsonServlet.listBeans(JMXJsonServlet.java:293) at org.apache.hadoop.jmx.JMXJsonServlet.doGet(JMXJsonServlet.java:193) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:947) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Caused by: java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.DataNode.getVolumeInfo(DataNode.java:2130) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:167) at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:96) at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:33) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208) at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65) {code} Since tolerated=0 the DN should have errored out rather than starting up, but due to having multiple BPs configured the DN does not exit correctly in this situation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-2882) DN continues to start up, even if block pool fails to initialize
[ https://issues.apache.org/jira/browse/HDFS-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983920#comment-13983920 ] Hadoop QA commented on HDFS-2882: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12615147/HDFS-2882.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6760//console This message is automatically generated. DN continues to start up, even if block pool fails to initialize Key: HDFS-2882 URL: https://issues.apache.org/jira/browse/HDFS-2882 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.0.2-alpha Reporter: Todd Lipcon Assignee: Vinayakumar B Attachments: HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, hdfs-2882.txt I started a DN on a machine that was completely out of space on one of its drives. I saw the following: 2012-02-02 09:56:50,499 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-448349972-172.29.5.192-1323816762969 (storage id DS-507718931-172.29.5.194-11072-12978 42002148) service to styx01.sf.cloudera.com/172.29.5.192:8021 java.io.IOException: Mkdirs failed to create /data/1/scratch/todd/styx-datadir/current/BP-448349972-172.29.5.192-1323816762969/tmp at org.apache.hadoop.hdfs.server.datanode.FSDataset$BlockPoolSlice.init(FSDataset.java:335) but the DN continued to run, spewing NPEs when it tried to do block reports, etc. This was on the HDFS-1623 branch but may affect trunk as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-2882) DN continues to start up, even if block pool fails to initialize
[ https://issues.apache.org/jira/browse/HDFS-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983926#comment-13983926 ] Hadoop QA commented on HDFS-2882: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12615147/HDFS-2882.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6761//console This message is automatically generated. DN continues to start up, even if block pool fails to initialize Key: HDFS-2882 URL: https://issues.apache.org/jira/browse/HDFS-2882 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.0.2-alpha Reporter: Todd Lipcon Assignee: Vinayakumar B Attachments: HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, hdfs-2882.txt I started a DN on a machine that was completely out of space on one of its drives. I saw the following: 2012-02-02 09:56:50,499 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-448349972-172.29.5.192-1323816762969 (storage id DS-507718931-172.29.5.194-11072-12978 42002148) service to styx01.sf.cloudera.com/172.29.5.192:8021 java.io.IOException: Mkdirs failed to create /data/1/scratch/todd/styx-datadir/current/BP-448349972-172.29.5.192-1323816762969/tmp at org.apache.hadoop.hdfs.server.datanode.FSDataset$BlockPoolSlice.init(FSDataset.java:335) but the DN continued to run, spewing NPEs when it tried to do block reports, etc. This was on the HDFS-1623 branch but may affect trunk as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
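The behavior HDFS-2882 asks for can be sketched roughly as follows (the names are invented; this is not the attached patch): when a block pool fails to initialize and the failed-volume count exceeds what is tolerated, the DN should exit instead of limping along and spewing NPEs.
{code}
// Illustrative check during block pool initialization.
void initBlockPoolStorage() throws IOException {
  int failed = countFailedVolumes();
  if (failed > volFailuresTolerated) {
    // Propagate so the DN shuts down rather than run with a broken block pool.
    throw new IOException("Too many failed volumes: " + failed
        + " > tolerated " + volFailuresTolerated);
  }
}
{code}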
[jira] [Updated] (HDFS-5168) BlockPlacementPolicy does not work for cross node group dependencies
[ https://issues.apache.org/jira/browse/HDFS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-5168: -- Component/s: namenode BlockPlacementPolicy does not work for cross node group dependencies Key: HDFS-5168 URL: https://issues.apache.org/jira/browse/HDFS-5168 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Nikola Vujic Assignee: Nikola Vujic Priority: Critical Attachments: HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch Block placement policies do not work for cross rack/node group dependencies. In reality this is needed when compute servers and storage fall in two independent fault domains, then both BlockPlacementPolicyDefault and BlockPlacementPolicyWithNodeGroup are not able to provide proper block placement. Let's suppose that we have Hadoop cluster with one rack with two servers, and we run 2 VMs per server. Node group topology for this cluster would be: server1-vm1 - /d1/r1/n1 server1-vm2 - /d1/r1/n1 server2-vm1 - /d1/r1/n2 server2-vm2 - /d1/r1/n2 This is working fine as long as server and storage fall into the same fault domain but if storage is in a different fault domain from the server, we will not be able to handle that. For example, if storage of server1-vm1 is in the same fault domain as storage of server2-vm1, then we must not place two replicas on these two nodes although they are in different node groups. Two possible approaches: - One approach would be to define cross rack/node group dependencies and to use them when excluding nodes from the search space. This looks as the cleanest way to fix this as it requires minor changes in the BlockPlacementPolicy classes. - Other approach would be to allow nodes to fall in more than one node group. When we chose a node to hold a replica we have to exclude from the search space all nodes from the node groups where the chosen node belongs. This approach may require major changes in the NetworkTopology. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5168) BlockPlacementPolicy does not work for cross node group dependencies
[ https://issues.apache.org/jira/browse/HDFS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983960#comment-13983960 ] Tsz Wo Nicholas Sze commented on HDFS-5168: --- DNSToSwitchMapping is a \@Public \@Evolving interface, so we have to change it in a compatible manner (otherwise we cannot commit this to branch-2). We should avoid adding the new getDependency(..) method to it. How about we add another interface class, say DNSToSwitchMappingWithDependency, and keep DNSToSwitchMapping unchanged? More details: - DNSToSwitchMappingWithDependency extends DNSToSwitchMapping and adds the new getDependency(..) method. - ScriptBasedMappingWithDependency extends ScriptBasedMapping and RawScriptBasedMappingWithDependency extends RawScriptBasedMapping; change ScriptBasedMapping and RawScriptBasedMapping to allow inheritance. - Add dependency cache support to ScriptBasedMappingWithDependency. - DatanodeManager checks if dnsToSwitchMapping is an instance of DNSToSwitchMappingWithDependency. If yes, it casts the object and gets the dependencies; otherwise, it uses an empty list. - CachedDNSToSwitchMapping and TableMapping remain unchanged. BlockPlacementPolicy does not work for cross node group dependencies Key: HDFS-5168 URL: https://issues.apache.org/jira/browse/HDFS-5168 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Nikola Vujic Assignee: Nikola Vujic Priority: Critical Attachments: HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch Block placement policies do not work for cross rack/node group dependencies. In reality this is needed when compute servers and storage fall in two independent fault domains, then both BlockPlacementPolicyDefault and BlockPlacementPolicyWithNodeGroup are not able to provide proper block placement. Let's suppose that we have Hadoop cluster with one rack with two servers, and we run 2 VMs per server. Node group topology for this cluster would be: server1-vm1 - /d1/r1/n1 server1-vm2 - /d1/r1/n1 server2-vm1 - /d1/r1/n2 server2-vm2 - /d1/r1/n2 This is working fine as long as server and storage fall into the same fault domain but if storage is in a different fault domain from the server, we will not be able to handle that. For example, if storage of server1-vm1 is in the same fault domain as storage of server2-vm1, then we must not place two replicas on these two nodes although they are in different node groups. Two possible approaches: - One approach would be to define cross rack/node group dependencies and to use them when excluding nodes from the search space. This looks as the cleanest way to fix this as it requires minor changes in the BlockPlacementPolicy classes. - Other approach would be to allow nodes to fall in more than one node group. When we chose a node to hold a replica we have to exclude from the search space all nodes from the node groups where the chosen node belongs. This approach may require major changes in the NetworkTopology. -- This message was sent by Atlassian JIRA (v6.2#6252)
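The proposal sketches out roughly like this in Java (the exact getDependency signature is an assumption, since the comment does not spell it out):
{code}
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.net.DNSToSwitchMapping;

// New interface; existing DNSToSwitchMapping implementations stay untouched.
public interface DNSToSwitchMappingWithDependency extends DNSToSwitchMapping {
  /** @return the nodes that share a fault domain with the given node. */
  List<String> getDependency(String name);
}

// In DatanodeManager (illustrative): old mappings fall back to no dependencies.
List<String> deps =
    dnsToSwitchMapping instanceof DNSToSwitchMappingWithDependency
        ? ((DNSToSwitchMappingWithDependency) dnsToSwitchMapping)
            .getDependency(hostName)
        : Collections.<String>emptyList();
{code}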
[jira] [Commented] (HDFS-6289) HA failover can fail if there are pending DN messages for DNs which no longer exist
[ https://issues.apache.org/jira/browse/HDFS-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983964#comment-13983964 ] Hadoop QA commented on HDFS-6289: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642380/HDFS-6289.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6759//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6759//console This message is automatically generated. HA failover can fail if there are pending DN messages for DNs which no longer exist --- Key: HDFS-6289 URL: https://issues.apache.org/jira/browse/HDFS-6289 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Critical Attachments: HDFS-6289.patch, HDFS-6289.patch In an HA setup, the standby NN may receive messages from DNs for blocks which the standby NN is not yet aware of. It queues up these messages and replays them when it next reads from the edit log or fails over. On a failover, all of these pending DN messages must be processed successfully in order for the failover to succeed. If one of these pending DN messages refers to a DN storageId that no longer exists (because the DN with that transfer address has been reformatted and has re-registered with the same transfer address) then on transition to active the NN will not be able to process this DN message and will suicide with an error like the following: {noformat} 2014-04-25 14:23:17,922 FATAL namenode.NameNode (NameNode.java:doImmediateShutdown(1525)) - Error encountered requiring NN shutdown. Shutting down immediately. java.io.IOException: Cannot mark blk_1073741825_900(stored=blk_1073741825_1001) as corrupt because datanode 127.0.0.1:33324 does not exist {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6165) hdfs dfs -rm -r and hdfs -rmdir commands can't remove empty directory
[ https://issues.apache.org/jira/browse/HDFS-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6165: Attachment: HDFS-6165.005.patch hdfs dfs -rm -r and hdfs -rmdir commands can't remove empty directory -- Key: HDFS-6165 URL: https://issues.apache.org/jira/browse/HDFS-6165 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Minor Attachments: HDFS-6165.001.patch, HDFS-6165.002.patch, HDFS-6165.003.patch, HDFS-6165.004.patch, HDFS-6165.004.patch, HDFS-6165.005.patch Given a directory owned by user A with WRITE permission containing an empty directory owned by user B, it is not possible to delete user B's empty directory with either hdfs dfs -rm -r or hdfs dfs -rmdir. This is because the current implementation requires FULL permission on the empty directory and throws an exception otherwise. On the other hand, on Linux, the rm -r and rmdir commands can remove an empty directory as long as the parent directory has WRITE permission (and the prefix components of the path have EXECUTE permission). Of the tested OSes, some prompt the user for confirmation and some don't. Here's a reproduction:
{code}
[root@vm01 ~]# hdfs dfs -ls /user/
Found 4 items
drwxr-xr-x - userabc users 0 2013-05-03 01:55 /user/userabc
drwxr-xr-x - hdfs supergroup 0 2013-05-03 00:28 /user/hdfs
drwxrwxrwx - mapred hadoop 0 2013-05-03 00:13 /user/history
drwxr-xr-x - hdfs supergroup 0 2013-04-14 16:46 /user/hive
[root@vm01 ~]# hdfs dfs -ls /user/userabc
Found 8 items
drwx------ - userabc users 0 2013-05-02 17:00 /user/userabc/.Trash
drwxr-xr-x - userabc users 0 2013-05-03 01:34 /user/userabc/.cm
drwx------ - userabc users 0 2013-05-03 01:06 /user/userabc/.staging
drwxr-xr-x - userabc users 0 2013-04-14 18:31 /user/userabc/apps
drwxr-xr-x - userabc users 0 2013-04-30 18:05 /user/userabc/ds
drwxr-xr-x - hdfs users 0 2013-05-03 01:54 /user/userabc/foo
drwxr-xr-x - userabc users 0 2013-04-30 16:18 /user/userabc/maven_source
drwxr-xr-x - hdfs users 0 2013-05-03 01:40 /user/userabc/test-restore
[root@vm01 ~]# hdfs dfs -ls /user/userabc/foo/
[root@vm01 ~]# sudo -u userabc hdfs dfs -rm -r -skipTrash /user/userabc/foo
rm: Permission denied: user=userabc, access=ALL, inode=/user/userabc/foo:hdfs:users:drwxr-xr-x
{code}
The super user can delete the directory.
{code}
[root@vm01 ~]# sudo -u hdfs hdfs dfs -rm -r -skipTrash /user/userabc/foo
Deleted /user/userabc/foo
{code}
The same is not true for files, however. They have the correct behavior.
{code}
[root@vm01 ~]# sudo -u hdfs hdfs dfs -touchz /user/userabc/foo-file
[root@vm01 ~]# hdfs dfs -ls /user/userabc/
Found 8 items
drwx------ - userabc users 0 2013-05-02 17:00 /user/userabc/.Trash
drwxr-xr-x - userabc users 0 2013-05-03 01:34 /user/userabc/.cm
drwx------ - userabc users 0 2013-05-03 01:06 /user/userabc/.staging
drwxr-xr-x - userabc users 0 2013-04-14 18:31 /user/userabc/apps
drwxr-xr-x - userabc users 0 2013-04-30 18:05 /user/userabc/ds
-rw-r--r-- 1 hdfs users 0 2013-05-03 02:11 /user/userabc/foo-file
drwxr-xr-x - userabc users 0 2013-04-30 16:18 /user/userabc/maven_source
drwxr-xr-x - hdfs users 0 2013-05-03 01:40 /user/userabc/test-restore
[root@vm01 ~]# sudo -u userabc hdfs dfs -rm -skipTrash /user/userabc/foo-file
Deleted /user/userabc/foo-file
{code}
Using the hdfs dfs -rmdir command:
{code}
bash-4.1$ hadoop fs -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x - hdfs supergroup 0 2014-03-25 16:29 /user
drwxr-xr-x - hdfs supergroup 0 2014-03-25 16:28 /user/hdfs
drwxr-xr-x - usrabc users 0 2014-03-28 23:39 /user/usrabc
drwxr-xr-x - abc abc 0 2014-03-28 23:39 /user/usrabc/foo-empty1
[root@vm01 usrabc]# su usrabc
[usrabc@vm01 ~]$ hdfs dfs -rmdir /user/usrabc/foo-empty1
rmdir: Permission denied: user=usrabc, access=ALL, inode=/user/usrabc/foo-empty1:abc:abc:drwxr-xr-x
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6165) hdfs dfs -rm -r and hdfs -rmdir commands can't remove empty directory
[ https://issues.apache.org/jira/browse/HDFS-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983980#comment-13983980 ] Yongjun Zhang commented on HDFS-6165: - Hi, thanks a lot for your earlier comments, and thanks Andrew a lot for the detailed review! I just updated patch version 005 to address them all. For rmdir, it's the solution I described above. For the rm -r solution, I changed checkPermission to:
{code}
void checkPermission(String path, INodeDirectory root, boolean doCheckOwner,
    FsAction ancestorAccess, FsAction parentAccess, FsAction access,
    FsAction subAccess, boolean ignoreEmptyDir, boolean resolveLink)
{code}
The two parameters subAccess and ignoreEmptyDir work together: if subAccess is not null, the access permission of subdirectories is checked; when subAccess is checked and ignoreEmptyDir is true, empty directories are ignored. To address Andrew's comments: {quote} I think the semantics for a recursive delete via DistributedFileSystem#delete are still not quite right. The change you made will work for the shell since it does its own recursion, but we need to do the same remove if empty dir with read when recursing via recursive = true too. You might be able to do this by modifying FSPermissionChecker#checkSubAccess appropriately, but a new flag or new code would be safer. {quote} Thanks a lot for pointing this out; indeed there was a problem there. See the solution described above, except that we agreed we don't need to check permission for an empty dir. {quote} isDirectory, can we add per-parameter javadoc rather than stacking on the @return? I think renaming empty to isEmpty would also help. Nit, also need a space in the ternary empty? and dir.isEmptyDirectory(src)?. {quote} These are now gone with the new solution. {quote} In Delete, I think it's a bit cleaner to do an instanceof PathIsNotEmptyDirectoryException.class check instead. {quote} This is handled in a better way now. I discovered a bug, HADOOP-10543 (and posted a patch), when looking at this. With HADOOP-10543 committed, I would be able to do exactly what Andrew suggested, but I think what I have in this new revision should be fine too. {quote} Some lines longer than 80 chars {quote} Hopefully all addressed :-) {quote} TestFsShellPrivilege: I gave this a quick pass, but overall it may be better to rewrite these to use the DFS API instead of the shell. We need to test recursive delete, which the shell doesn't do, and we don't really have any shell changes in the latest rev, which lessens the importance of having new shell tests. {quote} I think adding a test infra like the one I added gives another option here; hopefully the new revision looks better :-) {quote} execCmd needs to do some try/finally to close and restore the streams if there's an exception. Also an extra commented line there. {quote} FsShell actually takes care of catching the exception, so the stream will not get lost. The extra comment line is removed. {quote} Could we rename this file to TestFsShellPermission? Permission is a more standard term. {quote} Done. {quote} This file also should not be in hadoop-tools, but rather hadoop-common. {quote} Because it uses MiniDFSCluster, it cannot be in hadoop-common, but I moved it to the HDFS test area now. {quote} This does a lot of starting and stopping of a MiniCluster for running single-line tests. Can we combine these into a single test? We also don't need any DNs for this cluster, since we're just testing perms. {quote} I refactored the code to take care of this. Since we create files, I still keep the DNs.
{quote} We have FileSystemTestHelper#createFile for creating files, can save some code. Use of @Before and @After blocks might also clarify what's going on. This also should be a JUnit4 test with @Test annotations, not JUnit3. USER_UGI should not be all caps, it's not static final. It's a bit ugly how we pass UNEXPECTED_RESULT in for a lot of tests. Can we just pass a boolean for expectSuccess or expectFailure, or maybe a String that we can call assertExceptionContains on? {quote} All are taken care of, except that I forgot @Before and @After, but hopefully it looks much better now. {quote} FileEntry looks basically like a FileStatus, can we just use that instead? {quote} FileEntry only has the fields needed for this test, and it's easier to manage in the test area. I'm worried that using FileStatus would not be as easy to control, so I didn't do this. Hope it's acceptable. Thanks in advance for a further review. hdfs dfs -rm -r and hdfs -rmdir commands can't remove empty directory -- Key: HDFS-6165 URL: https://issues.apache.org/jira/browse/HDFS-6165 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0
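To make the subAccess/ignoreEmptyDir interplay described above concrete, here is a hedged sketch of the recursion (simplified from FSPermissionChecker; the names and signatures are approximations, not the committed code):
{code}
private void checkSubAccess(INode inode, FsAction subAccess,
    boolean ignoreEmptyDir) throws AccessControlException {
  if (inode == null || !inode.isDirectory()) {
    return;
  }
  INodeDirectory dir = (INodeDirectory) inode;
  if (ignoreEmptyDir && dir.getChildrenList().isEmpty()) {
    // Empty directory: the parent's WRITE permission is enough, matching
    // POSIX rm -r / rmdir behavior, so skip the subAccess check entirely.
    return;
  }
  check(dir, subAccess);                       // enforce subAccess on this dir
  for (INode child : dir.getChildrenList()) {  // then recurse into children
    checkSubAccess(child, subAccess, ignoreEmptyDir);
  }
}
{code}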
[jira] [Commented] (HDFS-5147) Certain dfsadmin commands such as safemode do not interact with the active namenode in ha setup
[ https://issues.apache.org/jira/browse/HDFS-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983998#comment-13983998 ] Sanghyun Yun commented on HDFS-5147: The result of hdfs dfsadmin -safemode enter is that the first namenode (whether or not it is the active one) changes its status to safemode. I think it should change the active namenode, or both. The result of hdfs dfsadmin -fs hdfs://CLUSTERNAME -safemode enter is the same. Is this correct? I tested this on 2.2.0. Certain dfsadmin commands such as safemode do not interact with the active namenode in ha setup --- Key: HDFS-5147 URL: https://issues.apache.org/jira/browse/HDFS-5147 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Jing Zhao Certain dfsadmin commands return the status of the first namenode specified in the configs rather than interacting with the active namenode. For example, issue hdfs dfsadmin -safemode get and it will return the status of the first namenode in the configs rather than the active namenode. I think all dfsadmin commands should determine which namenode is active and do the operation on it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6165) hdfs dfs -rm -r and hdfs -rmdir commands can't remove empty directory
[ https://issues.apache.org/jira/browse/HDFS-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984008#comment-13984008 ] Hadoop QA commented on HDFS-6165: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642397/HDFS-6165.005.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6763//console This message is automatically generated. hdfs dfs -rm -r and hdfs -rmdir commands can't remove empty directory -- Key: HDFS-6165 URL: https://issues.apache.org/jira/browse/HDFS-6165 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Minor Attachments: HDFS-6165.001.patch, HDFS-6165.002.patch, HDFS-6165.003.patch, HDFS-6165.004.patch, HDFS-6165.004.patch, HDFS-6165.005.patch Given a directory owned by user A with WRITE permission containing an empty directory owned by user B, it is not possible to delete user B's empty directory with either hdfs dfs -rm -r or hdfs dfs -rmdir. This is because the current implementation requires FULL permission on the empty directory and throws an exception otherwise. On the other hand, on Linux, the rm -r and rmdir commands can remove an empty directory as long as the parent directory has WRITE permission (and the prefix components of the path have EXECUTE permission). Of the tested OSes, some prompt the user for confirmation and some don't. Here's a reproduction:
{code}
[root@vm01 ~]# hdfs dfs -ls /user/
Found 4 items
drwxr-xr-x - userabc users 0 2013-05-03 01:55 /user/userabc
drwxr-xr-x - hdfs supergroup 0 2013-05-03 00:28 /user/hdfs
drwxrwxrwx - mapred hadoop 0 2013-05-03 00:13 /user/history
drwxr-xr-x - hdfs supergroup 0 2013-04-14 16:46 /user/hive
[root@vm01 ~]# hdfs dfs -ls /user/userabc
Found 8 items
drwx------ - userabc users 0 2013-05-02 17:00 /user/userabc/.Trash
drwxr-xr-x - userabc users 0 2013-05-03 01:34 /user/userabc/.cm
drwx------ - userabc users 0 2013-05-03 01:06 /user/userabc/.staging
drwxr-xr-x - userabc users 0 2013-04-14 18:31 /user/userabc/apps
drwxr-xr-x - userabc users 0 2013-04-30 18:05 /user/userabc/ds
drwxr-xr-x - hdfs users 0 2013-05-03 01:54 /user/userabc/foo
drwxr-xr-x - userabc users 0 2013-04-30 16:18 /user/userabc/maven_source
drwxr-xr-x - hdfs users 0 2013-05-03 01:40 /user/userabc/test-restore
[root@vm01 ~]# hdfs dfs -ls /user/userabc/foo/
[root@vm01 ~]# sudo -u userabc hdfs dfs -rm -r -skipTrash /user/userabc/foo
rm: Permission denied: user=userabc, access=ALL, inode=/user/userabc/foo:hdfs:users:drwxr-xr-x
{code}
The super user can delete the directory.
{code}
[root@vm01 ~]# sudo -u hdfs hdfs dfs -rm -r -skipTrash /user/userabc/foo
Deleted /user/userabc/foo
{code}
The same is not true for files, however. They have the correct behavior.
{code}
[root@vm01 ~]# sudo -u hdfs hdfs dfs -touchz /user/userabc/foo-file
[root@vm01 ~]# hdfs dfs -ls /user/userabc/
Found 8 items
drwx------ - userabc users 0 2013-05-02 17:00 /user/userabc/.Trash
drwxr-xr-x - userabc users 0 2013-05-03 01:34 /user/userabc/.cm
drwx------ - userabc users 0 2013-05-03 01:06 /user/userabc/.staging
drwxr-xr-x - userabc users 0 2013-04-14 18:31 /user/userabc/apps
drwxr-xr-x - userabc users 0 2013-04-30 18:05 /user/userabc/ds
-rw-r--r-- 1 hdfs users 0 2013-05-03 02:11 /user/userabc/foo-file
drwxr-xr-x - userabc users 0 2013-04-30 16:18 /user/userabc/maven_source
drwxr-xr-x - hdfs users 0 2013-05-03 01:40 /user/userabc/test-restore
[root@vm01 ~]# sudo -u userabc hdfs dfs -rm -skipTrash /user/userabc/foo-file
Deleted /user/userabc/foo-file
{code}
Using the hdfs dfs -rmdir command:
{code}
bash-4.1$ hadoop fs -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x - hdfs supergroup 0 2014-03-25 16:29 /user
drwxr-xr-x - hdfs supergroup 0 2014-03-25 16:28 /user/hdfs
drwxr-xr-x - usrabc users 0 2014-03-28 23:39 /user/usrabc
drwxr-xr-x - abc abc 0 2014-03-28 23:39 /user/usrabc/foo-empty1
[root@vm01 usrabc]# su usrabc
[usrabc@vm01 ~]$ hdfs dfs -rmdir
[jira] [Commented] (HDFS-5147) Certain dfsadmin commands such as safemode do not interact with the active namenode in ha setup
[ https://issues.apache.org/jira/browse/HDFS-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984016#comment-13984016 ] Jing Zhao commented on HDFS-5147: - [~yunsh], you need to specify a specific NN URI in the -fs option; i.e., instead of -fs hdfs://CLUSTERNAME, you may want to use -fs hdfs://NN2_HOST:NN2_PORT if you want to put NN2 into safemode. Certain dfsadmin commands such as safemode do not interact with the active namenode in ha setup --- Key: HDFS-5147 URL: https://issues.apache.org/jira/browse/HDFS-5147 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Jing Zhao Certain dfsadmin commands return the status of the first namenode specified in the configs rather than interacting with the active namenode. For example, issue hdfs dfsadmin -safemode get and it will return the status of the first namenode in the configs rather than the active namenode. I think all dfsadmin commands should determine which namenode is active and do the operation on it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6297) Add new CLI cases to reflect new features of dfs and dfsadmin
[ https://issues.apache.org/jira/browse/HDFS-6297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984021#comment-13984021 ] Dasha Boudnik commented on HDFS-6297: - I'm looking into this. Add new CLI cases to reflect new features of dfs and dfsadmin - Key: HDFS-6297 URL: https://issues.apache.org/jira/browse/HDFS-6297 Project: Hadoop HDFS Issue Type: Test Affects Versions: 2.3.0, 2.4.0 Reporter: Dasha Boudnik Fix For: 3.0.0 Some new features of HDFS aren't covered by the existing TestCLI test cases (snapshot, upgrade, a few other minor ones). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6296) Add new CLI cases to reflect new features of dfs and dfsadmin
Dasha Boudnik created HDFS-6296: --- Summary: Add new CLI cases to reflect new features of dfs and dfsadmin Key: HDFS-6296 URL: https://issues.apache.org/jira/browse/HDFS-6296 Project: Hadoop HDFS Issue Type: Test Affects Versions: 2.4.0, 2.3.0 Reporter: Dasha Boudnik Fix For: 3.0.0 Some new features of HDFS aren't covered by the existing TestCLI test cases (snapshot, upgrade, a few other minor ones). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6297) Add new CLI cases to reflect new features of dfs and dfsadmin
Dasha Boudnik created HDFS-6297: --- Summary: Add new CLI cases to reflect new features of dfs and dfsadmin Key: HDFS-6297 URL: https://issues.apache.org/jira/browse/HDFS-6297 Project: Hadoop HDFS Issue Type: Test Affects Versions: 2.4.0, 2.3.0 Reporter: Dasha Boudnik Fix For: 3.0.0 Some new features of HDFS aren't covered by the existing TestCLI test cases (snapshot, upgrade, a few other minor ones). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6295) Add decommissioning state and node state filtering to dfsadmin
[ https://issues.apache.org/jira/browse/HDFS-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984044#comment-13984044 ] Hadoop QA commented on HDFS-6295: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642381/hdfs-6295-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.web.TestWebHDFS {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6762//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6762//console This message is automatically generated. Add decommissioning state and node state filtering to dfsadmin Key: HDFS-6295 URL: https://issues.apache.org/jira/browse/HDFS-6295 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-6295-1.patch One of the few admin-friendly ways of viewing the list of decommissioning nodes is via hdfs dfsadmin -report. However, this lists *all* the datanodes on the cluster, which is prohibitive for large clusters, and also requires manual parsing to look at the decom status. It'd be nicer if we could fetch and display only decommissioning nodes (or just live and dead nodes for that matter). -- This message was sent by Atlassian JIRA (v6.2#6252)
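For illustration, fetching only the filtered node list from the client API might look like the following (DatanodeReportType.DECOMMISSIONING is an assumed addition here; LIVE, DEAD, and ALL already exist in HdfsConstants):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType;

public class ListDecommissioning {
  public static void main(String[] args) throws Exception {
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());
    // Fetch only nodes in the requested state instead of the full -report dump.
    DatanodeInfo[] nodes =
        dfs.getDataNodeStats(DatanodeReportType.DECOMMISSIONING);
    for (DatanodeInfo dn : nodes) {
      System.out.println(dn.getHostName() + " " + dn.getAdminState());
    }
  }
}
{code}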
[jira] [Updated] (HDFS-2882) DN continues to start up, even if block pool fails to initialize
[ https://issues.apache.org/jira/browse/HDFS-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-2882: Attachment: HDFS-2882.patch Attaching the updated patch. DN continues to start up, even if block pool fails to initialize Key: HDFS-2882 URL: https://issues.apache.org/jira/browse/HDFS-2882 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.0.2-alpha Reporter: Todd Lipcon Assignee: Vinayakumar B Attachments: HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, hdfs-2882.txt I started a DN on a machine that was completely out of space on one of its drives. I saw the following:
{noformat}
2012-02-02 09:56:50,499 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-448349972-172.29.5.192-1323816762969 (storage id DS-507718931-172.29.5.194-11072-1297842002148) service to styx01.sf.cloudera.com/172.29.5.192:8021
java.io.IOException: Mkdirs failed to create /data/1/scratch/todd/styx-datadir/current/BP-448349972-172.29.5.192-1323816762969/tmp
        at org.apache.hadoop.hdfs.server.datanode.FSDataset$BlockPoolSlice.init(FSDataset.java:335)
{noformat}
but the DN continued to run, spewing NPEs when it tried to do block reports, etc. This was on the HDFS-1623 branch but may affect trunk as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
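A hedged sketch of the fail-fast behavior the report argues for, with assumed method names (initBlockPool as the per-pool setup hook); the attached patch may structure this differently:
{code}
// Take the DataNode down when a block pool cannot be initialized, instead of
// letting it run on and NPE on every block report.
try {
  initBlockPool(bpos);
} catch (IOException ioe) {
  LOG.fatal("Initialization failed for block pool " + bpos, ioe);
  shutdown();  // stop the whole DN in a known state
  throw ioe;   // propagate so startup is reported as failed
}
{code}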
[jira] [Commented] (HDFS-5147) Certain dfsadmin commands such as safemode do not interact with the active namenode in ha setup
[ https://issues.apache.org/jira/browse/HDFS-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984047#comment-13984047 ] Sanghyun Yun commented on HDFS-5147: [~jingzhao], thanks for your answer. I know the -fs option and it works well. But I think a dfsadmin command should affect the active namenode when I don't specify a specific NN URI or when I use -fs hdfs://CLUSTERNAME. Currently, it affects the first namenode, regardless of whether it is active or standby. Certain dfsadmin commands such as safemode do not interact with the active namenode in ha setup --- Key: HDFS-5147 URL: https://issues.apache.org/jira/browse/HDFS-5147 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Jing Zhao Certain dfsadmin commands return the status of the first namenode specified in the configs rather than interacting with the active namenode. For example, issue hdfs dfsadmin -safemode get and it will return the status of the first namenode in the configs rather than the active namenode. I think all dfsadmin commands should determine which namenode is active and do the operation on it. -- This message was sent by Atlassian JIRA (v6.2#6252)
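Sketching what the request above implies (the proxy construction and command dispatch helpers are assumed; HAServiceProtocol and HAServiceState are the real HA interfaces):
{code}
// Probe each configured NN and send the admin command to the active one,
// instead of always talking to the first NN listed in the configs.
for (InetSocketAddress nnAddr : nnAddrs) {
  HAServiceProtocol proxy = newHAProxy(nnAddr, conf);  // assumed helper
  if (proxy.getServiceStatus().getState() == HAServiceState.ACTIVE) {
    runAdminCommand(nnAddr);  // e.g. -safemode enter, against the active NN
    break;
  }
}
{code}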