[jira] [Updated] (HDFS-7542) Add an option to DFSAdmin -safemode wait to ignore connection failures
[ https://issues.apache.org/jira/browse/HDFS-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Chu updated HDFS-7542: -- Attachment: HDFS-7542.002.patch Add an option to DFSAdmin -safemode wait to ignore connection failures -- Key: HDFS-7542 URL: https://issues.apache.org/jira/browse/HDFS-7542 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.6.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Attachments: HDFS-7542.001.patch, HDFS-7542.002.patch Currently, the _dfsadmin -safemode wait_ command aborts when the connection to the NN fails (a network glitch, a ConnectException when the NN is unreachable, an EOFException if the network link is shut down). In certain situations, users have asked for an option to make the command resilient to connection failures. This is useful so that the admin can initiate the wait command even before the NN is fully up, and so that the command survives intermittent network issues. With this option, the admin can rely on the wait command continuing to poll instead of aborting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
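The requested behaviour can be sketched as a poll loop that treats connection errors like "still in safe mode" instead of aborting. This is an illustration only: `SafeModeProbe`, `waitForSafeModeOff`, and the flag name are hypothetical stand-ins, not the actual DFSAdmin code or the patch's implementation.

```java
import java.io.IOException;

/**
 * Sketch of a connection-failure-tolerant "dfsadmin -safemode wait" loop.
 * SafeModeProbe stands in for the real NameNode RPC
 * (e.g. dfs.setSafeMode(SAFEMODE_GET)); it is NOT the actual DFSAdmin API.
 */
public class SafeModeWait {

    /** Stand-in for the NN call; throws IOException on connection failure. */
    interface SafeModeProbe {
        boolean inSafeMode() throws IOException;
    }

    /**
     * Polls until safe mode is off, up to maxPolls attempts.
     * With ignoreConnectionFailures=true, connection errors are treated
     * like "still waiting" instead of aborting the command.
     */
    static boolean waitForSafeModeOff(SafeModeProbe probe,
                                      boolean ignoreConnectionFailures,
                                      int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            try {
                if (!probe.inSafeMode()) {
                    return true;  // NN reachable and out of safe mode
                }
            } catch (IOException e) {
                if (!ignoreConnectionFailures) {
                    return false; // current behaviour: abort on first failure
                }
                // requested behaviour: keep polling through the failure
            }
            // a real implementation would sleep between polls
        }
        return false;             // still in safe mode (or NN never reachable)
    }
}
```

With the flag off, the first `IOException` ends the command, matching today's behaviour; with it on, an NN that is briefly unreachable just costs extra polls.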
[jira] [Commented] (HDFS-7542) Add an option to DFSAdmin -safemode wait to ignore connection failures
[ https://issues.apache.org/jira/browse/HDFS-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251365#comment-14251365 ] Stephen Chu commented on HDFS-7542: --- The TestRollingUpgradeRollback failure is unrelated to these DFSAdmin command changes. I re-ran the test a few times successfully. The release audit warning also seems to be incorrect because all modified files have the Apache license. It's hard to see the exact test name that timed out. Retrying Jenkins with the same patch. Add an option to DFSAdmin -safemode wait to ignore connection failures -- Key: HDFS-7542 URL: https://issues.apache.org/jira/browse/HDFS-7542 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.6.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Attachments: HDFS-7542.001.patch, HDFS-7542.002.patch Currently, the _dfsadmin -safemode wait_ command aborts when the connection to the NN fails (a network glitch, a ConnectException when the NN is unreachable, an EOFException if the network link is shut down). In certain situations, users have asked for an option to make the command resilient to connection failures. This is useful so that the admin can initiate the wait command even before the NN is fully up, and so that the command survives intermittent network issues. With this option, the admin can rely on the wait command continuing to poll instead of aborting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6149) Running Httpfs UTs using MiniKDC
[ https://issues.apache.org/jira/browse/HDFS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinghui Wang updated HDFS-6149: --- Summary: Running Httpfs UTs using MiniKDC (was: Running Httpfs UTs with testKerberos profile has failures.) Running Httpfs UTs using MiniKDC Key: HDFS-6149 URL: https://issues.apache.org/jira/browse/HDFS-6149 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 2.2.0 Reporter: Jinghui Wang Assignee: Jinghui Wang UT failures in TestHttpFSWithKerberos. Tests using testDelegationTokenWithinDoAs fail because of the statically set keytab file. Test testDelegationTokenHttpFSAccess also fails due to the incorrect assumption that CANCELDELEGATIONTOKEN does not require credentials. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6149) Running Httpfs UTs using MiniKDC
[ https://issues.apache.org/jira/browse/HDFS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinghui Wang updated HDFS-6149: --- Description: JIRA HADOOP-9866 converted hadoop-common Kerberos unit tests to use MiniKDC. This JIRA does the same thing for HttpFS, to avoid the hassle of setting up a Kerberos environment to run the HttpFS Kerberos unit tests. (was: UT failures in TestHttpFSWithKerberos. Tests using testDelegationTokenWithinDoAs fail because of the statically set keytab file. Test testDelegationTokenHttpFSAccess also fails due to the incorrect assumption that CANCELDELEGATIONTOKEN does not require credentials.) Running Httpfs UTs using MiniKDC Key: HDFS-6149 URL: https://issues.apache.org/jira/browse/HDFS-6149 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 2.2.0 Reporter: Jinghui Wang Assignee: Jinghui Wang JIRA HADOOP-9866 converted hadoop-common Kerberos unit tests to use MiniKDC. This JIRA does the same thing for HttpFS, to avoid the hassle of setting up a Kerberos environment to run the HttpFS Kerberos unit tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6149) Running Httpfs UTs using MiniKDC
[ https://issues.apache.org/jira/browse/HDFS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinghui Wang updated HDFS-6149: --- Attachment: (was: HDFS-6149.patch) Running Httpfs UTs using MiniKDC Key: HDFS-6149 URL: https://issues.apache.org/jira/browse/HDFS-6149 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 2.2.0 Reporter: Jinghui Wang Assignee: Jinghui Wang JIRA HADOOP-9866 converted hadoop-common Kerberos unit tests to use MiniKDC. This JIRA does the same thing for HttpFS, to avoid the hassle of setting up a Kerberos environment to run the HttpFS Kerberos unit tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6149) Running Httpfs UTs using MiniKDC
[ https://issues.apache.org/jira/browse/HDFS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251387#comment-14251387 ] Jinghui Wang commented on HDFS-6149: Updating the patch to move TestHttpFSWithKerberos to use MiniKDC rather than depending on a native Kerberos environment setup for unit tests. Running Httpfs UTs using MiniKDC Key: HDFS-6149 URL: https://issues.apache.org/jira/browse/HDFS-6149 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 2.2.0 Reporter: Jinghui Wang Assignee: Jinghui Wang JIRA HADOOP-9866 converted hadoop-common Kerberos unit tests to use MiniKDC. This JIRA does the same thing for HttpFS, to avoid the hassle of setting up a Kerberos environment to run the HttpFS Kerberos unit tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6149) Running Httpfs UTs using MiniKDC
[ https://issues.apache.org/jira/browse/HDFS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinghui Wang updated HDFS-6149: --- Attachment: HDFS-6149.patch Running Httpfs UTs using MiniKDC Key: HDFS-6149 URL: https://issues.apache.org/jira/browse/HDFS-6149 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 2.2.0 Reporter: Jinghui Wang Assignee: Jinghui Wang Attachments: HDFS-6149.patch JIRA HADOOP-9866 converted hadoop-common Kerberos unit tests to use MiniKDC. This JIRA does the same thing for HttpFS, to avoid the hassle of setting up a Kerberos environment to run the HttpFS Kerberos unit tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occasionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251415#comment-14251415 ] Hadoop QA commented on HDFS-7527: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687971/HDFS-7527.002.patch against trunk revision 1050d42. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9070//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9070//console This message is automatically generated. 
TestDecommission.testIncludeByRegistrationName fails occasionally in trunk --- Key: HDFS-7527 URL: https://issues.apache.org/jira/browse/HDFS-7527 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang Assignee: Binglin Chang Attachments: HDFS-7527.001.patch, HDFS-7527.002.patch https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/
{quote}
Error Message
test timed out after 36 milliseconds
Stacktrace
java.lang.Exception: test timed out after 36 milliseconds
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957)
2014-12-15 12:00:19,958 ERROR datanode.DataNode (BPServiceActor.java:run(836)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(127.0.0.1, datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, infoSecurePort=0, ipcPort=43726, storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0)
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121)
2014-12-15 12:00:29,087 FATAL datanode.DataNode (BPServiceActor.java:run(841)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565. Exiting.
java.io.IOException: DN shut down before block pool connected
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
    at
[jira] [Commented] (HDFS-7392) org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever
[ https://issues.apache.org/jira/browse/HDFS-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251420#comment-14251420 ] Frantisek Vacek commented on HDFS-7392: --- The problem could be solved by a different implementation of SecurityUtils.StandardHostResolver.getByName(String host). Current implementation:
{code}
interface HostResolver {
  InetAddress getByName(String host) throws UnknownHostException;
}

/**
 * Uses standard java host resolution
 */
static class StandardHostResolver implements HostResolver {
  @Override
  public InetAddress getByName(String host) throws UnknownHostException {
    return InetAddress.getByName(host);
  }
}
{code}
A proper implementation should look like:
{code}
interface HostResolver {
  InetAddress[] getByName(String host) throws UnknownHostException;
}

/**
 * Uses standard java host resolution
 */
static class StandardHostResolver implements HostResolver {
  @Override
  public InetAddress[] getByName(String host) throws UnknownHostException {
    return InetAddress.getAllByName(host);
  }
}
{code}
org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever - Key: HDFS-7392 URL: https://issues.apache.org/jira/browse/HDFS-7392 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Frantisek Vacek Assignee: Yi Liu Attachments: 1.png, 2.png In some specific circumstances, org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never times out and lasts forever. The specific circumstances are: 1) The HDFS URI (hdfs://share.example.com:8020/someDir/someFile.txt) should point to a valid IP address but with no name node service running on it. 2) There should be at least 2 IP addresses for such a URI. See output below:
{quote}
[~/proj/quickbox]$ nslookup share.example.com
Server: 127.0.1.1
Address: 127.0.1.1#53
share.example.com canonical name = internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com.
Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
Address: 192.168.1.223
Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
Address: 192.168.1.65
{quote}
In such a case, org.apache.hadoop.ipc.Client.Connection.updateAddress() sometimes returns true (even if the address didn't actually change, see img. 1) and the timeoutFailures counter is reset to 0 (see img. 2). The maxRetriesOnSocketTimeouts limit (45) is never reached and the connection attempt is repeated forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
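The idea behind the commenter's proposal of resolving all addresses rather than only the first is easiest to see as a failover loop over every resolved address. Below is a minimal sketch; `Resolver` and `ConnectFn` are hypothetical seams so the logic can be shown without real DNS, and this is not Hadoop's actual ipc.Client logic.

```java
import java.io.IOException;
import java.util.List;

/**
 * Sketch of trying every resolved address instead of only the first.
 * Resolver and ConnectFn are hypothetical stand-ins; Resolver plays the
 * role of InetAddress.getAllByName(host).
 */
public class MultiAddressConnect {

    /** Stand-in for InetAddress.getAllByName(host). */
    interface Resolver {
        List<String> resolveAll(String host);
    }

    /** Stand-in for a socket connect; throws on failure. */
    interface ConnectFn {
        void connect(String address) throws IOException;
    }

    /** Returns the first address that accepts a connection, or null if all fail. */
    static String connectAny(String host, Resolver resolver, ConnectFn connector) {
        for (String addr : resolver.resolveAll(host)) {
            try {
                connector.connect(addr);
                return addr;      // stop at the first working address
            } catch (IOException e) {
                // fall through and try the next resolved address
            }
        }
        return null;              // every resolved address failed
    }
}
```

With only `getByName`, a host that round-robins between two ELB addresses (as in the nslookup output above) keeps a single-address client flapping; iterating all addresses makes the failure terminal once every address has been tried.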
[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList
[ https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251473#comment-14251473 ] Hudson commented on HDFS-7531: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/]) HDFS-7531. Improve the concurrent access on FsVolumeList (Lei Xu via Colin P. McCabe) (cmccabe: rev 3b173d95171d01ab55042b1162569d1cf14a8d43) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java Improve the concurrent access on FsVolumeList - Key: HDFS-7531 URL: https://issues.apache.org/jira/browse/HDFS-7531 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, HDFS-7531.002.patch {{FsVolumeList}} uses {{synchronized}} to protect the update on {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, {{getAvailable()}}) iterate {{volumes}} without protection. This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to provide better concurrent access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
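The {{AtomicReference}} approach described in the issue can be sketched as a copy-on-write list: readers grab an immutable snapshot with a lock-free `get()`, while writers swap in a new list via compare-and-set. This is an illustration of the pattern only, not the actual FsVolumeList code; volume entries are plain Strings here instead of FsVolumeImpl objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Minimal sketch of the copy-on-write idea: readers iterate an immutable
 * snapshot with no lock, writers replace the whole list atomically.
 */
public class CowVolumeList {
    private final AtomicReference<List<String>> volumes =
        new AtomicReference<>(Collections.emptyList());

    /** Lock-free read: callers get a consistent immutable snapshot. */
    public List<String> snapshot() {
        return volumes.get();
    }

    /** Copy-on-write add; the CAS loop handles concurrent writers. */
    public void addVolume(String v) {
        while (true) {
            List<String> cur = volumes.get();
            List<String> next = new ArrayList<>(cur);
            next.add(v);
            if (volumes.compareAndSet(cur, Collections.unmodifiableList(next))) {
                return;
            }
            // another writer won the race; retry against the new list
        }
    }

    /** Copy-on-write remove, same CAS pattern. */
    public void removeVolume(String v) {
        while (true) {
            List<String> cur = volumes.get();
            List<String> next = new ArrayList<>(cur);
            if (!next.remove(v)) {
                return;  // not present; nothing to swap
            }
            if (volumes.compareAndSet(cur, Collections.unmodifiableList(next))) {
                return;
            }
        }
    }
}
```

The payoff over `synchronized` is that operations like `checkDirs()` or `getAvailable()` can iterate a snapshot safely even while a volume is being added or removed, since the snapshot itself never mutates.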
[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251469#comment-14251469 ] Hudson commented on HDFS-7528: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/]) HDFS-7528. Consolidate symlink-related implementation into a single class. Contributed by Haohui Mai. (wheat9: rev 0da1330bfd3080a7ad95a4b48ba7b7ac89c3608f) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSymlinkOp.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java Consolidate symlink-related implementation into a single class -- Key: HDFS-7528 URL: https://issues.apache.org/jira/browse/HDFS-7528 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7528.000.patch The jira proposes to consolidate symlink-related implementation into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251484#comment-14251484 ] Hudson commented on HDFS-7528: -- FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/779/]) HDFS-7528. Consolidate symlink-related implementation into a single class. Contributed by Haohui Mai. (wheat9: rev 0da1330bfd3080a7ad95a4b48ba7b7ac89c3608f) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSymlinkOp.java Consolidate symlink-related implementation into a single class -- Key: HDFS-7528 URL: https://issues.apache.org/jira/browse/HDFS-7528 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7528.000.patch The jira proposes to consolidate symlink-related implementation into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList
[ https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251488#comment-14251488 ] Hudson commented on HDFS-7531: -- FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/779/]) HDFS-7531. Improve the concurrent access on FsVolumeList (Lei Xu via Colin P. McCabe) (cmccabe: rev 3b173d95171d01ab55042b1162569d1cf14a8d43) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Improve the concurrent access on FsVolumeList - Key: HDFS-7531 URL: https://issues.apache.org/jira/browse/HDFS-7531 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, HDFS-7531.002.patch {{FsVolumeList}} uses {{synchronized}} to protect the update on {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, {{getAvailable()}}) iterate {{volumes}} without protection. This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to provide better concurrent access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7546) Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern
Harsh J created HDFS-7546: - Summary: Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern Key: HDFS-7546 URL: https://issues.apache.org/jira/browse/HDFS-7546 Project: Hadoop HDFS Issue Type: Improvement Components: security Reporter: Harsh J Priority: Minor This config is used in the SaslRpcClient, and the lack of a default breaks clients that use cross-realm trust principals. Current location: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java#L309 The config should be documented and the default should be set to * to preserve the prior-to-introduction behaviour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
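To illustrate what a permissive {{*}} default buys, here is a sketch of a glob-style principal check of the kind such a pattern implies: a literal pattern with {{*}} wildcards matched against the full server principal. The glob-to-regex translation below is illustrative only; it is not the actual SaslRpcClient matching code or its exact glob syntax.

```java
import java.util.regex.Pattern;

/**
 * Illustrative glob-style principal matcher: '*' matches any run of
 * characters, everything else is literal. NOT the SaslRpcClient code.
 */
public class PrincipalPattern {

    /** Compiles a glob like "nn/*@EXAMPLE.COM" into a regex. */
    static Pattern compile(String glob) {
        StringBuilder re = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                re.append(".*");                       // wildcard
            } else {
                re.append(Pattern.quote(String.valueOf(c))); // literal char
            }
        }
        return Pattern.compile(re.toString());
    }

    static boolean matches(String glob, String principal) {
        return compile(glob).matcher(principal).matches();
    }
}
```

With a pattern pinned to one realm, a server principal from a trusted foreign realm is rejected; a {{*}} default accepts it, which is the prior-to-introduction behaviour the issue asks to restore.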
[jira] [Updated] (HDFS-7546) Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern
[ https://issues.apache.org/jira/browse/HDFS-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-7546: -- Attachment: HDFS-7546.patch Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern -- Key: HDFS-7546 URL: https://issues.apache.org/jira/browse/HDFS-7546 Project: Hadoop HDFS Issue Type: Improvement Components: security Reporter: Harsh J Priority: Minor Attachments: HDFS-7546.patch This config is used in the SaslRpcClient, and the lack of a default breaks clients that use cross-realm trust principals. Current location: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java#L309 The config should be documented and the default should be set to * to preserve the prior-to-introduction behaviour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7546) Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern
[ https://issues.apache.org/jira/browse/HDFS-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-7546: -- Assignee: Harsh J Target Version/s: 2.7.0 Affects Version/s: 2.1.1-beta Status: Patch Available (was: Open) Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern -- Key: HDFS-7546 URL: https://issues.apache.org/jira/browse/HDFS-7546 Project: Hadoop HDFS Issue Type: Improvement Components: security Affects Versions: 2.1.1-beta Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-7546.patch This config is used in the SaslRpcClient, and the lack of a default breaks clients that use cross-realm trust principals. Current location: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java#L309 The config should be documented and the default should be set to * to preserve the prior-to-introduction behaviour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7542) Add an option to DFSAdmin -safemode wait to ignore connection failures
[ https://issues.apache.org/jira/browse/HDFS-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251567#comment-14251567 ] Hadoop QA commented on HDFS-7542: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687988/HDFS-7542.002.patch against trunk revision 1050d42. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9071//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9071//console This message is automatically generated. 
Add an option to DFSAdmin -safemode wait to ignore connection failures -- Key: HDFS-7542 URL: https://issues.apache.org/jira/browse/HDFS-7542 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.6.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Attachments: HDFS-7542.001.patch, HDFS-7542.002.patch Currently, the _dfsadmin -safemode wait_ command aborts when the connection to the NN fails (a network glitch, a ConnectException when the NN is unreachable, an EOFException if the network link is shut down). In certain situations, users have asked for an option to make the command resilient to connection failures. This is useful so that the admin can initiate the wait command even before the NN is fully up, and so that the command survives intermittent network issues. With this option, the admin can rely on the wait command continuing to poll instead of aborting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode
[ https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinichi Yamashita updated HDFS-6833: - Attachment: HDFS-6833-13.patch DirectoryScanner should not register a deleting block with memory of DataNode - Key: HDFS-6833 URL: https://issues.apache.org/jira/browse/HDFS-6833 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.5.0, 2.5.1 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Priority: Critical Attachments: HDFS-6833-10.patch, HDFS-6833-11.patch, HDFS-6833-12.patch, HDFS-6833-13.patch, HDFS-6833-6-2.patch, HDFS-6833-6-3.patch, HDFS-6833-6.patch, HDFS-6833-7-2.patch, HDFS-6833-7.patch, HDFS-6833.8.patch, HDFS-6833.9.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch When a block is deleted in DataNode, the following messages are usually output.
{code}
2014-08-07 17:53:11,606 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion
2014-08-07 17:53:11,617 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
{code}
However, in the current implementation, DirectoryScanner may be executed while DataNode is deleting the block, and the following messages are output.
{code}
2014-08-07 17:53:30,519 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion
2014-08-07 17:53:31,426 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata files:0, missing block files:0, missing blocks in memory:1, mismatched blocks:0
2014-08-07 17:53:31,426 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
  getNumBytes() = 21230663
  getBytesOnDisk() = 21230663
  getVisibleLength()= 21230663
  getVolume() = /hadoop/data1/dfs/data/current
  getBlockFile()= /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  unlinked =false
2014-08-07 17:53:31,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
{code}
Information about the block being deleted is registered in the DataNode's memory, and when the DataNode sends a block report, the NameNode receives wrong block information. For example, when we recommission a node or change the replication factor, the NameNode may delete a valid block as an excess replica (ExcessReplicate) because of this problem, and Under-Replicated Blocks and Missing Blocks occur. When the DataNode runs DirectoryScanner, it should not register a block that is being deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
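The rule proposed above — don't re-register a block whose deletion is in flight — can be sketched with a concurrent "deleting" set that the scanner consults before adding a block back to memory. All names here are hypothetical illustrations; the real fix lives in FsDatasetImpl/DirectoryScanner and deals in ReplicaInfo objects, not bare block IDs.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of tracking blocks scheduled for (but not yet finished) deletion,
 * so a concurrent directory scan does not re-register them as "missing
 * from memory". Hypothetical stand-in, not the actual HDFS classes.
 */
public class DeletingBlockTracker {
    private final Set<Long> deleting = ConcurrentHashMap.newKeySet();

    /** Called when the async disk service schedules a block file for deletion. */
    public void markDeleting(long blockId) {
        deleting.add(blockId);
    }

    /** Called once the block file is actually gone from disk. */
    public void doneDeleting(long blockId) {
        deleting.remove(blockId);
    }

    /**
     * Scanner check: a block file found on disk but absent from the volume
     * map should only be re-registered if its deletion is not in flight.
     */
    public boolean shouldRegister(long blockIdOnDisk, boolean inMemory) {
        return !inMemory && !deleting.contains(blockIdOnDisk);
    }
}
```

In the log above, the scan at 17:53:31,426 lands between "Scheduling ... for deletion" and "Deleted ...", which is exactly the window this guard closes.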
[jira] [Commented] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode
[ https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251581#comment-14251581 ] Shinichi Yamashita commented on HDFS-6833: -- Hi [~yzhangal], Thank you for your review! My previous patch file was not sufficient. I attach a patch file that fixes two things. DirectoryScanner should not register a deleting block with memory of DataNode - Key: HDFS-6833 URL: https://issues.apache.org/jira/browse/HDFS-6833 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.5.0, 2.5.1 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Priority: Critical Attachments: HDFS-6833-10.patch, HDFS-6833-11.patch, HDFS-6833-12.patch, HDFS-6833-13.patch, HDFS-6833-6-2.patch, HDFS-6833-6-3.patch, HDFS-6833-6.patch, HDFS-6833-7-2.patch, HDFS-6833-7.patch, HDFS-6833.8.patch, HDFS-6833.9.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch When a block is deleted in DataNode, the following messages are usually output.
{code}
2014-08-07 17:53:11,606 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion
2014-08-07 17:53:11,617 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
{code}
However, in the current implementation, DirectoryScanner may be executed while DataNode is deleting the block, and the following messages are output.
{code}
2014-08-07 17:53:30,519 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion
2014-08-07 17:53:31,426 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata files:0, missing block files:0, missing blocks in memory:1, mismatched blocks:0
2014-08-07 17:53:31,426 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
  getNumBytes() = 21230663
  getBytesOnDisk() = 21230663
  getVisibleLength()= 21230663
  getVolume() = /hadoop/data1/dfs/data/current
  getBlockFile()= /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  unlinked =false
2014-08-07 17:53:31,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
{code}
Information about the block being deleted is registered in the DataNode's memory, and when the DataNode sends a block report, the NameNode receives wrong block information. For example, when we recommission a node or change the replication factor, the NameNode may delete a valid block as an excess replica (ExcessReplicate) because of this problem, and Under-Replicated Blocks and Missing Blocks occur. When the DataNode runs DirectoryScanner, it should not register a block that is being deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7392) org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever
[ https://issues.apache.org/jira/browse/HDFS-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251637#comment-14251637 ] Frantisek Vacek commented on HDFS-7392: --- Please ignore my previous proposal, it will not work. org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever - Key: HDFS-7392 URL: https://issues.apache.org/jira/browse/HDFS-7392 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Frantisek Vacek Assignee: Yi Liu Attachments: 1.png, 2.png In some specific circumstances, org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never times out and lasts forever. The specific circumstances are: 1) The HDFS URI (hdfs://share.example.com:8020/someDir/someFile.txt) should point to a valid IP address, but with no NameNode service running on it. 2) There should be at least 2 IP addresses for such a URI. See output below: {quote} [~/proj/quickbox]$ nslookup share.example.com Server: 127.0.1.1 Address:127.0.1.1#53 share.example.com canonical name = internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com. Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com Address: 192.168.1.223 Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com Address: 192.168.1.65 {quote} In such a case, org.apache.hadoop.ipc.Client.Connection.updateAddress() sometimes returns true (even if the address didn't actually change, see img. 1) and the timeoutFailures counter is set to 0 (see img. 2). The maxRetriesOnSocketTimeouts (45) is never reached and the connection attempt is repeated forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
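The failure mode in this issue can be summarized with a small sketch (illustrative only, not the actual org.apache.hadoop.ipc.Client code): a hostname with two A records behind a load balancer makes each re-resolution look like an "address change", and if the timeout-failure counter is reset on every change, maxRetriesOnSocketTimeouts is never reached. Counting failures across address changes bounds the loop:

```java
// Sketch of a bounded retry budget (hypothetical names): the failure count
// is NOT reset when re-resolution reports a different address, so the retry
// loop terminates even when two A records alternate forever.
public class RetryBudget {
    private final int maxRetries;
    private int failures;

    public RetryBudget(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /**
     * @param addressChanged whether re-resolution produced a different address;
     *                       deliberately ignored so flip-flopping DNS answers
     *                       cannot reset the budget
     */
    public boolean shouldRetry(boolean addressChanged) {
        failures++; // count the failure even when the address changed
        return failures <= maxRetries;
    }
}
```

Under this policy, the log below would stop after 45 attempts regardless of how often the resolved address alternates between the two ELB backends.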
[jira] [Commented] (HDFS-7546) Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern
[ https://issues.apache.org/jira/browse/HDFS-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251683#comment-14251683 ] Allen Wittenauer commented on HDFS-7546: Is it just namenode or is it any service that has Kerberos configured? Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern -- Key: HDFS-7546 URL: https://issues.apache.org/jira/browse/HDFS-7546 Project: Hadoop HDFS Issue Type: Improvement Components: security Affects Versions: 2.1.1-beta Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-7546.patch This config is used in the SaslRpcClient, and the no-default breaks cross-realm trust principals being used at clients. Current location: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java#L309 The config should be documented and the default should be set to * to preserve the prior-to-introduction behaviour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList
[ https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251715#comment-14251715 ] Hudson commented on HDFS-7531: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/]) HDFS-7531. Improve the concurrent access on FsVolumeList (Lei Xu via Colin P. McCabe) (cmccabe: rev 3b173d95171d01ab55042b1162569d1cf14a8d43) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Improve the concurrent access on FsVolumeList - Key: HDFS-7531 URL: https://issues.apache.org/jira/browse/HDFS-7531 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, HDFS-7531.002.patch {{FsVolumeList}} uses {{synchronized}} to protect the update on {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, {{getAvailable()}}) iterate {{volumes}} without protection. This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to provide better concurrent access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
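The AtomicReference approach mentioned in the description follows a standard copy-on-write pattern. The following is a minimal, simplified sketch of that idea, not the actual FsVolumeList code (the real class holds FsVolumeImpl instances and does more bookkeeping): readers iterate an immutable snapshot obtained lock-free, while writers build a new list and swap it in with compare-and-set.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Copy-on-write list sketch: reads never block, writes retry a CAS loop.
public class VolumeList<V> {
    private final AtomicReference<List<V>> volumes =
            new AtomicReference<>(Collections.emptyList());

    /** Lock-free read path: callers iterate an immutable snapshot. */
    public List<V> snapshot() {
        return volumes.get();
    }

    /** Copy, mutate, and compare-and-set; retry if a concurrent writer won. */
    public void add(V volume) {
        while (true) {
            List<V> cur = volumes.get();
            List<V> next = new ArrayList<>(cur);
            next.add(volume);
            if (volumes.compareAndSet(cur, Collections.unmodifiableList(next))) {
                return;
            }
        }
    }
}
```

Iterators obtained from snapshot() can never throw ConcurrentModificationException, which is what makes checkDirs()-style traversals safe without a lock.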
[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251711#comment-14251711 ] Hudson commented on HDFS-7528: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/]) HDFS-7528. Consolidate symlink-related implementation into a single class. Contributed by Haohui Mai. (wheat9: rev 0da1330bfd3080a7ad95a4b48ba7b7ac89c3608f) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSymlinkOp.java Consolidate symlink-related implementation into a single class -- Key: HDFS-7528 URL: https://issues.apache.org/jira/browse/HDFS-7528 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7528.000.patch The jira proposes to consolidate symlink-related implementation into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList
[ https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251729#comment-14251729 ] Hudson commented on HDFS-7531: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/]) HDFS-7531. Improve the concurrent access on FsVolumeList (Lei Xu via Colin P. McCabe) (cmccabe: rev 3b173d95171d01ab55042b1162569d1cf14a8d43) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java Improve the concurrent access on FsVolumeList - Key: HDFS-7531 URL: https://issues.apache.org/jira/browse/HDFS-7531 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, HDFS-7531.002.patch {{FsVolumeList}} uses {{synchronized}} to protect the update on {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, {{getAvailable()}}) iterate {{volumes}} without protection. This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to provide better concurrent access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7392) org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever
[ https://issues.apache.org/jira/browse/HDFS-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frantisek Vacek updated HDFS-7392: -- Attachment: HDFS-7392.diff org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever - Key: HDFS-7392 URL: https://issues.apache.org/jira/browse/HDFS-7392 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Frantisek Vacek Assignee: Yi Liu Attachments: 1.png, 2.png, HDFS-7392.diff In some specific circumstances, org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never times out and lasts forever. The specific circumstances are: 1) The HDFS URI (hdfs://share.example.com:8020/someDir/someFile.txt) should point to a valid IP address, but with no NameNode service running on it. 2) There should be at least 2 IP addresses for such a URI. See output below: {quote} [~/proj/quickbox]$ nslookup share.example.com Server: 127.0.1.1 Address:127.0.1.1#53 share.example.com canonical name = internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com. Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com Address: 192.168.1.223 Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com Address: 192.168.1.65 {quote} In such a case, org.apache.hadoop.ipc.Client.Connection.updateAddress() sometimes returns true (even if the address didn't actually change, see img. 1) and the timeoutFailures counter is set to 0 (see img. 2). The maxRetriesOnSocketTimeouts (45) is never reached and the connection attempt is repeated forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7392) org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever
[ https://issues.apache.org/jira/browse/HDFS-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251739#comment-14251739 ] Frantisek Vacek commented on HDFS-7392: --- Finally, I've created the promised patch. It is attached as HDFS-7392.diff. It is not a final solution of course, but it works, and I hope it sheds some light on the problem we are facing. Yi, please contact me if you need more info or explanation. regards Fanda org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever - Key: HDFS-7392 URL: https://issues.apache.org/jira/browse/HDFS-7392 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Frantisek Vacek Assignee: Yi Liu Attachments: 1.png, 2.png, HDFS-7392.diff In some specific circumstances, org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never times out and lasts forever. The specific circumstances are: 1) The HDFS URI (hdfs://share.example.com:8020/someDir/someFile.txt) should point to a valid IP address, but with no NameNode service running on it. 2) There should be at least 2 IP addresses for such a URI. See output below: {quote} [~/proj/quickbox]$ nslookup share.example.com Server: 127.0.1.1 Address:127.0.1.1#53 share.example.com canonical name = internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com. Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com Address: 192.168.1.223 Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com Address: 192.168.1.65 {quote} In such a case, org.apache.hadoop.ipc.Client.Connection.updateAddress() sometimes returns true (even if the address didn't actually change, see img. 1) and the timeoutFailures counter is set to 0 (see img. 2). The maxRetriesOnSocketTimeouts (45) is never reached and the connection attempt is repeated forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7392) org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever
[ https://issues.apache.org/jira/browse/HDFS-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251741#comment-14251741 ] Frantisek Vacek commented on HDFS-7392: --- I'm also attaching a log taken with the patch applied. {code} DEBUG [main] (Client.java:697) - Connecting to share.merck.com/54.40.29.65:8020 INFO [main] (Client.java:816) - Retrying connect to server: share.merck.com/54.40.29.65:8020. Already tried 1 time(s); maxRetries=45 WARN [main] (Client.java:564) - Address change detected. Old: share.merck.com/54.40.29.65:8020 New: share.merck.com/54.40.29.223:8020 INFO [main] (Client.java:816) - Retrying connect to server: share.merck.com/54.40.29.223:8020. Already tried 2 time(s); maxRetries=45 INFO [main] (Client.java:816) - Retrying connect to server: share.merck.com/54.40.29.223:8020. Already tried 1 time(s); maxRetries=45 WARN [main] (Client.java:564) - Address change detected. Old: share.merck.com/54.40.29.223:8020 New: share.merck.com/54.40.29.65:8020 INFO [main] (Client.java:816) - Retrying connect to server: share.merck.com/54.40.29.65:8020. Already tried 2 time(s); maxRetries=45 INFO [main] (Client.java:816) - Retrying connect to server: share.merck.com/54.40.29.65:8020. Already tried 3 time(s); maxRetries=45 WARN [main] (Client.java:564) - Address change detected. Old: share.merck.com/54.40.29.65:8020 New: share.merck.com/54.40.29.223:8020 INFO [main] (Client.java:816) - Retrying connect to server: share.merck.com/54.40.29.223:8020. Already tried 4 time(s); maxRetries=45 INFO [main] (Client.java:816) - Retrying connect to server: share.merck.com/54.40.29.223:8020. Already tried 3 time(s); maxRetries=45 INFO [main] (Client.java:816) - Retrying connect to server: share.merck.com/54.40.29.223:8020. Already tried 4 time(s); maxRetries=45 INFO [main] (Client.java:816) - Retrying connect to server: share.merck.com/54.40.29.223:8020. 
Already tried 5 time(s); maxRetries=45 INFO [main] (Client.java:816) - Retrying connect to server: share.merck.com/54.40.29.223:8020. Already tried 6 time(s); maxRetries=45 INFO [main] (Client.java:816) - Retrying connect to server: share.merck.com/54.40.29.223:8020. Already tried 7 time(s); maxRetries=45 WARN [main] (Client.java:564) - Address change detected. Old: share.merck.com/54.40.29.223:8020 New: share.merck.com/54.40.29.65:8020 INFO [main] (Client.java:816) - Retrying connect to server: share.merck.com/54.40.29.65:8020. Already tried 8 time(s); maxRetries=45 INFO [main] (Client.java:816) - Retrying connect to server: share.merck.com/54.40.29.65:8020. Already tried 5 time(s); maxRetries=45 INFO [main] (Client.java:816) - Retrying connect to server: share.merck.com/54.40.29.65:8020. Already tried 6 time(s); maxRetries=45 INFO [main] (Client.java:816) - Retrying connect to server: share.merck.com/54.40.29.65:8020. Already tried 7 time(s); maxRetries=45 WARN [main] (Client.java:564) - Address change detected. Old: share.merck.com/54.40.29.65:8020 New: share.merck.com/54.40.29.223:8020 INFO [main] (Client.java:816) - Retrying connect to server: share.merck.com/54.40.29.223:8020. Already tried 8 time(s); maxRetries=45 INFO [main] (Client.java:816) - Retrying connect to server: share.merck.com/54.40.29.223:8020. Already tried 9 time(s); maxRetries=45 {code} org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever - Key: HDFS-7392 URL: https://issues.apache.org/jira/browse/HDFS-7392 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Frantisek Vacek Assignee: Yi Liu Attachments: 1.png, 2.png, HDFS-7392.diff In some specific circumstances, org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never times out and lasts forever. The specific circumstances are: 1) The HDFS URI (hdfs://share.example.com:8020/someDir/someFile.txt) should point to a valid IP address, but with no NameNode service running on it. 
2) There should be at least 2 IP addresses for such a URI. See output below: {quote} [~/proj/quickbox]$ nslookup share.example.com Server: 127.0.1.1 Address:127.0.1.1#53 share.example.com canonical name = internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com. Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com Address: 192.168.1.223 Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com Address: 192.168.1.65 {quote} In such a case, org.apache.hadoop.ipc.Client.Connection.updateAddress() sometimes returns true (even if the address didn't actually change, see img. 1) and the timeoutFailures counter is set to 0 (see img. 2). The maxRetriesOnSocketTimeouts (45) is never reached and the connection attempt is repeated forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7546) Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern
[ https://issues.apache.org/jira/browse/HDFS-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7546: Labels: supportability (was: ) Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern -- Key: HDFS-7546 URL: https://issues.apache.org/jira/browse/HDFS-7546 Project: Hadoop HDFS Issue Type: Improvement Components: security Affects Versions: 2.1.1-beta Reporter: Harsh J Assignee: Harsh J Priority: Minor Labels: supportability Attachments: HDFS-7546.patch This config is used in the SaslRpcClient, and the no-default breaks cross-realm trust principals being used at clients. Current location: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java#L309 The config should be documented and the default should be set to * to preserve the prior-to-introduction behaviour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
Binglin Chang created HDFS-7547: --- Summary: Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup Key: HDFS-7547 URL: https://issues.apache.org/jira/browse/HDFS-7547 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251725#comment-14251725 ] Hudson commented on HDFS-7528: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/]) HDFS-7528. Consolidate symlink-related implementation into a single class. Contributed by Haohui Mai. (wheat9: rev 0da1330bfd3080a7ad95a4b48ba7b7ac89c3608f) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSymlinkOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java Consolidate symlink-related implementation into a single class -- Key: HDFS-7528 URL: https://issues.apache.org/jira/browse/HDFS-7528 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7528.000.patch The jira proposes to consolidate symlink-related implementation into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
[ https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7547: Description: HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old desc string format, test Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup - Key: HDFS-7547 URL: https://issues.apache.org/jira/browse/HDFS-7547 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old desc string format, test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
[ https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7547: Description: HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old description string format; the test TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on it, so this test always fails. (was: HDFS-7531 changes the implementation of FsVolumeList, but doesn't change it's toString method to keep the old desc string format, test ) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup - Key: HDFS-7547 URL: https://issues.apache.org/jira/browse/HDFS-7547 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old description string format; the test TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on it, so this test always fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
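The coupling described here, a test that depends on the exact toString() output of a refactored class, can be illustrated with a small sketch. The format string below is hypothetical, not the real FsVolumeList output: the point is that if a test parses toString(), a refactoring must either preserve the old format or the test must query state directly.

```java
import java.util.List;

// Illustration only: the toString() format is an implicit contract once a
// test asserts on it. After swapping the internal collection (as HDFS-7531
// did), the old format has to be reproduced explicitly.
public class VolumeListFormat {
    private final List<String> volumes;

    public VolumeListFormat(List<String> volumes) {
        this.volumes = volumes;
    }

    @Override
    public String toString() {
        // keep the old "vol1, vol2" style a test may expect,
        // rather than whatever the new backing collection prints
        return String.join(", ", volumes);
    }
}
```

The safer long-term fix is for the test to assert on an accessor (e.g. the number of valid volumes) rather than on string output.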
[jira] [Updated] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
[ https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7547: Status: Patch Available (was: Open) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup - Key: HDFS-7547 URL: https://issues.apache.org/jira/browse/HDFS-7547 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old description string format; the test TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on it, so this test always fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
[ https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7547: Attachment: HDFS-7547.001.patch Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup - Key: HDFS-7547 URL: https://issues.apache.org/jira/browse/HDFS-7547 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-7547.001.patch HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old description string format; the test TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on it, so this test always fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251769#comment-14251769 ] Hudson commented on HDFS-7528: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/]) HDFS-7528. Consolidate symlink-related implementation into a single class. Contributed by Haohui Mai. (wheat9: rev 0da1330bfd3080a7ad95a4b48ba7b7ac89c3608f) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSymlinkOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java Consolidate symlink-related implementation into a single class -- Key: HDFS-7528 URL: https://issues.apache.org/jira/browse/HDFS-7528 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7528.000.patch The jira proposes to consolidate symlink-related implementation into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList
[ https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251773#comment-14251773 ] Hudson commented on HDFS-7531: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/]) HDFS-7531. Improve the concurrent access on FsVolumeList (Lei Xu via Colin P. McCabe) (cmccabe: rev 3b173d95171d01ab55042b1162569d1cf14a8d43) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Improve the concurrent access on FsVolumeList - Key: HDFS-7531 URL: https://issues.apache.org/jira/browse/HDFS-7531 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, HDFS-7531.002.patch {{FsVolumeList}} uses {{synchronized}} to protect the update on {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, {{getAvailable()}}) iterate {{volumes}} without protection. This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to provide better concurrent access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7546) Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern
[ https://issues.apache.org/jira/browse/HDFS-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251781#comment-14251781 ] Yongjun Zhang commented on HDFS-7546: - Hi [~qwertymaniac], Thanks for reporting the issue and providing the patch. I labeled it as supportability. I reviewed the change and have a few comments. * The description of the property can be improved with more information. What about: {code} A client-side property that describes the permitted server principal pattern. It can be configured to control which realms clients are allowed to authenticate with, which is useful in a cross-realm environment. {code} * what's the current default of this property prior to your change? * wonder if there is any catch by changing the default pattern to *, which essentially accepts any pattern? Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern -- Key: HDFS-7546 URL: https://issues.apache.org/jira/browse/HDFS-7546 Project: Hadoop HDFS Issue Type: Improvement Components: security Affects Versions: 2.1.1-beta Reporter: Harsh J Assignee: Harsh J Priority: Minor Labels: supportability Attachments: HDFS-7546.patch This config is used in the SaslRpcClient, and the no-default breaks cross-realm trust principals being used at clients. Current location: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java#L309 The config should be documented and the default should be set to * to preserve the prior-to-introduction behaviour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
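To make the behaviour under discussion concrete, here is a small sketch of a client-side principal check. This is illustrative only, not the actual SaslRpcClient code; the glob-to-regex translation and the method names are assumptions. The point is that a server principal such as "hdfs/nn.example.com@EXAMPLE.COM" is accepted when it matches the configured pattern, and a default of "*" accepts any principal, which preserves the behaviour from before the property existed.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a principal-pattern check: translate a simple
// glob ('*' matches anything, everything else is literal) into a regex
// and test the server principal against it.
public class PrincipalPatternCheck {
    public static boolean matches(String globPattern, String principal) {
        String[] parts = globPattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");              // each '*' becomes '.*'
            }
            if (!parts[i].isEmpty()) {
                regex.append(Pattern.quote(parts[i])); // literal segment
            }
        }
        return principal.matches(regex.toString());
    }
}
```

Under this sketch, answering the catch question above: a default of "*" means the client no longer rejects any server principal by name, so the check degrades to relying on Kerberos mutual authentication alone, which is exactly the pre-introduction behaviour the JIRA wants to restore.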
[jira] [Commented] (HDFS-7546) Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern
[ https://issues.apache.org/jira/browse/HDFS-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251786#comment-14251786 ] Hadoop QA commented on HDFS-7546: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688010/HDFS-7546.patch against trunk revision 1050d42. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9072//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9072//console This message is automatically generated. 
Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern -- Key: HDFS-7546 URL: https://issues.apache.org/jira/browse/HDFS-7546 Project: Hadoop HDFS Issue Type: Improvement Components: security Affects Versions: 2.1.1-beta Reporter: Harsh J Assignee: Harsh J Priority: Minor Labels: supportability Attachments: HDFS-7546.patch This config is used in the SaslRpcClient, and the no-default breaks cross-realm trust principals being used at clients. Current location: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java#L309 The config should be documented and the default should be set to * to preserve the prior-to-introduction behaviour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList
[ https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251811#comment-14251811 ] Hudson commented on HDFS-7531: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/]) HDFS-7531. Improve the concurrent access on FsVolumeList (Lei Xu via Colin P. McCabe) (cmccabe: rev 3b173d95171d01ab55042b1162569d1cf14a8d43) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java Improve the concurrent access on FsVolumeList - Key: HDFS-7531 URL: https://issues.apache.org/jira/browse/HDFS-7531 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, HDFS-7531.002.patch {{FsVolumeList}} uses {{synchronized}} to protect the update on {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, {{getAvailable()}}) iterate {{volumes}} without protection. This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to provide better concurrent access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251807#comment-14251807 ] Hudson commented on HDFS-7528: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/]) HDFS-7528. Consolidate symlink-related implementation into a single class. Contributed by Haohui Mai. (wheat9: rev 0da1330bfd3080a7ad95a4b48ba7b7ac89c3608f) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSymlinkOp.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java Consolidate symlink-related implementation into a single class -- Key: HDFS-7528 URL: https://issues.apache.org/jira/browse/HDFS-7528 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7528.000.patch The jira proposes to consolidate symlink-related implementation into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
Rushabh S Shah created HDFS-7548: Summary: Corrupt block reporting delayed until datablock scanner thread detects it Key: HDFS-7548 URL: https://issues.apache.org/jira/browse/HDFS-7548 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah When only one datanode holds a block and that block happens to be corrupt, the namenode keeps trying to replicate the block repeatedly, but the block is reported as corrupt only when the data block scanner thread of the datanode picks up this bad block. Requesting an improvement in namenode reporting so that the corrupt replica is reported when there is only 1 replica and the replication of that replica keeps failing with a checksum error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
[ https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-7548: - Status: Patch Available (was: Open) Corrupt block reporting delayed until datablock scanner thread detects it - Key: HDFS-7548 URL: https://issues.apache.org/jira/browse/HDFS-7548 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: HDFS-7548.patch When only one datanode holds a block and that block happens to be corrupt, the namenode keeps trying to replicate the block repeatedly, but the block is reported as corrupt only when the data block scanner thread of the datanode picks up this bad block. Requesting an improvement in namenode reporting so that the corrupt replica is reported when there is only 1 replica and the replication of that replica keeps failing with a checksum error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
[ https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-7548: - Attachment: HDFS-7548.patch Whenever the datanode detects a checksum error in the write pipeline while transferring the block to the target node, that block is added at the first position in the blockInfoSet with its lastScanTime set to 0. This makes the BlockPoolSliceScanner pick this block first, since that data structure is sorted by lastScanTime. In this way, we scan the corrupt block first and report it to the namenode. Corrupt block reporting delayed until datablock scanner thread detects it - Key: HDFS-7548 URL: https://issues.apache.org/jira/browse/HDFS-7548 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: HDFS-7548.patch When only one datanode holds a block and that block happens to be corrupt, the namenode keeps trying to replicate the block repeatedly, but the block is reported as corrupt only when the data block scanner thread of the datanode picks up this bad block. Requesting an improvement in namenode reporting so that the corrupt replica is reported when there is only 1 replica and the replication of that replica keeps failing with a checksum error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
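The ordering the patch relies on can be modeled in a few lines. The names below are invented and a plain TreeSet stands in for the datanode's blockInfoSet: because the set is sorted by lastScanTime, re-inserting a suspect block with lastScanTime = 0 moves it to the head, so the scanner verifies it next.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Illustrative scan queue (not Hadoop source) sorted by lastScanTime.
public class ScanQueue {
    public static final class BlockInfo {
        final long blockId;
        long lastScanTime;
        public BlockInfo(long blockId, long lastScanTime) {
            this.blockId = blockId;
            this.lastScanTime = lastScanTime;
        }
    }

    // Order by lastScanTime, breaking ties on block id for stability.
    private final TreeSet<BlockInfo> blockInfoSet = new TreeSet<>(
        Comparator.comparingLong((BlockInfo b) -> b.lastScanTime)
                  .thenComparingLong(b -> b.blockId));

    public void add(BlockInfo b) { blockInfoSet.add(b); }

    // Mark a block suspect: remove, zero its scan time, re-insert.
    // (Remove first: mutating the key while it sits in a sorted set
    // would corrupt the ordering.)
    public void markSuspect(BlockInfo b) {
        blockInfoSet.remove(b);
        b.lastScanTime = 0;
        blockInfoSet.add(b);
    }

    // The scanner always takes the least-recently-scanned block, so a
    // zeroed lastScanTime sorts before every real (positive) timestamp.
    public long nextBlockToScan() { return blockInfoSet.first().blockId; }
}
```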
[jira] [Updated] (HDFS-7538) removedDst should be checked against null in the finally block of FSDirRenameOp#unprotectedRenameTo()
[ https://issues.apache.org/jira/browse/HDFS-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-7538: - Assignee: Ted Yu Status: Patch Available (was: Open) removedDst should be checked against null in the finally block of FSDirRenameOp#unprotectedRenameTo() - Key: HDFS-7538 URL: https://issues.apache.org/jira/browse/HDFS-7538 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-7538-001.patch {code} if (removedDst != null) { undoRemoveDst = false; ... if (undoRemoveDst) { // Rename failed - restore dst if (dstParent.isDirectory() && dstParent.asDirectory().isWithSnapshot()) { dstParent.asDirectory().undoRename4DstParent(removedDst, {code} If the first if check doesn't pass, removedDst would be null while undoRemoveDst may be true. This combination would lead to a NullPointerException in the finally block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
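The reachable combination is easy to model. This is an invented-name sketch of the control flow, not the actual FSDirRenameOp code; it only shows why the finally-block restore must null-check removedDst even when undoRemoveDst is true.

```java
// Simplified model of unprotectedRenameTo's cleanup path. removedDst is
// only set when the destination was actually removed, while undoRemoveDst
// starts true and is cleared on success - so the cleanup can see
// undoRemoveDst == true with removedDst still null.
public class RenameUndo {
    public static String cleanup(Object removedDst, boolean undoRemoveDst) {
        if (undoRemoveDst) {
            // Patched form: guard the dereference.
            if (removedDst != null) {
                // stand-in for undoRename4DstParent(removedDst, ...)
                return "restored:" + removedDst;
            }
            // Unguarded code would dereference removedDst here and NPE.
            return "nothing-to-restore";
        }
        return "no-undo";
    }
}
```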
[jira] [Updated] (HDFS-7538) removedDst should be checked against null in the finally block of FSDirRenameOp#unprotectedRenameTo()
[ https://issues.apache.org/jira/browse/HDFS-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-7538: - Attachment: hdfs-7538-001.patch removedDst should be checked against null in the finally block of FSDirRenameOp#unprotectedRenameTo() - Key: HDFS-7538 URL: https://issues.apache.org/jira/browse/HDFS-7538 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: hdfs-7538-001.patch {code} if (removedDst != null) { undoRemoveDst = false; ... if (undoRemoveDst) { // Rename failed - restore dst if (dstParent.isDirectory() && dstParent.asDirectory().isWithSnapshot()) { dstParent.asDirectory().undoRename4DstParent(removedDst, {code} If the first if check doesn't pass, removedDst would be null while undoRemoveDst may be true. This combination would lead to a NullPointerException in the finally block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7464) TestDFSAdminWithHA#testRefreshSuperUserGroupsConfiguration fails against Java 8
[ https://issues.apache.org/jira/browse/HDFS-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HDFS-7464: -- Assignee: (was: Chen He) TestDFSAdminWithHA#testRefreshSuperUserGroupsConfiguration fails against Java 8 --- Key: HDFS-7464 URL: https://issues.apache.org/jira/browse/HDFS-7464 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/23/ : {code} REGRESSION: org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testRefreshSuperUserGroupsConfiguration Error Message: refreshSuperUserGroupsConfiguration: End of File Exception between local host is: asf908.gq1.ygridcore.net/67.195.81.152; destination host is: localhost:12700; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException expected:<0> but was:<-1> Stack Trace: java.lang.AssertionError: refreshSuperUserGroupsConfiguration: End of File Exception between local host is: asf908.gq1.ygridcore.net/67.195.81.152; destination host is: localhost:12700; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException expected:<0> but was:<-1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testRefreshSuperUserGroupsConfiguration(TestDFSAdminWithHA.java:228) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
[ https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252032#comment-14252032 ] Hadoop QA commented on HDFS-7547: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688049/HDFS-7547.001.patch against trunk revision 389f881. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9074//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9074//console This message is automatically generated. Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup - Key: HDFS-7547 URL: https://issues.apache.org/jira/browse/HDFS-7547 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-7547.001.patch HDFS-7531 changed the implementation of FsVolumeList but didn't update its toString method to keep the old description string format; the test TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on that format, so it always fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7373) Clean up temporary files after fsimage transfer failures
[ https://issues.apache.org/jira/browse/HDFS-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-7373: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the review, Akira. I've committed this to trunk and branch-2. Clean up temporary files after fsimage transfer failures Key: HDFS-7373 URL: https://issues.apache.org/jira/browse/HDFS-7373 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-7373.patch When a fsimage (e.g. checkpoint) transfer fails, a temporary file is left in each storage directory. If the size of name space is large, these files can take up quite a bit of space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7373) Clean up temporary files after fsimage transfer failures
[ https://issues.apache.org/jira/browse/HDFS-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252050#comment-14252050 ] Hudson commented on HDFS-7373: -- FAILURE: Integrated in Hadoop-trunk-Commit #6747 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6747/]) HDFS-7373. Clean up temporary files after fsimage transfer failures. Contributed by Kihwal Lee (kihwal: rev c0d666c74e9ea76564a2458c6c0a78ae7afa9fea) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java Clean up temporary files after fsimage transfer failures Key: HDFS-7373 URL: https://issues.apache.org/jira/browse/HDFS-7373 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-7373.patch When a fsimage (e.g. checkpoint) transfer fails, a temporary file is left in each storage directory. If the size of name space is large, these files can take up quite a bit of space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume
[ https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7443: --- Summary: Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume (was: Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume -- Key: HDFS-7443 URL: https://issues.apache.org/jira/browse/HDFS-7443 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Kihwal Lee Priority: Blocker When we did an upgrade from 2.5 to 2.6 in a medium-size cluster, about 4% of datanodes were not coming up. They tried the data file layout upgrade for BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed. All failures were caused by {{NativeIO.link()}} throwing IOException saying {{EEXIST}}. The datanodes didn't die right away, but the upgrade was soon retried when the block pool initialization was retried whenever {{BPServiceActor}} was registering with the namenode. After many retries, datanodes terminated. This would leave {{previous.tmp}} and {{current}} with no {{VERSION}} file in the block pool slice storage directory. Although {{previous.tmp}} contained the old {{VERSION}} file, the content was in the new layout and the subdirs were all newly created ones. This shouldn't have happened because the upgrade-recovery logic in {{Storage}} removes {{current}} and renames {{previous.tmp}} to {{current}} before retrying. All successfully upgraded volumes had old state preserved in their {{previous}} directory. In summary there were two observed issues. - Upgrade failure with {{link()}} failing with {{EEXIST}} - {{previous.tmp}} contained not the content of the original {{current}}, but a half-upgraded one. We did not see this in smaller-scale test clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
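Why a duplicate replica trips link() can be sketched as follows. The id-to-directory formula below is an illustrative stand-in modeled on the common 32x32 scheme, not necessarily Hadoop's exact function: the point is that the target path is a pure function of the block id, so two copies of the same block in one volume resolve to the same destination and the second hard link fails with EEXIST.

```java
// Illustrative block-id-based layout mapping (assumed masks, not a
// verbatim copy of Hadoop's DatanodeUtil).
public class BlockIdLayout {
    public static String targetPath(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0x1F); // high bits pick subdir level 1
        int d2 = (int) ((blockId >> 8) & 0x1F);  // next bits pick subdir level 2
        // Two files for the same block id always land here - the second
        // NativeIO.link() to this path fails with EEXIST.
        return "subdir" + d1 + "/subdir" + d2 + "/blk_" + blockId;
    }
}
```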
[jira] [Commented] (HDFS-5535) Umbrella jira for improved HDFS rolling upgrades
[ https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252063#comment-14252063 ] Ming Ma commented on HDFS-5535: --- Opened https://issues.apache.org/jira/browse/HDFS-7541 to explore ideas for more efficient DN rolling upgrades. Umbrella jira for improved HDFS rolling upgrades Key: HDFS-5535 URL: https://issues.apache.org/jira/browse/HDFS-5535 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, ha, hdfs-client, namenode Affects Versions: 3.0.0, 2.2.0 Reporter: Nathan Roberts Assignee: Tsz Wo Nicholas Sze Fix For: 2.4.0 Attachments: HDFSRollingUpgradesHighLevelDesign.pdf, HDFSRollingUpgradesHighLevelDesign.v2.pdf, HDFSRollingUpgradesHighLevelDesign.v3.pdf, h5535_20140219.patch, h5535_20140220-1554.patch, h5535_20140220b.patch, h5535_20140221-2031.patch, h5535_20140224-1931.patch, h5535_20140225-1225.patch, h5535_20140226-1328.patch, h5535_20140226-1911.patch, h5535_20140227-1239.patch, h5535_20140228-1714.patch, h5535_20140304-1138.patch, h5535_20140304-branch-2.patch, h5535_20140310-branch-2.patch, hdfs-5535-test-plan.pdf In order to roll a new HDFS release through a large cluster quickly and safely, a few enhancements are needed in HDFS. An initial High level design document will be attached to this jira, and sub-jiras will itemize the individual tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume
[ https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe reassigned HDFS-7443: -- Assignee: Colin Patrick McCabe Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume -- Key: HDFS-7443 URL: https://issues.apache.org/jira/browse/HDFS-7443 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Kihwal Lee Assignee: Colin Patrick McCabe Priority: Blocker When we did an upgrade from 2.5 to 2.6 in a medium-size cluster, about 4% of datanodes were not coming up. They tried the data file layout upgrade for BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed. All failures were caused by {{NativeIO.link()}} throwing IOException saying {{EEXIST}}. The datanodes didn't die right away, but the upgrade was soon retried when the block pool initialization was retried whenever {{BPServiceActor}} was registering with the namenode. After many retries, datanodes terminated. This would leave {{previous.tmp}} and {{current}} with no {{VERSION}} file in the block pool slice storage directory. Although {{previous.tmp}} contained the old {{VERSION}} file, the content was in the new layout and the subdirs were all newly created ones. This shouldn't have happened because the upgrade-recovery logic in {{Storage}} removes {{current}} and renames {{previous.tmp}} to {{current}} before retrying. All successfully upgraded volumes had old state preserved in their {{previous}} directory. In summary there were two observed issues. - Upgrade failure with {{link()}} failing with {{EEXIST}} - {{previous.tmp}} contained not the content of the original {{current}}, but a half-upgraded one. We did not see this in smaller-scale test clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs
[ https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252079#comment-14252079 ] Jing Zhao commented on HDFS-7543: - Thanks for working on this, Haohui! The patch looks good to me. The test failures should be unrelated. +1 Avoid path resolution when getting FileStatus for audit logs Key: HDFS-7543 URL: https://issues.apache.org/jira/browse/HDFS-7543 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7543.000.patch The current API of {{getAuditFileInfo()}} forces parsing the paths again when generating the {{HdfsFileStatus}} for audit logs. This jira proposes to avoid the repeated parsing by passing the {{INodesInPath}} object instead of the path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
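The idea behind the patch can be reduced to a small sketch with invented names (not the actual FSDirectory/INodesInPath API): resolve the path into its component chain once, then hand the resolved object to later consumers such as the audit logger, so the path string is never parsed a second time.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative resolve-once pattern (not Hadoop source).
public class PathOnceResolver {
    // Counts parses, to make the "resolved exactly once" claim testable.
    static final AtomicInteger resolutions = new AtomicInteger();

    // The single resolution per operation.
    public static List<String> resolve(String path) {
        resolutions.incrementAndGet();
        return Arrays.asList(path.split("/"));
    }

    // Audit-log stand-in: consumes already-resolved components and never
    // re-parses the path string itself.
    public static String auditName(List<String> resolved) {
        return resolved.get(resolved.size() - 1);
    }
}
```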
[jira] [Updated] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs
[ https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7543: - Resolution: Fixed Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks Jing for the reviews. Avoid path resolution when getting FileStatus for audit logs Key: HDFS-7543 URL: https://issues.apache.org/jira/browse/HDFS-7543 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7543.000.patch The current API of {{getAuditFileInfo()}} forces parsing the paths again when generating the {{HdfsFileStatus}} for audit logs. This jira proposes to avoid the repeated parsing by passing the {{INodesInPath}} object instead of the path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs
[ https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252104#comment-14252104 ] Hudson commented on HDFS-7543: -- FAILURE: Integrated in Hadoop-trunk-Commit #6749 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6749/]) HDFS-7543. Avoid path resolution when getting FileStatus for audit logs. Contributed by Haohui Mai. (wheat9: rev 65f2a4ee600dfffa5203450261da3c1989de25a9) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirConcatOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirXAttrOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirAttrOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirAclOp.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirRenameOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirStatAndListingOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSymlinkOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirMkdirOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java Avoid path resolution when getting FileStatus for audit logs Key: HDFS-7543 URL: https://issues.apache.org/jira/browse/HDFS-7543 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7543.000.patch The current API of {{getAuditFileInfo()}} forces parsing the paths again when generating the {{HdfsFileStatus}} for audit logs. 
This jira proposes to avoid the repeated parsing by passing the {{INodesInPath}} object instead of the path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-3107: -- Attachment: HDFS-3107.patch Updating the patches once again. This update is mostly related to [~jingzhao]'s refactoring of HDFS-7509. There are no changes to the truncate logic itself. Just to remind people here. The snapshot part of truncate is being maintained under HDFS-7056. And the combined patch for the two issues is also submitted there (per [~cmccabe]'s request). HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation) which is a reverse operation of append, which makes upper layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-7056: -- Status: Open (was: Patch Available) Snapshot support for truncate - Key: HDFS-7056 URL: https://issues.apache.org/jira/browse/HDFS-7056 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx Implementation of truncate in HDFS-3107 does not allow truncating files which are in a snapshot. It is desirable to be able to truncate and still keep the old file state of the file in the snapshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-7056: -- Attachment: HDFS-7056.patch Updating the patches once again. This is the snapshot part of truncate. The update is mostly related to [~jingzhao]'s refactoring of HDFS-7509. There are no changes to the truncate logic itself. Snapshot support for truncate - Key: HDFS-7056 URL: https://issues.apache.org/jira/browse/HDFS-7056 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx Implementation of truncate in HDFS-3107 does not allow truncating files which are in a snapshot. It is desirable to be able to truncate and still keep the old file state of the file in the snapshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated HDFS-7056: --- Status: Patch Available (was: Open) Snapshot support for truncate - Key: HDFS-7056 URL: https://issues.apache.org/jira/browse/HDFS-7056 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx Implementation of truncate in HDFS-3107 does not allow truncating files which are in a snapshot. It is desirable to be able to truncate and still keep the old file state of the file in the snapshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated HDFS-7056: --- Attachment: HDFS-3107-HDFS-7056-combined.patch Attaching combined patch. Snapshot support for truncate - Key: HDFS-7056 URL: https://issues.apache.org/jira/browse/HDFS-7056 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx Implementation of truncate in HDFS-3107 does not allow truncating files which are in a snapshot. It is desirable to be able to truncate and still keep the old file state of the file in the snapshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252156#comment-14252156 ] Hadoop QA commented on HDFS-7056: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688105/HDFS-3107-HDFS-7056-combined.patch against trunk revision ef1fc51. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9077//console This message is automatically generated. Snapshot support for truncate - Key: HDFS-7056 URL: https://issues.apache.org/jira/browse/HDFS-7056 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx Implementation of truncate in HDFS-3107 does not allow truncating files which are in a snapshot. It is desirable to be able to truncate and still keep the old file state of the file in the snapshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
[ https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252184#comment-14252184 ] Hadoop QA commented on HDFS-7548: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688062/HDFS-7548.patch against trunk revision 389f881. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9075//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9075//console This message is automatically generated. 
Corrupt block reporting delayed until datablock scanner thread detects it - Key: HDFS-7548 URL: https://issues.apache.org/jira/browse/HDFS-7548 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: HDFS-7548.patch When there is one datanode holding the block and that block happened to be corrupt, namenode would keep on trying to replicate the block repeatedly but it would report the block as corrupt only when the data block scanner thread of the datanode picks up this bad block. Requesting improvement in namenode reporting so that the corrupt replica would be reported when there is only 1 replica and the replication of that replica keeps on failing with the checksum error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs
[ https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252209#comment-14252209 ] Charles Lamb commented on HDFS-7543: Sorry I'm late to the party. Here are a few comments: FSDirMkdirOp.java in #mkdirs, you removed the final String srcArg = src. This should be left in. Many IDEs will whine about making assignments to formal args and that's why it was put in in the first place. FSDirRenameOp.java #renameToInt, dstIIP (and resultingStat) could benefit from final's. FSDirXAttrOp.java I'm not sure why you've moved the call to getINodesInPath4Write and checkXAttrChangeAccess inside the writeLock. FSDirStatAndListing.java The javadoc for the @param src needs to be changed to reflect that it's an INodesInPath, not a String. Nit: it might be better to rename the INodesInPath arg from src to iip. #getFileInfo4DotSnapshot is now unused since you in-lined it into #getFileInfo. Avoid path resolution when getting FileStatus for audit logs Key: HDFS-7543 URL: https://issues.apache.org/jira/browse/HDFS-7543 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7543.000.patch The current API of {{getAuditFileInfo()}} forces parsing the paths again when generating the {{HdfsFileStatus}} for audit logs. This jira proposes to avoid the repeated parsing by passing the {{INodesInPath}} object instead of the path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
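The {{final String srcArg = src}} idiom the first review comment refers to can be sketched in isolation. This is a hypothetical illustration (the method and names are stand-ins, not the actual FSDirMkdirOp code): capturing the formal argument in a final local preserves the caller-supplied value for later use (e.g. audit logging) even after the parameter is reassigned, and avoids IDE warnings about assignments to parameters.

```java
public class FinalArgSketch {
    // Illustrative only: 'mkdirs' stands in for a method whose parameter
    // gets rewritten during normalization but whose original value is
    // still needed afterwards.
    static String mkdirs(String src) {
        final String srcArg = src;           // keep the caller-supplied path
        src = src.replaceAll("/+$", "");     // later code rewrites 'src'
        return "mkdirs(" + srcArg + ") resolved to " + src;
    }

    public static void main(String[] args) {
        System.out.println(mkdirs("/foo/bar///"));
    }
}
```

Without the final local, the original path would be lost after the first reassignment, which is why the reviewer asks for it to be kept.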
[jira] [Commented] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
[ https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252344#comment-14252344 ] Rushabh S Shah commented on HDFS-7548: -- The following tests are passing fine on my local setup on both branch-2 and trunk: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA There is already a jira filed for org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration test failure : HDFS-7547 Corrupt block reporting delayed until datablock scanner thread detects it - Key: HDFS-7548 URL: https://issues.apache.org/jira/browse/HDFS-7548 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: HDFS-7548.patch When there is one datanode holding the block and that block happened to be corrupt, namenode would keep on trying to replicate the block repeatedly but it would only report the block as corrupt only when the data block scanner thread of the datanode picks up this bad block. Requesting improvement in namenode reporting so that corrupt replica would be reported when there is only 1 replica and the replication of that replica keeps on failing with the checksum error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252354#comment-14252354 ] Konstantin Shvachko commented on HDFS-3107: --- Looks like HDFS-7506 broke this again. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation) which is a reverse operation of append, which makes upper layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume
[ https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7443: --- Attachment: HDFS-7443.001.patch This patch changes the upgrade path from non-blockid-based-layout to blockid-based layout so that it uses the jdk7 {{Files#createLink}} function, instead of our hand-rolled hardlink code. This avoids the dilemma of detecting EEXIST on all the various platforms that we hacked in support for in {{HardLink.java}}, such as Linux shell-based (no libhadoop.so case), cygwin, Windows native, and Linux JNI-based. It might be possible to distinguish regular errors from EEXIST on all those platforms, but writing all that code would be a very big job. I did not remove or alter any other code in {{HardLink.java}} in this patch. I think clearly we should think about refactoring that code to use jdk7 later, but that is a bigger change that is not as critical as this fix. We also can't get rid of {{HardLink.java}} completely because we are unfortunately depending on reading the hard link count of files in a few places-- something jdk7 does not support. Another weird thing about the {{HardLink}} class is that all it actually contains is statistics information-- every important method is {{static}}. So that's why we continue to use a {{HardLink}} instance in the upgrade code. I think in the future, we should simply use the {{HardLink#Statistics}} class directly, since the outer class provides no value (it has only static methods). Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume -- Key: HDFS-7443 URL: https://issues.apache.org/jira/browse/HDFS-7443 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Kihwal Lee Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HDFS-7443.001.patch When we did an upgrade from 2.5 to 2.6 in a medium size cluster, about 4% of datanodes were not coming up. 
They tried data file layout upgrade for BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed. All failures were caused by {{NativeIO.link()}} throwing IOException saying {{EEXIST}}. The data nodes didn't die right away, but the upgrade was soon retried when the block pool initialization was retried whenever {{BPServiceActor}} was registering with the namenode. After many retries, datanodes terminated. This would leave {{previous.tmp}} and {{current}} with no {{VERSION}} file in the block pool slice storage directory. Although {{previous.tmp}} contained the old {{VERSION}} file, the content was in the new layout and the subdirs were all newly created ones. This shouldn't have happened because the upgrade-recovery logic in {{Storage}} removes {{current}} and renames {{previous.tmp}} to {{current}} before retrying. All successfully upgraded volumes had old state preserved in their {{previous}} directory. In summary there were two observed issues. - Upgrade failure with {{link()}} failing with {{EEXIST}} - {{previous.tmp}} contained not the content of original {{current}}, but half-upgraded one. We did not see this in smaller scale test clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
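The portability point in the patch description above can be illustrated with a minimal sketch (a hypothetical helper, not the actual HDFS-7443 patch code): jdk7's {{Files#createLink}} signals an existing destination with a typed {{FileAlreadyExistsException}} on every platform, which is exactly what the hand-rolled {{HardLink}} code could not detect uniformly across Linux shell, cygwin, Windows native, and JNI paths.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CreateLinkSketch {
    // Hypothetical helper: create a hard link, reporting "already exists"
    // as a boolean instead of a platform-specific EEXIST errno string.
    static boolean tryLink(Path link, Path existing) throws IOException {
        try {
            Files.createLink(link, existing); // jdk7 NIO hard link
            return true;                      // link created
        } catch (FileAlreadyExistsException e) {
            return false;                     // destination already present
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("linkdemo");
        Path blk = Files.createFile(dir.resolve("blk_1001"));
        Path dst = dir.resolve("blk_1001.lnk");
        System.out.println(tryLink(dst, blk)); // first attempt: link created
        System.out.println(tryLink(dst, blk)); // second attempt: typed EEXIST
    }
}
```

A caller handling duplicate block files during layout upgrade can branch on the boolean instead of parsing IOException messages per platform.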
[jira] [Comment Edited] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume
[ https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252373#comment-14252373 ] Colin Patrick McCabe edited comment on HDFS-7443 at 12/18/14 10:02 PM: --- Oh yeah, and this patch changes {{hadoop-24-datanode-dir.tgz}} to have a duplicate block (present in multiple subdirectories in a volume), so that we are exercising the collision-handling pathway in {{TestDatanodeLayoutUpgrade}}. was (Author: cmccabe): Oh yeah, and this patch changes {{hadoop-24-datanode-dir.tgz}} to have a duplicate block (present in multiple subdirectories in a volume), so that we are exercising the collision-handling pathway. Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume -- Key: HDFS-7443 URL: https://issues.apache.org/jira/browse/HDFS-7443 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Kihwal Lee Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HDFS-7443.001.patch When we did an upgrade from 2.5 to 2.6 in a medium size cluster, about 4% of datanodes were not coming up. They tried data file layout upgrade for BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed. All failures were caused by {{NativeIO.link()}} throwing IOException saying {{EEXIST}}. The data nodes didn't die right away, but the upgrade was soon retried when the block pool initialization was retried whenever {{BPServiceActor}} was registering with the namenode. After many retries, datanodes terminated. This would leave {{previous.tmp}} and {{current}} with no {{VERSION}} file in the block pool slice storage directory. Although {{previous.tmp}} contained the old {{VERSION}} file, the content was in the new layout and the subdirs were all newly created ones. This shouldn't have happened because the upgrade-recovery logic in {{Storage}} removes {{current}} and renames {{previous.tmp}} to {{current}} before retrying. 
All successfully upgraded volumes had old state preserved in their {{previous}} directory. In summary there were two observed issues. - Upgrade failure with {{link()}} failing with {{EEXIST}} - {{previous.tmp}} contained not the content of original {{current}}, but half-upgraded one. We did not see this in smaller scale test clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume
[ https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252373#comment-14252373 ] Colin Patrick McCabe commented on HDFS-7443: Oh yeah, and this patch changes {{hadoop-24-datanode-dir.tgz}} to have a duplicate block (present in multiple subdirectories in a volume), so that we are exercising the collision-handling pathway. Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume -- Key: HDFS-7443 URL: https://issues.apache.org/jira/browse/HDFS-7443 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Kihwal Lee Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HDFS-7443.001.patch When we did an upgrade from 2.5 to 2.6 in a medium size cluster, about 4% of datanodes were not coming up. They tried data file layout upgrade for BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed. All failures were caused by {{NativeIO.link()}} throwing IOException saying {{EEXIST}}. The data nodes didn't die right away, but the upgrade was soon retried when the block pool initialization was retried whenever {{BPServiceActor}} was registering with the namenode. After many retries, datanodes terminated. This would leave {{previous.tmp}} and {{current}} with no {{VERSION}} file in the block pool slice storage directory. Although {{previous.tmp}} contained the old {{VERSION}} file, the content was in the new layout and the subdirs were all newly created ones. This shouldn't have happened because the upgrade-recovery logic in {{Storage}} removes {{current}} and renames {{previous.tmp}} to {{current}} before retrying. All successfully upgraded volumes had old state preserved in their {{previous}} directory. In summary there were two observed issues. - Upgrade failure with {{link()}} failing with {{EEXIST}} - {{previous.tmp}} contained not the content of original {{current}}, but half-upgraded one. 
We did not see this in smaller scale test clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume
[ https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7443: --- Target Version/s: 2.7.0 (was: 2.6.1) Status: Patch Available (was: Open) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume -- Key: HDFS-7443 URL: https://issues.apache.org/jira/browse/HDFS-7443 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Kihwal Lee Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HDFS-7443.001.patch When we did an upgrade from 2.5 to 2.6 in a medium size cluster, about 4% of datanodes were not coming up. They tried data file layout upgrade for BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed. All failures were caused by {{NativeIO.link()}} throwing IOException saying {{EEXIST}}. The data nodes didn't die right away, but the upgrade was soon retried when the block pool initialization was retried whenever {{BPServiceActor}} was registering with the namenode. After many retries, datanodes terminated. This would leave {{previous.tmp}} and {{current}} with no {{VERSION}} file in the block pool slice storage directory. Although {{previous.tmp}} contained the old {{VERSION}} file, the content was in the new layout and the subdirs were all newly created ones. This shouldn't have happened because the upgrade-recovery logic in {{Storage}} removes {{current}} and renames {{previous.tmp}} to {{current}} before retrying. All successfully upgraded volumes had old state preserved in their {{previous}} directory. In summary there were two observed issues. - Upgrade failure with {{link()}} failing with {{EEXIST}} - {{previous.tmp}} contained not the content of original {{current}}, but half-upgraded one. We did not see this in smaller scale test clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7537) dfs.namenode.replication.min 1 missing replicas NN restart is confusing
[ https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7537: --- Attachment: dfs-min-2.png dfs.namenode.replication.min 1 missing replicas NN restart is confusing --- Key: HDFS-7537 URL: https://issues.apache.org/jira/browse/HDFS-7537 Project: Hadoop HDFS Issue Type: Improvement Reporter: Allen Wittenauer Attachments: dfs-min-2.png If minimum replication is set to 2 or higher and some of those replicas are missing and the namenode restarts, it isn't always obvious that the missing replicas are the reason why the namenode isn't leaving safemode. We should improve the output of fsck and the web UI to make it obvious that the missing blocks are from unmet replicas vs. completely/totally missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7530) Allow renaming encryption zone roots
[ https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-7530: -- Summary: Allow renaming encryption zone roots (was: Allow renaming an Encryption Zone root) Allow renaming encryption zone roots Key: HDFS-7530 URL: https://issues.apache.org/jira/browse/HDFS-7530 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch, HDFS-7530.003.patch It should be possible to do hdfs dfs -mv /ezroot /newnameforezroot -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7537) dfs.namenode.replication.min 1 missing replicas NN restart is confusing
[ https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252377#comment-14252377 ] Allen Wittenauer commented on HDFS-7537: Attached two screenshots that shows the confusion. dfs.namenode.replication.min 1 missing replicas NN restart is confusing --- Key: HDFS-7537 URL: https://issues.apache.org/jira/browse/HDFS-7537 Project: Hadoop HDFS Issue Type: Improvement Reporter: Allen Wittenauer Attachments: dfs-min-2-fsck.png, dfs-min-2.png If minimum replication is set to 2 or higher and some of those replicas are missing and the namenode restarts, it isn't always obvious that the missing replicas are the reason why the namenode isn't leaving safemode. We should improve the output of fsck and the web UI to make it obvious that the missing blocks are from unmet replicas vs. completely/totally missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7530) Allow renaming of encryption zone roots
[ https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-7530: -- Summary: Allow renaming of encryption zone roots (was: Allow renaming encryption zone roots) Allow renaming of encryption zone roots --- Key: HDFS-7530 URL: https://issues.apache.org/jira/browse/HDFS-7530 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch, HDFS-7530.003.patch It should be possible to do hdfs dfs -mv /ezroot /newnameforezroot -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7537) dfs.namenode.replication.min 1 missing replicas NN restart is confusing
[ https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7537: --- Attachment: dfs-min-2-fsck.png dfs.namenode.replication.min 1 missing replicas NN restart is confusing --- Key: HDFS-7537 URL: https://issues.apache.org/jira/browse/HDFS-7537 Project: Hadoop HDFS Issue Type: Improvement Reporter: Allen Wittenauer Attachments: dfs-min-2-fsck.png, dfs-min-2.png If minimum replication is set to 2 or higher and some of those replicas are missing and the namenode restarts, it isn't always obvious that the missing replicas are the reason why the namenode isn't leaving safemode. We should improve the output of fsck and the web UI to make it obvious that the missing blocks are from unmet replicas vs. completely/totally missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7537) dfs.namenode.replication.min 1 missing replicas NN restart is confusing
[ https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252383#comment-14252383 ] Allen Wittenauer commented on HDFS-7537: Mock-up of an fsck that alerts when min rep hasn't actually been met:
{code}
Status: HEALTHY
 Total size: 236 B
 Total dirs: 1
 Total files: 1
 Total symlinks: 0
 Total blocks (validated): 1 (avg. block size 236 B)
 UNDER MIN REPL'D BLOCKS: 1 (100.0 %)
 Minimally replicated blocks: 0 (0.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 1 (100.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 2 (66.64 %)
 Number of data-nodes: 1
 Number of racks: 1
{code}
With all datanodes down (and therefore triggering corrupt/missing blocks):
{code}
Status: CORRUPT
 Total size: 236 B
 Total dirs: 1
 Total files: 1
 Total symlinks: 0
 Total blocks (validated): 1 (avg. block size 236 B)
 UNDER MIN REPL'D BLOCKS: 1 (100.0 %)
 CORRUPT FILES: 1
 MISSING BLOCKS: 1
 MISSING SIZE: 236 B
 CORRUPT BLOCKS: 1
 Minimally replicated blocks: 0 (0.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 0.0
 Corrupt blocks: 1
 Missing replicas: 0
 Number of data-nodes: 0
 Number of racks: 0
FSCK ended at Thu Dec 18 14:08:25 PST 2014 in 13 milliseconds
{code}
dfs.namenode.replication.min 1 missing replicas NN restart is confusing --- Key: HDFS-7537 URL: https://issues.apache.org/jira/browse/HDFS-7537 Project: Hadoop HDFS Issue Type: Improvement Reporter: Allen Wittenauer Attachments: dfs-min-2-fsck.png, dfs-min-2.png If minimum replication is set to 2 or higher and some of those replicas are missing and the namenode restarts, it isn't always obvious that the missing replicas are the reason why the namenode isn't leaving safemode. 
We should improve the output of fsck and the web UI to make it obvious that the missing blocks are from unmet replicas vs. completely/totally missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7538) removedDst should be checked against null in the finally block of FSDirRenameOp#unprotectedRenameTo()
[ https://issues.apache.org/jira/browse/HDFS-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252395#comment-14252395 ] Hadoop QA commented on HDFS-7538: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688080/hdfs-7538-001.patch against trunk revision 389f881. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDecommission org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9076//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9076//console This message is automatically generated. 
removedDst should be checked against null in the finally block of FSDirRenameOp#unprotectedRenameTo() - Key: HDFS-7538 URL: https://issues.apache.org/jira/browse/HDFS-7538 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-7538-001.patch
{code}
if (removedDst != null) {
  undoRemoveDst = false;
  ...
if (undoRemoveDst) {
  // Rename failed - restore dst
  if (dstParent.isDirectory() && dstParent.asDirectory().isWithSnapshot()) {
    dstParent.asDirectory().undoRename4DstParent(removedDst,
{code}
If the first if check doesn't pass, removedDst would be null and undoRemoveDst may be true. This combination would lead to a NullPointerException in the finally block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
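The hazard described in HDFS-7538 can be reduced to a small self-contained sketch (a simplified stand-in for the FSDirRenameOp logic, not the actual patch): the undo flag is cleared only inside the null-guarded branch, so the finally block can observe the flag set while the reference is still null, and the fix is to null-check again in the finally block.

```java
public class UndoRenameSketch {
    // Simplified model: 'removedDst' may be null when nothing was removed,
    // yet 'undoRemoveDst' can still be true on the failure path.
    static String restore(Object removedDst, boolean renameFailed) {
        boolean undoRemoveDst = renameFailed;
        try {
            if (removedDst != null) {
                undoRemoveDst = false; // normal path: nothing left to undo
            }
            return "done";
        } finally {
            // The fix: guard the undo path with a null check as well;
            // an unguarded 'if (undoRemoveDst)' would NPE below.
            if (undoRemoveDst && removedDst != null) {
                removedDst.toString(); // stand-in for undoRename4DstParent(...)
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(restore(null, true)); // no NPE with the guard
    }
}
```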
[jira] [Updated] (HDFS-7530) Allow renaming of encryption zone roots
[ https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-7530: -- Resolution: Fixed Fix Version/s: 2.7.0 Status: Resolved (was: Patch Available) Ran tests locally. TestDecom still failed, but it also fails for me without this patch, so it is unrelated. Committed to trunk and branch-2, thanks for the contribution Charles. Allow renaming of encryption zone roots --- Key: HDFS-7530 URL: https://issues.apache.org/jira/browse/HDFS-7530 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch, HDFS-7530.003.patch It should be possible to do hdfs dfs -mv /ezroot /newnameforezroot -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7530) Allow renaming of encryption zone roots
[ https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252407#comment-14252407 ] Hudson commented on HDFS-7530: -- FAILURE: Integrated in Hadoop-trunk-Commit #6753 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6753/]) HDFS-7530. Allow renaming of encryption zone roots. Contributed by Charles Lamb. (wang: rev b0b9084433d5e80131429e6e76858b099deb2dda) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testCryptoConf.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionZoneManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Allow renaming of encryption zone roots --- Key: HDFS-7530 URL: https://issues.apache.org/jira/browse/HDFS-7530 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch, HDFS-7530.003.patch It should be possible to do hdfs dfs -mv /ezroot /newnameforezroot -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7523) Setting a socket receive buffer size in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-7523: Attachment: HDFS-7523-001.txt Retry. The failures look unrelated. Just making sure. Change makes sense to me. I see that the 'hint' DEFAULT_DATA_SOCKET_SIZE is passed elsewhere in the code base as receive size in datanode xceiver and domain peer service. It is also the send size in DFSOutputStream. It not being set here in DFSClient looks like an oversight. Nice one [~xieliang007] I'll commit in next day or so unless objection. Setting a socket receive buffer size in DFSClient - Key: HDFS-7523 URL: https://issues.apache.org/jira/browse/HDFS-7523 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Affects Versions: 2.6.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-7523-001.txt, HDFS-7523-001.txt It would be nice to have a socket receive buffer size set while creating the socket from the client (HBase) view; in old versions it would be in DFSInputStream, in trunk it seems it should be at:
{code}
@Override // RemotePeerFactory
public Peer newConnectedPeer(InetSocketAddress addr,
    Token<BlockTokenIdentifier> blockToken, DatanodeID datanodeId)
    throws IOException {
  Peer peer = null;
  boolean success = false;
  Socket sock = null;
  try {
    sock = socketFactory.createSocket();
    NetUtils.connect(sock, addr, getRandomLocalInterfaceAddr(),
        dfsClientConf.socketTimeout);
    peer = TcpPeerServer.peerFromSocketAndKey(saslClient, sock, this,
        blockToken, datanodeId);
    peer.setReadTimeout(dfsClientConf.socketTimeout);
{code}
e.g: sock.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE); the default socket buffer size on Linux+JDK7 seems to be 8k if I am not wrong; this value is sometimes too small for HBase 64k block reading in a 10G network (at least, it means more system calls) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
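The suggestion above can be sketched as a minimal standalone example. Note two assumptions: the receive buffer should be set before {{connect()}} so it can influence TCP window negotiation, and the 128k constant here is a stand-in for the {{HdfsConstants.DEFAULT_DATA_SOCKET_SIZE}} hint the comment refers to.

```java
import java.net.Socket;

public class ReceiveBufferSketch {
    // Assumed stand-in for HdfsConstants.DEFAULT_DATA_SOCKET_SIZE.
    static final int DEFAULT_DATA_SOCKET_SIZE = 128 * 1024;

    public static void main(String[] args) throws Exception {
        Socket sock = new Socket();
        // Set the receive buffer before connecting, as the jira proposes.
        sock.setReceiveBufferSize(DEFAULT_DATA_SOCKET_SIZE);
        // The kernel may round or cap the value, so read back what was
        // actually granted rather than assuming the requested size.
        System.out.println(sock.getReceiveBufferSize() > 0);
        sock.close();
    }
}
```

With a small default buffer, reading a 64k HBase block takes more round trips through the kernel; a larger SO_RCVBUF lets the TCP window grow and reduces system calls on a fast network.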
[jira] [Commented] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume
[ https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252437#comment-14252437 ] Arpit Agarwal commented on HDFS-7443: - +1 pending Jenkins. Thanks for ensuring it's covered in testing. Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume -- Key: HDFS-7443 URL: https://issues.apache.org/jira/browse/HDFS-7443 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Kihwal Lee Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HDFS-7443.001.patch When we did an upgrade from 2.5 to 2.6 in a medium size cluster, about 4% of datanodes were not coming up. They tried data file layout upgrade for BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed. All failures were caused by {{NativeIO.link()}} throwing IOException saying {{EEXIST}}. The data nodes didn't die right away, but the upgrade was soon retried when the block pool initialization was retried whenever {{BPServiceActor}} was registering with the namenode. After many retries, datanodes terminated. This would leave {{previous.tmp}} and {{current}} with no {{VERSION}} file in the block pool slice storage directory. Although {{previous.tmp}} contained the old {{VERSION}} file, the content was in the new layout and the subdirs were all newly created ones. This shouldn't have happened because the upgrade-recovery logic in {{Storage}} removes {{current}} and renames {{previous.tmp}} to {{current}} before retrying. All successfully upgraded volumes had old state preserved in their {{previous}} directory. In summary there were two observed issues. - Upgrade failure with {{link()}} failing with {{EEXIST}} - {{previous.tmp}} contained not the content of original {{current}}, but half-upgraded one. We did not see this in smaller scale test clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7484) Simplify the workflow of calculating permission in mkdirs()
[ https://issues.apache.org/jira/browse/HDFS-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7484: Attachment: HDFS-7484.005.patch Update the patch to fix bugs. In general the current patch also contains changes related to {{INodesInPath}} and {{addINode}}. I will separate them out into another jira. Simplify the workflow of calculating permission in mkdirs() --- Key: HDFS-7484 URL: https://issues.apache.org/jira/browse/HDFS-7484 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Jing Zhao Attachments: HDFS-7484.000.patch, HDFS-7484.001.patch, HDFS-7484.002.patch, HDFS-7484.003.patch, HDFS-7484.004.patch, HDFS-7484.005.patch {{FSDirMkdirsOp#mkdirsRecursively()}} currently calculates the permissions based on whether {{inheritPermission}} is true. This jira proposes to simplify the workflow and make it explicit for the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume
[ https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252452#comment-14252452 ] Arpit Agarwal commented on HDFS-7443: - Actually I just had a thought. I assumed that the excess copies would be hard links to the same physical file, perhaps due to a bug in the earlier LDir code. If these are distinct physical files, then should we retain the one with the largest on-disk size? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252451#comment-14252451 ] Konstantin Boudnik commented on HDFS-3107: -- I actually quite like it. I think over the last few iterations the patch was polished enough and the test coverage is quite decent. +1 HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard POSIX operation), the reverse of append. This forces upper-layer applications to use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7549) Add GenericTestUtils#disableLog, GenericTestUtils#setLogLevel
Colin Patrick McCabe created HDFS-7549: -- Summary: Add GenericTestUtils#disableLog, GenericTestUtils#setLogLevel Key: HDFS-7549 URL: https://issues.apache.org/jira/browse/HDFS-7549 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Now that we are using both commons-logging and slf4j, we can no longer rely on just casting the Log object to a {{Log4JLogger}} and calling {{setLevel}} on that. With {{org.slf4j.Logger}} objects, we need to look up the underlying {{Log4JLogger}} using {{LogManager#getLogger}}. This patch adds {{GenericTestUtils#disableLog}} and {{GenericTestUtils#setLogLevel}} functions which hide this complexity from unit tests, just allowing the tests to call {{disableLog}} or {{setLogLevel}}, and have {{GenericTestUtils}} figure out the right thing to do based on the log / logger type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
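The dispatch-on-logger-type idea described above can be sketched with stand-in types. Everything below is illustrative, not the actual GenericTestUtils API: `CommonsLog` and `Slf4jLog` are toy substitutes for `org.apache.commons.logging.Log` and `org.slf4j.Logger`, and the real helper would resolve the underlying log4j logger via `LogManager#getLogger` rather than a map.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch of GenericTestUtils#setLogLevel-style dispatch.
 * The nested interfaces are hypothetical stand-ins for the real
 * commons-logging and slf4j logger types.
 */
public class LogLevelUtil {
  interface CommonsLog { void setLevel(String level); }
  interface Slf4jLog { String getName(); }

  // Stand-in for the log4j logger repository keyed by logger name.
  static final Map<String, String> LEVELS = new HashMap<>();

  /** Dispatch on the runtime type of the logger object. */
  static void setLogLevel(Object log, String level) {
    if (log instanceof CommonsLog) {
      ((CommonsLog) log).setLevel(level);           // direct cast path
    } else if (log instanceof Slf4jLog) {
      // slf4j exposes no setLevel; look the logger up by name instead,
      // as the real code would via log4j's LogManager#getLogger.
      LEVELS.put(((Slf4jLog) log).getName(), level);
    } else {
      throw new IllegalArgumentException("unsupported logger: " + log);
    }
  }
}
```

The point of hiding this behind one helper is that tests never need to know which logging facade a class happens to use.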
[jira] [Updated] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-3107: -- Attachment: HDFS-3107.patch Incorporated latest trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252476#comment-14252476 ] Konstantin Boudnik commented on HDFS-3107: -- The diff between the two seems to be quite small, yet I guess it requires a formal review again. Hence +1 again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume
[ https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252480#comment-14252480 ] Colin Patrick McCabe commented on HDFS-7443: When we observed this, they were not hard links, but separate copies. They were identical (we ran a command-line checksum on them). If possible, I would rather not start trying to pick the best one, because I feel like 3x replication should ensure that we have redundancy in the system, and because the code would get a lot more complex. Because we do the hardlinks in parallel, we would have to somehow accumulate the duplicates and deal with them at the end, once all worker threads had been joined. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
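The "run a checksum on them" verification mentioned in the comment can be sketched as follows. This is illustration only, not the DataNode upgrade code; `BlockFileDedup` and its methods are hypothetical names:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical helper: check whether two candidate block files are
 * byte-identical (as the duplicates observed in HDFS-7443 were) before
 * one copy is dropped.
 */
public class BlockFileDedup {
  /** MD5 digest of a byte buffer. */
  public static byte[] digest(byte[] data) {
    try {
      return MessageDigest.getInstance("MD5").digest(data);
    } catch (NoSuchAlgorithmException e) {
      throw new AssertionError("MD5 is always available", e);
    }
  }

  /** Returns true if the two files have identical contents. */
  public static boolean sameContents(Path a, Path b) throws IOException {
    if (Files.size(a) != Files.size(b)) {
      return false;                       // cheap size check first
    }
    return MessageDigest.isEqual(digest(Files.readAllBytes(a)),
                                 digest(Files.readAllBytes(b)));
  }
}
```

A real fix would still have to decide this per duplicate pair after the parallel hardlink phase, which is the added complexity the comment is arguing against.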
[jira] [Commented] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs
[ https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252478#comment-14252478 ] Haohui Mai commented on HDFS-7543: -- Thanks for the catch. Let's file a follow-up jira to clean it up. Avoid path resolution when getting FileStatus for audit logs Key: HDFS-7543 URL: https://issues.apache.org/jira/browse/HDFS-7543 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7543.000.patch The current API of {{getAuditFileInfo()}} forces parsing the paths again when generating the {{HdfsFileStatus}} for audit logs. This jira proposes to avoid the repeated parsing by passing the {{INodesInPath}} object instead of the path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
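The resolve-once pattern the description proposes can be illustrated with a toy example (not HDFS code; the `INodesInPath` stand-in and method names below are for illustration only): resolve the path String to its node representation once, then pass the resolved object to downstream consumers such as audit logging.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Toy illustration of the HDFS-7543 change: parse the path once,
 * then hand the resolved object (not the raw String) to consumers.
 */
public class ResolveOnce {
  public static final AtomicInteger resolutions = new AtomicInteger();

  /** Stand-in for the real INodesInPath class. */
  static class INodesInPath {
    final String[] components;
    INodesInPath(String[] c) { components = c; }
  }

  static INodesInPath resolve(String path) {
    resolutions.incrementAndGet();       // expensive in the real NameNode
    return new INodesInPath(path.split("/"));
  }

  /** Consumer takes the resolved object, not the String path. */
  static int auditDepth(INodesInPath iip) {
    return iip.components.length;
  }

  public static void main(String[] args) {
    INodesInPath iip = resolve("/user/alice/data");  // parsed exactly once
    auditDepth(iip);
    auditDepth(iip);                                 // no re-parsing
    System.out.println(resolutions.get());
  }
}
```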
[jira] [Updated] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated HDFS-7056: --- Attachment: HDFS-3107-HDFS-7056-combined.patch Attaching newly combined patch. Snapshot support for truncate - Key: HDFS-7056 URL: https://issues.apache.org/jira/browse/HDFS-7056 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx Implementation of truncate in HDFS-3107 does not allow truncating files which are in a snapshot. It is desirable to be able to truncate and still keep the old file state of the file in the snapshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7431) log message for InvalidMagicNumberException may be incorrect
[ https://issues.apache.org/jira/browse/HDFS-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7431: Resolution: Fixed Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1 for the patch. I committed it to trunk and branch-2. Yi, thank you for the contribution. log message for InvalidMagicNumberException may be incorrect Key: HDFS-7431 URL: https://issues.apache.org/jira/browse/HDFS-7431 Project: Hadoop HDFS Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Yi Liu Fix For: 2.7.0 Attachments: HDFS-7431.001.patch, HDFS-7431.002.patch, HDFS-7431.003.patch In secure mode, HDFS now supports DataNodes that don't require root or jsvc if {{dfs.data.transfer.protection}} is configured. In the log message for {{InvalidMagicNumberException}}, we miss one case: when the datanodes run on an unprivileged port, {{dfs.data.transfer.protection}} is configured to {{authentication}}, but {{dfs.encrypt.data.transfer}} is not configured. A SASL handshake is required and a low-version dfs client is used, then {{InvalidMagicNumberException}} is thrown and we write the log: {quote} Failed to read expected encryption handshake from client at Perhaps the client is running an older version of Hadoop which does not support encryption {quote} Recently I ran HDFS built from trunk with security enabled, but the client was version 2.5.1. I then got the above log message, even though I had not actually configured encryption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occasionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252501#comment-14252501 ] Colin Patrick McCabe commented on HDFS-7527: Thanks for looking at this, [~decster] and [~wheat9]. It's a difficult and frustrating area of the code, in my opinion. Unfortunately, I don't think this latest patch is exactly what we need. Last time we proposed adding more DNS lookups in the {{DatanodeManager}}, the Yahoo guys said this was unacceptable from a performance point of view. Caching DNS lookups, so that we didn't have to do them all the time, is a big part of what the {{HostFileManager}} was created to do. [~daryn], [~eli], do you have any ideas here? TestDecommission.testIncludeByRegistrationName fails occassionally in trunk --- Key: HDFS-7527 URL: https://issues.apache.org/jira/browse/HDFS-7527 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang Assignee: Binglin Chang Attachments: HDFS-7527.001.patch, HDFS-7527.002.patch https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ {quote} Error Message test timed out after 36 milliseconds Stacktrace java.lang.Exception: test timed out after 36 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957) 2014-12-15 12:00:19,958 ERROR datanode.DataNode (BPServiceActor.java:run(836)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(127.0.0.1, datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, infoSecurePort=0, ipcPort=43726, storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915) at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121) 2014-12-15 12:00:29,087 FATAL datanode.DataNode (BPServiceActor.java:run(841)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565. Exiting. 
java.io.IOException: DN shut down before block pool connected at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829) at java.lang.Thread.run(Thread.java:745) {quote} Found by tool proposed in HADOOP-11045: {quote} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Hdfs-trunk -n 5 | tee bt.log Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 03:30:01) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 10:32:27) Failed test:
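The DNS-lookup caching that the comment says {{HostFileManager}} was created for can be sketched as a memoized resolver. This is illustrative only: the real class also manages include/exclude lists, and the injected resolver function below is a stand-in for a call such as `InetAddress.getByName`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Sketch (not the real HostFileManager): perform each hostname lookup
 * once and reuse the result, so datanode registration checks do not
 * trigger repeated DNS queries.
 */
public class CachedResolver {
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final Function<String, String> resolver;  // e.g. a DNS lookup

  public CachedResolver(Function<String, String> resolver) {
    this.resolver = resolver;
  }

  /** Resolves host, consulting the cache first. */
  public String resolve(String host) {
    return cache.computeIfAbsent(host, resolver);
  }
}
```

A production version would also need an expiry or refresh policy, since cached addresses go stale when hosts move; that staleness-vs-performance trade-off is exactly the tension the comment describes.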
[jira] [Commented] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252512#comment-14252512 ] Hadoop QA commented on HDFS-7056: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688173/HDFS-3107-HDFS-7056-combined.patch against trunk revision 5df7ecb. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9082//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7431) log message for InvalidMagicNumberException may be incorrect
[ https://issues.apache.org/jira/browse/HDFS-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252513#comment-14252513 ] Hudson commented on HDFS-7431: -- FAILURE: Integrated in Hadoop-trunk-Commit #6754 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6754/]) HDFS-7431. log message for InvalidMagicNumberException may be incorrect. Contributed by Yi Liu. (cnauroth: rev 5df7ecb33ab24de903f0fd98e2a055164874def5) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/TestSaslDataTransfer.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/InvalidMagicNumberException.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7550) Minor followon cleanups from HDFS-7543
Charles Lamb created HDFS-7550: -- Summary: Minor followon cleanups from HDFS-7543 Key: HDFS-7550 URL: https://issues.apache.org/jira/browse/HDFS-7550 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Charles Lamb Priority: Minor The commit of HDFS-7543 crossed paths with these comments:
- FSDirMkdirOp.java: in #mkdirs, you removed the final String srcArg = src. This should be left in. Many IDEs will whine about making assignments to formal args, and that's why it was put in in the first place.
- FSDirRenameOp.java: in #renameToInt, dstIIP (and resultingStat) could benefit from finals.
- FSDirXAttrOp.java: I'm not sure why you've moved the call to getINodesInPath4Write and checkXAttrChangeAccess inside the writeLock.
- FSDirStatAndListing.java: the javadoc for the @param src needs to be changed to reflect that it's an INodesInPath, not a String. Nit: it might be better to rename the INodesInPath arg from src to iip. #getFileInfo4DotSnapshot is now unused since you in-lined it into #getFileInfo.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs
[ https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252532#comment-14252532 ] Charles Lamb commented on HDFS-7543: Thanks @wheat9. HDFS-7550. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7018) Implement C interface for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252541#comment-14252541 ] Colin Patrick McCabe commented on HDFS-7018: Please don't make another copy of {{hdfs.h}} in the source tree. This will lead to the various copies getting out of sync over time, which would be very bad. Instead, just reference the existing copy via a relative path. You can add your {{hdfsGetLastError}} function to this file, and just have a dummy implementation that returns unknown error in all cases for {{libwebhdfs}} and {{libhdfs}}. We can improve this in a follow-on JIRA. {code} 39 #ifdef __cplusplus 40 extern "C" { 41 #endif {code} While this is needed in {{hdfs.h}}, it is not needed in your {{Hdfs.cc}} code. The C\+\+ linker is smart enough to figure out that the prototypes it is seeing correspond to the prototypes in the {{hdfs.h}} file you included. {code} 45 static THREAD_LOCAL const char *ErrorMessage = NULL; 46 static THREAD_LOCAL std::string *ErrorMessageBuffer = NULL; 47 static THREAD_LOCAL hdfs::internal::once_flag once; 48 49 static void CreateMessageBuffer() { 50 ErrorMessageBuffer = new std::string; 51 } {code} I don't think we need all this. Making the thread-local buffer a pointer to a {{std::string}} means that we have to check {{once_flag}} before we access it, which is inefficient. It also means that if the thread exits, this memory will be leaked (unless you set up a POSIX thread destructor, which is complicated and platform-specific). Instead, let's just have a char\[128\] buffer for each thread. As an added bonus, because this utilizes pre-allocation, it handles the case where you can't allocate memory for the error string itself, which you have said in the past that you care about. {code} 158 private: 159 bool input; 160 void *stream; 161 }; {code} Please don't use {{void*}} here. It is not typesafe.
You can clearly see that FS objects and file objects have a concrete type, spelled out in {{hdfs.h}}: {code} struct hdfs_internal; typedef struct hdfs_internal* hdfsFS; struct hdfsFile_internal; typedef struct hdfsFile_internal* hdfsFile; {code} All of these functions need to have a {{catch (...)}} which sets the error message to unknown and returns {{EINTERNAL}}. The reason is that if you attempt to throw a C\+\+ exception through a C API, the program will abort (technically, {{std::terminate}} will be called). I realize you probably think you have caught all possible exceptions, but since this is C\+\+, we can never really be sure without the {{catch (...)}} Implement C interface for libhdfs3 -- Key: HDFS-7018 URL: https://issues.apache.org/jira/browse/HDFS-7018 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7018-pnative.002.patch, HDFS-7018.patch Implement C interface for libhdfs3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7018) Implement C interface for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252541#comment-14252541 ] Colin Patrick McCabe edited comment on HDFS-7018 at 12/18/14 11:31 PM: --- Please don't make another copy of {{hdfs.h}} in the source tree. This will lead to the various copies getting out of sync over time, which would be very bad. Instead, just reference the existing copy via a relative path. You can add your {{hdfsGetLastError}} function to this file, and just have a dummy implementation that returns unknown error in all cases for {{libwebhdfs}} and {{libhdfs}}. We can improve this in a follow-on JIRA. {code} 39 #ifdef __cplusplus 40 extern "C" { 41 #endif {code} While this is needed in {{hdfs.h}}, it is not needed in your {{Hdfs.cc}} code. The C\+\+ linker is smart enough to figure out that the prototypes it is seeing correspond to the prototypes in the {{hdfs.h}} file you included. {code} 45 static THREAD_LOCAL const char *ErrorMessage = NULL; 46 static THREAD_LOCAL std::string *ErrorMessageBuffer = NULL; 47 static THREAD_LOCAL hdfs::internal::once_flag once; 48 49 static void CreateMessageBuffer() { 50 ErrorMessageBuffer = new std::string; 51 } {code} I don't think we need all this. Making the thread-local buffer a pointer to a {{std::string}} means that we have to check {{once_flag}} before we access it, which is inefficient. It also means that if the thread exits, this memory will be leaked (unless you set up a POSIX thread destructor, which is complicated and platform-specific). Instead, let's just have a char\[128\] buffer for each thread. As an added bonus, because this utilizes pre-allocation, it handles the case where you can't allocate memory for the error string itself, which you have said in the past that you care about. {code} 158 private: 159 bool input; 160 void *stream; 161 }; {code} Please don't use {{void*}} here. It is not typesafe.
You can clearly see that FS objects and file objects have a concrete type, spelled out in {{hdfs.h}}: {code} struct hdfs_internal; typedef struct hdfs_internal* hdfsFS; struct hdfsFile_internal; typedef struct hdfsFile_internal* hdfsFile; {code} All of these functions need to have a {{catch (...)}} which sets the error message to unknown and returns {{EINTERNAL}}. The reason is that if you attempt to throw a C\+\+ exception through a C API, the program will abort (technically, {{std::terminate}} will be called). I realize you probably think you have caught all possible exceptions, but since this is C\+\+, we can never really be sure without the {{catch (...)}} P.S. thanks for working on this! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
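The two review suggestions — a fixed, pre-allocated per-thread buffer and a catch-all at every C entry point — can be sketched as below. This is illustrative, not libhdfs3 code: {{exampleApiCall}} is a hypothetical entry point, and the EINTERNAL value is assumed to match the generic error code defined in libhdfs's hdfs.h.

```cpp
#include <cstring>
#include <stdexcept>

#define EINTERNAL 255  // assumed: generic error code from hdfs.h

// Fixed-size, pre-allocated per-thread buffer: no once_flag check
// before use, and nothing leaked when the thread exits.
static thread_local char errorMessage[128] = "";

static void setError(const char *msg) {
  std::strncpy(errorMessage, msg, sizeof(errorMessage) - 1);
  errorMessage[sizeof(errorMessage) - 1] = '\0';
}

extern "C" const char *hdfsGetLastError() {
  return errorMessage;
}

// Hypothetical C entry point: the catch (...) guarantees that no C++
// exception can propagate through the C ABI (which would call
// std::terminate).
extern "C" int exampleApiCall(bool shouldThrow) {
  try {
    if (shouldThrow) {
      throw std::runtime_error("boom");  // stand-in for internal logic
    }
    return 0;
  } catch (const std::exception &e) {
    setError(e.what());                  // preserve a useful message
    return EINTERNAL;
  } catch (...) {                        // nothing may escape past here
    setError("unknown error");
    return EINTERNAL;
  }
}
```

The pre-allocation also means the error path never needs to allocate, so reporting still works when the failure being reported is itself an out-of-memory condition.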
[jira] [Commented] (HDFS-7182) JMX metrics aren't accessible when NN is busy
[ https://issues.apache.org/jira/browse/HDFS-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252553#comment-14252553 ] Ming Ma commented on HDFS-7182: --- Does anyone else have suggestions on this? The patch has been running fine in one of our production clusters. JMX metrics aren't accessible when NN is busy - Key: HDFS-7182 URL: https://issues.apache.org/jira/browse/HDFS-7182 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7182.patch HDFS-5693 has addressed all NN JMX metrics in hadoop 2.0.5. Since then, a couple of new metrics have been added. It turns out {{RollingUpgradeStatus}} requires the FSNamesystem read lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7182) JMX metrics aren't accessible when NN is busy
[ https://issues.apache.org/jira/browse/HDFS-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252561#comment-14252561 ] Hadoop QA commented on HDFS-7182: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672609/HDFS-7182.patch against trunk revision 0402bad. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9083//console This message is automatically generated. JMX metrics aren't accessible when NN is busy - Key: HDFS-7182 URL: https://issues.apache.org/jira/browse/HDFS-7182 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7182.patch HDFS-5693 has addressed all NN JMX metrics in hadoop 2.0.5. Since then, a couple of new metrics have been added. It turns out {{RollingUpgradeStatus}} requires the FSNamesystem read lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume
[ https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252565#comment-14252565 ] Arpit Agarwal commented on HDFS-7443: - bq. because the code would get a lot more complex. Because we do the hardlinks in parallel, we would have to somehow accumulate the duplicates and deal with them at the end, once all worker threads had been joined. We wouldn't need all that. A length check on src and dst when we hit an exception should suffice, right? Depending on the result, either discard src or overwrite dst. Anyway I think your patch is fine to go as it is. Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume -- Key: HDFS-7443 URL: https://issues.apache.org/jira/browse/HDFS-7443 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Kihwal Lee Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HDFS-7443.001.patch When we did an upgrade from 2.5 to 2.6 in a medium size cluster, about 4% of datanodes were not coming up. They tried the data file layout upgrade to BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed. All failures were caused by {{NativeIO.link()}} throwing IOException saying {{EEXIST}}. The datanodes didn't die right away, but the upgrade was soon retried when the block pool initialization was retried whenever {{BPServiceActor}} was registering with the namenode. After many retries, datanodes terminated. This would leave {{previous.tmp}} and {{current}} with no {{VERSION}} file in the block pool slice storage directory. Although {{previous.tmp}} contained the old {{VERSION}} file, the content was in the new layout and the subdirs were all newly created ones. This shouldn't have happened because the upgrade-recovery logic in {{Storage}} removes {{current}} and renames {{previous.tmp}} to {{current}} before retrying. All successfully upgraded volumes had old state preserved in their {{previous}} directory. 
In summary, there were two observed issues. - Upgrade failure with {{link()}} failing with {{EEXIST}} - {{previous.tmp}} contained not the content of the original {{current}}, but a half-upgraded one. We did not see this in smaller scale test clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
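The length-check idea suggested above could look something like the following POSIX sketch. The real upgrade code is Java inside the DataNode ({{NativeIO.link()}}); the function name here is hypothetical, and "keep the longer file" is only the heuristic implied by the comment, on the assumption that a shorter duplicate is a truncated copy to discard.

```cpp
#include <cerrno>
#include <cstdio>
#include <sys/stat.h>
#include <unistd.h>

// Hard-link src into the new layout at dst. If dst already exists (the
// EEXIST duplicate-block case), compare lengths: keep dst if it is at
// least as long as src, otherwise replace it with a link to src.
// Returns 0 on success, -1 on a real error.
int linkPreferLonger(const char *src, const char *dst) {
    if (link(src, dst) == 0) {
        return 0;  // normal case: no duplicate present
    }
    if (errno != EEXIST) {
        return -1;  // genuine failure, not a duplicate
    }

    struct stat srcSt, dstSt;
    if (stat(src, &srcSt) != 0 || stat(dst, &dstSt) != 0) {
        return -1;
    }

    if (srcSt.st_size <= dstSt.st_size) {
        return 0;  // existing dst is at least as complete; discard src
    }

    // src is longer: overwrite dst with a link to src.
    if (unlink(dst) != 0) {
        return -1;
    }
    return link(src, dst) == 0 ? 0 : -1;
}
```

Because the check happens only on the {{EEXIST}} path, the parallel hardlink workers would not need to accumulate duplicates for a post-join pass; each worker resolves its own conflict locally.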