[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906776#comment-13906776 ] Hadoop QA commented on HDFS-5939: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/1262/HDFS-5939.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6190//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6190//console This message is automatically generated.
WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-5939.001.patch, HDFS-5939.002.patch When trying to access HDFS via WebHDFS while the datanodes are dead, the user sees an exception like the one below without any clue that it is caused by dead datanodes: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n must be positive"}} The error report needs to be fixed to give the user a hint about the dead datanodes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
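For context on why the message is so misleading: "n must be positive" is the standard message of {{java.util.Random#nextInt(int)}} on older JDKs when its bound is zero, which is what happens when a random datanode is picked out of zero live nodes. A minimal sketch of that failure mode ({{pickRandomIndex}} is a hypothetical stand-in, not actual namenode code; newer JDKs word the message as "bound must be positive"):

```java
import java.util.Random;

public class NoDatanodeDemo {
    // Hypothetical stand-in for picking a random datanode out of n live nodes.
    static int pickRandomIndex(int liveDatanodes) {
        // Throws IllegalArgumentException("... must be positive") when
        // liveDatanodes == 0, i.e. no DNs in the cluster.
        return new Random().nextInt(liveDatanodes);
    }

    public static void main(String[] args) {
        try {
            pickRandomIndex(0); // no live datanodes
        } catch (IllegalArgumentException e) {
            // This bare message is all the webhdfs client ends up seeing.
            System.out.println(e.getMessage());
        }
    }
}
```

The fix proposed in the issue is to surface a message about the dead datanodes instead of letting this low-level exception escape.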
[jira] [Updated] (HDFS-5970) callers of NetworkTopology's chooseRandom method to expect null return value
[ https://issues.apache.org/jira/browse/HDFS-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated HDFS-5970: - Priority: Minor (was: Major) callers of NetworkTopology's chooseRandom method to expect null return value Key: HDFS-5970 URL: https://issues.apache.org/jira/browse/HDFS-5970 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Yongjun Zhang Priority: Minor Class NetworkTopology's method public Node chooseRandom(String scope) calls private Node chooseRandom(String scope, String excludedScope), which may return a null value. Callers of this method, such as BlockPlacementPolicyDefault, need to be aware of that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
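A minimal sketch of the defensive pattern such callers would need, using hypothetical stand-in types (the real NetworkTopology and BlockPlacementPolicyDefault APIs are considerably richer):

```java
// Sketch of the null-aware caller pattern; chooseTarget and the stand-in
// types below are illustrative, not Hadoop code.
public class ChooseRandomCaller {
    interface Topology {                 // minimal stand-in for NetworkTopology
        Node chooseRandom(String scope);
    }

    static class Node {
        final String name;
        Node(String name) { this.name = name; }
    }

    static Node chooseTarget(Topology topology, String scope) {
        Node candidate = topology.chooseRandom(scope);
        if (candidate == null) {
            // Every node in scope was excluded or the scope is empty:
            // bail out instead of dereferencing a null node later.
            return null;
        }
        return candidate;
    }
}
```

The point of the issue is exactly this: without the null check, a caller that immediately dereferences the returned node risks an NPE in the namenode.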
[jira] [Commented] (HDFS-5982) Need to update snapshot manager when applying editlog for deleting a snapshottable directory
[ https://issues.apache.org/jira/browse/HDFS-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906806#comment-13906806 ] Hadoop QA commented on HDFS-5982: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630004/HDFS-5982.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestEditLogRace {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6191//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6191//console This message is automatically generated.
Need to update snapshot manager when applying editlog for deleting a snapshottable directory Key: HDFS-5982 URL: https://issues.apache.org/jira/browse/HDFS-5982 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Tassapol Athiapinya Assignee: Jing Zhao Priority: Critical Attachments: HDFS-5982.000.patch Currently, after deleting a snapshottable directory that no longer has snapshots, we also remove the directory from the snapshottable directory list in SnapshotManager. This works fine when handling a delete request from a user. However, when we apply the OP_DELETE editlog, FSDirectory#unprotectedDelete(String, long) is called, which does not include the snapshot manager update. This may leave a non-existent inode id in the snapshottable directory list, and can even lead to FSImage corruption. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
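The inconsistency described above can be modeled in a few lines; the class below is a toy illustration of the two delete paths, with hypothetical names standing in for FSDirectory and SnapshotManager, not Hadoop code:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the bug: two delete paths over the same namespace, only one
// of which maintains the snapshottable-directory list.
public class SnapshotListDemo {
    final Set<String> namespace = new HashSet<>();
    final Set<String> snapshottableDirs = new HashSet<>(); // SnapshotManager's list

    // Path taken for a user delete request: keeps both structures consistent.
    void deleteFromUserRequest(String dir) {
        namespace.remove(dir);
        snapshottableDirs.remove(dir);
    }

    // Path taken when replaying OP_DELETE (models unprotectedDelete before the
    // fix): the snapshottable-dir list keeps a dangling entry.
    void deleteFromEditlog(String dir) {
        namespace.remove(dir);
    }
}
```

After replaying a delete through the second path, the list still references a directory that no longer exists in the namespace, which is the dangling inode id the report warns about.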
[jira] [Commented] (HDFS-5970) callers of NetworkTopology's chooseRandom method to expect null return value
[ https://issues.apache.org/jira/browse/HDFS-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906813#comment-13906813 ] Junping Du commented on HDFS-5970: -- The plan sounds reasonable. I agree that an NPE here is still only theoretical, so we can come back to this when a real case happens. Moving its priority to Minor but leaving it open until we verify it cannot happen. callers of NetworkTopology's chooseRandom method to expect null return value Key: HDFS-5970 URL: https://issues.apache.org/jira/browse/HDFS-5970 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Yongjun Zhang Priority: Minor Class NetworkTopology's method public Node chooseRandom(String scope) calls private Node chooseRandom(String scope, String excludedScope), which may return a null value. Callers of this method, such as BlockPlacementPolicyDefault, need to be aware of that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5961) OIV cannot load fsimages containing a symbolic link
[ https://issues.apache.org/jira/browse/HDFS-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906859#comment-13906859 ] Hudson commented on HDFS-5961: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #487 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/487/]) HDFS-5961. OIV cannot load fsimages containing a symbolic link. Contributed by Kihwal Lee. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569789) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageLoaderCurrent.java OIV cannot load fsimages containing a symbolic link --- Key: HDFS-5961 URL: https://issues.apache.org/jira/browse/HDFS-5961 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5961.patch In {{ImageLoaderCurrent#processINode}}, the permission is not read for symlink INodes. So after incorrectly reading in the first symbolic link, the next INode can't be read. HDFS-4850 broke this while fixing other issues. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906867#comment-13906867 ] Hudson commented on HDFS-5318: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #487 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/487/]) HDFS-5318. Support read-only and read-write paths to shared replicas. (Contributed by Eric Sirianni) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569951) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlocksMap.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSClusterWithNodeGroup.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestReadOnlySharedStorage.java Support read-only and read-write paths to shared replicas - Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Eric Sirianni Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5318-trunk-c.patch, HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, HDFS-5318c-branch-2.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}}s to be pluggable behind the {{FsDatasetSpi}} interface. With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}}s associated with that block. 
Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks) ** → Block B has {{ReplicationCount == 2}} * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share) ** → Block B has {{ReplicationCount == 1}} For example, if block B has the following location tuples: * {{DN_1, STORAGE_A}} * {{DN_2, STORAGE_A}} * {{DN_3, STORAGE_B}} * {{DN_4, STORAGE_B}}, the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
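The counting rule proposed above can be sketched as a distinct-count over storage IDs. The types and method below are illustrative, not the actual BlockManager code:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the proposal: physical replicas = number of distinct StorageIDs,
// regardless of how many datanodes provide an access path.
public class SharedStorageReplicaCount {
    // Each location is a (datanodeId, storageId) pair; only the storageId
    // matters for counting distinct physical copies.
    static int physicalReplicaCount(List<String[]> locations) {
        Set<String> storageIds = new HashSet<>();
        for (String[] loc : locations) {
            storageIds.add(loc[1]);
        }
        return storageIds.size();
    }
}
```

With the four location tuples from the example above (DN_1 and DN_2 on STORAGE_A, DN_3 and DN_4 on STORAGE_B), this yields 2 rather than 4.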
[jira] [Commented] (HDFS-5742) DatanodeCluster (mini cluster of DNs) fails to start
[ https://issues.apache.org/jira/browse/HDFS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906865#comment-13906865 ] Hudson commented on HDFS-5742: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #487 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/487/]) HDFS-5742. DatanodeCluster (mini cluster of DNs) fails to start. (Arpit Agarwal) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570067) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DataNodeCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestInjectionForSimulatedStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestReadOnlySharedStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/CreateEditsLog.java DatanodeCluster (mini cluster of DNs) fails to start Key: HDFS-5742 URL: https://issues.apache.org/jira/browse/HDFS-5742 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5742.01.patch, HDFS-5742.02.patch, HDFS-5742.03.patch, HDFS-5742.04.patch, HDFS-5742.05.patch DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. 
Also included are a few improvements to DataNodeCluster, details in comments below. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
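A missing-null-check bug of the kind described above can be sketched as follows; this uses java.util.Properties as a stand-in for Hadoop's Configuration and a hypothetical default path, not the actual MiniDFSCluster code:

```java
import java.util.Properties;

// Illustrative null-guard of the kind the fix adds; determineDfsBaseDir here
// is a simplified stand-in for MiniDFSCluster#determineDfsBaseDir.
public class BaseDirGuard {
    static final String DEFAULT_BASE_DIR = "build/test/data"; // hypothetical default

    static String determineDfsBaseDir(Properties conf) {
        if (conf == null) {
            // Without this check, conf.getProperty(...) below throws an NPE,
            // which is how DatanodeCluster failed to start.
            return DEFAULT_BASE_DIR;
        }
        String dir = conf.getProperty("hdfs.minidfs.basedir");
        return dir != null ? dir : DEFAULT_BASE_DIR;
    }
}
```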
[jira] [Commented] (HDFS-5979) Typo and logger fix for fsimage PB code
[ https://issues.apache.org/jira/browse/HDFS-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906862#comment-13906862 ] Hudson commented on HDFS-5979: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #487 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/487/]) HDFS-5979. Typo and logger fix for fsimage PB code. (wang) (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570070) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java Typo and logger fix for fsimage PB code --- Key: HDFS-5979 URL: https://issues.apache.org/jira/browse/HDFS-5979 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Fix For: 2.4.0 Attachments: hdfs-5979-1.patch Found a typo and incorrect logger name in the fsimage PB code. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5868) Make hsync implementation pluggable
[ https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906871#comment-13906871 ] Hudson commented on HDFS-5868: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #487 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/487/]) HDFS-5868. Make hsync implementation pluggable. (Contributed by Buddy Taylor) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569978) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/ReplicaOutputStreams.java Make hsync implementation pluggable --- Key: HDFS-5868 URL: https://issues.apache.org/jira/browse/HDFS-5868 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.2.0 Reporter: Buddy Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5868-branch-2.patch, HDFS-5868a-branch-2.patch, HDFS-5868b-branch-2.patch The current implementation of hsync in BlockReceiver only works if the output streams are instances of FileOutputStream. Therefore, there is currently no way for a FSDatasetSpi plugin to implement hsync if it is not using standard OS files. One possible solution is to push the implementation of hsync into the ReplicaOutputStreams class. This class is constructed by the ReplicaInPipeline which is constructed by the FSDatasetSpi plugin, therefore it can be extended. Instead of directly calling sync on the output stream, BlockReceiver would call ReplicaOutputStream.sync. The default implementation of sync in ReplicaOutputStream would be the same as the current implementation in BlockReceiver. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
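A sketch of the proposed shape: the stream-holder class owns sync(), so a dataset plugin can override it. The names are modeled on the description above, but this is illustrative, not the committed patch:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.FileChannel;

// Illustrative pluggable-sync design; ReplicaStreams and
// InMemoryReplicaStreams are stand-ins, not the real Hadoop classes.
public class PluggableSync {
    static class ReplicaStreams {
        protected final OutputStream dataOut;
        ReplicaStreams(OutputStream dataOut) { this.dataOut = dataOut; }

        // Default behavior mirrors the old BlockReceiver logic, which only
        // did anything when the stream was a FileOutputStream.
        public void sync() throws IOException {
            if (dataOut instanceof FileOutputStream) {
                FileChannel ch = ((FileOutputStream) dataOut).getChannel();
                ch.force(true); // flush data and metadata to the device
            }
        }
    }

    // A non-file-backed dataset plugin can now supply its own durability call.
    static class InMemoryReplicaStreams extends ReplicaStreams {
        boolean synced = false;
        InMemoryReplicaStreams(OutputStream out) { super(out); }
        @Override public void sync() { synced = true; }
    }
}
```

The design point is the same as in the description: BlockReceiver calls sync() on the holder object rather than touching the stream type directly, so the instanceof check stops being a hard requirement.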
[jira] [Commented] (HDFS-4685) Implementation of ACLs in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906869#comment-13906869 ] Hudson commented on HDFS-4685: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #487 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/487/]) Merge HDFS-4685 to trunk. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569870) * /hadoop/common/trunk * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FilterFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclEntry.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclEntryScope.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclEntryType.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclStatus.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/FsAction.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/AclCommands.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/FsCommand.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Ls.java * 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ChRootedFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/FileSystemShell.apt.vm * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/core * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestHarFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/permission/TestAcl.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/permission/TestFsPermission.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestAclCommands.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/TestChRootedFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFileSystemDelegation.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/AclException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutVersion.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/AclConfigFlag.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/AclFeature.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/AclStorage.java *
[jira] [Commented] (HDFS-5483) NN should gracefully handle multiple block replicas on same DN
[ https://issues.apache.org/jira/browse/HDFS-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906864#comment-13906864 ] Hudson commented on HDFS-5483: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #487 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/487/]) HDFS-5483. NN should gracefully handle multiple block replicas on same DN. (Arpit Agarwal) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570040) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java NN should gracefully handle multiple block replicas on same DN -- Key: HDFS-5483 URL: https://issues.apache.org/jira/browse/HDFS-5483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 3.0.0, 2.4.0 Attachments: h5483.02.patch, h5483.03.patch, h5483.04.patch {{BlockManager#reportDiff}} can cause an assertion failure in {{BlockInfo#moveBlockToHead}} if the block report shows the same block as belonging to more than one storage. The issue is that {{moveBlockToHead}} assumes it will find the DatanodeStorageInfo for the given block. 
Exception details:
{code}
java.lang.AssertionError: Index is out of bound
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setNext(BlockInfo.java:152)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:351)
	at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:243)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1841)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:984)
	at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:165)
{code}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5973) add DomainSocket#shutdown method
[ https://issues.apache.org/jira/browse/HDFS-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906863#comment-13906863 ] Hudson commented on HDFS-5973: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #487 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/487/]) HDFS-5973. add DomainSocket#shutdown method. (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569950) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocket.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocket.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt add DomainSocket#shutdown method Key: HDFS-5973 URL: https://issues.apache.org/jira/browse/HDFS-5973 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.4.0 Attachments: HDFS-5973.001.patch Add a DomainSocket#shutdown method, that allows us to call shutdown on UNIX domain sockets. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5961) OIV cannot load fsimages containing a symbolic link
[ https://issues.apache.org/jira/browse/HDFS-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906964#comment-13906964 ] Hudson commented on HDFS-5961: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1679 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1679/]) HDFS-5961. OIV cannot load fsimages containing a symbolic link. Contributed by Kihwal Lee. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569789) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageLoaderCurrent.java OIV cannot load fsimages containing a symbolic link --- Key: HDFS-5961 URL: https://issues.apache.org/jira/browse/HDFS-5961 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5961.patch In {{ImageLoaderCurrent#processINode}}, the permission is not read for symlink INodes. So after incorrectly reading in the first symbolic link, the next INode can't be read. HDFS-4850 broke this while fixing other issues. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906972#comment-13906972 ] Hudson commented on HDFS-5318: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1679 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1679/]) HDFS-5318. Support read-only and read-write paths to shared replicas. (Contributed by Eric Sirianni) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569951) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlocksMap.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSClusterWithNodeGroup.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestReadOnlySharedStorage.java Support read-only and read-write paths to shared replicas - Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Eric Sirianni Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5318-trunk-c.patch, HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, HDFS-5318c-branch-2.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}}s to be pluggable behind the {{FsDatasetSpi}} interface. With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}}s associated with that block. 
Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A)}}, {{(DN_B, S_B)}} for a given block B:
* {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks)
** → Block B has {{ReplicationCount == 2}}
* {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share)
** → Block B has {{ReplicationCount == 1}}
For example, if block B has the following location tuples:
* {{DN_1, STORAGE_A}}
* {{DN_2, STORAGE_A}}
* {{DN_3, STORAGE_B}}
* {{DN_4, STORAGE_B}}
the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
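As a sketch of the counting rule above (illustrative only; the class and method names here are hypothetical, not actual Namenode code), the physical replica count is simply the number of distinct storage IDs among a block's location tuples:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only (hypothetical names, not the actual Namenode
// code): with the proposed StorageID semantics, the number of physical
// copies of a block is the number of distinct storage IDs among its
// (datanode, storage) locations.
public class ReplicaCountSketch {
    // Each location is a {datanodeId, storageId} pair.
    public static int physicalReplicas(List<String[]> locations) {
        Set<String> storageIds = new HashSet<>();
        for (String[] loc : locations) {
            storageIds.add(loc[1]); // loc[1] is the storage ID
        }
        return storageIds.size();
    }
}
```

Applied to the four location tuples above, this yields 2 rather than 4.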
[jira] [Commented] (HDFS-5483) NN should gracefully handle multiple block replicas on same DN
[ https://issues.apache.org/jira/browse/HDFS-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906969#comment-13906969 ] Hudson commented on HDFS-5483: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1679 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1679/]) HDFS-5483. NN should gracefully handle multiple block replicas on same DN. (Arpit Agarwal) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570040) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java NN should gracefully handle multiple block replicas on same DN -- Key: HDFS-5483 URL: https://issues.apache.org/jira/browse/HDFS-5483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 3.0.0, 2.4.0 Attachments: h5483.02.patch, h5483.03.patch, h5483.04.patch {{BlockManager#reportDiff}} can cause an assertion failure in {{BlockInfo#moveBlockToHead}} if the block report shows the same block as belonging to more than one storage. The issue is that {{moveBlockToHead}} assumes it will find the DatanodeStorageInfo for the given block. 
Exception details:
{code}
java.lang.AssertionError: Index is out of bound
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setNext(BlockInfo.java:152)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:351)
	at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:243)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1841)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:984)
	at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:165)
{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
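A minimal sketch of the graceful handling the summary calls for (hypothetical names; the actual fix is in {{BlockManager#reportDiff}} and differs in detail): look the recorded storage up and skip a duplicate entry instead of letting an assertion fire.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only (not the real BlockManager code): when a block
// is reported as belonging to a second storage on the same datanode,
// skip the duplicate gracefully instead of asserting that the
// DatanodeStorageInfo lookup must succeed.
public class ReportDiffSketch {
    // blockId -> storageId already recorded for that block on this DN.
    private final Map<String, String> recorded = new HashMap<>();

    // Returns true if the (block, storage) report entry was accepted.
    public boolean processReportedBlock(String blockId, String storageId) {
        String known = recorded.get(blockId);
        if (known != null && !known.equals(storageId)) {
            // Duplicate replica of the same block under another storage:
            // log-and-skip rather than fail an assertion.
            return false;
        }
        recorded.put(blockId, storageId);
        return true;
    }
}
```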
[jira] [Commented] (HDFS-5979) Typo and logger fix for fsimage PB code
[ https://issues.apache.org/jira/browse/HDFS-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906967#comment-13906967 ] Hudson commented on HDFS-5979: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1679 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1679/]) HDFS-5979. Typo and logger fix for fsimage PB code. (wang) (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570070) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java Typo and logger fix for fsimage PB code --- Key: HDFS-5979 URL: https://issues.apache.org/jira/browse/HDFS-5979 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Fix For: 2.4.0 Attachments: hdfs-5979-1.patch Found a typo and incorrect logger name in the fsimage PB code. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5742) DatanodeCluster (mini cluster of DNs) fails to start
[ https://issues.apache.org/jira/browse/HDFS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906970#comment-13906970 ] Hudson commented on HDFS-5742: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1679 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1679/]) HDFS-5742. DatanodeCluster (mini cluster of DNs) fails to start. (Arpit Agarwal) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570067) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DataNodeCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestInjectionForSimulatedStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestReadOnlySharedStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/CreateEditsLog.java DatanodeCluster (mini cluster of DNs) fails to start Key: HDFS-5742 URL: https://issues.apache.org/jira/browse/HDFS-5742 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5742.01.patch, HDFS-5742.02.patch, HDFS-5742.03.patch, HDFS-5742.04.patch, HDFS-5742.05.patch DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. 
Also included are a few improvements to DataNodeCluster, details in comments below. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
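The missing null check can be illustrated with a small sketch (a plain Map stands in for the Hadoop Configuration object; the key and default path here are made up for the example, not the real ones):

```java
import java.util.Map;

// Minimal sketch of the null guard described above. The real
// MiniDFSCluster#determineDfsBaseDir works with an
// org.apache.hadoop.conf.Configuration; the Map, key, and default path
// below are illustrative stand-ins.
public class BaseDirSketch {
    static final String DEFAULT_BASE = "/tmp/hdfs-minicluster";

    public static String determineDfsBaseDir(Map<String, String> conf) {
        if (conf == null) {  // the check whose absence caused the NPE
            return DEFAULT_BASE;
        }
        String dir = conf.get("hdfs.minidfs.basedir");
        return dir != null ? dir : DEFAULT_BASE;
    }
}
```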
[jira] [Commented] (HDFS-5973) add DomainSocket#shutdown method
[ https://issues.apache.org/jira/browse/HDFS-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906968#comment-13906968 ] Hudson commented on HDFS-5973: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1679 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1679/]) HDFS-5973. add DomainSocket#shutdown method. (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569950) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocket.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocket.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt add DomainSocket#shutdown method Key: HDFS-5973 URL: https://issues.apache.org/jira/browse/HDFS-5973 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.4.0 Attachments: HDFS-5973.001.patch Add a DomainSocket#shutdown method that allows us to call shutdown on UNIX domain sockets. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
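The shutdown semantics being added can be illustrated with plain TCP sockets, which expose the same half-close behavior through {{Socket#shutdownOutput}} (loopback sketch only; the actual patch targets org.apache.hadoop.net.unix.DomainSocket, which has no standard-library equivalent):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Loopback TCP illustration of the half-close semantics that a
// DomainSocket#shutdown method provides for UNIX domain sockets:
// shutting down one direction delivers EOF to the peer while the
// socket object itself remains open (unlike close()).
public class ShutdownDemo {
    // Returns what the peer reads after the client shuts down its output
    // side: -1, i.e. end-of-stream, even though nothing was closed yet.
    public static int eofAfterShutdown() throws IOException {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort());
             Socket peer = server.accept()) {
            client.shutdownOutput();              // half-close, not close()
            int b = peer.getInputStream().read(); // peer observes EOF
            assert !client.isClosed();            // client is still open
            return b;
        }
    }
}
```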
[jira] [Commented] (HDFS-4685) Implementation of ACLs in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906974#comment-13906974 ] Hudson commented on HDFS-4685: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1679 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1679/]) Merge HDFS-4685 to trunk. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1569870) * /hadoop/common/trunk * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FilterFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclEntry.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclEntryScope.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclEntryType.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclStatus.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/FsAction.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/AclCommands.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/FsCommand.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Ls.java * 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ChRootedFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/FileSystemShell.apt.vm * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/core * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestHarFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/permission/TestAcl.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/permission/TestFsPermission.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestAclCommands.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/TestChRootedFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFileSystemDelegation.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/AclException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutVersion.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/AclConfigFlag.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/AclFeature.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/AclStorage.java *
[jira] [Commented] (HDFS-5868) Make hsync implementation pluggable
[ https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906976#comment-13906976 ] Hudson commented on HDFS-5868: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1679 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1679/]) HDFS-5868. Make hsync implementation pluggable. (Contributed by Buddy Taylor) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569978) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/ReplicaOutputStreams.java Make hsync implementation pluggable --- Key: HDFS-5868 URL: https://issues.apache.org/jira/browse/HDFS-5868 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.2.0 Reporter: Buddy Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5868-branch-2.patch, HDFS-5868a-branch-2.patch, HDFS-5868b-branch-2.patch The current implementation of hsync in BlockReceiver only works if the output streams are instances of FileOutputStream. Therefore, there is currently no way for a FSDatasetSpi plugin to implement hsync if it is not using standard OS files. One possible solution is to push the implementation of hsync into the ReplicaOutputStreams class. This class is constructed by the ReplicaInPipeline which is constructed by the FSDatasetSpi plugin, therefore it can be extended. Instead of directly calling sync on the output stream, BlockReceiver would call ReplicaOutputStream.sync. The default implementation of sync in ReplicaOutputStream would be the same as the current implementation in BlockReceiver. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
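The proposed refactoring can be sketched roughly as follows (simplified, hypothetical names; the real ReplicaOutputStreams API differs): the default sync reproduces the old FileOutputStream-only behavior, and a non-file-backed dataset plugin overrides it.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch of the refactoring proposed above (names simplified, not the
// actual API): the receiver calls sync() on a stream-holder object
// whose default implementation matches the old inline behavior, and
// which an FSDatasetSpi-style plugin can subclass with its own
// durability call.
public class ReplicaStreamsSketch {
    protected final OutputStream dataOut;

    public ReplicaStreamsSketch(OutputStream dataOut) {
        this.dataOut = dataOut;
    }

    // Default hsync: force file-channel data to disk when the stream is
    // a plain FileOutputStream; otherwise do nothing, as before.
    public void sync() throws IOException {
        if (dataOut instanceof FileOutputStream) {
            ((FileOutputStream) dataOut).getChannel().force(true);
        }
    }
}
```

A plugin that stores replicas elsewhere would subclass this and override {{sync()}} with whatever makes its own writes durable.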
[jira] [Commented] (HDFS-5498) Improve datanode startup time
[ https://issues.apache.org/jira/browse/HDFS-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906993#comment-13906993 ] Kihwal Lee commented on HDFS-5498: -- [~azuryy], the patch depends on HDFS-5583 and HDFS-5924. [~brandonli] is reviewing them. Improve datanode startup time - Key: HDFS-5498 URL: https://issues.apache.org/jira/browse/HDFS-5498 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5498.with_du_change.patch, HDFS-5498.with_du_change.patch, HDFS-5498_sh.patch Similarly to HDFS-5027, an improvement can be made for getVolumeMap(). This is the phase in which ReplicaMap is populated. It would be even better if the datanode scanned only once and did both. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5535) Umbrella jira for improved HDFS rolling upgrades
[ https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906995#comment-13906995 ] Kihwal Lee commented on HDFS-5535: -- [~azuryy] I believe the remaining parts are mostly independent, so we can start testing and fixing problems now. Umbrella jira for improved HDFS rolling upgrades Key: HDFS-5535 URL: https://issues.apache.org/jira/browse/HDFS-5535 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, ha, hdfs-client, namenode Affects Versions: 3.0.0, 2.2.0 Reporter: Nathan Roberts Attachments: HDFSRollingUpgradesHighLevelDesign.pdf, h5535_20140219.patch In order to roll a new HDFS release through a large cluster quickly and safely, a few enhancements are needed in HDFS. An initial high-level design document will be attached to this jira, and sub-jiras will itemize the individual tasks. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5961) OIV cannot load fsimages containing a symbolic link
[ https://issues.apache.org/jira/browse/HDFS-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907015#comment-13907015 ] Hudson commented on HDFS-5961: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1704 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1704/]) HDFS-5961. OIV cannot load fsimages containing a symbolic link. Contributed by Kihwal Lee. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569789) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageLoaderCurrent.java OIV cannot load fsimages containing a symbolic link --- Key: HDFS-5961 URL: https://issues.apache.org/jira/browse/HDFS-5961 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5961.patch In {{ImageLoaderCurrent#processINode}}, the permission is not read for symlink INodes. So after incorrectly reading in the first symbolic link, the next INode can't be read. HDFS-4850 broke this while fixing other issues. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
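The framing problem can be shown with a toy record format (illustrative only; the real ImageLoaderCurrent reads many more fields per INode): if the permission word is not consumed for symlink records, every later record is read at the wrong offset.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Illustrative sketch of the framing bug described above (the real
// ImageLoaderCurrent reads many more fields per INode): every record in
// this toy format carries a permission word, so a loader that skipped it
// for symlink records would read all subsequent records misaligned.
public class INodeAlignmentSketch {
    // Reads 'count' records of (isSymlink: boolean, permission: short)
    // and returns how many records were consumed.
    public static int readAll(DataInputStream in, int count) throws IOException {
        int read = 0;
        for (int i = 0; i < count; i++) {
            boolean isSymlink = in.readBoolean();
            short perm = in.readShort(); // must be read for symlinks too
            read++;
        }
        return read;
    }
}
```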
[jira] [Commented] (HDFS-5979) Typo and logger fix for fsimage PB code
[ https://issues.apache.org/jira/browse/HDFS-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907018#comment-13907018 ] Hudson commented on HDFS-5979: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1704 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1704/]) HDFS-5979. Typo and logger fix for fsimage PB code. (wang) (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570070) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java Typo and logger fix for fsimage PB code --- Key: HDFS-5979 URL: https://issues.apache.org/jira/browse/HDFS-5979 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Fix For: 2.4.0 Attachments: hdfs-5979-1.patch Found a typo and incorrect logger name in the fsimage PB code. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5868) Make hsync implementation pluggable
[ https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907027#comment-13907027 ] Hudson commented on HDFS-5868: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1704 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1704/]) HDFS-5868. Make hsync implementation pluggable. (Contributed by Buddy Taylor) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569978) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/ReplicaOutputStreams.java Make hsync implementation pluggable --- Key: HDFS-5868 URL: https://issues.apache.org/jira/browse/HDFS-5868 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.2.0 Reporter: Buddy Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5868-branch-2.patch, HDFS-5868a-branch-2.patch, HDFS-5868b-branch-2.patch The current implementation of hsync in BlockReceiver only works if the output streams are instances of FileOutputStream. Therefore, there is currently no way for a FSDatasetSpi plugin to implement hsync if it is not using standard OS files. One possible solution is to push the implementation of hsync into the ReplicaOutputStreams class. This class is constructed by the ReplicaInPipeline which is constructed by the FSDatasetSpi plugin, therefore it can be extended. Instead of directly calling sync on the output stream, BlockReceiver would call ReplicaOutputStream.sync. The default implementation of sync in ReplicaOutputStream would be the same as the current implementation in BlockReceiver. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5742) DatanodeCluster (mini cluster of DNs) fails to start
[ https://issues.apache.org/jira/browse/HDFS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907021#comment-13907021 ] Hudson commented on HDFS-5742: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1704 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1704/]) HDFS-5742. DatanodeCluster (mini cluster of DNs) fails to start. (Arpit Agarwal) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570067) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DataNodeCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestInjectionForSimulatedStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestReadOnlySharedStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/CreateEditsLog.java DatanodeCluster (mini cluster of DNs) fails to start Key: HDFS-5742 URL: https://issues.apache.org/jira/browse/HDFS-5742 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5742.01.patch, HDFS-5742.02.patch, HDFS-5742.03.patch, HDFS-5742.04.patch, HDFS-5742.05.patch DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. 
Also included are a few improvements to DataNodeCluster, details in comments below. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4685) Implementation of ACLs in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907025#comment-13907025 ] Hudson commented on HDFS-4685: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1704 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1704/]) Merge HDFS-4685 to trunk. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1569870) * /hadoop/common/trunk * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FilterFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclEntry.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclEntryScope.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclEntryType.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclStatus.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/FsAction.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/AclCommands.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/FsCommand.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Ls.java * 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ChRootedFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/FileSystemShell.apt.vm * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/core * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestHarFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/permission/TestAcl.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/permission/TestFsPermission.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestAclCommands.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/TestChRootedFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFileSystemDelegation.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/AclException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutVersion.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/AclConfigFlag.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/AclFeature.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/AclStorage.java *
[jira] [Commented] (HDFS-5973) add DomainSocket#shutdown method
[ https://issues.apache.org/jira/browse/HDFS-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907019#comment-13907019 ] Hudson commented on HDFS-5973: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1704 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1704/]) HDFS-5973. add DomainSocket#shutdown method. (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569950) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocket.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocket.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt add DomainSocket#shutdown method Key: HDFS-5973 URL: https://issues.apache.org/jira/browse/HDFS-5973 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.4.0 Attachments: HDFS-5973.001.patch Add a DomainSocket#shutdown method that allows us to call shutdown on UNIX domain sockets. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5483) NN should gracefully handle multiple block replicas on same DN
[ https://issues.apache.org/jira/browse/HDFS-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907020#comment-13907020 ] Hudson commented on HDFS-5483: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1704 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1704/]) HDFS-5483. NN should gracefully handle multiple block replicas on same DN. (Arpit Agarwal) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570040) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java NN should gracefully handle multiple block replicas on same DN -- Key: HDFS-5483 URL: https://issues.apache.org/jira/browse/HDFS-5483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 3.0.0, 2.4.0 Attachments: h5483.02.patch, h5483.03.patch, h5483.04.patch {{BlockManager#reportDiff}} can cause an assertion failure in {{BlockInfo#moveBlockToHead}} if the block report shows the same block as belonging to more than one storage. The issue is that {{moveBlockToHead}} assumes it will find the DatanodeStorageInfo for the given block. 
Exception details:
{code}
java.lang.AssertionError: Index is out of bound
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setNext(BlockInfo.java:152)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:351)
	at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:243)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1841)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:984)
	at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:165)
{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907023#comment-13907023 ] Hudson commented on HDFS-5318: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1704 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1704/]) HDFS-5318. Support read-only and read-write paths to shared replicas. (Contributed by Eric Sirianni) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1569951) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlocksMap.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSClusterWithNodeGroup.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestReadOnlySharedStorage.java Support read-only and read-write paths to shared replicas - Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Eric Sirianni Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5318-trunk-c.patch, HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, HDFS-5318c-branch-2.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface. With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}} s associated with that block. 
Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B:
* {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks)
** → Block B has {{ReplicationCount == 2}}
* {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share)
** → Block B has {{ReplicationCount == 1}}
For example, if block B has the following location tuples:
* {{DN_1, STORAGE_A}}
* {{DN_2, STORAGE_A}}
* {{DN_3, STORAGE_B}}
* {{DN_4, STORAGE_B}}
the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
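The counting rule proposed above can be sketched in plain Java. This is illustrative code, not the actual NameNode implementation: the effective replication of a block is the number of *distinct* storage IDs among its (datanode ID, storage ID) location pairs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SharedStorageReplication {
    // Count distinct storage IDs among (datanodeId, storageId) location pairs.
    static int effectiveReplication(List<String[]> locations) {
        Set<String> storageIds = new HashSet<>();
        for (String[] loc : locations) {
            storageIds.add(loc[1]); // loc[0] = datanode ID, loc[1] = storage ID
        }
        return storageIds.size();
    }

    public static void main(String[] args) {
        // The four location tuples from the example above: two shared pools.
        List<String[]> locations = Arrays.asList(
            new String[] {"DN_1", "STORAGE_A"},
            new String[] {"DN_2", "STORAGE_A"},
            new String[] {"DN_3", "STORAGE_B"},
            new String[] {"DN_4", "STORAGE_B"});
        System.out.println(effectiveReplication(locations)); // prints 2, not 4
    }
}
```

With local disks every datanode reports a unique storage ID, so this count degenerates to the traditional replica count.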
[jira] [Updated] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails
[ https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5983: - Attachment: testlog.txt Attaching the log from the precommit build. NN starts with 0 blocks, then initializes the repl queues, then 3 blocks are reported by the datanode. The test makes the assumption that by the time it acquires and releases the namesystem write lock, the block report is processed completely and the repl queues are initialized. This is not true. TestSafeMode#testInitializeReplQueuesEarly fails Key: HDFS-5983 URL: https://issues.apache.org/jira/browse/HDFS-5983 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Attachments: testlog.txt It was seen in one of the precommit builds of HDFS-5962. The test case creates 15 blocks and then shuts down all datanodes. Then the namenode is restarted with a low safe block threshold and one datanode is restarted. The idea is that the initial block report from the restarted datanode will make the namenode leave the safemode and initialize the replication queues. According to the log, the datanode reported 3 blocks, but slightly before that the namenode did repl queue init with 1 block. I will attach the log. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
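The race described above is the classic situation where a test should poll an observable condition with a timeout rather than assume an event (block report processing) has completed once a lock can be acquired and released. A minimal generic sketch of that polling pattern; Hadoop's test utilities provide a similar helper, and the names below are illustrative, not the actual fix:

```java
public class WaitForCondition {
    interface Condition {
        boolean check();
    }

    // Poll until the condition holds or the timeout expires.
    static void waitFor(Condition c, long intervalMs, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!c.check()) {
            if (System.currentTimeMillis() > deadline) {
                throw new IllegalStateException("timed out waiting for condition");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Stands in for "repl queues initialized and all blocks reported".
        waitFor(() -> System.currentTimeMillis() - start >= 50, 10, 5000);
        System.out.println("condition met");
    }
}
```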
[jira] [Commented] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907039#comment-13907039 ] Kihwal Lee commented on HDFS-5962: -- Indeed the test failure does not seem like related. I filed HDFS-5983. Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Attachments: HDFS-5692.patch, HDFS-5962.2.patch, HDFS-5962.3.patch, HDFS-5962.4.patch, HDFS-5962.5.patch In {{FSImageSerialization}}, the mtime and atime of symbolic links are hardcoded to be 0 when saving to fsimage, even though they are recorded in memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907101#comment-13907101 ] Kihwal Lee commented on HDFS-5962: -- +1 lgtm. Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Attachments: HDFS-5692.patch, HDFS-5962.2.patch, HDFS-5962.3.patch, HDFS-5962.4.patch, HDFS-5962.5.patch In {{FSImageSerialization}}, the mtime and atime of symbolic links are hardcoded to be 0 when saving to fsimage, even though they are recorded in memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5970) callers of NetworkTopology's chooseRandom method to expect null return value
[ https://issues.apache.org/jira/browse/HDFS-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907105#comment-13907105 ] Yongjun Zhang commented on HDFS-5970: - Thanks for making the change. Indeed it's minor. callers of NetworkTopology's chooseRandom method to expect null return value Key: HDFS-5970 URL: https://issues.apache.org/jira/browse/HDFS-5970 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Yongjun Zhang Priority: Minor Class NetworkTopology's method public Node chooseRandom(String scope) calls private Node chooseRandom(String scope, String excludedScope), which may return a null value. Callers of this method, such as BlockPlacementPolicyDefault, need to be aware of that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
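The caller-side pattern the report asks for can be sketched generically. This is illustrative code, not BlockPlacementPolicyDefault itself: any call that may yield no node must be followed by a null check before the result is dereferenced.

```java
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ChooseRandomDemo {
    // Stand-in for NetworkTopology#chooseRandom(scope): returns null when
    // no node in the scope is available (names here are illustrative).
    static String chooseRandom(List<String> nodesInScope, Random rand) {
        if (nodesInScope.isEmpty()) {
            return null;
        }
        return nodesInScope.get(rand.nextInt(nodesInScope.size()));
    }

    public static void main(String[] args) {
        String chosen = chooseRandom(Collections.emptyList(), new Random());
        // The caller-side null check the report recommends.
        System.out.println(chosen == null ? "no node available" : chosen);
    }
}
```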
[jira] [Commented] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907131#comment-13907131 ] Hudson commented on HDFS-5962: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5195 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5195/]) HDFS-5962. Mtime and atime are not persisted for symbolic links. Contributed by Akira Ajisaka. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1570252) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/LsrPBImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/PBImageXmlWriter.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/fsimage.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Attachments: HDFS-5692.patch, HDFS-5962.2.patch, HDFS-5962.3.patch, HDFS-5962.4.patch, HDFS-5962.5.patch In {{FSImageSerialization}}, the mtime and atime of symbolic links are hardcoded to be 0 when saving to fsimage, even though they are recorded in memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5962: - Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk, branch-2 and branch-2.4. Thanks for working on the fix, Akira. Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5692.patch, HDFS-5962.2.patch, HDFS-5962.3.patch, HDFS-5962.4.patch, HDFS-5962.5.patch In {{FSImageSerialization}}, the mtime and atime of symbolic links are hardcoded to be 0 when saving to fsimage, even though they are recorded in memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5982) Need to update snapshot manager when applying editlog for deleting a snapshottable directory
[ https://issues.apache.org/jira/browse/HDFS-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907235#comment-13907235 ] Chris Nauroth commented on HDFS-5982: - Nice find, Tassapol and Jing. The patch mostly looks good to me, after we fix the unit test failure. One question: {{unprotectedDelete}} formerly checked for {{deleteAllowed}}. Is that check no longer required? Need to update snapshot manager when applying editlog for deleting a snapshottable directory Key: HDFS-5982 URL: https://issues.apache.org/jira/browse/HDFS-5982 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Tassapol Athiapinya Assignee: Jing Zhao Priority: Critical Attachments: HDFS-5982.000.patch Currently after deleting a snapshottable directory which does not have snapshots any more, we also remove the directory from the snapshottable directory list in SnapshotManager. This works fine when handling a delete request from user. However, when we apply the OP_DELETE editlog, FSDirectory#unprotectedDelete(String, long) is called, which does not contain the updating snapshot manager process. This may leave an non-existent inode id in the snapshottable directory list, and can even lead to FSImage corruption. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5982) Need to update snapshot manager when applying editlog for deleting a snapshottable directory
[ https://issues.apache.org/jira/browse/HDFS-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5982: Attachment: HDFS-5982.001.patch Thanks for the review Chris! Update the patch to fix the failed test. bq. {{unprotectedDelete}} formerly checked for {{deleteAllowed}}. Is that check no longer required You're right, we still need the check. This is also the cause of the failed unit test. Need to update snapshot manager when applying editlog for deleting a snapshottable directory Key: HDFS-5982 URL: https://issues.apache.org/jira/browse/HDFS-5982 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Tassapol Athiapinya Assignee: Jing Zhao Priority: Critical Attachments: HDFS-5982.000.patch, HDFS-5982.001.patch Currently after deleting a snapshottable directory which does not have snapshots any more, we also remove the directory from the snapshottable directory list in SnapshotManager. This works fine when handling a delete request from user. However, when we apply the OP_DELETE editlog, FSDirectory#unprotectedDelete(String, long) is called, which does not contain the updating snapshot manager process. This may leave an non-existent inode id in the snapshottable directory list, and can even lead to FSImage corruption. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5982) Need to update snapshot manager when applying editlog for deleting a snapshottable directory
[ https://issues.apache.org/jira/browse/HDFS-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5982: Attachment: HDFS-5982.001.patch Need to update snapshot manager when applying editlog for deleting a snapshottable directory Key: HDFS-5982 URL: https://issues.apache.org/jira/browse/HDFS-5982 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Tassapol Athiapinya Assignee: Jing Zhao Priority: Critical Attachments: HDFS-5982.000.patch, HDFS-5982.001.patch, HDFS-5982.001.patch Currently after deleting a snapshottable directory which does not have snapshots any more, we also remove the directory from the snapshottable directory list in SnapshotManager. This works fine when handling a delete request from user. However, when we apply the OP_DELETE editlog, FSDirectory#unprotectedDelete(String, long) is called, which does not contain the updating snapshot manager process. This may leave an non-existent inode id in the snapshottable directory list, and can even lead to FSImage corruption. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907277#comment-13907277 ] Haohui Mai commented on HDFS-5939: --
{code}
+        LOG.error("Caught InvalidTopologyException (" + te + ")"
+            + " when trying to redirectURI namenode=" + namenode.toString()
+            + " path=" + path + " op=" + op.toString()
+            + ", suggest to examine the cluster health.");
+        throw new NoDatanodeException("No datanode found");
{code}
You can simply throw an IOException with the message. This is not an error condition, thus I don't think it should log as an error. The client will have sufficient information. You can fold the unit test into {{TestWebHDFS}}. Based on your description, you can start the cluster with zero datanodes to reproduce the failure.
{code}
+    UserGroupInformation.createUserForTesting("me", new String[]{"my-group"})
+        .doAs(new PrivilegedExceptionAction<Void>() {
{code}
Why is {{doAs}} required? WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-5939.001.patch, HDFS-5939.002.patch When trying to access hdfs via webhdfs, and when datanode is dead, user will see an exception below without any clue that it's caused by dead datanode: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n must be positive"}} Need to fix the report to give user hint about dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5981) PBImageXmlWriter generates malformed XML
[ https://issues.apache.org/jira/browse/HDFS-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5981: - Summary: PBImageXmlWriter generates malformed XML (was: PBImageXmlWriter closes SnapshotDiffSection incorrectly.) PBImageXmlWriter generates malformed XML Key: HDFS-5981 URL: https://issues.apache.org/jira/browse/HDFS-5981 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Attachments: HDFS-5981.000.patch, HDFS-5981.001.patch {{PBImageXmlWriter}} outputs malformed XML file because it closes the tag {{SnapshotDiffSection}} incorrectly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5981) PBImageXmlWriter generates malformed XML
[ https://issues.apache.org/jira/browse/HDFS-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5981: - Description: {{PBImageXmlWriter}} outputs malformed XML file because it closes the {{SnapshotDiffSection}}, {{NameSection}} and {{INodeReferenceSection}} incorrectly. (was: {{PBImageXmlWriter}} outputs malformed XML file because it closes the tag {{SnapshotDiffSection}} incorrectly.) PBImageXmlWriter generates malformed XML Key: HDFS-5981 URL: https://issues.apache.org/jira/browse/HDFS-5981 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Attachments: HDFS-5981.000.patch, HDFS-5981.001.patch {{PBImageXmlWriter}} outputs malformed XML file because it closes the {{SnapshotDiffSection}}, {{NameSection}} and {{INodeReferenceSection}} incorrectly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
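One way to make mismatched open/close tags like the ones described above impossible by construction is to emit the sections through a streaming XML writer instead of hand-written print statements, then re-parse the output to assert well-formedness. A minimal sketch; the section names are borrowed from the description, and this is not the actual PBImageXmlWriter code:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class WellFormedSections {
    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter w =
            XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement("fsimage");
        w.writeStartElement("NameSection");
        w.writeEndElement(); // NameSection: the writer tracks nesting for us
        w.writeStartElement("SnapshotDiffSection");
        w.writeEndElement(); // SnapshotDiffSection
        w.writeEndElement(); // fsimage
        w.close();

        // Re-parse to verify the emitted document is well-formed XML.
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(out.toString())));
        System.out.println("root=" + doc.getDocumentElement().getTagName());
    }
}
```

The re-parse step is also the shape of the regression test suggested later in this thread: run the processor, then feed the result to a standard XML parser.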
[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907308#comment-13907308 ] Yongjun Zhang commented on HDFS-5939: - Thanks Haohui. I had one thought: I wonder why the method {{RemoteException.unwrapRemoteException}} only assumes the wrapped exception class to be a subclass of IOException. I kind of assumed there may be a good reason behind it yesterday. But thinking about it a bit more, why can't we let it also deal with RuntimeException? I will give it a try later today too. Do you agree? About your first comment: first of all, in production it's an error that the user needs to deal with, I think. Secondly, I want to be able to find out exactly the reason for the exception on the unit test side, and claim success if it's because of no datanode. If we throw an IOException, I can't be sure whether my unit test is successful or not. That's why I had the NoDatanodeException. Of course, I can try to see if there is a "no datanode" string in the exception's message. But what if some other IOException is thrown that also has the "no datanode" message? I put the above comments here for discussion purposes. I think if I can make InvalidTopologyException work, then it should be good. Would you please confirm whether you agree that we can try to make {{RemoteException.unwrapRemoteException}} handle RuntimeException? I'm on something else now and I will address your second comment a bit later today. Thanks.
WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-5939.001.patch, HDFS-5939.002.patch When trying to access hdfs via webhdfs, and when datanode is dead, user will see an exception below without any clue that it's caused by dead datanode: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n must be positive"}} Need to fix the report to give user hint about dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5981) PBImageXmlWriter generates malformed XML
[ https://issues.apache.org/jira/browse/HDFS-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907320#comment-13907320 ] Akira AJISAKA commented on HDFS-5981: - Thank you for updating the patch! LGTM, +1. PBImageXmlWriter generates malformed XML Key: HDFS-5981 URL: https://issues.apache.org/jira/browse/HDFS-5981 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Attachments: HDFS-5981.000.patch, HDFS-5981.001.patch {{PBImageXmlWriter}} outputs malformed XML file because it closes the {{SnapshotDiffSection}}, {{NameSection}} and {{INodeReferenceSection}} incorrectly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5981) PBImageXmlWriter generates malformed XML
[ https://issues.apache.org/jira/browse/HDFS-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5981: Hadoop Flags: Reviewed PBImageXmlWriter generates malformed XML Key: HDFS-5981 URL: https://issues.apache.org/jira/browse/HDFS-5981 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Attachments: HDFS-5981.000.patch, HDFS-5981.001.patch {{PBImageXmlWriter}} outputs malformed XML file because it closes the {{SnapshotDiffSection}}, {{NameSection}} and {{INodeReferenceSection}} incorrectly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5775) Consolidate the code for serialization in CacheManager
[ https://issues.apache.org/jira/browse/HDFS-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907341#comment-13907341 ] Hudson commented on HDFS-5775: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5197 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5197/]) Move HDFS-5768 and HDFS-5775 to Section 2.4.0 in CHANGES.txt (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1570302) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Consolidate the code for serialization in CacheManager -- Key: HDFS-5775 URL: https://issues.apache.org/jira/browse/HDFS-5775 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 3.0.0 Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5775.000.patch This jira proposes to consolidate the code that is responsible for serializing / deserializing cache manager state into a separate class, so that it is easier to introduce new code path to serialize the data using protobuf. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5982) Need to update snapshot manager when applying editlog for deleting a snapshottable directory
[ https://issues.apache.org/jira/browse/HDFS-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5982: Hadoop Flags: Reviewed +1 for the patch, pending Jenkins run with the new version. I confirmed locally that this version fixes the test failure. Thanks for the patch, Jing! Need to update snapshot manager when applying editlog for deleting a snapshottable directory Key: HDFS-5982 URL: https://issues.apache.org/jira/browse/HDFS-5982 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Tassapol Athiapinya Assignee: Jing Zhao Priority: Critical Attachments: HDFS-5982.000.patch, HDFS-5982.001.patch, HDFS-5982.001.patch Currently after deleting a snapshottable directory which does not have snapshots any more, we also remove the directory from the snapshottable directory list in SnapshotManager. This works fine when handling a delete request from user. However, when we apply the OP_DELETE editlog, FSDirectory#unprotectedDelete(String, long) is called, which does not contain the updating snapshot manager process. This may leave an non-existent inode id in the snapshottable directory list, and can even lead to FSImage corruption. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907373#comment-13907373 ] Haohui Mai commented on HDFS-5939: -- bq. But thinking about it a bit more, why can't we also let it also deal with RuntimeException? I will give it a try later today too. Do you agree? No. This is a fundamental contract of the RPC layer which defines what kinds of exceptions can be transferred over the wire. At least I don't think it is a good idea to address it in this jira. bq. In production, it's an error that user need to deal with I think. Agree. The client explicitly says this is an error, which the user needs to deal with anyway. What I'm yet to be convinced of is why you're logging it into the server log. As a rule of thumb, you only log operations on the server side if it provides valuable information for debugging or auditing. Otherwise it does no good but confuses the operator. (e.g., HADOOP-10274) bq. Of course, I can try to see if there is no datanode string in the exception's message. But what about if there is other IOException thrown and it also has the no datanode message. The exception is actually a public interface from the client's perspective. We need to be very conservative about it. We can kind of get away from it by throwing an {{IOException}} with a specific message. If the code throws a new type of exception, we need to reason through whether it'll introduce backward compatibility issues in the future.
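The compromise being suggested in this thread (a plain {{IOException}} carrying a specific message, with the unit test matching on that message rather than on a new public exception type) can be sketched as follows. The message text and method name here are illustrative, not the actual patch wording:

```java
import java.io.IOException;

public class NoDatanodeCheck {
    // Illustrative stand-in for the WebHDFS redirect-URI datanode selection.
    static void chooseDatanode(int liveDatanodes) throws IOException {
        if (liveDatanodes == 0) {
            throw new IOException(
                "Failed to find datanode, suggest to check cluster health.");
        }
    }

    public static void main(String[] args) {
        try {
            chooseDatanode(0);
            System.out.println("unexpected success");
        } catch (IOException e) {
            // A test can match on the message instead of a new exception type.
            System.out.println(
                e.getMessage().startsWith("Failed to find datanode"));
        }
    }
}
```

Keeping the signal in the message avoids widening the client-facing exception surface, at the cost of the message string becoming a de facto contract for tests.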
WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-5939.001.patch, HDFS-5939.002.patch When trying to access hdfs via webhdfs, and when datanode is dead, user will see an exception below without any clue that it's caused by dead datanode: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n must be positive"}} Need to fix the report to give user hint about dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907393#comment-13907393 ] Masatake Iwasaki commented on HDFS-5274: Hi [~stack], thanks for your review comments. bq. Have you tried it outside of the unit tests to make sure you get sensible looking spans and numbers? I checked the trace of putting and getting a big file by Zipkin today. There seems to be too many spans concerning DFSInputStream.read and DFSOutputStream.write. I will fix this in the next version of patch. Add Tracing to HDFS --- Key: HDFS-5274 URL: https://issues.apache.org/jira/browse/HDFS-5274 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Affects Versions: 2.1.1-beta Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, HDFS-5274-7.patch, Zipkin Trace a06e941b0172ec73.png, Zipkin Trace d0f0d66b8a258a69.png Since Google's Dapper paper has shown the benefits of tracing for a large distributed system, it seems like a good time to add tracing to HDFS. HBase has added tracing using HTrace. I propose that the same can be done within HDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HDFS-5865) Document 'FileDistribution' argument in 'hdfs oiv --processor' option
[ https://issues.apache.org/jira/browse/HDFS-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA reassigned HDFS-5865: --- Assignee: Akira AJISAKA Document 'FileDistribution' argument in 'hdfs oiv --processor' option - Key: HDFS-5865 URL: https://issues.apache.org/jira/browse/HDFS-5865 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: 3.0.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie The Offline Image Viewer document describes Currently valid options are {{Ls}}, {{XML}}, and {{Indented}} in {{--processor}} option, but now valid options are {{Ls}}, {{XML}}, and {{FileDistribution}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907404#comment-13907404 ] Tsz Wo (Nicholas), SZE commented on HDFS-5496: -- Hi Vinay, there are some replication related tests failed in https://builds.apache.org/job/PreCommit-HDFS-Build/6189//testReport/ . Could you take a look? Make replication queue initialization asynchronous -- Key: HDFS-5496 URL: https://issues.apache.org/jira/browse/HDFS-5496 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Kihwal Lee Assignee: Vinayakumar B Fix For: HDFS-5535 (Rolling upgrades) Attachments: HDFS-5496.patch, HDFS-5496.patch, HDFS-5496.patch, HDFS-5496.patch, HDFS-5496.patch Today, initialization of replication queues blocks safe mode exit and certain HA state transitions. For a big name space, this can take hundreds of seconds with the FSNamesystem write lock held. During this time, important requests (e.g. initial block reports, heartbeat, etc) are blocked. The effect of delaying the initialization would be not starting replication right away, but I think the benefit outweighs. If we make it asynchronous, the work per iteration should be limited, so that the lock duration is capped. If full/incremental block reports and any other requests that modifies block state properly performs replication checks while the blocks are scanned and the queues populated in background, every block will be processed. (Some may be done twice) The replication monitor should run even before all blocks are processed. This will allow namenode to exit safe mode and start serving immediately even with a big name space. It will also reduce the HA failover latency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5981) PBImageXmlWriter generates malformed XML
[ https://issues.apache.org/jira/browse/HDFS-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907446#comment-13907446 ] Chris Nauroth commented on HDFS-5981: - Thanks, Haohui. The patch looks good. Would it be possible to add a test that runs the XML processor on an fsimage containing snapshots, and then asserts that the output is well-formed XML? PBImageXmlWriter generates malformed XML Key: HDFS-5981 URL: https://issues.apache.org/jira/browse/HDFS-5981 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Attachments: HDFS-5981.000.patch, HDFS-5981.001.patch {{PBImageXmlWriter}} outputs a malformed XML file because it closes the {{SnapshotDiffSection}}, {{NameSection}}, and {{INodeReferenceSection}} incorrectly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
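A well-formedness assertion of the kind Chris suggests can be as simple as feeding the writer's output to a standard XML parser, which rejects mis-nested sections. A minimal sketch (the sample strings below are illustrative, not real PBImageXmlWriter output):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

/** Hypothetical helper: check that generated XML is well-formed by parsing it. */
public class XmlWellFormedness {
    /** Returns true iff the input parses as well-formed XML. */
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            // SAXException on mis-nested tags, among others
            return false;
        }
    }
}
```

A test would run the XML processor over an fsimage with snapshots, capture the output, and assert `isWellFormed(output)`.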
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907451#comment-13907451 ] stack commented on HDFS-5776: - Thanks for the numbers, [~xieliang007]. I ran some loadings yesterday and saw little discernible overall difference, in spite of my flushing the file system cache with regularity (good news: no errors). Today I was going to try and set up measurement of the 99th percentile, etc., but you did the work. Thanks. Hopefully the +1s still stand (if anything, this final patch is more conservative than the one that got the original +1s). I intend to commit this tomorrow unless there is an objection. I will then backport to branch-2. Support 'hedged' reads in DFSClient --- Key: HDFS-5776 URL: https://issues.apache.org/jira/browse/HDFS-5776 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v12.txt, HDFS-5776-v12.txt, HDFS-5776-v13.wip.txt, HDFS-5776-v14.txt, HDFS-5776-v15.txt, HDFS-5776-v17.txt, HDFS-5776-v17.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt, HDFS-5776v18.txt, HDFS-5776v21.txt This is a placeholder for the HDFS-related backport from https://issues.apache.org/jira/browse/HBASE-7509 The quorum read ability should be especially helpful to optimize read outliers. We can utilize dfs.dfsclient.quorum.read.threshold.millis and dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we could export the metric values of interest into the client system (e.g. HBase's regionserver metrics). The core logic is in the pread code path, where we decide whether to go to the original fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per the above config items.
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
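The pread decision the description outlines — go down the normal read path, and past a latency threshold also start a speculative read and take whichever finishes first — can be sketched with an ExecutorCompletionService. This is a schematic sketch, not the actual DFSClient implementation; the method shape is hypothetical.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch of a hedged read: fire a second attempt if the first
 *  has not completed within a threshold, then take the first result. */
public class HedgedRead {
    public static <T> T read(Supplier<T> primary, Supplier<T> backup,
                             long thresholdMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletionService<T> cs = new ExecutorCompletionService<>(pool);
            cs.submit(primary::get);
            // Wait up to the threshold for the primary attempt.
            Future<T> done = cs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
            if (done == null) {
                // Primary is slow: hedge with a second attempt, take whichever wins.
                cs.submit(backup::get);
                done = cs.take();
            }
            return done.get();
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The trade-off is the one discussed in the thread: hedging caps tail latency (the 99th percentile) at the cost of occasional duplicate reads, which is why it is gated behind the threshold and thread-pool-size config keys.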
[jira] [Commented] (HDFS-5924) Utilize OOB upgrade message processing for writes
[ https://issues.apache.org/jira/browse/HDFS-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907471#comment-13907471 ] Brandon Li commented on HDFS-5924: -- {quote}This feature does not guarantee all client writes to continue across restart. {quote} Would it cause data loss, especially when the only datanode, or more than one datanode, in the pipeline is shutting down for upgrade? Utilize OOB upgrade message processing for writes - Key: HDFS-5924 URL: https://issues.apache.org/jira/browse/HDFS-5924 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5924_RBW_RECOVERY.patch, HDFS-5924_RBW_RECOVERY.patch After HDFS-5585 and HDFS-5583, clients and datanodes can coordinate shutdown-restart in order to minimize failures or locality loss. In this jira, the HDFS client is made aware of the restart OOB ack and performs special write pipeline recovery. The datanode is also modified to load marked RBW replicas as RBW instead of RWR as long as the restart did not take long. A client considers doing this kind of recovery only when there is only one node left in the pipeline or the restarting node is a local datanode. For both clients and datanodes, the timeout or expiration is configurable, meaning this feature can be turned off by setting the timeout variables to 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5982) Need to update snapshot manager when applying editlog for deleting a snapshottable directory
[ https://issues.apache.org/jira/browse/HDFS-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907474#comment-13907474 ] Hadoop QA commented on HDFS-5982: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630107/HDFS-5982.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6192//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6192//console This message is automatically generated. 
Need to update snapshot manager when applying editlog for deleting a snapshottable directory Key: HDFS-5982 URL: https://issues.apache.org/jira/browse/HDFS-5982 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Tassapol Athiapinya Assignee: Jing Zhao Priority: Critical Attachments: HDFS-5982.000.patch, HDFS-5982.001.patch, HDFS-5982.001.patch Currently, after deleting a snapshottable directory that no longer has snapshots, we also remove the directory from the snapshottable directory list in SnapshotManager. This works fine when handling a delete request from a user. However, when we apply the OP_DELETE editlog, FSDirectory#unprotectedDelete(String, long) is called, which does not include the snapshot manager update. This may leave a non-existent inode id in the snapshottable directory list, and can even lead to FSImage corruption. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
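The fix direction the description implies — make the editlog-replay delete path run the same snapshottable-list cleanup as the user-RPC delete path — can be illustrated with a toy model. All names below are hypothetical; this is not the actual FSDirectory/SnapshotManager code.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model: deletion must update the snapshottable-directory list on BOTH
 *  the user-RPC path and the editlog-replay path, or the list keeps a stale
 *  inode id after a replay. */
public class SnapshottableListModel {
    private final Set<Long> inodes = new HashSet<>();
    private final Set<Long> snapshottableDirs = new HashSet<>();

    public void addSnapshottableDir(long inodeId) {
        inodes.add(inodeId);
        snapshottableDirs.add(inodeId);
    }

    /** Shared delete logic, used by both paths. */
    private void deleteInternal(long inodeId) {
        inodes.remove(inodeId);
        snapshottableDirs.remove(inodeId);  // the step the OP_DELETE replay path was missing
    }

    public void deleteViaRpc(long inodeId)   { deleteInternal(inodeId); }
    public void replayOpDelete(long inodeId) { deleteInternal(inodeId); }

    /** True if the list references an inode that no longer exists. */
    public boolean hasStaleSnapshottable() {
        for (long id : snapshottableDirs) {
            if (!inodes.contains(id)) return true;
        }
        return false;
    }
}
```

In the bug, only the RPC path did the second removal; after an edit-log replay the stale id survives and can then be written into the next FSImage.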
[jira] [Commented] (HDFS-5924) Utilize OOB upgrade message processing for writes
[ https://issues.apache.org/jira/browse/HDFS-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907520#comment-13907520 ] Kihwal Lee commented on HDFS-5924: -- If the delivery of the OOB ack was not successful due to a network or hardware issue and there was only one replica in the pipeline, the write will fail. This is no worse than the current behavior. Data loss typically refers to situations where data was successfully written, but a part or all of it becomes unavailable permanently. Here, it is different; the write simply fails. In short, OOB acking is used for the smoother upgrade process, but (1) this feature won't block shutdown indefinitely and (2) if an OOB ack is not delivered, things will fall back to the existing non-upgrade behavior. Utilize OOB upgrade message processing for writes - Key: HDFS-5924 URL: https://issues.apache.org/jira/browse/HDFS-5924 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5924_RBW_RECOVERY.patch, HDFS-5924_RBW_RECOVERY.patch After HDFS-5585 and HDFS-5583, clients and datanodes can coordinate shutdown-restart in order to minimize failures or locality loss. In this jira, the HDFS client is made aware of the restart OOB ack and performs special write pipeline recovery. The datanode is also modified to load marked RBW replicas as RBW instead of RWR as long as the restart did not take long. A client considers doing this kind of recovery only when there is only one node left in the pipeline or the restarting node is a local datanode. For both clients and datanodes, the timeout or expiration is configurable, meaning this feature can be turned off by setting the timeout variables to 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
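The client-side policy in the description — attempt restart-aware recovery only when one node is left in the pipeline or the restarting node is local, and treat a zero timeout as "feature off" — reduces to a small predicate. This is a sketch of the stated policy, not the actual DFSClient code; the names are hypothetical.

```java
/** Hypothetical sketch of the decision to wait for a restarting datanode. */
public class RestartRecoveryPolicy {
    /**
     * @param pipelineSize      datanodes remaining in the write pipeline
     * @param restartingIsLocal whether the restarting node is the local datanode
     * @param restartTimeoutMs  configured wait; 0 disables the feature
     */
    public static boolean shouldWaitForRestart(int pipelineSize,
                                               boolean restartingIsLocal,
                                               long restartTimeoutMs) {
        if (restartTimeoutMs <= 0) {
            return false;               // feature turned off by configuration
        }
        // Wait only when losing the node would fail the write (last node in
        // the pipeline) or when locality is worth preserving (local node).
        return pipelineSize == 1 || restartingIsLocal;
    }
}
```

In every other case the client falls back to ordinary pipeline recovery, which is the "no worse than the current behavior" guarantee in the comment above.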
[jira] [Created] (HDFS-5984) Fix TestEditLog and TestStandbyCheckpoints
Jing Zhao created HDFS-5984: --- Summary: Fix TestEditLog and TestStandbyCheckpoints Key: HDFS-5984 URL: https://issues.apache.org/jira/browse/HDFS-5984 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5984.000.patch This jira aims to fix current test failures in TestEditLog, TestStandbyCheckpoints, and TestBookKeeperHACheckpoints. These failures are caused by changes in the NameNode side. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5984) Fix TestEditLog and TestStandbyCheckpoints
[ https://issues.apache.org/jira/browse/HDFS-5984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5984: Attachment: HDFS-5984.000.patch A simple patch is attached. Fix TestEditLog and TestStandbyCheckpoints -- Key: HDFS-5984 URL: https://issues.apache.org/jira/browse/HDFS-5984 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5984.000.patch This jira aims to fix current test failures in TestEditLog, TestStandbyCheckpoints, and TestBookKeeperHACheckpoints. These failures are caused by changes in the NameNode side. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails
[ https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907526#comment-13907526 ] Kihwal Lee commented on HDFS-5983: -- This is related to HDFS-4001. The symptom seems different, though. TestSafeMode#testInitializeReplQueuesEarly fails Key: HDFS-5983 URL: https://issues.apache.org/jira/browse/HDFS-5983 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Attachments: testlog.txt It was seen in one of the precommit builds of HDFS-5962. The test case creates 15 blocks and then shuts down all datanodes. Then the namenode is restarted with a low safe block threshold and one datanode is restarted. The idea is that the initial block report from the restarted datanode will make the namenode leave the safemode and initialize the replication queues. According to the log, the datanode reported 3 blocks, but slightly before that the namenode did the repl queue init with 1 block. I will attach the log. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5981) PBImageXmlWriter generates malformed XML
[ https://issues.apache.org/jira/browse/HDFS-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907525#comment-13907525 ] Hadoop QA commented on HDFS-5981: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630112/HDFS-5981.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6193//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6193//console This message is automatically generated. PBImageXmlWriter generates malformed XML Key: HDFS-5981 URL: https://issues.apache.org/jira/browse/HDFS-5981 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Attachments: HDFS-5981.000.patch, HDFS-5981.001.patch, HDFS-5981.002.patch {{PBImageXmlWriter}} outputs malformed XML file because it closes the {{SnapshotDiffSection}}, {{NameSection}} and {{INodeReferenceSection}} incorrectly. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907524#comment-13907524 ] Suresh Srinivas commented on HDFS-5274: --- Is there any plan to make the HTrace libraries (org.cloudera.htrace.*) available in Hadoop Common? Add Tracing to HDFS --- Key: HDFS-5274 URL: https://issues.apache.org/jira/browse/HDFS-5274 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Affects Versions: 2.1.1-beta Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, HDFS-5274-7.patch, Zipkin Trace a06e941b0172ec73.png, Zipkin Trace d0f0d66b8a258a69.png Since Google's Dapper paper has shown the benefits of tracing for a large distributed system, it seems like a good time to add tracing to HDFS. HBase has added tracing using HTrace. I propose that the same can be done within HDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5981) PBImageXmlWriter generates malformed XML
[ https://issues.apache.org/jira/browse/HDFS-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5981: - Attachment: HDFS-5981.002.patch The v2 patch addresses the comments from Chris. PBImageXmlWriter generates malformed XML Key: HDFS-5981 URL: https://issues.apache.org/jira/browse/HDFS-5981 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Attachments: HDFS-5981.000.patch, HDFS-5981.001.patch, HDFS-5981.002.patch {{PBImageXmlWriter}} outputs malformed XML file because it closes the {{SnapshotDiffSection}}, {{NameSection}} and {{INodeReferenceSection}} incorrectly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Issue Comment Deleted] (HDFS-4001) TestSafeMode#testInitializeReplQueuesEarly may time out
[ https://issues.apache.org/jira/browse/HDFS-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-4001: - Comment: was deleted (was: unsubscribe ) TestSafeMode#testInitializeReplQueuesEarly may time out --- Key: HDFS-4001 URL: https://issues.apache.org/jira/browse/HDFS-4001 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Attachments: timeout.txt.gz Saw this failure on a recent branch-2 jenkins run, has also been seen on trunk. {noformat} java.util.concurrent.TimeoutException: Timed out waiting for condition at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:107) at org.apache.hadoop.hdfs.TestSafeMode.testInitializeReplQueuesEarly(TestSafeMode.java:191) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Issue Comment Deleted] (HDFS-4001) TestSafeMode#testInitializeReplQueuesEarly may time out
[ https://issues.apache.org/jira/browse/HDFS-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-4001: - Comment: was deleted (was: unsubscribe ) TestSafeMode#testInitializeReplQueuesEarly may time out --- Key: HDFS-4001 URL: https://issues.apache.org/jira/browse/HDFS-4001 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Attachments: timeout.txt.gz Saw this failure on a recent branch-2 jenkins run, has also been seen on trunk. {noformat} java.util.concurrent.TimeoutException: Timed out waiting for condition at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:107) at org.apache.hadoop.hdfs.TestSafeMode.testInitializeReplQueuesEarly(TestSafeMode.java:191) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (HDFS-5924) Utilize OOB upgrade message processing for writes
[ https://issues.apache.org/jira/browse/HDFS-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907520#comment-13907520 ] Kihwal Lee edited comment on HDFS-5924 at 2/20/14 9:27 PM: --- If the delivery of the OOB ack was not successful due to a network or hardware issue and there was only one replica in the pipeline, the write will fail. This is no worse than the current behavior. Data loss is typically referred to situations where data was successfully written, but a part or all of it becomes unavailable permanently. Here, it is different; the write simply fails. In short, OOB acking is used for the smoother upgrade process, but (1) this feature won't block shutdown indefinitely and (2) if an OOB ack is not delivered, things will fall back to the existing non-upgrade behavior. was (Author: kihwal): If the delivery of the OOB ack was not successful due to a network or hardware issue and there was only one replica in the pipeline, the write will fail. This is no worse than the current behavior. Data loss is typically referred to situations where data was successfully written, but a part or all of it becomes unavailable permanently. Here, it is different; the write simply fails. In short, OOB acking is used for the smoother upgrade process, but (1) this feature won't block shutdown indefinitely and (2) if an OOB ack not delivered, things will fall back to the existing non-upgrade behavior. Utilize OOB upgrade message processing for writes - Key: HDFS-5924 URL: https://issues.apache.org/jira/browse/HDFS-5924 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5924_RBW_RECOVERY.patch, HDFS-5924_RBW_RECOVERY.patch After HDFS-5585 and HDFS-5583, clients and datanodes can coordinate shutdown-restart in order to minimize failures or locality loss. 
In this jira, HDFS client is made aware of the restart OOB ack and perform special write pipeline recovery. Datanode is also modified to load marked RBW replicas as RBW instead of RWR as long as the restart did not take long. For clients, it considers doing this kind of recovery only when there is only one node left in the pipeline or the restarting node is a local datanode. For both clients and datanodes, the timeout or expiration is configurable, meaning this feature can be turned off by setting timeout variables to 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5981) PBImageXmlWriter generates malformed XML
[ https://issues.apache.org/jira/browse/HDFS-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5981: - Attachment: HDFS-5981.003.patch The v3 patch creates a snapshot in the unit tests. It also removes a redundant println statement. PBImageXmlWriter generates malformed XML Key: HDFS-5981 URL: https://issues.apache.org/jira/browse/HDFS-5981 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Attachments: HDFS-5981.000.patch, HDFS-5981.001.patch, HDFS-5981.002.patch, HDFS-5981.003.patch {{PBImageXmlWriter}} outputs a malformed XML file because it closes the {{SnapshotDiffSection}}, {{NameSection}}, and {{INodeReferenceSection}} incorrectly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5935) New Namenode UI FS browser should throw smarter error messages
[ https://issues.apache.org/jira/browse/HDFS-5935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907542#comment-13907542 ] Haohui Mai commented on HDFS-5935: -- The patch looks pretty good to me. One nit: {code} + if(jqxhr.responseJSON !== undefined) { +if(jqxhr.responseJSON.RemoteException !== undefined) { {code} I think you can combine the two if statements into one. New Namenode UI FS browser should throw smarter error messages -- Key: HDFS-5935 URL: https://issues.apache.org/jira/browse/HDFS-5935 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Travis Thompson Assignee: Travis Thompson Priority: Minor Attachments: HDFS-5935-1.patch, HDFS-5935-2.patch, HDFS-5935-3.patch When browsing using the new FS browser in the namenode, if I try to browse a folder that I don't have permission to view, it throws the error: {noformat} Failed to retreive data from /webhdfs/v1/system?op=LISTSTATUS, cause: Forbidden WebHDFS might be disabled. WebHDFS is required to browse the filesystem. {noformat} The reason I'm not allowed to see /system is because I don't have permission, not because WebHDFS is disabled. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5984) Fix TestEditLog and TestStandbyCheckpoints
[ https://issues.apache.org/jira/browse/HDFS-5984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5984: - Component/s: (was: ha) (was: hdfs-client) (was: datanode) Hadoop Flags: Reviewed +1 patch looks good. Fix TestEditLog and TestStandbyCheckpoints -- Key: HDFS-5984 URL: https://issues.apache.org/jira/browse/HDFS-5984 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5984.000.patch This jira aims to fix current test failures in TestEditLog, TestStandbyCheckpoints, and TestBookKeeperHACheckpoints. These failures are caused by changes in the NameNode side. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5984) Fix TestEditLog and TestStandbyCheckpoints
[ https://issues.apache.org/jira/browse/HDFS-5984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE resolved HDFS-5984. -- Resolution: Fixed Fix Version/s: HDFS-5535 (Rolling upgrades) I have committed this. Thanks, Jing! Fix TestEditLog and TestStandbyCheckpoints -- Key: HDFS-5984 URL: https://issues.apache.org/jira/browse/HDFS-5984 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: HDFS-5535 (Rolling upgrades) Attachments: HDFS-5984.000.patch This jira aims to fix current test failures in TestEditLog, TestStandbyCheckpoints, and TestBookKeeperHACheckpoints. These failures are caused by changes in the NameNode side. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5865) Document 'FileDistribution' argument in 'hdfs oiv --processor' option
[ https://issues.apache.org/jira/browse/HDFS-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5865: Attachment: HDFS-5865.patch Attaching a patch. Document 'FileDistribution' argument in 'hdfs oiv --processor' option - Key: HDFS-5865 URL: https://issues.apache.org/jira/browse/HDFS-5865 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: 3.0.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Attachments: HDFS-5865.patch The Offline Image Viewer document describes Currently valid options are {{Ls}}, {{XML}}, and {{Indented}} in {{--processor}} option, but now valid options are {{Ls}}, {{XML}}, and {{FileDistribution}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5865) Update OfflineImageViewer document
[ https://issues.apache.org/jira/browse/HDFS-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5865: Summary: Update OfflineImageViewer document (was: Document 'FileDistribution' argument in 'hdfs oiv --processor' option) Update OfflineImageViewer document -- Key: HDFS-5865 URL: https://issues.apache.org/jira/browse/HDFS-5865 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: 3.0.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Attachments: HDFS-5865.patch The Offline Image Viewer document describes Currently valid options are {{Ls}}, {{XML}}, and {{Indented}} in {{--processor}} option, but now valid options are {{Ls}}, {{XML}}, and {{FileDistribution}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5985) SimulatedFSDataset#disableAndPurgeTrashStorage should not throw UnsupportedOperationException
Jing Zhao created HDFS-5985: --- Summary: SimulatedFSDataset#disableAndPurgeTrashStorage should not throw UnsupportedOperationException Key: HDFS-5985 URL: https://issues.apache.org/jira/browse/HDFS-5985 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Instead, we can let the method do nothing. This can fix part of the failed unit tests in https://issues.apache.org/jira/browse/HDFS-5535?focusedCommentId=13906717&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13906717 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
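The proposed change amounts to replacing a throwing stub with a no-op in the test double. Schematically (the interface and method names here are illustrative, not the real FsDatasetSpi signature):

```java
/** Illustrative test double: an operation that is irrelevant to the simulation
 *  is implemented as a no-op rather than throwing
 *  UnsupportedOperationException, so unrelated tests do not fail when the
 *  code under test happens to call it. */
interface StorageOps {
    void disableAndPurgeTrash();
}

class SimulatedStorage implements StorageOps {
    @Override
    public void disableAndPurgeTrash() {
        // no-op: the simulated dataset keeps no trash storage to purge
    }
}
```

Throwing is the right default only when calling the method would indicate a test bug; here the call is a normal side effect of the rolling-upgrade code path, so a silent no-op is safer.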
[jira] [Updated] (HDFS-5985) SimulatedFSDataset#disableAndPurgeTrashStorage should not throw UnsupportedOperationException
[ https://issues.apache.org/jira/browse/HDFS-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5985: Priority: Minor (was: Major) SimulatedFSDataset#disableAndPurgeTrashStorage should not throw UnsupportedOperationException - Key: HDFS-5985 URL: https://issues.apache.org/jira/browse/HDFS-5985 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Instead, we can let the method do nothing. This can fix part of the failed unit tests in https://issues.apache.org/jira/browse/HDFS-5535?focusedCommentId=13906717page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13906717 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5985) SimulatedFSDataset#disableAndPurgeTrashStorage should not throw UnsupportedOperationException
[ https://issues.apache.org/jira/browse/HDFS-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5985: Attachment: HDFS-5985.000.patch SimulatedFSDataset#disableAndPurgeTrashStorage should not throw UnsupportedOperationException - Key: HDFS-5985 URL: https://issues.apache.org/jira/browse/HDFS-5985 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5985.000.patch Instead, we can let the method do nothing. This can fix part of the failed unit tests in https://issues.apache.org/jira/browse/HDFS-5535?focusedCommentId=13906717page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13906717 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5985) SimulatedFSDataset#disableAndPurgeTrashStorage should not throw UnsupportedOperationException
[ https://issues.apache.org/jira/browse/HDFS-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907575#comment-13907575 ] Kihwal Lee commented on HDFS-5985: -- +1 lgtm SimulatedFSDataset#disableAndPurgeTrashStorage should not throw UnsupportedOperationException - Key: HDFS-5985 URL: https://issues.apache.org/jira/browse/HDFS-5985 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5985.000.patch Instead, we can let the method do nothing. This can fix part of the failed unit tests in https://issues.apache.org/jira/browse/HDFS-5535?focusedCommentId=13906717page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13906717 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5985) SimulatedFSDataset#disableAndPurgeTrashStorage should not throw UnsupportedOperationException
[ https://issues.apache.org/jira/browse/HDFS-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907583#comment-13907583 ] Jing Zhao commented on HDFS-5985: - With the patch and the fix in HDFS-5984 all the failed unit tests can pass except TestOfflineEditsViewer. SimulatedFSDataset#disableAndPurgeTrashStorage should not throw UnsupportedOperationException - Key: HDFS-5985 URL: https://issues.apache.org/jira/browse/HDFS-5985 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5985.000.patch Instead, we can let the method do nothing. This can fix part of the failed unit tests in https://issues.apache.org/jira/browse/HDFS-5535?focusedCommentId=13906717page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13906717 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5865) Update OfflineImageViewer document
[ https://issues.apache.org/jira/browse/HDFS-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5865: Target Version/s: 2.4.0 (was: 3.0.0) Affects Version/s: (was: 3.0.0) 2.4.0 Status: Patch Available (was: Open) Update OfflineImageViewer document -- Key: HDFS-5865 URL: https://issues.apache.org/jira/browse/HDFS-5865 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: 2.4.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Attachments: HDFS-5865.patch OfflineImageViewer is renewed to handle the new format of fsimage by HDFS-5698 (fsimage in protobuf). We should document the following: * The tool can handle the layout version of Hadoop 2.4 and up. (If you want to handle an older version, you can use the OfflineImageViewer of Hadoop 2.3.) * Remove the deprecated options, such as the Delimited and Indented processors. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5865) Update OfflineImageViewer document
[ https://issues.apache.org/jira/browse/HDFS-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5865: Description: OfflineImageViewer is renewed to handle the new format of fsimage by HDFS-5698 (fsimage in protobuf). We should document followings: * The tool can handle the layout version of Hadoop 2.4 and up. (If you want to handle the older version, you can use OfflineImageViewer of Hadoop 2.3) * Remove deprecated options such as Delimited and Indented processor. was:The Offline Image Viewer document describes Currently valid options are {{Ls}}, {{XML}}, and {{Indented}} in {{--processor}} option, but now valid options are {{Ls}}, {{XML}}, and {{FileDistribution}}. Priority: Major (was: Minor) Update OfflineImageViewer document -- Key: HDFS-5865 URL: https://issues.apache.org/jira/browse/HDFS-5865 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: 3.0.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Attachments: HDFS-5865.patch OfflineImageViewer is renewed to handle the new format of fsimage by HDFS-5698 (fsimage in protobuf). We should document followings: * The tool can handle the layout version of Hadoop 2.4 and up. (If you want to handle the older version, you can use OfflineImageViewer of Hadoop 2.3) * Remove deprecated options such as Delimited and Indented processor. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5985) SimulatedFSDataset#disableAndPurgeTrashStorage should not throw UnsupportedOperationException
[ https://issues.apache.org/jira/browse/HDFS-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907591#comment-13907591 ] Kihwal Lee commented on HDFS-5985: -- Committed to the HDFS-5535 branch. Thanks, Jing for fixing this. SimulatedFSDataset#disableAndPurgeTrashStorage should not throw UnsupportedOperationException - Key: HDFS-5985 URL: https://issues.apache.org/jira/browse/HDFS-5985 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: HDFS-5535 (Rolling upgrades) Attachments: HDFS-5985.000.patch Instead, we can let the method do nothing. This can fix part of the failed unit tests in https://issues.apache.org/jira/browse/HDFS-5535?focusedCommentId=13906717page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13906717 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5985) SimulatedFSDataset#disableAndPurgeTrashStorage should not throw UnsupportedOperationException
[ https://issues.apache.org/jira/browse/HDFS-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee resolved HDFS-5985. -- Resolution: Fixed Fix Version/s: HDFS-5535 (Rolling upgrades) Hadoop Flags: Reviewed SimulatedFSDataset#disableAndPurgeTrashStorage should not throw UnsupportedOperationException - Key: HDFS-5985 URL: https://issues.apache.org/jira/browse/HDFS-5985 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: HDFS-5535 (Rolling upgrades) Attachments: HDFS-5985.000.patch Instead, we can let the method do nothing. This can fix part of the failed unit tests in https://issues.apache.org/jira/browse/HDFS-5535?focusedCommentId=13906717page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13906717 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5977) FSImageFormatPBINode does not respect -renameReserved upgrade flag
[ https://issues.apache.org/jira/browse/HDFS-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907595#comment-13907595 ] Suresh Srinivas commented on HDFS-5977: --- bq. For future upgrades, I think that this mechanism is no longer required. I think this mechanism may be required, as not everything can start with /.reserved. I think we should address this issue. We should also do another code review to ensure all the previous relevant fsimage related mechanism is available in protobuf based fsimage solution as well. FSImageFormatPBINode does not respect -renameReserved upgrade flag Key: HDFS-5977 URL: https://issues.apache.org/jira/browse/HDFS-5977 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang Labels: protobuf HDFS-5709 added a new upgrade flag -renameReserved which can be used to automatically rename reserved paths like /.reserved encountered during upgrade. The new protobuf loading code does not have a similar facility, so future reserved paths cannot be automatically renamed via -renameReserved. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5985) SimulatedFSDataset#disableAndPurgeTrashStorage should not throw UnsupportedOperationException
[ https://issues.apache.org/jira/browse/HDFS-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907602#comment-13907602 ] Arpit Agarwal commented on HDFS-5985: - Thanks for fixing this Jing. SimulatedFSDataset#disableAndPurgeTrashStorage should not throw UnsupportedOperationException - Key: HDFS-5985 URL: https://issues.apache.org/jira/browse/HDFS-5985 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: HDFS-5535 (Rolling upgrades) Attachments: HDFS-5985.000.patch Instead, we can let the method do nothing. This can fix part of the failed unit tests in https://issues.apache.org/jira/browse/HDFS-5535?focusedCommentId=13906717page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13906717 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907609#comment-13907609 ] Yongjun Zhang commented on HDFS-5939: - Thanks Haohui. Good info. All looks good to me, except I have one question: the case reported in this bug is that no datanode is running, which means an unhealthy cluster and definitely needs to catch the operator's attention. So I think it makes sense to log a message in the server log. Do you still think we don't need to log an error there? It could save the operator time when investigating the problem. What about making it a WARN instead of an ERROR in the server log? Thanks. --Yongjun WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-5939.001.patch, HDFS-5939.002.patch When trying to access hdfs via webhdfs while the datanode is dead, the user sees the exception below without any clue that it's caused by a dead datanode: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n must be positive"}} We need to fix the report to give the user a hint about the dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5986) Capture the number of blocks pending deletion on namenode webUI
Suresh Srinivas created HDFS-5986: - Summary: Capture the number of blocks pending deletion on namenode webUI Key: HDFS-5986 URL: https://issues.apache.org/jira/browse/HDFS-5986 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Suresh Srinivas When a directory that has a large number of directories and files is deleted, the namespace deletes the corresponding inodes immediately. However, it is hard to know when the invalidated blocks are actually deleted on the datanodes, which could take a while. I propose adding to the namenode webUI, along with under-replicated blocks, the number of blocks that are pending deletion. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5586) Add quick-restart option for datanode
[ https://issues.apache.org/jira/browse/HDFS-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee resolved HDFS-5586. -- Resolution: Duplicate Most of the planned changes will be covered after HDFS-5498. Some are still missing, but I don't think they are critical at this point. To name a few for later reference: - Quick registration with the NN. When the NN gets a registration request from a datanode that isn't dead (i.e., a restart), the blocks on the node are removed from the blocksmap and re-added when the initial block report is received. If the DN isn't going to change its content significantly and its identity (storage ID) stays the same, the NN may be better off keeping the block list for the DN and updating it a few minutes later when the block report is received. - The DN persisting more state so that it can start serving sooner. Even if a DN is up, it won't be able to serve clients before registering with the NN, because it cannot verify the block token. Saving the shared secret is risky, though. The quick DN registration change will lower the DN restart overhead on the NN, but reasonably paced DN rolling upgrades should still be acceptable even without it. It will be more useful in the case where DNs are restarted en masse, so I will not call it a necessary improvement for rolling upgrades. Add quick-restart option for datanode - Key: HDFS-5586 URL: https://issues.apache.org/jira/browse/HDFS-5586 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Kihwal Lee Assignee: Kihwal Lee This feature, combined with the graceful shutdown feature, will enable data nodes to come back up and start serving quickly. This is likely a command line option for the data node, which triggers it to look for saved state information in its local storage. If the information is present and reasonably up to date, the data node would skip some of the startup steps. 
Ideally it should be able to do quick registration without requiring removal of all blocks from the data node descriptor on the name node and reconstructing it with the initial full block report. This implies that all RBW blocks are recorded during shutdown and are not turned into RWR on start-up. Other than the quick registration, the name node should treat the restart as if a few heartbeats were lost from the node. There should be no unexpected replica state changes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5977) FSImageFormatPBINode does not respect -renameReserved upgrade flag
[ https://issues.apache.org/jira/browse/HDFS-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907623#comment-13907623 ] Haohui Mai commented on HDFS-5977: -- Thanks [~andrew.wang] and [~sureshms] for the info. Let me resolve this jira as later and keep it around. We can reopen this jira if we need to add another reserved path. FSImageFormatPBINode does not respect -renameReserved upgrade flag Key: HDFS-5977 URL: https://issues.apache.org/jira/browse/HDFS-5977 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang Labels: protobuf HDFS-5709 added a new upgrade flag -renameReserved which can be used to automatically rename reserved paths like /.reserved encountered during upgrade. The new protobuf loading code does not have a similar facility, so future reserved paths cannot be automatically renamed via -renameReserved. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint
[ https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5944: - Summary: LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint (was: LeaseManager:findLeaseWithPrefixPath didn't handle path like /a/b/ right cause SecondaryNameNode failed do checkpoint) LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint Key: HDFS-5944 URL: https://issues.apache.org/jira/browse/HDFS-5944 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, HDFS-5944.test.txt, HDFS-5944.trunk.patch In our cluster, we encountered error like this: java.io.IOException: saveLeases found path /XXX/20140206/04_30/_SUCCESS.slc.log but is not under construction. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:6217) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:607) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1004) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:949) What happened: Client A open file /XXX/20140206/04_30/_SUCCESS.slc.log for write. And Client A continue refresh it's lease. Client B deleted /XXX/20140206/04_30/ Client C open file /XXX/20140206/04_30/_SUCCESS.slc.log for write Client C closed the file /XXX/20140206/04_30/_SUCCESS.slc.log Then secondaryNameNode try to do checkpoint and failed due to failed to delete lease hold by Client A when Client B deleted /XXX/20140206/04_30/. 
The reason is a bug in findLeaseWithPrefixPath: int srclen = prefix.length(); if (p.length() == srclen || p.charAt(srclen) == Path.SEPARATOR_CHAR) { entries.put(entry.getKey(), entry.getValue()); } Here, when prefix is /XXX/20140206/04_30/ and p is /XXX/20140206/04_30/_SUCCESS.slc.log, p.charAt(srclen) is '_'. The fix is simple; I'll upload a patch later. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
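The boundary bug described above can be illustrated with a small, self-contained sketch. The class and helper names here are hypothetical simplifications, not the actual HDFS-5944 patch: the point is that the quoted check runs p.charAt(srclen) even when the prefix itself ends with the path separator, so "/a/b/" never matches "/a/b/file".

```java
public class PrefixMatch {
    static final char SEPARATOR_CHAR = '/'; // stands in for Path.SEPARATOR_CHAR

    // Hypothetical, simplified version of findLeaseWithPrefixPath's matching
    // logic. A sketch of a fix: a prefix that already ends with the separator
    // matches any path under it, and the charAt guard applies only otherwise.
    static boolean matches(String prefix, String p) {
        if (!p.startsWith(prefix)) {
            return false;
        }
        int srclen = prefix.length();
        return p.length() == srclen
            || prefix.charAt(srclen - 1) == SEPARATOR_CHAR
            || p.charAt(srclen) == SEPARATOR_CHAR;
    }

    public static void main(String[] args) {
        // The failing case from the report: prefix ends with '/', so the
        // original check saw charAt(srclen) == '_' and missed the lease.
        System.out.println(matches("/XXX/20140206/04_30/",
                                   "/XXX/20140206/04_30/_SUCCESS.slc.log")); // true with the fix
        System.out.println(matches("/a/b", "/a/bc")); // false: not a path component
    }
}
```

With the original check, the first call would return false, which is why the lease held by Client A survived the delete and later broke the checkpoint.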
[jira] [Commented] (HDFS-5986) Capture the number of blocks pending deletion on namenode webUI
[ https://issues.apache.org/jira/browse/HDFS-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907627#comment-13907627 ] Kihwal Lee commented on HDFS-5986: -- The jmx on the NN already has {{PendingDeletionBlocks}}, and [~wheat9] made the NN webUI render on the client side using the jmx data, so it should be a relatively simple change. Is {{PendingDeletionBlocks}} what we want, or is it something else? Capture the number of blocks pending deletion on namenode webUI --- Key: HDFS-5986 URL: https://issues.apache.org/jira/browse/HDFS-5986 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas When a directory that has a large number of directories and files is deleted, the namespace deletes the corresponding inodes immediately. However, it is hard to know when the invalidated blocks are actually deleted on the datanodes, which could take a while. I propose adding to the namenode webUI, along with under-replicated blocks, the number of blocks that are pending deletion. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
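The jmx counter mentioned above can be read directly from the NameNode's /jmx servlet. The hostname and port below are placeholders (50070 was the default NN HTTP port in Hadoop 2.x); the sample response here is a hand-written illustration, not captured output:

```shell
# Query the FSNamesystem bean on a live NameNode (host/port are assumptions):
#   curl -s 'http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'
# Filtering a sample of the JSON response for the counter discussed above:
cat <<'EOF' | grep -o '"PendingDeletionBlocks" : [0-9]*'
{ "beans" : [ {
  "name" : "Hadoop:service=NameNode,name=FSNamesystem",
  "PendingDeletionBlocks" : 42
} ] }
EOF
```

Since the new web UI already renders from this JSON client-side, surfacing the counter is mostly a matter of adding it to the page template.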
[jira] [Updated] (HDFS-5986) Capture the number of blocks pending deletion on namenode webUI
[ https://issues.apache.org/jira/browse/HDFS-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-5986: - Issue Type: Improvement (was: Bug) Seems like a decent idea to me. We should expose this as a metric as well, if not also in the NN web UI. Capture the number of blocks pending deletion on namenode webUI --- Key: HDFS-5986 URL: https://issues.apache.org/jira/browse/HDFS-5986 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas When a directory that has large number of directories and files are deleted, the namespace deletes the corresponding inodes immediately. However it is hard to to know when the invalidated blocks are actually deleted on the datanodes, which could take a while. I propose adding on namenode webUI, along with under replicated blocks, the number of blocks that are pending deletion. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5987) Fix findbugs warnings in Rolling Upgrade branch
Tsz Wo (Nicholas), SZE created HDFS-5987: Summary: Fix findbugs warnings in Rolling Upgrade branch Key: HDFS-5987 URL: https://issues.apache.org/jira/browse/HDFS-5987 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907615#comment-13907615 ] Haohui Mai commented on HDFS-5939: -- bq. The case reported in this bug is about no datanode is running, which is about unhealthy cluster and definitely need to catch operator's attention. So I think it makes sense to log a message in server log. Do you still think we don't need to log an error there? It could save the operator time to investigate the problem. Personally I think it is overkill. Note that if this happens, it means that either (1) all datanodes are dead, or (2) at least one block is missing (i.e., no datanode can serve it) in HDFS. Both the web UI and the monitoring applications (e.g., Ambari / CDH) would catch it much earlier, before the operator looks into the log. The log has little value since it cannot flag the error in the first place, nor does it provide sufficient information to reproduce the error (in this case only the client can reproduce it in a reliable way). WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-5939.001.patch, HDFS-5939.002.patch When trying to access hdfs via webhdfs while the datanode is dead, the user sees the exception below without any clue that it's caused by a dead datanode: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n must be positive"}} We need to fix the report to give the user a hint about the dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
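The "n must be positive" message in the report matches what java.util.Random#nextInt(int) throws for a non-positive bound, which is consistent with the NN picking a random datanode from an empty live list; tying it to the exact webhdfs code path is an inference, not something stated in this thread. A minimal sketch of the failure mode and a defensive check:

```java
import java.util.Random;

public class NoDatanodeDemo {
    public static void main(String[] args) {
        int liveDatanodes = 0; // no DNs registered, as in the reported cluster
        try {
            // Random.nextInt(int) rejects a non-positive bound with
            // IllegalArgumentException; on JDK 6/7 its message is
            // "n must be positive", the opaque text the webhdfs client saw.
            new Random().nextInt(liveDatanodes);
        } catch (IllegalArgumentException e) {
            System.out.println("raw error: " + e.getMessage());
        }
        // Checking the precondition first yields an actionable error instead:
        if (liveDatanodes <= 0) {
            System.out.println("No datanodes available in the cluster");
        }
    }
}
```

This is the gist of the fix being discussed: validate "are there any datanodes?" before the random selection so the client gets a meaningful message.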
[jira] [Updated] (HDFS-5987) Fix findbugs warnings in Rolling Upgrade branch
[ https://issues.apache.org/jira/browse/HDFS-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5987: - Description: {noformat} RV org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.restoreBlockFilesFromTrash(File) ignores exceptional return value of java.io.File.mkdirs() RV org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.restoreBlockFilesFromTrash(File) ignores exceptional return value of java.io.File.renameTo(File) RV org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$ReplicaFileDeleteTask.moveFiles() ignores exceptional return value of java.io.File.mkdirs() IS Inconsistent synchronization of org.apache.hadoop.hdfs.qjournal.server.Journal.committedTxnId; locked 92% of time NP Dereference of the result of readLine() without nullcheck in org.apache.hadoop.hdfs.util.MD5FileUtils.renameMD5File(File, File) {noformat} Fix findbugs warnings in Rolling Upgrade branch --- Key: HDFS-5987 URL: https://issues.apache.org/jira/browse/HDFS-5987 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor {noformat} RV org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.restoreBlockFilesFromTrash(File) ignores exceptional return value of java.io.File.mkdirs() RV org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.restoreBlockFilesFromTrash(File) ignores exceptional return value of java.io.File.renameTo(File) RV org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$ReplicaFileDeleteTask.moveFiles() ignores exceptional return value of java.io.File.mkdirs() ISInconsistent synchronization of org.apache.hadoop.hdfs.qjournal.server.Journal.committedTxnId; locked 92% of time NPDereference of the result of readLine() without nullcheck in org.apache.hadoop.hdfs.util.MD5FileUtils.renameMD5File(File, File) {noformat} -- This 
message was sent by Atlassian JIRA (v6.1.5#6160)
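The RV warnings listed above flag File.mkdirs() and File.renameTo() calls whose boolean results are ignored. A common way to fix them, sketched here as an illustration (this is not the actual h5987 patch), is to check the result and fail loudly:

```java
import java.io.File;
import java.io.IOException;

public class SafeFileOps {
    // Throwing on failure surfaces disk problems instead of silently
    // continuing with a missing directory or an un-renamed file.
    static void mkdirsChecked(File dir) throws IOException {
        // mkdirs() returns false both on failure and when the directory
        // already exists, so accept the already-exists case explicitly.
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Failed to create directory " + dir);
        }
    }

    static void renameChecked(File from, File to) throws IOException {
        if (!from.renameTo(to)) {
            throw new IOException("Failed to rename " + from + " to " + to);
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "hdfs5987-demo");
        mkdirsChecked(dir);
        File tmp = File.createTempFile("block", ".tmp", dir);
        renameChecked(tmp, new File(dir, "block.meta"));
        System.out.println(new File(dir, "block.meta").isFile());
    }
}
```

Findbugs stops reporting RV once the return value feeds a branch, and the restore/move paths in BlockPoolSliceStorage and FsDatasetAsyncDiskService get real error reporting.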
[jira] [Commented] (HDFS-5986) Capture the number of blocks pending deletion on namenode webUI
[ https://issues.apache.org/jira/browse/HDFS-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907636#comment-13907636 ] Suresh Srinivas commented on HDFS-5986: --- Yes. It is the PendingDeletionBlocksCount from invalidateBlocks. I like what @atm suggested as well. I do not think there is a metric corresponding to this. Capture the number of blocks pending deletion on namenode webUI --- Key: HDFS-5986 URL: https://issues.apache.org/jira/browse/HDFS-5986 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas When a directory that has a large number of directories and files is deleted, the namespace deletes the corresponding inodes immediately. However, it is hard to know when the invalidated blocks are actually deleted on the datanodes, which could take a while. I propose adding to the namenode webUI, along with under-replicated blocks, the number of blocks that are pending deletion. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5987) Fix findbugs warnings in Rolling Upgrade branch
[ https://issues.apache.org/jira/browse/HDFS-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5987: - Attachment: h5987_20140220.patch h5987_20140220.patch: fixes the findbugs warning and adds more cases to TestRollingUpgrade.testRollback(). Fix findbugs warnings in Rolling Upgrade branch --- Key: HDFS-5987 URL: https://issues.apache.org/jira/browse/HDFS-5987 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5987_20140220.patch {noformat} RV org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.restoreBlockFilesFromTrash(File) ignores exceptional return value of java.io.File.mkdirs() RV org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.restoreBlockFilesFromTrash(File) ignores exceptional return value of java.io.File.renameTo(File) RV org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$ReplicaFileDeleteTask.moveFiles() ignores exceptional return value of java.io.File.mkdirs() ISInconsistent synchronization of org.apache.hadoop.hdfs.qjournal.server.Journal.committedTxnId; locked 92% of time NPDereference of the result of readLine() without nullcheck in org.apache.hadoop.hdfs.util.MD5FileUtils.renameMD5File(File, File) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5977) FSImageFormatPBINode does not respect -renameReserved upgrade flag
[ https://issues.apache.org/jira/browse/HDFS-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai resolved HDFS-5977. -- Resolution: Later FSImageFormatPBINode does not respect -renameReserved upgrade flag Key: HDFS-5977 URL: https://issues.apache.org/jira/browse/HDFS-5977 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang Labels: protobuf HDFS-5709 added a new upgrade flag -renameReserved which can be used to automatically rename reserved paths like /.reserved encountered during upgrade. The new protobuf loading code does not have a similar facility, so future reserved paths cannot be automatically renamed via -renameReserved. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907640#comment-13907640 ] Yongjun Zhang commented on HDFS-5939: - Hi Haohui. At least we got a report from the field that we need to provide a better message so the user can quickly tell what's going on. I wonder if a WARN instead of an ERROR would be more acceptable? Thanks. WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-5939.001.patch, HDFS-5939.002.patch When trying to access hdfs via webhdfs while the datanode is dead, the user sees the exception below without any clue that it's caused by a dead datanode: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n must be positive"}} We need to fix the report to give the user a hint about the dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint
[ https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907644#comment-13907644 ] Brandon Li commented on HDFS-5944: -- I've committed the patch. LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint Key: HDFS-5944 URL: https://issues.apache.org/jira/browse/HDFS-5944 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Fix For: 2.4.0 Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, HDFS-5944.test.txt, HDFS-5944.trunk.patch In our cluster, we encountered error like this: java.io.IOException: saveLeases found path /XXX/20140206/04_30/_SUCCESS.slc.log but is not under construction. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:6217) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:607) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1004) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:949) What happened: Client A open file /XXX/20140206/04_30/_SUCCESS.slc.log for write. And Client A continue refresh it's lease. Client B deleted /XXX/20140206/04_30/ Client C open file /XXX/20140206/04_30/_SUCCESS.slc.log for write Client C closed the file /XXX/20140206/04_30/_SUCCESS.slc.log Then secondaryNameNode try to do checkpoint and failed due to failed to delete lease hold by Client A when Client B deleted /XXX/20140206/04_30/. The reason is a bug in findLeaseWithPrefixPath: int srclen = prefix.length(); if (p.length() == srclen || p.charAt(srclen) == Path.SEPARATOR_CHAR) { entries.put(entry.getKey(), entry.getValue()); } Here when prefix is /XXX/20140206/04_30/, and p is /XXX/20140206/04_30/_SUCCESS.slc.log, p.charAt(srcllen) is '_'. The fix is simple, I'll upload patch later. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint
[ https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5944: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint Key: HDFS-5944 URL: https://issues.apache.org/jira/browse/HDFS-5944 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, HDFS-5944.test.txt, HDFS-5944.trunk.patch In our cluster, we encountered error like this: java.io.IOException: saveLeases found path /XXX/20140206/04_30/_SUCCESS.slc.log but is not under construction. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:6217) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:607) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1004) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:949) What happened: Client A open file /XXX/20140206/04_30/_SUCCESS.slc.log for write. And Client A continue refresh it's lease. Client B deleted /XXX/20140206/04_30/ Client C open file /XXX/20140206/04_30/_SUCCESS.slc.log for write Client C closed the file /XXX/20140206/04_30/_SUCCESS.slc.log Then secondaryNameNode try to do checkpoint and failed due to failed to delete lease hold by Client A when Client B deleted /XXX/20140206/04_30/. The reason is a bug in findLeaseWithPrefixPath: int srclen = prefix.length(); if (p.length() == srclen || p.charAt(srclen) == Path.SEPARATOR_CHAR) { entries.put(entry.getKey(), entry.getValue()); } Here when prefix is /XXX/20140206/04_30/, and p is /XXX/20140206/04_30/_SUCCESS.slc.log, p.charAt(srcllen) is '_'. The fix is simple, I'll upload patch later. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint
[ https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5944: - Fix Version/s: 2.4.0 LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint Key: HDFS-5944 URL: https://issues.apache.org/jira/browse/HDFS-5944 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Fix For: 2.4.0 Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, HDFS-5944.test.txt, HDFS-5944.trunk.patch In our cluster, we encountered error like this: java.io.IOException: saveLeases found path /XXX/20140206/04_30/_SUCCESS.slc.log but is not under construction. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:6217) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:607) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1004) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:949) What happened: Client A open file /XXX/20140206/04_30/_SUCCESS.slc.log for write. And Client A continue refresh it's lease. Client B deleted /XXX/20140206/04_30/ Client C open file /XXX/20140206/04_30/_SUCCESS.slc.log for write Client C closed the file /XXX/20140206/04_30/_SUCCESS.slc.log Then secondaryNameNode try to do checkpoint and failed due to failed to delete lease hold by Client A when Client B deleted /XXX/20140206/04_30/. The reason is a bug in findLeaseWithPrefixPath: int srclen = prefix.length(); if (p.length() == srclen || p.charAt(srclen) == Path.SEPARATOR_CHAR) { entries.put(entry.getKey(), entry.getValue()); } Here when prefix is /XXX/20140206/04_30/, and p is /XXX/20140206/04_30/_SUCCESS.slc.log, p.charAt(srcllen) is '_'. The fix is simple, I'll upload patch later. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5951) Provide diagnosis information in the Web UI
[ https://issues.apache.org/jira/browse/HDFS-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907653#comment-13907653 ] Suresh Srinivas commented on HDFS-5951: --- I think the scope of this jira is probably misunderstood. The proposal is not to do away with the monitoring systems. I frequently see issues that could have been flagged by HDFS itself. To name a few: # Configuration issues #* Using /tmp for storage #* For a given cluster size, getting the ipc handler count, the number of datanode transceivers, or the ulimit for daemons wrong, etc. #* JVM heap size misconfiguration for the size of the cluster and for the number of objects, etc. # Flagging issues that need to be addressed, which are sometimes missed even with monitoring in place when alerts are categorized incorrectly or ignored: #* Checkpoints not happening (I know of instances where missing this has resulted in cluster startup times of over 18 hours!) #* Growth in editlog size or editlog. #* Corruption in fsimage and editlog checkpointing being silently ignored. Some of these are covered in best-practices documents that vendors put out or in hadoop operations related tech talks. Some of them can be covered in this WebUI, where the issues described can be flagged along with information on why they need to be addressed and how to address them. Provide diagnosis information in the Web UI --- Key: HDFS-5951 URL: https://issues.apache.org/jira/browse/HDFS-5951 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5951.000.patch, diagnosis-failure.png, diagnosis-succeed.png HDFS should provide operation statistics in its UI. It can go one step further by leveraging that information to diagnose common problems. -- This message was sent by Atlassian JIRA (v6.1.5#6160)