[jira] [Commented] (HDFS-5892) TestDeleteBlockPool fails in branch-2
[ https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903896#comment-13903896 ] dan dan zheng commented on HDFS-5892: - Here's a patch which addresses the issue. The cause of the intermittent failure is that the test sets name services in the configuration when starting the federation, but MiniDFSTopology generates the service ids without considering the name services set in the configuration. As a result, the BPOfferServices that get started are for ns1 and ns2, not for the ids set during the test ("namesServerId1,namesServerId2"). Later on, the test refreshes the service using the id namesServerId2, which starts that service for the first time; since ns1 and ns2 are no longer in the refresh list, they are stopped. The test fails when namesServerId2 has not completely started yet and the test tries to create the file /gamma; this race condition is why the failure is intermittent. The current (incorrect) log shows: 2014-02-13 22:14:02,489 INFO datanode.DataNode (BlockPoolManager.java:refreshNamenodes(148)) - Refresh request received for nameservices: ns1,ns2 2014-02-13 22:14:02,491 INFO datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(193)) - Starting BPOfferServices for nameservices: ns1,ns2 2014-02-13 22:51:40,326 INFO datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(193)) - Starting BPOfferServices for nameservices: namesServerId2 2014-02-13 22:51:40,327 INFO datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(211)) - Stopping BPOfferServices for nameservices: ns1,ns2 After applying the patch, MiniDFSTopology reads the name services from the configuration correctly, so the BPOfferServices are started for the correct nameservices. 
The correct log should be: 2014-02-13 22:14:02,489 INFO datanode.DataNode (BlockPoolManager.java:refreshNamenodes(148)) - Refresh request received for nameservices: namesServerId1,namesServerId2 2014-02-13 22:14:02,491 INFO datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(193)) - Starting BPOfferServices for nameservices: namesServerId1,namesServerId2 2014-02-13 22:51:40,327 INFO datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(211)) - Stopping BPOfferServices for nameservices: namesServerId1 > TestDeleteBlockPool fails in branch-2 > - > > Key: HDFS-5892 > URL: https://issues.apache.org/jira/browse/HDFS-5892 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Ted Yu >Priority: Minor > Attachments: > org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt > > > Running test suite on Linux, I got: > {code} > testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool) > Time elapsed: 8.143 sec <<< ERROR! > java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting... > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483) > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
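The start/stop bookkeeping in the logs above can be modeled with a minimal, self-contained sketch (plain Java, not the actual BlockPoolManager code; the class and method names are hypothetical, for illustration only): on a refresh, nameservice ids in the new list that are not yet running are started, and running ids missing from the new list are stopped. This is exactly why the datanode stopped ns1/ns2 and started namesServerId2 for the first time when the configured and generated ids disagreed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RefreshSketch {
    // Model of the refresh logic: services in the refreshed list that are not
    // already running are started; running services absent from the refreshed
    // list are stopped.
    static Map<String, List<String>> diff(Set<String> running, List<String> refreshed) {
        List<String> toStart = new ArrayList<>();
        for (String ns : refreshed) {
            if (!running.contains(ns)) {
                toStart.add(ns);
            }
        }
        List<String> toStop = new ArrayList<>();
        for (String ns : running) {
            if (!refreshed.contains(ns)) {
                toStop.add(ns);
            }
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        result.put("start", toStart);
        result.put("stop", toStop);
        return result;
    }

    public static void main(String[] args) {
        // Before the patch: the topology ignored the configured ids, so ns1/ns2
        // were running; refreshing with the test's ids restarts everything.
        Set<String> running = new HashSet<>(Arrays.asList("ns1", "ns2"));
        List<String> configured = Arrays.asList("namesServerId1", "namesServerId2");
        System.out.println(diff(running, configured));
    }
}
```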
[jira] [Updated] (HDFS-5892) TestDeleteBlockPool fails in branch-2
[ https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dan dan zheng updated HDFS-5892: Attachment: HDFS-5892.patch > TestDeleteBlockPool fails in branch-2 > - > > Key: HDFS-5892 > URL: https://issues.apache.org/jira/browse/HDFS-5892 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Ted Yu >Priority: Minor > Attachments: HDFS-5892.patch, > org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt > > > Running test suite on Linux, I got: > {code} > testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool) > Time elapsed: 8.143 sec <<< ERROR! > java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting... > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483) > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5959) Fix typo at section name in FSImageFormatProtobuf.java
[ https://issues.apache.org/jira/browse/HDFS-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903950#comment-13903950 ] Hudson commented on HDFS-5959: -- FAILURE: Integrated in Hadoop-Yarn-trunk #485 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/485/]) HDFS-5959. Fix typo at section name in FSImageFormatProtobuf.java. Contributed by Akira Ajisaka. (suresh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569156) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FSImageFormatPBSnapshot.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/LsrPBImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/PBImageXmlWriter.java > Fix typo at section name in FSImageFormatProtobuf.java > -- > > Key: HDFS-5959 > URL: https://issues.apache.org/jira/browse/HDFS-5959 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA >Priority: Minor > Labels: newbie > Fix For: 2.4.0 > > Attachments: HDFS-5959.patch > > > There's a typo "REFRENCE" > {code} > public enum SectionName { > NS_INFO("NS_INFO"), > STRING_TABLE("STRING_TABLE"), > INODE("INODE"), > INODE_REFRENCE("INODE_REFRENCE"), > SNAPSHOT("SNAPSHOT"), > {code} > should be "REFERENCE". -- This message was sent by Atlassian JIRA (v6.1.5#6160)
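The fix is the one-word rename the issue describes. A minimal, self-contained sketch of the corrected enum (the constant names come from the quoted snippet; the fromString helper and wrapper class are hypothetical additions so the example compiles on its own):

```java
public class SectionNameSketch {
    public enum SectionName {
        NS_INFO("NS_INFO"),
        STRING_TABLE("STRING_TABLE"),
        INODE("INODE"),
        INODE_REFERENCE("INODE_REFERENCE"), // was misspelled INODE_REFRENCE
        SNAPSHOT("SNAPSHOT");

        private final String name;

        SectionName(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }

        // Look up a section by its serialized name; returns null if unknown,
        // so the old misspelling no longer resolves to anything.
        public static SectionName fromString(String name) {
            for (SectionName section : values()) {
                if (section.name.equals(name)) {
                    return section;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(SectionName.fromString("INODE_REFERENCE"));
    }
}
```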
[jira] [Commented] (HDFS-5959) Fix typo at section name in FSImageFormatProtobuf.java
[ https://issues.apache.org/jira/browse/HDFS-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903989#comment-13903989 ] Hudson commented on HDFS-5959: -- ABORTED: Integrated in Hadoop-Hdfs-trunk #1677 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1677/]) HDFS-5959. Fix typo at section name in FSImageFormatProtobuf.java. Contributed by Akira Ajisaka. (suresh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569156) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FSImageFormatPBSnapshot.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/LsrPBImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/PBImageXmlWriter.java > Fix typo at section name in FSImageFormatProtobuf.java > -- > > Key: HDFS-5959 > URL: https://issues.apache.org/jira/browse/HDFS-5959 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA >Priority: Minor > Labels: newbie > Fix For: 2.4.0 > > Attachments: HDFS-5959.patch > > > There's a typo "REFRENCE" > {code} > public enum SectionName { > NS_INFO("NS_INFO"), > STRING_TABLE("STRING_TABLE"), > INODE("INODE"), > INODE_REFRENCE("INODE_REFRENCE"), > SNAPSHOT("SNAPSHOT"), > {code} > should be "REFERENCE". -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5529) { Disk Fail } Can we shutdown the DN when it meets disk failed condition
[ https://issues.apache.org/jira/browse/HDFS-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula resolved HDFS-5529. Resolution: Duplicate Closing since it will be handled as part of HDFS-2882 > { Disk Fail } Can we shutdown the DN when it meets disk failed condition > - > > Key: HDFS-5529 > URL: https://issues.apache.org/jira/browse/HDFS-5529 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula > > Scenario: > > I had configured two directories for the DataNode. > One directory did not have the required permissions, so the following exception > is thrown and an NPE occurs while sending the heartbeat: > {noformat} > 2013-11-19 17:35:26,599 FATAL > org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for > block pool Block pool BP-994471486-10.18.40.21-1384754500555 (storage id > DS-1184111760-10.18.40.38-50010-1384862726499) service to > HOST-10-18-91-26/10.18.40.21:8020 > org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed > volumes - current valid volumes: 1, volumes configured: 2, volumes failed: 1, > volume failures tolerated: 0 > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.(FsDatasetImpl.java:202) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:966) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:928) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:285) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664) > at java.lang.Thread.run(Thread.java:662) > 
2013-11-19 17:35:26,602 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Ending block pool service for: Block pool > BP-994471486-10.18.40.21-1384754500555 (storage id > DS-1184111760-10.18.40.38-50010-1384862726499) service to > HOST-10-18-91-26/10.18.40.21:8020 > 2013-11-19 17:35:26,602 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool BP-994471486-10.18.40.21-1384754500555 (storage id > DS-1184111760-10.18.40.38-50010-1384862726499) service to > linux-hadoop/10.18.40.14:8020 beginning handshake with NN > 2013-11-19 17:35:26,648 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool Block pool BP-994471486-10.18.40.21-1384754500555 (storage id > DS-1184111760-10.18.40.38-50010-1384862726499) service to > linux-hadoop/10.18.40.14:8020 successfully registered with NN > 2013-11-19 17:35:26,648 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > For namenode linux-hadoop/10.18.40.14:8020 using DELETEREPORT_INTERVAL of > 30 msec BLOCKREPORT_INTERVAL of 2160msec Initial delay: 0msec; > heartBeatInterval=3000 > 2013-11-19 17:35:26,649 ERROR > org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in BPOfferService > for Block pool BP-994471486-10.18.40.21-1384754500555 (storage id > DS-1184111760-10.18.40.38-50010-1384862726499) service to > linux-hadoop/10.18.40.14:8020 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:439) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676) > at java.lang.Thread.run(Thread.java:662) > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
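The "volume failures tolerated: 0" line in the exception above corresponds to the DataNode setting {{dfs.datanode.failed.volumes.tolerated}}, whose default of 0 makes any single volume failure fatal for block pool initialization. A sketch of the hdfs-site.xml entry that would let a DataNode with two configured data directories keep serving after one of them fails (the value 1 here is an example, not a recommendation):

```xml
<!-- hdfs-site.xml: tolerate up to one failed data directory before the
     DataNode refuses to start/serve the block pool -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>
```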
[jira] [Updated] (HDFS-5892) TestDeleteBlockPool fails in branch-2
[ https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-5892: - Status: Patch Available (was: Open) > TestDeleteBlockPool fails in branch-2 > - > > Key: HDFS-5892 > URL: https://issues.apache.org/jira/browse/HDFS-5892 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Ted Yu >Priority: Minor > Attachments: HDFS-5892.patch, > org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt > > > Running test suite on Linux, I got: > {code} > testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool) > Time elapsed: 8.143 sec <<< ERROR! > java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting... > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483) > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904104#comment-13904104 ] Kihwal Lee commented on HDFS-5780: -- +1 > TestRBWBlockInvalidation times out intermittently on branch-2 > > > Key: HDFS-5780 > URL: https://issues.apache.org/jira/browse/HDFS-5780 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.2.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-5780-v3.patch, HDFS-5780.patch, HDFS-5780.patch > > > I recently found out that the test > TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times > out intermittently. > I am using Fedora, JDK7 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5959) Fix typo at section name in FSImageFormatProtobuf.java
[ https://issues.apache.org/jira/browse/HDFS-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904109#comment-13904109 ] Hudson commented on HDFS-5959: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1702 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1702/]) HDFS-5959. Fix typo at section name in FSImageFormatProtobuf.java. Contributed by Akira Ajisaka. (suresh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569156) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FSImageFormatPBSnapshot.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/LsrPBImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/PBImageXmlWriter.java > Fix typo at section name in FSImageFormatProtobuf.java > -- > > Key: HDFS-5959 > URL: https://issues.apache.org/jira/browse/HDFS-5959 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA >Priority: Minor > Labels: newbie > Fix For: 2.4.0 > > Attachments: HDFS-5959.patch > > > There's a typo "REFRENCE" > {code} > public enum SectionName { > NS_INFO("NS_INFO"), > STRING_TABLE("STRING_TABLE"), > INODE("INODE"), > INODE_REFRENCE("INODE_REFRENCE"), > SNAPSHOT("SNAPSHOT"), > {code} > should be "REFERENCE". -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5780: - Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk and branch-2. Thanks for working on the issue, Mit. Thanks for the review, Arpit. > TestRBWBlockInvalidation times out intermittently on branch-2 > > > Key: HDFS-5780 > URL: https://issues.apache.org/jira/browse/HDFS-5780 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.2.0 >Reporter: Mit Desai >Assignee: Mit Desai > Fix For: 3.0.0, 2.4.0 > > Attachments: HDFS-5780-v3.patch, HDFS-5780.patch, HDFS-5780.patch > > > I recently found out that the test > TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times > out intermittently. > I am using Fedora, JDK7 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904121#comment-13904121 ] Kihwal Lee commented on HDFS-5225: -- [~lars_francke], what is the version of Hadoop you are using? > datanode keeps logging the same 'is no longer in the dataset' message over > and over again > - > > Key: HDFS-5225 > URL: https://issues.apache.org/jira/browse/HDFS-5225 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.1.1-beta >Reporter: Roman Shaposhnik >Assignee: Tsuyoshi OZAWA >Priority: Blocker > Attachments: HDFS-5225-reproduce.1.txt, HDFS-5225.1.patch, > HDFS-5225.2.patch > > > I was running the usual Bigtop testing on 2.1.1-beta RC1 with the following > configuration: 4 nodes fully distributed cluster with security on. > All of a sudden my DN ate up all of the space in /var/log logging the > following message repeatedly: > {noformat} > 2013-09-18 20:51:12,046 INFO > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: > BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369 is no longer > in the dataset > {noformat} > It wouldn't answer to a jstack and jstack -F ended up being useless. > Here's what I was able to find in the NameNode logs regarding this block ID: > {noformat} > fgrep -rI 'blk_1073742189' hadoop-hdfs-namenode-ip-10-224-158-152.log > 2013-09-18 18:03:16,972 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocateBlock: > /user/jenkins/testAppendInputWedSep18180222UTC2013/test4.fileWedSep18180222UTC2013._COPYING_. 
> BP-1884637155-10.224.158.152-1379524544853 > blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], > ReplicaUnderConstruction[10.34.74.206:1004|RBW], > ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} > 2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.224.158.152:1004 is added to > blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], > ReplicaUnderConstruction[10.34.74.206:1004|RBW], > ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0 > 2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.34.74.206:1004 is added to > blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], > ReplicaUnderConstruction[10.34.74.206:1004|RBW], > ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0 > 2013-09-18 18:03:17,224 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.83.107.80:1004 is added to > blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], > ReplicaUnderConstruction[10.34.74.206:1004|RBW], > ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0 > 2013-09-18 18:03:17,899 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369, > newGenerationStamp=1370, newLength=1048576, newNodes=[10.83.107.80:1004, > 10.34.74.206:1004, 10.224.158.152:1004], > clientName=DFSClient_NONMAPREDUCE_-450304083_1) > 2013-09-18 18:03:17,904 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369) > successfully to > BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370 > 
2013-09-18 18:03:18,540 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370, > newGenerationStamp=1371, newLength=2097152, newNodes=[10.83.107.80:1004, > 10.34.74.206:1004, 10.224.158.152:1004], > clientName=DFSClient_NONMAPREDUCE_-450304083_1) > 2013-09-18 18:03:18,548 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370) > successfully to > BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1371 > 2013-09-18 18:03:26,150 INFO BlockStateChange: BLOCK* addToInvalidates: > blk_1073742189_1371 10.83.107.80:1004 10.34.74.206:1004 10.224.158.152:1004 > 2013-09-18 18:03:26,847 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > InvalidateBlocks: ask 10.34.74.206:1004 to delete [blk_1073742178_1359, > blk_1073742183_1362, blk_1073742184_1363, blk_1073742186_1366, > blk_1073742188_1368, blk_1073742189_1371] > 2013-09-18 18:03:29,848 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > InvalidateBlocks: ask 10.224.158.152:1004 to delete [blk_1
[jira] [Commented] (HDFS-5892) TestDeleteBlockPool fails in branch-2
[ https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904124#comment-13904124 ] Hadoop QA commented on HDFS-5892: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629497/HDFS-5892.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6167//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6167//console This message is automatically generated. > TestDeleteBlockPool fails in branch-2 > - > > Key: HDFS-5892 > URL: https://issues.apache.org/jira/browse/HDFS-5892 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Ted Yu >Priority: Minor > Attachments: HDFS-5892.patch, > org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt > > > Running test suite on Linux, I got: > {code} > testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool) > Time elapsed: 8.143 sec <<< ERROR! > java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting... 
> at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483) > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904127#comment-13904127 ] Hudson commented on HDFS-5780: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5180 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5180/]) HDFS-5780. TestRBWBlockInvalidation times out intermittently. Contributed by Mit Desai. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569368) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java > TestRBWBlockInvalidation times out intermittently on branch-2 > > > Key: HDFS-5780 > URL: https://issues.apache.org/jira/browse/HDFS-5780 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.2.0 >Reporter: Mit Desai >Assignee: Mit Desai > Fix For: 3.0.0, 2.4.0 > > Attachments: HDFS-5780-v3.patch, HDFS-5780.patch, HDFS-5780.patch > > > I recently found out that the test > TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times > out intermittently. > I am using Fedora, JDK7 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904144#comment-13904144 ] Lars Francke commented on HDFS-5225: We're running CDH 4.5.0 which is using Hadoop 2.0. I see that a fix for this issue is in CDH 4.6 but that's not released yet. > datanode keeps logging the same 'is no longer in the dataset' message over > and over again > - > > Key: HDFS-5225 > URL: https://issues.apache.org/jira/browse/HDFS-5225 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.1.1-beta >Reporter: Roman Shaposhnik >Assignee: Tsuyoshi OZAWA >Priority: Blocker > Attachments: HDFS-5225-reproduce.1.txt, HDFS-5225.1.patch, > HDFS-5225.2.patch > > > I was running the usual Bigtop testing on 2.1.1-beta RC1 with the following > configuration: 4 nodes fully distributed cluster with security on. > All of a sudden my DN ate up all of the space in /var/log logging the > following message repeatedly: > {noformat} > 2013-09-18 20:51:12,046 INFO > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: > BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369 is no longer > in the dataset > {noformat} > It wouldn't answer to a jstack and jstack -F ended up being useless. > Here's what I was able to find in the NameNode logs regarding this block ID: > {noformat} > fgrep -rI 'blk_1073742189' hadoop-hdfs-namenode-ip-10-224-158-152.log > 2013-09-18 18:03:16,972 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocateBlock: > /user/jenkins/testAppendInputWedSep18180222UTC2013/test4.fileWedSep18180222UTC2013._COPYING_. 
> BP-1884637155-10.224.158.152-1379524544853 > blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], > ReplicaUnderConstruction[10.34.74.206:1004|RBW], > ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} > 2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.224.158.152:1004 is added to > blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], > ReplicaUnderConstruction[10.34.74.206:1004|RBW], > ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0 > 2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.34.74.206:1004 is added to > blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], > ReplicaUnderConstruction[10.34.74.206:1004|RBW], > ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0 > 2013-09-18 18:03:17,224 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.83.107.80:1004 is added to > blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], > ReplicaUnderConstruction[10.34.74.206:1004|RBW], > ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0 > 2013-09-18 18:03:17,899 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369, > newGenerationStamp=1370, newLength=1048576, newNodes=[10.83.107.80:1004, > 10.34.74.206:1004, 10.224.158.152:1004], > clientName=DFSClient_NONMAPREDUCE_-450304083_1) > 2013-09-18 18:03:17,904 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369) > successfully to > BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370 > 
2013-09-18 18:03:18,540 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370, > newGenerationStamp=1371, newLength=2097152, newNodes=[10.83.107.80:1004, > 10.34.74.206:1004, 10.224.158.152:1004], > clientName=DFSClient_NONMAPREDUCE_-450304083_1) > 2013-09-18 18:03:18,548 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370) > successfully to > BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1371 > 2013-09-18 18:03:26,150 INFO BlockStateChange: BLOCK* addToInvalidates: > blk_1073742189_1371 10.83.107.80:1004 10.34.74.206:1004 10.224.158.152:1004 > 2013-09-18 18:03:26,847 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > InvalidateBlocks: ask 10.34.74.206:1004 to delete [blk_1073742178_1359, > blk_1073742183_1362, blk_1073742184_1363, blk_1073742186_1366, > blk_1073742188_1368, blk_1073742189_1371] > 2013-09-18 18:03:29,848 INFO org.apache.hadoop.hdfs.StateChange:
[jira] [Commented] (HDFS-5803) TestBalancer.testBalancer0 fails
[ https://issues.apache.org/jira/browse/HDFS-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904157#comment-13904157 ] Kihwal Lee commented on HDFS-5803: -- I just wanted to make sure that the test timeout was not due to a regression in the core code. It looks like the trunk version has 3 extra test cases and {{testExitZeroOnSuccess}} accounts for most of the extra execution time. I did not see any sign of performance regression in the three common ones. +1 for the patch. > TestBalancer.testBalancer0 fails > > > Key: HDFS-5803 > URL: https://issues.apache.org/jira/browse/HDFS-5803 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Mit Desai >Assignee: Chen He > Attachments: HDFS-5803.patch > > > The test testBalancer0 fails on branch 2. Below is the stack trace > {noformat} > java.util.concurrent.TimeoutException: Cluster failed to reached expected > values of totalSpace (current: 1500, expected: 1500), or usedSpace (current: > 280, expected: 300), in more than 2 msec. > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForHeartBeat(TestBalancer.java:245) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancer(TestBalancer.java:375) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:359) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.twoNodeTest(TestBalancer.java:404) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0Internal(TestBalancer.java:448) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0(TestBalancer.java:442) > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5803) TestBalancer.testBalancer0 fails
[ https://issues.apache.org/jira/browse/HDFS-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5803: - Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2. Thanks for working on this, Chen. > TestBalancer.testBalancer0 fails > > > Key: HDFS-5803 > URL: https://issues.apache.org/jira/browse/HDFS-5803 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Mit Desai >Assignee: Chen He > Fix For: 3.0.0, 2.4.0 > > Attachments: HDFS-5803.patch > > > The test testBalancer0 fails on branch 2. Below is the stack trace > {noformat} > java.util.concurrent.TimeoutException: Cluster failed to reached expected > values of totalSpace (current: 1500, expected: 1500), or usedSpace (current: > 280, expected: 300), in more than 2 msec. > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForHeartBeat(TestBalancer.java:245) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancer(TestBalancer.java:375) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:359) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.twoNodeTest(TestBalancer.java:404) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0Internal(TestBalancer.java:448) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0(TestBalancer.java:442) > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5803) TestBalancer.testBalancer0 fails
[ https://issues.apache.org/jira/browse/HDFS-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904167#comment-13904167 ] Hudson commented on HDFS-5803: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5182 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5182/]) HDFS-5803. TestBalancer.testBalancer0 fails. Contributed by Chen He. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569391) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java > TestBalancer.testBalancer0 fails > > > Key: HDFS-5803 > URL: https://issues.apache.org/jira/browse/HDFS-5803 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Mit Desai >Assignee: Chen He > Fix For: 3.0.0, 2.4.0 > > Attachments: HDFS-5803.patch > > > The test testBalancer0 fails on branch 2. Below is the stack trace > {noformat} > java.util.concurrent.TimeoutException: Cluster failed to reached expected > values of totalSpace (current: 1500, expected: 1500), or usedSpace (current: > 280, expected: 300), in more than 2 msec. > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForHeartBeat(TestBalancer.java:245) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancer(TestBalancer.java:375) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:359) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.twoNodeTest(TestBalancer.java:404) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0Internal(TestBalancer.java:448) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0(TestBalancer.java:442) > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904244#comment-13904244 ] Arpit Agarwal commented on HDFS-5318: - @Eric Sirianni could you post a rebased patch? I reviewed this today and the changes look mostly fine. A couple of questions: # It looks like read-only storages don't get returned to clients for read. Is this intentional? # It would be nice to have an additional test to verify corrupt blocks on read-only storages don't get counted towards corrupt blocks. > Support read-only and read-write paths to shared replicas > - > > Key: HDFS-5318 > URL: https://issues.apache.org/jira/browse/HDFS-5318 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.3.0 >Reporter: Eric Sirianni > Attachments: HDFS-5318-trunk.patch, HDFS-5318.patch, > HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, > HDFS-5318c-branch-2.patch, hdfs-5318.pdf > > > There are several use cases for using shared-storage for datanode block > storage in an HDFS environment (storing cold blocks on a NAS device, Amazon > S3, etc.). > With shared-storage, there is a distinction between: > # a distinct physical copy of a block > # an access-path to that block via a datanode. > A single 'replication count' metric cannot accurately capture both aspects. > However, for most of the current uses of 'replication count' in the Namenode, > the "number of physical copies" aspect seems to be the appropriate semantic. > I propose altering the replication counting algorithm in the Namenode to > accurately infer distinct physical copies in a shared storage environment. > With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor > additional semantics to the {{StorageID}} - namely that multiple datanodes > attaching to the same physical shared storage pool should report the same > {{StorageID}} for that pool. 
A minor modification would be required in the > DataNode to enable the generation of {{StorageID}} s to be pluggable behind > the {{FsDatasetSpi}} interface. > With those semantics in place, the number of physical copies of a block in a > shared storage environment can be calculated as the number of _distinct_ > {{StorageID}} s associated with that block. > Consider the following combinations for two {{(DataNode ID, Storage ID)}} > pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: > * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* > physical replicas (i.e. the traditional HDFS case with local disks) > ** → Block B has {{ReplicationCount == 2}} > * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* > physical replica (e.g. HDFS datanodes mounting the same NAS share) > ** → Block B has {{ReplicationCount == 1}} > For example, if block B has the following location tuples: > * {{DN_1, STORAGE_A}} > * {{DN_2, STORAGE_A}} > * {{DN_3, STORAGE_B}} > * {{DN_4, STORAGE_B}}, > the effect of this proposed change would be to calculate the replication > factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
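The counting rule described above can be sketched as follows. This is an illustrative helper only, not code from the attached patches, and representing each block location as a (datanode ID, storage ID) string pair is an assumption made for the example: the number of physical copies is the number of distinct storage IDs among the reported locations.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only (not from the attached patches): the
// "number of physical copies" of a block is the number of distinct
// storage IDs among its reported (datanode ID, storage ID) pairs.
final class SharedStorageReplication {
    static int physicalReplication(List<String[]> locations) {
        Set<String> distinctStorageIds = new HashSet<>();
        for (String[] pair : locations) {
            // pair[0] = datanode ID, pair[1] = storage ID
            distinctStorageIds.add(pair[1]);
        }
        return distinctStorageIds.size();
    }
}
```

Applied to the four location tuples from the example above, the helper reports 2 physical copies rather than 4.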
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904247#comment-13904247 ] Arpit Agarwal commented on HDFS-5318: - Tag [~sirianni] > Support read-only and read-write paths to shared replicas > - > > Key: HDFS-5318 > URL: https://issues.apache.org/jira/browse/HDFS-5318 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.3.0 >Reporter: Eric Sirianni > Attachments: HDFS-5318-trunk.patch, HDFS-5318.patch, > HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, > HDFS-5318c-branch-2.patch, hdfs-5318.pdf > > > There are several use cases for using shared-storage for datanode block > storage in an HDFS environment (storing cold blocks on a NAS device, Amazon > S3, etc.). > With shared-storage, there is a distinction between: > # a distinct physical copy of a block > # an access-path to that block via a datanode. > A single 'replication count' metric cannot accurately capture both aspects. > However, for most of the current uses of 'replication count' in the Namenode, > the "number of physical copies" aspect seems to be the appropriate semantic. > I propose altering the replication counting algorithm in the Namenode to > accurately infer distinct physical copies in a shared storage environment. > With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor > additional semantics to the {{StorageID}} - namely that multiple datanodes > attaching to the same physical shared storage pool should report the same > {{StorageID}} for that pool. A minor modification would be required in the > DataNode to enable the generation of {{StorageID}} s to be pluggable behind > the {{FsDatasetSpi}} interface. > With those semantics in place, the number of physical copies of a block in a > shared storage environment can be calculated as the number of _distinct_ > {{StorageID}} s associated with that block. 
> Consider the following combinations for two {{(DataNode ID, Storage ID)}} > pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: > * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* > physical replicas (i.e. the traditional HDFS case with local disks) > ** → Block B has {{ReplicationCount == 2}} > * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* > physical replica (e.g. HDFS datanodes mounting the same NAS share) > ** → Block B has {{ReplicationCount == 1}} > For example, if block B has the following location tuples: > * {{DN_1, STORAGE_A}} > * {{DN_2, STORAGE_A}} > * {{DN_3, STORAGE_B}} > * {{DN_4, STORAGE_B}}, > the effect of this proposed change would be to calculate the replication > factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath didn't handle path like /a/b/ right cause SecondaryNameNode failed do checkpoint
[ https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904249#comment-13904249 ] Benoy Antony commented on HDFS-5944: Good job finding and fixing this bug, [~zhaoyunjiong]. Could there be multiple trailing "/" characters? If so, removing only the last character may not be enough. > LeaseManager:findLeaseWithPrefixPath didn't handle path like /a/b/ right > cause SecondaryNameNode failed do checkpoint > - > > Key: HDFS-5944 > URL: https://issues.apache.org/jira/browse/HDFS-5944 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 1.2.0, 2.2.0 >Reporter: zhaoyunjiong >Assignee: zhaoyunjiong > Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, > HDFS-5944.test.txt > > > In our cluster, we encountered an error like this: > java.io.IOException: saveLeases found path > /XXX/20140206/04_30/_SUCCESS.slc.log but is not under construction. > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:6217) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:607) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1004) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:949) > What happened: > Client A opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write, > and Client A continued to refresh its lease. > Client B deleted /XXX/20140206/04_30/ > Client C opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write > Client C closed the file /XXX/20140206/04_30/_SUCCESS.slc.log > Then the SecondaryNameNode tried to do a checkpoint and failed because it could not > delete the lease held by Client A when Client B deleted /XXX/20140206/04_30/.
> The reason is a bug in findLeaseWithPrefixPath:
> int srclen = prefix.length();
> if (p.length() == srclen || p.charAt(srclen) == Path.SEPARATOR_CHAR) {
>   entries.put(entry.getKey(), entry.getValue());
> }
> Here, when prefix is /XXX/20140206/04_30/ and p is /XXX/20140206/04_30/_SUCCESS.slc.log, p.charAt(srclen) is '_'.
> The fix is simple; I'll upload a patch later. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
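A minimal sketch of the fix being discussed, allowing for the case raised in the comment above that a prefix may carry more than one trailing separator. The class and method names here are hypothetical, not the actual LeaseManager code: strip all trailing separators from the prefix, then require that the character after the prefix (if any) is a separator.

```java
// Hypothetical helper illustrating the discussed fix; not the actual
// LeaseManager.findLeaseWithPrefixPath implementation.
final class LeasePrefixCheck {
    static final char SEPARATOR = '/';

    // Remove all trailing separators, handling "/a/b//" as well as "/a/b/".
    static String stripTrailingSeparators(String prefix) {
        int end = prefix.length();
        while (end > 1 && prefix.charAt(end - 1) == SEPARATOR) {
            end--;
        }
        return prefix.substring(0, end);
    }

    // True if path p lies under the directory named by prefix.
    static boolean isUnderPrefix(String p, String prefix) {
        prefix = stripTrailingSeparators(prefix);
        if (prefix.length() == 1 && prefix.charAt(0) == SEPARATOR) {
            return p.startsWith(prefix); // everything is under the root
        }
        if (!p.startsWith(prefix)) {
            return false;
        }
        int srclen = prefix.length();
        // Boundary check: either an exact match, or the next character
        // is a separator (so "/a/b" does not match "/a/bc").
        return p.length() == srclen || p.charAt(srclen) == SEPARATOR;
    }
}
```

With this check, the lease for /XXX/20140206/04_30/_SUCCESS.slc.log is found under the prefix /XXX/20140206/04_30/, which is exactly the case the original code missed.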
[jira] [Commented] (HDFS-5958) One very large node in a cluster prevents balancer from balancing data
[ https://issues.apache.org/jira/browse/HDFS-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904364#comment-13904364 ] Suresh Srinivas commented on HDFS-5958: --- [~kovyrin], can you please share any logs you may have for this issue? > One very large node in a cluster prevents balancer from balancing data > -- > > Key: HDFS-5958 > URL: https://issues.apache.org/jira/browse/HDFS-5958 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.2.0 > Environment: Hadoop cluster with 4 nodes: 3 with 500Gb drives and one > with 4Tb drive. >Reporter: Alexey Kovyrin > > In a cluster with a set of small nodes and one much larger node, the balancer > always selects the large node as the target even though it already has a copy > of each block in the cluster. > This causes the balancer to enter an infinite loop and stop balancing other > nodes, because each balancing iteration selects the same target and then cannot > find a single block to move. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5893) HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates
[ https://issues.apache.org/jira/browse/HDFS-5893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904403#comment-13904403 ] Jing Zhao commented on HDFS-5893: - +1. I will commit the patch shortly. > HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory > which does not import SSL certificates > > > Key: HDFS-5893 > URL: https://issues.apache.org/jira/browse/HDFS-5893 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Haohui Mai > Attachments: HDFS-5893.000.patch > > > When {{HftpFileSystem}} tries to get the data, it creates a > {{RangeHeaderUrlOpener}} object to open an HTTP/HTTPS connection to the NN. > However, {{HftpFileSystem.RangeHeaderUrlOpener}} uses the default > URLConnectionFactory. It does not import the SSL certificates from > ssl-client.xml. Therefore {{HsftpFileSystem}} fails. > To fix this bug, {{HftpFileSystem.RangeHeaderUrlOpener}} needs to use the > same {{URLConnectionFactory}} as the one used by {{HftpFileSystem}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5956) A file size is multiplied by the replication factor in 'hdfs oiv -p FileDistribution' option
[ https://issues.apache.org/jira/browse/HDFS-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904416#comment-13904416 ] Haohui Mai commented on HDFS-5956: -- The patch mostly looks good. Some minor comments: {code} +long maxFileSize = 0; +for (FileStatus fs : writtenFiles.values()) { + maxFileSize = Math.max(maxFileSize, fs.getLen()); +} {code} You can use {{Collections.max}} instead. nit: can you change the name of the test (i.e., {{testFileDistributionVisitor}}) in this patch as well? > A file size is multiplied by the replication factor in 'hdfs oiv -p > FileDistribution' option > > > Key: HDFS-5956 > URL: https://issues.apache.org/jira/browse/HDFS-5956 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Affects Versions: 3.0.0 >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Labels: newbie > Attachments: HDFS-5956.patch > > > In FileDistributionCalculator.java, > {code} > long fileSize = 0; > for (BlockProto b : f.getBlocksList()) { > fileSize += b.getNumBytes() * f.getReplication(); > } > maxFileSize = Math.max(fileSize, maxFileSize); > totalSpace += fileSize; > {code} > should be > {code} > long fileSize = 0; > for (BlockProto b : f.getBlocksList()) { > fileSize += b.getNumBytes(); > } > maxFileSize = Math.max(fileSize, maxFileSize); > totalSpace += fileSize * f.getReplication(); > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
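The {{Collections.max}} suggestion in the review above amounts to the following sketch. A plain list of lengths stands in for the {{FileStatus.getLen()}} values of {{writtenFiles}}, so this is illustrative rather than the patch itself:

```java
import java.util.Collections;
import java.util.List;

// Sketch of the review suggestion: Collections.max replaces the manual
// max-tracking loop over file lengths. The List<Long> stands in for the
// FileStatus.getLen() values of the writtenFiles map in the test.
final class MaxFileSizeExample {
    static long maxFileSize(List<Long> lengths) {
        // Equivalent to the quoted loop:
        //   for (FileStatus fs : writtenFiles.values())
        //       maxFileSize = Math.max(maxFileSize, fs.getLen());
        return Collections.max(lengths);
    }
}
```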
[jira] [Updated] (HDFS-5893) HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates
[ https://issues.apache.org/jira/browse/HDFS-5893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5893: Resolution: Fixed Fix Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk, branch-2 and branch-2.4. > HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory > which does not import SSL certificates > > > Key: HDFS-5893 > URL: https://issues.apache.org/jira/browse/HDFS-5893 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Haohui Mai > Fix For: 2.4.0 > > Attachments: HDFS-5893.000.patch > > > When {{HftpFileSystem}} tries to get the data, it create a > {{RangeHeaderUrlOpener}} object to open a HTTP / HTTPS connection to the NN. > However, {{HftpFileSystem.RangeHeaderUrlOpener}} uses the default > URLConnectionFactory. It does not import the SSL certificates from > ssl-client.xml. Therefore {{HsftpFileSystem}} fails. > To fix this bug, {{HftpFileSystem.RangeHeaderUrlOpener}} needs to use the > same {{URLConnectionFactory}} as the one used by {{HftpFileSystem}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5893) HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates
[ https://issues.apache.org/jira/browse/HDFS-5893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904429#comment-13904429 ] Hudson commented on HDFS-5893: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5184 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5184/]) HDFS-5893. HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates. Contributed by Haohui Mai. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569477) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileDataServlet.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/HftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestByteRangeInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestHttpsFileSystem.java > HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory > which does not import SSL certificates > > > Key: HDFS-5893 > URL: https://issues.apache.org/jira/browse/HDFS-5893 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Haohui Mai > Fix For: 2.4.0 > > Attachments: HDFS-5893.000.patch > > > When {{HftpFileSystem}} tries to get the data, it create a > {{RangeHeaderUrlOpener}} object to open a HTTP / HTTPS connection to the NN. > However, {{HftpFileSystem.RangeHeaderUrlOpener}} uses the default > URLConnectionFactory. It does not import the SSL certificates from > ssl-client.xml. Therefore {{HsftpFileSystem}} fails. > To fix this bug, {{HftpFileSystem.RangeHeaderUrlOpener}} needs to use the > same {{URLConnectionFactory}} as the one used by {{HftpFileSystem}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5958) One very large node in a cluster prevents balancer from balancing data
[ https://issues.apache.org/jira/browse/HDFS-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904430#comment-13904430 ] Alexey Kovyrin commented on HDFS-5958: -- [~sureshms], here is a piece of my log from the balancer: https://gist.github.com/kovyrin/9077741/raw/a30429b213fc4a5faca40f96c54f01d52c60706e/gistfile1.txt Here is a screenshot with all the nodes in the cluster: http://snap.kovyrin.net/Hadoop_NameNode%C2%A0ops01.dal05.swiftype.net_8020-20140218-141308.jpg name to address map: {code} 10.84.56.2work01 10.60.120.8 work02 10.84.56.10 work03 10.84.56.12 logs01 10.80.72.204 backup01 {code} > One very large node in a cluster prevents balancer from balancing data > -- > > Key: HDFS-5958 > URL: https://issues.apache.org/jira/browse/HDFS-5958 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.2.0 > Environment: Hadoop cluster with 4 nodes: 3 with 500Gb drives and one > with 4Tb drive. >Reporter: Alexey Kovyrin > > In a cluster with a set of small nodes and one much larger node balancer > always selects the large node as the target even though it already has a copy > of each block in the cluster. > This causes the balancer to enter an infinite loop and stop balancing other > nodes because each balancing iteration selects the same target and then could > not find a single block to move. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage
[ https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904434#comment-13904434 ] Haohui Mai commented on HDFS-5952: -- Is it okay to use the XML-based tool for debugging? Otherwise you'll end up duplicating the code in {{PBImageXmlWriter}} to parse the fsimage. Note that the XML / delimited formats are intended to capture all internal details of the fsimage. I understand that the delimited format is more compact than the XML one. The delimited format does not include a schema, so it could be problematic when the format of the fsimage changes. Unfortunately we change the fsimage format quite often. :-( If you really want to output in delimited format, I think it might be easier to take the output of {{PBImageXmlWriter}} and use SAX to convert the XML into the delimited format. It should work fairly efficiently. > Create a tool to run data analysis on the PB format fsimage > --- > > Key: HDFS-5952 > URL: https://issues.apache.org/jira/browse/HDFS-5952 > Project: Hadoop HDFS > Issue Type: Improvement > Components: tools >Affects Versions: 3.0.0 >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > > The Delimited processor in OfflineImageViewer is not supported after HDFS-5698 > was merged. > The motivation of the delimited processor is to run data analysis on the fsimage; > therefore, there might be more value in creating a tool for Hive or Pig that > reads the PB format fsimage directly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
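The SAX route suggested above could look roughly like the sketch below. The element names (inode, path, replication) and the tab-delimited output are assumptions made for illustration; the real {{PBImageXmlWriter}} output should be checked before relying on them.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hedged sketch of the SAX approach: stream the XML once and emit one
// tab-delimited row per <inode>. Element names are assumed, not taken
// from the actual PBImageXmlWriter output.
final class XmlToDelimited {
    static String convert(String xml) {
        final StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private String path, replication;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                text.setLength(0); // collect fresh character data per element
            }

            @Override
            public void characters(char[] ch, int start, int len) {
                text.append(ch, start, len);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("path".equals(qName)) {
                    path = text.toString();
                } else if ("replication".equals(qName)) {
                    replication = text.toString();
                } else if ("inode".equals(qName)) {
                    out.append(path).append('\t').append(replication).append('\n');
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return out.toString();
    }
}
```

Because SAX streams the document instead of building a DOM, this scales to large fsimage XML dumps, which is why it "should work fairly efficiently" as noted above.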
[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904460#comment-13904460 ] Haohui Mai commented on HDFS-5939: --
{code}
@@ -712,6 +712,9 @@ private Node chooseRandom(String scope, String excludedScope){
         numOfDatanodes -= ((InnerNode)node).getNumOfLeaves();
       }
     }
+    if (numOfDatanodes == 0) {
+      return null;
+    }
     int leaveIndex = r.nextInt(numOfDatanodes);
     return innerNode.getLeaf(leaveIndex, node);
   }
{code}
This change affects a couple of downstream callers, for example {{BlockPlacementByDefault}}. I think we need to file a separate jira for this change so that the callers are aware that the function can return {{null}}. > WebHdfs returns misleading error code and logs nothing if trying to create a > file with no DNs in cluster > > > Key: HDFS-5939 > URL: https://issues.apache.org/jira/browse/HDFS-5939 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.3.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-5939.001.patch > > > When trying to access HDFS via WebHDFS when a datanode is dead, the user will > see the exception below without any clue that it's caused by the dead datanode: > $ curl -i -X PUT > ".../webhdfs/v1/t1?op=CREATE&user.name=&overwrite=false" > ... > {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n > must be positive"}} > We need to fix the report to give the user a hint about the dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
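For context, the misleading "n must be positive" message in HDFS-5939 comes from {{java.util.Random.nextInt(n)}}, which rejects a non-positive bound. Below is a standalone sketch of the guard in the quoted diff; the helper is hypothetical and is not the actual NetworkTopology code.

```java
import java.util.Random;

// Hypothetical standalone illustration (not the actual NetworkTopology
// code): Random.nextInt(n) throws IllegalArgumentException when n <= 0,
// so the caller must bail out before drawing an index when no datanodes
// remain in scope.
final class ChooseRandomGuard {
    // Returns a random index in [0, numOfDatanodes), or null when the
    // scope contains no datanodes; callers must handle the null case.
    static Integer chooseRandomIndex(Random r, int numOfDatanodes) {
        if (numOfDatanodes == 0) {
            return null;
        }
        return r.nextInt(numOfDatanodes);
    }
}
```

This also shows why the comment above asks for a separate JIRA: every caller that previously assumed a non-null node now needs to handle the null return explicitly.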
[jira] [Updated] (HDFS-5945) Add rolling upgrade information to fsimage
[ https://issues.apache.org/jira/browse/HDFS-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5945: Attachment: HDFS-5945.protobuf.patch Make a small change to the latest patch to use protobuf based fsimage. > Add rolling upgrade information to fsimage > -- > > Key: HDFS-5945 > URL: https://issues.apache.org/jira/browse/HDFS-5945 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: HDFS-5945.protobuf.patch, h5945_20140213.patch, > h5945_20140214.patch, h5945_20140216.patch > > > When rolling upgrade is in progress, the standby namenode may create > checkpoint. The rolling upgrade information should be added to fsimage in > order to support namenode restart and continue rolling upgrade. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5956) A file size is multiplied by the replication factor in 'hdfs oiv -p FileDistribution' option
[ https://issues.apache.org/jira/browse/HDFS-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5956: Attachment: HDFS-5956.2.patch Thanks for your review, [~wheat9]. Attaching a patch to reflect your comments. > A file size is multiplied by the replication factor in 'hdfs oiv -p > FileDistribution' option > > > Key: HDFS-5956 > URL: https://issues.apache.org/jira/browse/HDFS-5956 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Affects Versions: 3.0.0 >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Labels: newbie > Attachments: HDFS-5956.2.patch, HDFS-5956.patch > > > In FileDistributionCalculator.java, > {code} > long fileSize = 0; > for (BlockProto b : f.getBlocksList()) { > fileSize += b.getNumBytes() * f.getReplication(); > } > maxFileSize = Math.max(fileSize, maxFileSize); > totalSpace += fileSize; > {code} > should be > {code} > long fileSize = 0; > for (BlockProto b : f.getBlocksList()) { > fileSize += b.getNumBytes(); > } > maxFileSize = Math.max(fileSize, maxFileSize); > totalSpace += fileSize * f.getReplication(); > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5956) A file size is multiplied by the replication factor in 'hdfs oiv -p FileDistribution' option
[ https://issues.apache.org/jira/browse/HDFS-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5956: Target Version/s: 2.4.0 Affects Version/s: 2.4.0 > A file size is multiplied by the replication factor in 'hdfs oiv -p > FileDistribution' option > > > Key: HDFS-5956 > URL: https://issues.apache.org/jira/browse/HDFS-5956 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Affects Versions: 3.0.0, 2.4.0 >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Labels: newbie > Attachments: HDFS-5956.2.patch, HDFS-5956.patch > > > In FileDistributionCalculator.java, > {code} > long fileSize = 0; > for (BlockProto b : f.getBlocksList()) { > fileSize += b.getNumBytes() * f.getReplication(); > } > maxFileSize = Math.max(fileSize, maxFileSize); > totalSpace += fileSize; > {code} > should be > {code} > long fileSize = 0; > for (BlockProto b : f.getBlocksList()) { > fileSize += b.getNumBytes(); > } > maxFileSize = Math.max(fileSize, maxFileSize); > totalSpace += fileSize * f.getReplication(); > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904492#comment-13904492 ] Eric Sirianni commented on HDFS-5318: - bq. It looks like read-only storages don't get returned to clients for read. Is this intentional? Can you elaborate? As far as I can see read-only storages _are_ returned to clients for read. Also, the {{TestReadOnlySharedStorage}} JUnit validates that {{client.getLocatedBlocks()}} returns the read-only locations in addition to the normal ones. bq. It would be nice to have an additional test to verify corrupt blocks on read-only storages don't get counted towards corrupt blocks. I will look into adding this test case to {{TestReadOnlySharedStorage}}. > Support read-only and read-write paths to shared replicas > - > > Key: HDFS-5318 > URL: https://issues.apache.org/jira/browse/HDFS-5318 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.3.0 >Reporter: Eric Sirianni > Attachments: HDFS-5318-trunk.patch, HDFS-5318.patch, > HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, > HDFS-5318c-branch-2.patch, hdfs-5318.pdf > > > There are several use cases for using shared-storage for datanode block > storage in an HDFS environment (storing cold blocks on a NAS device, Amazon > S3, etc.). > With shared-storage, there is a distinction between: > # a distinct physical copy of a block > # an access-path to that block via a datanode. > A single 'replication count' metric cannot accurately capture both aspects. > However, for most of the current uses of 'replication count' in the Namenode, > the "number of physical copies" aspect seems to be the appropriate semantic. > I propose altering the replication counting algorithm in the Namenode to > accurately infer distinct physical copies in a shared storage environment. > With HDFS-5115, a {{StorageID}} is a UUID. 
I propose associating some minor > additional semantics to the {{StorageID}} - namely that multiple datanodes > attaching to the same physical shared storage pool should report the same > {{StorageID}} for that pool. A minor modification would be required in the > DataNode to enable the generation of {{StorageID}} s to be pluggable behind > the {{FsDatasetSpi}} interface. > With those semantics in place, the number of physical copies of a block in a > shared storage environment can be calculated as the number of _distinct_ > {{StorageID}} s associated with that block. > Consider the following combinations for two {{(DataNode ID, Storage ID)}} > pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: > * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* > physical replicas (i.e. the traditional HDFS case with local disks) > ** → Block B has {{ReplicationCount == 2}} > * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* > physical replica (e.g. HDFS datanodes mounting the same NAS share) > ** → Block B has {{ReplicationCount == 1}} > For example, if block B has the following location tuples: > * {{DN_1, STORAGE_A}} > * {{DN_2, STORAGE_A}} > * {{DN_3, STORAGE_B}} > * {{DN_4, STORAGE_B}}, > the effect of this proposed change would be to calculate the replication > factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5945) Add rolling upgrade information to fsimage
[ https://issues.apache.org/jira/browse/HDFS-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5945: - Hadoop Flags: Reviewed +1 the protobuf change looks good. > Add rolling upgrade information to fsimage > -- > > Key: HDFS-5945 > URL: https://issues.apache.org/jira/browse/HDFS-5945 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: HDFS-5945.protobuf.patch, h5945_20140213.patch, > h5945_20140214.patch, h5945_20140216.patch > > > When rolling upgrade is in progress, the standby namenode may create > checkpoint. The rolling upgrade information should be added to fsimage in > order to support namenode restart and continue rolling upgrade. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5945) Add rolling upgrade information to fsimage
[ https://issues.apache.org/jira/browse/HDFS-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5945: - I have committed this. > Add rolling upgrade information to fsimage > -- > > Key: HDFS-5945 > URL: https://issues.apache.org/jira/browse/HDFS-5945 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Fix For: HDFS-5535 (Rolling upgrades) > > Attachments: HDFS-5945.protobuf.patch, h5945_20140213.patch, > h5945_20140214.patch, h5945_20140216.patch > > > When rolling upgrade is in progress, the standby namenode may create > checkpoint. The rolling upgrade information should be added to fsimage in > order to support namenode restart and continue rolling upgrade. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5945) Add rolling upgrade information to fsimage
[ https://issues.apache.org/jira/browse/HDFS-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE resolved HDFS-5945. -- Resolution: Fixed Fix Version/s: HDFS-5535 (Rolling upgrades) > Add rolling upgrade information to fsimage > -- > > Key: HDFS-5945 > URL: https://issues.apache.org/jira/browse/HDFS-5945 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Fix For: HDFS-5535 (Rolling upgrades) > > Attachments: HDFS-5945.protobuf.patch, h5945_20140213.patch, > h5945_20140214.patch, h5945_20140216.patch > > > When rolling upgrade is in progress, the standby namenode may create > checkpoint. The rolling upgrade information should be added to fsimage in > order to support namenode restart and continue rolling upgrade. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5905) Upgrade and rolling upgrade should not be allowed simultaneously
[ https://issues.apache.org/jira/browse/HDFS-5905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE resolved HDFS-5905. -- Resolution: Duplicate This was fixed by HDFS-5945. > Upgrade and rolling upgrade should not be allowed simultaneously > > > Key: HDFS-5905 > URL: https://issues.apache.org/jira/browse/HDFS-5905 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > > The existing upgrade/finalize mechanism and the new rolling upgrade mechanism > are two distinct features for upgrading the HDFS software. They cannot be > executed simultaneously. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HDFS-5778) Document new commands and parameters for improved rolling upgrades
[ https://issues.apache.org/jira/browse/HDFS-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE reassigned HDFS-5778: Assignee: Tsz Wo (Nicholas), SZE > Document new commands and parameters for improved rolling upgrades > -- > > Key: HDFS-5778 > URL: https://issues.apache.org/jira/browse/HDFS-5778 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: documentation >Affects Versions: HDFS-5535 (Rolling upgrades) >Reporter: Akira AJISAKA >Assignee: Tsz Wo (Nicholas), SZE > > The "hdfs dfsadmin -rollingUpgrade" command was newly added in HDFS-5752, and > some other commands and parameters will be added in the future. This issue > exists to flag undocumented commands and parameters when the HDFS-5535 branch is > merged to trunk. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.
[ https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904568#comment-13904568 ] Arpit Agarwal commented on HDFS-5889: - This patch seems to have broken the rolling upgrade tests. The new edit log ops {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} trigger a {{RollingUpgradeException}} during NN restart. I think the fix should be to invoke {{startRollingUpgrade}}/{{finalizeRollingUpgrade}} when these ops are seen, and write to the editLog only when invoked via RPC. I filed HDFS-5960. > When rolling upgrade is in progress, standby NN should create checkpoint for > downgrade. > --- > > Key: HDFS-5889 > URL: https://issues.apache.org/jira/browse/HDFS-5889 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Fix For: HDFS-5535 (Rolling upgrades) > > Attachments: h5889_20140211.patch, h5889_20140212b.patch, > h5889_20140212c.patch, h5889_20140213.patch > > > After rolling upgrade is started and checkpoint is disabled, the edit log may > grow to a huge size. It is not a problem if rolling upgrade is finalized > normally, since the NN keeps the current state in memory and writes a new > checkpoint during finalize. However, it is a problem if the admin decides to > downgrade, since it could take a long time to apply the edit log. Rollback does not > have this problem. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5960) Fix TestRollingUpgrade
Arpit Agarwal created HDFS-5960: --- Summary: Fix TestRollingUpgrade Key: HDFS-5960 URL: https://issues.apache.org/jira/browse/HDFS-5960 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: HDFS-5535 (Rolling upgrades) {{TestRollingUpgrade}} fails when restarting the NN because {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not expected. The fix is to start/finalize rolling upgrade when the corresponding edit log op is seen. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Work started] (HDFS-5960) Fix TestRollingUpgrade
[ https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-5960 started by Arpit Agarwal. > Fix TestRollingUpgrade > -- > > Key: HDFS-5960 > URL: https://issues.apache.org/jira/browse/HDFS-5960 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: HDFS-5535 (Rolling upgrades) > > > {{TestRollingUpgrade}} fails when restarting the NN because > {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not > expected. > The fix is to start/finalize rolling upgrade when the corresponding edit log > op is seen. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Sirianni updated HDFS-5318: Attachment: HDFS-5318-trunkb.patch Updated patch based on Arpit's feedback. > Support read-only and read-write paths to shared replicas > - > > Key: HDFS-5318 > URL: https://issues.apache.org/jira/browse/HDFS-5318 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.3.0 >Reporter: Eric Sirianni > Attachments: HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, > HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, > HDFS-5318c-branch-2.patch, hdfs-5318.pdf > > > There are several use cases for using shared-storage for datanode block > storage in an HDFS environment (storing cold blocks on a NAS device, Amazon > S3, etc.). > With shared-storage, there is a distinction between: > # a distinct physical copy of a block > # an access-path to that block via a datanode. > A single 'replication count' metric cannot accurately capture both aspects. > However, for most of the current uses of 'replication count' in the Namenode, > the "number of physical copies" aspect seems to be the appropriate semantic. > I propose altering the replication counting algorithm in the Namenode to > accurately infer distinct physical copies in a shared storage environment. > With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor > additional semantics to the {{StorageID}} - namely that multiple datanodes > attaching to the same physical shared storage pool should report the same > {{StorageID}} for that pool. A minor modification would be required in the > DataNode to enable the generation of {{StorageID}} s to be pluggable behind > the {{FsDatasetSpi}} interface. > With those semantics in place, the number of physical copies of a block in a > shared storage environment can be calculated as the number of _distinct_ > {{StorageID}} s associated with that block. 
> Consider the following combinations for two {{(DataNode ID, Storage ID)}} > pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: > * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* > physical replicas (i.e. the traditional HDFS case with local disks) > ** → Block B has {{ReplicationCount == 2}} > * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* > physical replica (e.g. HDFS datanodes mounting the same NAS share) > ** → Block B has {{ReplicationCount == 1}} > For example, if block B has the following location tuples: > * {{DN_1, STORAGE_A}} > * {{DN_2, STORAGE_A}} > * {{DN_3, STORAGE_B}} > * {{DN_4, STORAGE_B}}, > the effect of this proposed change would be to calculate the replication > factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
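The distinct-StorageID counting rule proposed above can be sketched in a few lines. This is a toy model for illustration only, not the actual NameNode code; the class and method names are invented for the example:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the proposed rule: the number of physical replicas of a
// block is the number of distinct storage IDs among its locations, since
// datanodes attached to the same shared pool report the same StorageID.
public class SharedReplicaCount {

    // Each location is a (datanodeId, storageId) pair.
    public static int physicalReplicaCount(List<String[]> locations) {
        Set<String> distinctStorageIds = new HashSet<>();
        for (String[] loc : locations) {
            distinctStorageIds.add(loc[1]); // loc[0] = DN ID, loc[1] = storage ID
        }
        return distinctStorageIds.size();
    }

    public static void main(String[] args) {
        // The example from the description: four datanodes, two shared pools.
        List<String[]> locations = Arrays.asList(
                new String[]{"DN_1", "STORAGE_A"},
                new String[]{"DN_2", "STORAGE_A"},
                new String[]{"DN_3", "STORAGE_B"},
                new String[]{"DN_4", "STORAGE_B"});
        System.out.println(physicalReplicaCount(locations)); // prints 2, not 4
    }
}
```

With local disks every datanode reports a unique StorageID, so this count degenerates to the traditional replica count.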
[jira] [Updated] (HDFS-5960) Fix TestRollingUpgrade
[ https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5960: Attachment: HDFS-5960.01.patch Patch to process {{start/finalizeRollingUpgrade}} from the corresponding edit log operations. Also update edit log only when the operations are initiated via RPC. Verified this fixes {{TestRollingUpgrade}}. {{TestEditLogUpgradeMarker}} still needs to be updated. > Fix TestRollingUpgrade > -- > > Key: HDFS-5960 > URL: https://issues.apache.org/jira/browse/HDFS-5960 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: HDFS-5535 (Rolling upgrades) > > Attachments: HDFS-5960.01.patch > > > {{TestRollingUpgrade}} fails when restarting the NN because > {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not > expected. > The fix is to start/finalize rolling upgrade when the corresponding edit log > op is seen. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes
[ https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904631#comment-13904631 ] Jing Zhao commented on HDFS-5920: - I've committed this. > Support rollback of rolling upgrade in NameNode and JournalNodes > > > Key: HDFS-5920 > URL: https://issues.apache.org/jira/browse/HDFS-5920 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: journal-node, namenode >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch, > HDFS-5920.001.patch, HDFS-5920.002.patch, HDFS-5920.003.patch > > > This jira provides rollback functionality for NameNode and JournalNode in > rolling upgrade. > Currently the proposed rollback for rolling upgrade is: > 1. Shutdown both NN > 2. Start one of the NN using "-rollingUpgrade rollback" option > 3. This NN will load the special fsimage right before the upgrade marker, > then discard all the editlog segments after the txid of the fsimage > 4. The NN will also send RPC requests to all the JNs to discard editlog > segments. This call expects response from all the JNs. The NN will keep > running if the call succeeds. > 5. We start the other NN using bootstrapstandby rather than "-rollingUpgrade > rollback" -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes
[ https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao resolved HDFS-5920. - Resolution: Fixed Hadoop Flags: Reviewed > Support rollback of rolling upgrade in NameNode and JournalNodes > > > Key: HDFS-5920 > URL: https://issues.apache.org/jira/browse/HDFS-5920 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: journal-node, namenode >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch, > HDFS-5920.001.patch, HDFS-5920.002.patch, HDFS-5920.003.patch > > > This jira provides rollback functionality for NameNode and JournalNode in > rolling upgrade. > Currently the proposed rollback for rolling upgrade is: > 1. Shutdown both NN > 2. Start one of the NN using "-rollingUpgrade rollback" option > 3. This NN will load the special fsimage right before the upgrade marker, > then discard all the editlog segments after the txid of the fsimage > 4. The NN will also send RPC requests to all the JNs to discard editlog > segments. This call expects response from all the JNs. The NN will keep > running if the call succeeds. > 5. We start the other NN using bootstrapstandby rather than "-rollingUpgrade > rollback" -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5960) Fix TestRollingUpgrade
[ https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904644#comment-13904644 ] Tsz Wo (Nicholas), SZE commented on HDFS-5960: -- Arpit, thanks for fixing the test! For the patch, let's split startRollingUpgrade into startRollingUpgrade and startRollingUpgradeInternal (like startFile and startFileInternal) to separate the RPC and edit log processing paths. > Fix TestRollingUpgrade > -- > > Key: HDFS-5960 > URL: https://issues.apache.org/jira/browse/HDFS-5960 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: HDFS-5535 (Rolling upgrades) > > Attachments: HDFS-5960.01.patch > > > {{TestRollingUpgrade}} fails when restarting the NN because > {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not > expected. > The fix is to start/finalize rolling upgrade when the corresponding edit log > op is seen. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
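The refactoring Nicholas suggests mirrors the existing startFile/startFileInternal pattern. A rough sketch of the intended shape, with invented names and none of the real FSNamesystem state or locking:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative shape only: the RPC entry point both applies the state
// change and logs the edit op, while the internal method is shared with
// edit log replay, which must not re-log the op.
public class RollingUpgradeSketch {
    private boolean upgradeInProgress = false;
    private final List<String> editLog = new ArrayList<>();

    // Called via RPC (e.g. dfsadmin -rollingUpgrade start): log the op.
    public void startRollingUpgrade() {
        startRollingUpgradeInternal();
        editLog.add("OP_ROLLING_UPGRADE_START");
    }

    // Also called when OP_ROLLING_UPGRADE_START is replayed from the edit
    // log on NN restart; applying it here avoids the unexpected-op failure.
    public void startRollingUpgradeInternal() {
        upgradeInProgress = true;
    }

    public boolean isUpgradeInProgress() { return upgradeInProgress; }
    public int editLogSize() { return editLog.size(); }
}
```

The replay path calls only the internal method, so restart applies the state change without writing a duplicate op.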
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904679#comment-13904679 ] Arpit Agarwal commented on HDFS-5318: - You're right. +1 pending Jenkins. > Support read-only and read-write paths to shared replicas > - > > Key: HDFS-5318 > URL: https://issues.apache.org/jira/browse/HDFS-5318 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.3.0 >Reporter: Eric Sirianni > Attachments: HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, > HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, > HDFS-5318c-branch-2.patch, hdfs-5318.pdf > > > There are several use cases for using shared-storage for datanode block > storage in an HDFS environment (storing cold blocks on a NAS device, Amazon > S3, etc.). > With shared-storage, there is a distinction between: > # a distinct physical copy of a block > # an access-path to that block via a datanode. > A single 'replication count' metric cannot accurately capture both aspects. > However, for most of the current uses of 'replication count' in the Namenode, > the "number of physical copies" aspect seems to be the appropriate semantic. > I propose altering the replication counting algorithm in the Namenode to > accurately infer distinct physical copies in a shared storage environment. > With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor > additional semantics to the {{StorageID}} - namely that multiple datanodes > attaching to the same physical shared storage pool should report the same > {{StorageID}} for that pool. A minor modification would be required in the > DataNode to enable the generation of {{StorageID}} s to be pluggable behind > the {{FsDatasetSpi}} interface. > With those semantics in place, the number of physical copies of a block in a > shared storage environment can be calculated as the number of _distinct_ > {{StorageID}} s associated with that block. 
> Consider the following combinations for two {{(DataNode ID, Storage ID)}} > pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: > * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* > physical replicas (i.e. the traditional HDFS case with local disks) > ** → Block B has {{ReplicationCount == 2}} > * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* > physical replica (e.g. HDFS datanodes mounting the same NAS share) > ** → Block B has {{ReplicationCount == 1}} > For example, if block B has the following location tuples: > * {{DN_1, STORAGE_A}} > * {{DN_2, STORAGE_A}} > * {{DN_3, STORAGE_B}} > * {{DN_4, STORAGE_B}}, > the effect of this proposed change would be to calculate the replication > factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5956) A file size is multiplied by the replication factor in 'hdfs oiv -p FileDistribution' option
[ https://issues.apache.org/jira/browse/HDFS-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904686#comment-13904686 ] Hadoop QA commented on HDFS-5956: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629608/HDFS-5956.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6168//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6168//console This message is automatically generated. 
> A file size is multiplied by the replication factor in 'hdfs oiv -p > FileDistribution' option > > > Key: HDFS-5956 > URL: https://issues.apache.org/jira/browse/HDFS-5956 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Affects Versions: 3.0.0, 2.4.0 >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Labels: newbie > Attachments: HDFS-5956.2.patch, HDFS-5956.patch > > > In FileDistributionCalculator.java, > {code} > long fileSize = 0; > for (BlockProto b : f.getBlocksList()) { > fileSize += b.getNumBytes() * f.getReplication(); > } > maxFileSize = Math.max(fileSize, maxFileSize); > totalSpace += fileSize; > {code} > should be > {code} > long fileSize = 0; > for (BlockProto b : f.getBlocksList()) { > fileSize += b.getNumBytes(); > } > maxFileSize = Math.max(fileSize, maxFileSize); > totalSpace += fileSize * f.getReplication(); > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
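The arithmetic difference is easy to see with concrete numbers. This standalone sketch (illustrative values, not the actual FileDistributionCalculator code) contrasts the buggy and fixed computations:

```java
// Contrast of the two computations from the description: multiplying per
// block inflates the logical file size (and thus maxFileSize) by the
// replication factor; only the consumed space should be multiplied.
public class FileDistributionSketch {

    static long buggyFileSize(long[] blockBytes, int replication) {
        long fileSize = 0;
        for (long b : blockBytes) {
            fileSize += b * replication; // bug: replication folded in per block
        }
        return fileSize;
    }

    static long fixedFileSize(long[] blockBytes) {
        long fileSize = 0;
        for (long b : blockBytes) {
            fileSize += b; // logical size: sum of block lengths only
        }
        return fileSize;
    }

    public static void main(String[] args) {
        long[] blocks = {64L << 20, 64L << 20}; // two 64 MB blocks
        int replication = 3;
        System.out.println(buggyFileSize(blocks, replication)); // 402653184
        System.out.println(fixedFileSize(blocks));              // 134217728
        // Consumed space is still fixedFileSize(blocks) * replication.
    }
}
```

So a 128 MB file with replication 3 is binned as a 384 MB file by the buggy code, skewing the whole distribution.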
[jira] [Commented] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904698#comment-13904698 ] Brandon Li commented on HDFS-5583: -- Sure. I will review it. > Make DN send an OOB Ack on shutdown before restarting > > > Key: HDFS-5583 > URL: https://issues.apache.org/jira/browse/HDFS-5583 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Attachments: HDFS-5583.patch, HDFS-5583.patch, HDFS-5583.patch > > > Add the ability for datanodes to send an OOB response in order to indicate an > upcoming upgrade-restart. The client should ignore the pipeline error from the > node for a configured amount of time and try to reconstruct the pipeline without > excluding the restarted node. If the node does not come back in time, > regular pipeline recovery should happen. > This feature is useful for applications that need to keep blocks local. > If the upgrade-restart is fast, the wait is preferable to losing locality. > It could also be used in general instead of the draining-writer strategy. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5961) OIV cannot load fsimages containing a symbolic link
[ https://issues.apache.org/jira/browse/HDFS-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904701#comment-13904701 ] Kihwal Lee commented on HDFS-5961: -- I have verified that adding {{processPermission()}} to the symlink INode loading fixes the issue. > OIV cannot load fsimages containing a symbolic link > --- > > Key: HDFS-5961 > URL: https://issues.apache.org/jira/browse/HDFS-5961 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Priority: Critical > > In {{ImageLoaderCurrent#processINode}}, the permission is not read for > symlink INodes. So after incorrectly reading in the first symbolic link, the > next INode can't be read. > HDFS-4850 broke this while fixing other issues. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
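Why one skipped permission field breaks every subsequent INode can be shown with a toy sequential stream. This models only the failure mode; the record layout and names are invented, not the real fsimage format:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Toy model of the HDFS-5961 failure mode: records are read sequentially,
// so a loader that skips the permission field for symlink records leaves
// the stream misaligned and misparses everything after the first symlink.
public class OivSymlinkSketch {

    static byte[] writeRecords() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            // record layout: isSymlink flag, name, permission
            out.writeBoolean(true);  out.writeUTF("link1"); out.writeShort(0777);
            out.writeBoolean(false); out.writeUTF("file1"); out.writeShort(0644);
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static List<String> readRecords(byte[] data, boolean readPermForSymlink) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            List<String> names = new ArrayList<>();
            while (in.available() > 0) {
                boolean isSymlink = in.readBoolean();
                names.add(in.readUTF());
                if (!isSymlink || readPermForSymlink) {
                    in.readShort(); // the fix: consume the permission for symlinks too
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readRecords(writeRecords(), true)); // [link1, file1]
        // With readPermForSymlink == false, the loader consumes the unread
        // permission bytes as the next record's header and fails on "file1".
    }
}
```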
[jira] [Created] (HDFS-5961) OIV cannot load fsimages containing a symbolic link
Kihwal Lee created HDFS-5961: Summary: OIV cannot load fsimages containing a symbolic link Key: HDFS-5961 URL: https://issues.apache.org/jira/browse/HDFS-5961 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Priority: Critical In {{ImageLoaderCurrent#processINode}}, the permission is not read for symlink INodes. So after incorrectly reading in the first symbolic link, the next INode can't be read. HDFS-4850 broke this while fixing other issues. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5961) OIV cannot load fsimages containing a symbolic link
[ https://issues.apache.org/jira/browse/HDFS-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5961: - Attachment: HDFS-5961.patch > OIV cannot load fsimages containing a symbolic link > --- > > Key: HDFS-5961 > URL: https://issues.apache.org/jira/browse/HDFS-5961 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Priority: Critical > Attachments: HDFS-5961.patch > > > In {{ImageLoaderCurrent#processINode}}, the permission is not read for > symlink INodes. So after incorrectly reading in the first symbolic link, the > next INode can't be read. > HDFS-4850 broke this while fixing other issues. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5961) OIV cannot load fsimages containing a symbolic link
[ https://issues.apache.org/jira/browse/HDFS-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5961: - Status: Patch Available (was: Open) > OIV cannot load fsimages containing a symbolic link > --- > > Key: HDFS-5961 > URL: https://issues.apache.org/jira/browse/HDFS-5961 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Priority: Critical > Attachments: HDFS-5961.patch > > > In {{ImageLoaderCurrent#processINode}}, the permission is not read for > symlink INodes. So after incorrectly reading in the first symbolic link, the > next INode can't be read. > HDFS-4850 broke this while fixing other issues. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5962) Mtime and atime are not persisted for symbolic links
Kihwal Lee created HDFS-5962: Summary: Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Priority: Critical In {{FSImageSerialization}}, the mtime and atime of symbolic links are hardcoded to be 0 when saving to fsimage, even though they are recorded in memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5962) Mtime is not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5962: - Summary: Mtime is not persisted for symbolic links (was: Mtime and atime are not persisted for symbolic links) > Mtime is not persisted for symbolic links > - > > Key: HDFS-5962 > URL: https://issues.apache.org/jira/browse/HDFS-5962 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Priority: Critical > > In {{FSImageSerialization}}, the mtime and atime of symbolic links are > hardcoded to be 0 when saving to fsimage, even though they are recorded in > memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5962) Mtime is not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5962: - Description: In {{FSImageSerialization}}, the mtime of symbolic links is hardcoded to be 0 when saving to fsimage, even though it is recorded in memory and shown in the listing until restarting namenode. (was: In {{FSImageSerialization}}, the mtime and atime of symbolic links are hardcoded to be 0 when saving to fsimage, even though they are recorded in memory and shown in the listing until restarting namenode.) > Mtime is not persisted for symbolic links > - > > Key: HDFS-5962 > URL: https://issues.apache.org/jira/browse/HDFS-5962 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Priority: Critical > > In {{FSImageSerialization}}, the mtime of symbolic links is hardcoded to be 0 > when saving to fsimage, even though it is recorded in memory and shown in the > listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
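The effect of hardcoding the field is visible in a minimal serialize/deserialize round trip. This is a toy model; the real FSImageSerialization writes many more fields per INode:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Toy model of HDFS-5962: the in-memory mtime is correct, but writing a
// constant 0 to the image means the value is lost across a namenode restart.
public class SymlinkMtimeSketch {

    static long roundTrip(long mtime, boolean persistMtime) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeLong(persistMtime ? mtime : 0L); // buggy path hardcodes 0
            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bos.toByteArray()));
            return in.readLong(); // what the namenode sees after "restart"
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(1392345600000L, true));  // 1392345600000
        System.out.println(roundTrip(1392345600000L, false)); // 0: mtime lost
    }
}
```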
[jira] [Commented] (HDFS-5960) Fix TestRollingUpgrade
[ https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904759#comment-13904759 ] Arpit Agarwal commented on HDFS-5960: - Unfortunately the branch is being actively changed while broken, so the nature of the failure seems to have changed since the last patch. I think we need to hold off on checkins till the branch is fixed. > Fix TestRollingUpgrade > -- > > Key: HDFS-5960 > URL: https://issues.apache.org/jira/browse/HDFS-5960 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: HDFS-5535 (Rolling upgrades) > > Attachments: HDFS-5960.01.patch > > > {{TestRollingUpgrade}} fails when restarting the NN because > {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not > expected. > The fix is to start/finalize rolling upgrade when the corresponding edit log > op is seen. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5898) Allow NFS gateway to login/relogin from its kerberos keytab
[ https://issues.apache.org/jira/browse/HDFS-5898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated HDFS-5898: -- Attachment: HDFS-5898-with-documentation.patch Added documentation. This now won't require a separate doc patch. > Allow NFS gateway to login/relogin from its kerberos keytab > --- > > Key: HDFS-5898 > URL: https://issues.apache.org/jira/browse/HDFS-5898 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: nfs >Affects Versions: 2.2.0, 2.4.0 >Reporter: Jing Zhao >Assignee: Abin Shahab > Attachments: HDFS-5898-documentation.patch, > HDFS-5898-documentation.patch, HDFS-5898-with-documentation.patch, > HDFS-5898.patch, HDFS-5898.patch, HDFS-5898.patch > > > According to the discussion in HDFS-5804: > 1. The NFS gateway should be able to get its own TGTs, and renew them. > 2. We should update the HdfsNfsGateway.apt.vm -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5960) Fix TestRollingUpgrade
[ https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5960: Attachment: HDFS-5960.02.patch Thanks for taking a look Nicholas. Updated patch with your feedback. > Fix TestRollingUpgrade > -- > > Key: HDFS-5960 > URL: https://issues.apache.org/jira/browse/HDFS-5960 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: HDFS-5535 (Rolling upgrades) > > Attachments: HDFS-5960.01.patch, HDFS-5960.02.patch > > > {{TestRollingUpgrade}} fails when restarting the NN because > {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not > expected. > The fix is to start/finalize rolling upgrade when the corresponding edit log > op is seen. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5796) The file system browser in the namenode UI requires SPNEGO.
[ https://issues.apache.org/jira/browse/HDFS-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904784#comment-13904784 ] Haohui Mai commented on HDFS-5796: -- HDFS-5716 allows a pluggable authentication mechanism in WebHDFS, which provides a solution to this problem. Is it okay to mark this bug as a duplicate of HDFS-5716? > The file system browser in the namenode UI requires SPNEGO. > --- > > Key: HDFS-5796 > URL: https://issues.apache.org/jira/browse/HDFS-5796 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Kihwal Lee >Assignee: Haohui Mai >Priority: Blocker > > After HDFS-5382, the browser makes webhdfs REST calls directly, requiring > SPNEGO to work between the user's browser and the namenode. This won't work if the > cluster's security infrastructure is isolated from the regular network. > Moreover, SPNEGO is not supposed to be required for user-facing web pages. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5960) Fix TestRollingUpgrade#testDFSAdminRollingUpgradeCommands
[ https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5960: Summary: Fix TestRollingUpgrade#testDFSAdminRollingUpgradeCommands (was: Fix TestRollingUpgrade) > Fix TestRollingUpgrade#testDFSAdminRollingUpgradeCommands > - > > Key: HDFS-5960 > URL: https://issues.apache.org/jira/browse/HDFS-5960 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: HDFS-5535 (Rolling upgrades) > > Attachments: HDFS-5960.01.patch, HDFS-5960.02.patch > > > {{TestRollingUpgrade}} fails when restarting the NN because > {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not > expected. > The fix is to start/finalize rolling upgrade when the corresponding edit log > op is seen. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5963) TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail
Arpit Agarwal created HDFS-5963: --- Summary: TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail Key: HDFS-5963 URL: https://issues.apache.org/jira/browse/HDFS-5963 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: HDFS-5535 (Rolling upgrades) Reporter: Arpit Agarwal {{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. It seems to be caused by the terminate hook used by the test but I did not spend much time on it. Commenting out this test case makes other tests in the same class pass. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5964) TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail
Arpit Agarwal created HDFS-5964: --- Summary: TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail Key: HDFS-5964 URL: https://issues.apache.org/jira/browse/HDFS-5964 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: HDFS-5535 (Rolling upgrades) Reporter: Arpit Agarwal {{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. It seems to be caused by the terminate hook used by the test but I did not spend much time on it. Commenting out this test case makes other tests in the same class pass. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5963) TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail
[ https://issues.apache.org/jira/browse/HDFS-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5963: Description: {{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. It seems to be caused by the terminate hook used by the test. Commenting out this test case makes other tests in the same class pass. (was: {{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. It seems to be caused by the terminate hook used by the test but I did not spend much time on it. Commenting out this test case makes other tests in the same class pass.) > TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail > > > Key: HDFS-5963 > URL: https://issues.apache.org/jira/browse/HDFS-5963 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: HDFS-5535 (Rolling upgrades) >Reporter: Arpit Agarwal > > {{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. > It seems to be caused by the terminate hook used by the test. Commenting out > this test case makes other tests in the same class pass. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5964) TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail
[ https://issues.apache.org/jira/browse/HDFS-5964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5964: Issue Type: Bug (was: Sub-task) Parent: (was: HDFS-5535) > TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail > > > Key: HDFS-5964 > URL: https://issues.apache.org/jira/browse/HDFS-5964 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: HDFS-5535 (Rolling upgrades) >Reporter: Arpit Agarwal > > {{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. > It seems to be caused by the terminate hook used by the test but I did not > spend much time on it. Commenting out this test case makes other tests in the > same class pass. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5964) TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail
[ https://issues.apache.org/jira/browse/HDFS-5964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal resolved HDFS-5964. - Resolution: Duplicate Dup of HDFS-5963. > TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail > > > Key: HDFS-5964 > URL: https://issues.apache.org/jira/browse/HDFS-5964 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: HDFS-5535 (Rolling upgrades) >Reporter: Arpit Agarwal > > {{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. > It seems to be caused by the terminate hook used by the test but I did not > spend much time on it. Commenting out this test case makes other tests in the > same class pass. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904844#comment-13904844 ] Hadoop QA commented on HDFS-5318: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629623/HDFS-5318-trunkb.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6169//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6169//console This message is automatically generated. 
> Support read-only and read-write paths to shared replicas > - > > Key: HDFS-5318 > URL: https://issues.apache.org/jira/browse/HDFS-5318 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.3.0 >Reporter: Eric Sirianni > Attachments: HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, > HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, > HDFS-5318c-branch-2.patch, hdfs-5318.pdf > > > There are several use cases for using shared-storage for datanode block > storage in an HDFS environment (storing cold blocks on a NAS device, Amazon > S3, etc.). > With shared-storage, there is a distinction between: > # a distinct physical copy of a block > # an access-path to that block via a datanode. > A single 'replication count' metric cannot accurately capture both aspects. > However, for most of the current uses of 'replication count' in the Namenode, > the "number of physical copies" aspect seems to be the appropriate semantic. > I propose altering the replication counting algorithm in the Namenode to > accurately infer distinct physical copies in a shared storage environment. > With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor > additional semantics to the {{StorageID}} - namely that multiple datanodes > attaching to the same physical shared storage pool should report the same > {{StorageID}} for that pool. A minor modification would be required in the > DataNode to enable the generation of {{StorageID}} s to be pluggable behind > the {{FsDatasetSpi}} interface. > With those semantics in place, the number of physical copies of a block in a > shared storage environment can be calculated as the number of _distinct_ > {{StorageID}} s associated with that block. > Consider the following combinations for two {{(DataNode ID, Storage ID)}} > pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: > * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* > physical replicas (i.e. 
the traditional HDFS case with local disks) > ** → Block B has {{ReplicationCount == 2}} > * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* > physical replica (e.g. HDFS datanodes mounting the same NAS share) > ** → Block B has {{ReplicationCount == 1}} > For example, if block B has the following location tuples: > * {{DN_1, STORAGE_A}} > * {{DN_2, STORAGE_A}} > * {{DN_3, STORAGE_B}} > * {{DN_4, STORAGE_B}}, > the effect of this proposed change would be to calculate the replication > factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
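The distinct-StorageID counting rule proposed above lends itself to a short sketch. Everything here (the class name, the string-pair encoding of a {{(DataNode ID, Storage ID)}} location, the helper method) is illustrative only, not actual Namenode code:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SharedStorageReplication {
  // Physical replication = number of distinct StorageIDs among a
  // block's (DataNode ID, Storage ID) location pairs.
  static int physicalReplication(List<String[]> locations) {
    Set<String> storageIds = new HashSet<>();
    for (String[] loc : locations) {
      storageIds.add(loc[1]);  // loc = {datanodeId, storageId}
    }
    return storageIds.size();
  }

  public static void main(String[] args) {
    // The example from the description: four datanodes, two shared pools.
    List<String[]> locs = Arrays.asList(
        new String[] {"DN_1", "STORAGE_A"},
        new String[] {"DN_2", "STORAGE_A"},
        new String[] {"DN_3", "STORAGE_B"},
        new String[] {"DN_4", "STORAGE_B"});
    System.out.println(physicalReplication(locs));  // 2, not 4
  }
}
```

With local disks every datanode reports its own StorageID, so the count degenerates to the traditional replica count.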
[jira] [Assigned] (HDFS-5962) Mtime is not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA reassigned HDFS-5962: --- Assignee: Akira AJISAKA > Mtime is not persisted for symbolic links > - > > Key: HDFS-5962 > URL: https://issues.apache.org/jira/browse/HDFS-5962 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Akira AJISAKA >Priority: Critical > > In {{FSImageSerialization}}, the mtime of symbolic links is hardcoded to be 0 > when saving to fsimage, even though it is recorded in memory and shown in the > listing until restarting the namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5960) Fix TestRollingUpgrade#testDFSAdminRollingUpgradeCommands
[ https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904870#comment-13904870 ] Tsz Wo (Nicholas), SZE commented on HDFS-5960: -- Hi Arpit, does testDFSAdminRollingUpgradeCommands fail on your machine? I have just tried it and it did not fail. > Fix TestRollingUpgrade#testDFSAdminRollingUpgradeCommands > - > > Key: HDFS-5960 > URL: https://issues.apache.org/jira/browse/HDFS-5960 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: HDFS-5535 (Rolling upgrades) > > Attachments: HDFS-5960.01.patch, HDFS-5960.02.patch > > > {{TestRollingUpgrade}} fails when restarting the NN because > {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not > expected. > The fix is to start/finalize rolling upgrade when the corresponding edit log > op is seen. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5965) caller of NetworkTopology's chooseRandom method to expect null return value
Yongjun Zhang created HDFS-5965: --- Summary: caller of NetworkTopology's chooseRandom method to expect null return value Key: HDFS-5965 URL: https://issues.apache.org/jira/browse/HDFS-5965 Project: Hadoop HDFS Issue Type: Bug Reporter: Yongjun Zhang Priority: Minor Class NetworkTopology's method public Node chooseRandom(String scope) calls private Node chooseRandom(String scope, String excludedScope) which may return a null value. Callers of this method, such as BlockPlacementPolicyDefault, need to be aware of that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
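A toy stand-in for the method makes the hazard concrete. The real {{chooseRandom}} resolves scopes against the topology tree; this hypothetical version merely prefix-matches node paths, but like the real one it can return null when the scope matches nothing, which is exactly what callers must handle:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class ChooseRandomSketch {
  // Toy stand-in for NetworkTopology#chooseRandom: like the real method,
  // it may return null when no node falls inside the requested scope.
  static String chooseRandom(List<String> nodes, String scope, Random rnd) {
    List<String> candidates = new ArrayList<>();
    for (String n : nodes) {
      if (n.startsWith(scope)) {
        candidates.add(n);
      }
    }
    if (candidates.isEmpty()) {
      return null;  // callers such as a placement policy must expect this
    }
    return candidates.get(rnd.nextInt(candidates.size()));
  }

  public static void main(String[] args) {
    List<String> nodes = Arrays.asList("/rack1/dn1", "/rack1/dn2");
    // Defensive pattern the JIRA is asking for: check before using.
    String target = chooseRandom(nodes, "/rack2", new Random());
    if (target == null) {
      System.out.println("no node in scope; fall back instead of an NPE");
    }
  }
}
```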
[jira] [Created] (HDFS-5968) Fix rollback of rolling upgrade in NameNode HA setup
Jing Zhao created HDFS-5968: --- Summary: Fix rollback of rolling upgrade in NameNode HA setup Key: HDFS-5968 URL: https://issues.apache.org/jira/browse/HDFS-5968 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao This jira does the following: 1. When doing a rollback for rolling upgrade, we should call FSEditLog#initJournalsForWrite when initializing the editLog (just like upgrade in an HA setup). 2. After the rollback, we also need to rename the md5 file and update the file name referenced inside it. 3. Add a new unit test to cover rollback with HA+QJM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5967) caller of NetworkTopology's chooseRandom method to expect null return value
Yongjun Zhang created HDFS-5967: --- Summary: caller of NetworkTopology's chooseRandom method to expect null return value Key: HDFS-5967 URL: https://issues.apache.org/jira/browse/HDFS-5967 Project: Hadoop HDFS Issue Type: Bug Reporter: Yongjun Zhang Priority: Minor Class NetworkTopology's method public Node chooseRandom(String scope) calls private Node chooseRandom(String scope, String excludedScope) which may return a null value. Callers of this method, such as BlockPlacementPolicyDefault, need to be aware of that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5966) Fix rollback of rolling upgrade in NameNode HA setup
Jing Zhao created HDFS-5966: --- Summary: Fix rollback of rolling upgrade in NameNode HA setup Key: HDFS-5966 URL: https://issues.apache.org/jira/browse/HDFS-5966 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao This jira does the following: 1. When doing a rollback for rolling upgrade, we should call FSEditLog#initJournalsForWrite when initializing the editLog (just like upgrade in an HA setup). 2. After the rollback, we also need to rename the md5 file and update the file name referenced inside it. 3. Add a new unit test to cover rollback with HA+QJM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
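Step 2 can be sketched in miniature. This assumes the sidecar file stores a single "<hex digest> *<file name>" line (the format Hadoop's checksum sidecars use); the class and helper names here are hypothetical, not the actual patch:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Md5RenameSketch {
  // Rewrites the "<hex digest> *<file name>" line stored in a .md5
  // sidecar so that it references the renamed image file.
  static String rewriteMd5Line(String line, String newName) {
    int star = line.indexOf('*');
    return line.substring(0, star + 1) + newName;
  }

  // Step 2 above in miniature: rename the image file, then rename its
  // .md5 sidecar and fix the file name recorded inside it. The digest
  // itself is unchanged because the bytes are unchanged.
  static void renameWithMd5(Path oldImg, Path newImg) throws IOException {
    Files.move(oldImg, newImg);
    Path oldMd5 = Paths.get(oldImg + ".md5");
    String line = new String(Files.readAllBytes(oldMd5),
        StandardCharsets.UTF_8).trim();
    String fixed = rewriteMd5Line(line, newImg.getFileName().toString());
    Files.write(Paths.get(newImg + ".md5"),
        (fixed + "\n").getBytes(StandardCharsets.UTF_8));
    Files.delete(oldMd5);
  }

  public static void main(String[] args) throws IOException {
    System.out.println(rewriteMd5Line(
        "d41d8cd98f00b204e9800998ecf8427e *fsimage_10",
        "fsimage_rollback_10"));
  }
}
```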
[jira] [Updated] (HDFS-5966) Fix rollback of rolling upgrade in NameNode HA setup
[ https://issues.apache.org/jira/browse/HDFS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5966: Attachment: HDFS-5966.000.patch > Fix rollback of rolling upgrade in NameNode HA setup > > > Key: HDFS-5966 > URL: https://issues.apache.org/jira/browse/HDFS-5966 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ha, hdfs-client, namenode >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5966.000.patch > > > This jira does the following: > 1. When do rollback for rolling upgrade, we should call > FSEditLog#initJournalsForWrite when initializing editLog (just like Upgrade > in HA setup). > 2. After the rollback, we also need to rename the md5 file and change its > reference file name. > 3. Add a new unit test to cover rollback with HA+QJM -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5969) caller of NetworkTopology's chooseRandom method to expect null return value
Yongjun Zhang created HDFS-5969: --- Summary: caller of NetworkTopology's chooseRandom method to expect null return value Key: HDFS-5969 URL: https://issues.apache.org/jira/browse/HDFS-5969 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Yongjun Zhang Priority: Minor Class NetworkTopology's method public Node chooseRandom(String scope) calls private Node chooseRandom(String scope, String excludedScope) which may return a null value. Callers of this method, such as BlockPlacementPolicyDefault, need to be aware of that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5966) Fix rollback of rolling upgrade in NameNode HA setup
[ https://issues.apache.org/jira/browse/HDFS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5966: Attachment: (was: HDFS-5966.000.patch) > Fix rollback of rolling upgrade in NameNode HA setup > > > Key: HDFS-5966 > URL: https://issues.apache.org/jira/browse/HDFS-5966 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ha, hdfs-client, namenode >Reporter: Jing Zhao >Assignee: Jing Zhao > > This jira does the following: > 1. When do rollback for rolling upgrade, we should call > FSEditLog#initJournalsForWrite when initializing editLog (just like Upgrade > in HA setup). > 2. After the rollback, we also need to rename the md5 file and change its > reference file name. > 3. Add a new unit test to cover rollback with HA+QJM -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5958) One very large node in a cluster prevents balancer from balancing data
[ https://issues.apache.org/jira/browse/HDFS-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904891#comment-13904891 ] Tsz Wo (Nicholas), SZE commented on HDFS-5958: -- The balancing policy assumes that there are enough blocks to move around. In your case, it may be impossible to satisfy the percentage threshold requirement for the large datanode, since it remains under-utilized even if it has a replica of every block. > One very large node in a cluster prevents balancer from balancing data > -- > > Key: HDFS-5958 > URL: https://issues.apache.org/jira/browse/HDFS-5958 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.2.0 > Environment: Hadoop cluster with 4 nodes: 3 with 500Gb drives and one > with 4Tb drive. >Reporter: Alexey Kovyrin > > In a cluster with a set of small nodes and one much larger node balancer > always selects the large node as the target even though it already has a copy > of each block in the cluster. > This causes the balancer to enter an infinite loop and stop balancing other > nodes because each balancing iteration selects the same target and then could > not find a single block to move. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
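A back-of-the-envelope check shows why the threshold can be unsatisfiable. The data volume below is invented (the report only gives the disk sizes), but with three 500 GB nodes, one 4 TB node, and replication factor 3, the big node stays more than the default 10% below the cluster average even in its best case of holding one replica of every block:

```java
public class BalancerGapSketch {
  static double utilization(double usedGb, double capacityGb) {
    return usedGb / capacityGb;
  }

  public static void main(String[] args) {
    // Hypothetical numbers for the reported environment: three 500 GB
    // datanodes plus one 4 TB datanode, with 500 GB of unique data at
    // replication factor 3 (1500 GB of replicas cluster-wide).
    double totalCapGb = 3 * 500 + 4000;              // 5500 GB
    double avgUtil = utilization(1500, totalCapGb);  // ~27.3%

    // Best case for the big node: one replica of every block = 500 GB.
    double bigUtil = utilization(500, 4000);         // 12.5%

    double threshold = 0.10;  // the balancer's default threshold, 10%
    // The big node is "under-utilized" by more than the threshold, yet
    // there is no block left that it does not already store, so every
    // iteration picks it as the target and then moves nothing.
    System.out.println(bigUtil < avgUtil - threshold);
  }
}
```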
[jira] [Updated] (HDFS-5953) TestBlockReaderFactory fails if libhadoop.so has not been built
[ https://issues.apache.org/jira/browse/HDFS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5953: --- Summary: TestBlockReaderFactory fails if libhadoop.so has not been built (was: TestBlockReaderFactory fails in trunk) > TestBlockReaderFactory fails if libhadoop.so has not been built > --- > > Key: HDFS-5953 > URL: https://issues.apache.org/jira/browse/HDFS-5953 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Ted Yu >Assignee: Akira AJISAKA > Fix For: 2.4.0 > > Attachments: HDFS-5953.patch > > > From > https://builds.apache.org/job/Hadoop-Hdfs-trunk/1673/testReport/junit/org.apache.hadoop.hdfs/TestBlockReaderFactory/testFallbackFromShortCircuitToUnixDomainTraffic/ > : > {code} > java.lang.RuntimeException: Although a UNIX domain socket path is configured > as > /tmp/socks.1392383436573.1418778351/testFallbackFromShortCircuitToUnixDomainTraffic._PORT, > we cannot start a localDataXceiverServer because libhadoop cannot be loaded. 
> at > org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:601) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:573) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:769) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:315) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1864) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764) > at > org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243) > at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699) > at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:359) > at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340) > at > org.apache.hadoop.hdfs.TestBlockReaderFactory.testFallbackFromShortCircuitToUnixDomainTraffic(TestBlockReaderFactory.java:99) > {code} > This test failure can be reproduced locally (on Mac). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5970) callers of NetworkTopology's chooseRandom method to expect null return value
Yongjun Zhang created HDFS-5970: --- Summary: callers of NetworkTopology's chooseRandom method to expect null return value Key: HDFS-5970 URL: https://issues.apache.org/jira/browse/HDFS-5970 Project: Hadoop HDFS Issue Type: Bug Components: datanode, hdfs-client Affects Versions: 3.0.0 Reporter: Yongjun Zhang Class NetworkTopology's method public Node chooseRandom(String scope) calls private Node chooseRandom(String scope, String excludedScope) which may return a null value. Callers of this method, such as BlockPlacementPolicyDefault, need to be aware of that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5968) Fix rollback of rolling upgrade in NameNode HA setup
[ https://issues.apache.org/jira/browse/HDFS-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao resolved HDFS-5968. - Resolution: Duplicate Created the same jira twice because of some network issue. > Fix rollback of rolling upgrade in NameNode HA setup > > > Key: HDFS-5968 > URL: https://issues.apache.org/jira/browse/HDFS-5968 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ha, hdfs-client, namenode >Reporter: Jing Zhao >Assignee: Jing Zhao > > This jira does the following: > 1. When do rollback for rolling upgrade, we should call > FSEditLog#initJournalsForWrite when initializing editLog (just like Upgrade > in HA setup). > 2. After the rollback, we also need to rename the md5 file and change its > reference file name. > 3. Add a new unit test to cover rollback with HA+QJM -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5971) callers of NetworkTopology's chooseRandom method to expect null return value
Yongjun Zhang created HDFS-5971: --- Summary: callers of NetworkTopology's chooseRandom method to expect null return value Key: HDFS-5971 URL: https://issues.apache.org/jira/browse/HDFS-5971 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.0 Reporter: Yongjun Zhang Class NetworkTopology's method public Node chooseRandom(String scope) calls private Node chooseRandom(String scope, String excludedScope) which may return a null value. Callers of this method, such as BlockPlacementPolicyDefault, need to be aware of that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5958) One very large node in a cluster prevents balancer from balancing data
[ https://issues.apache.org/jira/browse/HDFS-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904905#comment-13904905 ] Tsz Wo (Nicholas), SZE commented on HDFS-5958: -- I think we might need a new balancing policy for such special cases. > One very large node in a cluster prevents balancer from balancing data > -- > > Key: HDFS-5958 > URL: https://issues.apache.org/jira/browse/HDFS-5958 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.2.0 > Environment: Hadoop cluster with 4 nodes: 3 with 500Gb drives and one > with 4Tb drive. >Reporter: Alexey Kovyrin > > In a cluster with a set of small nodes and one much larger node balancer > always selects the large node as the target even though it already has a copy > of each block in the cluster. > This causes the balancer to enter an infinite loop and stop balancing other > nodes because each balancing iteration selects the same target and then could > not find a single block to move. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5966) Fix rollback of rolling upgrade in NameNode HA setup
[ https://issues.apache.org/jira/browse/HDFS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5966: Attachment: HDFS-5966.000.patch > Fix rollback of rolling upgrade in NameNode HA setup > > > Key: HDFS-5966 > URL: https://issues.apache.org/jira/browse/HDFS-5966 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ha, hdfs-client, namenode >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5966.000.patch > > > This jira does the following: > 1. When do rollback for rolling upgrade, we should call > FSEditLog#initJournalsForWrite when initializing editLog (just like Upgrade > in HA setup). > 2. After the rollback, we also need to rename the md5 file and change its > reference file name. > 3. Add a new unit test to cover rollback with HA+QJM -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5960) Fix TestRollingUpgrade#testDFSAdminRollingUpgradeCommands
[ https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904901#comment-13904901 ] Tsz Wo (Nicholas), SZE commented on HDFS-5960: -- > Unfortunately the branch is being actively changed while broken so that ... Sorry about that. I do plan to fix the tests after the feature implementation is complete. (This is also a reason we created the feature branch. BTW, the feature on the NN side is complete now and I am also fixing the tests.) It is hard (and unnecessary) to keep the tests passing while the feature is incomplete. > Fix TestRollingUpgrade#testDFSAdminRollingUpgradeCommands > - > > Key: HDFS-5960 > URL: https://issues.apache.org/jira/browse/HDFS-5960 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: HDFS-5535 (Rolling upgrades) > > Attachments: HDFS-5960.01.patch, HDFS-5960.02.patch > > > {{TestRollingUpgrade}} fails when restarting the NN because > {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not > expected. > The fix is to start/finalize rolling upgrade when the corresponding edit log > op is seen. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5958) One very large node in a cluster prevents balancer from balancing data
[ https://issues.apache.org/jira/browse/HDFS-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904904#comment-13904904 ] Alexey Kovyrin commented on HDFS-5958: -- I understand perfectly well why it is happening. I've reported the issue to make sure it gets fixed, so that other users won't need to spend hours pulling their hair out trying to figure out why their balancer processes hang forever, promising to move data around and never doing it. > One very large node in a cluster prevents balancer from balancing data > -- > > Key: HDFS-5958 > URL: https://issues.apache.org/jira/browse/HDFS-5958 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.2.0 > Environment: Hadoop cluster with 4 nodes: 3 with 500Gb drives and one > with 4Tb drive. >Reporter: Alexey Kovyrin > > In a cluster with a set of small nodes and one much larger node balancer > always selects the large node as the target even though it already has a copy > of each block in the cluster. > This causes the balancer to enter an infinite loop and stop balancing other > nodes because each balancing iteration selects the same target and then could > not find a single block to move. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5972) callers of NetworkTopology's chooseRandom method to expect null return value
Yongjun Zhang created HDFS-5972: --- Summary: callers of NetworkTopology's chooseRandom method to expect null return value Key: HDFS-5972 URL: https://issues.apache.org/jira/browse/HDFS-5972 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Class NetworkTopology's method public Node chooseRandom(String scope) calls private Node chooseRandom(String scope, String excludedScope) which may return null value. Callers of this method such as BlockPlacementPolicyDefault etc need to be aware that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5960) Fix TestRollingUpgrade
[ https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5960: Summary: Fix TestRollingUpgrade (was: Fix TestRollingUpgrade#testDFSAdminRollingUpgradeCommands) > Fix TestRollingUpgrade > -- > > Key: HDFS-5960 > URL: https://issues.apache.org/jira/browse/HDFS-5960 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: HDFS-5535 (Rolling upgrades) > > Attachments: HDFS-5960.01.patch, HDFS-5960.02.patch > > > {{TestRollingUpgrade}} fails when restarting the NN because > {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not > expected. > The fix is to start/finalize rolling upgrade when the corresponding edit log > op is seen. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5961) OIV cannot load fsimages containing a symbolic link
[ https://issues.apache.org/jira/browse/HDFS-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904907#comment-13904907 ] Hadoop QA commented on HDFS-5961: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629658/HDFS-5961.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6170//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6170//console This message is automatically generated. > OIV cannot load fsimages containing a symbolic link > --- > > Key: HDFS-5961 > URL: https://issues.apache.org/jira/browse/HDFS-5961 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Priority: Critical > Attachments: HDFS-5961.patch > > > In {{ImageLoaderCurrent#processINode}}, the permission is not read for > symlink INodes. So after incorrectly reading in the first symbolic link, the > next INode can't be read. 
> HDFS-4850 broke this while fixing other issues. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
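The failure mode described above (skip one field, corrupt everything after it) is inherent to sequential image formats. The record layout below is a deliberate toy, not the real fsimage format, but it shows how forgetting to consume the permission field desynchronizes every later read:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class StreamDesyncSketch {
  // Writes two toy "inode" records of (short permission, int length),
  // then reads them back while forgetting the first permission short,
  // mimicking the loader bug: every field after the skip is misaligned.
  static int buggyReadLength() {
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(buf);
      out.writeShort(0755);  // symlink permission (never consumed below)
      out.writeInt(10);      // symlink target length
      out.writeShort(0644);  // next inode's permission
      out.writeInt(20);

      DataInputStream in = new DataInputStream(
          new ByteArrayInputStream(buf.toByteArray()));
      // Bug: readInt() grabs the two permission bytes plus half of the
      // real length field, so the "length" is garbage from here on.
      return in.readInt();
    } catch (IOException e) {
      throw new AssertionError(e);  // cannot happen on in-memory streams
    }
  }

  public static void main(String[] args) {
    System.out.println(buggyReadLength());  // not 10: stream desynced
  }
}
```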
[jira] [Commented] (HDFS-5958) One very large node in a cluster prevents balancer from balancing data
[ https://issues.apache.org/jira/browse/HDFS-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904908#comment-13904908 ] Alexey Kovyrin commented on HDFS-5958: -- Why not fix the default ones? The current behavior is clearly a bug: the balancer lies to a user's face by promising to move data around only to *silently* fail to do it and make another promise it could not keep. > One very large node in a cluster prevents balancer from balancing data > -- > > Key: HDFS-5958 > URL: https://issues.apache.org/jira/browse/HDFS-5958 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.2.0 > Environment: Hadoop cluster with 4 nodes: 3 with 500Gb drives and one > with 4Tb drive. >Reporter: Alexey Kovyrin > > In a cluster with a set of small nodes and one much larger node balancer > always selects the large node as the target even though it already has a copy > of each block in the cluster. > This causes the balancer to enter an infinite loop and stop balancing other > nodes because each balancing iteration selects the same target and then could > not find a single block to move. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5960) Fix TestRollingUpgrade
[ https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904909#comment-13904909 ] Arpit Agarwal commented on HDFS-5960: - Thanks Nicholas, you are right that {{testDFSAdminRollingUpgradeCommands}} no longer fails, I've re-edited the title. > Fix TestRollingUpgrade > -- > > Key: HDFS-5960 > URL: https://issues.apache.org/jira/browse/HDFS-5960 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: HDFS-5535 (Rolling upgrades) > > Attachments: HDFS-5960.01.patch, HDFS-5960.02.patch > > > {{TestRollingUpgrade}} fails when restarting the NN because > {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not > expected. > The fix is to start/finalize rolling upgrade when the corresponding edit log > op is seen. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-3570) Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used space
[ https://issues.apache.org/jira/browse/HDFS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904914#comment-13904914 ] Akira AJISAKA commented on HDFS-3570: - Thank you for verifying, [~ash211]! [~qwertymaniac], would you please review the patch? > Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used > space > > > Key: HDFS-3570 > URL: https://issues.apache.org/jira/browse/HDFS-3570 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.0.0-alpha >Reporter: Harsh J >Assignee: Akira AJISAKA >Priority: Minor > Attachments: HDFS-3570.2.patch, HDFS-3570.aash.1.patch > > > Report from a user here: > https://groups.google.com/a/cloudera.org/d/msg/cdh-user/pIhNyDVxdVY/b7ENZmEvBjIJ, > post archived at http://pastebin.com/eVFkk0A0 > This user had a specific DN that had a large non-DFS usage among > dfs.data.dirs, and very little DFS usage (which is computed against total > possible capacity). > Balancer apparently only looks at the usage, and ignores to consider that > non-DFS usage may also be high on a DN/cluster. Hence, it thinks that if a > DFS Usage report from DN is 8% only, its got a lot of free space to write > more blocks, when that isn't true as shown by the case of this user. It went > on scheduling writes to the DN to balance it out, but the DN simply can't > accept any more blocks as a result of its disks' state. > I think it would be better if we _computed_ the actual utilization based on > {{(100-(actual remaining space))/(capacity)}}, as opposed to the current > {{(dfs used)/(capacity)}}. Thoughts? > This isn't very critical, however, cause it is very rare to see DN space > being used for non DN data, but it does expose a valid bug. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
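The two metrics from the HDFS-3570 description can be contrasted numerically. The quoted formula {{(100-(actual remaining space))/(capacity)}} presumably intends (capacity − remaining) / capacity; the sketch below assumes that reading, and the datanode numbers are invented to echo the 8% figure from the report:

```java
public class NonDfsUtilSketch {
  // Current balancer metric: DFS used / capacity.
  static double dfsUtil(double dfsUsedGb, double capacityGb) {
    return dfsUsedGb / capacityGb;
  }

  // Proposed metric: (capacity - remaining) / capacity, which also
  // counts non-DFS data occupying the same disks.
  static double actualUtil(double remainingGb, double capacityGb) {
    return (capacityGb - remainingGb) / capacityGb;
  }

  public static void main(String[] args) {
    // Hypothetical datanode resembling the report: 1000 GB capacity,
    // 80 GB of DFS blocks, 700 GB of non-DFS data, so 220 GB remain.
    double cap = 1000, dfsUsed = 80, remaining = 220;
    System.out.printf("dfs=%.0f%% actual=%.0f%%%n",
        dfsUtil(dfsUsed, cap) * 100,       // 8%: looks nearly empty
        actualUtil(remaining, cap) * 100); // 78%: actually quite full
  }
}
```

Under the current metric the balancer keeps scheduling blocks toward a node whose disks cannot absorb them; the proposed metric makes the node look as full as it really is.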
[jira] [Resolved] (HDFS-5965) caller of NetworkTopology's chooseRandom method to expect a null return value
[ https://issues.apache.org/jira/browse/HDFS-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang resolved HDFS-5965. - Resolution: Duplicate > caller of NetworkTopology's chooseRandom method to expect a null return value > -- > > Key: HDFS-5965 > URL: https://issues.apache.org/jira/browse/HDFS-5965 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Priority: Minor > > Class NetworkTopology's method >public Node chooseRandom(String scope) > calls >private Node chooseRandom(String scope, String excludedScope) > which may return a null value. > Callers of this method, such as BlockPlacementPolicyDefault, need to be > aware of that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
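The caller-side pattern the issue asks for can be sketched as below. The stand-in chooseRandom and the fallback handling are invented for illustration; this is not NetworkTopology's actual implementation:

```java
import java.util.Random;

// Defensive caller pattern for a chooseRandom-style method that may
// return null. Names are illustrative, not HDFS internals.
public class ChooseRandomCaller {
    private static final Random RAND = new Random();

    // Stand-in for NetworkTopology#chooseRandom(String): returns null
    // when the scope excludes every candidate node.
    static String chooseRandom(String[] candidates) {
        if (candidates.length == 0) {
            return null; // nothing eligible to choose from
        }
        return candidates[RAND.nextInt(candidates.length)];
    }

    // The caller-side fix: check for null before using the result.
    static String chooseOrFallback(String[] candidates, String fallback) {
        String chosen = chooseRandom(candidates);
        return (chosen == null) ? fallback : chosen;
    }

    public static void main(String[] args) {
        System.out.println(chooseOrFallback(new String[0], "<no node available>"));
    }
}
```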
[jira] [Commented] (HDFS-5953) TestBlockReaderFactory fails if libhadoop.so has not been built
[ https://issues.apache.org/jira/browse/HDFS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904920#comment-13904920 ] Colin Patrick McCabe commented on HDFS-5953: Thanks, guys. I added {{-Drequire.test.libhadoop}} to the nightly build, to ensure we catch failures to build libhadoop.so. > TestBlockReaderFactory fails if libhadoop.so has not been built > --- > > Key: HDFS-5953 > URL: https://issues.apache.org/jira/browse/HDFS-5953 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Ted Yu >Assignee: Akira AJISAKA > Fix For: 2.4.0 > > Attachments: HDFS-5953.patch > > > From > https://builds.apache.org/job/Hadoop-Hdfs-trunk/1673/testReport/junit/org.apache.hadoop.hdfs/TestBlockReaderFactory/testFallbackFromShortCircuitToUnixDomainTraffic/ > : > {code} > java.lang.RuntimeException: Although a UNIX domain socket path is configured > as > /tmp/socks.1392383436573.1418778351/testFallbackFromShortCircuitToUnixDomainTraffic._PORT, > we cannot start a localDataXceiverServer because libhadoop cannot be loaded. 
> at > org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:601) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:573) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:769) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:315) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1864) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764) > at > org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243) > at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699) > at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:359) > at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340) > at > org.apache.hadoop.hdfs.TestBlockReaderFactory.testFallbackFromShortCircuitToUnixDomainTraffic(TestBlockReaderFactory.java:99) > {code} > This test failure can be reproduced locally (on Mac). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5961) OIV cannot load fsimages containing a symbolic link
[ https://issues.apache.org/jira/browse/HDFS-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904919#comment-13904919 ] Jing Zhao commented on HDFS-5961: - +1, the patch looks good to me. Thanks for the fix, [~kihwal]! > OIV cannot load fsimages containing a symbolic link > --- > > Key: HDFS-5961 > URL: https://issues.apache.org/jira/browse/HDFS-5961 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Priority: Critical > Attachments: HDFS-5961.patch > > > In {{ImageLoaderCurrent#processINode}}, the permission is not read for > symlink INodes. So after incorrectly reading in the first symbolic link, the > next INode can't be read. > HDFS-4850 broke this while fixing other issues. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
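The failure mode in HDFS-5961 (a reader that skips one field for one record type desynchronizes every record that follows in a sequential format) can be reproduced in miniature. The two-record "image" layout below is entirely invented for illustration and has nothing to do with the real fsimage format:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Toy demonstration of the bug class: in a sequential format, a reader
// that skips one field for one record type misreads every record after it.
public class SequentialReadSketch {
    // record := type(byte), permission(short); two records back to back
    static byte[] writeImage() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(1);      // record 1: a "symlink"
            out.writeShort(0644);  // ...whose permission the buggy reader skips
            out.writeByte(0);      // record 2: a "file"
            out.writeShort(0755);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the type byte of the *second* record as the reader sees it.
    static int readSecondRecordType(byte[] image, boolean readSymlinkPerm) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(image));
            in.readByte();          // record 1: type
            if (readSymlinkPerm) {
                in.readShort();     // correct reader consumes the permission
            }
            return in.readByte();   // record 2: type... or a permission byte
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] image = writeImage();
        System.out.println(readSecondRecordType(image, true));  // 0: stream in sync
        System.out.println(readSecondRecordType(image, false)); // 1: desynchronized
    }
}
```

The buggy reader never crashes on the first record; it simply starts interpreting the unread permission bytes as the next record, which matches the symptom described in the issue.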
[jira] [Resolved] (HDFS-5960) Fix TestRollingUpgrade
[ https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal resolved HDFS-5960. - Resolution: Not A Problem Cannot repro this failure anymore; filed HDFS-5963 for a separate bug. > Fix TestRollingUpgrade > -- > > Key: HDFS-5960 > URL: https://issues.apache.org/jira/browse/HDFS-5960 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: HDFS-5535 (Rolling upgrades) > > Attachments: HDFS-5960.01.patch, HDFS-5960.02.patch > > > {{TestRollingUpgrade}} fails when restarting the NN because > {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not > expected. > The fix is to start/finalize rolling upgrade when the corresponding edit log > op is seen. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5957) Provide support for different mmap cache retention policies in ShortCircuitCache.
[ https://issues.apache.org/jira/browse/HDFS-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904932#comment-13904932 ] Colin Patrick McCabe commented on HDFS-5957: bq. This usage pattern in combination with zero-copy read causes retention of a large number of memory-mapped regions in the ShortCircuitCache. Eventually, YARN's resource check kills the container process for exceeding the enforced physical memory bounds. mmap regions don't consume physical memory. They do consume virtual memory. I don't think limiting virtual memory usage is a particularly helpful policy, and YARN should stop doing that if that is in fact what it is doing. bq. As a workaround, I advised Gopal to downtune dfs.client.mmap.cache.timeout.ms to make the munmap happen more quickly. A better solution would be to provide support in the HDFS client for a caching policy that fits this usage pattern. In our tests, mmap provided no performance advantage unless it was reused. If Gopal needs to purge mmaps immediately after using them, the correct thing is simply not to use zero-copy reads. > Provide support for different mmap cache retention policies in > ShortCircuitCache. > - > > Key: HDFS-5957 > URL: https://issues.apache.org/jira/browse/HDFS-5957 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.3.0 >Reporter: Chris Nauroth > > Currently, the {{ShortCircuitCache}} retains {{mmap}} regions for reuse by > multiple reads of the same block or by multiple threads. The eventual > {{munmap}} executes on a background thread after an expiration period. Some > client usage patterns would prefer strict bounds on this cache and > deterministic cleanup by calling {{munmap}}. This issue proposes additional > support for different caching policies that better fit these usage patterns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
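The strict-bound, deterministic-cleanup policy HDFS-5957 asks for can be sketched as a small bounded cache that unmaps eagerly on release, rather than waiting for a timed background sweep. This toy class is purely illustrative and is not the ShortCircuitCache API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy sketch of a strict-bound retention policy: unmap the oldest entry
// as soon as the bound is exceeded, instead of waiting for a timed
// background sweep. Purely illustrative -- not the ShortCircuitCache API.
public class MmapCacheSketch {
    interface Mmap {
        void munmap(); // deterministic cleanup hook
    }

    private final Deque<Mmap> cache = new ArrayDeque<>();
    private final int maxEntries;

    MmapCacheSketch(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // Called when a reader is done with a mapping: keep it for reuse,
    // but never retain more than maxEntries mappings at once.
    void release(Mmap m) {
        cache.addLast(m);
        while (cache.size() > maxEntries) {
            cache.removeFirst().munmap(); // evict and unmap eagerly
        }
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        MmapCacheSketch cache = new MmapCacheSketch(2);
        for (int i = 0; i < 5; i++) {
            cache.release(() -> System.out.println("munmap"));
        }
        System.out.println(cache.size()); // bounded at 2
    }
}
```

Compared with down-tuning a timeout such as {{dfs.client.mmap.cache.timeout.ms}}, an eager bound like this gives deterministic cleanup while still allowing reuse of the most recent mappings.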
[jira] [Commented] (HDFS-5953) TestBlockReaderFactory fails if libhadoop.so has not been built
[ https://issues.apache.org/jira/browse/HDFS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904937#comment-13904937 ] Hudson commented on HDFS-5953: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5186 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5186/]) Update change description for HDFS-5953 (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569579) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > TestBlockReaderFactory fails if libhadoop.so has not been built > --- > > Key: HDFS-5953 > URL: https://issues.apache.org/jira/browse/HDFS-5953 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Ted Yu >Assignee: Akira AJISAKA > Fix For: 2.4.0 > > Attachments: HDFS-5953.patch > > > From > https://builds.apache.org/job/Hadoop-Hdfs-trunk/1673/testReport/junit/org.apache.hadoop.hdfs/TestBlockReaderFactory/testFallbackFromShortCircuitToUnixDomainTraffic/ > : > {code} > java.lang.RuntimeException: Although a UNIX domain socket path is configured > as > /tmp/socks.1392383436573.1418778351/testFallbackFromShortCircuitToUnixDomainTraffic._PORT, > we cannot start a localDataXceiverServer because libhadoop cannot be loaded. 
> at > org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:601) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:573) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:769) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:315) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1864) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764) > at > org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243) > at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699) > at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:359) > at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340) > at > org.apache.hadoop.hdfs.TestBlockReaderFactory.testFallbackFromShortCircuitToUnixDomainTraffic(TestBlockReaderFactory.java:99) > {code} > This test failure can be reproduced locally (on Mac). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5965) caller of NetworkTopology's chooseRandom method to expect a null return value
[ https://issues.apache.org/jira/browse/HDFS-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904943#comment-13904943 ] Yongjun Zhang commented on HDFS-5965: - Accidentally created multiple JIRAs for the same issue, due to the incorrect response of the JIRA GUI today. > caller of NetworkTopology's chooseRandom method to expect a null return value > -- > > Key: HDFS-5965 > URL: https://issues.apache.org/jira/browse/HDFS-5965 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Priority: Minor > > Class NetworkTopology's method >public Node chooseRandom(String scope) > calls >private Node chooseRandom(String scope, String excludedScope) > which may return a null value. > Callers of this method, such as BlockPlacementPolicyDefault, need to be > aware of that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5967) caller of NetworkTopology's chooseRandom method to expect a null return value
[ https://issues.apache.org/jira/browse/HDFS-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904948#comment-13904948 ] Yongjun Zhang commented on HDFS-5967: - Accidentally created multiple JIRAs for the same issue, due to the unexpected response of the JIRA GUI today. > caller of NetworkTopology's chooseRandom method to expect a null return value > -- > > Key: HDFS-5967 > URL: https://issues.apache.org/jira/browse/HDFS-5967 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Priority: Minor > > Class NetworkTopology's method >public Node chooseRandom(String scope) > calls >private Node chooseRandom(String scope, String excludedScope) > which may return a null value. > Callers of this method, such as BlockPlacementPolicyDefault, need to be > aware of that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HDFS-5963) TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail
[ https://issues.apache.org/jira/browse/HDFS-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE reassigned HDFS-5963: Assignee: Tsz Wo (Nicholas), SZE > TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail > > > Key: HDFS-5963 > URL: https://issues.apache.org/jira/browse/HDFS-5963 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: HDFS-5535 (Rolling upgrades) >Reporter: Arpit Agarwal >Assignee: Tsz Wo (Nicholas), SZE > > {{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. > It seems to be caused by the terminate hook used by the test. Commenting out > this test case makes other tests in the same class pass. -- This message was sent by Atlassian JIRA (v6.1.5#6160)