[jira] [Commented] (HDFS-6133) Add a feature for replica pinning so that a pinned replica will not be moved by Balancer/Mover.
[ https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170211#comment-15170211 ] Hudson commented on HDFS-6133: -- FAILURE: Integrated in HBase-Trunk_matrix #742 (See [https://builds.apache.org/job/HBase-Trunk_matrix/742/]) HBASE-15332 Document how to take advantage of HDFS-6133 in HBase (mstanleyjones: rev c5288947ddc4abae2f4036544a775ff81538df2f) * src/main/asciidoc/_chapters/troubleshooting.adoc > Add a feature for replica pinning so that a pinned replica will not be moved > by Balancer/Mover. > --- > > Key: HDFS-6133 > URL: https://issues.apache.org/jira/browse/HDFS-6133 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer & mover, datanode >Reporter: zhaoyunjiong >Assignee: zhaoyunjiong > Fix For: 2.7.0 > > Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, > HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, > HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, > HDFS-6133-9.patch, HDFS-6133.patch > > > Currently, running the Balancer destroys the RegionServer's data locality. > If getBlocks could exclude blocks belonging to files with a specific path > prefix, like "/hbase", then we could run the Balancer without destroying > the RegionServer's data locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
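For readers wiring this up: the pinning behavior from HDFS-6133 is controlled by a DataNode-side switch. A minimal sketch of enabling it, assuming the dfs.datanode.block-pinning.enabled key this feature introduced (verify the key against your version's hdfs-default.xml):

{code}
import org.apache.hadoop.conf.Configuration;

public class BlockPinningConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Off by default; when enabled, replicas written to favored nodes are
    // pinned and skipped by the Balancer/Mover.
    conf.setBoolean("dfs.datanode.block-pinning.enabled", true);
  }
}
{code}

Pinning applies to replicas written via the client's favored-nodes hint, which is how HBase preserves RegionServer locality (see the HBASE-15332 documentation change referenced above).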
[jira] [Commented] (HDFS-9866) BlockManager#chooseExcessReplicasStriped may weaken rack fault tolerance
[ https://issues.apache.org/jira/browse/HDFS-9866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171230#comment-15171230 ] Hudson commented on HDFS-9866: -- FAILURE: Integrated in Hadoop-trunk-Commit #9386 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9386/]) HDFS-9866. BlockManager#chooseExcessReplicasStriped may weaken rack (jing9: rev 408f2c807bb37ce1b69a5dfa9d76ed427d6e) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithUpgradeDomain.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithNodeGroup.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReconstructStripedBlocksWithRackAwareness.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/ErasureCodingWork.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java > BlockManager#chooseExcessReplicasStriped may weaken rack fault tolerance > > > Key: HDFS-9866 > URL: https://issues.apache.org/jira/browse/HDFS-9866 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.0.0 >Reporter: Takuya Fukudome >Assignee: Jing Zhao > Fix For: 3.0.0 > > Attachments: HDFS-9866.000.patch > > > In [~tfukudom]'s system tests, we found the following issue: > A striped block group B has redundant internal block replicas. 9 internal > blocks are stored in 10 datanodes across 6 racks. Datanodes d1 and d2 both > store a replica for internal block b1. d1's rack contains multiple internal > blocks while d2's rack only has b1. Then when choosing a duplicated replica > to delete, the current implementation may wrongly choose d2, thus causing the > total number of racks to decrease to 5. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
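To make the failure mode concrete, here is a simplified sketch (not the actual BlockPlacementPolicy code) of the rack-preserving rule the fix needs: only delete an excess replica whose rack still hosts another replica of the group. In the scenario above, deleting from d1's rack keeps 6 racks, while deleting d2's copy drops the count to 5.

{code}
import java.util.List;
import java.util.Map;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class RackPreservingDeleteSketch {
  // Prefer a candidate whose rack keeps at least one other replica, so the
  // total rack count of the block group does not shrink (illustrative only).
  static DatanodeInfo chooseExcess(List<DatanodeInfo> candidates,
      Map<String, Integer> replicasPerRack) {
    for (DatanodeInfo d : candidates) {
      if (replicasPerRack.getOrDefault(d.getNetworkLocation(), 0) > 1) {
        return d; // deleting from d leaves its rack covered
      }
    }
    return candidates.get(0); // every rack holds one replica; any choice loses a rack
  }
}
{code}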
[jira] [Commented] (HDFS-9867) Missing block exception should carry locatedBlocks information
[ https://issues.apache.org/jira/browse/HDFS-9867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171235#comment-15171235 ] Hudson commented on HDFS-9867: -- FAILURE: Integrated in Hadoop-trunk-Commit #9387 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9387/]) HDFS-9867. Missing block exception should carry locatedBlocks (jing9: rev 321a80c759e887f52bb4f40c49328527f04560a1) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSStripedInputStream.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Missing block exception should carry locatedBlocks information > -- > > Key: HDFS-9867 > URL: https://issues.apache.org/jira/browse/HDFS-9867 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, hdfs-client >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-9867.000.patch > > > When more than {{parityBlkNum}} internal blocks are missing, {{StripeReader}} > throws an IOException. A sample error message looks like: > {quote} > java.io.IOException: 5 missing blocks, the stripe is: Offset=44695552, > length=65536, fetchedChunksNum=0, missingChunksNum=5 > {quote} > According to our recent experience, it'd be useful for debugging and > diagnosing to dump the current block group information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
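A hedged sketch of the proposed improvement, with hypothetical names standing in for the reader's internal state: attach the located block group to the message so the failing internal blocks can be identified after the fact.

{code}
import java.io.IOException;

public class StripeReadErrorSketch {
  // stripeInfo/blockGroupInfo are placeholders for DFSStripedInputStream
  // internals; the real patch appends the located block group to the message.
  static void failIfUnreadable(int missingChunks, int parityBlkNum,
      String stripeInfo, String blockGroupInfo) throws IOException {
    if (missingChunks > parityBlkNum) {
      throw new IOException(missingChunks + " missing blocks, the stripe is: "
          + stripeInfo + ", block group: " + blockGroupInfo);
    }
  }
}
{code}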
[jira] [Commented] (HDFS-9864) Correct reference for RENEWDELEGATIONTOKEN and CANCELDELEGATIONTOKEN in webhdfs doc
[ https://issues.apache.org/jira/browse/HDFS-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171667#comment-15171667 ] Hudson commented on HDFS-9864: -- FAILURE: Integrated in Hadoop-trunk-Commit #9391 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9391/]) HDFS-9864. Correct reference for RENEWDELEGATIONTOKEN and (aajisaka: rev 8bc023b3b17754bf422eb9a8e749e8ea01768ac2) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/WebHDFS.md > Correct reference for RENEWDELEGATIONTOKEN and CANCELDELEGATIONTOKEN in > webhdfs doc > --- > > Key: HDFS-9864 > URL: https://issues.apache.org/jira/browse/HDFS-9864 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Fix For: 2.8.0, 2.7.3 > > Attachments: HDFS-9864.patch > > > Currently renewDelegationToken and cancelDelegationToken are not present in > FileSystem.java. > {noformat} > RENEWDELEGATIONTOKEN (see FileSystem.renewDelegationToken) > CANCELDELEGATIONTOKEN (see FileSystem.cancelDelegationToken) > See also: token, FileSystem.cancelDelegationToken > See also: token, FileSystem.renewDelegationToken > {noformat} > Reference: > https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/WebHDFS.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172922#comment-15172922 ] Hudson commented on HDFS-7964: -- FAILURE: Integrated in Hadoop-trunk-Commit #9395 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9395/]) HDFS-7964. Add support for async edit logging. Contributed by Daryn (jing9: rev 2151716832ad14932dd65b1a4e47e64d8d6cd767) * hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/resources/log4j.properties * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRecovery.java * hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/contrib/bkjournal/TestBookKeeperAsHASharedDir.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogJournalFailures.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogAutoroll.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAuditLogs.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupNode.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogRace.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestEditLogTailer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.9.0 > > Attachments: HDFS-7964-rebase.patch, HDFS-7964.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. logEdit is > called within the namespace write lock, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write-heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills.
Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
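The mechanism is easier to see in miniature. Below is a minimal sketch of the async pattern described above, not the actual FSEditLogAsync code: handler threads enqueue edits and return immediately, and one background thread batches writes, syncs once per batch, and only then releases the postponed responses (HADOOP-10300).

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncEditLogSketch {
  interface Edit { void write(); void respond(); }

  private final BlockingQueue<Edit> queue = new LinkedBlockingQueue<>();

  void logEdit(Edit e) { queue.add(e); }   // called by handler threads; returns immediately

  void syncLoop() throws InterruptedException {
    List<Edit> batch = new ArrayList<>();
    while (true) {
      batch.add(queue.take());             // block for at least one edit
      queue.drainTo(batch);                // opportunistically batch the rest
      for (Edit e : batch) e.write();      // append edits to the journal
      sync();                              // one durable sync for the whole batch
      for (Edit e : batch) e.respond();    // now it is safe to answer clients
      batch.clear();
    }
  }

  private void sync() { /* fsync the edit log in the real implementation */ }
}
{code}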
[jira] [Commented] (HDFS-9733) Refactor DFSClient#getFileChecksum and DataXceiver#blockChecksum
[ https://issues.apache.org/jira/browse/HDFS-9733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173280#comment-15173280 ] Hudson commented on HDFS-9733: -- SUCCESS: Integrated in Hadoop-trunk-Commit #9399 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9399/]) HDFS-9733. Refactor DFSClient#getFileChecksum and (uma.gangumalla: rev 307ec80acae3b4a41d21b2d4b3a55032e55fcdc6) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/DataChecksum.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/MD5Hash.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockChecksumHelper.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FileChecksumHelper.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/IOStreamPair.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/IOUtils.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSUtilClient.java > Refactor DFSClient#getFileChecksum and DataXceiver#blockChecksum > > > Key: HDFS-9733 > URL: https://issues.apache.org/jira/browse/HDFS-9733 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Kai Zheng >Assignee: Kai Zheng > Fix For: 3.0.0 > > Attachments: HDFS-9733-v1.patch, HDFS-9733-v2.patch, > HDFS-9733-v3.patch, HDFS-9733-v4.patch, HDFS-9733-v5.patch, > HDFS-9733-v6.patch, HDFS-9733-v7.patch, HDFS-9733-v8.patch, HDFS-9733-v9.patch > > > To prepare for file checksum computing for striped files, this refactors the > existing code in {{DFSClient#getFileChecksum}} and > {{DataXceiver#blockChecksum}} to make HDFS-8430 and HDFS-9694 easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9870) Remove unused imports from DFSUtil
[ https://issues.apache.org/jira/browse/HDFS-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174274#comment-15174274 ] Hudson commented on HDFS-9870: -- FAILURE: Integrated in Hadoop-trunk-Commit #9402 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9402/]) HDFS-9870. Remove unused imports from DFSUtil. Contributed by Brahma (cnauroth: rev 2137e8feeb5c5c88d3a80db3a334fd472f299ee4) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java > Remove unused imports from DFSUtil > -- > > Key: HDFS-9870 > URL: https://issues.apache.org/jira/browse/HDFS-9870 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Fix For: 2.8.0 > > Attachments: HDFS-9870-branch-2.patch, HDFS-9870.patch > > > Remove the following unused imports from {{DFSUtil.java}} > {code} > import static > org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_LIFELINE_RPC_ADDRESS_KEY; > import java.io.InterruptedIOException; > import com.google.common.collect.Sets; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174413#comment-15174413 ] Hudson commented on HDFS-8791: -- FAILURE: Integrated in Hadoop-trunk-Commit #9403 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9403/]) HDFS-8791. block ID-based DN storage layout can be very slow for (kihwal: rev 2c8496ebf3b7b31c2e18fdf8d4cb2a0115f43112) * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-to-57-dn-layout-dir.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeUtil.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeLayoutVersion.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-56-layout-datanode-dir.tgz * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeLayoutUpgrade.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java > block ID-based DN storage layout can be very slow for datanode on ext4 > -- > > Key: HDFS-8791 > URL: https://issues.apache.org/jira/browse/HDFS-8791 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0, 2.8.0, 2.7.1 >Reporter: Nathan Roberts >Assignee: Chris Trezzo >Priority: Blocker > Fix For: 2.7.3 > > Attachments: 32x32DatanodeLayoutTesting-v1.pdf, > 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, > HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, > HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, > hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt > > > We are seeing cases where the new directory layout causes the datanode to > keep the disks seeking for tens of minutes. This can be when the > datanode is running du, and it can also be when it is performing a > checkDirs(). Both of these operations currently scan all directories in the > block pool and that's very expensive in the new layout. > The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K > leaf directories where block files are placed. > So, what we have on disk is: > - 256 inodes for the first level directories > - 256 directory blocks for the first level directories > - 256*256 inodes for the second level directories > - 256*256 directory blocks for the second level directories > - Then the inodes and blocks to store the HDFS blocks themselves. > The main problem is the 256*256 directory blocks. > inodes and dentries will be cached by Linux and one can configure how likely > the system is to prune those entries (vfs_cache_pressure). However, ext4 > relies on the buffer cache to cache the directory blocks and I'm not aware of > any way to tell Linux to favor buffer cache pages (even if it did I'm not > sure I would want it to in general). > Also, ext4 tries hard to spread directories evenly across the entire volume; > this basically means the 64K directory blocks are probably randomly spread > across the entire disk. A du-type scan will look at directories one at a > time, so the I/O scheduler can't optimize the corresponding seeks, meaning the > seeks will be random and far. > In a system I was using to diagnose this, I had 60K blocks. A du when things > are hot takes less than 1 second. When things are cold, about 20 minutes. > How do things get cold? > - A large set of tasks run on the node.
This pushes almost all of the buffer > cache out, causing the next du to hit this situation. We are seeing cases > where a large job can cause a seek storm across the entire cluster. > Why didn't the previous layout see this? > - It might have but it wasn't nearly as pronounced. The previous layout would > be a few hundred directory blocks. Even when completely cold, these would > only take a few hundred seeks which would mean single-digit seconds. > - With only a few hundred directories, the odds of the directory blocks > getting modified are quite high; this keeps those blocks hot and much less > likely to be evicted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
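For illustration, a sketch patterned on DatanodeUtil#idToBlockDir: two directory levels are derived from bits of the block ID, and the fix narrows the mask so there are 32x32 rather than 256x256 leaf directories (treat the exact mask constants as version-dependent; verify against your release).

{code}
import java.io.File;

public class BlockDirLayoutSketch {
  static File idToBlockDir(File finalizedDir, long blockId, int mask) {
    int d1 = (int) ((blockId >> 16) & mask);
    int d2 = (int) ((blockId >> 8) & mask);
    return new File(finalizedDir, "subdir" + d1 + File.separator + "subdir" + d2);
  }

  public static void main(String[] args) {
    File root = new File("/data/dn/current/finalized"); // placeholder path
    System.out.println(idToBlockDir(root, 1073741825L, 0xFF)); // old: 64K leaf dirs
    System.out.println(idToBlockDir(root, 1073741825L, 0x1F)); // new: 1K leaf dirs
  }
}
{code}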
[jira] [Commented] (HDFS-9880) TestDatanodeRegistration fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174605#comment-15174605 ] Hudson commented on HDFS-9880: -- FAILURE: Integrated in Hadoop-trunk-Commit #9406 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9406/]) HDFS-9880. TestDatanodeRegistration fails occasionally. Contributed by (kihwal: rev e76b13c415459e4062c4c9660a16759a11ffb34a) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeRegistration.java > TestDatanodeRegistration fails occasionally > --- > > Key: HDFS-9880 > URL: https://issues.apache.org/jira/browse/HDFS-9880 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Fix For: 2.7.3 > > Attachments: HDFS-9880.patch > > > When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes > returns false because the timeout is too short (100ms). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
[ https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174731#comment-15174731 ] Hudson commented on HDFS-9881: -- FAILURE: Integrated in Hadoop-trunk-Commit #9407 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9407/]) HDFS-9881. DistributedFileSystem#getTrashRoot returns incorrect path for (wang: rev 4abb2fa687a80d2b76f2751dd31513822601b235) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java > DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones > -- > > Key: HDFS-9881 > URL: https://issues.apache.org/jira/browse/HDFS-9881 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Critical > Fix For: 2.8.0 > > Attachments: HDFS-9881.001.patch, HDFS-9881.002.patch > > > getTrashRoots is missing a "/" in the path concatenation, so it ends up putting > files into a directory named "/ez/.Trashandrew" rather than > "/ez/.Trash/andrew" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
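The bug is a one-character path concatenation. A minimal self-contained sketch of the wrong and right forms:

{code}
public class TrashRootSketch {
  static String buggy(String ezRoot, String user) {
    return ezRoot + "/.Trash" + user;   // yields "/ez/.Trashandrew"
  }
  static String fixed(String ezRoot, String user) {
    return ezRoot + "/.Trash/" + user;  // yields "/ez/.Trash/andrew"
  }
  public static void main(String[] args) {
    System.out.println(buggy("/ez", "andrew"));
    System.out.println(fixed("/ez", "andrew"));
  }
}
{code}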
[jira] [Commented] (HDFS-9876) shouldProcessOverReplicated should not count number of pending replicas
[ https://issues.apache.org/jira/browse/HDFS-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174951#comment-15174951 ] Hudson commented on HDFS-9876: -- FAILURE: Integrated in Hadoop-trunk-Commit #9408 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9408/]) HDFS-9876. shouldProcessOverReplicated should not count number of (jing9: rev f2ba7da4f0df6cf0fc245093aeb4500158e6ee0b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > shouldProcessOverReplicated should not count number of pending replicas > --- > > Key: HDFS-9876 > URL: https://issues.apache.org/jira/browse/HDFS-9876 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, namenode >Reporter: Takuya Fukudome >Assignee: Jing Zhao > Fix For: 3.0.0 > > Attachments: HDFS-9876.000.patch, HDFS-9876.001.patch, > HDFS-9876.001.patch > > > Currently when checking if we should process an over-replicated block in > {{addStoredBlock}}, we count both the number of reported replicas and pending > replicas. However, {{processOverReplicatedBlock}} chooses excess replicas > only among all the reported storages of the block. So in a situation where we > have over-replicated replica/internal blocks which only reside in the pending > queue, we will not be able to choose any extra replica to delete. > For contiguous blocks, this causes {{chooseExcessReplicasContiguous}} to do > nothing. But for striped blocks, this may cause an endless loop in > {{chooseExcessReplicasStriped}} in the following while loop: > {code} > while (candidates.size() > 1) { > List replicasToDelete = placementPolicy > .chooseReplicasToDelete(nonExcess, candidates, (short) 1, > excessTypes, null, null); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
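The shape of the fix, with illustrative names rather than the exact BlockManager fields: pending replicas must be left out of the count, because only reported storages are candidates for excess-replica deletion.

{code}
public class OverReplicationCheckSketch {
  // Before the fix the check was effectively (reported + pending > expected),
  // which can trigger processing when no reported replica is deletable.
  static boolean shouldProcessOverReplicated(int reported, int pending,
      int expected) {
    return reported > expected;
  }
}
{code}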
[jira] [Commented] (HDFS-9766) TestDataNodeMetrics#testDataNodeTimeSpend fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175087#comment-15175087 ] Hudson commented on HDFS-9766: -- FAILURE: Integrated in Hadoop-trunk-Commit #9409 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9409/]) HDFS-9766. TestDataNodeMetrics#testDataNodeTimeSpend fails (aajisaka: rev e2ddf824694eb4605f3bb04a9c26e4b98529f5bc) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java > TestDataNodeMetrics#testDataNodeTimeSpend fails intermittently > -- > > Key: HDFS-9766 > URL: https://issues.apache.org/jira/browse/HDFS-9766 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 >Reporter: Mingliang Liu >Assignee: Xiao Chen > Fix For: 2.8.0, 2.7.3 > > Attachments: HDFS-9766.01.patch > > > *Stacktrace* > {code} > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics.testDataNodeTimeSpend(TestDataNodeMetrics.java:289) > {code} > See recent builds: > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14393/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeMetrics/testDataNodeTimeSpend/ > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14317/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9851) Name node throws NPE when setPermission is called on a path that does not exist
[ https://issues.apache.org/jira/browse/HDFS-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175124#comment-15175124 ] Hudson commented on HDFS-9851: -- FAILURE: Integrated in Hadoop-trunk-Commit #9410 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9410/]) HDFS-9851. NameNode throws NPE when setPermission is called on a path (aajisaka: rev 27e0681f28ee896ada163bbbc08fd44d113e7d15) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirXAttrOp.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/security/TestPermission.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Name node throws NPE when setPermission is called on a path that does not > exist > --- > > Key: HDFS-9851 > URL: https://issues.apache.org/jira/browse/HDFS-9851 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.1, 2.7.2 >Reporter: David Yan >Assignee: Brahma Reddy Battula >Priority: Critical > Fix For: 2.8.0, 2.7.3 > > Attachments: HDFS-9851-002.patch, HDFS-9851-branch-2.7.patch, > HDFS-9851.patch > > > Tried it on both Hadoop 2.7.1 and 2.7.2, and I'm getting the same error when > setPermission is called on a path that does not exist: > {code} > 16/02/23 16:37:03.888 DEBUG > security.UserGroupInformation:FSPermissionChecker.ja > va:164 - ACCESS CHECK: > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker@299b19af, > doCheckOwner=true, ancestorAccess=null, parentAccess=null, access=null, > subAccess=null, ignoreEmptyDir=false > 16/02/23 16:37:03.889 DEBUG ipc.Server:ProtobufRpcEngine.java:631 - Served: > setPermission queueTime= 3 procesingTime= 3 exception= NullPointerException > 16/02/23 16:37:03.890 WARN ipc.Server:Server.java:2068 - IPC Server handler 2 > on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.setPermission > from 127.0.0.1:36190 Call#21 Retry#0 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:247) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:227) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1720) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1704) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1673) > at > org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:61) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1653) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:695) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:453) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) > {code} > I don't see this problem with Hadoop 2.6.x. > The client that issues the setPermission call was compiled with Hadoop 2.2.0 > libraries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
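A sketch of the guard that avoids the NPE (illustrative signature, not the exact FSDirectory code): resolve the path first and fail with FileNotFoundException when it does not exist, instead of letting checkOwner() dereference a null inode.

{code}
import java.io.FileNotFoundException;

public class SetPermissionGuardSketch {
  // resolvedInode stands in for the INode returned by path resolution.
  static void checkExists(Object resolvedInode, String src)
      throws FileNotFoundException {
    if (resolvedInode == null) {
      throw new FileNotFoundException("Directory/File does not exist " + src);
    }
  }
}
{code}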
[jira] [Commented] (HDFS-9887) WebHdfs socket timeouts should be configurable
[ https://issues.apache.org/jira/browse/HDFS-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176047#comment-15176047 ] Hudson commented on HDFS-9887: -- FAILURE: Integrated in Hadoop-trunk-Commit #9411 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9411/]) HDFS-9887. WebHdfs socket timeouts should be configurable. Contributed (xyao: rev 5abf051249d485313dfffc6aeff6f81c0da1f623) * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/WebHDFS.md * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTimeouts.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/URLConnectionFactory.java > WebHdfs socket timeouts should be configurable > -- > > Key: HDFS-9887 > URL: https://issues.apache.org/jira/browse/HDFS-9887 > Project: Hadoop HDFS > Issue Type: Improvement > Components: fs, webhdfs > Environment: all >Reporter: Austin Donnelly >Assignee: Austin Donnelly > Labels: easyfix, newbie > Fix For: 2.8.0 > > Attachments: HADOOP-12827.001.patch, HADOOP-12827.002.patch, > HADOOP-12827.002.patch, HADOOP-12827.002.patch, HADOOP-12827.003.patch, > HADOOP-12827.004.patch > > Original Estimate: 0h > Remaining Estimate: 0h > > WebHdfs client connections use sockets with fixed timeouts of 60 seconds to > connect, and 60 seconds for reads. > This is a problem because I am trying to use WebHdfs to access an archive > storage system which can take minutes to hours to return the requested data > over WebHdfs. > The fix is to add new configuration file options to allow these 60s defaults > to be customised in hdfs-site.xml. > If the new configuration options are not present, the behavior is unchanged > from before. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
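For users of the new knobs, a minimal sketch of setting them programmatically. The key names below are the ones this change adds to hdfs-default.xml (dfs.webhdfs.socket.connect-timeout and dfs.webhdfs.socket.read-timeout); double-check them against your release. The values accept time-unit suffixes.

{code}
import org.apache.hadoop.conf.Configuration;

public class WebHdfsTimeoutExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("dfs.webhdfs.socket.connect-timeout", "60s"); // matches the old default
    conf.set("dfs.webhdfs.socket.read-timeout", "4h");     // e.g. slow archive tier
  }
}
{code}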
[jira] [Commented] (HDFS-9886) Configuration properties for hedged read is broken
[ https://issues.apache.org/jira/browse/HDFS-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176269#comment-15176269 ] Hudson commented on HDFS-9886: -- FAILURE: Integrated in Hadoop-trunk-Commit #9413 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9413/]) HDFS-9886. Configuration properties for hedged read is broken. (zhz: rev 67880ccae6568516ff3d13185fd7f250b234a2cf) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java > Configuration properties for hedged read is broken > -- > > Key: HDFS-9886 > URL: https://issues.apache.org/jira/browse/HDFS-9886 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA >Priority: Blocker > Fix For: 2.8.0 > > Attachments: HDFS-9886.01.patch > > > {code:title=HdfsClientConfigKeys.java} > /** dfs.client.hedged.read configuration properties */ > interface HedgedRead { > String THRESHOLD_MILLIS_KEY = PREFIX + "threshold.millis"; > long THRESHOLD_MILLIS_DEFAULT = 500; > String THREADPOOL_SIZE_KEY = PREFIX + "threadpool.size"; > int THREADPOOL_SIZE_DEFAULT = 0; > } > {code} > {{PREFIX}} is not defined in the interface, so "dfs.client" is used as > {{PREFIX}}. Therefore, the properties are "dfs.client.threshold.millis" and > "dfs.client.threadpool.size". They should be > "dfs.client.hedged.read.threshold.millis" and > "dfs.client.hedged.read.threadpool.size". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
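One way the interface could be repaired, sketched rather than quoted from the patch: give HedgedRead its own PREFIX so the keys expand to the documented names.

{code}
interface HedgedRead {
  String PREFIX = "dfs.client.hedged.read.";
  String THRESHOLD_MILLIS_KEY = PREFIX + "threshold.millis";   // dfs.client.hedged.read.threshold.millis
  long THRESHOLD_MILLIS_DEFAULT = 500;
  String THREADPOOL_SIZE_KEY = PREFIX + "threadpool.size";     // dfs.client.hedged.read.threadpool.size
  int THREADPOOL_SIZE_DEFAULT = 0;
}
{code}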
[jira] [Commented] (HDFS-9835) OIV: add ReverseXML processor which reconstructs an fsimage from an XML file
[ https://issues.apache.org/jira/browse/HDFS-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176978#comment-15176978 ] Hudson commented on HDFS-9835: -- FAILURE: Integrated in Hadoop-trunk-Commit #9414 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9414/]) HDFS-9835. OIV: add ReverseXML processor which reconstructs an fsimage (cmccabe: rev 700b0e4019cf483f7532609711812150b8c44742) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageViewerPB.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/GenericTestUtils.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/PBImageXmlWriter.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewer.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageReconstructor.java > OIV: add ReverseXML processor which reconstructs an fsimage from an XML file > > > Key: HDFS-9835 > URL: https://issues.apache.org/jira/browse/HDFS-9835 > Project: Hadoop HDFS > Issue Type: New Feature > Components: tools >Affects Versions: 2.0.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-9835.001.patch, HDFS-9835.002.patch, > HDFS-9835.003.patch, HDFS-9835.004.patch, HDFS-9835.005.patch, > HDFS-9835.006.patch > > > OIV: add ReverseXML processor which reconstructs an fsimage from an XML file. > This will make it easy to create fsimages for testing, and manually edit > fsimages when there is corruption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9534) Add CLI command to clear storage policy from a path.
[ https://issues.apache.org/jira/browse/HDFS-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177048#comment-15177048 ] Hudson commented on HDFS-9534: -- FAILURE: Integrated in Hadoop-trunk-Commit #9415 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9415/]) HDFS-9534. Add CLI command to clear storage policy from a path. (arp: rev 27941a1811831e0f2144a2f463d807755cd850b2) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestStoragePolicyCommands.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirAttrOp.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FilterFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/StoragePolicyAdmin.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestApplyingStoragePolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ArchivalStorage.md * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestHarFileSystem.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java > Add CLI command to clear storage policy from a path. > > > Key: HDFS-9534 > URL: https://issues.apache.org/jira/browse/HDFS-9534 > Project: Hadoop HDFS > Issue Type: Improvement > Components: tools >Reporter: Chris Nauroth >Assignee: Xiaobing Zhou > Fix For: 2.9.0 > > Attachments: HDFS-9534.001.patch, HDFS-9534.002.patch, > HDFS-9534.003.patch, HDFS-9534.004.patch > > > The {{hdfs storagepolicies}} command has sub-commands for > {{-setStoragePolicy}} and {{-getStoragePolicy}} on a path. However, there is > no {{-removeStoragePolicy}} to remove a previously set storage policy on a > path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
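On the API side, the commit touches FileSystem and DistributedFileSystem, so the programmatic counterpart of the new sub-command looks roughly like this (method name per those files; verify it on your release):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UnsetStoragePolicyExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/archive/data");  // placeholder path
    fs.setStoragePolicy(p, "COLD");      // a previously applied policy
    fs.unsetStoragePolicy(p);            // clear it; inherit from the parent again
  }
}
{code}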
[jira] [Commented] (HDFS-9048) DistCp documentation is out-of-dated
[ https://issues.apache.org/jira/browse/HDFS-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177607#comment-15177607 ] Hudson commented on HDFS-9048: -- FAILURE: Integrated in Hadoop-trunk-Commit #9418 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9418/]) HDFS-9048. DistCp documentation is out-of-dated (Daisuke Kobayashi via (iwasakims: rev 33a412e8a4ab729d588a9576fb7eb90239c6e383) * hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > DistCp documentation is out-of-dated > > > Key: HDFS-9048 > URL: https://issues.apache.org/jira/browse/HDFS-9048 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Haohui Mai >Assignee: Daisuke Kobayashi > Fix For: 2.8.0, 2.7.3 > > Attachments: HDFS-9048-2.patch, HDFS-9048-3.patch, HDFS-9048-4.patch, > HDFS-9048.patch > > > There are a couple of issues with the current distcp document: > * It recommends the hftp / hsftp filesystems for copying data between different Hadoop > versions. hftp / hsftp have been deprecated in favor of webhdfs. > * If users are copying between Hadoop 2.x clusters they can use the hdfs protocol > directly for better performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9888) Allow reseting KerberosName in unit tests
[ https://issues.apache.org/jira/browse/HDFS-9888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180255#comment-15180255 ] Hudson commented on HDFS-9888: -- FAILURE: Integrated in Hadoop-trunk-Commit #9424 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9424/]) HDFS-9888. Allow reseting KerberosName in unit tests. Contributed by (zhz: rev 3e8099a45a4cfd4c5c0e3dce4370514cb2c90da9) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java * hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/KerberosName.java > Allow reseting KerberosName in unit tests > - > > Key: HDFS-9888 > URL: https://issues.apache.org/jira/browse/HDFS-9888 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-9888.01.patch > > > In some local environments, {{TestBalancer#testBalancerWithKeytabs}} may > fail. Specifically, running the test by itself passes, but running the whole > {{TestBalancer}} suite always fails. This is due to: > # Kerberos setup is done in the test case setup > # the static variable {{KerberosName#defaultRealm}} is set at class > initialization - before the {{testBalancerWithKeytabs}} setup > # the local default realm is different from the test case default realm > This is mostly an environment-specific problem, but let's not make such an > assumption in the test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9889) Update balancer/mover document about HDFS-6133 feature
[ https://issues.apache.org/jira/browse/HDFS-9889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180571#comment-15180571 ] Hudson commented on HDFS-9889: -- FAILURE: Integrated in Hadoop-trunk-Commit #9425 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9425/]) HDFS-9889. Update balancer/mover document about HDFS-6133 feature. (yzhang: rev 8e08861a14cb5b6adce338543d7da08e9926ad46) * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md > Update balancer/mover document about HDFS-6133 feature > -- > > Key: HDFS-9889 > URL: https://issues.apache.org/jira/browse/HDFS-9889 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Minor > Labels: supportability > Fix For: 2.8.0 > > Attachments: HDFS-9889.001.patch, HDFS-9889.002.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6133) Add a feature for replica pinning so that a pinned replica will not be moved by Balancer/Mover.
[ https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180570#comment-15180570 ] Hudson commented on HDFS-6133: -- FAILURE: Integrated in Hadoop-trunk-Commit #9425 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9425/]) HDFS-9889. Update balancer/mover document about HDFS-6133 feature. (yzhang: rev 8e08861a14cb5b6adce338543d7da08e9926ad46) * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md > Add a feature for replica pinning so that a pinned replica will not be moved > by Balancer/Mover. > --- > > Key: HDFS-6133 > URL: https://issues.apache.org/jira/browse/HDFS-6133 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer & mover, datanode >Reporter: zhaoyunjiong >Assignee: zhaoyunjiong > Fix For: 2.7.0 > > Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, > HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, > HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, > HDFS-6133-9.patch, HDFS-6133.patch > > > Currently, running the Balancer destroys the RegionServer's data locality. > If getBlocks could exclude blocks belonging to files with a specific path > prefix, like "/hbase", then we could run the Balancer without destroying > the RegionServer's data locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9239) DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness
[ https://issues.apache.org/jira/browse/HDFS-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181353#comment-15181353 ] Hudson commented on HDFS-9239: -- FAILURE: Integrated in Hadoop-trunk-Commit #9426 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9426/]) HDFS-9239. DataNode Lifeline Protocol: an alternative protocol for (cnauroth: rev 2759689d7d23001f007cb0dbe2521de90734dd5c) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeLifelineProtocolPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeRegister.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-common-project/hadoop-common/src/site/markdown/Metrics.md * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeLifelineProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBpServiceActorScheduler.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/pom.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockPoolManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeLifeline.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/NamenodeProtocols.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeLifelineProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeLifelineProtocol.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeLifelineProtocolClientSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java > DataNode Lifeline Protocol: an alternative protocol for reporting DataNode > liveness > --- > > Key: HDFS-9239 > URL: https://issues.apache.org/jira/browse/HDFS-9239 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 2.8.0 > > Attachments: DataNode-Lifeline-Protocol.pdf, HDFS-9239.001.patch, > HDFS-9239.002.patch, 
HDFS-9239.003.patch > > > This issue proposes introduction of a new feature: the DataNode Lifeline > Protocol. This is an RPC protocol that is responsible for reporting liveness > and basic health information about a DataNode to a NameNode. Compared to the > existing heartbeat messages, it is lightweight and not prone to the resource > contention problems that currently can harm accurate tracking of DataNode > liveness. The attached design document contains more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
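Operationally, the lifeline server is opt-in: it only starts when given its own RPC address. A minimal sketch using the key behind DFS_NAMENODE_LIFELINE_RPC_ADDRESS_KEY (key string assumed from that constant; hostname and port below are placeholders):

{code}
import org.apache.hadoop.conf.Configuration;

public class LifelineConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // A separate, lightly loaded RPC endpoint dedicated to liveness reports.
    conf.set("dfs.namenode.lifeline.rpc-address", "nn1.example.com:8050");
  }
}
{code}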
[jira] [Commented] (HDFS-9521) TransferFsImage.receiveFile should account and log separate times for image download and fsync to disk
[ https://issues.apache.org/jira/browse/HDFS-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182950#comment-15182950 ] Hudson commented on HDFS-9521: -- FAILURE: Integrated in Hadoop-trunk-Commit #9433 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9433/]) HDFS-9521. TransferFsImage.receiveFile should account and log separate (harsh: rev fd1c09be3e7c67c188a1dd7e4fccb3d92dcc5b5b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java > TransferFsImage.receiveFile should account and log separate times for image > download and fsync to disk > --- > > Key: HDFS-9521 > URL: https://issues.apache.org/jira/browse/HDFS-9521 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Minor > Fix For: 2.9.0 > > Attachments: HDFS-9521-2.patch, HDFS-9521-3.patch, > HDFS-9521.004.patch, HDFS-9521.patch, HDFS-9521.patch.1 > > > Currently, TransferFsImage.receiveFile is logging total transfer time as > below: > {noformat} > double xferSec = Math.max( > ((float)(Time.monotonicNow() - startTime)) / 1000.0, 0.001); > long xferKb = received / 1024; > LOG.info(String.format("Transfer took %.2fs at %.2f KB/s",xferSec, xferKb / > xferSec)) > {noformat} > This is really useful, but it just measures the total method execution time, > which includes the time taken to download the image and do an fsync to all the > namenode metadata directories. > Sometimes when troubleshooting these image transfer problems, it's > interesting to know which part of the process is the bottleneck > (network or disk write). > This patch accounts for image download and fsync-to-disk time > separately, logging how much time each operation took. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
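The intent, in miniature: wrap the download and the fsync phases in separate timers so a slow network can be told apart from slow disks (the sleeps below are stand-ins for the real work):

{code}
public class TransferTimingSketch {
  public static void main(String[] args) throws InterruptedException {
    long t0 = System.nanoTime();
    Thread.sleep(50);                                   // image download stand-in
    long downloadMs = (System.nanoTime() - t0) / 1_000_000;

    long t1 = System.nanoTime();
    Thread.sleep(20);                                   // per-directory fsync stand-in
    long fsyncMs = (System.nanoTime() - t1) / 1_000_000;

    System.out.printf("Transfer took %d ms (download %d ms, fsync %d ms)%n",
        downloadMs + fsyncMs, downloadMs, fsyncMs);
  }
}
{code}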
[jira] [Commented] (HDFS-9865) TestBlockReplacement fails intermittently in trunk
[ https://issues.apache.org/jira/browse/HDFS-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1518#comment-1518 ] Hudson commented on HDFS-9865: -- FAILURE: Integrated in Hadoop-trunk-Commit #9436 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9436/]) HDFS-9865. TestBlockReplacement fails intermittently in trunk (Lin Yiqun (iwasakims: rev d718fc1ee5aee3628e105339ee3ea183b6242409) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReplacement.java > TestBlockReplacement fails intermittently in trunk > -- > > Key: HDFS-9865 > URL: https://issues.apache.org/jira/browse/HDFS-9865 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Fix For: 2.8.0, 2.7.3 > > Attachments: HDFS-9865.001.patch, HDFS-9865.002.patch > > > I found that the testcase {{TestBlockReplacement}} sometimes fails in > testing. Looking at the unit test log, I always found this output: > {code} > org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement > testDeletedBlockWhenAddBlockIsInEdit(org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement) > Time elapsed: 8.764 sec <<< FAILURE! > java.lang.AssertionError: The block should be only on 1 datanode > expected:<1> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement.testDeletedBlockWhenAddBlockIsInEdit(TestBlockReplacement.java:436) > {code} > Finally I found the reason: the block is not deleted completely in > testDeletedBlockWhenAddBlockIsInEdit, which leaves the datanode count incorrect. > Also, the time spent waiting for FsDatasetAsyncDiskService to delete the block is not an > accurate value. > {code} > LOG.info("replaceBlock: " + replaceBlock(block, > (DatanodeInfo)sourceDnDesc, (DatanodeInfo)sourceDnDesc, > (DatanodeInfo)destDnDesc)); > // Waiting for the FsDatasetAsyncDsikService to delete the block > Thread.sleep(3000); > {code} > When I adjusted this time to 1 second, it always failed. The 3 seconds in the > test is not an accurate value either. We should adjust this logic to a better > approach, such as waiting for the block to be replicated, as testDecommission > does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
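The standard remedy for this class of flakiness, sketched with GenericTestUtils.waitFor (a helper the project's tests already use): poll for the condition instead of sleeping a fixed interval.

{code}
import org.apache.hadoop.test.GenericTestUtils;

public class WaitForConditionSketch {
  // blockDeleted() is a placeholder for the real check against the datanode.
  static volatile boolean deleted;
  static boolean blockDeleted() { return deleted; }

  static void awaitDeletion() throws Exception {
    // Polls every 100 ms, fails with TimeoutException after 10 s.
    GenericTestUtils.waitFor(() -> blockDeleted(), 100, 10000);
  }
}
{code}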
[jira] [Commented] (HDFS-9906) Remove spammy log spew when a datanode is restarted
[ https://issues.apache.org/jira/browse/HDFS-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183662#comment-15183662 ] Hudson commented on HDFS-9906: -- FAILURE: Integrated in Hadoop-trunk-Commit #9438 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9438/]) HDFS-9906. Remove spammy log spew when a datanode is restarted. (arp: rev 724d2299cd2516d90c030f6e20d814cceb439228) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java > Remove spammy log spew when a datanode is restarted > --- > > Key: HDFS-9906 > URL: https://issues.apache.org/jira/browse/HDFS-9906 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.2 >Reporter: Elliott Clark >Assignee: Brahma Reddy Battula > Fix For: 2.8.0 > > Attachments: HDFS-9906.patch > > > {code} > WARN BlockStateChange: BLOCK* addStoredBlock: Redundant addStoredBlock > request received for blk_1109897077_36157149 on node 192.168.1.1:50010 size > 268435456 > {code} > This happens way too much to add any useful information. We should either > move this to a different level or only warn once per machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
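One plausible shape for the fix, sketched with an SLF4J logger (the exact logger and message format live in BlockManager): demote the per-block message from warn to debug.

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class RedundantBlockLogSketch {
  private static final Logger BLOCK_LOG =
      LoggerFactory.getLogger("BlockStateChange");

  static void logRedundantAddStoredBlock(Object block, Object node, long size) {
    // debug instead of warn: a restarted datanode re-reports every replica.
    BLOCK_LOG.debug("BLOCK* addStoredBlock: Redundant addStoredBlock request"
        + " received for {} on node {} size {}", block, node, size);
  }
}
{code}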
[jira] [Commented] (HDFS-9812) Streamer threads leak if failure happens when closing DFSOutputStream
[ https://issues.apache.org/jira/browse/HDFS-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184272#comment-15184272 ] Hudson commented on HDFS-9812: -- FAILURE: Integrated in Hadoop-trunk-Commit #9440 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9440/]) HDFS-9812. Streamer threads leak if failure happens when closing (aajisaka: rev 352d299cf8ebe330d24117df98d1e6a64ae38c26) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java > Streamer threads leak if failure happens when closing DFSOutputStream > - > > Key: HDFS-9812 > URL: https://issues.apache.org/jira/browse/HDFS-9812 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Fix For: 2.8.0, 2.7.3 > > Attachments: HDFS-9812-branch-2.7.patch, HDFS-9812.002.patch, > HDFS-9812.003.patch, HDFS-9812.004.patch, HDFS-9812.branch-2.patch, > HDFS.001.patch > > > HDFS-9794 solved the problem of streamer threads leaking when a failure > happens while closing the striped output stream. The same problem also exists > in {{DFSOutputStream#closeImpl}}: if failures happen when flushing data > blocks, the streamer threads will not be closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
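The shape of the fix, with illustrative names: tear the streamer threads down in a finally block so a failed flush cannot leak them.

{code}
import java.io.IOException;

public class StreamerCloseSketch {
  interface Streamer { void closeThreads(boolean force); }
  interface Flusher { void flushAll() throws IOException; }

  static void closeImpl(Streamer streamer, Flusher flusher) throws IOException {
    try {
      flusher.flushAll();            // may throw on datanode failure
    } finally {
      streamer.closeThreads(true);   // always reclaim the streamer thread
    }
  }
}
{code}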
[jira] [Commented] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184426#comment-15184426 ] Hudson commented on HDFS-9882: -- SUCCESS: Integrated in Hadoop-trunk-Commit #9441 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9441/]) HDFS-9882. Add heartbeatsTotal in Datanode metrics. (Contributed by Hua (arp: rev c2140d05efaf18b41caae8c61d9f6d668ab0e874) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java * hadoop-common-project/hadoop-common/src/site/markdown/Metrics.md * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Fix For: 2.8.0 > > Attachments: > 0001-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0002-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0003-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch, > 0004-HDFS-9882.Add-heartbeatsTotal-in-Datanode-metrics.patch > > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to the NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > adding another metric counter to show the total time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
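A sketch of the proposed counter using the metrics2 annotations the DataNode already relies on (field names illustrative): record the send-side latency and the full cycle separately.

{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(name = "DataNodeMetricsSketch", context = "dfs")
public class DataNodeMetricsSketch {
  @Metric MutableRate heartbeats;       // time to build and send the report
  @Metric MutableRate heartbeatsTotal;  // full cycle, incl. command processing

  void record(long sendMillis, long totalMillis) {
    heartbeats.add(sendMillis);
    heartbeatsTotal.add(totalMillis);
  }
}
{code}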
[jira] [Commented] (HDFS-8786) Erasure coding: use simple replication for internal blocks on decommissioning datanodes
[ https://issues.apache.org/jira/browse/HDFS-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15185517#comment-15185517 ] Hudson commented on HDFS-8786: -- FAILURE: Integrated in Hadoop-trunk-Commit #9443 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9443/]) HDFS-8786. Erasure coding: use simple replication for internal blocks on (jing9: rev 743a99f2dbc9a27e19f92ff3551937d90dba2e89) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/ErasureCodingWork.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommissionWithStriped.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReconstructionWork.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/ReplicationWork.java > Erasure coding: use simple replication for internal blocks on decommissioning > datanodes > --- > > Key: HDFS-8786 > URL: https://issues.apache.org/jira/browse/HDFS-8786 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: Rakesh R > Fix For: 3.0.0 > > Attachments: HDFS-8786-001.patch, HDFS-8786-002.patch, > HDFS-8786-003.patch, HDFS-8786-004.patch, HDFS-8786-005.patch, > HDFS-8786-006.patch, HDFS-8786-draft.patch > > > Per [discussion | > https://issues.apache.org/jira/browse/HDFS-8697?focusedCommentId=14609004&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14609004] > under HDFS-8697, it's too expensive to reconstruct block groups for decomm > purpose. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9919) TestStandbyCheckpoints#testNonPrimarySBNUploadFSImage waitForCheckpoint incorrectly
[ https://issues.apache.org/jira/browse/HDFS-9919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15185994#comment-15185994 ] Hudson commented on HDFS-9919: -- FAILURE: Integrated in Hadoop-trunk-Commit #9444 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9444/]) HDFS-9919. TestStandbyCheckpoints#testNonPrimarySBNUploadFSImage (wang: rev a14a6f08ee9404168affe91affd095e349630971) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyCheckpoints.java > TestStandbyCheckpoints#testNonPrimarySBNUploadFSImage waitForCheckpoint > incorrectly > --- > > Key: HDFS-9919 > URL: https://issues.apache.org/jira/browse/HDFS-9919 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Lin Yiqun >Assignee: Lin Yiqun >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-9919.001.patch > > > HDFS-9787 solved the problem that standby NNs can upload an FSImage to the > ANN after becoming non-primary standby NNs. But its unit test seems to have a > small problem: when the ANN changes state to standby, the standby NNs should > do a checkpoint, and the test makes the checkpoint check as follows: > {code} > for (int i = 0; i < NUM_NNS; i++) { > // Once the standby catches up, it should do a checkpoint > // and save to local directories. > HATestUtil.waitForCheckpoint(cluster, 1, ImmutableList.of(12)); > } > {code} > In this code, the nnIdx passed to waitForCheckpoint is always {{1}}, not > {{i}}, and there is no need to check one standby NN's checkpoint {{NUM_NNS}} > times. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
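The fix described above amounts to passing the loop index instead of the hard-coded {{1}}:
{code}
for (int i = 0; i < NUM_NNS; i++) {
  // Once each standby catches up, it should do a checkpoint
  // and save to its local directories.
  HATestUtil.waitForCheckpoint(cluster, i, ImmutableList.of(12));
}
{code}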
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186762#comment-15186762 ] Hudson commented on HDFS-7866: -- FAILURE: Integrated in Hadoop-trunk-Commit #9446 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9446/]) HDFS-7866. Erasure coding: NameNode manages multiple erasure coding (zhz: rev 7600e3c48ff2043654dbe9f415a186a336b5ea6c) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsFileStatus.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReconstructStripedBlocksWithRackAwareness.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ErasureCodingPolicy.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/UnderReplicatedBlocks.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/hdfs.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirWriteFileOp.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/util/TestStripedBlockUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestStripedINodeFile.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFileAttributes.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirErasureCodingOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ErasureCodingPolicyManager.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestErasureCodingPolicies.java > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Fix For: 3.0.0 > > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend the NameNode to load, list and sync predefined EC schemas in > an authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admins can list the available EC schemas and > then choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9927) Document the new OIV ReverseXML processor
[ https://issues.apache.org/jira/browse/HDFS-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190006#comment-15190006 ] Hudson commented on HDFS-9927: -- FAILURE: Integrated in Hadoop-trunk-Commit #9450 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9450/]) HDFS-9927. Document the new OIV ReverseXML processor (Wei-Chiu Chuang (cmccabe: rev 500875dfccdb3bb6709767962d1927ddb1cc5514) * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsImageViewer.md > Document the new OIV ReverseXML processor > - > > Key: HDFS-9927 > URL: https://issues.apache.org/jira/browse/HDFS-9927 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 2.8.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Labels: documentation, supportability > Fix For: 2.8.0 > > Attachments: HDFS-9927.001.patch > > > HDFS-9835 added a new ReverseXML processor which reconstructs an fsimage from > an XML file. > This new feature should be documented, and perhaps labeled as "experimental" > on the command line. > Also, the OIV section in HDFSCommands.md should be updated to include the new > processor's options, and it should also include links to the OIV page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9934) ReverseXML oiv processor should bail out if the XML file's layoutVersion doesn't match oiv's
[ https://issues.apache.org/jira/browse/HDFS-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190004#comment-15190004 ] Hudson commented on HDFS-9934: -- FAILURE: Integrated in Hadoop-trunk-Commit #9450 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9450/]) HDFS-9934. ReverseXML oiv processor should bail out if the XML file's (cmccabe: rev bd49354c6d6387620b0de2219eab1714ec2d64f8) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageReconstructor.java > ReverseXML oiv processor should bail out if the XML file's layoutVersion > doesn't match oiv's > > > Key: HDFS-9934 > URL: https://issues.apache.org/jira/browse/HDFS-9934 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.8.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.8.0 > > Attachments: HDFS-9934.001.patch > > > ReverseXML oiv processor should bail out if the XML file's layoutVersion > doesn't match oiv's -- This message was sent by Atlassian JIRA (v6.3.4#6332)
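A minimal sketch of the guard the summary describes, assuming the layout version has already been parsed out of the XML's version section ({{fileLayoutVersion}} is an illustrative name, not the committed code):
{code}
// Bail out early instead of producing a corrupt fsimage: the XML file's
// layoutVersion must match the layout version this oiv build writes.
if (fileLayoutVersion != NameNodeLayoutVersion.CURRENT_LAYOUT_VERSION) {
  throw new IOException("Layout version mismatch: XML file has layoutVersion "
      + fileLayoutVersion + " but this oiv supports only "
      + NameNodeLayoutVersion.CURRENT_LAYOUT_VERSION);
}
{code}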
[jira] [Commented] (HDFS-9933) ReverseXML should be capitalized in oiv usage message
[ https://issues.apache.org/jira/browse/HDFS-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190005#comment-15190005 ] Hudson commented on HDFS-9933: -- FAILURE: Integrated in Hadoop-trunk-Commit #9450 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9450/]) HDFS-9933. ReverseXML should be capitalized in oiv usage message (cmccabe: rev 79961ecea888e0ee85b7a75e239bb6bb3335eb17) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageViewerPB.java > ReverseXML should be capitalized in oiv usage message > - > > Key: HDFS-9933 > URL: https://issues.apache.org/jira/browse/HDFS-9933 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.8.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Fix For: 2.8.0 > > Attachments: HDFS-9933.001.patch > > > ReverseXML should be capitalized in oiv usage message -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-1477) Support reconfiguring dfs.heartbeat.interval and dfs.namenode.heartbeat.recheck-interval without NN restart
[ https://issues.apache.org/jira/browse/HDFS-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190456#comment-15190456 ] Hudson commented on HDFS-1477: -- FAILURE: Integrated in Hadoop-trunk-Commit #9452 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9452/]) HDFS-1477. Support reconfiguring dfs.heartbeat.interval and (arp: rev e01c6ea688e62f25c4310e771a0cd85b53a5fb87) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestComputeInvalidateWork.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdmin.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeReconfigure.java > Support reconfiguring dfs.heartbeat.interval and > dfs.namenode.heartbeat.recheck-interval without NN restart > --- > > Key: HDFS-1477 > URL: https://issues.apache.org/jira/browse/HDFS-1477 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.7.0 >Reporter: Patrick Kling >Assignee: Xiaobing Zhou > Fix For: 2.9.0 > > Attachments: HDFS-1477-HDFS-9000.006.patch, > HDFS-1477-HDFS-9000.007.patch, HDFS-1477-HDFS-9000.008.patch, > HDFS-1477-HDFS-9000.009.patch, HDFS-1477.005.patch, HDFS-1477.2.patch, > HDFS-1477.3.patch, HDFS-1477.4.patch, HDFS-1477.patch > > > Modify NameNode to implement the interface Reconfigurable proposed in > HADOOP-7001. This would allow us to change certain configuration properties > without restarting the name node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
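A rough sketch of the mechanism, following the {{Reconfigurable}} interface proposed in HADOOP-7001; the exact method shape and the {{setHeartbeatInterval}} setter are illustrative assumptions, not the committed patch:
{code}
// In a ReconfigurableBase subclass such as NameNode: recognize a
// whitelisted key and apply the new value in place, without a restart.
@Override
protected void reconfigurePropertyImpl(String property, String newVal)
    throws ReconfigurationException {
  if (DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY.equals(property)) {
    long seconds = (newVal == null)
        ? DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT
        : Long.parseLong(newVal);
    // Hypothetical setter: whatever hook DatanodeManager exposes to
    // pick up the new interval at runtime.
    getNamesystem().getBlockManager().getDatanodeManager()
        .setHeartbeatInterval(seconds);
  } else {
    throw new ReconfigurationException(property, newVal,
        getConf().get(property));
  }
}
{code}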
[jira] [Commented] (HDFS-9941) Do not log StandbyException on NN, other minor logging fixes
[ https://issues.apache.org/jira/browse/HDFS-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193891#comment-15193891 ] Hudson commented on HDFS-9941: -- FAILURE: Integrated in Hadoop-trunk-Commit #9459 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9459/]) HDFS-9941. Do not log StandbyException on NN, other minor logging fixes. (cnauroth: rev 5644137adad30c84e40d2c4719627b3aabc73628) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockUnderConstructionFeature.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirWriteFileOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DecommissionManager.java > Do not log StandbyException on NN, other minor logging fixes > > > Key: HDFS-9941 > URL: https://issues.apache.org/jira/browse/HDFS-9941 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: 2.8.0 > > Attachments: HDFS-9941-branch-2.03.patch, HDFS-9941.01.patch, > HDFS-9941.02.patch, HDFS-9941.03.patch > > > The NameNode can skip logging StandbyException messages. These are seen > regularly in normal operation and convey no useful information. > We no longer log the locations of newly allocated blocks in 2.8.0. The DN IDs > can be useful for debugging so let's add that back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9947) Block#toString should not output information from derived classes
[ https://issues.apache.org/jira/browse/HDFS-9947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194315#comment-15194315 ] Hudson commented on HDFS-9947: -- FAILURE: Integrated in Hadoop-trunk-Commit #9460 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9460/]) HDFS-9947. Block#toString should not output information from derived (cmccabe: rev 9a43094e12ab8d35d49ceda2e2c5f83093bb3a5b) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java > Block#toString should not output information from derived classes > - > > Key: HDFS-9947 > URL: https://issues.apache.org/jira/browse/HDFS-9947 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Fix For: 2.9.0 > > Attachments: HDFS-9947.001.patch > > > {{Block#toString}} should not output information from derived classes. > Thanks to [~cnauroth] for spotting this bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
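In other words, the string should be built only from {{Block}}'s own fields; a sketch of the idea (the exact output format in the patch may differ):
{code}
// Compose toString purely from Block's own state so subclasses such as
// BlockInfo cannot leak extra details into generic log messages.
@Override
public String toString() {
  return "blk_" + getBlockId() + "_" + getGenerationStamp();
}
{code}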
[jira] [Commented] (HDFS-9928) Make HDFS commands guide up to date
[ https://issues.apache.org/jira/browse/HDFS-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195205#comment-15195205 ] Hudson commented on HDFS-9928: -- FAILURE: Integrated in Hadoop-trunk-Commit #9463 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9463/]) HDFS-9928. Make HDFS commands guide up to date (Wei-Chiu Chuang via (iwasakims: rev 5de848cd5d46527a8fba481c76089da21f533050) * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md > Make HDFS commands guide up to date > --- > > Key: HDFS-9928 > URL: https://issues.apache.org/jira/browse/HDFS-9928 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 2.9.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Labels: documentation, supportability > Fix For: 2.9.0 > > Attachments: HDFS-9928-branch-2.002.patch, HDFS-9928-trunk.003.patch, > HDFS-9928.001.patch > > > A few HDFS subcommands and options are missing from the documentation. > # envvars: display computed Hadoop environment variables > I also noticed (in HDFS-9927) that a few OIV options are missing, and I'll be > looking for other missing options as well. > Filing this JIRA to fix them all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9904) testCheckpointCancellationDuringUpload occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195578#comment-15195578 ] Hudson commented on HDFS-9904: -- FAILURE: Integrated in Hadoop-trunk-Commit #9464 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9464/]) HDFS-9904. testCheckpointCancellationDuringUpload occasionally fails. (kihwal: rev d4574017845cfa7521e703f80efd404afd09b8c4) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyCheckpoints.java > testCheckpointCancellationDuringUpload occasionally fails > -- > > Key: HDFS-9904 > URL: https://issues.apache.org/jira/browse/HDFS-9904 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.7.3 >Reporter: Kihwal Lee >Assignee: Lin Yiqun > Fix For: 2.7.3 > > Attachments: HDFS-9904.001.patch, HDFS-9904.002.patch > > > The failure was at the end of the test case where the txid of the standby > (former active) is checked. Since the checkpoint/upload was canceled, it > is not supposed to have the new checkpoint. Looking at the test log, that was > still the case, but the standby then did a checkpoint on its own and bumped up > the txid right before the check was performed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10173) Typo in DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198599#comment-15198599 ] Hudson commented on HDFS-10173: --- FAILURE: Integrated in Hadoop-trunk-Commit #9470 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9470/]) HDFS-10173. Typo in DataXceiverServer. Contributed by Michael Han. (wang: rev 02a250db9f4bc54436cd9900a084215e5e3c8dae) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java > Typo in DataXceiverServer > - > > Key: HDFS-10173 > URL: https://issues.apache.org/jira/browse/HDFS-10173 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Michael Han >Assignee: Michael Han >Priority: Trivial > Labels: newbie > Fix For: 2.9.0 > > Attachments: HDFS-10173.1.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L78 > bandwith -> bandwidth -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9874) Long living DataXceiver threads cause volume shutdown to block.
[ https://issues.apache.org/jira/browse/HDFS-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201875#comment-15201875 ] Hudson commented on HDFS-9874: -- FAILURE: Integrated in Hadoop-trunk-Commit #9474 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9474/]) HDFS-9874. Long living DataXceiver threads cause volume shutdown to (kihwal: rev 63c966a3fbeb675959fc4101e65de9f57aecd17d) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInPipeline.java > Long living DataXceiver threads cause volume shutdown to block. > --- > > Key: HDFS-9874 > URL: https://issues.apache.org/jira/browse/HDFS-9874 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah >Priority: Critical > Fix For: 2.7.3 > > Attachments: HDFS-9874-trunk-1.patch, HDFS-9874-trunk-2.patch, > HDFS-9874-trunk.patch > > > One of the failed volume shutdowns took 3 days to complete. > Below are the relevant datanode logs while shutting down a volume (due to > disk failure) > {noformat} > 2016-02-21 10:12:55,333 [Thread-49277] WARN impl.FsDatasetImpl: Removing > failed volume volumeA/current: > org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not > writable: volumeA/current/BP-1788428031-nnIp-1351700107344/current/finalized > at > org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:194) > at > org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174) > at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:308) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:786) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:242) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:2011) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3145) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.access$800(DataNode.java:243) > at > org.apache.hadoop.hdfs.server.datanode.DataNode$7.run(DataNode.java:3178) > at java.lang.Thread.run(Thread.java:745) > 2016-02-21 10:12:55,334 [Thread-49277] INFO datanode.BlockScanner: Removing > scanner for volume volumeA (StorageID DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23) > 2016-02-21 10:12:55,334 [VolumeScannerThread(volumeA)] INFO > datanode.VolumeScanner: VolumeScanner(volumeA, > DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23) exiting. > 2016-02-21 10:12:55,335 [VolumeScannerThread(volumeA)] WARN > datanode.VolumeScanner: VolumeScanner(volumeA, > DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23): error saving > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl@4169ad8b.
> java.io.FileNotFoundException: > volumeA/current/BP-1788428031-nnIp-1351700107344/scanner.cursor.tmp > (Read-only file system) > at java.io.FileOutputStream.open(Native Method) > at java.io.FileOutputStream.(FileOutputStream.java:213) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl.save(FsVolumeImpl.java:669) > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner.saveBlockIterator(VolumeScanner.java:314) > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:633) > 2016-02-24 16:05:53,285 [Thread-49277] WARN impl.FsDatasetImpl: Failed to > delete old dfsUsed file in > volumeA/current/BP-1788428031-nnIp-1351700107344/current > 2016-02-24 16:05:53,286 [Thread-49277] WARN impl.FsDatasetImpl: Failed to > write dfsUsed to > volumeA/current/BP-1788428031-nnIp-1351700107344/current/dfsUsed > java.io.FileNotFoundException: > volumeA/current/BP-1788428031-nnIp-1351700107344/current/dfsUsed (Read-only > file system) > at java.io.FileOutputStream.open(Native Method) > at java.io.FileOutputStream.(FileOutputStream.java:213) > at java.io.FileOutputStream.(FileOutputStream.java:162) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPool
[jira] [Commented] (HDFS-9857) Erasure Coding: Rename replication-based names in BlockManager to more generic [part-1]
[ https://issues.apache.org/jira/browse/HDFS-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198598#comment-15198598 ] Hudson commented on HDFS-9857: -- FAILURE: Integrated in Hadoop-trunk-Commit #9470 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9470/]) HDFS-9857. Erasure Coding: Rename replication-based names in (zezhang: rev 32d043d9c5f4615058ea4f65a58ba271ba47fcb5) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestMetaSave.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestUnderReplicatedBlockQueues.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestPendingReplication.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestLowRedundancyBlockQueues.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/UnderReplicatedBlocks.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/LowRedundancyBlocks.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DecommissionManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java > Erasure Coding: Rename replication-based names in BlockManager to more > generic [part-1] > --- > > Key: HDFS-9857 > URL: https://issues.apache.org/jira/browse/HDFS-9857 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Reporter: Rakesh R >Assignee: Rakesh R > Fix For: 3.0.0 > > Attachments: HDFS-9857-001.patch, HDFS-9857-003.patch, > HDFS-9857-004.patch, HDFS-9857-02.patch > > > The idea of this jira is to rename the following entities in BlockManager as follows: > - {{UnderReplicatedBlocks}} to {{LowRedundancyBlocks}} > - {{neededReplications}} to {{neededReconstruction}} > - {{replicationQueuesInitializer}} to {{reconstructionQueuesInitializer}} > Thanks [~zhz], [~andrew.wang] for the useful > [discussions|https://issues.apache.org/jira/browse/HDFS-7955?focusedCommentId=15149406&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15149406] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3677) dfs.namenode.edits.dir.required missing from hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201877#comment-15201877 ] Hudson commented on HDFS-3677: -- FAILURE: Integrated in Hadoop-trunk-Commit #9474 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9474/]) HDFS-3677. dfs.namenode.edits.dir.required is missing from (aajisaka: rev 9b623fbaf79c0f854abc6b4a0539d4ea8c93dc1a) * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml > dfs.namenode.edits.dir.required missing from hdfs-default.xml > - > > Key: HDFS-3677 > URL: https://issues.apache.org/jira/browse/HDFS-3677 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation, namenode >Affects Versions: 2.0.0-alpha >Reporter: Todd Lipcon >Assignee: Mark Yang > Labels: newbie > Fix For: 2.8.0 > > Attachments: HDFS-3677.patch > > > This config should be documented. It's useful for cases where the user would > like to ensure that (e.g.) an NFS filer always has the most up-to-date edits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
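The missing entry would look something like the following; the description text below is illustrative, not the committed wording:
{noformat}
<property>
  <name>dfs.namenode.edits.dir.required</name>
  <value></value>
  <description>
    The subset of dfs.namenode.edits.dir directories that must succeed on
    every edit-log write, e.g. an NFS filer that should always hold the
    most up-to-date edits.
  </description>
</property>
{noformat}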
[jira] [Commented] (HDFS-9579) Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level
[ https://issues.apache.org/jira/browse/HDFS-9579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202974#comment-15202974 ] Hudson commented on HDFS-9579: -- FAILURE: Integrated in Hadoop-trunk-Commit #9478 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9478/]) HDFS-9579. Provide bytes-read-by-network-distance metrics at (sjlee: rev cd8b6889a74a949e37f4b2eb664cdf3b59bfb93b) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSStripedInputStream.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ExternalBlockReader.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NodeBase.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ClientContext.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopology.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/ErasureCodingWorker.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/BlockReader.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestConnCache.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocalLegacy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopology.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ReplicaAccessor.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestExternalBlockReader.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetUtils.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestEnhancedByteBufferAccess.java > Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level > - > > Key: HDFS-9579 > URL: https://issues.apache.org/jira/browse/HDFS-9579 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 3.0.0 > > Attachments: HDFS-9579-10.patch, HDFS-9579-2.patch, > HDFS-9579-3.patch, HDFS-9579-4.patch, HDFS-9579-5.patch, HDFS-9579-6.patch, > HDFS-9579-7.patch, HDFS-9579-8.patch, HDFS-9579-9.patch, HDFS-9579.patch, MR > job counters.png > > > For cross DC distcp or other applications, it becomes useful to have insight > as to the traffic volume for each network distance to distinguish cross-DC > traffic, local-DC-remote-rack, etc. > FileSystem's existing {{bytesRead}} metric tracks all the bytes read.
To > provide additional metrics for each network distance, we can add additional > metrics at the FileSystem level and have {{DFSInputStream}} update the value > based on the network distance between the client and the datanode. > {{DFSClient}} will resolve the client machine's network location as part of its > initialization. It doesn't need to resolve the datanode's network location for > each read as {{DatanodeInfo}} already has the info. > There are existing HDFS-specific metrics such as {{ReadStatistics}} and > {{DFSHedgedReadMetrics}}, but these metrics are only accessible via > {{DFSClient}} or {{DFSInputStream}}, not something that application frameworks > such as MR and Tez can get to. That is the benefit of storing these new > metrics in FileSystem.Statistics. > This jira only includes metrics generation by HDFS. The consumption of these > metrics in MR and Tez will be tracked by separate jiras. > We can add similar metrics for the HDFS write scenario later if necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
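A short sketch of how an application framework could consume the new counters; {{getBytesReadByDistance}} is the per-distance accessor this change is expected to add to {{FileSystem.Statistics}}, so treat the exact name as illustrative:
{code}
// Network distance follows NetworkTopology: 0 = local, 2 = same rack,
// 4 = off-rack within the DC (larger values for cross-DC setups).
FileSystem.Statistics stats =
    FileSystem.getStatistics("hdfs", DistributedFileSystem.class);
for (int distance = 0; distance <= 4; distance += 2) {
  System.out.println("bytes read at distance " + distance + ": "
      + stats.getBytesReadByDistance(distance));
}
{code}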
[jira] [Commented] (HDFS-9949) Add a test case to ensure that the DataNode does not regenerate its UUID when a storage directory is cleared
[ https://issues.apache.org/jira/browse/HDFS-9949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200117#comment-15200117 ] Hudson commented on HDFS-9949: -- FAILURE: Integrated in Hadoop-trunk-Commit #9473 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9473/]) HDFS-9949. Add a test case to ensure that the DataNode does not (cmccabe: rev dc951e606f40bb779632a8a3e3a46aeccc4a446a) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeUUID.java > Add a test case to ensure that the DataNode does not regenerate its UUID when > a storage directory is cleared > > > Key: HDFS-9949 > URL: https://issues.apache.org/jira/browse/HDFS-9949 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.6.0 >Reporter: Harsh J >Assignee: Harsh J >Priority: Minor > Fix For: 2.8.0 > > Attachments: HDFS-9949.000.branch-2.7.not-for-commit.patch, > HDFS-9949.000.patch, HDFS-9949.001.patch > > > In the following scenario, in releases without HDFS-8211, the DN may > regenerate its UUIDs unintentionally. > 0. Consider a DN with two disks {{/data1/dfs/dn,/data2/dfs/dn}} > 1. Stop DN > 2. Unmount the second disk, {{/data2/dfs/dn}} > 3. Create (in the scenario, this was an accident) /data2/dfs/dn on the root > path > 4. Start DN > 5. DN now considers /data2/dfs/dn empty so formats it, but during the format > it uses {{datanode.getDatanodeUuid()}} which is null until register() is > called. > 6. As a result, after the directory loading, {{datanode.checkDatanodUuid()}} > gets called with successful condition, and it causes a new generation of UUID > which is written to all disks {{/data1/dfs/dn/current/VERSION}} and > {{/data2/dfs/dn/current/VERSION}}. > 7. Stop DN (in the scenario, this was when the mistake of unmounted disk was > realised) > 8. Mount the second disk back again {{/data2/dfs/dn}}, causing the > {{VERSION}} file to be the original one again on it (mounting masks the root > path that we last generated upon). > 9. DN fails to start up cause it finds mismatched UUID between the two disks, > with an error similar to: > {code}WARN org.apache.hadoop.hdfs.server.common.Storage: > org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory > /data/2/dfs/dn is in an inconsistent state: Root /data/2/dfs/dn: > DatanodeUuid=fe3a848f-beb8-4fcb-9581-c6fb1c701cc4, does not match > 8ea9493c-7097-4ee3-96a3-0cc4dfc1d6ac from other StorageDirectory.{code} > The DN should not generate a new UUID if one of the storage disks already > have the older one. > HDFS-8211 unintentionally fixes this by changing the > {{datanode.getDatanodeUuid()}} function to rely on the {{DataStorage}} > representation of the UUID vs. the {{DatanodeID}} object which only gets > available (non-null) _after_ the registration. > It'd still be good to add a direct test case to the above scenario that > passes on trunk and branch-2, but fails on branch-2.7 and lower, so we can > catch a regression around this in future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7166) SbNN Web UI shows #Under replicated blocks and #pending deletion blocks
[ https://issues.apache.org/jira/browse/HDFS-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203586#comment-15203586 ] Hudson commented on HDFS-7166: -- FAILURE: Integrated in Hadoop-trunk-Commit #9480 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9480/]) HDFS-7166. SbNN Web UI shows #Under replicated blocks and #pending (wheat9: rev 8a3f0cb25540c7e70471aebcdd408feb478f878e) * hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html > SbNN Web UI shows #Under replicated blocks and #pending deletion blocks > --- > > Key: HDFS-7166 > URL: https://issues.apache.org/jira/browse/HDFS-7166 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Reporter: Juan Yu >Assignee: Wei-Chiu Chuang > Fix For: 2.8.0 > > Attachments: HDFS-7166.001.patch > > > I believe that's a regression of HDFS-5333. > According to HDFS-2901 and HDFS-6178, > the Standby Namenode doesn't compute replication queues, so we shouldn't show > under-replicated/missing blocks or corrupt files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9405) Warmup NameNode EDEK caches in background thread
[ https://issues.apache.org/jira/browse/HDFS-9405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204863#comment-15204863 ] Hudson commented on HDFS-9405: -- FAILURE: Integrated in Hadoop-trunk-Commit #9482 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9482/]) HDFS-9405. Warmup NameNode EDEK caches in background thread. Contributed (wang: rev e3bb38d62567eafe57d16b78deeba1b71c58e41c) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSClientProvider.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/ValueQueue.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionZoneManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZonesWithKMS.java * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/TestValueQueue.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirEncryptionZoneOp.java > Warmup NameNode EDEK caches in background thread > > > Key: HDFS-9405 > URL: https://issues.apache.org/jira/browse/HDFS-9405 > Project: Hadoop HDFS > Issue Type: Improvement > Components: encryption, namenode >Affects Versions: 2.7.1 >Reporter: Zhe Zhang >Assignee: Xiao Chen > Fix For: 2.8.0 > > Attachments: HDFS-9405.01.patch, HDFS-9405.02.patch, > HDFS-9405.03.patch, HDFS-9405.04.patch, HDFS-9405.05.patch, > HDFS-9405.06.patch, HDFS-9405.07.patch, HDFS-9405.08.patch, > HDFS-9405.09.patch, HDFS-9405.10.patch, HDFS-9405.11.patch, > HDFS-9405.12.patch, HDFS-9405.13.patch > > > {{generateEncryptedDataEncryptionKey}} involves a non-trivial I/O operation > to the key provider, which could be slow or cause a timeout. It should be done > in a separate thread so that a proper error message can be returned to the RPC caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
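A rough sketch of the background warm-up, using {{KeyProviderCryptoExtension#warmUpEncryptedKeys}}; the executor setup and the key-name lookup are illustrative assumptions, not the committed patch:
{code}
// Warm the EDEK cache off the RPC path so that file creates inside an
// encryption zone are not blocked on a slow or timing-out key provider.
final String[] keyNames = getKeyNamesForAllEncryptionZones(); // hypothetical
ExecutorService warmUpExecutor = Executors.newSingleThreadExecutor();
warmUpExecutor.submit(new Runnable() {
  @Override
  public void run() {
    try {
      provider.warmUpEncryptedKeys(keyNames);
    } catch (IOException ioe) {
      LOG.warn("EDEK cache warm-up failed; keys will be fetched lazily", ioe);
    }
  }
});
{code}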
[jira] [Commented] (HDFS-9951) Use string constants for XML tags in OfflineImageReconstructor
[ https://issues.apache.org/jira/browse/HDFS-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204956#comment-15204956 ] Hudson commented on HDFS-9951: -- FAILURE: Integrated in Hadoop-trunk-Commit #9483 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9483/]) HDFS-9951. Use string constants for XML tags in (cmccabe: rev 680716f31e120f4d3ee70b095e4db46c05b891d9) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageReconstructor.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/PBImageXmlWriter.java > Use string constants for XML tags in OfflineImageReconstructor > -- > > Key: HDFS-9951 > URL: https://issues.apache.org/jira/browse/HDFS-9951 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Lin Yiqun >Assignee: Lin Yiqun >Priority: Minor > Fix For: 2.8.0 > > Attachments: HDFS-9551.001.patch, HDFS-9551.002.patch, > HDFS-9551.003.patch, HDFS-9551.004.patch > > > The class {{OfflineImageReconstructor}} uses many {{SectionProcessors}} to > process XML files and load subtrees of the XML into a Node structure. But > in many places a node's key is removed by writing the string value directly > in the method rather than defining it as a constant first, like this: > {code} > Node expiration = directive.removeChild("expiration"); > {code} > We could improve this by defining the constants in Node and invoking them like this: > {code} > Node expiration=directive.removeChild(Node.CACHE_MANAGER_SECTION_EXPIRATION); > {code} > This will also make it easier to manage node key names in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
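The proposed cleanup boils down to centralizing each tag string as a constant, per the description's own example (constant placement is illustrative):
{code}
// In Node (or a similar central place):
public static final String CACHE_MANAGER_SECTION_EXPIRATION = "expiration";

// Call sites then reference the constant instead of a bare string literal:
Node expiration =
    directive.removeChild(Node.CACHE_MANAGER_SECTION_EXPIRATION);
{code}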
[jira] [Commented] (HDFS-10189) PacketResponder#toString should include the downstreams for PacketResponderType.HAS_DOWNSTREAM_IN_PIPELINE
[ https://issues.apache.org/jira/browse/HDFS-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207714#comment-15207714 ] Hudson commented on HDFS-10189: --- FAILURE: Integrated in Hadoop-trunk-Commit #9486 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9486/]) HDFS-10189. PacketResponder#toString should include the downstreams for (cmccabe: rev a7d8f2b3960d27c74abb17ce2aa4bcd999706ad2) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java > PacketResponder#toString should include the downstreams for > PacketResponderType.HAS_DOWNSTREAM_IN_PIPELINE > -- > > Key: HDFS-10189 > URL: https://issues.apache.org/jira/browse/HDFS-10189 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.1 >Reporter: Joe Pallas >Assignee: Joe Pallas >Priority: Minor > Fix For: 2.8.0 > > Attachments: HDFS-10189.patch > > > The constructor for {{BlockReceiver.PacketResponder}} says > {code} > final StringBuilder b = new StringBuilder(getClass().getSimpleName()) > .append(": ").append(block).append(", type=").append(type); > if (type != PacketResponderType.HAS_DOWNSTREAM_IN_PIPELINE) { > b.append(", downstreams=").append(downstreams.length) > .append(":").append(Arrays.asList(downstreams)); > } > {code} > So it includes the list of downstreams only when it has no downstreams. The > {{if}} test should be for equality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
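That is, the condition should test for equality:
{code}
if (type == PacketResponderType.HAS_DOWNSTREAM_IN_PIPELINE) {
  b.append(", downstreams=").append(downstreams.length)
      .append(":").append(Arrays.asList(downstreams));
}
{code}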
[jira] [Commented] (HDFS-10193) fuse_dfs segfaults if uid cannot be resolved to a username
[ https://issues.apache.org/jira/browse/HDFS-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208935#comment-15208935 ] Hudson commented on HDFS-10193: --- FAILURE: Integrated in Hadoop-trunk-Commit #9490 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9490/]) HDFS-10193. fuse_dfs segfaults if uid cannot be resolved to a username (cmccabe: rev 0d19a0ce98053572447bdadf88687ec55f2f1f46) * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_connect.c > fuse_dfs segfaults if uid cannot be resolved to a username > -- > > Key: HDFS-10193 > URL: https://issues.apache.org/jira/browse/HDFS-10193 > Project: Hadoop HDFS > Issue Type: Bug > Components: fuse-dfs >Affects Versions: 2.0.0-alpha, 2.6.0 > Environment: Confirmed with Cloudera > hadoop-hdfs-fuse-2.6.0+cdh5.5.0+921-1.cdh5.5.0.p0.15.el6.x86_64 on CentOS 6 >Reporter: John Thiltges >Assignee: John Thiltges > Fix For: 2.8.0 > > Attachments: HDFS-10193.001.patch > > > When a user does an 'ls' on a HDFS FUSE mount, dfs_getattr() is called and > fuse_dfs attempts to resolve the user's uid into a username string with > getUsername(). If this lookup is unsuccessful, getUsername() returns NULL > leading to a segfault in hdfsConnCompare(). > Sites storing NSS info in a remote database (such as LDAP) will occasionally > have NSS failures if there are connectivity or daemon issues. Running > processes accessing the HDFS mount during this time may cause the fuse_dfs > process to crash, disabling the mount. > To reproduce the issue: > 1) Add a new local user > 2) su to the new user > 3) As root, edit /etc/passwd, changing the new user's uid number > 4) As the new user, do an ls on an HDFS FUSE mount. This should cause a > segfault. > Backtrace from fuse_dfs segfault > (hadoop-hdfs-fuse-2.0.0+545-1.cdh4.1.1.p0.21.osg33.el6.x86_64) > {noformat} > #0 0x003f43c32625 in raise (sig=) at > ../nptl/sysdeps/unix/sysv/linux/raise.c:64 > #1 0x003f43c33e05 in abort () at abort.c:92 > #2 0x003f46beb785 in os::abort (dump_core=true) at > /usr/src/debug/java-1.7.0-openjdk/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1640 > #3 0x003f46d5f03f in VMError::report_and_die (this=0x7ffa3cdf86f0) at > /usr/src/debug/java-1.7.0-openjdk/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:1075 > #4 0x003f46d5f70b in crash_handler (sig=11, info=0x7ffa3cdf88b0, > ucVoid=0x7ffa3cdf8780) at > /usr/src/debug/java-1.7.0-openjdk/openjdk/hotspot/src/os/linux/vm/vmError_linux.cpp:106 > #5 > #6 os::is_first_C_frame (fr=) at > /usr/src/debug/java-1.7.0-openjdk/openjdk/hotspot/src/share/vm/runtime/os.cpp:1025 > #7 0x003f46d5e071 in VMError::report (this=0x7ffa3cdf9560, > st=0x7ffa3cdf93e0) at > /usr/src/debug/java-1.7.0-openjdk/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:617 > #8 0x003f46d5ebad in VMError::report_and_die (this=0x7ffa3cdf9560) at > /usr/src/debug/java-1.7.0-openjdk/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:1009 > #9 0x003f46bf0322 in JVM_handle_linux_signal (sig=11, > info=0x7ffa3cdf9730, ucVoid=0x7ffa3cdf9600, abort_if_unrecognized=1021285600) > at > /usr/src/debug/java-1.7.0-openjdk/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:531 > #10 > #11 __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp.S:259 > #12 0x00403d3d in hdfsConnCompare (head=, > elm=) at > /usr/src/debug/hadoop-2.0.0-cdh4.1.1/src/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_connect.c:204 > #13 hdfsConnTree_RB_FIND (head=, elm= out>) at > 
/usr/src/debug/hadoop-2.0.0-cdh4.1.1/src/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_connect.c:81 > #14 0x00404245 in hdfsConnFind (usrname=0x0, ctx=0x7ff95013b800, > out=0x7ffa3cdf9c60) at > /usr/src/debug/hadoop-2.0.0-cdh4.1.1/src/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_connect.c:220 > #15 fuseConnect (usrname=0x0, ctx=0x7ff95013b800, out=0x7ffa3cdf9c60) at > /usr/src/debug/hadoop-2.0.0-cdh4.1.1/src/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_connect.c:517 > #16 0x00404337 in fuseConnectAsThreadUid (conn=0x7ffa3cdf9c60) at > /usr/src/debug/hadoop-2.0.0-cdh4.1.1/src/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_connect.c:544 > #17 0x00404c55 in dfs_getattr (path=0x7ff950150de0 "/user/users01", > st=0x7ffa3cdf9d20) at > /usr/src/debug/hadoop-2.0.0-cdh4.1.1/src/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_impls_getattr.c:37 > #18 0x003f47c0b353 in lookup_path (f=0x15e39f0, nodeid=22546, > name=0x7ff9602d0058 "users01", path=, e=0x7ffa3cdf9d10, > fi=) at fuse.c:1824 > #19 0x003f47c0d865 in fuse_lib_lookup (req=0x7ff950003fe0, parent=22546, > name=0x7ff9602
[jira] [Commented] (HDFS-10200) Docs for WebHDFS still describe GETDELEGATIONTOKENS operation
[ https://issues.apache.org/jira/browse/HDFS-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209106#comment-15209106 ] Hudson commented on HDFS-10200: --- FAILURE: Integrated in Hadoop-trunk-Commit #9491 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9491/]) HDFS-10200. Docs for WebHDFS still describe GETDELEGATIONTOKENS (wang: rev 8f85e5d2128c54c47d2a0098f6f4d4e04d53d74b) * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/WebHDFS.md > Docs for WebHDFS still describe GETDELEGATIONTOKENS operation > - > > Key: HDFS-10200 > URL: https://issues.apache.org/jira/browse/HDFS-10200 > Project: Hadoop HDFS > Issue Type: Task > Components: documentation >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Trivial > Fix For: 2.8.0 > > Attachments: HDFS-10200.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9005) Provide support for upgrade domain script
[ https://issues.apache.org/jira/browse/HDFS-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212680#comment-15212680 ] Hudson commented on HDFS-9005: -- FAILURE: Integrated in Hadoop-trunk-Commit #9502 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9502/]) HDFS-9005. Provide support for upgrade domain script. (Ming Ma via Lei (lei: rev 4fcfea71bfb16295f3a661e712d66351a1edc55e) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestUpgradeDomainBlockPlacementPolicy.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/CombinedHostsFileWriter.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/DatanodeAdminProperties.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostSet.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/util/TestCombinedHostsFileReader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostConfigManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/util/HostsFileWriter.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CombinedHostFileManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/dfs.hosts.json * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/CombinedHostsFileReader.java Add missing files from HDFS-9005. (lei) (lei: rev fde8ac5d8514f5146f438f8d0794116aaef20416) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestHostsFiles.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestHostFileManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeReport.java * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsUserGuide.md * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlocksWithNotEnoughRacks.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestDatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostFileManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestStartup.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/DatanodeID.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java > Provide support for upgrade domain script > - > > Key: HDFS-9005 > URL: https://issues.apache.org/jira/browse/HDFS-9005 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-9005-2.patch, HDFS-9005-3.patch, HDFS-9005-4.patch, > HDFS-9005.patch > > > As part of the upgrade domain feature, we need to provide a mechanism to > specify the upgrade domain for each datanode. One way to accomplish that is to > allow admins to specify an upgrade domain script that takes a DN IP or hostname as > input and returns the upgrade domain.
Then the namenode will use it at run time to > set {{DatanodeInfo}}'s upgrade domain string. The configuration can be > something like: > {noformat} > <property> > <name>dfs.namenode.upgrade.domain.script.file.name</name> > <value>/etc/hadoop/conf/upgrade-domain.sh</value> > </property> > {noformat} > just like the topology script. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9694) Make existing DFSClient#getFileChecksum() work for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212876#comment-15212876 ] Hudson commented on HDFS-9694: -- FAILURE: Integrated in Hadoop-trunk-Commit #9503 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9503/]) HDFS-9694. Make existing DFSClient#getFileChecksum() work for striped (uma.gangumalla: rev e5ff0ea7ba087984262f1f27200ae5bb40d9b838) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FileChecksumHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Op.java * hadoop-hdfs-project/hadoop-hdfs-client/dev-support/findbugsExcludeFile.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/StripedBlockUtil.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/datatransfer.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockChecksumHelper.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java > Make existing DFSClient#getFileChecksum() work for striped blocks > - > > Key: HDFS-9694 > URL: https://issues.apache.org/jira/browse/HDFS-9694 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Kai Zheng > Fix For: 3.0.0 > > Attachments: HDFS-9694-v1.patch, HDFS-9694-v2.patch, > HDFS-9694-v3.patch, HDFS-9694-v4.patch, HDFS-9694-v5.patch, > HDFS-9694-v6.patch, HDFS-9694-v7.patch, HDFS-9694-v8.patch, HDFS-9694-v9.patch > > > This is a sub-task of HDFS-8430 and will make the existing API > {{FileSystem#getFileChecksum(path)}} work for striped files. It will also > refactor existing code and lay out basic work for subsequent tasks, like > support for the new API proposed there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9694) Make existing DFSClient#getFileChecksum() work for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213099#comment-15213099 ] Hudson commented on HDFS-9694: -- FAILURE: Integrated in Hadoop-trunk-Commit #9504 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9504/]) Revert "HDFS-9694. Make existing DFSClient#getFileChecksum() work for (arp: rev a337ceb74e984991dbf976236d2e785cf5921b16) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/datatransfer.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockChecksumHelper.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Op.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/StripedBlockUtil.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FileChecksumHelper.java * hadoop-hdfs-project/hadoop-hdfs-client/dev-support/findbugsExcludeFile.xml > Make existing DFSClient#getFileChecksum() work for striped blocks > - > > Key: HDFS-9694 > URL: https://issues.apache.org/jira/browse/HDFS-9694 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Kai Zheng > Fix For: 3.0.0 > > Attachments: HDFS-9694-v1.patch, HDFS-9694-v2.patch, > HDFS-9694-v3.patch, HDFS-9694-v4.patch, HDFS-9694-v5.patch, > HDFS-9694-v6.patch, HDFS-9694-v7.patch, HDFS-9694-v8.patch, HDFS-9694-v9.patch > > > This is a sub-task of HDFS-8430 and will make the existing API > {{FileSystem#getFileChecksum(path)}} work for striped files. It will also > refactor existing code and lay out basic work for subsequent tasks, like > support for the new API proposed there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9694) Make existing DFSClient#getFileChecksum() work for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213302#comment-15213302 ] Hudson commented on HDFS-9694: -- FAILURE: Integrated in Hadoop-trunk-Commit #9505 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9505/]) HDFS-9694. Make existing DFSClient#getFileChecksum() work for striped (uma.gangumalla: rev 3a4ff7776e8fab6cc87932b9aa8fb48f7b69c720) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FileChecksumHelper.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Op.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/StripedBlockUtil.java * hadoop-hdfs-project/hadoop-hdfs-client/dev-support/findbugsExcludeFile.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/StripedBlockInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockChecksumHelper.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/datatransfer.proto > Make existing DFSClient#getFileChecksum() work for striped blocks > - > > Key: HDFS-9694 > URL: https://issues.apache.org/jira/browse/HDFS-9694 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Kai Zheng > Fix For: 3.0.0 > > Attachments: HDFS-9694-v1.patch, HDFS-9694-v2.patch, > HDFS-9694-v3.patch, HDFS-9694-v4.patch, HDFS-9694-v5.patch, > HDFS-9694-v6.patch, HDFS-9694-v7.patch, HDFS-9694-v8.patch, HDFS-9694-v9.patch > > > This is a sub-task of HDFS-8430 and will make the existing API > {{FileSystem#getFileChecksum(path)}} work for striped files. It will also > refactor existing code and lay out basic work for subsequent tasks, like > support for the new API proposed there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
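From an application's point of view, the public API is unchanged; the same call simply starts working for striped files as well (the path below is illustrative):
{code}
// Existing FileSystem API; with this change it also returns a checksum
// for erasure-coded (striped) files.
FileSystem fs = FileSystem.get(conf);
FileChecksum checksum = fs.getFileChecksum(new Path("/ec/myfile"));
System.out.println(checksum.getAlgorithmName() + ": " + checksum);
{code}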
[jira] [Commented] (HDFS-9640) Remove hsftp from DistCp in trunk
[ https://issues.apache.org/jira/browse/HDFS-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213904#comment-15213904 ] Hudson commented on HDFS-9640: -- FAILURE: Integrated in Hadoop-trunk-Commit #9508 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9508/]) HDFS-9640. Remove hsftp from DistCp in trunk. Contributed by Wei-Chiu (aajisaka: rev 18c7e582839ea0b550463569b18b5827d23f8849) * hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java * hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java * hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java * hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java * hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java * hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java * hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java * hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm > Remove hsftp from DistCp in trunk > - > > Key: HDFS-9640 > URL: https://issues.apache.org/jira/browse/HDFS-9640 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: distcp >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Fix For: 3.0.0 > > Attachments: HDFS-9640.001.patch, HDFS-9640.002.patch, > HDFS-9640.003.patch > > > Per discussion in HDFS-9638, > after HDFS-5570, hftp/hsftp were removed in Hadoop 3.0.0. But DistCp still > references hsftp via the parameter -mapredSslConf. This parameter is > useless after Hadoop 3.0.0; therefore it should be removed and the change > documented. > This JIRA is intended to track the status of the code/docs change involving > the removal of hsftp in DistCp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10182) Hedged read might overwrite user's buf
[ https://issues.apache.org/jira/browse/HDFS-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213951#comment-15213951 ] Hudson commented on HDFS-10182: --- FAILURE: Integrated in Hadoop-trunk-Commit #9510 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9510/]) HDFS-10182. Hedged read might overwrite user's buf. Contributed by (waltersu4549: rev d8383c687c95dbb37effa307ab2d41497da1cfc2) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java > Hedged read might overwrite user's buf > -- > > Key: HDFS-10182 > URL: https://issues.apache.org/jira/browse/HDFS-10182 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Assignee: zhouyingchao > Fix For: 2.7.3 > > Attachments: HDFS-10182-001.patch > > > In DFSInputStream::hedgedFetchBlockByteRange, during the first attempt, the > passed-in buf from the caller is handed to another thread to fill. If the > first attempt times out, a second attempt is issued with another > temp ByteBuffer. Now suppose the second attempt wins and the first attempt > is blocked somewhere in the IO path. The second attempt's result is > copied to the buf provided by the caller, and the caller then thinks the > pread is all set. Later the caller might use the buf to do something else > (e.g. read another chunk of data); however, the first attempt in the earlier > hedgedFetchBlockByteRange might still get some data and fill it into the buf ... > If this happens, the caller's buf is corrupted. > To fix the issue, we should allocate a temp buf for the first attempt too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
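The proposed fix lends itself to a small self-contained sketch: give every hedged attempt, including the first, its own temporary buffer, and copy into the caller's buffer exactly once when a winner completes. The names below (fetchRange, HedgedReadSketch, the fixed pool) are stand-ins, not the actual DFSInputStream internals.
{code}
import java.nio.ByteBuffer;
import java.util.concurrent.*;

public class HedgedReadSketch {
  static final ExecutorService POOL = Executors.newFixedThreadPool(2);

  // Stand-in for the actual block read; in the real client this may block
  // on a slow datanode for a long time.
  static void fetchRange(long start, ByteBuffer dst) {
    while (dst.hasRemaining()) dst.put((byte) 0);
  }

  static void hedgedFetch(long start, int len, ByteBuffer userBuf)
      throws Exception {
    CompletionService<ByteBuffer> cs = new ExecutorCompletionService<>(POOL);
    for (int attempt = 0; attempt < 2; attempt++) {
      cs.submit(() -> {
        ByteBuffer tmp = ByteBuffer.allocate(len); // private to this attempt
        fetchRange(start, tmp);
        tmp.flip();
        return tmp;
      });
    }
    ByteBuffer winner = cs.take().get(); // first attempt to finish wins
    userBuf.put(winner);                 // single copy; a late attempt can
                                         // never scribble over userBuf
  }

  public static void main(String[] args) throws Exception {
    ByteBuffer buf = ByteBuffer.allocate(16);
    hedgedFetch(0L, 16, buf);
    POOL.shutdown();
  }
}
{code}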
[jira] [Commented] (HDFS-9871) "Bytes Being Moved" -ve(-1 B) when cluster was already balanced.
[ https://issues.apache.org/jira/browse/HDFS-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15215420#comment-15215420 ] Hudson commented on HDFS-9871: -- FAILURE: Integrated in Hadoop-trunk-Commit #9517 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9517/]) HDFS-9871. "Bytes Being Moved" -ve(-1 B) when cluster was already (vinayakumarb: rev 1f004b3367c57de9e8a67040a57efc31c9ba8ee2) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java > "Bytes Being Moved" -ve(-1 B) when cluster was already balanced. > > > Key: HDFS-9871 > URL: https://issues.apache.org/jira/browse/HDFS-9871 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Fix For: 2.8.0 > > Attachments: HDFS-9871-002.patch, HDFS-9871.patch > > > Run the balancer when there are no {{over}}- or {{under}}-utilized nodes. > {noformat} > 16/02/29 02:39:40 INFO net.NetworkTopology: Adding a new node: > /default-rack/**.120:50076 > 16/02/29 02:39:40 INFO net.NetworkTopology: Adding a new node: > /default-rack/**.121:50076 > 16/02/29 02:39:40 INFO net.NetworkTopology: Adding a new node: > /default-rack/**.122:50076 > 16/02/29 02:39:41 INFO balancer.Balancer: 0 over-utilized: [] > 16/02/29 02:39:41 INFO balancer.Balancer: 0 underutilized: [] > The cluster is balanced. Exiting... > Feb 29, 2016 2:40:57 AM 0 0 B 0 B > -1 B > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10197) TestFsDatasetCache failing intermittently due to timeout
[ https://issues.apache.org/jira/browse/HDFS-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216707#comment-15216707 ] Hudson commented on HDFS-10197: --- FAILURE: Integrated in Hadoop-trunk-Commit #9520 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9520/]) HDFS-10197. TestFsDatasetCache failing intermittently due to timeout. (wang: rev f2aec4eb824647a01e14b4eede03af0babe65fb6) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java > TestFsDatasetCache failing intermittently due to timeout > > > Key: HDFS-10197 > URL: https://issues.apache.org/jira/browse/HDFS-10197 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Fix For: 2.8.0 > > Attachments: HDFS-10197.001.patch, HDFS-10197.002.patch > > > In {{TestFsDatasetCache}}, the unit tests sometimes fail. I collected some > failure reasons from recent jenkins reports. They are all timeout errors. > {code} > Tests in error: > TestFsDatasetCache.testFilesExceedMaxLockedMemory:378 ? Timeout Timed out > wait... > TestFsDatasetCache.tearDown:149 ? Timeout Timed out waiting for condition. > Thr... > {code} > {code} > Tests in error: > TestFsDatasetCache.testPageRounder:474 ? test timed out after 6 > milliseco... > TestBalancer.testUnknownDatanodeSimple:1040->testUnknownDatanode:1098 ? > test ... > {code} > But there are small differences between these failures. > * The first happens because the total blocked time exceeded > {{waitTimeMillis}} (here 60s), so the timeout exception is thrown and the > thread diagnostic string is printed in {{DFSTestUtil#verifyExpectedCacheUsage}}. > {code} > long st = Time.now(); > do { > boolean result = check.get(); > if (result) { > return; > } > > Thread.sleep(checkEveryMillis); > } while (Time.now() - st < waitForMillis); > > throw new TimeoutException("Timed out waiting for condition. " + > "Thread diagnostics:\n" + > TimedOutTestsListener.buildThreadDiagnosticString()); > {code} > * The second is due to the test's elapsed time exceeding the configured > timeout, as in {{TestFsDatasetCache#testPageRounder}}. > We should adjust the timeouts for these unit tests, which sometimes fail > due to timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9478) Reason for failing ipc.FairCallQueue contruction should be thrown
[ https://issues.apache.org/jira/browse/HDFS-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216793#comment-15216793 ] Hudson commented on HDFS-9478: -- FAILURE: Integrated in Hadoop-trunk-Commit #9521 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9521/]) HDFS-9478. Reason for failing ipc.FairCallQueue contruction should be (arp: rev 46a5245db95f2aad199100d2886381398070124f) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/CallQueueManager.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestCallQueueManager.java > Reason for failing ipc.FairCallQueue contruction should be thrown > - > > Key: HDFS-9478 > URL: https://issues.apache.org/jira/browse/HDFS-9478 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Archana T >Assignee: Ajith S >Priority: Minor > Fix For: 2.7.3 > > Attachments: HDFS-9478.2.patch, HDFS-9478.3.patch, HDFS-9478.patch > > > When FairCallQueue construction fails, the NN fails to start, throwing a > RuntimeException without giving any reason for the failure. > 2015-11-30 17:45:26,661 INFO org.apache.hadoop.ipc.FairCallQueue: > FairCallQueue is in use with 4 queues. > 2015-11-30 17:45:26,665 DEBUG org.apache.hadoop.metrics2.util.MBeans: > Registered Hadoop:service=ipc.65110,name=DecayRpcScheduler > 2015-11-30 17:45:26,666 ERROR > org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode. > java.lang.RuntimeException: org.apache.hadoop.ipc.FairCallQueue could not be > constructed. > at > org.apache.hadoop.ipc.CallQueueManager.createCallQueueInstance(CallQueueManager.java:96) > at org.apache.hadoop.ipc.CallQueueManager.(CallQueueManager.java:55) > at org.apache.hadoop.ipc.Server.(Server.java:2241) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:942) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:534) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:509) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:784) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.(NameNodeRpcServer.java:346) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:750) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:687) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:889) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:872) > Example: the reason for the above failure could have been -- > 1. the weights were not equal to the number of queues configured. > 2. decay-scheduler.thresholds not in sync with the number of queues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
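A minimal sketch of the improvement being asked for, assuming the queue is built reflectively the way the stack trace suggests: chain the underlying cause into the rethrown RuntimeException so the reason (e.g. mismatched weights) shows up in the NN log. Class and method names here are illustrative, not the actual CallQueueManager code.
{code}
import java.lang.reflect.Constructor;
import java.util.concurrent.BlockingQueue;

public class CallQueueFactorySketch {
  @SuppressWarnings("unchecked")
  static <E> BlockingQueue<E> create(Class<?> theClass, int maxLen, String ns) {
    try {
      Constructor<?> ctor = theClass.getDeclaredConstructor(int.class, String.class);
      return (BlockingQueue<E>) ctor.newInstance(maxLen, ns);
    } catch (ReflectiveOperationException e) {
      // Chain the cause: for an InvocationTargetException, e.getCause()
      // carries the constructor's own error, e.g. "weights must match the
      // number of queues", so it is no longer swallowed.
      throw new RuntimeException(theClass.getName() + " could not be constructed",
          e.getCause() != null ? e.getCause() : e);
    }
  }
}
{code}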
[jira] [Commented] (HDFS-10228) TestHDFSCLI fails
[ https://issues.apache.org/jira/browse/HDFS-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216851#comment-15216851 ] Hudson commented on HDFS-10228: --- FAILURE: Integrated in Hadoop-trunk-Commit #9522 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9522/]) HDFS-10228. TestHDFSCLI fails. (Contributed by Akira AJISAKA) (arp: rev c9307e48b71d2e293be1b5517d12be959222c5fe) * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml > TestHDFSCLI fails > - > > Key: HDFS-10228 > URL: https://issues.apache.org/jira/browse/HDFS-10228 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Fix For: 2.8.0 > > Attachments: HDFS-10228.01.patch, TestHDFSCLI-detail.txt > > > TestHDFSCLI fails. > {noformat} > 2016-03-29 19:52:05,239 [main] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(231)) - Failing tests: > 2016-03-29 19:52:05,239 [main] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(232)) - -- > 2016-03-29 19:52:05,239 [main] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(238)) - 68: mv: file (absolute path) to > file (relative path) > 2016-03-29 19:52:05,240 [main] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(238)) - 103: cp: copying non existent file > (relative path) > 2016-03-29 19:52:05,240 [main] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(238)) - 295: touchz: touching file in > non-existent directory > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9439) Include status of closeAck into exception message in DataNode#run
[ https://issues.apache.org/jira/browse/HDFS-9439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216852#comment-15216852 ] Hudson commented on HDFS-9439: -- FAILURE: Integrated in Hadoop-trunk-Commit #9522 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9522/]) HDFS-9439. Support reconfiguring fs.protected.directories without NN (arp: rev ddfe6774c21c8ccf5582a05bb0b58e961bbec309) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdmin.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestProtectedDirectories.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java > Include status of closeAck into exception message in DataNode#run > - > > Key: HDFS-9439 > URL: https://issues.apache.org/jira/browse/HDFS-9439 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Trivial > Labels: supportability > Fix For: 2.8.0 > > Attachments: HDFS-9439.001.patch > > > When {{closeAck.getStatus}} is not {{SUCCESS}} and an IOException is thrown, > the status is not included in the message, which makes it harder to > investigate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5177) blocksScheduled count should be decremented for abandoned blocks
[ https://issues.apache.org/jira/browse/HDFS-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217519#comment-15217519 ] Hudson commented on HDFS-5177: -- FAILURE: Integrated in Hadoop-trunk-Commit #9526 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9526/]) HDFS-5177. blocksScheduled count should be decremented for abandoned (vinayakumarb: rev 09d63d5a192b5d6b172f94ff6c94da348fd49ea6) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirWriteFileOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlocksScheduledCounter.java > blocksScheduled count should be decremented for abandoned blocks > - > > Key: HDFS-5177 > URL: https://issues.apache.org/jira/browse/HDFS-5177 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0, 2.1.0-beta >Reporter: Vinayakumar B >Assignee: Vinayakumar B > Fix For: 2.8.0 > > Attachments: HDFS-5177-04.patch, HDFS-5177-05.patch, HDFS-5177.patch, > HDFS-5177.patch, HDFS-5177.patch > > > DatanodeDescriptor#incBlocksScheduled() will be called for all datanodes of > the block on each allocation. But the same should be decremented for abandoned > blocks. > When one of the datanodes is down and it is allocated for the block along > with other live datanodes, the block will be abandoned, but the > scheduled count will make the other live datanodes look loaded, while in > reality these datanodes may not be loaded at all. > Anyway, this scheduled count is rolled every 20 mins. > Problems arise if the rate of file creation is high. Due to the inflated > scheduled count, the local datanode may be skipped for writes, and sometimes > writes can even fail in small clusters. > So we need to decrement the unnecessary count on the abandon-block call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
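The proposed accounting fix can be sketched independently of the HDFS internals: mirror the per-node scheduled-count increment with a decrement on the abandon-block path. The counter structure and method names below are hypothetical stand-ins for the real counters in DatanodeDescriptor/DatanodeStorageInfo.
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ScheduledCounterSketch {
  private final ConcurrentHashMap<String, AtomicInteger> scheduled =
      new ConcurrentHashMap<>();

  // Called when targets are chosen for a new block.
  void onBlockAllocated(String[] targetNodes) {
    for (String node : targetNodes) {
      scheduled.computeIfAbsent(node, n -> new AtomicInteger()).incrementAndGet();
    }
  }

  // The fix: mirror the increment on the abandon-block path, so nodes whose
  // pipeline never materialized do not look artificially loaded.
  void onBlockAbandoned(String[] targetNodes) {
    for (String node : targetNodes) {
      AtomicInteger c = scheduled.get(node);
      if (c != null) {
        c.decrementAndGet();
      }
    }
  }
}
{code}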
[jira] [Commented] (HDFS-10223) peerFromSocketAndKey performs SASL exchange before setting connection timeouts
[ https://issues.apache.org/jira/browse/HDFS-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218863#comment-15218863 ] Hudson commented on HDFS-10223: --- FAILURE: Integrated in Hadoop-trunk-Commit #9528 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9528/]) HDFS-10223. peerFromSocketAndKey performs SASL exchange before setting (cmccabe: rev 37e23ce45c592f3c9c48a08a52a5f46787f6c0e9) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/TestSaslDataTransfer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSUtilClient.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/ErasureCodingWorker.java > peerFromSocketAndKey performs SASL exchange before setting connection timeouts > -- > > Key: HDFS-10223 > URL: https://issues.apache.org/jira/browse/HDFS-10223 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.4 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.8.0 > > Attachments: HDFS-10223.001.patch, HDFS-10223.002.patch > > > {{peerFromSocketAndKey}} performs the SASL exchange before setting up > connection timeouts. Because of this, the timeout used for setting up SASL > connections is the default system-wide TCP timeout, which is usually several > hours long. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
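The fix direction can be sketched as follows, under the assumption that the peer is built from a plain connected socket: apply the socket timeouts before the SASL handshake runs, so a stuck handshake fails after the configured timeout instead of the multi-hour OS-level TCP default. The helper name is hypothetical.
{code}
import java.net.InetSocketAddress;
import java.net.Socket;

public class SaslTimeoutSketch {
  static Socket connectWithTimeouts(InetSocketAddress addr, int timeoutMs)
      throws Exception {
    Socket sock = new Socket();
    sock.connect(addr, timeoutMs);  // bound connect time
    sock.setSoTimeout(timeoutMs);   // read timeout now covers the SASL exchange
    // ... only after this point perform the SASL data-transfer negotiation ...
    return sock;
  }
}
{code}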
[jira] [Commented] (HDFS-10221) Add .json to the rat exclusions
[ https://issues.apache.org/jira/browse/HDFS-10221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219102#comment-15219102 ] Hudson commented on HDFS-10221: --- FAILURE: Integrated in Hadoop-trunk-Commit #9530 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9530/]) HDFS-10221. Add .json to the rat exclusions. Contributed by Ming Ma. (aajisaka: rev 32c0c3ecdf72e89a63f4aee5e75d1c5a12714b89) * hadoop-hdfs-project/hadoop-hdfs/pom.xml > Add .json to the rat exclusions > --- > > Key: HDFS-10221 > URL: https://issues.apache.org/jira/browse/HDFS-10221 > Project: Hadoop HDFS > Issue Type: Bug > Components: build >Affects Versions: 2.8.0 >Reporter: Ming Ma >Assignee: Ming Ma >Priority: Blocker > Fix For: 2.8.0 > > Attachments: HDFS-10221-2.patch, HDFS-10221.patch > > > A new test resource dfs.hosts.json was added in HDFS-9005 for better > readability. Since the JSON format doesn't allow comments, the file cannot > carry a license header and so fails the ASF license check. > To address this, we can add the file to the rat exclusions list in the pom.xml. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10253) Fix TestRefreshCallQueue failure.
[ https://issues.apache.org/jira/browse/HDFS-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15223163#comment-15223163 ] Hudson commented on HDFS-10253: --- FAILURE: Integrated in Hadoop-trunk-Commit #9544 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9544/]) HDFS-10253. Fix TestRefreshCallQueue failure (Contributed by Xiaoyu Yao) (vinayakumarb: rev 54b2e78fd28c9def42bec7f0418833bad352686c) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/TestRefreshCallQueue.java > Fix TestRefreshCallQueue failure. > - > > Key: HDFS-10253 > URL: https://issues.apache.org/jira/browse/HDFS-10253 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Xiaoyu Yao > Fix For: 2.8.0 > > Attachments: HDFS-10253.00.patch > > > *Jenkins link* > https://builds.apache.org/job/PreCommit-HDFS-Build/15041/testReport/ > *Trace* > {noformat} > java.lang.RuntimeException: > org.apache.hadoop.TestRefreshCallQueue$MockCallQueue could not be constructed. > at > org.apache.hadoop.ipc.CallQueueManager.createCallQueueInstance(CallQueueManager.java:164) > at > org.apache.hadoop.ipc.CallQueueManager.(CallQueueManager.java:70) > at org.apache.hadoop.ipc.Server.(Server.java:2579) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:958) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:535) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:800) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.(NameNodeRpcServer.java:421) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:759) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:701) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:900) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:879) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1596) > at > org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1247) > at > org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1016) > at > org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:891) > at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:823) > at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:482) > at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:441) > at > org.apache.hadoop.TestRefreshCallQueue.setUp(TestRefreshCallQueue.java:71) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9599) TestDecommissioningStatus.testDecommissionStatus occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-9599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15224757#comment-15224757 ] Hudson commented on HDFS-9599: -- FAILURE: Integrated in Hadoop-trunk-Commit #9551 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9551/]) HDFS-9599. TestDecommissioningStatus.testDecommissionStatus occasionally (iwasakims: rev 154d2532cf015e9ab9141864bd3ab0d6100ef597) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java > TestDecommissioningStatus.testDecommissionStatus occasionally fails > --- > > Key: HDFS-9599 > URL: https://issues.apache.org/jira/browse/HDFS-9599 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode > Environment: Jenkins >Reporter: Wei-Chiu Chuang >Assignee: Lin Yiqun > Attachments: HDFS-9599.001.patch, HDFS-9599.002.patch > > > From the test result of a recent jenkins nightly > https://builds.apache.org/job/Hadoop-Hdfs-trunk/2663/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestDecommissioningStatus/testDecommissionStatus/ > The test failed because the number of under-replicated blocks was 4 instead > of 3. > Looking at the log, there is a stray block, which might have caused the > failure: > {noformat} > 2015-12-23 00:42:05,820 [Block report processor] INFO BlockStateChange > (BlockManager.java:processReport(2131)) - BLOCK* processReport: > blk_1073741825_1001 on node 127.0.0.1:57382 size 16384 does not belong to any > file > {noformat} > The block size 16384 suggests this is left over from the sibling test case > testDecommissionStatusAfterDNRestart. This can happen because the same > minidfs cluster is reused between tests. > The test implementation should do a better job isolating tests. > Another case of failure is when the load factor comes into play, and a block > cannot find sufficient datanodes to place a replica. In this test, the > runtime should not consider the load factor: > {noformat} > conf.setBoolean(DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, > false); > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10178) Permanent write failures can happen if pipeline recoveries occur for the first packet
[ https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15225148#comment-15225148 ] Hudson commented on HDFS-10178: --- FAILURE: Integrated in Hadoop-trunk-Commit #9552 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9552/]) HDFS-10178. Permanent write failures can happen if pipeline recoveries (kihwal: rev a7d1fb0cd2fdbf830602eb4dbbd9bbe62f4d5584) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientProtocolForPipelineRecovery.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeFaultInjector.java > Permanent write failures can happen if pipeline recoveries occur for the > first packet > - > > Key: HDFS-10178 > URL: https://issues.apache.org/jira/browse/HDFS-10178 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Fix For: 2.7.3 > > Attachments: HDFS-10178.patch, HDFS-10178.v2.patch, > HDFS-10178.v3.patch, HDFS-10178.v4.patch, HDFS-10178.v5.patch > > > We have observed that writes fail permanently if the first packet doesn't go > through properly and pipeline recovery happens. If the write op creates a > pipeline, but the actual data packet does not reach one or more datanodes in > time, the pipeline recovery will be done against the 0-byte partial block. > If additional datanodes are added, the block is transferred to the new nodes. > After the transfer, each node will have a meta file containing the header > and a 0-length data block file. The pipeline recovery seems to work correctly > up to this point, but the write fails when the actual data packet is resent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8496) Calling stopWriter() with FSDatasetImpl lock held may block other threads
[ https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15225458#comment-15225458 ] Hudson commented on HDFS-8496: -- FAILURE: Integrated in Hadoop-trunk-Commit #9554 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9554/]) HDFS-8496. Calling stopWriter() with FSDatasetImpl lock held may block (cmccabe: rev f6b1a818124cc42688c4c5acaf537d96cf00e43b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/ReplicaMap.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInPipeline.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java > Calling stopWriter() with FSDatasetImpl lock held may block other threads > - > > Key: HDFS-8496 > URL: https://issues.apache.org/jira/browse/HDFS-8496 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: zhouyingchao >Assignee: Colin Patrick McCabe > Fix For: 2.8.0 > > Attachments: HDFS-8496-001.patch, HDFS-8496.002.patch, > HDFS-8496.003.patch, HDFS-8496.004.patch > > > On a DN of an HDFS 2.6 cluster, we noticed that some DataXceiver threads and > heartbeat threads were blocked for quite a while on the FSDatasetImpl lock. By > looking at the stack, we found that a call to stopWriter() with the FSDatasetImpl > lock held was blocking everything. > Following is the heartbeat stack, as an example, to show how threads are > blocked by the FSDatasetImpl lock: > {code} >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) > - waiting to lock <0x0007701badc0> (a > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) > - locked <0x000770465dc0> (a java.lang.Object) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) > at java.lang.Thread.run(Thread.java:662) > {code} > The thread which held the FSDatasetImpl lock was just sleeping in stopWriter(), waiting for another > thread to exit.
The stack is: > {code} >java.lang.Thread.State: TIMED_WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Thread.join(Thread.java:1194) > - locked <0x0007636953b8> (a org.apache.hadoop.util.Daemon) > at > org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverCheck(FsDatasetImpl.java:982) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverClose(FsDatasetImpl.java:1026) > - locked <0x0007701badc0> (a > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) > at java.lang.Thread.run(Thread.java:662) > {code} > In this case, we deployed quite a lot of other workloads on the DN, so the local > file system and disk were quite busy. We guess this is why stopWriter took > quite a long time. > Anyway, it is not reasonable to call stopWriter with the FSDatasetImpl > lock held. In HDFS-7999, createTemporary() was changed to call > stopWriter without the FSDatasetImpl lock. We should do the same in the other > three methods: recoverClose()/recoverAppend()/recoverRbw(). > I'll try to finish a patch for this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
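The locking pattern suggested here (and used by HDFS-7999 for createTemporary()) can be sketched with stand-in types, not the real FsDatasetImpl internals: look up the replica and its writer under the dataset lock, join the writer thread outside the lock, then re-check under the lock that nothing changed while waiting.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StopWriterSketch {
  static class Replica {
    volatile Thread writer;
    volatile long genStamp;
  }

  final Object lock = new Object();
  final Map<Long, Replica> replicas = new ConcurrentHashMap<>();

  void recoverStopWriter(long blockId, long joinTimeoutMs)
      throws InterruptedException {
    while (true) {
      Thread writer;
      long observedGs;
      synchronized (lock) {
        Replica r = replicas.get(blockId);
        if (r == null) return;
        writer = r.writer;
        observedGs = r.genStamp;
      }
      if (writer != null) {
        writer.interrupt();
        writer.join(joinTimeoutMs);  // potentially slow; lock NOT held here
      }
      synchronized (lock) {
        Replica r = replicas.get(blockId);
        if (r == null || r.genStamp == observedGs) {
          return;  // nothing changed while we waited; recovery can proceed
        }
        // replica changed under us; loop and stop the new writer too
      }
    }
  }
}
{code}
Heartbeats and other DataXceivers no longer queue up behind a slow disk, since the dataset lock is never held across the join.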
[jira] [Commented] (HDFS-9917) IBR accumulate more objects when SNN was down for sometime.
[ https://issues.apache.org/jira/browse/HDFS-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15225529#comment-15225529 ] Hudson commented on HDFS-9917: -- FAILURE: Integrated in Hadoop-trunk-Commit #9555 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9555/]) HDFS-9917. IBR accumulate more objects when SNN was down for sometime. (vinayakumarb: rev 818d6b799eead13a17a0214172df60a269b046fb) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/IncrementalBlockReportManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java > IBR accumulate more objects when SNN was down for sometime. > --- > > Key: HDFS-9917 > URL: https://issues.apache.org/jira/browse/HDFS-9917 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Critical > Fix For: 2.7.3 > > Attachments: HDFS-9917-02.patch, HDFS-9917-branch-2.7-002.patch, > HDFS-9917-branch-2.7.patch, HDFS-9917.patch > > > SNN was down for some time for various reasons. After restarting SNN, it > became unresponsive because: > - 29 DNs were each sending ~5 million IBRs (most of them delete IBRs), > whereas each datanode had only ~2.5 million blocks. > - GC can't reclaim these objects since all of them sit in the RPC queue. > To recover (i.e. to clear these objects), all the DNs were restarted one by > one. This issue happened in 2.4.1, where splitting of the block report was not > available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10239) Fsshell mv fails if port usage doesn't match in src and destination paths
[ https://issues.apache.org/jira/browse/HDFS-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226324#comment-15226324 ] Hudson commented on HDFS-10239: --- FAILURE: Integrated in Hadoop-trunk-Commit #9559 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9559/]) HDFS-10239. Fsshell mv fails if port usage doesn't match in src and (kihwal: rev 917464505c0e930ebeb4c775d829e51c56a48686) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/MoveCommands.java > Fsshell mv fails if port usage doesn't match in src and destination paths > - > > Key: HDFS-10239 > URL: https://issues.apache.org/jira/browse/HDFS-10239 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-10239.001.patch, HDFS-10239.002.patch > > > If one of the src or destination fs URIs does not contain the port while the > other one does, MoveCommands#processPath preemptively throws a > PathIOException "Does not match target filesystem". > eg. > {code} > -bash-4.1$ hadoop fs -mv hdfs://localhost:8020/tmp/foo3 > hdfs://localhost/tmp/foo4 > mv: `hdfs://localhost:8020:8020/tmp/foo3': Does not match target filesystem > {code} > This is due to a strict string check in {{processPath}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
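A less brittle check would compare filesystem URIs with the default port resolved, rather than comparing raw strings. The helper below is a hypothetical sketch, not the actual MoveCommands patch; 8020 as the HDFS default port is an assumption for illustration.
{code}
import java.net.URI;

public class SameFsSketch {
  static final int DEFAULT_HDFS_PORT = 8020;

  static boolean sameFileSystem(URI a, URI b) {
    if (!a.getScheme().equalsIgnoreCase(b.getScheme())) return false;
    if (!a.getHost().equalsIgnoreCase(b.getHost())) return false;
    // Treat a missing port (-1) as the scheme's default before comparing.
    int pa = a.getPort() == -1 ? DEFAULT_HDFS_PORT : a.getPort();
    int pb = b.getPort() == -1 ? DEFAULT_HDFS_PORT : b.getPort();
    return pa == pb;
  }

  public static void main(String[] args) {
    URI src = URI.create("hdfs://localhost:8020/tmp/foo3");
    URI dst = URI.create("hdfs://localhost/tmp/foo4");
    System.out.println(sameFileSystem(src, dst)); // true, not a PathIOException
  }
}
{code}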
[jira] [Commented] (HDFS-10235) [NN UI] Last contact for Live Nodes should be relative time
[ https://issues.apache.org/jira/browse/HDFS-10235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227120#comment-15227120 ] Hudson commented on HDFS-10235: --- FAILURE: Integrated in Hadoop-trunk-Commit #9563 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9563/]) HDFS-10235. Last contact for Live Nodes should be relative time. (raviprak: rev 0cd320a8463efe19a6228f9fe14693aa37ac8a10) * hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html > [NN UI] Last contact for Live Nodes should be relative time > --- > > Key: HDFS-10235 > URL: https://issues.apache.org/jira/browse/HDFS-10235 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Fix For: 2.8.0 > > Attachments: HDFS-10235-002.patch, HDFS-10235.patch > > > Last contact for Live Nodes should be a relative time, and we can keep the > absolute date for Dead Nodes. A relative time makes the last contact easier > to read at a glance; no extra math is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10261) TestBookKeeperHACheckpoints doesn't handle ephemeral HTTP ports
[ https://issues.apache.org/jira/browse/HDFS-10261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227176#comment-15227176 ] Hudson commented on HDFS-10261: --- FAILURE: Integrated in Hadoop-trunk-Commit #9564 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9564/]) HDFS-10261. TestBookKeeperHACheckpoints doesn't handle ephemeral HTTP (kihwal: rev 9ba1e5af06070ba01dcf46e1a4c66713a1d43352) * hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/contrib/bkjournal/TestBookKeeperHACheckpoints.java > TestBookKeeperHACheckpoints doesn't handle ephemeral HTTP ports > --- > > Key: HDFS-10261 > URL: https://issues.apache.org/jira/browse/HDFS-10261 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-10261.001.patch, HDFS-10261.002.patch > > > The MiniDFSCluster HTTP ports are hard-coded to 10001 and 10002. This makes > it impossible to run these tests simultaneously and also allows for failures > if those ports are in use. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
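The usual remedy for hard-coded test ports is to request port 0, so the OS assigns a free ephemeral port per run and two simultaneous executions can never collide. A sketch against MiniDFSCluster follows; the exact configuration the BKJournal test needs may differ.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class EphemeralPortSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Port 0 asks the OS for a free ephemeral port at bind time.
    conf.set(DFSConfigKeys.DFS_NAMENODE_HTTP_ADDRESS_KEY, "localhost:0");
    conf.set(DFSConfigKeys.DFS_NAMENODE_SECONDARY_HTTP_ADDRESS_KEY, "localhost:0");
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build();
    cluster.shutdown();
  }
}
{code}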
[jira] [Commented] (HDFS-10192) Namenode safemode not coming out during failover
[ https://issues.apache.org/jira/browse/HDFS-10192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228781#comment-15228781 ] Hudson commented on HDFS-10192: --- FAILURE: Integrated in Hadoop-trunk-Commit #9568 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9568/]) HDFS-10192. Namenode safemode not coming out during failover. (jing9: rev 221b3a8722f84f8e9ad0a98eea38a12cc4ad2f24) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManagerSafeMode.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestHASafeMode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java > Namenode safemode not coming out during failover > > > Key: HDFS-10192 > URL: https://issues.apache.org/jira/browse/HDFS-10192 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Fix For: 2.9.0 > > Attachments: HDFS-10192-01.patch, HDFS-10192-02.patch, > HDFS-10192-03.patch > > > Scenario: > === > Write some blocks. > Wait till the edit roll happens. > Stop SNN. > Delete some blocks in ANN, and wait till the blocks are deleted on the DNs as well. > Restart the SNN and wait till block reports come from the datanodes to the SNN. > Kill ANN, then make SNN active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6520) hdfs fsck passes invalid length value when creating BlockReader
[ https://issues.apache.org/jira/browse/HDFS-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228940#comment-15228940 ] Hudson commented on HDFS-6520: -- FAILURE: Integrated in Hadoop-trunk-Commit #9569 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9569/]) HDFS-6520. hdfs fsck passes invalid length value when creating (cmccabe: rev 188f65287d5b2f26a8862c88198f83ac59035016) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockTokenWithDFS.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java > hdfs fsck passes invalid length value when creating BlockReader > --- > > Key: HDFS-6520 > URL: https://issues.apache.org/jira/browse/HDFS-6520 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Shengjun Xin >Assignee: Xiao Chen > Fix For: 2.8.0 > > Attachments: HDFS-6520-partial.001.patch, HDFS-6520.01.patch, > HDFS-6520.02.patch, john.test.patch > > > I met some errors when running fsck -move. > My steps are as the following: > 1. Set up a pseudo cluster > 2. Copy a file to hdfs > 3. Corrupt a block of the file > 4. Run fsck to check: > {code} > Connecting to namenode via http://localhost:50070 > FSCK started by hadoop (auth:SIMPLE) from /127.0.0.1 for path /user/hadoop at > Wed Jun 11 15:58:38 CST 2014 > . > /user/hadoop/fsck-test: CORRUPT blockpool > BP-654596295-10.37.7.84-1402466764642 block blk_1073741825 > /user/hadoop/fsck-test: MISSING 1 blocks of total size 1048576 B.Status: > CORRUPT > Total size:4104304 B > Total dirs:1 > Total files: 1 > Total symlinks:0 > Total blocks (validated): 4 (avg. block size 1026076 B) > > CORRUPT FILES:1 > MISSING BLOCKS: 1 > MISSING SIZE: 1048576 B > CORRUPT BLOCKS: 1 > > Minimally replicated blocks: 3 (75.0 %) > Over-replicated blocks:0 (0.0 %) > Under-replicated blocks: 0 (0.0 %) > Mis-replicated blocks: 0 (0.0 %) > Default replication factor:1 > Average block replication: 0.75 > Corrupt blocks:1 > Missing replicas: 0 (0.0 %) > Number of data-nodes: 1 > Number of racks: 1 > FSCK ended at Wed Jun 11 15:58:38 CST 2014 in 1 milliseconds > The filesystem under path '/user/hadoop' is CORRUPT > {code} > 5. Run fsck -move to move the corrupted file to /lost+found, and the error > message appears in the namenode log: > {code} > 2014-06-11 15:48:16,686 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: > FSCK started by hadoop (auth:SIMPLE) from /127.0.0.1 for path /user/hadoop at > Wed Jun 11 15:48:16 CST 2014 > 2014-06-11 15:48:16,894 INFO > org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 35 > Total time for transactions(ms): 9 Number of transactions batched in Syncs: 0 > Number of syncs: 25 SyncTimes(ms): 73 > 2014-06-11 15:48:16,991 ERROR > org.apache.hadoop.hdfs.server.namenode.NameNode: Error reading block > java.io.IOException: Expected empty end-of-read packet!
Header: PacketHeader > with packetLen=66048 header data: offsetInBlock: 65536 > seqno: 1 > lastPacketInBlock: false > dataLen: 65536 > at > org.apache.hadoop.hdfs.RemoteBlockReader2.readTrailingEmptyPacket(RemoteBlockReader2.java:259) > at > org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:220) > at > org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:138) > at > org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.copyBlock(NamenodeFsck.java:649) > at > org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.copyBlocksToLostFound(NamenodeFsck.java:543) > at > org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:460) > at > org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:324) > at > org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.fsck(NamenodeFsck.java:233) > at > org.apache.hadoop.hdfs.server.namenode.FsckServlet$1.run(FsckServlet.java:67) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) >
[jira] [Commented] (HDFS-9945) Datanode command for evicting writers
[ https://issues.apache.org/jira/browse/HDFS-9945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229087#comment-15229087 ] Hudson commented on HDFS-9945: -- FAILURE: Integrated in Hadoop-trunk-Commit #9570 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9570/]) HDFS-9945. Datanode command for evicting writers. Contributed by Kihwal (epayne: rev aede8c10ecad4f2a8802a834e4bd0b8286cebade) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientDatanodeProtocol.proto * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientProtocolForPipelineRecovery.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientDatanodeProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolTranslatorPB.java > Datanode command for evicting writers > - > > Key: HDFS-9945 > URL: https://issues.apache.org/jira/browse/HDFS-9945 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Attachments: HDFS-9945.patch, HDFS-9945.v2.patch > > > It will be useful if there is a command to evict writers from a datanode. > When a set of datanodes is being decommissioned, they can get blocked by > slow writers at the end. It was rare in the old days since mapred jobs > didn't last too long, but with many different types of apps running on > today's YARN clusters, we often see a very long tail in datanode > decommissioning. > I propose a new dfsadmin command, {{evictWriters}}, to be added. I initially > thought about having the namenode automatically tell datanodes during > decommissioning, but realized that having a command is more flexible. E.g. > users can choose not to do this at all, choose when to evict writers, or > whether to try multiple times for whatever reasons. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
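If the command is added as proposed, the invocation would presumably look something like the following; the exact syntax is hypothetical until the patch settles.
{noformat}
$ hdfs dfsadmin -evictWriters <datanode_host:ipc_port>
{noformat}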
[jira] [Commented] (HDFS-10267) Extra "synchronized" on FsDatasetImpl#recoverAppend and FsDatasetImpl#recoverClose
[ https://issues.apache.org/jira/browse/HDFS-10267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229663#comment-15229663 ] Hudson commented on HDFS-10267: --- FAILURE: Integrated in Hadoop-trunk-Commit #9573 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9573/]) HDFS-10267. Extra "synchronized" on FsDatasetImpl#recoverAppend and (cmccabe: rev 4bd7cbc29d142fc56324156333b9a8a7d7b68042) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java > Extra "synchronized" on FsDatasetImpl#recoverAppend and > FsDatasetImpl#recoverClose > -- > > Key: HDFS-10267 > URL: https://issues.apache.org/jira/browse/HDFS-10267 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.8.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.8.0 > > Attachments: HDFS-10267.001.patch, HDFS-10267.002.patch, > HDFS-10267.003.patch, HDFS-10267.004.patch > > > There is an extra "synchronized" on FsDatasetImpl#recoverAppend and > FsDatasetImpl#recoverClose that prevents the HDFS-8496 fix from working as > intended. This should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10186) DirectoryScanner: Improve logs by adding full path of both actual and expected block directories
[ https://issues.apache.org/jira/browse/HDFS-10186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229693#comment-15229693 ] Hudson commented on HDFS-10186: --- FAILURE: Integrated in Hadoop-trunk-Commit #9574 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9574/]) HDFS-10186. DirectoryScanner: Improve logs by adding full path of both (szetszwo: rev 654cd1d0c0427c23e73804fc9d87208f76bbf6aa) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DirectoryScanner.java > DirectoryScanner: Improve logs by adding full path of both actual and > expected block directories > > > Key: HDFS-10186 > URL: https://issues.apache.org/jira/browse/HDFS-10186 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Minor > Fix For: 2.7.3 > > Attachments: HDFS-10186-001.patch > > > As per the > [discussion|https://issues.apache.org/jira/browse/HDFS-7648?focusedCommentId=15195908&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15195908], > this jira is to improve the directory scanner log by adding both the wrong and the > correct directory paths so that admins can take the necessary actions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9719) Refactoring ErasureCodingWorker into smaller reusable constructs
[ https://issues.apache.org/jira/browse/HDFS-9719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229763#comment-15229763 ] Hudson commented on HDFS-9719: -- FAILURE: Integrated in Hadoop-trunk-Commit #9575 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9575/]) HDFS-9719. Refactoring ErasureCodingWorker into smaller reusable (uma.gangumalla: rev 3c18a53cbd2efabb2ad108d63a0b0b558424115f) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/StripedBlockWriter.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/StripedBlockReader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/package-info.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/StripedBlockUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/StripedReconstructor.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReconstructStripedFile.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/StripedWriter.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/ErasureCodingWorker.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/StripedReader.java > Refactoring ErasureCodingWorker into smaller reusable constructs > > > Key: HDFS-9719 > URL: https://issues.apache.org/jira/browse/HDFS-9719 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Kai Zheng > Fix For: 3.0.0 > > Attachments: HDFS-9719-v1.patch, HDFS-9719-v2.patch, > HDFS-9719-v3.patch, HDFS-9719-v4.patch, HDFS-9719-v5.patch, > HDFS-9719-v6.patch, HDFS-9719-v7.patch, HDFS-9719-v8.patch, HDFS-9719-v9.patch > > > This proposes refactoring {{ErasureCodingWorker}} into smaller > constructs that can be reused in other places, like block group checksum computing > on the datanode side. As discussed in HDFS-8430 and implemented in the HDFS-9694 > patch, checksum computing for striped block groups would be distributed to the > datanodes in the group, where missing/corrupted data blocks should be > reconstructable in order to recompute the block checksum. Most of the needed > code is in the current ErasureCodingWorker and could be reused in > order to avoid duplication. Fortunately, we have very good and complete > tests, which would make the refactoring much easier. The refactoring will > also help a lot with subsequent tasks in phase II for non-striped erasure > coded files and blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10266) Remove unused properties dfs.client.file-block-storage-locations.num-threads and dfs.client.file-block-storage-locations.timeout.millis
[ https://issues.apache.org/jira/browse/HDFS-10266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231741#comment-15231741 ] Hudson commented on HDFS-10266: --- FAILURE: Integrated in Hadoop-trunk-Commit #9577 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9577/]) HDFS-10266. Remove unused properties (aajisaka: rev 9c32f8785e8d7957e3f8a3946cfd15a4d5c82fec) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-tools/hadoop-sls/src/main/data/2jobs2min-rumen-jh.json * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java > Remove unused properties dfs.client.file-block-storage-locations.num-threads > and dfs.client.file-block-storage-locations.timeout.millis > --- > > Key: HDFS-10266 > URL: https://issues.apache.org/jira/browse/HDFS-10266 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Minor > Labels: newbie > Fix For: 3.0.0 > > Attachments: HDFS-10266.001.patch > > > The properties: > dfs.client.file-block-storage-locations.num-threads > dfs.client.file-block-storage-locations.timeout.millis > exist in DFSConfigKeys and HdfsClientConfigKeys but nowhere else. It should > be safe to remove them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10271) Extra bytes are getting released from reservedSpace for append
[ https://issues.apache.org/jira/browse/HDFS-10271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236682#comment-15236682 ] Hudson commented on HDFS-10271: --- FAILURE: Integrated in Hadoop-trunk-Commit #9596 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9596/]) HDFS-10271. Extra bytes are getting released from reservedSpace for (vinayakumarb: rev a9a607f8fc0d996af3fb37f7efa7591d6655900d) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestSpaceReservation.java > Extra bytes are getting released from reservedSpace for append > -- > > Key: HDFS-10271 > URL: https://issues.apache.org/jira/browse/HDFS-10271 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Critical > Fix For: 2.7.3, 2.6.5 > > Attachments: HDFS-10271-01.patch, HDFS-10271-branch-2.7-01.patch > > > 1. The file already has some bytes in the block (e.g. 1024 B). > 2. Re-open the file for append (here reserving BlockSize-1024 bytes). > 3. Write one byte and flush. > 4. close() > After close(), *BlockSize-1* bytes are released from reservedSpace instead of > *BlockSize-1025* bytes. > The extra bytes released from reservedSpace may create problems for other writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
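The off-by-reservation arithmetic is easy to verify with concrete numbers; the 128 MB block size below is only an illustrative assumption, and any block size shows the same 1024-byte over-release.
{code}
public class ReservedSpaceSketch {
  public static void main(String[] args) {
    long blockSize = 128L * 1024 * 1024;
    long existing  = 1024;                    // bytes already in the block
    long reserved  = blockSize - existing;    // reserved on re-open for append
    long written   = 1;                       // bytes appended before close()
    long correctRelease = reserved - written; // blockSize - 1025
    long buggyRelease   = blockSize - 1;      // what is actually released
    // The difference equals the bytes that already existed in the block.
    System.out.println(buggyRelease - correctRelease); // 1024 bytes too many
  }
}
{code}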
[jira] [Commented] (HDFS-10277) PositionedReadable test testReadFullyZeroByteFile failing in HDFS
[ https://issues.apache.org/jira/browse/HDFS-10277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236683#comment-15236683 ] Hudson commented on HDFS-10277: --- FAILURE: Integrated in Hadoop-trunk-Commit #9596 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9596/]) HDFS-10277. PositionedReadable test testReadFullyZeroByteFile failing in (aajisaka: rev a409508b3f4c46b419c41b9cdff83429d9d025ce) * hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java > PositionedReadable test testReadFullyZeroByteFile failing in HDFS > - > > Key: HDFS-10277 > URL: https://issues.apache.org/jira/browse/HDFS-10277 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: Jenkins >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 2.8.0 > > Attachments: HDFS-10277-001.patch > > > Jenkins is failing after HADOOP-12994, > {{che.hadoop.fs.contract.AbstractContractSeekTest.testReadFullyZeroByteFile(AbstractContractSeekTest.java:373)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8356) Document missing properties in hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236681#comment-15236681 ] Hudson commented on HDFS-8356: -- FAILURE: Integrated in Hadoop-trunk-Commit #9596 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9596/]) HDFS-8356. Document missing properties in hdfs-default.xml. Contributed (aajisaka: rev 209303be3a4d038b420ae3c11230d419bba07575) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tools/TestHdfsConfigFields.java * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-common-project/hadoop-common/src/site/markdown/DeprecatedProperties.md > Document missing properties in hdfs-default.xml > --- > > Key: HDFS-8356 > URL: https://issues.apache.org/jira/browse/HDFS-8356 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.7.0 >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: supportability, test > Attachments: HDFS-8356.001.patch, HDFS-8356.002.patch, > HDFS-8356.003.patch, HDFS-8356.004.patch, HDFS-8356.005.patch, > HDFS-8356.006.patch, HDFS-8356.007.patch, HDFS-8356.008.patch, > HDFS-8356.009.patch, HDFS-8356.010.patch, HDFS-8356.011.patch, > HDFS-8356.012.patch, HDFS-8356.013.patch, HDFS-8356.014.branch-2.patch, > HDFS-8356.014.patch, HDFS-8356.015.branch-2.patch, HDFS-8356.015.patch > > > The following properties are currently not defined in hdfs-default.xml. These > properties should either be > A) documented in hdfs-default.xml OR > B) listed as an exception (with comments, e.g. for internal use) in the > TestHdfsConfigFields unit test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10273) Remove duplicate logSync() and log message in FSN#enterSafemode()
[ https://issues.apache.org/jira/browse/HDFS-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237672#comment-15237672 ] Hudson commented on HDFS-10273: --- FAILURE: Integrated in Hadoop-trunk-Commit #9598 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9598/]) HDFS-10273. Remove duplicate logSync() and log message in (cmccabe: rev 600d129bb8d52ae820e7b74b8ce363aabd69d25c) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java > Remove duplicate logSync() and log message in FSN#enterSafemode() > - > > Key: HDFS-10273 > URL: https://issues.apache.org/jira/browse/HDFS-10273 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Vinayakumar B >Assignee: Vinayakumar B >Priority: Minor > Fix For: 2.9.0 > > Attachments: HDFS-10273-01.patch > > > Remove duplicate logSync() and log message in FSN#enterSafemode() > {code:title=FSN#enterSafemode(..)} > // Before Editlog is in OpenForWrite mode, editLogStream will be null. > So, > // logSyncAll call can be called only when Edlitlog is in OpenForWrite > mode > if (isEditlogOpenForWrite) { > getEditLog().logSyncAll(); > } > setManualAndResourceLowSafeMode(!resourcesLow, resourcesLow); > NameNode.stateChangeLog.info("STATE* Safe mode is ON.\n" + > getSafeModeTip()); > if (isEditlogOpenForWrite) { > getEditLog().logSyncAll(); > } > NameNode.stateChangeLog.info("STATE* Safe mode is ON" + > getSafeModeTip()); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
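The fix for HDFS-10273 above is simply to keep one {{logSyncAll()}} call and one log message. A sketch of the de-duplicated method body, derived from the quoted code (the committed patch may differ in detail):
{code}
// Before the edit log is open for write, editLogStream will be null, so
// logSyncAll() can only be called when the edit log is open for write.
if (isEditlogOpenForWrite) {
  getEditLog().logSyncAll();
}
setManualAndResourceLowSafeMode(!resourcesLow, resourcesLow);
NameNode.stateChangeLog.info("STATE* Safe mode is ON.\n" + getSafeModeTip());
{code}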
[jira] [Commented] (HDFS-9918) Erasure Coding: Sort located striped blocks based on decommissioned states
[ https://issues.apache.org/jira/browse/HDFS-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238032#comment-15238032 ] Hudson commented on HDFS-9918: -- FAILURE: Integrated in Hadoop-trunk-Commit #9599 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9599/]) HDFS-9918. Erasure Coding: Sort located striped blocks based on (zhz: rev 6ef42873a02bfcbff5521869f4d6f66539d1db41) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestSortLocatedStripedBlock.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommissionWithStriped.java > Erasure Coding: Sort located striped blocks based on decommissioned states > -- > > Key: HDFS-9918 > URL: https://issues.apache.org/jira/browse/HDFS-9918 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R > Fix For: 3.0.0 > > Attachments: HDFS-9918-001.patch, HDFS-9918-002.patch, > HDFS-9918-003.patch, HDFS-9918-004.patch, HDFS-9918-005.patch, > HDFS-9918-006.patch, HDFS-9918-007.patch, HDFS-9918-008.patch, > HDFS-9918-009.patch, HDFS-9918-010.patch, HDFS-9918-011.patch, > HDFS-9918-012.patch, HDFS-9918-013.patch > > > This jira is follow-on work of HDFS-8786, where we decommission datanodes > holding striped blocks. > After decommissioning, the ordering of the storage list needs to change so > that the decommissioned datanodes are the last nodes in the list. > For example, assume we have a block group with storage list:- > d0, d1, d2, d3, d4, d5, d6, d7, d8, d9 > mapping to indices > 0, 1, 2, 3, 4, 5, 6, 7, 8, 2 > Here the internal block b2 is duplicated, located in d2 and d9. If d2 is a > decommissioning node, we should swap d2 and d9 in the storage list (see the sketch after this message). > Thanks [~jingzhao] for the > [discussions|https://issues.apache.org/jira/browse/HDFS-8786?focusedCommentId=15180415&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15180415] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
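A sketch of the reordering idea in HDFS-9918 above. Note the real fix must also move the corresponding block indices together with the storages; the class and method names here are simplified illustrations, not the committed patch:
{code}
import java.util.Arrays;
import java.util.Comparator;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class DecommissionedLastSorter {
  // Stable partition: live replicas keep their relative order, while
  // decommissioned/decommissioning nodes move to the tail of the list.
  public static void sortDecommissionedLast(DatanodeInfo[] locs) {
    Arrays.sort(locs, new Comparator<DatanodeInfo>() {
      @Override
      public int compare(DatanodeInfo a, DatanodeInfo b) {
        return Boolean.compare(isDecom(a), isDecom(b));
      }
    });
  }

  private static boolean isDecom(DatanodeInfo d) {
    return d.isDecommissioned() || d.isDecommissionInProgress();
  }
}
{code}
In the example above, the stable sort moves d2 behind all live nodes, so the duplicated internal block on the decommissioning node no longer shadows d9's healthy replica.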
[jira] [Commented] (HDFS-9772) TestBlockReplacement#testThrottler doesn't work as expected
[ https://issues.apache.org/jira/browse/HDFS-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239018#comment-15239018 ] Hudson commented on HDFS-9772: -- FAILURE: Integrated in Hadoop-trunk-Commit #9602 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9602/]) HDFS-9772. TestBlockReplacement#testThrottler doesn't work as expected. (waltersu4549: rev 903428bf946827b4d58c7c577ed0c574a7cff029) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReplacement.java > TestBlockReplacement#testThrottler doesn't work as expected > --- > > Key: HDFS-9772 > URL: https://issues.apache.org/jira/browse/HDFS-9772 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun >Priority: Minor > Labels: test > Fix For: 2.7.3 > > Attachments: HDFS.001.patch > > > In {{TestBlockReplacement#testThrottler}}, the measured bandwidth is > calculated with the wrong variable: it uses {{totalBytes}} rather than the > final variable {{TOTAL_BYTES}} (whose value is assigned to {{bytesToSend}}). > {{totalBytes}} is never updated, so {{totalBytes*1000/(end-start)}} is always > 0 and the comparison is always true. > The method code is below: > {code} > @Test > public void testThrottler() throws IOException { > Configuration conf = new HdfsConfiguration(); > FileSystem.setDefaultUri(conf, "hdfs://localhost:0"); > long bandwidthPerSec = 1024*1024L; > final long TOTAL_BYTES =6*bandwidthPerSec; > long bytesToSend = TOTAL_BYTES; > long start = Time.monotonicNow(); > DataTransferThrottler throttler = new > DataTransferThrottler(bandwidthPerSec); > long totalBytes = 0L; > long bytesSent = 1024*512L; // 0.5MB > throttler.throttle(bytesSent); > bytesToSend -= bytesSent; > bytesSent = 1024*768L; // 0.75MB > throttler.throttle(bytesSent); > bytesToSend -= bytesSent; > try { > Thread.sleep(1000); > } catch (InterruptedException ignored) {} > throttler.throttle(bytesToSend); > long end = Time.monotonicNow(); > assertTrue(totalBytes*1000/(end-start)<=bandwidthPerSec); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
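A sketch of the corrected assertion for the test quoted above: measure against the total that was actually sent ({{TOTAL_BYTES}}), not the never-updated local {{totalBytes}}. The committed patch may differ in detail:
{code}
// Editor's sketch of the fix, using the names from the quoted test.
long end = Time.monotonicNow();
assertTrue(TOTAL_BYTES * 1000 / (end - start) <= bandwidthPerSec);
{code}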
[jira] [Commented] (HDFS-10270) TestJMXGet:testNameNode() fails
[ https://issues.apache.org/jira/browse/HDFS-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239566#comment-15239566 ] Hudson commented on HDFS-10270: --- FAILURE: Integrated in Hadoop-trunk-Commit #9603 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9603/]) HDFS-10270. TestJMXGet:testNameNode() fails. Contributed by Gergely (kihwal: rev d2f3bbc29046435904ad9418073795439c71b441) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tools/TestJMXGet.java > TestJMXGet:testNameNode() fails > --- > > Key: HDFS-10270 > URL: https://issues.apache.org/jira/browse/HDFS-10270 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.0.0, 2.8.0 >Reporter: Andras Bokor >Assignee: Gergely Novák >Priority: Minor > Fix For: 2.8.0 > > Attachments: HDFS-10270.001.patch, TestJMXGet.log, TestJMXGetFails.log > > > It fails with java.util.concurrent.TimeoutException. The underlying problem > is that we expect the NumOpenConnections metric to be 2 but it is only 1, so > the test waits 60 seconds and then fails. > The Maven output with the stack trace is attached ([^TestJMXGetFails.log]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10209) Support enable caller context in HDFS namenode audit log without restart namenode
[ https://issues.apache.org/jira/browse/HDFS-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240245#comment-15240245 ] Hudson commented on HDFS-10209: --- FAILURE: Integrated in Hadoop-trunk-Commit #9605 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9605/]) HDFS-10209. Support enable caller context in HDFS namenode audit log (xyao: rev 192112d5a2e7ce4ec8eb47e21ab744b34c848893) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeReconfigure.java > Support enable caller context in HDFS namenode audit log without restart > namenode > - > > Key: HDFS-10209 > URL: https://issues.apache.org/jira/browse/HDFS-10209 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Xiaobing Zhou > Attachments: HDFS-10209-HDFS-9000.000.patch, > HDFS-10209-HDFS-9000.001.patch > > > The RPC caller context is a useful feature for tracking down the origin of a > caller, e.g. "bad" jobs that overload the namenode. This ticket was opened to > allow enabling the caller context without a namenode restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
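For context on the feature HDFS-10209 above makes reconfigurable: a client can tag its RPCs via {{org.apache.hadoop.ipc.CallerContext}} so the namenode audit log can attribute load. A minimal sketch; the context string is a made-up example:
{code}
import org.apache.hadoop.ipc.CallerContext;

// Editor's sketch: subsequent HDFS RPCs from this thread carry the context
// into the namenode audit log when caller-context logging is enabled.
CallerContext.setCurrent(
    new CallerContext.Builder("example_job_20160413").build());
{code}
The point of the JIRA is that the audit-log side of this can be switched on at runtime (via the namenode reconfiguration mechanism) instead of only at startup.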
[jira] [Commented] (HDFS-10279) Improve validation of the configured number of tolerated failed volumes
[ https://issues.apache.org/jira/browse/HDFS-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240295#comment-15240295 ] Hudson commented on HDFS-10279: --- SUCCESS: Integrated in Hadoop-trunk-Commit #9606 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9606/]) HDFS-10279. Improve validation of the configured number of tolerated (wang: rev 314aa21a89134fac68ac3cb95efdeb56bd3d7b05) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailureToleration.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java > Improve validation of the configured number of tolerated failed volumes > --- > > Key: HDFS-10279 > URL: https://issues.apache.org/jira/browse/HDFS-10279 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Fix For: 2.8.0 > > Attachments: HDFS-10279.001.patch, HDFS-10279.002.patch > > > Currently, a misconfiguration of dfs.datanode.failed.volumes.tolerated is > detected too late and is not easy to find. We can move the validation logic > for tolerated volumes to an earlier point, before the datanode registers with > the namenode. This will let us detect the misconfiguration sooner and more > easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
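A sketch of the early validation described in HDFS-10279 above, loosely following the shape of the existing failed-volume checks; variable names are illustrative and this is not the committed patch:
{code}
// Editor's sketch: fail fast at datanode startup if the tolerated-failure
// count is out of range for the configured data directories.
int volsConfigured = dataDirs.size();
int volFailuresTolerated = conf.getInt(
    DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_KEY,
    DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_DEFAULT);
if (volFailuresTolerated < 0 || volFailuresTolerated >= volsConfigured) {
  throw new DiskErrorException("Invalid value configured for "
      + "dfs.datanode.failed.volumes.tolerated - " + volFailuresTolerated
      + ": must be in the range [0, number of configured volumes = "
      + volsConfigured + ")");
}
{code}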
[jira] [Commented] (HDFS-10209) Support enable caller context in HDFS namenode audit log without restart namenode
[ https://issues.apache.org/jira/browse/HDFS-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240296#comment-15240296 ] Hudson commented on HDFS-10209: --- SUCCESS: Integrated in Hadoop-trunk-Commit #9606 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9606/]) Revert "HDFS-10209. Support enable caller context in HDFS namenode audit (xyao: rev 4895c73dd493a53eab43f0d16e92c19af15c460b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeReconfigure.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java HDFS-10209. Support enable caller context in HDFS namenode audit log (xyao: rev 5566177c9af913baf380811dbbb1fa7e70235491) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeReconfigure.java > Support enable caller context in HDFS namenode audit log without restart > namenode > - > > Key: HDFS-10209 > URL: https://issues.apache.org/jira/browse/HDFS-10209 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Xiaobing Zhou > Fix For: 2.9.0 > > Attachments: HDFS-10209-HDFS-9000.000.patch, > HDFS-10209-HDFS-9000.001.patch > > > The RPC caller context is a useful feature for tracking down the origin of a > caller, e.g. "bad" jobs that overload the namenode. This ticket was opened to > allow enabling the caller context without a namenode restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10282) The VolumeScanner should warn about replica files which are misplaced
[ https://issues.apache.org/jira/browse/HDFS-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241136#comment-15241136 ] Hudson commented on HDFS-10282: --- FAILURE: Integrated in Hadoop-trunk-Commit #9611 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9611/]) HDFS-10282. The VolumeScanner should warn about replica files which are (kihwal: rev 0d1c1152f1ce2706f92109bfbdff0d62e98e6797) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImplTestUtils.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/FsDatasetTestUtils.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/VolumeScanner.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DirectoryScanner.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockScanner.java > The VolumeScanner should warn about replica files which are misplaced > - > > Key: HDFS-10282 > URL: https://issues.apache.org/jira/browse/HDFS-10282 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.9.0 > > Attachments: HDFS-10282.001.patch, HDFS-10282.002.patch > > > The VolumeScanner should warn about replica files which are misplaced -- This message was sent by Atlassian JIRA (v6.3.4#6332)
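A sketch of what the misplacement warning in HDFS-10282 above could look like, assuming the {{DatanodeUtil.idToBlockDir}} helper that maps a block ID to its expected subdirectory; an illustration, not the committed patch:
{code}
// Editor's sketch: a finalized replica file should live in the directory
// derived from its block ID; warn when it does not.
File expectedDir = DatanodeUtil.idToBlockDir(finalizedDir, block.getBlockId());
File actualDir = blockFile.getParentFile();
if (!expectedDir.equals(actualDir)) {
  LOG.warn("Replica file " + blockFile + " for block " + block.getBlockId()
      + " is misplaced; expected directory: " + expectedDir);
}
{code}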
[jira] [Commented] (HDFS-10280) Document new dfsadmin command -evictWriters
[ https://issues.apache.org/jira/browse/HDFS-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241629#comment-15241629 ] Hudson commented on HDFS-10280: --- FAILURE: Integrated in Hadoop-trunk-Commit #9612 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9612/]) HDFS-10280. Document new dfsadmin command -evictWriters. Contributed by (kihwal: rev c970f1d00525e4273075cff7406dcbd71305abd5) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md > Document new dfsadmin command -evictWriters > --- > > Key: HDFS-10280 > URL: https://issues.apache.org/jira/browse/HDFS-10280 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 2.8.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Labels: supportability > Fix For: 2.8.0 > > Attachments: HDFS-10280.001.patch > > > HDFS-9945 added a new dfsadmin command -evictWriters, which is great. > However, I noticed that typing {{hdfs dfsadmin}} does not show a command-line > help summary for it; the summary is shown only when typing {{hdfs dfsadmin > -help}}. > Also, it would be great to document it in the {{HDFS Commands Guide}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
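For reference, the command introduced by HDFS-9945 takes a datanode address; the exact argument form below is an assumption and should be checked against the committed documentation:
{code}
hdfs dfsadmin -evictWriters <datanode_host:ipc_port>
{code}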
[jira] [Commented] (HDFS-10216) distcp -diff relative path exception
[ https://issues.apache.org/jira/browse/HDFS-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241630#comment-15241630 ] Hudson commented on HDFS-10216: --- FAILURE: Integrated in Hadoop-trunk-Commit #9612 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9612/]) HDFS-10216. Distcp -diff throws exception when handling relative path. (jing9: rev 404f57f328b00a42ec8b952ad08cd7a80207c7f2) * hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java * hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java > distcp -diff relative path exception > > > Key: HDFS-10216 > URL: https://issues.apache.org/jira/browse/HDFS-10216 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Affects Versions: 2.8.0 >Reporter: John Zhuge >Assignee: Takashi Ohnishi > Fix For: 2.9.0 > > Attachments: HDFS-10216.1.patch, HDFS-10216.2.patch, > HDFS-10216.3.patch, HDFS-10216.4.patch > > > Got this exception when running {{distcp -diff}} with relative paths: > {code} > $ hadoop distcp -update -diff s1 s2 d1 d2 > 16/03/25 09:45:40 INFO tools.DistCp: Input Options: > DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, > ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', > copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[d1], > targetPath=d2, targetPathExists=true, preserveRawXattrs=false, > filtersFile='null'} > 16/03/25 09:45:40 INFO client.RMProxy: Connecting to ResourceManager at > jzhuge-balancer-1.vpc.cloudera.com/172.26.21.70:8032 > 16/03/25 09:45:41 ERROR tools.DistCp: Exception encountered > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at org.apache.hadoop.fs.Path.initialize(Path.java:206) > at org.apache.hadoop.fs.Path.<init>(Path.java:197) > at > org.apache.hadoop.tools.SimpleCopyListing.getPathWithSchemeAndAuthority(SimpleCopyListing.java:193) > at > org.apache.hadoop.tools.SimpleCopyListing.addToFileListing(SimpleCopyListing.java:202) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListingWithSnapshotDiff(SimpleCopyListing.java:243) > at > org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:172) > at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86) > at > org.apache.hadoop.tools.DistCp.createInputFileListingWithDiff(DistCp.java:388) > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:164) > at org.apache.hadoop.tools.DistCp.run(DistCp.java:123) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.tools.DistCp.main(DistCp.java:436) > Caused by: java.net.URISyntaxException: Relative path in absolute URI: > hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2 > at java.net.URI.checkPath(URI.java:1804) > at java.net.URI.<init>(URI.java:752) > at org.apache.hadoop.fs.Path.initialize(Path.java:203) > ... 11 more > {code} > But these commands worked: > * Absolute path: {{hadoop distcp -update -diff s1 s2 /user/systest/d1 > /user/systest/d2}} > * No {{-diff}}: {{hadoop distcp -update d1 d2}} > However, everything was fine when I ran {{hadoop distcp -update -diff s1 s2 > d1 d2}} again. I am not sure whether the problem only exists with the > {{-diff}} option. Trying to reproduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
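The stack trace above shows the broken URI {{hdfs://...:8020./d1/.snapshot/s2}}: the scheme and authority were prepended to a still-relative path. A sketch of the distinction (not the committed patch; path values are from the report):
{code}
// Editor's sketch: qualify a relative path against the filesystem's working
// directory before constructing URIs from it.
Path relative = new Path("d1/.snapshot/s2");
// Buggy: "hdfs://host:8020" + "./d1/.snapshot/s2" -> "Relative path in absolute URI"
// Correct: resolve first, e.g. hdfs://host:8020/user/systest/d1/.snapshot/s2
Path qualified = fs.makeQualified(relative);
{code}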
[jira] [Commented] (HDFS-10286) Fix TestDFSAdmin#testNameNodeGetReconfigurableProperties
[ https://issues.apache.org/jira/browse/HDFS-10286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241668#comment-15241668 ] Hudson commented on HDFS-10286: --- FAILURE: Integrated in Hadoop-trunk-Commit #9613 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9613/]) HDFS-10286. Fix TestDFSAdmin#testNameNodeGetReconfigurableProperties. (xyao: rev 809226752dd109e16956038017dece16ada6ee0f) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdmin.java > Fix TestDFSAdmin#testNameNodeGetReconfigurableProperties > > > Key: HDFS-10286 > URL: https://issues.apache.org/jira/browse/HDFS-10286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Xiaobing Zhou > Fix For: 2.9.0 > > Attachments: HDFS-10286.000.patch > > > HDFS-10209 introduced new reconfigurable properties, which requires an > update to the validation in > TestDFSAdmin#testNameNodeGetReconfigurableProperties. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10292) Add block id when client got Unable to close file exception
[ https://issues.apache.org/jira/browse/HDFS-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241778#comment-15241778 ] Hudson commented on HDFS-10292: --- FAILURE: Integrated in Hadoop-trunk-Commit #9615 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9615/]) HDFS-10292. Add block id when client got Unable to close file exception. (kihwal: rev 2c155afe2736a5571bbb3bdfb2fe6f9709227229) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java > Add block id when client got Unable to close file exception > --- > > Key: HDFS-10292 > URL: https://issues.apache.org/jira/browse/HDFS-10292 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.2 >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Minor > Fix For: 2.8.0 > > Attachments: HDFS-10292.patch > > > Add the block ID when the client gets an "Unable to close file" exception. > Having the block ID is useful for debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
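A sketch of the improved message in DFSOutputStream; the exact wording of the committed patch may differ, and {{last}} here stands for the last block of the file being closed:
{code}
// Editor's sketch: include the last block in the close failure so client logs
// can be correlated with datanode-side block IDs.
throw new IOException("Unable to close file because the last block "
    + last + " does not have enough number of replicas.");
{code}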
[jira] [Commented] (HDFS-10281) o.a.h.hdfs.server.namenode.ha.TestPendingCorruptDnMessages fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241905#comment-15241905 ] Hudson commented on HDFS-10281: --- FAILURE: Integrated in Hadoop-trunk-Commit #9616 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9616/]) HDFS-10281. TestPendingCorruptDnMessages fails intermittently. (kihwal: rev b9c9d03591a49be31f3fbc738d01a31700bfdbc4) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestPendingCorruptDnMessages.java > o.a.h.hdfs.server.namenode.ha.TestPendingCorruptDnMessages fails > intermittently > --- > > Key: HDFS-10281 > URL: https://issues.apache.org/jira/browse/HDFS-10281 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.8.0 > > Attachments: HDFS-10281.000.patch, HDFS-10281.001.patch > > > In our daily unit-test runs, we found that > {{TestPendingCorruptDnMessages#testChangedStorageId}} fails intermittently; > see the following information: > *Error Message* > expected:<1> but was:<0> > *Stacktrace* > {code} > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages.getRegisteredDatanodeUid(TestPendingCorruptDnMessages.java:124) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages.testChangedStorageId(TestPendingCorruptDnMessages.java:103) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
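A common remedy for this kind of one-shot assertion flakiness is to poll for the expected state instead; whether the committed patch does exactly this is not confirmed here, and {{registeredUidCount}} is a hypothetical stand-in for the test's own helper logic:
{code}
// Editor's sketch using org.apache.hadoop.test.GenericTestUtils.
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    return registeredUidCount(cluster, 1) == 1; // hypothetical helper
  }
}, 100, 60000); // poll every 100 ms, up to 60 s
{code}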
[jira] [Commented] (HDFS-10293) StripedFileTestUtil#readAll flaky
[ https://issues.apache.org/jira/browse/HDFS-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243401#comment-15243401 ] Hudson commented on HDFS-10293: --- FAILURE: Integrated in Hadoop-trunk-Commit #9619 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9619/]) HDFS-10293. StripedFileTestUtil#readAll flaky. Contributed by Mingliang (jing9: rev 55e19b7f0c1243090dff2d08ed785cefd420b009) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/StripedFileTestUtil.java > StripedFileTestUtil#readAll flaky > - > > Key: HDFS-10293 > URL: https://issues.apache.org/jira/browse/HDFS-10293 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, test >Affects Versions: 3.0.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 3.0.0 > > Attachments: HDFS-10293.000.patch > > > The flaky test helper method causes several unit tests to fail > intermittently. For example, the > {{TestDFSStripedOutputStreamWithFailure#testAddBlockWhenNoSufficientParityNumOfNodes}} > timed out in a recent run (see > [exception|https://builds.apache.org/job/PreCommit-HDFS-Build/15158/testReport/org.apache.hadoop.hdfs/TestDFSStripedOutputStreamWithFailure/testAddBlockWhenNoSufficientParityNumOfNodes/]), > which can easily be reproduced locally. > Debugging the code suggests that the helper method gets stuck in an infinite > loop. We need a fix to make the test robust. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
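A robust read-all loop must either make progress or exit on end-of-stream; a self-contained sketch of the technique (the committed fix may differ in detail):
{code}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadAllExample {
  // Editor's sketch: never spin forever waiting for bytes that will not come.
  static byte[] readAll(InputStream in, int expectedLength) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    int total = 0;
    while (total < expectedLength) {
      int n = in.read(buf);
      if (n < 0) {
        break; // EOF reached: exit instead of looping indefinitely
      }
      out.write(buf, 0, n);
      total += n;
    }
    return out.toByteArray();
  }
}
{code}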
[jira] [Commented] (HDFS-10283) o.a.h.hdfs.server.namenode.TestFSImageWithSnapshot#testSaveLoadImageWithAppending fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243399#comment-15243399 ] Hudson commented on HDFS-10283: --- FAILURE: Integrated in Hadoop-trunk-Commit #9619 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9619/]) HDFS-10283. (jing9: rev 89a838769ff5b6c64565e6949b14d7fed05daf54) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageWithSnapshot.java > o.a.h.hdfs.server.namenode.TestFSImageWithSnapshot#testSaveLoadImageWithAppending > fails intermittently > -- > > Key: HDFS-10283 > URL: https://issues.apache.org/jira/browse/HDFS-10283 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.9.0 > > Attachments: HDFS-10283.000.patch > > > The test fails with exception as following: > {code} > java.io.IOException: Failed to replace a bad datanode on the existing > pipeline due to no more good datanodes being available to try. (Nodes: > current=[DatanodeInfoWithStorage[127.0.0.1:47227,DS-dd109c14-79e5-4380-ac5e-4434cd7e25b5,DISK], > > DatanodeInfoWithStorage[127.0.0.1:56949,DS-6c0be75e-a78c-41b9-bfd0-7ee0cdefaa0e,DISK]], > > original=[DatanodeInfoWithStorage[127.0.0.1:47227,DS-dd109c14-79e5-4380-ac5e-4434cd7e25b5,DISK], > > DatanodeInfoWithStorage[127.0.0.1:56949,DS-6c0be75e-a78c-41b9-bfd0-7ee0cdefaa0e,DISK]]). > The current failed datanode replacement policy is DEFAULT, and a client may > configure this via > 'dfs.client.block.write.replace-datanode-on-failure.policy' in its > configuration. > at > org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1162) > at > org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1232) > at > org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1423) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1338) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1321) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:599) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
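The exception text in HDFS-10283 above names the relevant client setting itself. Tests that append on very small clusters often relax the datanode-replacement policy; whether the committed patch does exactly this is not confirmed here:
{code}
// Editor's sketch (the property name is quoted in the exception above):
conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");
{code}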
[jira] [Commented] (HDFS-9412) getBlocks occupies FSLock and takes too long to complete
[ https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245049#comment-15245049 ] Hudson commented on HDFS-9412: -- FAILURE: Integrated in Hadoop-trunk-Commit #9625 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9625/]) HDFS-9412. getBlocks occupies FSLock and takes too long to complete. (waltersu4549: rev 67523ffcf491f4f2db5335899c00a174d0caaa9b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestGetBlocks.java > getBlocks occupies FSLock and takes too long to complete > > > Key: HDFS-9412 > URL: https://issues.apache.org/jira/browse/HDFS-9412 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: He Tianyi >Assignee: He Tianyi > Fix For: 2.8.0 > > Attachments: HDFS-9412..patch, HDFS-9412.0001.patch, > HDFS-9412.0002.patch > > > {{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and then may take > a long time to complete (possibly several seconds if the number of blocks is > large). > During this period, other threads attempting to acquire the write lock will > wait. > In an extreme case, RPC handlers are occupied by one reader thread calling > {{getBlocks}} while all other threads wait for the write lock, and the RPC > server appears to hang. Unfortunately, this tends to happen in heavily loaded > clusters, since read operations come and go quickly (they do not need to > wait), leaving write operations waiting. > It looks like we can optimize this the way the DN block report was optimized > in the past: split the operation into smaller sub-operations and let other > threads do their work between each sub-operation. The whole result is still > returned at once, though (one difference from the DN block report). > I am not sure whether this will work. Any better idea? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
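A sketch of the chunking idea proposed in HDFS-9412 above: release and re-acquire the namesystem read lock between batches so queued writers can make progress. {{toBlockWithLocations}} is a hypothetical helper, the batch size is arbitrary, and this is not the committed patch:
{code}
// Editor's sketch of lock chunking inside getBlocks.
List<BlockWithLocations> results = new ArrayList<>();
int processed = 0;
namesystem.readLock();
try {
  for (BlockInfo block : candidateBlocks) {
    results.add(toBlockWithLocations(block)); // hypothetical helper
    if (++processed % 1000 == 0) {
      namesystem.readUnlock(); // yield: let waiting writers acquire the lock
      namesystem.readLock();
    }
  }
} finally {
  namesystem.readUnlock();
}
{code}
One subtlety of this design: because the lock is dropped mid-iteration, the block list may change underneath the loop, so the result is a consistent-enough snapshot rather than a point-in-time view, which is acceptable for the Balancer's use of getBlocks.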
[jira] [Commented] (HDFS-10302) BlockPlacementPolicyDefault should use default replication considerload value
[ https://issues.apache.org/jira/browse/HDFS-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245623#comment-15245623 ] Hudson commented on HDFS-10302: --- FAILURE: Integrated in Hadoop-trunk-Commit #9626 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9626/]) HDFS-10302. BlockPlacementPolicyDefault should use default replication (kihwal: rev d8b729e16fb253e6c84f414d419b5663d9219a43) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java > BlockPlacementPolicyDefault should use default replication considerload value > - > > Key: HDFS-10302 > URL: https://issues.apache.org/jira/browse/HDFS-10302 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun >Priority: Trivial > Fix For: 2.8.0 > > Attachments: HDFS-10302.001.patch > > > Currently the method {{BlockPlacementPolicyDefault#initialize}} uses the > literal {{true}} as the replication considerLoad default value rather than > the existing constant > {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}}. > {code} > @Override > public void initialize(Configuration conf, FSClusterStats stats, > NetworkTopology clusterMap, > Host2NodesMap host2datanodeMap) { > this.considerLoad = conf.getBoolean( > DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, true); > this.considerLoadFactor = conf.getDouble( > DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR, > DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR_DEFAULT); > this.stats = stats; > this.clusterMap = clusterMap; > this.host2datanodeMap = host2datanodeMap; > this.heartbeatInterval = conf.getLong( > DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, > DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT) * 1000; > this.tolerateHeartbeatMultiplier = conf.getInt( > DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_KEY, > DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_DEFAULT); > this.staleInterval = conf.getLong( > DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_KEY, > DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT); > this.preferLocalNode = conf.getBoolean( > DFSConfigKeys. > DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_KEY, > DFSConfigKeys. > > DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_DEFAULT); > } > {code} > As a result, the value {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}} is > not used anywhere. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
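The one-line fix implied by the description above (the committed patch may differ in whitespace only):
{code}
this.considerLoad = conf.getBoolean(
    DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY,
    DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT);
{code}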
[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245624#comment-15245624 ] Hudson commented on HDFS-10275: --- FAILURE: Integrated in Hadoop-trunk-Commit #9626 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9626/]) HDFS-10275. TestDataNodeMetrics failing intermittently due to (waltersu4549: rev ab903029a9d353677184ff5602966b11ffb408b9) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java > TestDataNodeMetrics failing intermittently due to TotalWriteTime counted > incorrectly > > > Key: HDFS-10275 > URL: https://issues.apache.org/jira/browse/HDFS-10275 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Fix For: 2.7.3 > > Attachments: HDFS-10275.001.patch > > > The unit test {{TestDataNodeMetrics}} fails intermittently. The failure > output shows: > {code} > Results : > Failed tests: > > TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232 > expected: but was: > Tests in error: > TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for > Min... > TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting > for ... > TestHFlush.testHFlushInterrupted ? IO The stream is closed > {code} > The timeout occurs at line 279 of {{TestDataNodeMetrics}}. Looking into the > code, the real reason is that the {{TotalWriteTime}} metric frequently counts > 0 in each file-creation iteration, and this leads to retries until the > timeout. > I debugged the test locally and found that the most likely reason the > {{TotalWriteTime}} metric always counts 0 is that the test uses > {{SimulatedFSDataset}} for a timing test. In {{SimulatedFSDataset}}, the > inner class method {{SimulatedOutputStream#write}} is used to account for the > write time, and this method just updates the {{length}} and throws its data > away: > {code} > @Override > public void write(byte[] b, > int off, > int len) throws IOException { > length += len; > } > {code} > So the write operation costs almost no time, and the file should be created > in a real way instead of the simulated way. I verified locally that the test > passes on the first attempt once the simulated dataset is removed, whereas it > retries many times to accumulate write time the old way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
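Since {{SimulatedOutputStream#write}} quoted above only bumps a counter, the measured write time rounds to 0 ms. A sketch of the alternative the description argues for, a mini cluster backed by the real dataset (builder options are illustrative, not the committed patch):
{code}
// Editor's sketch: a real write path yields a non-zero TotalWriteTime.
MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .numDataNodes(1)
    .build(); // no SimulatedFSDataset injected, so writes hit real disk
{code}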