[jira] [Updated] (HDFS-16432) Namenode block report add yield to avoid holding write lock too long
[ https://issues.apache.org/jira/browse/HDFS-16432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16432: -- Labels: pull-request-available (was: ) > Namenode block report add yield to avoid holding write lock too long > > > Key: HDFS-16432 > URL: https://issues.apache.org/jira/browse/HDFS-16432 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: qinyuren >Priority: Major > Labels: pull-request-available > Attachments: image-2022-01-20-15-19-28-384.png > > Time Spent: 10m > Remaining Estimate: 0h > > !image-2022-01-20-15-19-28-384.png|width=683,height=132! > In our cluster, the namenode block report will hold the write lock for a long time if > the storage block number is more than 10. So we want to add a yield > mechanism to the block reporting process to avoid holding the write lock for too long. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
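The yield mechanism the report proposes can be sketched roughly as follows. This is a hypothetical illustration in plain Java, not the actual HDFS patch: a ReentrantReadWriteLock stands in for the namesystem lock, and BATCH_SIZE and all class/method names are assumptions. The idea is simply to process the report in batches, dropping and reacquiring the write lock between batches so other operations can interleave.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch (not the HDFS code): batch the block report and
// yield the write lock between batches instead of holding it throughout.
public class YieldingBlockReport {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private static final int BATCH_SIZE = 1000; // yield after this many blocks (assumed value)

    // Returns how many times the write lock was acquired (1 + number of yields).
    public int processReport(List<Long> blockIds) {
        int acquisitions = 0;
        int i = 0;
        while (i < blockIds.size()) {
            lock.writeLock().lock();
            acquisitions++;
            try {
                int end = Math.min(i + BATCH_SIZE, blockIds.size());
                for (; i < end; i++) {
                    processBlock(blockIds.get(i));
                }
            } finally {
                // Yield point: other readers/writers can acquire the lock here.
                lock.writeLock().unlock();
            }
        }
        return acquisitions;
    }

    private void processBlock(long blockId) {
        // placeholder for per-block bookkeeping
    }

    public static void main(String[] args) {
        List<Long> ids = IntStream.range(0, 2500)
            .mapToObj(Long::valueOf).collect(Collectors.toList());
        // 2500 blocks at 1000 per batch -> the lock is taken 3 times.
        System.out.println(new YieldingBlockReport().processReport(ids));
    }
}
```

The trade-off, of course, is that the namesystem state may change between batches, so a real implementation has to tolerate a report being applied across several lock acquisitions.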
[jira] [Work logged] (HDFS-16432) Namenode block report add yield to avoid holding write lock too long
[ https://issues.apache.org/jira/browse/HDFS-16432?focusedWorklogId=711904=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711904 ] ASF GitHub Bot logged work on HDFS-16432: - Author: ASF GitHub Bot Created on: 20/Jan/22 07:32 Start Date: 20/Jan/22 07:32 Worklog Time Spent: 10m Work Description: liubingxing opened a new pull request #3907: URL: https://github.com/apache/hadoop/pull/3907 JIRA: [HDFS-16432](https://issues.apache.org/jira/browse/HDFS-16432) ![image](https://user-images.githubusercontent.com/2844826/150293279-07d7bbf0-1471-464f-af81-7d5c23aeadcd.png) In our cluster, the namenode block report will hold the write lock for a long time if the storage block number is more than 10. So we want to add a yield mechanism to the block reporting process to avoid holding the write lock for too long. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 711904) Remaining Estimate: 0h Time Spent: 10m > Namenode block report add yield to avoid holding write lock too long > > > Key: HDFS-16432 > URL: https://issues.apache.org/jira/browse/HDFS-16432 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: qinyuren >Priority: Major > Attachments: image-2022-01-20-15-19-28-384.png > > Time Spent: 10m > Remaining Estimate: 0h > > !image-2022-01-20-15-19-28-384.png|width=683,height=132! > In our cluster, the namenode block report will hold the write lock for a long time if > the storage block number is more than 10. So we want to add a yield > mechanism to the block reporting process to avoid holding the write lock for too long.
[jira] [Updated] (HDFS-16432) Namenode block report add yield to avoid holding write lock too long
[ https://issues.apache.org/jira/browse/HDFS-16432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] qinyuren updated HDFS-16432: Description: !image-2022-01-20-15-19-28-384.png|width=683,height=132! In our cluster, the namenode block report will hold the write lock for a long time if the storage block number is more than 10. So we want to add a yield mechanism to the block reporting process to avoid holding the write lock for too long. was: !image-2022-01-20-15-19-28-384.png|width=652,height=126! In our cluster, the namenode block report will hold the write lock for a long time if the storage block number is more than 10. So we want to add a yield mechanism to the block reporting process to avoid holding the write lock for too long. > Namenode block report add yield to avoid holding write lock too long > > > Key: HDFS-16432 > URL: https://issues.apache.org/jira/browse/HDFS-16432 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: qinyuren >Priority: Major > Attachments: image-2022-01-20-15-19-28-384.png > > > !image-2022-01-20-15-19-28-384.png|width=683,height=132! > In our cluster, the namenode block report will hold the write lock for a long time if > the storage block number is more than 10. So we want to add a yield > mechanism to the block reporting process to avoid holding the write lock for too long.
[jira] [Updated] (HDFS-16432) Namenode block report add yield to avoid holding write lock too long
[ https://issues.apache.org/jira/browse/HDFS-16432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] qinyuren updated HDFS-16432: Description: !image-2022-01-20-15-19-28-384.png|width=652,height=126! In our cluster, the namenode block report will hold the write lock for a long time if the storage block number is more than 10. So we want to add a yield mechanism to the block reporting process to avoid holding the write lock for too long. was:!image-2022-01-20-15-19-28-384.png|width=652,height=126! > Namenode block report add yield to avoid holding write lock too long > > > Key: HDFS-16432 > URL: https://issues.apache.org/jira/browse/HDFS-16432 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: qinyuren >Priority: Major > Attachments: image-2022-01-20-15-19-28-384.png > > > !image-2022-01-20-15-19-28-384.png|width=652,height=126! > In our cluster, the namenode block report will hold the write lock for a long time if > the storage block number is more than 10. So we want to add a yield > mechanism to the block reporting process to avoid holding the write lock for too long.
[jira] [Created] (HDFS-16432) Namenode block report add yield to avoid holding write lock too long
qinyuren created HDFS-16432: --- Summary: Namenode block report add yield to avoid holding write lock too long Key: HDFS-16432 URL: https://issues.apache.org/jira/browse/HDFS-16432 Project: Hadoop HDFS Issue Type: Improvement Reporter: qinyuren Attachments: image-2022-01-20-15-19-28-384.png !image-2022-01-20-15-19-28-384.png|width=652,height=126!
[jira] [Commented] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
[ https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17479090#comment-17479090 ] Yuanbo Liu commented on HDFS-15180: --- [~sodonnell] Thanks for your comments. There's some background that needs to be clarified. Storage machines keep getting bigger: we've seen 12TB x 36 disks (which means 432TB in a single datanode) in a production environment. The global lock becomes a key bottleneck for IO performance, so we'd be glad to see this Jira make further progress, be discussed, or even be merged. > DataNode FsDatasetImpl Fine-Grained Locking via BlockPool. > --- > > Key: HDFS-15180 > URL: https://issues.apache.org/jira/browse/HDFS-15180 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 3.2.0 >Reporter: Qi Zhu >Assignee: Mingxiang Li >Priority: Major > Labels: pull-request-available > Attachments: HDFS-15180.001.patch, HDFS-15180.002.patch, > HDFS-15180.003.patch, HDFS-15180.004.patch, > image-2020-03-10-17-22-57-391.png, image-2020-03-10-17-31-58-830.png, > image-2020-03-10-17-34-26-368.png, image-2020-04-09-11-20-36-459.png > > Time Spent: 40m > Remaining Estimate: 0h > > Now the FsDatasetImpl datasetLock is heavy when there are many namespaces in a > big cluster. We could split the FsDatasetImpl datasetLock by blockpool.
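The split the issue proposes, one lock per block pool instead of one global dataset lock, can be sketched in isolation. This is an illustrative sketch only; the class and method names are hypothetical and do not reflect the actual FsDatasetImpl patch:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch: replace a single global dataset lock with one lock
// per block pool ID, so operations on different block pools don't contend.
public class PerBlockPoolLocks {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Lazily create exactly one lock per block pool ID.
    ReentrantLock lockFor(String bpid) {
        return locks.computeIfAbsent(bpid, k -> new ReentrantLock());
    }

    // Run an operation while holding only that block pool's lock.
    void withBlockPoolLock(String bpid, Runnable op) {
        ReentrantLock l = lockFor(bpid);
        l.lock();
        try {
            op.run();
        } finally {
            l.unlock();
        }
    }

    public static void main(String[] args) {
        PerBlockPoolLocks locks = new PerBlockPoolLocks();
        // Same pool -> same lock; different pools -> independent locks.
        System.out.println(locks.lockFor("BP-1") == locks.lockFor("BP-1"));
        System.out.println(locks.lockFor("BP-1") == locks.lockFor("BP-2"));
    }
}
```

With this shape, a heartbeat touching block pool BP-1 no longer blocks a block report for BP-2, which is the contention the comment above describes on very large datanodes.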
[jira] [Work logged] (HDFS-16430) Validate maximum blocks in EC group when adding an EC policy
[ https://issues.apache.org/jira/browse/HDFS-16430?focusedWorklogId=711858=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711858 ] ASF GitHub Bot logged work on HDFS-16430: - Author: ASF GitHub Bot Created on: 20/Jan/22 03:56 Start Date: 20/Jan/22 03:56 Worklog Time Spent: 10m Work Description: cndaimin commented on pull request #3899: URL: https://github.com/apache/hadoop/pull/3899#issuecomment-1017095749 Thanks for your review! @ayushtkn Issue Time Tracking --- Worklog Id: (was: 711858) Time Spent: 0.5h (was: 20m) > Validate maximum blocks in EC group when adding an EC policy > > > Key: HDFS-16430 > URL: https://issues.apache.org/jira/browse/HDFS-16430 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ec, erasure-coding >Affects Versions: 3.3.0, 3.3.1 >Reporter: daimin >Assignee: daimin >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > HDFS EC uses the last 4 bits of the block ID to store the block index within the EC > block group. Therefore the maximum number of blocks in an EC block group is 2^4 = 16, which > is defined in HdfsServerConstants#MAX_BLOCKS_IN_GROUP. > Currently there is no limitation or warning when adding a bad EC policy with > numDataUnits + numParityUnits > 16. It only results in read/write errors on EC > files using the bad EC policy, which is not very straightforward for users.
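The check the issue asks for is small enough to sketch directly. The snippet below is illustrative, not the actual Hadoop API (the class and method names are assumptions); only the constant 2^4 = 16 comes from the issue description:

```java
// Sketch of the validation: with 4 bits of the block ID reserved for the
// block index, an EC block group can hold at most 2^4 = 16 blocks, so any
// policy with numDataUnits + numParityUnits > 16 should be rejected up front.
public class EcPolicyCheck {
    static final int BLOCK_INDEX_BITS = 4;
    static final int MAX_BLOCKS_IN_GROUP = 1 << BLOCK_INDEX_BITS; // 16

    static boolean isValidPolicy(int numDataUnits, int numParityUnits) {
        return numDataUnits > 0 && numParityUnits > 0
            && numDataUnits + numParityUnits <= MAX_BLOCKS_IN_GROUP;
    }

    public static void main(String[] args) {
        System.out.println(isValidPolicy(6, 3));   // RS-6-3: 9 <= 16
        System.out.println(isValidPolicy(10, 4));  // 14 <= 16
        System.out.println(isValidPolicy(14, 4));  // 18 > 16: rejected
    }
}
```

Failing fast at policy-creation time turns a confusing later read/write error into an immediate, explainable rejection.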
[jira] [Updated] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanbo Liu updated HDFS-14768: -- Attachment: HDFS-14768-branch-3.1.patch > EC : Busy DN replica should be consider in live replica check. > -- > > Key: HDFS-14768 > URL: https://issues.apache.org/jira/browse/HDFS-14768 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding, hdfs, namenode >Affects Versions: 3.0.2 >Reporter: guojh >Assignee: guojh >Priority: Major > Labels: patch > Fix For: 3.3.0 > > Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, > HDFS-14768-branch-3.1.patch, HDFS-14768-branch-3.2.patch, > HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch, > HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch, > HDFS-14768.006.patch, HDFS-14768.007.patch, HDFS-14768.008.patch, > HDFS-14768.009.patch, HDFS-14768.010.patch, HDFS-14768.011.patch, > HDFS-14768.jpg, guojh_UT_after_deomission.txt, > guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, > zhaoyiming_UT_beofre_deomission.txt > > > Policy is RS-6-3-1024K, version is Hadoop 3.0.2. > Suppose a file's block indices are [0,1,2,3,4,5,6,7,8], and we decommission > indices [3,4], then increase the index-6 datanode's > pendingReplicationWithoutTargets so that it is larger than > replicationStreamsHardLimit (we set 14). Then, after the method > chooseSourceDatanodes of BlockManager, the liveBlockIndices is > [0,1,2,3,4,5,7,8], and the block counter is Live: 7, Decommission: 2. > In the method scheduleReconstruction of BlockManager, the additionalReplRequired > is 9 - 7 = 2. After the Namenode chooses two target datanodes, it will assign an > erasure coding task to each target datanode. > When the datanode gets the task, it will build targetIndices from liveBlockIndices > and the target length. The code is below.
> {code:java}
> targetIndices = new short[targets.length];
>
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0] = 6, and targetIndices[1] stays 0 from its initial value.
> The StripedReader always creates readers from the first 6 index blocks, i.e.
> [0,1,2,3,4,5].
> Using the indices [0,1,2,3,4,5] to build target indices [6,0] will trigger the ISA-L
> bug: block index 6's data is corrupted (all data is zero).
> I wrote a unit test that can stably reproduce it.
> {code:java}
> private int replicationStreamsHardLimit =
>     DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT;
> numDNs = dataBlocks + parityBlocks + 10;
>
> @Test(timeout = 24)
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   final INodeFile fileNode = cluster.getNamesystem().getFSDirectory()
>       .getINode4Write(ecFile.toString()).asFile();
>   LocatedBlocks locatedBlocks =
>       StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>       .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>       (LocatedStripedBlock) locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   DatanodeDescriptor datanodeDescriptor = cluster.getNameNode().getNamesystem()
>       .getBlockManager().getDatanodeManager()
>       .getDatanode(storageInfos[6].getDatanodeUuid());
>   BlockInfo firstBlock = fileNode.getBlocks()[0];
>   DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock);
>   // the first heartbeat will consume 3 replica tasks
>   for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) {
>     BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor,
>         new Block(i), new DatanodeStorageInfo[]{dStorageInfos[0]});
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the nodes which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
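The indexing pitfall described above, unfilled targetIndices slots keeping their default value 0, which is itself a valid block index, can be demonstrated in isolation. The sketch below is illustrative plain Java (buildTargetIndices and its parameters are hypothetical names, not the HDFS code); trimming the array to the number of filled entries avoids the bogus trailing 0:

```java
import java.util.Arrays;
import java.util.BitSet;

// Illustrative demo of the bug: an over-allocated target-index array leaves
// default 0s in unfilled slots, and 0 is a legal block index, so the wrong
// block silently gets reconstructed. Copying down to m filled entries fixes it.
public class TargetIndicesDemo {
    static short[] buildTargetIndices(BitSet liveIndices, int totalBlocks, int maxTargets) {
        short[] tmp = new short[maxTargets]; // over-allocated, defaults to 0
        int m = 0;
        for (int i = 0; i < totalBlocks; i++) {
            if (!liveIndices.get(i) && m < maxTargets) {
                tmp[m++] = (short) i;
            }
        }
        // Keep only the m filled entries; returning tmp directly would
        // expose the spurious 0 in slot 1 from the scenario below.
        return Arrays.copyOf(tmp, m);
    }

    public static void main(String[] args) {
        // Live indices [0,1,2,3,4,5,7,8] out of 9: only index 6 is missing.
        BitSet live = new BitSet();
        for (int i : new int[]{0, 1, 2, 3, 4, 5, 7, 8}) {
            live.set(i);
        }
        short[] targets = buildTargetIndices(live, 9, 2);
        System.out.println(Arrays.toString(targets)); // [6], not [6, 0]
    }
}
```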
[jira] [Commented] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17479070#comment-17479070 ] Yuanbo Liu commented on HDFS-14768: --- Attached patches for branch-3.2 and branch-3.1. Once they're merged, I'll attach a new patch for HDFS-15186 for branch-3.2 and branch-3.1 as well. > EC : Busy DN replica should be consider in live replica check. > -- > > Key: HDFS-14768 > URL: https://issues.apache.org/jira/browse/HDFS-14768 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding, hdfs, namenode >Affects Versions: 3.0.2 >Reporter: guojh >Assignee: guojh >Priority: Major > Labels: patch > Fix For: 3.3.0
[jira] [Updated] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanbo Liu updated HDFS-14768: -- Attachment: HDFS-14768-branch-3.2.patch > EC : Busy DN replica should be consider in live replica check. > -- > > Key: HDFS-14768 > URL: https://issues.apache.org/jira/browse/HDFS-14768 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding, hdfs, namenode >Affects Versions: 3.0.2 >Reporter: guojh >Assignee: guojh >Priority: Major > Labels: patch > Fix For: 3.3.0
[jira] [Updated] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanbo Liu updated HDFS-14768: -- Attachment: (was: HDFS-14768-branch-3.2.patch) > EC : Busy DN replica should be consider in live replica check. > -- > > Key: HDFS-14768 > URL: https://issues.apache.org/jira/browse/HDFS-14768 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding, hdfs, namenode >Affects Versions: 3.0.2 >Reporter: guojh >Assignee: guojh >Priority: Major > Labels: patch > Fix For: 3.3.0
[jira] [Updated] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanbo Liu updated HDFS-14768: -- Attachment: HDFS-14768-branch-3.2.patch > EC : Busy DN replica should be consider in live replica check. > -- > > Key: HDFS-14768 > URL: https://issues.apache.org/jira/browse/HDFS-14768 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding, hdfs, namenode >Affects Versions: 3.0.2 >Reporter: guojh >Assignee: guojh >Priority: Major > Labels: patch > Fix For: 3.3.0 > > Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, > HDFS-14768-branch-3.2.patch, HDFS-14768.000.patch, HDFS-14768.001.patch, > HDFS-14768.002.patch, HDFS-14768.003.patch, HDFS-14768.004.patch, > HDFS-14768.005.patch, HDFS-14768.006.patch, HDFS-14768.007.patch, > HDFS-14768.008.patch, HDFS-14768.009.patch, HDFS-14768.010.patch, > HDFS-14768.011.patch, HDFS-14768.jpg, guojh_UT_after_deomission.txt, > guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, > zhaoyiming_UT_beofre_deomission.txt > > > Policy is RS-6-3-1024K, version is hadoop 3.0.2; > We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission > index[3,4], increase the index 6 datanode's > pendingReplicationWithoutTargets that make it large than > replicationStreamsHardLimit(we set 14). Then, After the method > chooseSourceDatanodes of BlockMananger, the liveBlockIndices is > [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. > In method scheduleReconstruction of BlockManager, the additionalReplRequired > is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a > erasureCode task to target datanode. > When datanode get the task will build targetIndices from liveBlockIndices > and target length. the code is blow. 
> {code:java}
> targetIndices = new short[targets.length];
>
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0] = 6, and targetIndices[1] is always 0 from its initial value.
> The StripedReader always creates readers from the first 6 live index blocks, which are [0,1,2,3,4,5].
> Using source indices [0,1,2,3,4,5] to build target indices [6,0] will trigger the ISA-L bug: block index 6's data is corrupted (all data is zero).
> I wrote a unit test that reproduces this stably.
> {code:java}
> private int replicationStreamsHardLimit =
>     DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT;
> numDNs = dataBlocks + parityBlocks + 10;
>
> @Test(timeout = 24)
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   final INodeFile fileNode = cluster.getNamesystem().getFSDirectory()
>       .getINode4Write(ecFile.toString()).asFile();
>   LocatedBlocks locatedBlocks =
>       StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>       .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>       (LocatedStripedBlock) locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   DatanodeDescriptor datanodeDescriptor = cluster.getNameNode().getNamesystem()
>       .getBlockManager().getDatanodeManager()
>       .getDatanode(storageInfos[6].getDatanodeUuid());
>   BlockInfo firstBlock = fileNode.getBlocks()[0];
>   DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock);
>   // the first heartbeat will consume 3 replica tasks
>   for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) {
>     BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor,
>         new Block(i), new DatanodeStorageInfo[]{dStorageInfos[0]});
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the nodes which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes,
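The targetIndices problem described above can be reproduced with a minimal, self-contained sketch (the constants and method below are illustrative stand-ins, not the actual StripedBlockReconstructor API): with live indices [0,1,2,3,4,5,7,8] and two requested targets, only one missing index (6) exists, so the second target slot silently keeps its initial value 0.

```java
import java.util.BitSet;

public class TargetIndicesDemo {
    // Illustrative stand-ins for the reconstructor's schema (assumed names).
    static final int DATA_BLK_NUM = 6;
    static final int PARITY_BLK_NUM = 3;

    /** Mirrors the quoted loop: fill targetIndices with the missing block indices. */
    static short[] initTargetIndices(BitSet liveBitSet, int targetCount) {
        short[] targetIndices = new short[targetCount];
        int m = 0;
        for (int i = 0; i < DATA_BLK_NUM + PARITY_BLK_NUM; i++) {
            if (!liveBitSet.get(i) && m < targetIndices.length) {
                targetIndices[m++] = (short) i;
            }
        }
        // If m < targetCount, the remaining slots silently stay 0 -- the bug.
        return targetIndices;
    }

    public static void main(String[] args) {
        BitSet live = new BitSet();
        for (int i : new int[]{0, 1, 2, 3, 4, 5, 7, 8}) {
            live.set(i);  // busy/decommissioning replicas were (wrongly) counted live
        }
        short[] targets = initTargetIndices(live, 2);  // NN asked for 2 targets
        System.out.println(targets[0] + "," + targets[1]);  // prints "6,0"
    }
}
```

The sketch only demonstrates the stale-zero slot; the actual fix in the patch additionally has to reconcile how busy replicas are counted when choosing source datanodes.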
[jira] [Work logged] (HDFS-16398) Reconfig block report parameters for datanode
[ https://issues.apache.org/jira/browse/HDFS-16398?focusedWorklogId=711813=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711813 ] ASF GitHub Bot logged work on HDFS-16398: - Author: ASF GitHub Bot Created on: 20/Jan/22 01:29 Start Date: 20/Jan/22 01:29 Worklog Time Spent: 10m Work Description: tomscut commented on pull request #3831: URL: https://github.com/apache/hadoop/pull/3831#issuecomment-1017027989 Hi @tasanuma , I have solved the conflict. If you are free, please help to review this PR. Thanks a lot. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 711813) Time Spent: 1h (was: 50m) > Reconfig block report parameters for datanode > - > > Key: HDFS-16398 > URL: https://issues.apache.org/jira/browse/HDFS-16398 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16427) Add debug log for BlockManager#chooseExcessRedundancyStriped
[ https://issues.apache.org/jira/browse/HDFS-16427?focusedWorklogId=711801=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711801 ] ASF GitHub Bot logged work on HDFS-16427: - Author: ASF GitHub Bot Created on: 20/Jan/22 01:11 Start Date: 20/Jan/22 01:11 Worklog Time Spent: 10m Work Description: tomscut commented on pull request #3888: URL: https://github.com/apache/hadoop/pull/3888#issuecomment-1017019075 Hi @tasanuma @ayushtkn , could you please take a look? Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 711801) Time Spent: 0.5h (was: 20m) > Add debug log for BlockManager#chooseExcessRedundancyStriped > > > Key: HDFS-16427 > URL: https://issues.apache.org/jira/browse/HDFS-16427 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: tomscut >Assignee: tomscut >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > To troubleshoot issues like HDFS-16420, we added some debug logs, which were > also necessary. If a problem occurs, we can set the log level to DEBUG, which > makes it convenient to analyze. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16427) Add debug log for BlockManager#chooseExcessRedundancyStriped
[ https://issues.apache.org/jira/browse/HDFS-16427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tomscut updated HDFS-16427: --- Description: To troubleshoot issues like HDFS-16420, we added some debug logs, which were also necessary. If a problem occurs, we can set the log level to DEBUG, which makes it convenient to analyze. (was: To solve this issue[HDFS-16420|https://issues.apache.org/jira/browse/HDFS-16420] , we added some debug logs, which were also necessary.) > Add debug log for BlockManager#chooseExcessRedundancyStriped > > > Key: HDFS-16427 > URL: https://issues.apache.org/jira/browse/HDFS-16427 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: tomscut >Assignee: tomscut >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > To troubleshoot issues like HDFS-16420, we added some debug logs, which were > also necessary. If a problem occurs, we can set the log level to DEBUG, which > makes it convenient to analyze. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16398) Reconfig block report parameters for datanode
[ https://issues.apache.org/jira/browse/HDFS-16398?focusedWorklogId=711746=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711746 ] ASF GitHub Bot logged work on HDFS-16398: - Author: ASF GitHub Bot Created on: 19/Jan/22 23:18 Start Date: 19/Jan/22 23:18 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3831: URL: https://github.com/apache/hadoop/pull/3831#issuecomment-1016959135 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 1m 12s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 35m 10s | | trunk passed | | +1 :green_heart: | compile | 1m 27s | | trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | compile | 1m 21s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 0m 59s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 28s | | trunk passed | | +1 :green_heart: | javadoc | 1m 3s | | trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 1m 31s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 23s | | trunk passed | | +1 :green_heart: | shadedclient | 25m 32s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 22s | | the patch passed | | +1 :green_heart: | compile | 1m 23s | | the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javac | 1m 23s | | the patch passed | | +1 :green_heart: | compile | 1m 13s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 1m 13s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 53s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3831/4/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 110 unchanged - 2 fixed = 112 total (was 112) | | +1 :green_heart: | mvnsite | 1m 20s | | the patch passed | | +1 :green_heart: | javadoc | 0m 58s | | the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 1m 26s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 26s | | the patch passed | | +1 :green_heart: | shadedclient | 25m 16s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 368m 38s | | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 39s | | The patch does not generate ASF License warnings. 
| | | | 477m 13s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3831/4/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3831 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 48c7e19233c8 4.15.0-163-generic #171-Ubuntu SMP Fri Nov 5 11:55:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 6bd9edc48583506a286bd453f71dd64862ca5961 | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3831/4/testReport/ | | Max. process+thread count | 2009 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U:
[jira] [Work logged] (HDFS-16428) Source path setted storagePolicy will cause wrong typeConsumed in rename operation
[ https://issues.apache.org/jira/browse/HDFS-16428?focusedWorklogId=711643=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711643 ] ASF GitHub Bot logged work on HDFS-16428: - Author: ASF GitHub Bot Created on: 19/Jan/22 20:43 Start Date: 19/Jan/22 20:43 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3898: URL: https://github.com/apache/hadoop/pull/3898#issuecomment-1016852033 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 54s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 31m 20s | | trunk passed | | +1 :green_heart: | compile | 1m 27s | | trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | compile | 1m 21s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 0s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 27s | | trunk passed | | +1 :green_heart: | javadoc | 1m 3s | | trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 1m 33s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 17s | | trunk passed | | +1 :green_heart: | shadedclient | 22m 46s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 21s | | the patch passed | | +1 :green_heart: | compile | 1m 19s | | the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javac | 1m 19s | | the patch passed | | +1 :green_heart: | compile | 1m 11s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 1m 11s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 49s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 21s | | the patch passed | | +1 :green_heart: | javadoc | 0m 50s | | the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 1m 28s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 17s | | the patch passed | | +1 :green_heart: | shadedclient | 23m 59s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 406m 39s | | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 46s | | The patch does not generate ASF License warnings. 
| | | | 506m 42s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3898/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3898 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 1ad4e8770126 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 8e5315ca5b46938552dc10ee5cf4771c2613a7a0 | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3898/3/testReport/ | | Max. process+thread count | 2658 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3898/3/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org | This
[jira] [Work logged] (HDFS-16429) Add DataSetLockManager to maintain locks for FsDataSetImpl
[ https://issues.apache.org/jira/browse/HDFS-16429?focusedWorklogId=711454=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711454 ] ASF GitHub Bot logged work on HDFS-16429: - Author: ASF GitHub Bot Created on: 19/Jan/22 15:56 Start Date: 19/Jan/22 15:56 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3900: URL: https://github.com/apache/hadoop/pull/3900#issuecomment-1016609231 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 41s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 12m 5s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 21m 42s | | trunk passed | | +1 :green_heart: | compile | 22m 17s | | trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | compile | 19m 29s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 3m 41s | | trunk passed | | +1 :green_heart: | mvnsite | 3m 21s | | trunk passed | | +1 :green_heart: | javadoc | 2m 27s | | trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 3m 29s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 5m 48s | | trunk passed | | +1 :green_heart: | shadedclient | 23m 9s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 27s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 14s | | the patch passed | | +1 :green_heart: | compile | 21m 33s | | the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javac | 21m 33s | | the patch passed | | +1 :green_heart: | compile | 19m 21s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 19m 21s | | the patch passed | | +1 :green_heart: | blanks | 0m 1s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 3m 31s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3900/2/artifact/out/results-checkstyle-root.txt) | root: The patch generated 1 new + 201 unchanged - 0 fixed = 202 total (was 201) | | +1 :green_heart: | mvnsite | 3m 21s | | the patch passed | | +1 :green_heart: | xml | 0m 2s | | The patch has no ill-formed XML file. | | +1 :green_heart: | javadoc | 2m 24s | | the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 3m 28s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 6m 12s | | the patch passed | | +1 :green_heart: | shadedclient | 23m 15s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 17m 39s | [/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3900/2/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt) | hadoop-common in the patch passed. | | +1 :green_heart: | unit | 231m 58s | | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 1m 5s | | The patch does not generate ASF License warnings. 
| | | | 454m 1s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.ipc.TestIPC | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3900/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3900 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml | | uname | Linux 9b1217eef5cf 4.15.0-161-generic #169-Ubuntu SMP Fri Oct 15 13:41:54 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision |
[jira] [Work logged] (HDFS-16401) Remove the worthless DatasetVolumeChecker#numAsyncDatasetChecks
[ https://issues.apache.org/jira/browse/HDFS-16401?focusedWorklogId=711359=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711359 ] ASF GitHub Bot logged work on HDFS-16401: - Author: ASF GitHub Bot Created on: 19/Jan/22 12:30 Start Date: 19/Jan/22 12:30 Worklog Time Spent: 10m Work Description: jianghuazhu commented on pull request #3838: URL: https://github.com/apache/hadoop/pull/3838#issuecomment-1016421126 Hi @ferhui , can this PR be merged into the trunk branch or another main branch? If not for now, I will continue to work hard. Thank you very much. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 711359) Time Spent: 1h 20m (was: 1h 10m) > Remove the worthless DatasetVolumeChecker#numAsyncDatasetChecks > --- > > Key: HDFS-16401 > URL: https://issues.apache.org/jira/browse/HDFS-16401 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.4.0 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > As early as HDFS-11279, DataNode#checkDiskErrorAsync() was cleaned up, but it > seems that cleaning up DatasetVolumeChecker#numAsyncDatasetChecks together was > neglected. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16428) Source path setted storagePolicy will cause wrong typeConsumed in rename operation
[ https://issues.apache.org/jira/browse/HDFS-16428?focusedWorklogId=711295=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711295 ] ASF GitHub Bot logged work on HDFS-16428: - Author: ASF GitHub Bot Created on: 19/Jan/22 11:09 Start Date: 19/Jan/22 11:09 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #3898: URL: https://github.com/apache/hadoop/pull/3898#discussion_r787638952 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java ## @@ -958,6 +959,33 @@ public void testQuotaByStorageType() throws Exception { 6 * fileSpace); } + @Test + public void testRenameInodeWithStorageType() throws IOException { +final int SIZE = 64; +final short REPL = 1; +final Path foo = new Path("/foo"); +final Path bs1 = new Path(foo, "bs1"); +final Path wow = new Path(bs1, "wow"); +final Path bs2 = new Path(foo, "bs2"); +final Path wow2 = new Path(bs2,"wow2"); + +dfs.mkdirs(bs1, FsPermission.getDirDefault()); +dfs.mkdirs(bs2, FsPermission.getDirDefault()); +dfs.setQuota(bs1,1000, 434217728); +dfs.setQuota(bs2,1000, 434217728); +dfs.setStoragePolicy(bs2, HdfsConstants.ONESSD_STORAGE_POLICY_NAME); + +DFSTestUtil.createFile(dfs, wow, SIZE, REPL, 0); +DFSTestUtil.createFile(dfs, wow2, SIZE, REPL, 0); +assertTrue("without storage policy, typeConsumed should be 0.", +dfs.getQuotaUsage(bs1).getTypeConsumed(StorageType.SSD) == 0); +assertTrue("with storage policy, typeConsumed should not be 0.", +dfs.getQuotaUsage(bs2).getTypeConsumed(StorageType.SSD) != 0); +dfs.rename(bs2, bs1); +assertTrue("rename with storage policy, typeConsumed should not be 0.", +dfs.getQuotaUsage(bs1).getTypeConsumed(StorageType.SSD) != 0); Review comment: Hmm, Thanx @ThinkerLei for confirming. Rechecking, seems correct. The issue and fix makes sense to me. -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 711295) Time Spent: 50m (was: 40m) > Source path setted storagePolicy will cause wrong typeConsumed in rename > operation > --- > > Key: HDFS-16428 > URL: https://issues.apache.org/jira/browse/HDFS-16428 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Reporter: lei w >Priority: Major > Labels: pull-request-available > Attachments: example.txt > > Time Spent: 50m > Remaining Estimate: 0h > > When computing quota in a rename operation, we use the storage policy of the > target directory to compute the source's quota usage. This causes a wrong > typeConsumed value when the source path has a storage policy set. I provided a > unit test to demonstrate this situation. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
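A minimal sketch of the accounting mistake behind HDFS-16428 (the typeConsumed helper below is a hypothetical, simplified model of the real QuotaCounts logic, assuming a ONE_SSD-like policy places one replica on SSD and the rest on DISK): computing the source's usage with the target directory's policy drops the SSD consumption that the source's blocks actually occupy.

```java
import java.util.EnumMap;
import java.util.Map;

public class RenameQuotaDemo {
    enum StorageType { DISK, SSD }

    /**
     * Simplified per-type quota accounting (hypothetical helper; the real
     * logic lives in QuotaCounts/INode): ONE_SSD puts one replica on SSD
     * and the rest on DISK; no policy means all replicas on DISK.
     */
    static Map<StorageType, Long> typeConsumed(long blockSize, short replication,
                                               boolean oneSsdPolicy) {
        Map<StorageType, Long> consumed = new EnumMap<>(StorageType.class);
        long ssd = oneSsdPolicy ? blockSize : 0;
        consumed.put(StorageType.SSD, ssd);
        consumed.put(StorageType.DISK, blockSize * replication - ssd);
        return consumed;
    }

    public static void main(String[] args) {
        // Source dir (like /foo/bs2) has ONE_SSD; target dir (like /foo/bs1) has no policy.
        Map<StorageType, Long> bySrcPolicy = typeConsumed(64, (short) 1, true);
        Map<StorageType, Long> byDstPolicy = typeConsumed(64, (short) 1, false);
        // The buggy rename computed the source's usage with the *target's* policy,
        // so the SSD space the blocks actually occupy disappeared from the quota.
        System.out.println(bySrcPolicy.get(StorageType.SSD));  // prints 64
        System.out.println(byDstPolicy.get(StorageType.SSD));  // prints 0
    }
}
```

This mirrors the unit test above: bs2's SSD typeConsumed is non-zero before the rename, so it must stay non-zero under bs1 afterwards.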
[jira] [Commented] (HDFS-16214) Lock optimization for large deleteing, no locks on the collection block
[ https://issues.apache.org/jira/browse/HDFS-16214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17478544#comment-17478544 ] Xiangyi Zhu commented on HDFS-16214: [~John Smith] This issue aims to solve the problem of holding the lock for a long time while collecting blocks when deleting large directories. [HDFS-16043|https://issues.apache.org/jira/browse/HDFS-16043] is about achieving asynchronous deletion of blocks. These two issues are not the same. > Lock optimization for large deleteing, no locks on the collection block > --- > > Key: HDFS-16214 > URL: https://issues.apache.org/jira/browse/HDFS-16214 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > The time spent in deletion mainly lies in three steps: collecting blocks, > deleting Inodes from the InodeMap, and deleting blocks. The current deletion > is divided into two major steps. Step 1 acquires the lock, collects the blocks > and inodes, deletes the inodes, and releases the lock. Step 2 acquires the > lock, deletes the blocks, and releases the lock. > Phase 2 currently deletes blocks in batches, which can bound the lock-holding > time. Here we can also delete blocks asynchronously. > Step 1 still has the problem of holding the lock for a long time. > For stage 1, we can make block collection not hold the lock. The process is as > follows: step 1 acquires the lock, calls parent.removeChild, writes to the > editLog, and releases the lock. Step 2, without the lock, collects the blocks. > Step 3 acquires the lock, updates the quota, releases the lease, and releases > the lock. Step 4 acquires the lock, deletes the Inodes from the InodeMap, and > releases the lock. Step 5 acquires the lock, deletes the blocks, and releases > the lock. > There may be some problems with the above process: > 1. Suppose the file /a/b/c is being written, and then the /a/b directory is > deleted. If the deletion has proceeded to the block-collecting stage and the > client issues a complete or addBlock for the /a/b/c file at this time (this > step is not locked), the delete of /a/b has already been written to the > editLog successfully. In this case, the editLog order is delete /a/b/c, delete > /a/b, and then complete /a/b/c. When the standby node replays the editLog, the > /a/b/c file has already been deleted, so replaying complete /a/b/c will fail. > *The process is as follows:* > *write editLog order: delete /a/b/c -> delete /a/b -> complete /a/b/c* > *replay editLog order:* *delete /a/b/c ->* *delete /a/b ->* *complete /a/b/c > {color:#ff}(not found){color}* > 2. If a delete operation has been executed up to the block-collecting stage, > and the administrator then executes saveNamespace and restarts the Namenode, > Inodes that have already been deleted from the parent's childList may remain > in the InodeMap. > To solve the above problems, in step 1 we add the inode being deleted to a > Set. When there is a file WriteFileOp (logAllocateBlockId/logCloseFile > editLog), we check whether this file or one of its parent Inodes is in the > Set, and if so, throw a FileNotFoundException. > In addition, saveNamespace must wait until all iNodes have been removed from > the Set before executing. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
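The batched phase-2 deletion mentioned in the description can be sketched as follows (class, field, and constant names are illustrative, not the actual FSNamesystem API): blocks are removed in fixed-size batches, releasing and reacquiring the write lock between batches so that other operations can interleave and the lock-holding time stays bounded.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class BatchedBlockDelete {
    static final int BLOCK_DELETION_INCREMENT = 1000;  // batch size (illustrative)
    final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    int deleted = 0;

    /** Delete collected blocks in batches, yielding the write lock between them. */
    void removeBlocks(Deque<Long> collectedBlocks) {
        while (!collectedBlocks.isEmpty()) {
            fsLock.writeLock().lock();
            try {
                for (int i = 0; i < BLOCK_DELETION_INCREMENT
                        && !collectedBlocks.isEmpty(); i++) {
                    collectedBlocks.poll();  // stand-in for blockManager.removeBlock
                    deleted++;
                }
            } finally {
                fsLock.writeLock().unlock();  // other operations may run here
            }
        }
    }

    public static void main(String[] args) {
        Deque<Long> blocks = new ArrayDeque<>();
        for (long b = 0; b < 2500; b++) blocks.add(b);
        BatchedBlockDelete nn = new BatchedBlockDelete();
        nn.removeBlocks(blocks);  // 3 lock acquisitions: 1000 + 1000 + 500
        System.out.println(nn.deleted);  // prints 2500
    }
}
```

The issue's proposal goes further: phase 1 (block collection) would run entirely outside the lock, with the Set-based guard described above protecting concurrent complete/addBlock operations.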
[jira] [Updated] (HDFS-16043) Add markedDeleteBlockScrubberThread to delete blocks asynchronously
[ https://issues.apache.org/jira/browse/HDFS-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangyi Zhu updated HDFS-16043: --- Description: Add markedDeleteBlockScrubberThread to delete blocks asynchronously. (was: The deletion of a large directory caused the NN to hold the lock for too long, which caused our NameNode to be killed by ZKFC. From the flame graph, it is found that the main time is spent on the QuotaCounts calculation in removeBlocks(toRemovedBlocks) and when deleting inodes, with removeBlocks(toRemovedBlocks) taking the larger share of time. h3. solution: 1. RemoveBlocks is processed asynchronously. A thread is started in the BlockManager to process the deleted blocks and control the lock time. 2. QuotaCount calculation optimization; this is similar to the optimization in HDFS-16000. h3. Comparison before and after optimization: Delete 10 million Inodes and 10 million blocks test. *before:* remove inode elapsed time: 7691 ms remove block elapsed time: 11107 ms *after:* remove inode elapsed time: 4149 ms remove block elapsed time: 0 ms) > Add markedDeleteBlockScrubberThread to delete blocks asynchronously > --- > > Key: HDFS-16043 > URL: https://issues.apache.org/jira/browse/HDFS-16043 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Attachments: 20210527-after.svg, 20210527-before.svg > > Time Spent: 12.5h > Remaining Estimate: 0h > > Add markedDeleteBlockScrubberThread to delete blocks asynchronously. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
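The scrubber-thread idea from HDFS-16043 can be sketched in miniature under assumed names (the real implementation lives in BlockManager and is considerably more involved): the delete path only enqueues the collected block ids and returns quickly, while a background thread drains the queue off the write-lock critical path.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

public class MarkedDeleteScrubberDemo {
    final ConcurrentLinkedQueue<long[]> markedDeleteQueue = new ConcurrentLinkedQueue<>();
    final AtomicLong removedBlocks = new AtomicLong();
    volatile boolean running = true;

    /** Delete path: only enqueue the collected block ids and return quickly. */
    void markBlocksForDeletion(long[] blockIds) {
        markedDeleteQueue.add(blockIds);
    }

    /** Scrubber loop: drains the queue off the write-lock critical path. */
    Thread startScrubber() {
        Thread t = new Thread(() -> {
            while (running || !markedDeleteQueue.isEmpty()) {
                long[] batch = markedDeleteQueue.poll();
                if (batch != null) {
                    removedBlocks.addAndGet(batch.length);  // stand-in for removeBlock
                } else {
                    Thread.onSpinWait();
                }
            }
        }, "MarkedDeleteBlockScrubberThread");
        t.start();
        return t;
    }

    /** Stop accepting work, drain what is queued, and return the total removed. */
    long shutdownAndAwait(Thread scrubber) {
        running = false;
        try {
            scrubber.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return removedBlocks.get();
    }

    public static void main(String[] args) {
        MarkedDeleteScrubberDemo demo = new MarkedDeleteScrubberDemo();
        Thread scrubber = demo.startScrubber();
        for (int i = 0; i < 100; i++) {
            demo.markBlocksForDeletion(new long[]{2L * i, 2L * i + 1});
        }
        System.out.println(demo.shutdownAndAwait(scrubber));  // prints 200
    }
}
```

The "remove block elapsed time: 0 ms" figure in the old description reflects exactly this shape: the caller's cost collapses to an enqueue, and the actual removal happens on the scrubber thread.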
[jira] [Work logged] (HDFS-16331) Make dfs.blockreport.intervalMsec reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16331?focusedWorklogId=711233=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711233 ] ASF GitHub Bot logged work on HDFS-16331: - Author: ASF GitHub Bot Created on: 19/Jan/22 09:56 Start Date: 19/Jan/22 09:56 Worklog Time Spent: 10m Work Description: tasanuma commented on pull request #3676: URL: https://github.com/apache/hadoop/pull/3676#issuecomment-1016268249 @tomscut Thanks for working on it and letting me know. I would like to review it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 711233) Time Spent: 5h 40m (was: 5.5h) > Make dfs.blockreport.intervalMsec reconfigurable > > > Key: HDFS-16331 > URL: https://issues.apache.org/jira/browse/HDFS-16331 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.3 > > Attachments: image-2021-11-18-09-33-24-236.png, > image-2021-11-18-09-35-35-400.png > > Time Spent: 5h 40m > Remaining Estimate: 0h > > We have a cold-data cluster, which stores data with an EC policy. There are 24 > fast disks on each node and each disk is 7 TB. > Recently, many nodes have more than 10 million blocks, and the FBR interval is > 6h by default. Frequent FBRs put great pressure on the NN. > !image-2021-11-18-09-35-35-400.png|width=334,height=229! > !image-2021-11-18-09-33-24-236.png|width=566,height=159! > We want to increase the FBR interval, but that requires a rolling restart of > the DNs, which is a very heavy operation. In this scenario, it is necessary to > make _dfs.blockreport.intervalMsec_ reconfigurable. 
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
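The reconfiguration pattern this feature relies on can be sketched as follows (modeled loosely on the DataNode's reconfigurable-property mechanism; class name, default, and validation here are illustrative): a volatile field is validated and swapped in place, so running block-report threads observe the new interval without a restart.

```java
public class BlockReportConfigDemo {
    static final String DFS_BLOCKREPORT_INTERVAL_MSEC_KEY = "dfs.blockreport.intervalMsec";
    static final long DFS_BLOCKREPORT_INTERVAL_MSEC_DEFAULT = 6L * 60 * 60 * 1000; // 6h

    // volatile so block-report scheduling threads observe the new value immediately
    private volatile long blockReportIntervalMs = DFS_BLOCKREPORT_INTERVAL_MSEC_DEFAULT;

    /** Validate the new value, then swap it in place -- no process restart needed. */
    String reconfigureProperty(String property, String newVal) {
        if (!DFS_BLOCKREPORT_INTERVAL_MSEC_KEY.equals(property)) {
            throw new IllegalArgumentException("not reconfigurable: " + property);
        }
        long interval = (newVal == null)
            ? DFS_BLOCKREPORT_INTERVAL_MSEC_DEFAULT : Long.parseLong(newVal);
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive: " + interval);
        }
        blockReportIntervalMs = interval;
        return String.valueOf(interval);
    }

    long getBlockReportIntervalMs() {
        return blockReportIntervalMs;
    }

    public static void main(String[] args) {
        BlockReportConfigDemo dn = new BlockReportConfigDemo();
        // Double the FBR interval to 12h on a running "datanode" -- no restart.
        dn.reconfigureProperty(DFS_BLOCKREPORT_INTERVAL_MSEC_KEY,
            String.valueOf(12L * 60 * 60 * 1000));
        System.out.println(dn.getBlockReportIntervalMs());  // prints 43200000
    }
}
```

In the real feature the admin would trigger this via `hdfs dfsadmin -reconfig datanode <host:port> start`, avoiding the rolling restart the description complains about.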
[jira] [Commented] (HDFS-16214) Lock optimization for large deleteing, no locks on the collection block
[ https://issues.apache.org/jira/browse/HDFS-16214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478505#comment-17478505 ] Yuxuan Wang commented on HDFS-16214: [~zhuxiangyi] I haven't read it entirely, but it looks similar to HDFS-16043? > Lock optimization for large deleting, no locks on the collection block > --- > > Key: HDFS-16214 > URL: https://issues.apache.org/jira/browse/HDFS-16214 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > The time spent on deletion lies mainly in three operations: collecting blocks, deleting inodes from the InodeMap, and deleting blocks. The current deletion is divided into two major steps. Step 1 acquires the lock, collects the blocks and inodes, deletes the inodes, and releases the lock. Step 2 acquires the lock, deletes the blocks, and releases the lock. > Step 2 already deletes blocks in batches, which bounds the lock-holding time; the blocks could also be deleted asynchronously here. > Step 1, however, can still hold the lock for a long time. > For step 1, we can collect the blocks without holding the lock. The process is as follows: step 1 acquires the lock, calls parent.removeChild, writes to the editLog, and releases the lock. Step 2 collects the blocks with no lock held. Step 3 acquires the lock, updates quota, releases leases, and releases the lock. Step 4 acquires the lock, deletes the inodes from the InodeMap, and releases the lock. Step 5 acquires the lock, deletes the blocks, and releases the lock. > This process raises some problems: > 1. Suppose the file /a/b/c is being written while the directory /a/b is deleted. If the deletion has reached the (unlocked) block-collecting stage, the client can still complete or addBlock on /a/b/c even though the delete of /a/b has already been written to the editLog. 
In that case the editLog records the delete of /a/b before the complete of /a/b/c. When the standby node replays the editLog, /a/b/c has already been deleted, so replaying complete /a/b/c fails. > *The process is as follows:* > *write editLog order: delete /a/b/c -> delete /a/b -> complete /a/b/c* > *replay editLog order:* *delete /a/b/c ->* *delete /a/b ->* *complete /a/b/c {color:#ff}(not found){color}* > 2. If a delete operation has reached the block-collecting stage when the administrator runs saveNamespace and then restarts the Namenode, inodes that have already been removed from their parent's childList may remain in the InodeMap. > To solve these problems, step 1 also adds the inode being deleted to a Set. When a file write operation arrives (a logAllocateBlockId/logCloseFile editLog entry), we check whether the file or one of its parent inodes is in the Set, and if so throw a FileNotFoundException. > In addition, saveNamespace must wait until all inodes have been removed from the Set before executing. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
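The phased delete proposed above can be sketched as a small model. This is an illustrative Python sketch, not the actual HDFS Java implementation; all class and method names here are hypothetical. It shows the two key ideas: the expensive block collection runs with no lock held, and a set of in-flight deletes lets concurrent writers fail fast with FileNotFoundException.

```python
import threading

class Inode:
    """Minimal stand-in for an HDFS inode (hypothetical, for illustration)."""
    def __init__(self, inode_id, path, blocks):
        self.id, self.path, self.blocks = inode_id, path, blocks

class PhasedDeleter:
    """Model of the proposed 5-phase delete: block collection runs
    without the write lock; every other phase acquires and releases
    the lock, so it is never held for long."""
    def __init__(self):
        self.write_lock = threading.Lock()
        self.deleting = set()       # paths of inodes currently being deleted
        self.edit_log = []
        self.inode_map = {}
        self.deleted_blocks = []

    def check_writer(self, path):
        # A writer op (addBlock/complete) must fail if its file, or any
        # ancestor directory, is in the middle of a delete.
        for p in self.deleting:
            if path == p or path.startswith(p + "/"):
                raise FileNotFoundError(path)

    def delete(self, inode):
        with self.write_lock:                 # phase 1: detach, log, mark
            self.edit_log.append(("delete", inode.path))
            self.deleting.add(inode.path)
        blocks = list(inode.blocks)           # phase 2: collect, no lock
        with self.write_lock:                 # phase 3: quota/lease updates
            pass                              # (elided in this sketch)
        with self.write_lock:                 # phase 4: drop from inode map
            self.inode_map.pop(inode.id, None)
            self.deleting.discard(inode.path)
        with self.write_lock:                 # phase 5: delete the blocks
            self.deleted_blocks.extend(blocks)
```

Releasing and re-acquiring the lock between phases is what lets other namesystem operations interleave with a large delete; the `deleting` set is the safeguard against the write-during-delete race described in the issue.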
[jira] [Updated] (HDFS-16399) Reconfig cache report parameters for datanode
[ https://issues.apache.org/jira/browse/HDFS-16399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-16399: Fix Version/s: 3.3.3 > Reconfig cache report parameters for datanode > - > > Key: HDFS-16399 > URL: https://issues.apache.org/jira/browse/HDFS-16399 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.3 > > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16331) Make dfs.blockreport.intervalMsec reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16331?focusedWorklogId=711228&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711228 ] ASF GitHub Bot logged work on HDFS-16331: - Author: ASF GitHub Bot Created on: 19/Jan/22 09:49 Start Date: 19/Jan/22 09:49 Worklog Time Spent: 10m Work Description: tomscut commented on pull request #3676: URL: https://github.com/apache/hadoop/pull/3676#issuecomment-1016262441 > Cherry-picked it into branch-3.3 with fixing small conflicts. Thanks @tasanuma for cherry-picking this. I submitted a new PR [#3831](https://github.com/apache/hadoop/pull/3831) covering other block report parameters, but it currently conflicts with trunk. I'll resolve the conflict later. I'm sorry I didn't tell you earlier. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 711228) Time Spent: 5.5h (was: 5h 20m) > Make dfs.blockreport.intervalMsec reconfigurable > > > Key: HDFS-16331 > URL: https://issues.apache.org/jira/browse/HDFS-16331 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.3 > > Attachments: image-2021-11-18-09-33-24-236.png, > image-2021-11-18-09-35-35-400.png > > Time Spent: 5.5h > Remaining Estimate: 0h > > We have a cold-data cluster that stores data with an EC policy. There are 24 fast disks on each node, and each disk is 7 TB. > Recently, many nodes have more than 10 million blocks, and the FBR interval defaults to 6h. Frequent FBRs put great pressure on the NN. > !image-2021-11-18-09-35-35-400.png|width=334,height=229! > !image-2021-11-18-09-33-24-236.png|width=566,height=159! 
> We want to increase the FBR interval, but that requires a rolling restart of the DNs, which is a very heavy operation. In this scenario, it is necessary to make _dfs.blockreport.intervalMsec_ reconfigurable. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
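What making dfs.blockreport.intervalMsec reconfigurable amounts to can be sketched minimally. This is a Python sketch, not the DataNode's actual Java reconfiguration machinery; the class and method names are hypothetical, and only the property key is real.

```python
class BlockReportConfig:
    """Hypothetical sketch of runtime reconfiguration: a whitelisted
    property can be changed in place, with validation, and takes
    effect without restarting the DN."""
    RECONFIGURABLE = {"dfs.blockreport.intervalMsec"}
    DEFAULT_INTERVAL_MS = 6 * 60 * 60 * 1000   # FBR default: 6h

    def __init__(self):
        self.interval_ms = self.DEFAULT_INTERVAL_MS

    def reconfigure(self, key, value):
        # Reject anything not on the whitelist, mirroring how a DN
        # refuses to reconfigure unsupported properties.
        if key not in self.RECONFIGURABLE:
            raise KeyError(f"{key} is not reconfigurable")
        interval = int(value)
        if interval <= 0:
            raise ValueError("interval must be positive")
        self.interval_ms = interval   # picked up by the next report cycle
```

The point of the whitelist-plus-validation shape is that an operator can raise the interval on a live cluster, which is exactly the rolling-restart cost the issue wants to avoid.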
[jira] [Work logged] (HDFS-16399) Reconfig cache report parameters for datanode
[ https://issues.apache.org/jira/browse/HDFS-16399?focusedWorklogId=711224=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711224 ] ASF GitHub Bot logged work on HDFS-16399: - Author: ASF GitHub Bot Created on: 19/Jan/22 09:45 Start Date: 19/Jan/22 09:45 Worklog Time Spent: 10m Work Description: tasanuma commented on pull request #3841: URL: https://github.com/apache/hadoop/pull/3841#issuecomment-1016258796 Cherry-picked it into branch-3.3. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 711224) Time Spent: 1h 40m (was: 1.5h) > Reconfig cache report parameters for datanode > - > > Key: HDFS-16399 > URL: https://issues.apache.org/jira/browse/HDFS-16399 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16431) Truncate CallerContext in client side
Chengwei Wang created HDFS-16431: Summary: Truncate CallerContext in client side Key: HDFS-16431 URL: https://issues.apache.org/jira/browse/HDFS-16431 Project: Hadoop HDFS Issue Type: Improvement Components: nn Reporter: Chengwei Wang Assignee: Chengwei Wang The context of a CallerContext is truncated on the server side when it exceeds the maximum allowed length. It would be better to check and truncate it on the client side, to avoid unnecessary network and memory overhead on the NN. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
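The proposed client-side check is essentially a byte-length cap applied before the RPC leaves the client. A sketch, under the assumption that the client mirrors the server's configured cap (128 bytes is assumed here purely for illustration):

```python
def truncate_caller_context(context: bytes, max_size: int = 128) -> bytes:
    """Cap the caller context at max_size bytes before it is sent,
    so the NN never has to receive and then discard the excess."""
    return context if len(context) <= max_size else context[:max_size]
```

Doing the cut on the client keeps oversized contexts off the wire entirely, which is the network and memory saving the issue is after.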
[jira] [Updated] (HDFS-16400) Reconfig DataXceiver parameters for datanode
[ https://issues.apache.org/jira/browse/HDFS-16400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-16400: Fix Version/s: 3.3.3 > Reconfig DataXceiver parameters for datanode > > > Key: HDFS-16400 > URL: https://issues.apache.org/jira/browse/HDFS-16400 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.3 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > To avoid frequent rolling restarts of the DN, we should make DataXceiver > parameters reconfigurable. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16400) Reconfig DataXceiver parameters for datanode
[ https://issues.apache.org/jira/browse/HDFS-16400?focusedWorklogId=711223=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711223 ] ASF GitHub Bot logged work on HDFS-16400: - Author: ASF GitHub Bot Created on: 19/Jan/22 09:44 Start Date: 19/Jan/22 09:44 Worklog Time Spent: 10m Work Description: tasanuma commented on pull request #3843: URL: https://github.com/apache/hadoop/pull/3843#issuecomment-1016258225 Cherry-picked it into branch-3.3. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 711223) Time Spent: 3h 40m (was: 3.5h) > Reconfig DataXceiver parameters for datanode > > > Key: HDFS-16400 > URL: https://issues.apache.org/jira/browse/HDFS-16400 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > To avoid frequent rolling restarts of the DN, we should make DataXceiver > parameters reconfigurable. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16331) Make dfs.blockreport.intervalMsec reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-16331: Fix Version/s: 3.3.3 > Make dfs.blockreport.intervalMsec reconfigurable > > > Key: HDFS-16331 > URL: https://issues.apache.org/jira/browse/HDFS-16331 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.3 > > Attachments: image-2021-11-18-09-33-24-236.png, > image-2021-11-18-09-35-35-400.png > > Time Spent: 5h 20m > Remaining Estimate: 0h > > We have a cold-data cluster that stores data with an EC policy. There are 24 fast disks on each node, and each disk is 7 TB. > Recently, many nodes have more than 10 million blocks, and the FBR interval defaults to 6h. Frequent FBRs put great pressure on the NN. > !image-2021-11-18-09-35-35-400.png|width=334,height=229! > !image-2021-11-18-09-33-24-236.png|width=566,height=159! > We want to increase the FBR interval, but that requires a rolling restart of the DNs, which is a very heavy operation. In this scenario, it is necessary to make _dfs.blockreport.intervalMsec_ reconfigurable. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16331) Make dfs.blockreport.intervalMsec reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16331?focusedWorklogId=711221&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711221 ] ASF GitHub Bot logged work on HDFS-16331: - Author: ASF GitHub Bot Created on: 19/Jan/22 09:42 Start Date: 19/Jan/22 09:42 Worklog Time Spent: 10m Work Description: tasanuma commented on pull request #3676: URL: https://github.com/apache/hadoop/pull/3676#issuecomment-1016256231 Cherry-picked it into branch-3.3 after fixing small conflicts. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 711221) Time Spent: 5h 20m (was: 5h 10m) > Make dfs.blockreport.intervalMsec reconfigurable > > > Key: HDFS-16331 > URL: https://issues.apache.org/jira/browse/HDFS-16331 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: image-2021-11-18-09-33-24-236.png, > image-2021-11-18-09-35-35-400.png > > Time Spent: 5h 20m > Remaining Estimate: 0h > > We have a cold-data cluster that stores data with an EC policy. There are 24 fast disks on each node, and each disk is 7 TB. > Recently, many nodes have more than 10 million blocks, and the FBR interval defaults to 6h. Frequent FBRs put great pressure on the NN. > !image-2021-11-18-09-35-35-400.png|width=334,height=229! > !image-2021-11-18-09-33-24-236.png|width=566,height=159! > We want to increase the FBR interval, but that requires a rolling restart of the DNs, which is a very heavy operation. In this scenario, it is necessary to make _dfs.blockreport.intervalMsec_ reconfigurable. 
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16423) balancer should not get blocks on stale storages
[ https://issues.apache.org/jira/browse/HDFS-16423?focusedWorklogId=711211&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711211 ] ASF GitHub Bot logged work on HDFS-16423: - Author: ASF GitHub Bot Created on: 19/Jan/22 09:34 Start Date: 19/Jan/22 09:34 Worklog Time Spent: 10m Work Description: liubingxing commented on pull request #3883: URL: https://github.com/apache/hadoop/pull/3883#issuecomment-1016249836 Thanks @tasanuma @tomscut -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 711211) Time Spent: 3h 20m (was: 3h 10m) > balancer should not get blocks on stale storages > > > Key: HDFS-16423 > URL: https://issues.apache.org/jira/browse/HDFS-16423 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Reporter: qinyuren >Assignee: qinyuren >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: image-2022-01-13-17-18-32-409.png > > Time Spent: 3h 20m > Remaining Estimate: 0h > > We have met a problem as described in HDFS-16420. > We found that the balancer copied a block multiple times without deleting the source block when the block was placed on a stale storage, resulting in a block with many redundant replicas; these replicas are not deleted until the storage is no longer stale. > > !image-2022-01-13-17-18-32-409.png|width=657,height=275! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
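The direction of the HDFS-16423 fix can be shown simply: when the NN serves the balancer's getBlocks request, replicas on storages marked stale should be skipped, so the balancer never schedules a move against replicas whose deletions the NN has not yet accounted for. An illustrative Python filter follows; the storage representation is hypothetical, not the actual NN data structures.

```python
def pick_source_storages(storages):
    """Return only the storages safe to draw balancer source blocks
    from: stale storages are excluded because block deletions on them
    have not yet been confirmed to the NN."""
    return [s for s in storages if not s["stale"]]
```

With stale storages filtered out, the repeated copy-without-delete loop described in the issue cannot start, because the block is simply never offered as a move source while its storage is stale.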
[jira] [Work logged] (HDFS-16426) fix nextBlockReportTime when trigger full block report force
[ https://issues.apache.org/jira/browse/HDFS-16426?focusedWorklogId=711210&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711210 ] ASF GitHub Bot logged work on HDFS-16426: - Author: ASF GitHub Bot Created on: 19/Jan/22 09:33 Start Date: 19/Jan/22 09:33 Worklog Time Spent: 10m Work Description: liubingxing commented on pull request #3887: URL: https://github.com/apache/hadoop/pull/3887#issuecomment-1016249379 Thanks @tasanuma @tomscut -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 711210) Time Spent: 2h 10m (was: 2h) > fix nextBlockReportTime when trigger full block report force > > > Key: HDFS-16426 > URL: https://issues.apache.org/jira/browse/HDFS-16426 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: qinyuren >Assignee: qinyuren >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.3 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > When we force a full block report from the command line, the next block report time is set like this: > nextBlockReportTime.getAndAdd(blockReportIntervalMs); > so nextBlockReportTime ends up more than blockReportIntervalMs in the future. > If we trigger a full block report twice, nextBlockReportTime ends up more than 2 * blockReportIntervalMs in the future. This is obviously not what we want. > We fix this by setting nextBlockReportTime = now + blockReportIntervalMs after a full block report is triggered from the command line. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
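The arithmetic of the HDFS-16426 bug and its fix fits in a few lines. This is a simplified Python model of the scheduling rule, not the actual BPServiceActor scheduler code: the buggy path keeps adding the interval to whatever was previously scheduled, so each forced FBR pushes the next one a further full interval out, while the fix anchors the next report on the current time.

```python
INTERVAL_MS = 6 * 60 * 60 * 1000      # default FBR interval, 6h

def next_fbr_buggy(scheduled, interval=INTERVAL_MS):
    # Old behaviour: getAndAdd on the previously scheduled time, so
    # repeated forced FBRs accumulate extra intervals.
    return scheduled + interval

def next_fbr_fixed(now, interval=INTERVAL_MS):
    # Fixed behaviour: the next FBR is always exactly one interval
    # from now, no matter how many forced reports preceded it.
    return now + interval
```

Two forced reports under the buggy rule leave the next FBR two full intervals away, which is exactly the drift the issue describes.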
[jira] [Resolved] (HDFS-16423) balancer should not get blocks on stale storages
[ https://issues.apache.org/jira/browse/HDFS-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma resolved HDFS-16423. - Fix Version/s: 3.4.0 Resolution: Fixed > balancer should not get blocks on stale storages > > > Key: HDFS-16423 > URL: https://issues.apache.org/jira/browse/HDFS-16423 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Reporter: qinyuren >Assignee: qinyuren >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: image-2022-01-13-17-18-32-409.png > > Time Spent: 3h 10m > Remaining Estimate: 0h > > We have met a problem as described in HDFS-16420. > We found that the balancer copied a block multiple times without deleting the source block when the block was placed on a stale storage, resulting in a block with many redundant replicas; these replicas are not deleted until the storage is no longer stale. > > !image-2022-01-13-17-18-32-409.png|width=657,height=275! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16423) balancer should not get blocks on stale storages
[ https://issues.apache.org/jira/browse/HDFS-16423?focusedWorklogId=711179&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711179 ] ASF GitHub Bot logged work on HDFS-16423: - Author: ASF GitHub Bot Created on: 19/Jan/22 09:00 Start Date: 19/Jan/22 09:00 Worklog Time Spent: 10m Work Description: tasanuma commented on pull request #3883: URL: https://github.com/apache/hadoop/pull/3883#issuecomment-1016220775 Merged it. Thanks for your contribution, @liubingxing, and thanks for your review, @tomscut. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 711179) Time Spent: 3h 10m (was: 3h) > balancer should not get blocks on stale storages > > > Key: HDFS-16423 > URL: https://issues.apache.org/jira/browse/HDFS-16423 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Reporter: qinyuren >Assignee: qinyuren >Priority: Major > Labels: pull-request-available > Attachments: image-2022-01-13-17-18-32-409.png > > Time Spent: 3h 10m > Remaining Estimate: 0h > > We have met a problem as described in HDFS-16420. > We found that the balancer copied a block multiple times without deleting the source block when the block was placed on a stale storage, resulting in a block with many redundant replicas; these replicas are not deleted until the storage is no longer stale. > > !image-2022-01-13-17-18-32-409.png|width=657,height=275! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16423) balancer should not get blocks on stale storages
[ https://issues.apache.org/jira/browse/HDFS-16423?focusedWorklogId=711177&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711177 ] ASF GitHub Bot logged work on HDFS-16423: - Author: ASF GitHub Bot Created on: 19/Jan/22 08:59 Start Date: 19/Jan/22 08:59 Worklog Time Spent: 10m Work Description: tasanuma merged pull request #3883: URL: https://github.com/apache/hadoop/pull/3883 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 711177) Time Spent: 3h (was: 2h 50m) > balancer should not get blocks on stale storages > > > Key: HDFS-16423 > URL: https://issues.apache.org/jira/browse/HDFS-16423 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Reporter: qinyuren >Assignee: qinyuren >Priority: Major > Labels: pull-request-available > Attachments: image-2022-01-13-17-18-32-409.png > > Time Spent: 3h > Remaining Estimate: 0h > > We have met a problem as described in HDFS-16420. > We found that the balancer copied a block multiple times without deleting the source block when the block was placed on a stale storage, resulting in a block with many redundant replicas; these replicas are not deleted until the storage is no longer stale. > > !image-2022-01-13-17-18-32-409.png|width=657,height=275! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org