[ https://issues.apache.org/jira/browse/HDFS-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139468#comment-17139468 ]
hemanthboyina commented on HDFS-15420:
--------------------------------------

Thanks [~maxmzkr] for providing the report. A quick question: are there any pending reconstruction requests that have timed out?

> approx scheduled blocks not reseting over time
> ----------------------------------------------
>
> Key: HDFS-15420
> URL: https://issues.apache.org/jira/browse/HDFS-15420
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: block placement
> Affects Versions: 2.6.0, 3.0.0
> Environment: Our 2.6.0 environment is a 3-node cluster running cdh5.15.0.
> Our 3.0.0 environment is a 4-node cluster running cdh6.3.0.
> Reporter: Max Mizikar
> Priority: Minor
> Attachments: Screenshot from 2020-06-18 09-29-57.png, Screenshot from 2020-06-18 09-31-15.png
>
> We have been experiencing large numbers of scheduled blocks that never get cleared out. This prevents blocks from being placed even when there is plenty of space on the system.
> Here is an example of the block growth over 24 hours on one of our systems running 2.6.0:
> !Screenshot from 2020-06-18 09-29-57.png!
> Here is an example of the block growth over 24 hours on one of our systems running 3.0.0:
> !Screenshot from 2020-06-18 09-31-15.png!
> https://issues.apache.org/jira/browse/HDFS-1172 appears to be the main issue we were having on 2.6.0, so the growth has decreased since upgrading to 3.0.0. However, there still appears to be systemic growth in scheduled blocks over time, and we still need to restart the namenode on occasion to reset this count. I have not determined what is causing the leaked blocks in 3.0.0.
> Looking into the issue, I discovered that the intention is for scheduled blocks to slowly go back down to 0 after errors cause blocks to be leaked.
> {code}
> /** Increment the number of blocks scheduled. */
> void incrementBlocksScheduled(StorageType t) {
>   currApproxBlocksScheduled.add(t, 1);
> }
>
> /** Decrement the number of blocks scheduled. */
> void decrementBlocksScheduled(StorageType t) {
>   if (prevApproxBlocksScheduled.get(t) > 0) {
>     prevApproxBlocksScheduled.subtract(t, 1);
>   } else if (currApproxBlocksScheduled.get(t) > 0) {
>     currApproxBlocksScheduled.subtract(t, 1);
>   }
>   // its ok if both counters are zero.
> }
>
> /** Adjusts curr and prev number of blocks scheduled every few minutes. */
> private void rollBlocksScheduled(long now) {
>   if (now - lastBlocksScheduledRollTime > BLOCKS_SCHEDULED_ROLL_INTERVAL) {
>     prevApproxBlocksScheduled.set(currApproxBlocksScheduled);
>     currApproxBlocksScheduled.reset();
>     lastBlocksScheduledRollTime = now;
>   }
> }
> {code}
> However, this code does not do what is intended if the system has a constant flow of written blocks. If blocks make it into prevApproxBlocksScheduled, the next scheduled block increments currApproxBlocksScheduled, and when it completes it decrements prevApproxBlocksScheduled, preventing the leaked block from being removed from the approximate count. So, for errors to be corrected, we have to write no data for the full 10-minute roll period. The number of blocks we write per 10 minutes is quite high, which allows the error in the approximate counts to grow very large.
> The comments in the ticket for the original implementation, https://issues.apache.org/jira/browse/HADOOP-3707, suggest this issue was known. However, it's not clear to me whether its severity was known at the time.
> > So if there are some blocks that are not reported back by the datanode,
> > they will eventually get adjusted (usually 10 min; bit longer if datanode
> > is continuously receiving blocks).
> The comments suggest it will eventually get cleared out, but in our case, it never gets cleared out.
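The curr/prev interaction described in the report can be sketched with a minimal, hypothetical model (plain ints in place of the real per-StorageType EnumCounters; the class and field names here are invented for illustration, not taken from HDFS):

```java
// Hypothetical model of the two approx-scheduled counters. A leaked
// increment (a scheduled block whose completion is never reported) is
// followed by steady write traffic across several roll intervals.
public class ApproxScheduledDemo {
    static int curr = 0, prev = 0;

    static void increment() { curr++; }

    static void decrement() {
        if (prev > 0) prev--;          // completions drain prev first...
        else if (curr > 0) curr--;
        // ok if both counters are zero
    }

    static void roll() { prev = curr; curr = 0; }

    public static void main(String[] args) {
        increment();                   // leaked: scheduled, never reported back

        for (int interval = 0; interval < 5; interval++) {
            roll();                    // the leaked unit moves into prev
            for (int i = 0; i < 100; i++) {
                increment();           // new block scheduled into curr...
                decrement();           // ...but its completion drains prev,
            }                          // so one stale unit survives the roll
        }
        System.out.println("approx scheduled = " + (curr + prev)); // prints 1
    }
}
```

With continuous traffic the stale unit shuttles between curr and prev forever; only an interval with no writes at all lets two consecutive rolls zero it out, which matches the "bit longer if datanode is continuously receiving blocks" caveat in HADOOP-3707.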
--
This message was sent by Atlassian Jira (v8.3.4#803005)