[ https://issues.apache.org/jira/browse/HDFS-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139468#comment-17139468 ]
hemanthboyina commented on HDFS-15420:
--------------------------------------

Thanks [~maxmzkr] for providing the report. A quick question: are there any pending reconstruction requests that have timed out?

> approx scheduled blocks not reseting over time
> ----------------------------------------------
>
> Key: HDFS-15420
> URL: https://issues.apache.org/jira/browse/HDFS-15420
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: block placement
> Affects Versions: 2.6.0, 3.0.0
> Environment: Our 2.6.0 environment is a 3-node cluster running cdh5.15.0.
> Our 3.0.0 environment is a 4-node cluster running cdh6.3.0.
> Reporter: Max Mizikar
> Priority: Minor
> Attachments: Screenshot from 2020-06-18 09-29-57.png, Screenshot from 2020-06-18 09-31-15.png
>
> We have been experiencing large numbers of scheduled blocks that never get cleared out. This prevents blocks from being placed even when there is plenty of space on the system.
> Here is an example of the block growth over 24 hours on one of our systems running 2.6.0:
> !Screenshot from 2020-06-18 09-29-57.png!
> Here is an example of the block growth over 24 hours on one of our systems running 3.0.0:
> !Screenshot from 2020-06-18 09-31-15.png!
> https://issues.apache.org/jira/browse/HDFS-1172 appears to be the main issue we were having on 2.6.0, so the growth has decreased since upgrading to 3.0.0. However, there still appears to be systemic growth in scheduled blocks over time, and we still need to restart the namenode on occasion to reset this count. I have not determined what is causing the leaked blocks in 3.0.0.
> Looking into the issue, I discovered that the intention is for scheduled blocks to slowly go back down to 0 after errors cause blocks to be leaked.
> {code}
> /** Increment the number of blocks scheduled. */
> void incrementBlocksScheduled(StorageType t) {
>   currApproxBlocksScheduled.add(t, 1);
> }
>
> /** Decrement the number of blocks scheduled. */
> void decrementBlocksScheduled(StorageType t) {
>   if (prevApproxBlocksScheduled.get(t) > 0) {
>     prevApproxBlocksScheduled.subtract(t, 1);
>   } else if (currApproxBlocksScheduled.get(t) > 0) {
>     currApproxBlocksScheduled.subtract(t, 1);
>   }
>   // its ok if both counters are zero.
> }
>
> /** Adjusts curr and prev number of blocks scheduled every few minutes. */
> private void rollBlocksScheduled(long now) {
>   if (now - lastBlocksScheduledRollTime > BLOCKS_SCHEDULED_ROLL_INTERVAL) {
>     prevApproxBlocksScheduled.set(currApproxBlocksScheduled);
>     currApproxBlocksScheduled.reset();
>     lastBlocksScheduledRollTime = now;
>   }
> }
> {code}
> However, this code does not do what is intended if the system has a constant flow of written blocks. If blocks make it into prevApproxBlocksScheduled, the next scheduled block increments currApproxBlocksScheduled, and when it completes it decrements prevApproxBlocksScheduled, preventing the leaked block from being removed from the approximate count. So, for errors to be corrected, we have to write no data for the full 10-minute roll period. The number of blocks we write per 10 minutes is quite high, which allows the error in the approximate counts to grow very large.
> The comments in the ticket for the original implementation, https://issues.apache.org/jira/browse/HADOOP-3707, suggest this issue was known. However, it's not clear to me whether its severity was known at the time.
> > So if there are some blocks that are not reported back by the datanode,
> > they will eventually get adjusted (usually 10 min; bit longer if datanode
> > is continuously receiving blocks).
> The comments suggest it will eventually get cleared out, but in our case, it never gets cleared out.
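The curr/prev interaction described in the report can be sketched with a minimal, hypothetical model (plain ints in place of the real per-StorageType EnumCounters; the class and field names here are invented for illustration, not taken from HDFS):

```java
// Hypothetical model of the two approx-scheduled counters. A leaked
// increment (a scheduled block whose completion is never reported) is
// followed by steady write traffic across several roll intervals.
public class ApproxScheduledDemo {
    static int curr = 0, prev = 0;

    static void increment() { curr++; }

    static void decrement() {
        if (prev > 0) prev--;          // completions drain prev first...
        else if (curr > 0) curr--;
        // ok if both counters are zero
    }

    static void roll() { prev = curr; curr = 0; }

    public static void main(String[] args) {
        increment();                   // leaked: scheduled, never reported back

        for (int interval = 0; interval < 5; interval++) {
            roll();                    // the leaked unit moves into prev
            for (int i = 0; i < 100; i++) {
                increment();           // new block scheduled into curr...
                decrement();           // ...but its completion drains prev,
            }                          // so one stale unit survives the roll
        }
        System.out.println("approx scheduled = " + (curr + prev)); // prints 1
    }
}
```

With continuous traffic the stale unit shuttles between curr and prev forever; only an interval with no writes at all lets two consecutive rolls zero it out, which matches the "bit longer if datanode is continuously receiving blocks" caveat in HADOOP-3707.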
--
This message was sent by Atlassian Jira (v8.3.4#803005)