[ https://issues.apache.org/jira/browse/HDFS-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439972#comment-13439972 ]
Andy Isaacson commented on HDFS-3828:
-------------------------------------

bq. If the scanner scans exactly once shouldn't scansLastRun be 0 after this first run? Ie getBlocksScannedInLastRun shouldn't always return 1 right?

Empirically it is always 1 after a block has been scanned. This is because when we call scanBlockPoolSlice but there is nothing to scan we're doing a bunch of useless work:
# creating a new HashMap {{processedBlocks}}
# parsing the verificationLogs and putting the results in the new {{processedBlocks}}
# calling scan() which returns immediately
# setting totalBlocksScannedInLastRun to the resulting size of {{processedBlocks}}

bq. Like the new approach better.

I also like the new code better, but the fact that we can't shortcircuit all the nonsense enumerated above in {{scanBlockPoolSlice}} is a bummer. The previous approach avoided doing all of this extra work.

As an alternative, we could propagate a "please wake me up at time T" up from BlockPoolSliceScanner to DataBlockScanner#run and adjust the sleep time there, accordingly. If all threadpools continue to have work to do, then preserve the existing 5-second sleep; if all threadpools are done working then DataBlockScanner could go to sleep for much longer.

> Block Scanner rescans blocks too frequently
> -------------------------------------------
>
>                 Key: HDFS-3828
>                 URL: https://issues.apache.org/jira/browse/HDFS-3828
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.23.0, 2.0.0-alpha
>            Reporter: Andy Isaacson
>            Assignee: Andy Isaacson
>         Attachments: hdfs-3828-1.txt, hdfs3828.txt
>
>
> {{BlockPoolSliceScanner#scan}} calls cleanUp every time it's invoked from
> {{DataBlockScanner#run}} via {{scanBlockPoolSlice}}. But cleanUp
> unconditionally roll()s the verificationLogs, so after two iterations we have
> lost the first iteration of block verification times.
> As a result a cluster
> with just one block repeatedly rescans it every 10 seconds:
> {noformat}
> 2012-08-16 15:59:57,884 INFO datanode.BlockPoolSliceScanner
> (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for
> BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> 2012-08-16 16:00:07,904 INFO datanode.BlockPoolSliceScanner
> (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for
> BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> 2012-08-16 16:00:17,925 INFO datanode.BlockPoolSliceScanner
> (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for
> BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> {noformat}
> {quote}
> To fix this, we need to avoid roll()ing the logs multiple times per period.
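
For illustration only, here is a minimal Java sketch of the "please wake me up at time T" idea from the comment. It assumes each slice scanner can report the absolute time at which its next block comes due. Every name in it (ScannerSleepSketch, SliceScanner, nextWakeTimeMs, computeSleepMs, MAX_IDLE_SLEEP_MS) is hypothetical and is not taken from the Hadoop source or from the attached patches; only the 5-second busy sleep comes from the comment itself, and the one-hour cap is an arbitrary placeholder.

```java
import java.util.List;

/** Hypothetical sketch; none of these names exist in Hadoop. */
final class ScannerSleepSketch {
    /** Short poll interval kept while some slice still has work (the 5-second sleep from the comment). */
    static final long BUSY_SLEEP_MS = 5_000L;
    /** Assumed cap on an idle sleep; the value is a placeholder, not a Hadoop setting. */
    static final long MAX_IDLE_SLEEP_MS = 60L * 60L * 1000L;

    /** Stand-in for what a BlockPoolSliceScanner might report upward. */
    interface SliceScanner {
        /** Absolute time in ms when the next block is due, or Long.MAX_VALUE if nothing is scheduled. */
        long nextWakeTimeMs();
    }

    /** Picks how long the outer run loop could sleep, given every slice's requested wake-up time. */
    static long computeSleepMs(List<? extends SliceScanner> scanners, long nowMs) {
        long earliest = Long.MAX_VALUE;
        for (SliceScanner s : scanners) {
            earliest = Math.min(earliest, s.nextWakeTimeMs());
        }
        if (earliest == Long.MAX_VALUE) {
            // No slice has anything scheduled: sleep for the capped idle interval.
            return MAX_IDLE_SLEEP_MS;
        }
        if (earliest <= nowMs) {
            // At least one slice has a block due now: keep the short sleep.
            return BUSY_SLEEP_MS;
        }
        // Everyone is idle until 'earliest': sleep until then, but never less
        // than the busy interval and never more than the idle cap.
        long untilDue = earliest - nowMs;
        return Math.max(BUSY_SLEEP_MS, Math.min(untilDue, MAX_IDLE_SLEEP_MS));
    }
}
```

Under these assumptions the outer loop would sleep for computeSleepMs(scanners, System.currentTimeMillis()) instead of a fixed interval. When some slices are idle and others still have work, this sketch chooses the short sleep; the comment does not say how that mixed case should be handled, so that choice is an assumption too.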