[ https://issues.apache.org/jira/browse/HDFS-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438977#comment-13438977 ]
Andy Isaacson commented on HDFS-3194:
-------------------------------------

bq. Latest patch from Amith should address this issue.

I assume you're referring to HDFS-3194_7.patch. I asked earlier for a description of how it solves the problem, because that was obvious neither from the discussion nor from reading the diff. I'm disappointed that nobody responded to that request, so I've gone and read the patch in detail, and I think I can finally explain the approach 7.patch takes.

The problem 7.patch solves is this: currently, when we finish scanning a BP, we unconditionally rotate the log. We keep two logs, the previous log and the current log. We sometimes rescan a BP before the scanPeriod completes. If a rescan happens twice within a single scanPeriod, the logs rotate away and we forget which blocks we previously scanned, so we scan the first blocks again.

To fix this, 7.patch delays rotating the logs until the log has reached a predetermined size, rather than rotating when the scan completes.

{code}
+ static final int verficationLogLimit = 5;
{code}

What does this constant do? It seems to govern the block verification log size, but I don't understand why we want to keep 5 log entries for every block in blockMap.

{code}
+ private static long BLOCK_SCAN_PERIOD_UNIT = 3600 * 1000;
...
- this.scanPeriod = hours * 3600 * 1000;
+ this.scanPeriod = hours * BLOCK_SCAN_PERIOD_UNIT;
{code}

I don't think adding a named constant here is an improvement, but if you feel that it helps, please give this constant a more descriptive name, such as MS_PER_HOUR.

Uma, Amith: have you tested 7.patch with multiple block pools and a full cluster restart? I think the changed code will leave a dncp_block_verification_log.prev in multiple BP directories, and I suspect that the BlockPoolSliceScanner might resume from the wrong place if there are multiple verification_logs in the data directories.
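The log-rotation problem described in the comment above can be modeled in a few lines. This is a minimal, hypothetical sketch. The class `TwoGenerationLog` and its methods are invented for illustration and are not the BlockPoolSliceScanner code. The sketch keeps two generations of verified block IDs and rotates them in one of two ways: unconditionally at the end of each pass, or, in the style attributed to 7.patch, only after the current generation reaches an assumed size limit.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a two-generation block verification log.
// This is NOT the BlockPoolSliceScanner implementation; all names are
// illustrative.
class TwoGenerationLog {
    private Set<Long> prev = new HashSet<>();
    private Set<Long> curr = new HashSet<>();

    // Record that a block was verified during the current pass.
    void logVerified(long blockId) {
        curr.add(blockId);
    }

    // Behaviour described in the comment: rotate unconditionally at the end
    // of every pass. Whatever was in "prev" is discarded.
    void rotate() {
        prev = curr;
        curr = new HashSet<>();
    }

    // Behaviour attributed to 7.patch: rotate only once the current log has
    // reached a size limit. The limit passed in here is an assumption.
    void rotateIfFull(int limit) {
        if (curr.size() >= limit) {
            rotate();
        }
    }

    // A block is known to have been verified only if a surviving log
    // generation still mentions it.
    boolean remembers(long blockId) {
        return prev.contains(blockId) || curr.contains(blockId);
    }

    public static void main(String[] args) {
        // Unconditional rotation.
        TwoGenerationLog unconditional = new TwoGenerationLog();
        unconditional.logVerified(1L); // full scan verifies block 1
        unconditional.rotate();        // end of full scan: block 1 moves to prev
        unconditional.rotate();        // end of rescan #1 (nothing new logged): prev dropped
        // A second rescan would find no record and verify block 1 again.
        System.out.println(unconditional.remembers(1L));

        // Size-gated rotation with a hypothetical limit of 5 entries.
        TwoGenerationLog gated = new TwoGenerationLog();
        gated.logVerified(1L);
        gated.rotateIfFull(5);         // only 1 entry: no rotation
        gated.rotateIfFull(5);         // still 1 entry: no rotation
        System.out.println(gated.remembers(1L));
    }
}
```

Running `main` prints `false`, then `true`. With unconditional rotation, a full scan followed by one rescan that logs nothing leaves no record of block 1, so the next rescan would verify it again. With size-gated rotation, the record survives. The toy model does not address the multiple-block-pool restart question, which depends on how the real scanner locates its log files on disk.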
Per Eli's request, I'm going to close this Jira as resolved by my one-line patch, which fixes the "Block scanner runs too frequently" bug. I'll open a new Jira to track the "Block scanner repeatedly rescans blocks" bug, which HDFS-3194_7.patch addresses.

> DataNode block scanner is running too frequently
> ------------------------------------------------
>
>                 Key: HDFS-3194
>                 URL: https://issues.apache.org/jira/browse/HDFS-3194
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 2.0.0-alpha
>            Reporter: suja s
>            Assignee: Andy Isaacson
>             Fix For: 2.2.0-alpha
>
>         Attachments: HDFS-3194_1.patch, hdfs-3194-1.txt, HDFS-3194_2.patch,
> HDFS-3194_4.patch, HDFS-3194_6.patch, HDFS-3194_7.patch, HDFS-3194.patch
>
>
> By default, the block scanning interval should be 21 days (3 weeks), so each block should be scanned once every 21 days.
> Here the block is being scanned continuously.
> 2012-04-03 10:44:47,056 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-241703115-xx.xx.xx.55-1333086229434:blk_-2666054955039014473_1003
> 2012-04-03 10:45:02,064 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-241703115-xx.xx.xx.55-1333086229434:blk_-2666054955039014473_1003
> 2012-04-03 10:45:17,071 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-241703115-xx.xx.xx.55-1333086229434:blk_-2666054955039014473_1003
> 2012-04-03 10:45:32,079 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-241703115-xx.xx.xx.55-1333086229434:blk_-2666054955039014473
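For scale, the quoted log excerpt can be checked against the 21-day default stated in the issue. This is a minimal sketch. It assumes the log4j-style timestamp format shown in the excerpt, and the class and method names are hypothetical. The sketch parses consecutive timestamps, computes the gaps between them, and prints each gap next to the expected period.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for comparing observed scan gaps with the expected period.
class ScanIntervalCheck {
    // Timestamp layout as it appears in the excerpt, e.g. "2012-04-03 10:44:47,056".
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Expected default from the issue description: 21 days (3 weeks).
    static final Duration EXPECTED_PERIOD = Duration.ofDays(21);

    // Returns the time between each pair of consecutive timestamps.
    static List<Duration> gaps(List<String> timestamps) {
        List<Duration> out = new ArrayList<>();
        for (int i = 1; i < timestamps.size(); i++) {
            LocalDateTime earlier = LocalDateTime.parse(timestamps.get(i - 1), FMT);
            LocalDateTime later = LocalDateTime.parse(timestamps.get(i), FMT);
            out.add(Duration.between(earlier, later));
        }
        return out;
    }

    public static void main(String[] args) {
        // Timestamps copied from the four "Verification succeeded" lines above.
        List<String> ts = Arrays.asList(
            "2012-04-03 10:44:47,056",
            "2012-04-03 10:45:02,064",
            "2012-04-03 10:45:17,071",
            "2012-04-03 10:45:32,079");
        for (Duration d : gaps(ts)) {
            System.out.println("gap: " + d.toMillis() + " ms (expected period: "
                + EXPECTED_PERIOD.toMillis() + " ms)");
        }
    }
}
```

For these four timestamps, the gaps come out at about 15 seconds (15008, 15007 and 15008 ms). The expected period is 1,814,400,000 ms.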