[jira] [Commented] (HDFS-3194) DataNode block scanner is running too frequently

Andy Isaacson (JIRA) Tue, 21 Aug 2012 12:39:40 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438977#comment-13438977
 ]


Andy Isaacson commented on HDFS-3194:
-------------------------------------

bq. Latest patch from Amith should address this issue. 

I assume you're referring to HDFS-3194_7.patch .

I've asked for a description before of how it solves the problem, because that 
was not obvious from the discussion nor from reading the diff.  I'm 
disappointed that nobody responded to that request, so I've gone and read the 
patch in detail, and I think I can finally explain the approach 7.patch is 
using.

The problem 7.patch is solving is this: currently, when we finish scanning a 
BP, we unconditionally rotate the log.  We keep two logs, the previous and the 
current.  We sometimes re-scan a BP before the scanPeriod completes.  If a 
rescan happens twice within a single scanPeriod, both logs rotate away and we 
forget which blocks we previously scanned, so we scan the first blocks 
again.

To fix this, 7.patch delays rotating the logs until the log has reached a 
predetermined size, rather than rotating when the scan completes.
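
As a rough illustration of the difference, here is a minimal sketch of a 
two-generation log.  The class TwoGenerationLog, its methods, and the 
threshold value are hypothetical and are not taken from the patch or from 
HDFS; the sketch also assumes that an early rescan logs nothing new.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a two-generation verification log (not the HDFS class).
class TwoGenerationLog {
    private Set<String> prev = new HashSet<>();
    private Set<String> curr = new HashSet<>();
    private final int rotateThreshold; // assumed size limit, in entries

    TwoGenerationLog(int rotateThreshold) {
        this.rotateThreshold = rotateThreshold;
    }

    // Record that a block was verified during the current scan.
    void record(String blockId) {
        curr.add(blockId);
    }

    // Unconditional rotation: curr becomes prev and the old prev is discarded.
    void rotate() {
        prev = curr;
        curr = new HashSet<>();
    }

    // Size-based rotation: rotate only once curr holds enough entries.
    boolean rotateIfFull() {
        if (curr.size() < rotateThreshold) {
            return false;
        }
        rotate();
        return true;
    }

    // A block is remembered if either generation still lists it.
    boolean remembers(String blockId) {
        return prev.contains(blockId) || curr.contains(blockId);
    }

    public static void main(String[] args) {
        TwoGenerationLog eager = new TwoGenerationLog(4);
        eager.record("blk_1");
        eager.rotate(); // first scan ends
        eager.rotate(); // early rescan ends, nothing new was logged
        System.out.println("eager remembers blk_1: " + eager.remembers("blk_1"));

        TwoGenerationLog sized = new TwoGenerationLog(4);
        sized.record("blk_1");
        sized.rotateIfFull(); // log too small, no rotation
        sized.rotateIfFull(); // still too small, no rotation
        System.out.println("sized remembers blk_1: " + sized.remembers("blk_1"));
    }
}
```

With unconditional rotation the entry for blk_1 is gone after two rotations; 
with size-based rotation it survives until enough new entries accumulate.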

{code}
+  static final int verficationLogLimit = 5;
{code}
What does this constant do?  It seems to govern the block verification log 
size, but I don't understand why we want to keep 5 log entries for every block 
in blockMap.
{code}
+  private static long BLOCK_SCAN_PERIOD_UNIT = 3600 * 1000;
...
-    this.scanPeriod = hours * 3600 * 1000;
+    this.scanPeriod = hours * BLOCK_SCAN_PERIOD_UNIT;
{code}
I don't think adding a named constant here is an improvement, but if you feel 
that it helps, please use a more descriptive name for this constant, like 
MS_PER_HOUR or something similar.
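
For illustration only, a sketch of what a descriptively named constant could 
look like.  The class ScanPeriodSketch and the helper scanPeriodMs are 
hypothetical names, and using java.util.concurrent.TimeUnit is my assumption 
rather than anything in the patch.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical holder class; only the naming of the constant is the point.
class ScanPeriodSketch {
    // Milliseconds in one hour, derived from TimeUnit instead of 3600 * 1000.
    static final long MS_PER_HOUR = TimeUnit.HOURS.toMillis(1);

    // Convert a scan period given in hours to milliseconds.
    static long scanPeriodMs(long hours) {
        return hours * MS_PER_HOUR;
    }

    public static void main(String[] args) {
        // 21 days expressed in hours, matching the default in the description.
        System.out.println(scanPeriodMs(21 * 24));
    }
}
```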

Uma, Amith -- Have you tested 7.patch with multiple block pools and a full 
cluster restart?  I think the changed code will leave a 
dncp_block_verification_log.prev in multiple BP directories, and I suspect that 
the BlockPoolSliceScanner might resume from the wrong place if there are 
multiple verification_logs in the data directories.

Per Eli's request, I'm going to close this Jira as resolved by my one-line 
patch which resolves the "Block scanner runs too frequently" bug, and open a 
new Jira to track the "Block scanner repeatedly rescans blocks" bug which is 
addressed by HDFS-3194_7.patch.
                
> DataNode block scanner is running too frequently
> ------------------------------------------------
>
>                 Key: HDFS-3194
>                 URL: https://issues.apache.org/jira/browse/HDFS-3194
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 2.0.0-alpha
>            Reporter: suja s
>            Assignee: Andy Isaacson
>             Fix For: 2.2.0-alpha
>
>         Attachments: HDFS-3194_1.patch, hdfs-3194-1.txt, HDFS-3194_2.patch, 
> HDFS-3194_4.patch, HDFS-3194_6.patch, HDFS-3194_7.patch, HDFS-3194.patch
>
>
> The block scanning interval should default to 21 days (3 weeks), and 
> each block should be scanned once in 21 days.
> Here the block is being scanned continuously.
> 2012-04-03 10:44:47,056 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
> succeeded for 
> BP-241703115-xx.xx.xx.55-1333086229434:blk_-2666054955039014473_1003
> 2012-04-03 10:45:02,064 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
> succeeded for 
> BP-241703115-xx.xx.xx.55-1333086229434:blk_-2666054955039014473_1003
> 2012-04-03 10:45:17,071 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
> succeeded for 
> BP-241703115-xx.xx.xx.55-1333086229434:blk_-2666054955039014473_1003
> 2012-04-03 10:45:32,079 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
> succeeded for BP-241703115-xx.xx.xx.55-1333086229434:blk_-2666054955039014473

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
