[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764063#comment-13764063 ]
Vinay commented on HDFS-5031:
-----------------------------

bq. I ran TestDatanodeBlockScanner#testDuplicateScans without the rest of the code changes and it continues to pass. Do you see the same?

Yes, I observed the same yesterday. I had missed one assertion; it will be added in the upcoming patch.

bq. I did not understand how the isNewPeriod check works. I will continue to take a look but meanwhile if someone more familiar with this code wants to chime in please do so.

{{processedBlocks}} is reset on every log roll, but {{bytesLeft}} is reset only in {{startNewPeriod()}}. As a result, on every log roll {{bytesLeft}} was being decremented unnecessarily in {{assignInitialVerificationTimes()}}, which drove {{bytesLeft}} negative. Because of this, the scan was returning from {{workRemainingInCurrentPeriod()}} without scanning the latest blocks. We should decrement it only once, after starting the new period.

bq. BlockScanInfo#equals looks redundant now. Can we just remove it?

Yes, I will remove it in the next patch.

bq. In Reader#next, should the assignment to lastReadFile happen after the call to readNext?

{{Reader#next}} does not actually read again before returning; it returns only the previously read line. So assigning {{lastReadFile}} before {{readNext}} is correct.

> BlockScanner scans the block multiple times and on restart scans everything
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-5031
>                 URL: https://issues.apache.org/jira/browse/HDFS-5031
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Vinay
>            Assignee: Vinay
>         Attachments: HDFS-5031.patch, HDFS-5031.patch
>
>
> BlockScanner scans the block twice, also on restart of datanode scans
> everything.
> Steps:
> 1. Write blocks with interval of more than 5 seconds. write new block on
> completion of scan for written block.
> Each time datanode scans new block, it also scans, previous block which is
> already scanned.
> Now after restart, datanode scans all blocks again.
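
The {{bytesLeft}} accounting described in the comment above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the DataNode code or the attached patch: the class {{ScanBudgetSketch}}, the {{decrementOncePerPeriod}} flag, the {{simulate}} helper, and the byte counts (a budget of 100 and 60 bytes charged per call) are all hypothetical. Only the method and field names {{startNewPeriod}}, {{assignInitialVerificationTimes}}, {{workRemainingInCurrentPeriod}}, {{bytesLeft}}, {{processedBlocks}}, and {{isNewPeriod}} are taken from the discussion.

```java
// Hypothetical model of the per-period scan budget; not the Hadoop implementation.
final class ScanBudgetSketch {
    private long bytesLeft;           // reset only when a new period starts
    private int processedBlocks;      // reset on every log roll
    private boolean isNewPeriod;      // true until the first decrement of a period
    private final boolean decrementOncePerPeriod; // assumed switch: false = old behaviour

    ScanBudgetSketch(boolean decrementOncePerPeriod) {
        this.decrementOncePerPeriod = decrementOncePerPeriod;
    }

    void startNewPeriod(long budgetBytes) {
        bytesLeft = budgetBytes;
        isNewPeriod = true;
    }

    void rollLog() {
        // Only the block counter is reset here; bytesLeft is left alone.
        processedBlocks = 0;
    }

    void assignInitialVerificationTimes(long alreadyScannedBytes) {
        // Old behaviour: charge the budget on every call (i.e. after every log roll).
        // Guarded behaviour: charge it only on the first call of a period.
        if (!decrementOncePerPeriod || isNewPeriod) {
            bytesLeft -= alreadyScannedBytes;
        }
        isNewPeriod = false;
    }

    boolean workRemainingInCurrentPeriod() {
        return bytesLeft > 0;
    }

    // Runs one period with the given number of log rolls and returns the final bytesLeft.
    public static long simulate(boolean decrementOncePerPeriod, int logRolls) {
        ScanBudgetSketch s = new ScanBudgetSketch(decrementOncePerPeriod);
        s.startNewPeriod(100);
        s.assignInitialVerificationTimes(60);
        for (int i = 0; i < logRolls; i++) {
            s.rollLog();
            s.assignInitialVerificationTimes(60);
        }
        return s.bytesLeft;
    }

    public static void main(String[] args) {
        System.out.println("decrement on every roll: " + simulate(false, 2));
        System.out.println("decrement once per period: " + simulate(true, 2));
    }
}
```

In this model, two log rolls within a single period push {{bytesLeft}} below zero when the decrement is repeated, so {{workRemainingInCurrentPeriod()}} reports no remaining work; with the once-per-period guard the budget stays positive.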