[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770619#comment-13770619 ]

Vinay commented on HDFS-5031:
-----------------------------

bq. I am still not convinced that the assignment to lastReadFile before the 
call to readNext is correct. Is lastReadFile meant to store the file from which 
the last line was read? If so, then the call to readNext can change file, or did 
I understand it wrong?
I agree that {{readNext()}} changes the reference held in {{file}}. However, 
{{next()}} returns {{curLine}}, the line that was read during the previous call 
to {{readNext()}}. Since the current call returns the line read before 
{{readNext()}} runs, {{lastReadFile}} must likewise hold the value {{file}} had 
before that call. Otherwise, the following problem occurs:
# Suppose a call to {{RollingLogsImpl#next()}} is expected to return the 
second-to-last entry of {{dncp_block_verification.log.prev}}. During this call, 
{{RollingLogsImpl#readNext()}} reads the last entry and stores it in {{line}}.
# The next call to {{RollingLogsImpl#next()}} returns that last entry, which was 
read in the previous call. This time, however, {{readNext()}} opens 
{{dncp_block_verification.log.cur}} and changes {{file}} to 
{{dncp_block_verification.log.cur}}.
# Now, when {{BlockPoolSliceScanner#assignInitialVerificationTimes()}} 
processes this last entry from the prev dncp log, {{logIterator.isPrevious()}} 
returns false, because {{file}} already references the current verification log. 
Hence the entry is not appended to the current verification log, and the block 
will be rescanned after the next roll.
{code:java}
if (logIterator.isPrevious()) {
  // write the log entry to current file
  // so that the entry is preserved for later runs.
  verificationLog.append(entry.verificationTime, entry.genStamp,
      entry.blockId);
}
{code}

But {{logIterator.isLastReadFromPrevious()}} returns true in this case, so no 
entry from the prev dncp log is missed.
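
To make the ordering argument concrete, here is a minimal sketch of a 
read-ahead reader over two rolling logs, under stated assumptions: the prev and 
cur verification logs are modeled as in-memory lists of lines rather than real 
files, and the names {{TwoFileLineIterator}}, {{LogFile}} and {{carriedOver()}} 
are hypothetical. Only the {{next()}}/{{readNext()}} ordering and the 
{{isPrevious()}}/{{isLastReadFromPrevious()}} checks follow the description in 
this comment; this is an illustrative model, not the actual {{RollingLogsImpl}} 
code.

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical in-memory model of a read-ahead reader over two rolling logs.
 * Lists of lines stand in for the prev and cur verification log files.
 */
public class TwoFileLineIterator {
  private enum LogFile { PREV, CUR }

  private final List<String> curLines;
  private LogFile file = LogFile.PREV;  // log the reader currently points at
  private LogFile lastReadFile;         // log of the line last returned by next()
  private Iterator<String> reader;
  private String line;                  // read-ahead line; null when exhausted

  public TwoFileLineIterator(List<String> prevLines, List<String> curLines) {
    this.curLines = curLines;
    this.reader = prevLines.iterator();
    readNext();  // read ahead the first line
  }

  public boolean isPrevious() {
    return file == LogFile.PREV;
  }

  public boolean isLastReadFromPrevious() {
    return lastReadFile == LogFile.PREV;
  }

  public boolean hasNext() {
    return line != null;
  }

  // Read ahead one line, moving from the prev log to the cur log when needed.
  private void readNext() {
    line = null;
    while (true) {
      if (reader.hasNext()) {
        line = reader.next();
        return;
      }
      if (file == LogFile.PREV) {
        file = LogFile.CUR;  // prev exhausted: switch to cur
        reader = curLines.iterator();
      } else {
        return;  // both logs exhausted
      }
    }
  }

  public String next() {
    String curLine = line;  // read during the previous readNext()
    lastReadFile = file;    // record its log BEFORE readNext() can switch files
    readNext();
    return curLine;
  }

  /** Entries a scanner would re-append to the cur log, using either check. */
  public static List<String> carriedOver(List<String> prev, List<String> cur,
      boolean useLastReadCheck) {
    List<String> appended = new ArrayList<>();
    TwoFileLineIterator it = new TwoFileLineIterator(prev, cur);
    while (it.hasNext()) {
      String entry = it.next();
      boolean fromPrev = useLastReadCheck
          ? it.isLastReadFromPrevious() : it.isPrevious();
      if (fromPrev) {
        appended.add(entry);
      }
    }
    return appended;
  }

  public static void main(String[] args) {
    List<String> prev = List.of("blk_1", "blk_2");
    List<String> cur = List.of("blk_3");
    System.out.println("isPrevious():             " + carriedOver(prev, cur, false));
    System.out.println("isLastReadFromPrevious(): " + carriedOver(prev, cur, true));
  }
}
{code}

With prev = [blk_1, blk_2] and cur = [blk_3], the {{isPrevious()}} check carries 
over only [blk_1]: when blk_2 is returned, {{readNext()}} has already switched 
{{file}} to the cur log. The {{isLastReadFromPrevious()}} check carries over 
[blk_1, blk_2], because {{lastReadFile}} was recorded before the switch.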
                
> BlockScanner scans the block multiple times and on restart scans everything
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-5031
>                 URL: https://issues.apache.org/jira/browse/HDFS-5031
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Vinay
>            Assignee: Vinay
>         Attachments: HDFS-5031.patch, HDFS-5031.patch, HDFS-5031.patch
>
>
> BlockScanner scans the same block twice, and on datanode restart it scans 
> everything again.
> Steps:
> 1. Write blocks at intervals of more than 5 seconds, writing each new block 
> only after the scan of the previously written block has completed.
> Each time the datanode scans a new block, it also scans the previous block, 
> which has already been scanned.
> After a restart, the datanode scans all blocks again.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
