[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770189#comment-13770189
 ] 

Arpit Agarwal commented on HDFS-5031:
-------------------------------------

Hi Vinay, thanks for the updated patch. I verified that the new test case fails 
without your code changes.

The patch looks good except for one point. I am still not convinced that the 
assignment to {{lastReadFile}} before the call to {{readNext}} is correct. Is 
{{lastReadFile}} meant to store the file from which the last line was read? If 
so, the call to {{readNext}} can change {{file}} when it moves to the next 
file. Or did I misunderstand?

{code}
    private void readNext() throws IOException {
...
        if (line == null) {
          // move to the next file.
          if (openFile()) {
            readNext();
          }
{code}
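
To make the question concrete, here is a minimal, self-contained sketch of a reader that spans two files. It is not the patch code: {{FakeFile}}, {{next(boolean)}}, {{fileAfterFirstLine}}, and the one-line prefetch are all assumptions made for illustration. It only shows that capturing the file before and after {{readNext}} gives different answers at a file boundary.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical model of a multi-file line reader; NOT the code from the patch.
class LastReadFileSketch {
    // A fake "file": a name plus its remaining lines (illustrative only).
    static final class FakeFile {
        final String name;
        final Deque<String> lines;

        FakeFile(String name, String... lines) {
            this.name = name;
            this.lines = new ArrayDeque<>(Arrays.asList(lines));
        }
    }

    private final Deque<FakeFile> pending = new ArrayDeque<>();
    private FakeFile file;   // file currently open, null once all are exhausted
    private String line;     // line prefetched for the next call to next()
    String lastReadFile;     // the field the question is about

    LastReadFileSketch(FakeFile... files) {
        pending.addAll(Arrays.asList(files));
        if (openFile()) {
            readNext();
        }
    }

    private boolean openFile() {
        file = pending.poll();
        return file != null;
    }

    // Same shape as the excerpt: at end of file, move to the next file.
    private void readNext() {
        line = file.lines.poll();
        if (line == null) {
            if (openFile()) {
                readNext();
            }
        }
    }

    // Returns the prefetched line; the flag picks where the file is captured.
    String next(boolean captureBeforeReadNext) {
        String current = line;
        if (captureBeforeReadNext) {
            lastReadFile = (file == null) ? null : file.name;
        }
        if (file != null) {
            readNext();   // may switch 'file' to the next file
        }
        if (!captureBeforeReadNext) {
            lastReadFile = (file == null) ? null : file.name;
        }
        return current;
    }

    // Consumes the only line of a.log and reports what lastReadFile holds.
    static String fileAfterFirstLine(boolean captureBeforeReadNext) {
        LastReadFileSketch reader = new LastReadFileSketch(
            new FakeFile("a.log", "a1"), new FakeFile("b.log", "b1"));
        reader.next(captureBeforeReadNext);
        return reader.lastReadFile;
    }

    public static void main(String[] args) {
        System.out.println(fileAfterFirstLine(true));   // a.log
        System.out.println(fileAfterFirstLine(false));  // b.log
    }
}
```

In this model the line returned by {{next}} was prefetched from a.log, so capturing before {{readNext}} names the file that line came from, while capturing afterwards names the file that {{readNext}} moved on to. Which of the two is intended depends on what {{lastReadFile}} is supposed to mean in the real reader.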

{quote}
processedBlocks is reset on every log roll, but bytesLeft is reset only on 
every startNewPeriod(). So on every log roll, bytesLeft was being decremented 
unnecessarily in assignInitialVerificationTimes(), which resulted in negative 
values of bytesLeft. Because of this, the scanner was returning from 
workRemainingInCurrentPeriod() without scanning the latest blocks. We should 
decrement it only once after starting the new period.
{quote}

Thanks for the explanation, I understand what you are trying to fix now.
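
As I read it, the accounting problem can be sketched roughly as below. This is a simplified model, not the DataNode code: the field names come from the explanation above, while {{onLogRollBuggy}}, {{onLogRollFixed}}, {{simulate}}, and the byte counts are hypothetical and chosen only to show the arithmetic.

```java
// Hypothetical model of the bytesLeft accounting; NOT the actual scanner code.
class BytesLeftSketch {
    private final long totalBytesToScan;     // assumed budget for one scan period
    private final long alreadyScannedBytes;  // assumed size of blocks found already verified
    private long bytesLeft;
    private boolean deductedThisPeriod;

    BytesLeftSketch(long totalBytesToScan, long alreadyScannedBytes) {
        this.totalBytesToScan = totalBytesToScan;
        this.alreadyScannedBytes = alreadyScannedBytes;
        startNewPeriod();
    }

    // bytesLeft is reset only here, once per scan period.
    void startNewPeriod() {
        bytesLeft = totalBytesToScan;
        deductedThisPeriod = false;
    }

    // Old behaviour as described: every log roll deducts the same bytes again.
    void onLogRollBuggy() {
        bytesLeft -= alreadyScannedBytes;
    }

    // Proposed behaviour: deduct only once after the period starts.
    void onLogRollFixed() {
        if (!deductedThisPeriod) {
            bytesLeft -= alreadyScannedBytes;
            deductedThisPeriod = true;
        }
    }

    // Simplified check: no budget left means the scanner stops for this period.
    boolean workRemainingInCurrentPeriod() {
        return bytesLeft > 0;
    }

    long getBytesLeft() {
        return bytesLeft;
    }

    // Runs the given number of log rolls within one period and returns bytesLeft.
    static long simulate(boolean fixed, int logRolls) {
        BytesLeftSketch s = new BytesLeftSketch(100, 40);
        for (int i = 0; i < logRolls; i++) {
            if (fixed) {
                s.onLogRollFixed();
            } else {
                s.onLogRollBuggy();
            }
        }
        return s.getBytesLeft();
    }

    public static void main(String[] args) {
        System.out.println(simulate(false, 3));  // -20
        System.out.println(simulate(true, 3));   // 60
    }
}
```

With three log rolls in one period, the repeated deduction drives the budget from 100 to -20, so the simplified {{workRemainingInCurrentPeriod()}} reports no work even though new blocks are unscanned. Deducting once leaves 60.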

                
> BlockScanner scans the block multiple times and on restart scans everything
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-5031
>                 URL: https://issues.apache.org/jira/browse/HDFS-5031
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Vinay
>            Assignee: Vinay
>         Attachments: HDFS-5031.patch, HDFS-5031.patch, HDFS-5031.patch
>
>
> BlockScanner scans a block twice, and on restart of the datanode it scans 
> everything.
> Steps:
> 1. Write blocks at intervals of more than 5 seconds, writing each new block 
> after the scan of the previously written block completes.
> Each time the datanode scans a new block, it also rescans the previous block, 
> which has already been scanned.
> After a restart, the datanode scans all blocks again.
