[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771792#comment-13771792
 ] 

Hudson commented on HDFS-5031:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #337 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/337/])
HDFS-5031. BlockScanner scans the block multiple times. (Vinay via Arpit 
Agarwal) (arp: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524553)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/RollingLogs.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/RollingLogsImpl.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeBlockScanner.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java


 BlockScanner scans the block multiple times and on restart scans everything
 ---

 Key: HDFS-5031
 URL: https://issues.apache.org/jira/browse/HDFS-5031
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Vinay
Assignee: Vinay
 Fix For: 2.3.0

 Attachments: HDFS-5031.patch, HDFS-5031.patch, HDFS-5031.patch


 BlockScanner scans each block twice and, after a datanode restart, scans 
 everything again.
 Steps:
 1. Write blocks at intervals of more than 5 seconds, writing each new block 
 only after the scan of the previously written block completes.
 Each time the datanode scans a new block, it also rescans the previous block, 
 which has already been scanned. 
 After a restart, the datanode scans all blocks again.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771869#comment-13771869
 ] 

Hudson commented on HDFS-5031:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1553 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1553/])
HDFS-5031. BlockScanner scans the block multiple times. (Vinay via Arpit 
Agarwal) (arp: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524553)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/RollingLogs.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/RollingLogsImpl.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeBlockScanner.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java




[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771879#comment-13771879
 ] 

Hudson commented on HDFS-5031:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1527 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1527/])
HDFS-5031. BlockScanner scans the block multiple times. (Vinay via Arpit 
Agarwal) (arp: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524553)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/RollingLogs.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/RollingLogsImpl.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeBlockScanner.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java




[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-09-18 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770619#comment-13770619
 ] 

Vinay commented on HDFS-5031:
-

bq. I am still not convinced that the assignment to lastReadFile before the 
call to readNext is correct. Is lastReadFile meant to store the file from which 
the last line was read? If so then the call to readNext can change file, or did 
I understand it wrong?
I agree that {{readNext()}} changes the reference held in {{file}}. However, 
{{next()}} returns the {{curLine}} that was read by the previous call to 
{{readNext()}}. Since the current call returns the line as it was before 
{{readNext()}} ran, {{lastReadFile}} should likewise hold the value {{file}} 
had before {{readNext()}} ran. Otherwise the following problem occurs:
# Suppose a call to {{RollingLogsImpl#next()}} is expected to return the 
second-to-last entry from {{dncp_block_verification.log.prev}}. During this 
call {{RollingLogsImpl#readNext()}} reads the last entry and keeps it in 
{{line}}.
# The next call to {{RollingLogsImpl#next()}} returns that last entry, but 
this time {{readNext()}} opens {{dncp_block_verification.log.cur}} and changes 
{{file}} to point to it.
# Now, while {{BlockPoolSliceScanner#assignInitialVerificationTimes()}} is 
processing the last entry from the prev dncp log, a call to 
{{logIterator.isPrevious()}} returns false because {{file}} already refers to 
the current verification log. The entry is therefore not appended to the 
current verification log, and the block is rescanned after the next roll.
{code:java}
if (logIterator.isPrevious()) {
  // write the log entry to current file
  // so that the entry is preserved for later runs.
  verificationLog.append(entry.verificationTime, entry.genStamp,
      entry.blockId);
}
{code}

{{logIterator.isLastReadFromPrevious()}}, however, returns true in this case, 
so no entry from the prev dncp log is missed.
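
The lookahead behaviour described above can be illustrated with a minimal sketch. This is not the Hadoop code: the class {{LookaheadReaderSketch}}, its in-memory "files" and the integer file index are hypothetical stand-ins, and only the method names {{next}}, {{readNext}}, {{openFile}}, {{isPrevious}} and {{isLastReadFromPrevious}} are borrowed from the discussion. It assumes the reader always holds one line of lookahead, as explained in this comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LookaheadReaderSketch {
  /** Hypothetical reader over a "prev" log followed by a "cur" log. */
  static class Reader {
    private final List<List<String>> files; // index 0 = prev, index 1 = cur
    private int file = -1;         // file the lookahead line came from
    private int pos = 0;           // read position inside that file
    private int lastReadFile = -1; // file of the line last returned by next()
    private String line;           // lookahead line, null when exhausted

    Reader(List<String> prev, List<String> cur) {
      files = Arrays.asList(prev, cur);
      readNext(); // prime the lookahead
    }

    /** Moves to the next file; returns false when there is none. */
    private boolean openFile() {
      file++;
      pos = 0;
      return file < files.size();
    }

    /** Loads the next line into the lookahead, switching files if needed. */
    private void readNext() {
      line = null;
      while (line == null) {
        if (file >= 0 && file < files.size() && pos < files.get(file).size()) {
          line = files.get(file).get(pos++);
        } else if (!openFile()) {
          return; // nothing left to read
        }
      }
    }

    boolean hasNext() {
      return line != null;
    }

    String next() {
      String curLine = line;
      lastReadFile = file; // capture BEFORE the lookahead may switch files
      readNext();
      return curLine;
    }

    boolean isPrevious() {
      return file == 0; // reflects the lookahead, not the returned line
    }

    boolean isLastReadFromPrevious() {
      return lastReadFile == 0; // reflects the line just returned
    }
  }

  /** Returns "entry:isPrevious:isLastReadFromPrevious" for every entry. */
  static List<String> run() {
    Reader r = new Reader(Arrays.asList("a", "b"), Arrays.asList("c"));
    List<String> out = new ArrayList<>();
    while (r.hasNext()) {
      String entry = r.next();
      out.add(entry + ":" + r.isPrevious() + ":" + r.isLastReadFromPrevious());
    }
    return out;
  }

  public static void main(String[] args) {
    for (String s : run()) {
      System.out.println(s);
    }
  }
}
```

In this sketch the last entry of the prev file ("b") is reported as {{b:false:true}}: {{isPrevious()}} already sees the cur file, while {{isLastReadFromPrevious()}} still identifies the entry as coming from prev.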



[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-09-18 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771157#comment-13771157
 ] 

Arpit Agarwal commented on HDFS-5031:
-

Okay that makes sense. The patch looks good, I will commit this shortly.

Thanks!



[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-09-18 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771199#comment-13771199
 ] 

Arpit Agarwal commented on HDFS-5031:
-

I have committed this to trunk and branch-2. Thanks for submitting the 
patch and for your patience, Vinay.



[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771186#comment-13771186
 ] 

Hudson commented on HDFS-5031:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4436 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4436/])
HDFS-5031. BlockScanner scans the block multiple times. (Vinay via Arpit 
Agarwal) (arp: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524553)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/RollingLogs.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/RollingLogsImpl.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeBlockScanner.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java




[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-09-18 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771579#comment-13771579
 ] 

Vinay commented on HDFS-5031:
-

Thanks, Arpit, for the reviews and the commit.



[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769692#comment-13769692
 ] 

Hadoop QA commented on HDFS-5031:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12603601/HDFS-5031.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4982//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4982//console

This message is automatically generated.



[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-09-17 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770189#comment-13770189
 ] 

Arpit Agarwal commented on HDFS-5031:
-

Hi Vinay, thanks for the updated patch. I verified that the new test case fails 
without your code changes.

The patch looks good except for one point. I am still not convinced that the 
assignment to {{lastReadFile}} before the call to {{readNext}} is correct. Is 
{{lastReadFile}} meant to store the file from which the last line was read? If 
so then the call to {{readNext}} can change {{file}}, or did I understand it 
wrong?

{code}
private void readNext() throws IOException {
  ...
  if (line == null) {
    // move to the next file.
    if (openFile()) {
      readNext();
    }
{code}

{quote}
processedBlocks is reset on every log roll, but bytesLeft is reset only in 
startNewPeriod(). As a result, assignInitialVerificationTimes() was needlessly 
decrementing bytesLeft on every log roll, which drove bytesLeft negative. 
Because of this, the scanner was bailing out at workRemainingInCurrentPeriod() 
without scanning the latest blocks. We should decrement bytesLeft only once, 
after starting the new period.
{quote}

Thanks for the explanation, I understand what you are trying to fix now.




[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-09-11 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764063#comment-13764063
 ] 

Vinay commented on HDFS-5031:
-

bq. I ran TestDatanodeBlockScanner#testDuplicateScans without the rest of the 
code changes and it continues to pass. Do you see the same?
Yes, I observed the same yesterday. I had missed one assertion; it will be 
added in the upcoming patch.
bq. I did not understand how the isNewPeriod check works. I will continue to 
take a look but meanwhile if someone more familiar with this code wants to 
chime in please do so.
{{processedBlocks}} is reset on every log roll, but {{bytesLeft}} is reset 
only in {{startNewPeriod()}}. As a result, 
{{assignInitialVerificationTimes()}} was needlessly decrementing 
{{bytesLeft}} on every log roll, which drove {{bytesLeft}} negative. Because 
of this, the scanner was bailing out at {{workRemainingInCurrentPeriod()}} 
without scanning the latest blocks. We should decrement {{bytesLeft}} only 
once, after starting the new period.
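
The accounting problem can be sketched with a toy model. This is a simplified illustration under stated assumptions, not the actual {{BlockPoolSliceScanner}} logic: the class {{ScanBudgetSketch}}, the byte counts, the {{decrementOnlyOnNewPeriod}} switch and the {{bytesLeft > 0}} threshold are all hypothetical, and only the method and field names are taken from the explanation above.

```java
class ScanBudgetSketch {
  private long bytesLeft; // remaining scan budget for the current period
  private final boolean decrementOnlyOnNewPeriod; // true = behaviour after the fix

  ScanBudgetSketch(boolean decrementOnlyOnNewPeriod) {
    this.decrementOnlyOnNewPeriod = decrementOnlyOnNewPeriod;
  }

  /** Resets the budget; in this model it happens once per scan period. */
  void startNewPeriod(long totalBytesToScan) {
    bytesLeft = totalBytesToScan;
  }

  /**
   * Accounts for bytes already verified according to the log. In this model it
   * runs once when the period starts and again after every log roll.
   */
  void assignInitialVerificationTimes(long alreadyVerifiedBytes,
      boolean isNewPeriod) {
    if (!decrementOnlyOnNewPeriod || isNewPeriod) {
      bytesLeft -= alreadyVerifiedBytes;
    }
  }

  /** Assumed threshold: scanning continues only while budget remains. */
  boolean workRemainingInCurrentPeriod() {
    return bytesLeft > 0;
  }

  long getBytesLeft() {
    return bytesLeft;
  }

  /** Runs one period with the given number of log rolls. */
  static ScanBudgetSketch simulate(boolean fixed, int logRolls) {
    ScanBudgetSketch s = new ScanBudgetSketch(fixed);
    s.startNewPeriod(100);
    s.assignInitialVerificationTimes(60, true); // first pass of the period
    for (int i = 0; i < logRolls; i++) {
      s.assignInitialVerificationTimes(60, false); // repeated on each roll
    }
    return s;
  }

  public static void main(String[] args) {
    System.out.println(simulate(false, 2).getBytesLeft()); // -80
    System.out.println(simulate(true, 2).getBytesLeft());  // 40
  }
}
```

With the decrement repeated on every roll, the budget goes negative after two rolls and {{workRemainingInCurrentPeriod()}} reports no work; with the decrement applied only for a new period, 40 bytes of budget remain.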

bq. BlockScanInfo#equals looks redundant now. Can we just remove it?
Yes, I will remove it in the next patch.

bq. In Reader#next, should the assignment to lastReadFile happen after the call 
to readNext?

{{Reader#next}} does not read a new line and return it; it returns the 
previously read line. So assigning {{lastReadFile}} before the call to 
{{readNext}} is correct.



[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-09-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764148#comment-13764148
 ] 

Hadoop QA commented on HDFS-5031:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12602541/HDFS-5031.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4954//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4954//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4954//console

This message is automatically generated.



[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-09-11 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764158#comment-13764158
 ] 

Vinay commented on HDFS-5031:
-

{quote}BlockScanInfo#equals looks redundant now. Can we just remove it?
Yes, I will remove in next patch{quote}
The Findbugs warning is due to this. It seems that overriding equals() is 
necessary even though it is redundant. Any thoughts?
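
For context on why the shape of {{equals()}} matters for lookups by a plain {{Block}} key, here is a minimal sketch. All classes in it ({{EqualsLookupSketch}}, the simplified {{Block}}, {{StrictScanInfo}}, {{LenientScanInfo}}) and the {{find}} helper are hypothetical and are not Hadoop code. It assumes the container asks the stored element whether it equals the lookup key. Which Findbugs rule fired is not stated in this thread, so the sketch does not try to reproduce the warning itself.

```java
class EqualsLookupSketch {
  /** Simplified stand-in for a block identified only by its id. */
  static class Block {
    final long blockId;

    Block(long blockId) {
      this.blockId = blockId;
    }

    @Override
    public int hashCode() {
      return Long.hashCode(blockId);
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof Block && ((Block) o).blockId == blockId;
    }
  }

  /** Inherits equals() from Block, so it matches a plain Block key. */
  static class LenientScanInfo extends Block {
    LenientScanInfo(long blockId) {
      super(blockId);
    }
  }

  /** Accepts only its own type, so it never matches a plain Block key. */
  static class StrictScanInfo extends Block {
    StrictScanInfo(long blockId) {
      super(blockId);
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof StrictScanInfo && super.equals(o);
    }

    @Override
    public int hashCode() {
      return super.hashCode();
    }
  }

  /** Assumed lookup style: the STORED element is asked e.equals(key). */
  static Block find(Block[] stored, Block key) {
    for (Block e : stored) {
      if (e.equals(key)) {
        return e;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    Block key = new Block(7);
    // Strict equals: lookup by a plain Block finds nothing.
    System.out.println(find(new Block[] {new StrictScanInfo(7)}, key) != null);
    // Inherited equals: the same lookup succeeds.
    System.out.println(find(new Block[] {new LenientScanInfo(7)}, key) != null);
  }
}
```

Under these assumptions the strict variant is never found through a plain {{Block}} key, which would make every lookup look like a block that has not been scanned yet.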



[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-09-11 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764550#comment-13764550
 ] 

Arpit Agarwal commented on HDFS-5031:
-

We can add it back if findbugs is unhappy. I'll take a look at the updated 
patch. Thanks.



[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-09-10 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763413#comment-13763413
 ] 

Arpit Agarwal commented on HDFS-5031:
-

Hi Vinay, thanks for the explanation! I apologize for the delay in reviewing 
this.

# I ran {{TestDatanodeBlockScanner#testDuplicateScans}} without the rest of the 
code changes and it continues to pass. Do you see the same?
# I did not understand how the {{isNewPeriod}} check works. I will continue to 
take a look but meanwhile if someone more familiar with this code wants to 
chime in please do so.

Minor points:
# {{BlockScanInfo#equals}} looks redundant now. Can we just remove it?
# In {{Reader#next}}, should the assignment to {{lastReadFile}} happen after 
the call to {{readNext}}?





[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-08-27 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13751930#comment-13751930
 ] 

Arpit Agarwal commented on HDFS-5031:
-

Hi [~vinayrpet],

Could you please describe your approach briefly to make the code review easier?

{quote}
if you feel, its not actually blocker, can change it to Major. No issues.
{quote}
I have downgraded it to Major.

Thanks.

 BlockScanner scans the block multiple times and on restart scans everything
 ---

 Key: HDFS-5031
 URL: https://issues.apache.org/jira/browse/HDFS-5031
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Vinay
Assignee: Vinay
Priority: Blocker
 Attachments: HDFS-5031.patch




[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-08-27 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752092#comment-13752092
 ] 

Vinay commented on HDFS-5031:
-

Thanks [~arpitagarwal] for taking a look at the Jira.
bq. Could you please describe your approach briefly to make the code review easier?

Sure.

There were multiple issues:
1. Storing to and retrieving from {{blockMap}}. The problem was introduced 
when {{blockMap}} in BlockPoolSliceScanner was switched from HashMap to 
LightWeightGSet and {{BlockScanInfo.equals()}} was rewritten. 
{{BlockScanInfo.equals()}} strictly checks that the other object is an instance 
of BlockScanInfo, but almost all retrievals from {{blockMap}} use an instance 
of {{Block}}, so the lookup always returns null and the block is scanned 
again.

2. {{logIterator.isPrevious()}} mistakenly treated the last entry of the 
previous dncp log as belonging to the current log. After the previous log's 
last entry was read, but before it was returned, the stream was opened to read 
the current dncp log, so {{logIterator.isPrevious()}} returned false at that 
point. As a result, every log roll dropped one entry from the scan log info, 
and the missed blocks were scanned again before the scan period (21 days by 
default) had elapsed.

3. With the first two issues fixed, one more issue appears: an invalid value 
of {{bytesLeft}}. After a log roll, scanning does not happen at all until 
roughly as many bytes as were present before the roll have been written again. 
{{bytesLeft}} should be incremented when a block is added and decremented when 
a block is scanned, but it was also being decremented on every roll. This 
drove {{bytesLeft}} negative, so the scanner simply returned from 
{{workRemainingInCurrentPeriod()}} without scanning new blocks.
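
A minimal sketch of issues 1 and 3 above, using hypothetical stand-in classes 
rather than the real Hadoop code. The names StrictScanInfo, LenientScanInfo, 
find, ScanBudget and buggyRoll are invented for illustration. Two things are 
assumed here and not confirmed by the patch: that the lookup compares the 
stored element against the key, and that the roll subtracted the 
already-scanned bytes a second time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not the Hadoop classes.
class ScannerIssuesSketch {

    /** Minimal block identity keyed by id. */
    static class Block {
        final long id;
        Block(long id) { this.id = id; }
        @Override public int hashCode() { return Long.hashCode(id); }
        @Override public boolean equals(Object o) {
            return o instanceof Block && ((Block) o).id == id;
        }
    }

    /** Issue 1, failing shape: equal only to another StrictScanInfo. */
    static class StrictScanInfo extends Block {
        StrictScanInfo(long id) { super(id); }
        @Override public boolean equals(Object o) {
            return o instanceof StrictScanInfo && super.equals(o);
        }
    }

    /** Issue 1, working shape: inherits Block.equals, so a plain Block key matches. */
    static class LenientScanInfo extends Block {
        LenientScanInfo(long id) { super(id); }
    }

    /** Assumed lookup: the stored element is compared against the key. */
    static <E extends Block> E find(List<E> stored, Block key) {
        for (E e : stored) {
            if (e.equals(key)) {
                return e;
            }
        }
        return null;
    }

    /** Issue 3: byte budget that gates whether the scanner has work to do. */
    static class ScanBudget {
        long bytesLeft;
        void blockAdded(long len) { bytesLeft += len; }
        void blockScanned(long len) { bytesLeft -= len; }
        // Assumed shape of the bug: a roll subtracts already-scanned bytes again.
        void buggyRoll(long alreadyScanned) { bytesLeft -= alreadyScanned; }
        boolean workRemaining() { return bytesLeft > 0; }
    }

    public static void main(String[] args) {
        List<StrictScanInfo> strict = new ArrayList<>();
        strict.add(new StrictScanInfo(1L));
        List<LenientScanInfo> lenient = new ArrayList<>();
        lenient.add(new LenientScanInfo(1L));
        Block key = new Block(1L);
        System.out.println("strict lookup hit:  " + (find(strict, key) != null));
        System.out.println("lenient lookup hit: " + (find(lenient, key) != null));

        ScanBudget budget = new ScanBudget();
        budget.blockAdded(100);    // block written
        budget.blockScanned(100);  // block scanned, budget back to zero
        budget.buggyRoll(100);     // roll subtracts the same bytes again
        budget.blockAdded(50);     // new block arrives after the roll
        System.out.println("work remaining after buggy roll: " + budget.workRemaining());
    }
}
```

With the strict equals, a lookup keyed by a plain Block misses, which is how an 
already-scanned block would look unscanned. In the budget part, the extra 
subtraction leaves the counter negative even after a new block is added, so 
the scanner sees no work until enough new bytes arrive to offset it.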






[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-07-30 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723858#comment-13723858
 ] 

Vinay commented on HDFS-5031:
-

Scanning is happening multiple times. Also, if the datanode is restarted, 
scanning happens for all blocks. I marked it as a blocker because this is a 
major problem in the datanode block scanner.

If you feel it is not actually a blocker, you can change it to Major. No issues.



[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-07-29 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723351#comment-13723351
 ] 

Aaron T. Myers commented on HDFS-5031:
--

Hi Vinay, not sure that this should be considered a blocker. Is it a 
regression? What actual problem is it causing?



[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

2013-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719666#comment-13719666
 ] 

Hadoop QA commented on HDFS-5031:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12594172/HDFS-5031.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4731//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4731//console

This message is automatically generated.
