[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771792#comment-13771792 ]

Hudson commented on HDFS-5031:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #337 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/337/])
HDFS-5031. BlockScanner scans the block multiple times. (Vinay via Arpit Agarwal) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524553)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/RollingLogs.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/RollingLogsImpl.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeBlockScanner.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java

BlockScanner scans the block multiple times and on restart scans everything
---
Key: HDFS-5031
URL: https://issues.apache.org/jira/browse/HDFS-5031
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Vinay
Assignee: Vinay
Fix For: 2.3.0
Attachments: HDFS-5031.patch, HDFS-5031.patch, HDFS-5031.patch

BlockScanner scans each block twice, and on datanode restart it scans everything again.
Steps:
1. Write blocks at intervals of more than 5 seconds, writing each new block only after the scan of the previously written block has completed.
Each time the datanode scans a new block, it also rescans the previous block, which has already been scanned.
After a restart, the datanode scans all blocks again.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771869#comment-13771869 ]

Hudson commented on HDFS-5031:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1553 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1553/])
HDFS-5031. BlockScanner scans the block multiple times. (Vinay via Arpit Agarwal) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524553)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/RollingLogs.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/RollingLogsImpl.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeBlockScanner.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771879#comment-13771879 ]

Hudson commented on HDFS-5031:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1527 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1527/])
HDFS-5031. BlockScanner scans the block multiple times. (Vinay via Arpit Agarwal) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524553)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/RollingLogs.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/RollingLogsImpl.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeBlockScanner.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770619#comment-13770619 ]

Vinay commented on HDFS-5031:
-

bq. I am still not convinced that the assignment to lastReadFile before the call to readNext is correct. Is lastReadFile meant to store the file from which the last line was read? If so then the call to readNext can change file, or did I understand it wrong?

I agree that {{readNext()}} will change the reference held in {{file}}. However, {{next()}} returns the {{curLine}} that was read during the previous call to {{readNext()}}. Since the current call uses the value of the line from before {{readNext()}} runs, {{lastReadFile}} should likewise hold the value of {{file}} from before {{readNext()}} runs. Otherwise, the following problem occurs.

# Suppose a call to {{RollingLogsImpl#next()}} is expected to return the last-but-one entry from {{dncp_block_verification.log.prev}}. During this call, {{RollingLogsImpl#readNext()}} reads the last entry and keeps it in {{line}}.
# One more call to {{RollingLogsImpl#next()}} returns the last entry, which was read in the previous call. This time, however, {{readNext()}} opens {{dncp_block_verification.log.cur}} and changes {{file}} to {{dncp_block_verification.log.cur}}.
# Now, in {{BlockPoolSliceScanner#assignInitialVerificationTimes()}}, if {{logIterator.isPrevious()}} is called while processing the last entry from the prev dncp log, it returns false because {{file}} refers to the current verification log. Hence this entry is not appended to the current verification log, and the block will be re-scanned after the next roll.

{code:java}
if (logIterator.isPrevious()) {
  // write the log entry to current file
  // so that the entry is preserved for later runs.
  verificationLog.append(entry.verificationTime, entry.genStamp,
      entry.blockId);
}
{code}

{{logIterator.isLastReadFromPrevious()}}, on the other hand, returns true in this case, so no entry from the prev dncp log is missed.
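The one-line lookahead that Vinay describes can be illustrated with a minimal sketch. This is not the Hadoop `RollingLogsImpl` code: the class `LookaheadLogReader`, its in-memory lists standing in for the prev and cur log files, and the integer file indexes are all assumptions made for illustration. Only the method names `next()`, `readNext()`, `openFile()`, `isPrevious()` and `isLastReadFromPrevious()` and the field names `file`, `line` and `lastReadFile` are taken from the discussion.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a rolling-log reader; not the Hadoop class. */
class LookaheadLogReader implements Iterator<String> {
  private final List<List<String>> files; // index 0 = prev log, index 1 = cur log
  private Iterator<String> in;            // stream over the currently open file
  private int file = -1;                  // index of the currently open file
  private int lastReadFile = -1;          // file that supplied the last returned line
  private String line;                    // lookahead: the line next() will return

  LookaheadLogReader(List<String> prev, List<String> cur) {
    files = Arrays.asList(prev, cur);
    readNext();
  }

  /** Opens the next file if there is one. */
  private boolean openFile() {
    if (file + 1 >= files.size()) {
      in = null;
      return false;
    }
    file++;
    in = files.get(file).iterator();
    return true;
  }

  /** Fills the lookahead, moving to the next file when the current one is exhausted. */
  private void readNext() {
    line = null;
    while (line == null) {
      if (in != null && in.hasNext()) {
        line = in.next();
      } else if (!openFile()) {
        return;
      }
    }
  }

  @Override public boolean hasNext() { return line != null; }

  @Override public String next() {
    if (line == null) throw new NoSuchElementException();
    String result = line;
    lastReadFile = file; // recorded before readNext() can advance `file`
    readNext();
    return result;
  }

  /** Answers for the file currently open, which may already be the cur log. */
  boolean isPrevious() { return file == 0; }

  /** Answers for the file that supplied the entry just returned by next(). */
  boolean isLastReadFromPrevious() { return lastReadFile == 0; }
}

class LookaheadDemo {
  public static void main(String[] args) {
    LookaheadLogReader r =
        new LookaheadLogReader(Arrays.asList("b1", "b2"), Arrays.asList("b3"));
    r.next();               // returns "b1"
    String last = r.next(); // returns "b2", the last entry of the prev log
    System.out.println(last + " isPrevious=" + r.isPrevious()
        + " isLastReadFromPrevious=" + r.isLastReadFromPrevious());
  }
}
```

In this sketch, after the second `next()` the lookahead has already moved into the cur log, so `isPrevious()` reports false for an entry that came from the prev log, while `isLastReadFromPrevious()` still reports true.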
[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771157#comment-13771157 ]

Arpit Agarwal commented on HDFS-5031:
-

Okay that makes sense. The patch looks good, I will commit this shortly. Thanks!
[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771199#comment-13771199 ]

Arpit Agarwal commented on HDFS-5031:
-

I have committed this to trunk and branch-2. Thanks for submitting the patch and for your patience, Vinay.
[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771186#comment-13771186 ]

Hudson commented on HDFS-5031:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4436 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4436/])
HDFS-5031. BlockScanner scans the block multiple times. (Vinay via Arpit Agarwal) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524553)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/RollingLogs.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/RollingLogsImpl.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeBlockScanner.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771579#comment-13771579 ]

Vinay commented on HDFS-5031:
-

Thanks Arpit for reviews and commit.
[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769692#comment-13769692 ]

Hadoop QA commented on HDFS-5031:
-

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12603601/HDFS-5031.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4982//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4982//console

This message is automatically generated.
[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770189#comment-13770189 ]

Arpit Agarwal commented on HDFS-5031:
-

Hi Vinay, thanks for the updated patch. I verified that the new test case fails without your code changes.

The patch looks good except for one point. I am still not convinced that the assignment to {{lastReadFile}} before the call to {{readNext}} is correct. Is {{lastReadFile}} meant to store the file from which the last line was read? If so then the call to {{readNext}} can change {{file}}, or did I understand it wrong?

{code}
private void readNext() throws IOException {
  ...
  if (line == null) {
    // move to the next file.
    if (openFile()) {
      readNext();
    }
{code}

{quote}
processedBlocks is getting reset for every log roll, but bytesLeft is getting reset only for every startNewPeriod(), so on every log roll unnecessory bytesLeft was getting decremented in assignInitialVerificationTimes() which was resulting in negative values of bytesLeft. Due to this scanning was returning from workRemainingInCurrentPeriod() without scanning latest blocks. We should decrement it only once after starting the new period.
{quote}

Thanks for the explanation, I understand what you are trying to fix now.
[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764063#comment-13764063 ]

Vinay commented on HDFS-5031:
-

bq. I ran TestDatanodeBlockScanner#testDuplicateScans without the rest of the code changes and it continues to pass. Do you see the same?

Yes, I also observed this yesterday. I had missed one assertion; it will be added in the upcoming patch.

bq. I did not understand how the isNewPeriod check works. I will continue to take a look but meanwhile if someone more familiar with this code wants to chime in please do so.

{{processedBlocks}} is reset on every log roll, but {{bytesLeft}} is reset only on every {{startNewPeriod()}}. As a result, on every log roll {{bytesLeft}} was being decremented unnecessarily in {{assignInitialVerificationTimes()}}, which drove {{bytesLeft}} to negative values. Because of this, scanning was returning from {{workRemainingInCurrentPeriod()}} without scanning the latest blocks. We should decrement it only once, after starting the new period.

bq. BlockScanInfo#equals looks redundant now. Can we just remove it?

Yes, I will remove it in the next patch.

bq. In Reader#next, should the assignment to lastReadFile happen after the call to readNext?

{{Reader#next}} does not actually read a new line and return it; it only returns the previously read line. So assigning {{lastReadFile}} before {{readNext}} is correct.
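The {{bytesLeft}} accounting described above can be sketched with a simplified model. Everything here is an assumption for illustration: `ScanBudget` is a hypothetical class, not part of `BlockPoolSliceScanner`, and the idea that "work remains" means `bytesLeft > 0` is a simplification of whatever {{workRemainingInCurrentPeriod()}} actually checks. The sketch only shows why subtracting already-verified bytes on every log roll, rather than once per period, can push the counter negative.

```java
/** Hypothetical, simplified model of the per-period scan budget. */
class ScanBudget {
  private long bytesLeft;

  /** Resets the budget; in this model it happens only when a new period starts. */
  void startNewPeriod(long totalBytes) {
    bytesLeft = totalBytes;
  }

  /** Old behaviour in this model: verified bytes are subtracted on every log roll. */
  void replayOnRollBuggy(long verifiedBytes) {
    bytesLeft -= verifiedBytes;
  }

  /** Guarded behaviour: verified bytes are subtracted only once, for a new period. */
  void replayOnRoll(long verifiedBytes, boolean isNewPeriod) {
    if (isNewPeriod) {
      bytesLeft -= verifiedBytes;
    }
  }

  long bytesLeft() {
    return bytesLeft;
  }

  /** Assumed stand-in for the "is there anything left to scan" check. */
  boolean workRemaining() {
    return bytesLeft > 0;
  }
}

class ScanBudgetDemo {
  public static void main(String[] args) {
    ScanBudget buggy = new ScanBudget();
    buggy.startNewPeriod(100);
    buggy.replayOnRollBuggy(60); // start of period
    buggy.replayOnRollBuggy(60); // a later log roll subtracts the same bytes again
    System.out.println("buggy bytesLeft=" + buggy.bytesLeft());

    ScanBudget guarded = new ScanBudget();
    guarded.startNewPeriod(100);
    guarded.replayOnRoll(60, true);  // start of period
    guarded.replayOnRoll(60, false); // a later log roll leaves the budget alone
    System.out.println("guarded bytesLeft=" + guarded.bytesLeft());
  }
}
```

With these numbers the unguarded variant ends at -20 and reports no remaining work even though 40 bytes are still unscanned, while the guarded variant stays at 40.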
[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764148#comment-13764148 ]

Hadoop QA commented on HDFS-5031:
-

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12602541/HDFS-5031.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4954//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/4954//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4954//console

This message is automatically generated.
[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764158#comment-13764158 ]

Vinay commented on HDFS-5031:
-

{quote}
BlockScanInfo#equals looks redundant now. Can we just remove it?
Yes, I will remove in next patch
{quote}

The Findbugs warning is due to this. It seems that overriding equals(), even though it is redundant, is necessary. Any thoughts?
[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764550#comment-13764550 ]

Arpit Agarwal commented on HDFS-5031:
-

We can add it back if findbugs is unhappy. I'll take a look at the updated patch. Thanks.
[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763413#comment-13763413 ]

Arpit Agarwal commented on HDFS-5031:
-

Hi Vinay, thanks for the explanation! I apologize for the delay in reviewing this.

# I ran {{TestDatanodeBlockScanner#testDuplicateScans}} without the rest of the code changes and it continues to pass. Do you see the same?
# I did not understand how the {{isNewPeriod}} check works. I will continue to take a look but meanwhile if someone more familiar with this code wants to chime in please do so.

Minor points:
# {{BlockScanInfo#equals}} looks redundant now. Can we just remove it?
# In {{Reader#next}}, should the assignment to {{lastReadFile}} happen after the call to {{readNext}}?
[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751930#comment-13751930 ] Arpit Agarwal commented on HDFS-5031: - Hi [~vinayrpet], Could you please describe your approach briefly to make the code review easier? {quote} if you feel, its not actually blocker, can change it to Major. No issues. {quote} I have downgraded it to Major. Thanks. BlockScanner scans the block multiple times and on restart scans everything --- Key: HDFS-5031 URL: https://issues.apache.org/jira/browse/HDFS-5031 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.1.0-beta Reporter: Vinay Assignee: Vinay Priority: Blocker Attachments: HDFS-5031.patch BlockScanner scans the block twice, also on restart of datanode scans everything. Steps: 1. Write blocks with interval of more than 5 seconds. write new block on completion of scan for written block. Each time datanode scans new block, it also scans, previous block which is already scanned. Now after restart, datanode scans all blocks again. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752092#comment-13752092 ]

Vinay commented on HDFS-5031:
-

Thanks [~arpitagarwal] for taking a look at the Jira.

bq. Could you please describe your approach briefly to make the code review easier?

Sure. There were multiple issues:

1. Storing to and retrieval from {{blockMap}}. The problem was introduced when LightWeightGSet replaced HashMap for 'blockMap' in BlockPoolSliceScanner and BlockScanInfo.equals() was rewritten. {{BlockScanInfo.equals()}} strictly checks for an instance of BlockScanInfo, but almost all retrievals from {{blockMap}} are done using an instance of {{Block}}, so the lookup always returns null and hence the scan happens again.

2. {{logIterator.isPrevious()}} was mistakenly treating the last entry in the previous dncp log as being present in the current log. This happened when the prev log's last entry had been read and, before the entry was returned, the stream was opened to read the current dncp log. At that point {{logIterator.isPrevious()}} returned false, so every log roll dropped one entry from the scan log info. These missed blocks are therefore scanned again before the scan period (21 days by default) has elapsed.

3. After fixing the first two issues, one more issue appears: an invalid value of {{bytesLeft}}. After one log roll, scanning does not happen at all until a further batch of blocks has been written (equal in bytes to what was written before the roll). {{bytesLeft}} should be incremented when a block is added and decremented when a block is scanned, but it was being decremented every time a roll happened. This took {{bytesLeft}} to a negative value, so the scanner simply returned from {{workRemainingInCurrentPeriod()}} without scanning new blocks.
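The {{bytesLeft}} problem from point 3 of Vinay's comment can be illustrated the same way. This is only a sketch under stated assumptions: {{ScanBudgetSketch}} and all of its methods are hypothetical, the check is reduced to "is the counter positive", and the amount subtracted on a roll is assumed to be the bytes that were already scanned before the roll. The real {{workRemainingInCurrentPeriod()}} may take other conditions into account.

```java
final class ScanBudgetSketch {

    /** Bytes that have been added but not yet scanned. */
    private long bytesLeft;

    /** A new block was added: more bytes are waiting to be scanned. */
    void blockAdded(long numBytes) {
        bytesLeft += numBytes;
    }

    /** A block was scanned: its bytes are no longer waiting. */
    void blockScanned(long numBytes) {
        bytesLeft -= numBytes;
    }

    /** Faulty roll handling (assumed): subtracts bytes that were already subtracted. */
    void rollBuggy(long bytesScannedBeforeRoll) {
        bytesLeft -= bytesScannedBeforeRoll;
    }

    /** Roll handling that leaves the counter untouched. */
    void rollFixed() {
        // Intentionally empty: a log roll does not change how much work is pending.
    }

    /** Simplified check: there is work only while the counter is positive. */
    boolean workRemaining() {
        return bytesLeft > 0;
    }

    long getBytesLeft() {
        return bytesLeft;
    }

    public static void main(String[] args) {
        ScanBudgetSketch buggy = new ScanBudgetSketch();
        buggy.blockAdded(100);
        buggy.blockScanned(100);
        buggy.rollBuggy(100);   // counter is pushed below zero
        buggy.blockAdded(50);   // new block arrives after the roll
        System.out.println("buggy: bytesLeft=" + buggy.getBytesLeft()
                + " workRemaining=" + buggy.workRemaining());

        ScanBudgetSketch fixed = new ScanBudgetSketch();
        fixed.blockAdded(100);
        fixed.blockScanned(100);
        fixed.rollFixed();      // counter stays at zero
        fixed.blockAdded(50);
        System.out.println("fixed: bytesLeft=" + fixed.getBytesLeft()
                + " workRemaining=" + fixed.workRemaining());
    }
}
```

With the faulty roll, the new 50-byte block leaves the counter at -50, so the sketch reports no remaining work and the block is not picked up. With the counter left alone on a roll, the same block makes the counter positive and scanning can proceed.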
[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723858#comment-13723858 ]

Vinay commented on HDFS-5031:
-----------------------------

Scanning is happening multiple times. Also, if the datanode is restarted, scanning happens again for all blocks. This is a major problem in the datanode block scanner.

If you feel it is not actually a blocker, you can change it to Major. No issues.
[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723351#comment-13723351 ]

Aaron T. Myers commented on HDFS-5031:
--------------------------------------

Hi Vinay, not sure that this should be considered a blocker. Is it a regression? What actual problem is it causing?
[jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
[ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719666#comment-13719666 ]

Hadoop QA commented on HDFS-5031:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12594172/HDFS-5031.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4731//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4731//console

This message is automatically generated.