[ https://issues.apache.org/jira/browse/HDFS-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012665#comment-13012665 ]
Hadoop QA commented on HDFS-1767: --------------------------------- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12474913/DelaySecondBR_v3.patch against trunk revision 1086654. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.server.datanode.TestBlockReport org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestFileConcurrentReader -1 contrib tests. The patch failed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/302//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/302//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/302//console This message is automatically generated. > Namenode should ignore non-initial block reports from datanodes when in > safemode during startup > ----------------------------------------------------------------------------------------------- > > Key: HDFS-1767 > URL: https://issues.apache.org/jira/browse/HDFS-1767 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node > Affects Versions: 0.20.204, 0.22.0, 0.23.0 > Reporter: Matt Foley > Assignee: Matt Foley > Fix For: 0.23.0 > > Attachments: DelaySecondBR_v1.patch, DelaySecondBR_v2.patch, > DelaySecondBR_v3-0.20-security.patch, DelaySecondBR_v3.patch, table.csv, > table_tab.csv > > > Consider a large cluster that takes 40 minutes to start up. The datanodes > compete to register and send their Initial Block Reports (IBRs) as fast as > they can after startup (subject to a small sub-two-minute random delay, which > isn't relevant to this discussion). > As each datanode succeeds in sending its IBR, it schedules the starting time > for its regular cycle of reports, every hour (or other configured value of > dfs.blockreport.intervalMsec). In order to spread the reports evenly across > the block report interval, each datanode picks a random fraction of that > interval, for the starting point of its regular report cycle. For example, > if a particular datanode ends up randomly selecting 18 minutes after the > hour, then that datanode will send a Block Report at 18 minutes after the > hour every hour as long as it remains up. Other datanodes will start their > cycles at other randomly selected times. This code is in > DataNode.blockReport() and DataNode.scheduleBlockReport(). > The "second Block Report" (2BR), is the start of these hourly reports. The > problem is that some of these 2BRs get scheduled sooner rather than later, > and actually occur within the startup period. For example, if the cluster > takes 40 minutes (2/3 of an hour) to start up, then out of the datanodes that > succeed in sending their IBRs during the first 10 minutes, between 1/2 and > 2/3 of them will send their 2BR before the 40-minute startup time has > completed! > 2BRs sent within the startup time actually compete with the remaining IBRs, > and thereby slow down the overall startup process. This can be seen in the > following data, which shows the startup process for a 3700-node cluster that > took about 17 minutes to finish startup: > {noformat} > time starts sum regs sum IBR sum 2nd_BR sum total_BRs/min > 0 1299799498 3042 3042 1969 1969 151 151 0 151 > 1 1299799558 665 3707 1470 3439 248 399 0 248 > 2 1299799618 3707 224 3663 270 669 0 270 > 3 1299799678 3707 14 3677 261 930 3 3 264 > 4 1299799738 3707 23 3700 288 1218 1 4 289 > 5 1299799798 3707 7 3707 258 1476 3 7 261 > 6 1299799858 3707 3707 317 1793 4 11 321 > 7 1299799918 3707 3707 292 2085 6 17 298 > 8 1299799978 3707 3707 292 2377 8 25 300 > 9 1299800038 3707 3707 272 2649 25 272 > 10 1299800098 3707 3707 280 2929 15 40 295 > 11 1299800158 3707 3707 223 3152 14 54 237 > 12 1299800218 3707 3707 143 3295 54 143 > 13 1299800278 3707 3707 141 3436 20 74 161 > 14 1299800338 3707 3707 195 3631 78 152 273 > 15 1299800398 3707 3707 51 3682 209 361 260 > 16 1299800458 3707 3707 25 3707 369 730 394 > 17 1299800518 3707 3707 3707 166 896 166 > 18 1299800578 3707 3707 3707 72 968 72 > 19 1299800638 3707 3707 3707 67 1035 67 > 20 1299800698 3707 3707 3707 75 1110 75 > 21 1299800758 3707 3707 3707 71 1181 71 > 22 1299800818 3707 3707 3707 67 1248 67 > 23 1299800878 3707 3707 3707 62 1310 62 > 24 1299800938 3707 3707 3707 56 1366 56 > 25 1299800998 3707 3707 3707 60 1426 60 > {noformat} > This data was harvested from the startup logs of all the datanodes, and > correlated into one-minute buckets. Each row of the table represents the > progress during one elapsed minute of clock time. It seems that every > cluster startup is different, but this one showed the effect fairly well. > The "starts" column shows that all the nodes started up within the first 2 > minutes, and the "regs" column shows that all succeeded in registering by > minute 6. The IBR column shows a sustained rate of Initial Block Report > processing of 250-300/minute for the first 10 minutes. > The question is why, during minutes 11 through 16, the rate of IBR processing > slowed down. Why didn't the startup just finish? In the "2nd_BR" column, we > see the rate of 2BRs ramping up as more datanodes complete their IBRs. As > the rate increases, they become more effective at competing with the IBRs, > and slow down the IBR processing even more. After the IBRs finally finish in > minute 16, the rate of 2BRs settles down to a steady ~60-70/minute. > In order to decrease competition for locks and other resources, to speed up > IBR processing during startup, we propose to delay 2BRs until later into the > cycle. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira