[ https://issues.apache.org/jira/browse/HDFS-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777899#comment-16777899 ]
Hadoop QA commented on HDFS-14314:
----------------------------------

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 34s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 17m 9s | trunk passed |
| +1 | compile | 0m 58s | trunk passed |
| +1 | checkstyle | 0m 51s | trunk passed |
| +1 | mvnsite | 1m 3s | trunk passed |
| +1 | shadedclient | 13m 10s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 10s | trunk passed |
| +1 | javadoc | 0m 51s | trunk passed |
|| || || || Patch Compile Tests ||
| -1 | mvninstall | 0m 50s | hadoop-hdfs in the patch failed. |
| -1 | compile | 0m 51s | hadoop-hdfs in the patch failed. |
| -1 | javac | 0m 51s | hadoop-hdfs in the patch failed. |
| -0 | checkstyle | 0m 47s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 12 new + 52 unchanged - 0 fixed = 64 total (was 52) |
| -1 | mvnsite | 0m 53s | hadoop-hdfs in the patch failed. |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | shadedclient | 3m 21s | patch has errors when building and testing our client artifacts. |
| -1 | findbugs | 0m 32s | hadoop-hdfs in the patch failed. |
| +1 | javadoc | 0m 45s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 0m 52s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 23s | The patch does not generate ASF License warnings. |
| | | 45m 43s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14314 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12960171/HDFS-14314-trunk.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 24b01f282ab8 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 59ba355 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| mvninstall | https://builds.apache.org/job/PreCommit-HDFS-Build/26328/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt |
| compile | https://builds.apache.org/job/PreCommit-HDFS-Build/26328/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs.txt |
| javac | https://builds.apache.org/job/PreCommit-HDFS-Build/26328/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/26328/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| mvnsite | https://builds.apache.org/job/PreCommit-HDFS-Build/26328/artifact/out/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt |
| findbugs | https://builds.apache.org/job/PreCommit-HDFS-Build/26328/artifact/out/patch-findbugs-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/26328/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/26328/testReport/ |
| Max. process+thread count | 306 (vs. ulimit of 10000) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/26328/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> fullBlockReportLeaseId should be reset after registering to NN
> --------------------------------------------------------------
>
>                 Key: HDFS-14314
>                 URL: https://issues.apache.org/jira/browse/HDFS-14314
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.8.4
>        Environment:
>           Reporter: star
>           Priority: Critical
>            Fix For: 2.8.4
>        Attachments: HDFS-14314-trunk.001.patch, HDFS-14314-trunk.001.patch, HDFS-14314.0.patch, HDFS-14314.2.patch, HDFS-14314.patch
>
> Since HDFS-7923, to rate-limit DN block reports, a DN asks the active NN for a full block report lease ID before sending a full block report, and then sends the report together with that lease ID. If the lease ID is invalid, the NN rejects the full block report and logs "not in the pending set".
> Consider the case where a DN is performing a full block report while the NN is restarted. The DN may later send a full block report with a lease ID acquired from the previous NN instance, which is invalid to the new NN instance. Although the DN recognizes the new NN instance via heartbeat and re-registers itself, it does not reset the lease ID obtained from the previous instance.
> The issue can cause DNs to temporarily go dead, making it unsafe to restart the NN, especially in Hadoop clusters with a large number of DNs. HDFS-12914 reported the issue without any clue as to why it occurred, and it remained unsolved.
> To make it clear, look at the code below, taken from the offerService method of class BPServiceActor (some code is elided to focus on the issue at hand). fullBlockReportLeaseId is a local variable that holds the lease ID from the NN.
> Exceptions occur at the blockReport call while the NN is restarting; they are caught by the catch block in the while loop, so fullBlockReportLeaseId is never reset to 0. After the NN has restarted, the DN sends a full block report that is rejected by the new NN instance. The DN will not send another full block report until the next scheduled one, about an hour later.
> The solution is simple: reset fullBlockReportLeaseId to 0 after any exception, or after registering to the NN. The DN will then ask the new NN instance for a valid fullBlockReportLeaseId.
> {code:java}
> private void offerService() throws Exception {
>   long fullBlockReportLeaseId = 0;
>   //
>   // Now loop for a long time....
>   //
>   while (shouldRun()) {
>     try {
>       final long startTime = scheduler.monotonicNow();
>       //
>       // Every so often, send heartbeat or block-report
>       //
>       final boolean sendHeartbeat = scheduler.isHeartbeatDue(startTime);
>       HeartbeatResponse resp = null;
>       if (sendHeartbeat) {
>         boolean requestBlockReportLease = (fullBlockReportLeaseId == 0) &&
>             scheduler.isBlockReportDue(startTime);
>         scheduler.scheduleNextHeartbeat();
>         if (!dn.areHeartbeatsDisabledForTests()) {
>           resp = sendHeartBeat(requestBlockReportLease);
>           assert resp != null;
>           if (resp.getFullBlockReportLeaseId() != 0) {
>             if (fullBlockReportLeaseId != 0) {
>               LOG.warn(nnAddr + " sent back a full block report lease " +
>                   "ID of 0x" +
>                   Long.toHexString(resp.getFullBlockReportLeaseId()) +
>                   ", but we already have a lease ID of 0x" +
>                   Long.toHexString(fullBlockReportLeaseId) + ". " +
>                   "Overwriting old lease ID.");
>             }
>             fullBlockReportLeaseId = resp.getFullBlockReportLeaseId();
>           }
>         }
>       }
>
>       if ((fullBlockReportLeaseId != 0) || forceFullBr) {
>         // Exception occurred here when the NN was restarting
>         cmds = blockReport(fullBlockReportLeaseId);
>         fullBlockReportLeaseId = 0;
>       }
>
>     } catch (RemoteException re) {
>       // ...
>     }
>   } // while (shouldRun())
> } // offerService
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
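[Editor's note] The failure mode and fix described above can be illustrated with a small self-contained Java sketch. Everything here is hypothetical scaffolding, not Hadoop code: FakeNameNode, its epoch-based lease encoding, and the run() driver are invented for illustration. Only the reset of the lease variable in the catch block mirrors the change the reporter proposes.

```java
import java.util.ArrayList;
import java.util.List;

public class LeaseResetSketch {
    // Hypothetical stand-in for the NN: grants leases and only accepts
    // reports carrying a lease it issued in its current incarnation.
    static class FakeNameNode {
        final long epoch; // bumped on "restart"; old leases become invalid
        FakeNameNode(long epoch) { this.epoch = epoch; }
        long grantLease() { return epoch * 1000 + 1; }
        void blockReport(long leaseId) {
            if (leaseId / 1000 != epoch) {
                throw new RuntimeException("not in the pending set");
            }
        }
    }

    // Simplified offerService loop: three iterations, with an NN
    // "restart" injected between lease grant and report on iteration 1.
    public static List<String> run() {
        List<String> log = new ArrayList<>();
        FakeNameNode nn = new FakeNameNode(1);
        long fullBlockReportLeaseId = 0;
        for (int i = 0; i < 3; i++) {
            try {
                if (fullBlockReportLeaseId == 0) {
                    // heartbeat requests a fresh lease only when we hold none
                    fullBlockReportLeaseId = nn.grantLease();
                }
                if (i == 1) {
                    nn = new FakeNameNode(2); // NN restarts; our lease is stale
                }
                nn.blockReport(fullBlockReportLeaseId);
                log.add("report accepted with lease " + fullBlockReportLeaseId);
                fullBlockReportLeaseId = 0;
            } catch (RuntimeException re) {
                log.add("report rejected: " + re.getMessage());
                // The proposed fix: clear the stale lease so the next
                // heartbeat asks the (new) NN for a valid one.
                fullBlockReportLeaseId = 0;
            }
        }
        return log;
    }

    public static void main(String[] args) {
        for (String line : run()) {
            System.out.println(line);
        }
    }
}
```

Deleting the reset in the catch block reproduces the reported symptom in miniature: iteration 2 would retry with the stale lease and be rejected again, just as the real DN keeps its invalid lease until the next scheduled full block report.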