[ https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600872#comment-17600872 ]
ASF GitHub Bot commented on HDFS-16689: --------------------------------------- hadoop-yetus commented on PR #4744: URL: https://github.com/apache/hadoop/pull/4744#issuecomment-1238374829 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |:----:|----------:|--------:|:--------:|:-------:| | +0 :ok: | reexec | 0m 50s | | Docker mode activated. | |||| _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. | |||| _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 38m 59s | | trunk passed | | +1 :green_heart: | compile | 1m 47s | | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 1m 34s | | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 23s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 43s | | trunk passed | | +1 :green_heart: | javadoc | 1m 21s | | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 37s | | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 43s | | trunk passed | | +1 :green_heart: | shadedclient | 23m 1s | | branch has no errors when building and testing our client artifacts. | |||| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 27s | | the patch passed | | +1 :green_heart: | compile | 1m 27s | | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 1m 27s | | the patch passed | | +1 :green_heart: | compile | 1m 23s | | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 1m 23s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 1s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/10/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 285 unchanged - 1 fixed = 286 total (was 286) | | +1 :green_heart: | mvnsite | 1m 27s | | the patch passed | | +1 :green_heart: | javadoc | 0m 59s | | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 33s | | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 30s | | the patch passed | | +1 :green_heart: | shadedclient | 22m 22s | | patch has no errors when building and testing our client artifacts. | |||| _ Other Tests _ | | -1 :x: | unit | 262m 4s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/10/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 1m 19s | | The patch does not generate ASF License warnings. | | | | 372m 35s | | | | Reason | Tests | |-------:|:------| | Failed junit tests | hadoop.hdfs.TestReconstructStripedFile | | | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy | | Subsystem | Report/Notes | |----------:|:-------------| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/10/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4744 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 4c4ce04c083f 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 58002d0e78874a22190a43904eefae665aa1d983 | | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/10/testReport/ | | Max. process+thread count | 3416 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/10/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. > Standby NameNode crashes when transitioning to Active with in-progress tailer > ----------------------------------------------------------------------------- > > Key: HDFS-16689 > URL: https://issues.apache.org/jira/browse/HDFS-16689 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: ZanderXu > Assignee: ZanderXu > Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Standby NameNode crashes when transitioning to Active with a in-progress > tailer. And the error message like blew: > {code:java} > Caused by: java.lang.IllegalStateException: Cannot start writing at txid X > when there is a stream available for read: ByteStringEditLog[X, Y], > ByteStringEditLog[X, 0] > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:344) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.openForWrite(FSEditLogAsync.java:113) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1423) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:2132) > ... 36 more > {code} > After tracing and found there is a critical bug in > *EditlogTailer#catchupDuringFailover()* when > *DFS_HA_TAILEDITS_INPROGRESS_KEY* is true. Because *catchupDuringFailover()* > try to replay all missed edits from JournalNodes with *onlyDurableTxns=true*. > It may cannot replay any edits when they are some abnormal JournalNodes. > Reproduce method, suppose: > - There are 2 namenode, namely NN0 and NN1, and the status of echo namenode > is Active, Standby respectively. And there are 3 JournalNodes, namely JN0, > JN1 and JN2. > - NN0 try to sync 3 edits to JNs with started txid 3, but only successfully > synced them to JN1 and JN3. And JN0 is abnormal, such as GC, bad network or > restarted. > - NN1's lastAppliedTxId is 2, and at the moment, we are trying failover > active from NN0 to NN1. > - NN1 only got two responses from JN0 and JN1 when it try to selecting > inputStreams with *fromTxnId=3* and *onlyDurableTxns=true*, and the count > txid of response is 0, 3 respectively. JN2 is abnormal, such as GC, bad > network or restarted. > - NN1 will cannot replay any Edits with *fromTxnId=3* from JournalNodes > because the *maxAllowedTxns* is 0. > So I think Standby NameNode should *catchupDuringFailover()* with > *onlyDurableTxns=false* , so that it can replay all missed edits from > JournalNode. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org