[ https://issues.apache.org/jira/browse/HDFS-16550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640374#comment-17640374 ]
ASF GitHub Bot commented on HDFS-16550: --------------------------------------- hadoop-yetus commented on PR #4209: URL: https://github.com/apache/hadoop/pull/4209#issuecomment-1330060886 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |:----:|----------:|--------:|:--------:|:-------:| | +0 :ok: | reexec | 0m 43s | | Docker mode activated. | |||| _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 1s | | xmllint was not available. | | +0 :ok: | markdownlint | 0m 1s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | |||| _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 39m 58s | | trunk passed | | +1 :green_heart: | compile | 1m 40s | | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 1m 31s | | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 22s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 48s | | trunk passed | | +1 :green_heart: | javadoc | 1m 18s | | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 41s | | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 44s | | trunk passed | | +1 :green_heart: | shadedclient | 25m 57s | | branch has no errors when building and testing our client artifacts. | |||| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 29s | | the patch passed | | +1 :green_heart: | compile | 1m 40s | | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 1m 39s | | the patch passed | | +1 :green_heart: | compile | 1m 32s | | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 1m 32s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 10s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 42s | | the patch passed | | +1 :green_heart: | javadoc | 1m 8s | | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 47s | | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 4m 28s | | the patch passed | | -1 :x: | shadedclient | 36m 26s | | patch has errors when building and testing our client artifacts. | |||| _ Other Tests _ | | -1 :x: | unit | 16m 54s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4209/8/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +0 :ok: | asflicense | 0m 39s | | ASF License check generated no output? | | | | 144m 56s | | | | Reason | Tests | |-------:|:------| | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestHAAppend | | | hadoop.fs.viewfs.TestViewFileSystemLinkFallback | | | hadoop.fs.viewfs.TestViewFsWithAcls | | | hadoop.fs.permission.TestStickyBit | | | hadoop.fs.contract.hdfs.TestHDFSContractSetTimes | | | hadoop.fs.contract.hdfs.TestHDFSContractPathHandle | | Subsystem | Report/Notes | |----------:|:-------------| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4209/8/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4209 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint markdownlint | | uname | Linux 9ee32daf0144 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 11b70ea65c625e55a4b9052434567563abd6432e | | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4209/8/testReport/ | | Max. process+thread count | 556 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4209/8/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. > [SBN read] Improper cache-size for journal node may cause cluster crash > ----------------------------------------------------------------------- > > Key: HDFS-16550 > URL: https://issues.apache.org/jira/browse/HDFS-16550 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Tao Li > Assignee: Tao Li > Priority: Major > Labels: pull-request-available > Attachments: image-2022-04-21-09-54-29-751.png, > image-2022-04-21-09-54-57-111.png, image-2022-04-21-12-32-56-170.png > > Time Spent: 1h > Remaining Estimate: 0h > > When we introduced {*}SBN Read{*}, we encountered a situation during upgrade > the JournalNodes. > Cluster Info: > *Active: nn0* > *Standby: nn1* > 1. Rolling restart journal node. {color:#ff0000}(related config: > fs.journalnode.edit-cache-size.bytes=1G, -Xms1G, -Xmx=1G){color} > 2. The cluster runs for a while, edits cache usage is increasing and memory > is used up. > 3. {color:#ff0000}Active namenode(nn0){color} shutdown because of “{_}Timed > out waiting 120000ms for a quorum of nodes to respond”{_}. > 4. Transfer nn1 to Active state. > 5. {color:#ff0000}New Active namenode(nn1){color} also shutdown because of > “{_}Timed out waiting 120000ms for a quorum of nodes to respond” too{_}. > 6. {color:#ff0000}The cluster crashed{color}. > > Related code: > {code:java} > JournaledEditsCache(Configuration conf) { > capacity = conf.getInt(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_KEY, > DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_DEFAULT); > if (capacity > 0.9 * Runtime.getRuntime().maxMemory()) { > Journal.LOG.warn(String.format("Cache capacity is set at %d bytes but " + > "maximum JVM memory is only %d bytes. It is recommended that you " + > "decrease the cache size or increase the heap size.", > capacity, Runtime.getRuntime().maxMemory())); > } > Journal.LOG.info("Enabling the journaled edits cache with a capacity " + > "of bytes: " + capacity); > ReadWriteLock lock = new ReentrantReadWriteLock(true); > readLock = new AutoCloseableLock(lock.readLock()); > writeLock = new AutoCloseableLock(lock.writeLock()); > initialize(INVALID_TXN_ID); > } {code} > Currently, *fs.journalNode.edit-cache-size-bytes* can be set to a larger size > than the memory requested by the process. If > {*}fs.journalNode.edit-cache-sie.bytes > 0.9 * > Runtime.getruntime().maxMemory(){*}, only warn logs are printed during > journalnode startup. This can easily be overlooked by users. However, as the > cluster runs to a certain period of time, it is likely to cause the cluster > to crash. > > NN log: > !image-2022-04-21-09-54-57-111.png|width=1012,height=47! > !image-2022-04-21-12-32-56-170.png|width=809,height=218! > IMO, we should not set the {{cache size}} to a fixed value, but to the ratio > of maximum memory, which is 0.2 by default. > This avoids the problem of too large cache size. In addition, users can > actively adjust the heap size when they need to increase the cache size. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org