[ https://issues.apache.org/jira/browse/HDFS-16550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17639734#comment-17639734 ]

ASF GitHub Bot commented on HDFS-16550:
---------------------------------------

tomscut commented on PR #4209:
URL: https://github.com/apache/hadoop/pull/4209#issuecomment-1328405738

   > I am -1 on the PR as-is. We have publicly exposed the current config 
`dfs.journalnode.edit-cache-size.bytes`; we can't just rename it and change the 
behavior now. I also think there is a lot of value in being able to configure 
the cache size exactly, rather than as a fraction, but I do recognize the value 
in using a ratio as a helpful default (one less knob to tune). I would propose:
   > 
   > * _Add_ (not replace) a new config 
`dfs.journalnode.edit-cache-size.fraction` (or `.ratio`? but either way I think 
we should maintain the `edit-cache-size` prefix)
   > * If `edit-cache-size.bytes` is set, use that value. Otherwise, use the 
value of `edit-cache-size.fraction * Runtime#maxMemory()`, which has a default 
value set.
   > * I would suggest 0.5 rather than 0.3 for the default value of `fraction` 
but am open to discussion there.
   > 
   > This still does change the default behavior slightly, since before you 
would get a 1GB cache and now you get `-Xmx * 0.5`, but there is an easy way to 
preserve the old behavior and if you've explicitly configured the cache size 
(which you probably did, if you're using the feature) then there is no change.
   
   Hi @xkrogen @ZanderXu, I updated the code according to the suggestion. Please 
take a look. Thanks!
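
For context, the fallback described in the quoted suggestion amounts to roughly the 
following. This is a minimal sketch only; the helper class, the `.fraction` key name, 
and the 0.5 default are assumptions taken from the comment, not the actual patch.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Minimal sketch of the proposed resolution order; not the actual patch.
public final class EditCacheCapacitySketch {
  static final String BYTES_KEY = "dfs.journalnode.edit-cache-size.bytes";
  // Hypothetical new key; ".fraction" vs ".ratio" was still under discussion.
  static final String FRACTION_KEY = "dfs.journalnode.edit-cache-size.fraction";
  static final float FRACTION_DEFAULT = 0.5f; // value suggested in the comment

  static long resolveCapacity(Configuration conf) {
    // An explicitly configured byte size wins, preserving existing behavior.
    if (conf.get(BYTES_KEY) != null) {
      return conf.getLong(BYTES_KEY, 0L);
    }
    // Otherwise derive the capacity from the JVM max heap.
    float fraction = conf.getFloat(FRACTION_KEY, FRACTION_DEFAULT);
    return (long) (fraction * Runtime.getRuntime().maxMemory());
  }
}
{code}

With -Xmx1G and neither key set, this would yield roughly a 512 MB cache rather than 
the previous fixed default, which is the slight behavior change noted in the comment.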




> [SBN read] Improper cache-size for journal node may cause cluster crash
> -----------------------------------------------------------------------
>
>                 Key: HDFS-16550
>                 URL: https://issues.apache.org/jira/browse/HDFS-16550
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Tao Li
>            Assignee: Tao Li
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2022-04-21-09-54-29-751.png, 
> image-2022-04-21-09-54-57-111.png, image-2022-04-21-12-32-56-170.png
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> When we introduced {*}SBN Read{*}, we encountered a problem while upgrading 
> the JournalNodes.
> Cluster Info: 
> *Active: nn0*
> *Standby: nn1*
> 1. Rolling restart of the JournalNodes. {color:#ff0000}(related config: 
> dfs.journalnode.edit-cache-size.bytes=1G, -Xms1G, -Xmx1G){color}
> 2. After the cluster runs for a while, edit cache usage keeps growing and the 
> JournalNode memory is used up.
> 3. The {color:#ff0000}Active namenode (nn0){color} shuts down because of 
> “{_}Timed out waiting 120000ms for a quorum of nodes to respond{_}”.
> 4. nn1 is transitioned to the Active state.
> 5. The {color:#ff0000}new Active namenode (nn1){color} also shuts down with the 
> same “{_}Timed out waiting 120000ms for a quorum of nodes to respond{_}” error.
> 6. {color:#ff0000}The cluster crashes{color}.
>  
> Related code:
> {code:java}
> JournaledEditsCache(Configuration conf) {
>   capacity = conf.getInt(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_KEY,
>       DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_DEFAULT);
>   if (capacity > 0.9 * Runtime.getRuntime().maxMemory()) {
>     Journal.LOG.warn(String.format("Cache capacity is set at %d bytes but " +
>         "maximum JVM memory is only %d bytes. It is recommended that you " +
>         "decrease the cache size or increase the heap size.",
>         capacity, Runtime.getRuntime().maxMemory()));
>   }
>   Journal.LOG.info("Enabling the journaled edits cache with a capacity " +
>       "of bytes: " + capacity);
>   ReadWriteLock lock = new ReentrantReadWriteLock(true);
>   readLock = new AutoCloseableLock(lock.readLock());
>   writeLock = new AutoCloseableLock(lock.writeLock());
>   initialize(INVALID_TXN_ID);
> } {code}
> Currently, *dfs.journalnode.edit-cache-size.bytes* can be set to a larger value 
> than the memory available to the process. If 
> {*}dfs.journalnode.edit-cache-size.bytes > 0.9 * 
> Runtime.getRuntime().maxMemory(){*}, only a warning is logged during 
> JournalNode startup, which users can easily overlook. However, after the 
> cluster has been running for some time, this is likely to crash the cluster.
>  
> NN log:
> !image-2022-04-21-09-54-57-111.png|width=1012,height=47!
> !image-2022-04-21-12-32-56-170.png|width=809,height=218!
> IMO, we should not set the {{cache size}} to a fixed value, but as a ratio of 
> the maximum memory, 0.2 by default.
> This avoids an overly large cache size. In addition, users can adjust the heap 
> size accordingly when they need to increase the cache size.
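
A minimal sketch of the ratio-based sizing described above; the `.ratio` key name 
and the 0.2 default are taken from the description, not from committed code.

{code:java}
// Sketch only: size the cache as a ratio of the JVM max heap.
org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
float ratio = conf.getFloat("dfs.journalnode.edit-cache-size.ratio", 0.2f);
long capacity = (long) (ratio * Runtime.getRuntime().maxMemory());
// With -Xmx1G this yields roughly a 200 MB cache, well below the
// 0.9 * maxMemory threshold that only triggers a warning today.
{code}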


