[ https://issues.apache.org/jira/browse/FLINK-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704526#comment-16704526 ]
ASF GitHub Bot commented on FLINK-10884:
----------------------------------------

wg1026688210 commented on a change in pull request #7185: [FLINK-10884] [yarn/mesos] adjust container memory param to set a safe margin from offheap memory
URL: https://github.com/apache/flink/pull/7185#discussion_r237810314

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/clusterframework/ContaineredTaskManagerParameters.java
 ##########
 @@ -158,8 +158,10 @@ public static ContaineredTaskManagerParameters create(
 		// (2) split the remaining Java memory between heap and off-heap
 		final long heapSizeMB = TaskManagerServices.calculateHeapSizeMB(containerMemoryMB - cutoffMB, config);
-		// use the cut-off memory for off-heap (that was its intention)
-		final long offHeapSizeMB = containerMemoryMB - heapSizeMB;
+		// (3) try to compute the offHeapMemory from a safe margin
+		final long restMemoryMB = containerMemoryMB - heapSizeMB;
+		final long offHeapCutoffMemory = calculateOffHeapCutoffMB(config, restMemoryMB);

 Review comment:
   Thank you for your attention. I added containerized.offheap-cutoff-ratio for two reasons:
   1. Using only containerized.heap-cutoff-ratio and containerized.heap-cutoff-min to cut off part of the container's physical memory is unfriendly to small containers, such as a 1 GB container.
   2. Native memory includes not only the memory used by the RocksDB state backend but also the memory pools called arenas, which consume a large amount of memory on multi-core machines as time goes on. You can verify this with 'pmap -x {TM container pid}'.
   So I want to add an option that lets users adjust both heap memory and off-heap memory flexibly to accommodate the three kinds of memory.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Flink on yarn TM container will be killed by nodemanager because of the exceeded physical memory.
> ----------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-10884
>                 URL: https://issues.apache.org/jira/browse/FLINK-10884
>             Project: Flink
>          Issue Type: Bug
>          Components: Cluster Management, Core
>   Affects Versions: 1.5.5, 1.6.2, 1.7.0
>         Environment: version : 1.6.2
>                      module : flink on yarn
>                      centos jdk1.8
>                      hadoop 2.7
>            Reporter: wgcn
>            Assignee: wgcn
>            Priority: Major
>              Labels: pull-request-available, yarn
>
> The TM container will be killed by the NodeManager for exceeding its
> physical memory limit. I found that the launch context for the TM container
> sets "container memory = heap memory + offHeapSizeMB" in the class
> org.apache.flink.runtime.clusterframework.ContaineredTaskManagerParameters,
> lines 160 to 166. I set a safety margin for the whole memory the container
> uses. For example, if the container is limited to 3g of memory, the sum
> "heap memory + offHeapSizeMB" is set to 2.4g to prevent the container from
> being killed. Is there a ready-made solution, or can I commit my solution?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
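
To make the arithmetic in the issue description concrete, here is a minimal Java sketch of a safety margin applied to the non-heap remainder of a container. It is not the code from pull request #7185: the class name, both helper methods, and all ratio values are hypothetical, and the heap computation is deliberately simplified (the real TaskManagerServices.calculateHeapSizeMB takes more settings into account). The 0.8 off-heap ratio is chosen only so that the total lands near the 2.4g mentioned for a 3g container.

```java
// Hypothetical sketch; names and ratios are illustrative assumptions,
// not the implementation from the pull request.
final class ContainerMemorySketch {

    /** Memory withheld from the JVM: the larger of ratio * container and a fixed minimum. */
    public static long cutoffMB(long containerMemoryMB, double cutoffRatio, long cutoffMinMB) {
        return Math.max((long) (containerMemoryMB * cutoffRatio), cutoffMinMB);
    }

    /** Assumed margin rule: withhold a fraction of the non-heap remainder from off-heap use. */
    public static long offHeapAfterMarginMB(long restMemoryMB, double offHeapCutoffRatio) {
        long marginMB = (long) (restMemoryMB * offHeapCutoffRatio);
        return restMemoryMB - marginMB;
    }

    public static void main(String[] args) {
        long containerMemoryMB = 3072; // the "3g" container from the issue

        // Simplification: everything left after the cut-off goes to the heap.
        long heapSizeMB = containerMemoryMB - cutoffMB(containerMemoryMB, 0.25, 600);
        long restMemoryMB = containerMemoryMB - heapSizeMB;

        // Without a margin, off-heap would be the whole remainder, so
        // heap + off-heap would equal the container limit exactly.
        long offHeapSizeMB = offHeapAfterMarginMB(restMemoryMB, 0.8);

        System.out.println("heap=" + heapSizeMB + " MB, offHeap=" + offHeapSizeMB
                + " MB, total=" + (heapSizeMB + offHeapSizeMB) + " MB of " + containerMemoryMB);
    }
}
```

With these assumed values the JVM is told to use 2458 MB of the 3072 MB container, which leaves the remaining memory as headroom for native allocations such as the arenas mentioned in the review comment.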