[ 
https://issues.apache.org/jira/browse/FLINK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481468#comment-17481468
 ] 

Yun Gao commented on FLINK-18356:
---------------------------------

After some more observation (sorry, I still do not have a final result yet), 
it seems that:
 # Currently the test mostly fails in the flink-table/flink-table-planner 
module. The tests of this module consist of two parts, the tests and the 
integration tests. The failure always happens in the integration tests part.
 # In Azure there are two parallel surefire test processes. Since the 
flink-table-planner module sets reuseForks = true, the same two processes are 
used to run all the integration tests. Thus, if some tests do not clean up 
correctly or leak memory, the memory used keeps increasing.
 # I added some print statements to the watchdog process: 
[https://github.com/apache/flink/pull/18486]. From the result 
[https://dev.azure.com/gaoyunhaii/gaoyun-flink/_build/results?buildId=562&view=logs&j=43a593e7-535d-554b-08cc-244368da36b4&t=82d122c0-8bbf-56f3-4c0d-8e3d69630d0f]
 it seems the total memory is indeed 7G, as Dawid pointed out, and the memory 
usage keeps increasing.
 # Fortunately the case can be reproduced locally: first run _mvn clean 
install_, then run {_}mvn -Dflink.forkCount=2 -Dcheckstyle.skip=true verify -pl 
flink-table/flink-table-planner{_}. The memory of the two processes keeps 
increasing, and the maximum memory required is 4G for each process. This is also 
weird, since we have limited the heap to 2G. After adding 
-XX:NativeMemoryTracking=detail to the surefire plugin JVM options, the memory 
tracking result at the end of the tests is as follows.

{code:java}
Native Memory Tracking:

Total: reserved=5199583KB +38339KB, committed=3802831KB +44371KB

-                 Java Heap (reserved=2097152KB, committed=1575936KB)
                            (mmap: reserved=2097152KB, committed=1575936KB)
 
-                     Class (reserved=2342856KB +37546KB, committed=1534368KB +42666KB)
                            (classes #193700 +5400)
                            (malloc=38856KB +682KB #351017 +7653)
                            (mmap: reserved=2304000KB +36864KB, committed=1495512KB +41984KB)
 
-                    Thread (reserved=48453KB -969KB, committed=48453KB -969KB)
                            (thread #48 -1)
                            (stack: reserved=48188KB -1028KB, committed=48188KB -1028KB)
                            (malloc=146KB -3KB #250 -5)
                            (arena=119KB +63 #90 -2)
 
-                      Code (reserved=287969KB +111KB, committed=244357KB +1023KB)
                            (malloc=38369KB +111KB #69847 +655)
                            (mmap: reserved=249600KB, committed=205988KB +912KB)
 
-                        GC (reserved=146916KB +24KB, committed=127580KB +24KB)
                            (malloc=36324KB +24KB #148561 +842)
                            (mmap: reserved=110592KB, committed=91256KB)
 
-                  Compiler (reserved=442KB, committed=442KB)
                            (malloc=312KB #8705 +2)
                            (arena=131KB #7)
 
-                  Internal (reserved=177788KB +890KB, committed=177784KB +890KB)
                            (malloc=177752KB +890KB #316112 +8088)
                            (mmap: reserved=36KB, committed=32KB)
 
-                    Symbol (reserved=43952KB +93KB, committed=43952KB +93KB)
                            (malloc=42122KB +93KB #393043 +1171)
                            (arena=1830KB #1)
 
-    Native Memory Tracking (reserved=21378KB +675KB, committed=21378KB +675KB)
                            (malloc=1011KB +314KB #14713 +4635)
                            (tracking overhead=20367KB +361KB)
 
-               Arena Chunk (reserved=28582KB -31KB, committed=28582KB -31KB)
                            (malloc=28582KB -31KB)
 
-                   Unknown (reserved=4096KB, committed=0KB)
                            (mmap: reserved=4096KB, committed=0KB)
{code}
It seems the heap part and the classes part contribute most of the memory 
consumption. 
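
To check this reading of the report, the committed sizes can be ranked per 
category. The following is only a minimal sketch: the helper names 
(committed_by_category, total_committed) and the regular expressions are 
assumptions made for illustration, not part of any JDK or Flink tooling, and 
the sample text is copied from the report above with wrapped lines joined.

```python
import re

# Matches category lines of the NMT report, for example:
# "-  Class (reserved=2342856KB +37546KB, committed=1534368KB +42666KB)"
# The optional "+NKB" / "-NKB" parts are the diff against the baseline.
CATEGORY = re.compile(
    r"^-\s*(?P<name>[A-Za-z][A-Za-z ]*?)\s*"
    r"\(reserved=(?P<reserved>\d+)KB(?: [+-]\d+KB)?, "
    r"committed=(?P<committed>\d+)KB(?: [+-]\d+KB)?\)\s*$"
)
TOTAL = re.compile(
    r"Total: reserved=\d+KB(?: [+-]\d+KB)?, committed=(?P<committed>\d+)KB"
)

# Category lines copied from the report above (sub-items omitted).
SAMPLE = """\
Total: reserved=5199583KB +38339KB, committed=3802831KB +44371KB
-                 Java Heap (reserved=2097152KB, committed=1575936KB)
-                     Class (reserved=2342856KB +37546KB, committed=1534368KB +42666KB)
-                    Thread (reserved=48453KB -969KB, committed=48453KB -969KB)
-                      Code (reserved=287969KB +111KB, committed=244357KB +1023KB)
-                        GC (reserved=146916KB +24KB, committed=127580KB +24KB)
-                  Compiler (reserved=442KB, committed=442KB)
-                  Internal (reserved=177788KB +890KB, committed=177784KB +890KB)
-                    Symbol (reserved=43952KB +93KB, committed=43952KB +93KB)
-    Native Memory Tracking (reserved=21378KB +675KB, committed=21378KB +675KB)
-               Arena Chunk (reserved=28582KB -31KB, committed=28582KB -31KB)
-                   Unknown (reserved=4096KB, committed=0KB)
"""


def committed_by_category(report):
    """Return {category name: committed KB} for every category line."""
    result = {}
    for line in report.splitlines():
        match = CATEGORY.match(line.strip())
        if match:
            result[match.group("name")] = int(match.group("committed"))
    return result


def total_committed(report):
    """Return the committed KB from the 'Total:' line, or None if absent."""
    match = TOTAL.search(report)
    return int(match.group("committed")) if match else None


if __name__ == "__main__":
    categories = committed_by_category(SAMPLE)
    total = total_committed(SAMPLE)
    # Largest consumers first, with their share of the total committed memory.
    for name, kb in sorted(categories.items(), key=lambda item: -item[1]):
        print(f"{name:>24} {kb:>10} KB {100.0 * kb / total:5.1f}%")
```

With the numbers above, Java Heap and Class are the two largest entries and 
together account for roughly 82% of the committed memory.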

> Exit code 137 returned from process
> -----------------------------------
>
>                 Key: FLINK-18356
>                 URL: https://issues.apache.org/jira/browse/FLINK-18356
>             Project: Flink
>          Issue Type: Bug
>          Components: Build System / Azure Pipelines, Tests
>    Affects Versions: 1.12.0, 1.13.0, 1.14.0, 1.15.0
>            Reporter: Piotr Nowojski
>            Assignee: Dawid Wysakowicz
>            Priority: Blocker
>              Labels: pull-request-available, test-stability
>             Fix For: 1.15.0
>
>
> {noformat}
> ============================= test session starts ==============================
> platform linux -- Python 3.7.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
> cachedir: .tox/py37-cython/.pytest_cache
> rootdir: /__w/3/s/flink-python
> collected 568 items
> pyflink/common/tests/test_configuration.py ..........                    [  1%]
> pyflink/common/tests/test_execution_config.py .......................    [  5%]
> pyflink/dataset/tests/test_execution_environment.py .
> ##[error]Exit code 137 returned from process: file name '/bin/docker', 
> arguments 'exec -i -u 1002 
> 97fc4e22522d2ced1f4d23096b8929045d083dd0a99a4233a8b20d0489e9bddb 
> /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
> Finishing: Test - python
> {noformat}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3729&view=logs&j=9cada3cb-c1d3-5621-16da-0f718fb86602&t=8d78fe4f-d658-5c70-12f8-4921589024c3



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
