[jira] [Commented] (FLINK-21552) The managed memory was not released if exception was thrown in createPythonExecutionEnvironment

Dian Fu (Jira) Mon, 01 Mar 2021 23:58:41 -0800


    [ https://issues.apache.org/jira/browse/FLINK-21552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293477#comment-17293477 ]


Dian Fu commented on FLINK-21552:
---------------------------------

Thanks [~xintongsong] for the analysis; I agree with you. I think we should 
properly handle the case where an exception is thrown in initializer.apply(). 

cc [~sewen]

> The managed memory was not released if exception was thrown in 
> createPythonExecutionEnvironment
> -----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-21552
>                 URL: https://issues.apache.org/jira/browse/FLINK-21552
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Python, Runtime / Coordination
>    Affects Versions: 1.12.0
>            Reporter: Dian Fu
>            Assignee: Xintong Song
>            Priority: Major
>             Fix For: 1.13.0, 1.12.3
>
>
> If an exception is thrown in 
> [createPythonExecutionEnvironment|https://github.com/apache/flink/blob/3796e59f79a90bd8ad5e6fc37458e2d6cce23139/flink-python/src/main/java/org/apache/flink/streaming/api/runners/python/beam/BeamPythonFunctionRunner.java#L248],
>  the job will fail with the following exception:
> {code:java}
> org.apache.flink.runtime.memory.MemoryAllocationException: Could not created the shared memory resource of size 611948962. Not enough memory left to reserve from the slot's managed memory.
> at org.apache.flink.runtime.memory.MemoryManager.lambda$getSharedMemoryResourceForManagedMemory$5(MemoryManager.java:536)
> at org.apache.flink.runtime.memory.SharedResources.createResource(SharedResources.java:126)
> at org.apache.flink.runtime.memory.SharedResources.getOrAllocateSharedResource(SharedResources.java:72)
> at org.apache.flink.runtime.memory.MemoryManager.getSharedMemoryResourceForManagedMemory(MemoryManager.java:555)
> at org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:250)
> at org.apache.flink.streaming.api.operators.python.AbstractPythonFunctionOperator.open(AbstractPythonFunctionOperator.java:113)
> at org.apache.flink.table.runtime.operators.python.AbstractStatelessFunctionOperator.open(AbstractStatelessFunctionOperator.java:116)
> at org.apache.flink.table.runtime.operators.python.scalar.AbstractPythonScalarFunctionOperator.open(AbstractPythonScalarFunctionOperator.java:88)
> at org.apache.flink.table.runtime.operators.python.scalar.AbstractRowDataPythonScalarFunctionOperator.open(AbstractRowDataPythonScalarFunctionOperator.java:70)
> at org.apache.flink.table.runtime.operators.python.scalar.RowDataPythonScalarFunctionOperator.open(RowDataPythonScalarFunctionOperator.java:59)
> at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:428)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$2(StreamTask.java:543)
> at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:533)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:573)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570)
> at java.lang.Thread.run(Thread.java:834)
> Caused by: org.apache.flink.runtime.memory.MemoryReservationException: Could not allocate 611948962 bytes, only 0 bytes are remaining. This usually indicates that you are requesting more memory than you have reserved. However, when running an old JVM version it can also be caused by slow garbage collection. Try to upgrade to Java 8u72 or higher if running on an old Java version.
> at org.apache.flink.runtime.memory.UnsafeMemoryBudget.reserveMemory(UnsafeMemoryBudget.java:170)
> at org.apache.flink.runtime.memory.UnsafeMemoryBudget.reserveMemory(UnsafeMemoryBudget.java:84)
> at org.apache.flink.runtime.memory.MemoryManager.reserveMemory(MemoryManager.java:423)
> at org.apache.flink.runtime.memory.MemoryManager.lambda$getSharedMemoryResourceForManagedMemory$5(MemoryManager.java:534)
> ... 17 more
> {code}
> The reason is that the reserved managed memory is not returned to the 
> MemoryManager when the job fails because of an exception thrown in 
> createPythonExecutionEnvironment. As a result, no managed memory is left to 
> allocate during failover.
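
The failure mode described above, and the handling suggested for
initializer.apply(), might be sketched roughly as follows. This is only an
illustration in plain Java under stated assumptions: MemoryBudget and
SharedResourceFactory are hypothetical stand-ins and are not Flink's
MemoryManager or SharedResources classes, and the actual fix may look
different. The idea is simply that memory reserved before the initializer runs
is given back if the initializer throws.

```java
import java.util.function.LongFunction;

/** Hypothetical stand-in for a managed-memory budget (not Flink's MemoryManager). */
final class MemoryBudget {
    private long available;

    MemoryBudget(long total) {
        this.available = total;
    }

    synchronized void reserve(long size) {
        if (size > available) {
            throw new IllegalStateException(
                    "Could not allocate " + size + " bytes, only " + available + " bytes are remaining.");
        }
        available -= size;
    }

    synchronized void release(long size) {
        available += size;
    }

    synchronized long available() {
        return available;
    }
}

/** Hypothetical helper illustrating release-on-failure around an initializer. */
final class SharedResourceFactory {
    private SharedResourceFactory() {}

    static <T> T createResource(MemoryBudget budget, long size, LongFunction<T> initializer) {
        budget.reserve(size);
        try {
            // Assumption: the initializer may fail, e.g. while starting an external process.
            return initializer.apply(size);
        } catch (Throwable t) {
            // Return the reservation so that a later attempt (e.g. after failover) can succeed.
            budget.release(size);
            throw t;
        }
    }
}
```

Without the catch block, a failing initializer would leave the budget at 0
bytes, which matches the "only 0 bytes are remaining" message in the trace
above when the task is restarted.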



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
