[ https://issues.apache.org/jira/browse/FLINK-21552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xintong Song closed FLINK-21552.
--------------------------------
    Resolution: Fixed

Fixed via
* master (1.13): 1e71b1c35c25063ce80abd35d1ac59e437c858c3
* release-1.12: 64647490f3e96bdbdfe535654c667f1ead0b026c

> The managed memory was not released if exception was thrown in createPythonExecutionEnvironment
> -----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-21552
>                 URL: https://issues.apache.org/jira/browse/FLINK-21552
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Python, Runtime / Coordination
>    Affects Versions: 1.12.0
>            Reporter: Dian Fu
>            Assignee: Xintong Song
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.13.0, 1.12.3
>
>
> If an exception is thrown in
> [createPythonExecutionEnvironment|https://github.com/apache/flink/blob/3796e59f79a90bd8ad5e6fc37458e2d6cce23139/flink-python/src/main/java/org/apache/flink/streaming/api/runners/python/beam/BeamPythonFunctionRunner.java#L248],
> the job fails with the following exception:
> {code:java}
> org.apache.flink.runtime.memory.MemoryAllocationException: Could not created the shared memory resource of size 611948962. Not enough memory left to reserve from the slot's managed memory.
>     at org.apache.flink.runtime.memory.MemoryManager.lambda$getSharedMemoryResourceForManagedMemory$5(MemoryManager.java:536)
>     at org.apache.flink.runtime.memory.SharedResources.createResource(SharedResources.java:126)
>     at org.apache.flink.runtime.memory.SharedResources.getOrAllocateSharedResource(SharedResources.java:72)
>     at org.apache.flink.runtime.memory.MemoryManager.getSharedMemoryResourceForManagedMemory(MemoryManager.java:555)
>     at org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:250)
>     at org.apache.flink.streaming.api.operators.python.AbstractPythonFunctionOperator.open(AbstractPythonFunctionOperator.java:113)
>     at org.apache.flink.table.runtime.operators.python.AbstractStatelessFunctionOperator.open(AbstractStatelessFunctionOperator.java:116)
>     at org.apache.flink.table.runtime.operators.python.scalar.AbstractPythonScalarFunctionOperator.open(AbstractPythonScalarFunctionOperator.java:88)
>     at org.apache.flink.table.runtime.operators.python.scalar.AbstractRowDataPythonScalarFunctionOperator.open(AbstractRowDataPythonScalarFunctionOperator.java:70)
>     at org.apache.flink.table.runtime.operators.python.scalar.RowDataPythonScalarFunctionOperator.open(RowDataPythonScalarFunctionOperator.java:59)
>     at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:428)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$2(StreamTask.java:543)
>     at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:533)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:573)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570)
>     at java.lang.Thread.run(Thread.java:834)
> Caused by: org.apache.flink.runtime.memory.MemoryReservationException: Could not allocate 611948962 bytes, only 0 bytes are remaining. This usually indicates that you are requesting more memory than you have reserved. However, when running an old JVM version it can also be caused by slow garbage collection. Try to upgrade to Java 8u72 or higher if running on an old Java version.
>     at org.apache.flink.runtime.memory.UnsafeMemoryBudget.reserveMemory(UnsafeMemoryBudget.java:170)
>     at org.apache.flink.runtime.memory.UnsafeMemoryBudget.reserveMemory(UnsafeMemoryBudget.java:84)
>     at org.apache.flink.runtime.memory.MemoryManager.reserveMemory(MemoryManager.java:423)
>     at org.apache.flink.runtime.memory.MemoryManager.lambda$getSharedMemoryResourceForManagedMemory$5(MemoryManager.java:534)
>     ... 17 more
> {code}
> The reason is that the reserved managed memory was not returned to the
> MemoryManager when the job failed because of an exception thrown in
> createPythonExecutionEnvironment. As a result, no managed memory is left
> to allocate during failover.
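
The failure mode described above (memory is reserved, a later initialization step throws, and the reservation is never handed back) can be illustrated with a minimal sketch. This is not the code from the fix commits and does not use Flink's MemoryManager API: `BudgetSketch`, `reserveAndInit`, and the error messages below are hypothetical names chosen for illustration only. The sketch assumes the general remedy is to release the reservation on the failure path before rethrowing.

```java
import java.util.function.LongFunction;

/** Hypothetical stand-in for a slot's managed-memory budget; NOT Flink's MemoryManager. */
final class BudgetSketch {
    private long available;

    BudgetSketch(long totalBytes) {
        this.available = totalBytes;
    }

    /** Takes bytes out of the budget, or fails if not enough is left. */
    synchronized void reserve(long bytes) {
        if (bytes > available) {
            throw new IllegalStateException(
                    "Could not allocate " + bytes + " bytes, only " + available + " bytes are remaining.");
        }
        available -= bytes;
    }

    /** Hands previously reserved bytes back to the budget. */
    synchronized void release(long bytes) {
        available += bytes;
    }

    synchronized long available() {
        return available;
    }

    /**
     * Reserves memory, then runs an initializer that may throw (playing the role
     * of createPythonExecutionEnvironment in the report). If the initializer
     * fails, the reservation is released before the exception propagates, so a
     * retry after failover still finds memory in the budget.
     */
    <T> T reserveAndInit(long bytes, LongFunction<T> initializer) {
        reserve(bytes);
        try {
            return initializer.apply(bytes);
        } catch (RuntimeException | Error e) {
            release(bytes); // without this line the budget would stay at 0
            throw e;
        }
    }
}

final class ReleaseOnFailureDemo {
    public static void main(String[] args) {
        BudgetSketch budget = new BudgetSketch(1024);
        try {
            budget.reserveAndInit(1024, size -> {
                throw new IllegalStateException("environment setup failed");
            });
        } catch (IllegalStateException expected) {
            // The first attempt failed; the reservation should have been returned.
        }
        // A second attempt, as would happen during failover, can reserve again.
        String env = budget.reserveAndInit(1024, size -> "env-" + size);
        System.out.println(env + ", remaining=" + budget.available());
    }
}
```

If the `release(bytes)` call in the catch block is removed, the second `reserveAndInit` fails with a "only 0 bytes are remaining" style error, which mirrors the symptom in the stack trace above.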