[ https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=401769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-401769 ]
ASF GitHub Bot logged work on BEAM-9474:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Mar/20 20:43
            Start Date: 11/Mar/20 20:43
    Worklog Time Spent: 10m
      Work Description: mxm commented on pull request #11084: [BEAM-9474] Improve robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#discussion_r391256990

 ##########
 File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java
 ##########
 @@ -352,20 +407,18 @@ public RemoteBundle getBundle(
          // The blocking queue of caches for serving multiple bundles concurrently.
          currentCache = availableCaches.take();
          client = currentCache.getUnchecked(executableStage.getEnvironment());
 -        client.ref();

 Review comment:
   Yes, that makes sense. I've already reverted the change. I suppose there is a race condition: we retrieve an environment X and, before we can call `ref()` on it, environment X is evicted, all its references are closed, and it is shut down. This will result in a job restart.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 401769)
    Time Spent: 6h 50m  (was: 6h 40m)

> Environment cleanup is not robust enough and may leak resources
> ---------------------------------------------------------------
>
>                 Key: BEAM-9474
>                 URL: https://issues.apache.org/jira/browse/BEAM-9474
>             Project: Beam
>          Issue Type: Bug
>          Components: java-fn-execution
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Major
>          Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}}s
> may leak resources. This is especially a concern when the execution engine
> reuses the same JVM or underlying machines for multiple runs of a pipeline.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
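
The race described in the review comment above (an environment is retrieved, then evicted and shut down before `ref()` is called on it) can be illustrated with a minimal sketch. This is not Beam's implementation: `RefCountedClient`, `tryRef()`, `unref()`, and `isClosed()` are hypothetical names invented for this example, and the interleaving is replayed sequentially so the outcome is deterministic. It assumes a reference-counted client whose initial reference is held by the cache, and shows one possible way to make a late `ref()` fail visibly instead of handing out a closed client.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an environment client; not Beam's actual class. */
final class RefCountedClient {
  // Starts at 1: the reference held by the cache itself.
  private final AtomicInteger refCount = new AtomicInteger(1);

  /** Takes a reference unless the client has already been released. */
  boolean tryRef() {
    while (true) {
      int current = refCount.get();
      if (current == 0) {
        return false; // already closed; the caller must fetch a fresh client
      }
      if (refCount.compareAndSet(current, current + 1)) {
        return true;
      }
    }
  }

  /** Drops a reference; the client counts as closed once the count reaches zero. */
  void unref() {
    int remaining = refCount.decrementAndGet();
    if (remaining < 0) {
      throw new IllegalStateException("unref() without matching ref()");
    }
  }

  boolean isClosed() {
    return refCount.get() == 0;
  }
}

final class RefCountRaceDemo {
  public static void main(String[] args) {
    // The interleaving from the review comment, replayed in a single thread.
    RefCountedClient client = new RefCountedClient(); // 1. bundle thread retrieves X
    client.unref();                                   // 2. eviction drops the cache's reference
    boolean acquired = client.tryRef();               // 3. bundle thread tries to take its reference
    System.out.println("closed=" + client.isClosed() + ", acquired=" + acquired);
    // prints "closed=true, acquired=false"
  }
}
```

With a plain increment in place of the compare-and-set loop, step 3 would succeed on a client that has already been shut down. Whether failing the acquisition and retrying is preferable to the job restart mentioned in the comment is a design question this sketch does not settle.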