[ https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=315174&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-315174 ]
ASF GitHub Bot logged work on BEAM-7945:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Sep/19 16:50
            Start Date: 19/Sep/19 16:50
    Worklog Time Spent: 10m
      Work Description: tweise commented on pull request #9452: [BEAM-7945] Allow runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r326278269

##########
File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
##########
@@ -132,20 +133,30 @@ public RemoteEnvironment createEnvironment(Environment environment) throws Excep
             // host networking on Mac)
             .add("--env=DOCKER_MAC_CONTAINER=" + System.getenv("DOCKER_MAC_CONTAINER"));
 
-    List<String> args =
-        ImmutableList.of(
-            String.format("--id=%s", workerId),
-            String.format("--logging_endpoint=%s", loggingEndpoint),
-            String.format("--artifact_endpoint=%s", artifactEndpoint),
-            String.format("--provision_endpoint=%s", provisionEndpoint),
-            String.format("--control_endpoint=%s", controlEndpoint));
+    Boolean retainDockerContainer =
+        pipelineOptions.as(ManualDockerEnvironmentOptions.class).getRetainDockerContainers();
+    if (!retainDockerContainer) {
+      dockerOptsBuilder.add("--rm");
+    }
+
+    String semiPersistDir = pipelineOptions.as(RemoteEnvironmentOptions.class).getSemiPersistDir();
+    ImmutableList.Builder<String> argsBuilder =
+        ImmutableList.<String>builder()
+            .add(String.format("--id=%s", workerId))
+            .add(String.format("--logging_endpoint=%s", loggingEndpoint))
+            .add(String.format("--artifact_endpoint=%s", artifactEndpoint))
+            .add(String.format("--provision_endpoint=%s", provisionEndpoint))
+            .add(String.format("--control_endpoint=%s", controlEndpoint));
+    if (semiPersistDir != null) {

Review comment:
So we essentially pass the same piece of information to the worker twice: as an entry point argument and then again within the pipeline options. It needs to be done this way because of the container contract.
Would be nice to revisit in the future.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 315174)
    Time Spent: 3h 40m  (was: 3.5h)

> Allow runner to configure "semi_persist_dir" which is used in the SDK harness
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-7945
>                 URL: https://issues.apache.org/jira/browse/BEAM-7945
>             Project: Beam
>          Issue Type: Sub-task
>          Components: java-fn-execution, sdk-go, sdk-java-core, sdk-py-core
>            Reporter: sunjincheng
>            Assignee: sunjincheng
>            Priority: Major
>             Fix For: 2.17.0
>
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Currently "semi_persist_dir" is not configurable, which can cause problems
> in certain scenarios. For example, in the Python SDK harness the default
> value of "semi_persist_dir" is "/tmp"
> ([https://github.com/apache/beam/blob/master/sdks/python/container/boot.go#L48]).
> When the environment type is "PROCESS", the disk backing "/tmp" may fill up,
> causing unexpected failures in production environments. We should provide a
> way to configure "semi_persist_dir" in the EnvironmentFactory on the runner side.
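The runner-side change discussed in the review can be sketched in plain Java. This is a hypothetical, self-contained illustration, not Beam's DockerEnvironmentFactory: the class SemiPersistDirArgs and its methods are invented for this example, and the flag spelling "--semi_persist_dir=" is an assumption, since the diff above is cut off before the body of the "if (semiPersistDir != null)" branch. The runner appends the flag only when the option is set; the harness side falls back to "/tmp", the default the issue describes for the Python SDK boot program.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch (not Beam code): builds SDK harness entrypoint arguments like the
 * argsBuilder in the diff, adding a semi-persistent directory flag only when configured,
 * and shows a harness-side fallback to "/tmp" when the flag is absent.
 */
class SemiPersistDirArgs {

  // Default described in the issue for the Python SDK harness boot program.
  static final String DEFAULT_SEMI_PERSIST_DIR = "/tmp";

  // Assumed flag spelling; the diff is truncated before the flag is added.
  static final String SEMI_PERSIST_FLAG = "--semi_persist_dir=";

  /** Runner side: mirrors the argsBuilder from the diff, plus the optional flag. */
  static List<String> buildArgs(
      String workerId,
      String loggingEndpoint,
      String artifactEndpoint,
      String provisionEndpoint,
      String controlEndpoint,
      String semiPersistDir) {
    List<String> args = new ArrayList<>();
    args.add(String.format("--id=%s", workerId));
    args.add(String.format("--logging_endpoint=%s", loggingEndpoint));
    args.add(String.format("--artifact_endpoint=%s", artifactEndpoint));
    args.add(String.format("--provision_endpoint=%s", provisionEndpoint));
    args.add(String.format("--control_endpoint=%s", controlEndpoint));
    // Pass the flag only when the option is set, so the harness keeps its own default otherwise.
    if (semiPersistDir != null) {
      args.add(SEMI_PERSIST_FLAG + semiPersistDir);
    }
    return Collections.unmodifiableList(args);
  }

  /** Harness side: returns the configured directory, or the "/tmp" default if none was passed. */
  static String resolveSemiPersistDir(List<String> args) {
    for (String arg : args) {
      if (arg.startsWith(SEMI_PERSIST_FLAG)) {
        return arg.substring(SEMI_PERSIST_FLAG.length());
      }
    }
    return DEFAULT_SEMI_PERSIST_DIR;
  }

  public static void main(String[] argv) {
    List<String> configured =
        buildArgs("1", "localhost:1", "localhost:2", "localhost:3", "localhost:4", "/mnt/beam");
    List<String> unconfigured =
        buildArgs("1", "localhost:1", "localhost:2", "localhost:3", "localhost:4", null);
    System.out.println(resolveSemiPersistDir(configured)); // prints "/mnt/beam"
    System.out.println(resolveSemiPersistDir(unconfigured)); // prints "/tmp"
  }
}
```

Adding the flag only for a non-null option keeps the harness default in effect for pipelines that never set it. As the review comment points out, the same value also reaches the worker through the pipeline options, so the entrypoint flag duplicates that information.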