[ https://issues.apache.org/jira/browse/BEAM-8935?focusedWorklogId=364911&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364911 ]
ASF GitHub Bot logged work on BEAM-8935:
----------------------------------------

Author: ASF GitHub Bot
Created on: 31/Dec/19 08:03
Start Date: 31/Dec/19 08:03
Worklog Time Spent: 10m
Work Description: sunjincheng121 commented on issue #10338: [BEAM-8935] Fail fast if sdk harness startup failed.
URL: https://github.com/apache/beam/pull/10338#issuecomment-569884513

Good catch! I have not found any documentation that answers this question directly. However, I did find some information that may be helpful.

According to the [doc](https://docs.docker.com/engine/api/v1.40/#operation/ContainerList), the status of a Docker container can be one of `created`, `restarting`, `running`, `removing`, `paused`, `exited`, or `dead`. So I think we only need to consider whether a container could be in the `created` state because of a race condition right after `docker run` has been executed. For any other status, something is obviously wrong.

According to [Stack Overflow](https://stackoverflow.com/questions/37744961/docker-run-vs-create), `docker run = docker create + docker start`. I assume that `docker create` sets the container state to `created` and that `docker start` changes it to `running`. Another [Stack Overflow](https://stackoverflow.com/questions/43734412/what-does-created-container-mean-in-docker) answer explains when a container can have the status "created". This happens after `docker create` and after `docker run`. For `docker run`, it says: `Docker container has been created using docker run but it hasn't been able to start successfully`. So we can infer that if the container status is `created` after `docker run`, the container did not start successfully.

Besides, a unit test in [DockerCommandTest](https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/DockerCommandTest.java#L60) checks that the container is `running` after `docker run`.
If there were a race condition, I would expect this test to fail from time to time. So I think the check logic in the current PR is fine :) What do you think?

----------------------------------------------------------------

Issue Time Tracking
-------------------

    Worklog Id:     (was: 364911)
    Time Spent: 1h 10m  (was: 1h)

> Fail fast if sdk harness startup failed
> ---------------------------------------
>
>                 Key: BEAM-8935
>                 URL: https://issues.apache.org/jira/browse/BEAM-8935
>             Project: Beam
>          Issue Type: Improvement
>          Components: java-fn-execution
>            Reporter: sunjincheng
>            Assignee: sunjincheng
>            Priority: Major
>             Fix For: 2.19.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently the runner blocks while it waits for the SDK harness to start up,
> until either the harness is available or a timeout occurs. The timeout is 1
> or 2 minutes. If the SDK harness fails to start for some reason, the runner
> may only find out after 1 or 2 minutes. This is too long.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
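The fail-fast check discussed in this thread can be sketched as follows. This is a minimal sketch under stated assumptions, not the code from PR #10338. The names `FailFastStartup`, `ContainerStatus` and `awaitHarness` are hypothetical. The container status is abstracted as a `Supplier<String>` so the sketch runs without Docker. In practice, that string could come from something like `docker inspect --format '{{.State.Status}}' <container-id>`. The sketch follows the argument in the comment: once `docker run` has returned, any status other than `running` (including `created`) is treated as a failed startup.

```java
import java.time.Duration;
import java.util.Locale;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch of a fail-fast wait for an SDK harness container (not Beam's code). */
public class FailFastStartup {

  /** Container states listed in the Docker Engine API (v1.40) ContainerList documentation. */
  enum ContainerStatus {
    CREATED, RESTARTING, RUNNING, REMOVING, PAUSED, EXITED, DEAD;

    /** Parses a raw status string such as "running\n" (trimmed, case-insensitive). */
    static ContainerStatus parse(String raw) {
      return valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }
  }

  /**
   * Waits until harnessConnected reports true. On every poll it also checks the container
   * status and throws as soon as the container is in any state other than RUNNING, instead
   * of waiting for the full timeout.
   */
  static void awaitHarness(
      Supplier<String> statusProbe,
      BooleanSupplier harnessConnected,
      Duration timeout,
      Duration pollInterval)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (!harnessConnected.getAsBoolean()) {
      ContainerStatus status = ContainerStatus.parse(statusProbe.get());
      // Assumption from the thread: after `docker run` has returned, CREATED (or any other
      // non-RUNNING state) means the container failed to start, so fail fast.
      if (status != ContainerStatus.RUNNING) {
        throw new IllegalStateException(
            "SDK harness container is not running, status=" + status);
      }
      // Fall back to the ordinary timeout if the container runs but never connects.
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("SDK harness did not connect within " + timeout);
      }
      Thread.sleep(pollInterval.toMillis());
    }
  }

  public static void main(String[] args) throws Exception {
    // Case 1: the container is stuck in "created", so the first poll fails
    // instead of waiting out a two-minute timeout.
    try {
      awaitHarness(() -> "created", () -> false, Duration.ofMinutes(2), Duration.ofMillis(10));
    } catch (IllegalStateException e) {
      System.out.println("failed fast: " + e.getMessage());
    }

    // Case 2: the container keeps running and the harness connects on the third check.
    int[] checks = {0};
    awaitHarness(
        () -> "running", () -> ++checks[0] >= 3, Duration.ofSeconds(5), Duration.ofMillis(10));
    System.out.println("harness connected after " + checks[0] + " checks");
  }
}
```

Running `main` prints `failed fast: SDK harness container is not running, status=CREATED` and then `harness connected after 3 checks`. With this kind of polling, a failed startup is detected within roughly one poll interval rather than after the 1 or 2 minute timeout described in the issue. The timeout remains as a fallback for a container that keeps running but never connects.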