[ https://issues.apache.org/jira/browse/BEAM-11959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309455#comment-17309455 ]
Jens Wiren commented on BEAM-11959:
-----------------------------------

[~ibzib] Just retested this with:

TFX 0.28.0
apache/beam_python3.7_sdk:2.28.0
apache/flink:1.12.2-scala_2.11

The result is exactly the same: execution hangs in the Beam worker when it tries to install the TFX package. The only difference is that the job now times out after 5 minutes and fails instead of hanging indefinitely. The Beam SDK logs still show the same output.

> Python Beam SDK Harness hangs when installing pip packages
> ----------------------------------------------------------
>
>                 Key: BEAM-11959
>                 URL: https://issues.apache.org/jira/browse/BEAM-11959
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink, sdk-py-harness
>    Affects Versions: 2.27.0, 2.28.0
>         Environment: Kubernetes v1.19.6
>            Reporter: Jens Wiren
>            Priority: P1
>         Attachments: jobmanager-configmap.yaml, jobmanager-deploy.yaml,
> jobmanager-svc.yaml, taskmanager-deploy.yaml
>
>
> When running a Beam pipeline using Flink as the backend, the Python SDK harness
> hangs when trying to install pip packages. Tested using Flink 1.10.3.
> Images used:
> apache/beam_python3.7_sdk:2.28.0
> apache/flink:1.10.3
> Beam args used are:
> "--runner=FlinkRunner",
> "--flink_version=1.10",
> "--flink_master=http://flink-jobmanager.default:8081",
> f"--artifacts_dir=/mnt/flink",
> "--environment_type=EXTERNAL",
> "--environment_config=localhost:50000",
>
> Specifically, this was tested by running a TFX pipeline, which gets submitted
> and registered as it should, but the SDK harness hangs when installing:
> 2021/03/10 12:16:20 Initializing python harness: /opt/apache/beam/boot
> --id=1-1 --logging_endpoint=localhost:39795
> --artifact_endpoint=localhost:34095 --provision_endpoint=localhost:42999
> --control_endpoint=localhost:38129
> 2021/03/10 12:16:20 Found artifact: tfx_ephemeral-0.27.0.tar.gz
> 2021/03/10 12:16:20 Found artifact: extra_packages.txt
> 2021/03/10 12:16:20 Installing setup packages ...
> 2021/03/10 12:16:20 Installing extra package: tfx_ephemeral-0.27.0.tar.gz
> and nothing else is shown, regardless of how long it is left. I can manually
> install the TFX package by exec'ing into the container in < 3 min.
> The Flink task manager then sits idle and periodically logs:
> 2021-03-10 11:29:26,287 INFO
> org.apache.beam.runners.fnexecution.environment.ExternalEnvironmentFactory -
> Still waiting for startup of environment from localhost:50000 for worker id
> 1-1
> Helm charts attached below.
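
For anyone trying to reproduce this, the Beam arguments quoted in the issue can be collected in a small helper so that only the Flink version changes between the original test and the retest. This is only a sketch: `build_beam_args` and its parameter names are hypothetical and not part of Beam or TFX. The defaults are the values from the issue description. The comment does not say which `--flink_version` was passed for the Flink 1.12.2 retest, so any other value is an assumption.

```python
from typing import List


def build_beam_args(
    flink_version: str = "1.10",
    flink_master: str = "http://flink-jobmanager.default:8081",
    artifacts_dir: str = "/mnt/flink",
    worker_pool_address: str = "localhost:50000",
) -> List[str]:
    """Return the Beam pipeline args from the issue as a list of strings.

    Hypothetical helper: the flag names are copied from the report,
    the function and parameter names are made up for illustration.
    """
    return [
        "--runner=FlinkRunner",
        f"--flink_version={flink_version}",
        f"--flink_master={flink_master}",
        f"--artifacts_dir={artifacts_dir}",
        # EXTERNAL means the runner connects to an already running
        # worker pool at the address given in environment_config.
        "--environment_type=EXTERNAL",
        f"--environment_config={worker_pool_address}",
    ]
```

Calling `build_beam_args()` with no arguments gives the list from the original report; `build_beam_args(flink_version="1.12")` would be one guess at the retest configuration.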