[ 
https://issues.apache.org/jira/browse/BEAM-12792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481350#comment-17481350
 ] 

Valentyn Tymofieiev commented on BEAM-12792:
--------------------------------------------

Note that Beam SDK container comes with Beam dependencies installed in the 
default environment. creating a new environment at startup of the container 
would not have the preinstalled dependencies, and they will be reinstalled on 
startup when Beam SDK tarball is installed; also the size of the container will 
keep growing with each environment created if it's not cleaned up, so this 
solution would need more work to iron out, but it would nevertheless be useful 
to verify if it addresses the issue.

> Multiple jobs running on Flink session cluster reuse the persistent Python 
> environment.
> ---------------------------------------------------------------------------------------
>
>                 Key: BEAM-12792
>                 URL: https://issues.apache.org/jira/browse/BEAM-12792
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-harness
>    Affects Versions: 2.27.0, 2.28.0, 2.29.0, 2.30.0, 2.31.0
>         Environment: Kubernetes 1.20 on Ubuntu 18.04.
>            Reporter: Jens Wiren
>            Priority: P1
>              Labels: FlinkRunner, beam
>
> I'm running TFX pipelines on a Flink cluster using Beam in k8s. However, 
> extra python packages passed to the Flink runner (or rather beam worker 
> side-car) are only installed once per deployment cycle. Example:
>  # Flink is deployed and is up and running
>  # A TFX pipeline starts, submits a job to Flink along with a python whl of 
> custom code and beam ops.
>  # The beam worker installs the package and the pipeline finishes succesfully.
>  # A new TFX pipeline is build where a new beam fn is introduced, the pipline 
> is started and the new whl is submitted as in step 2).
>  # This time, the new package is not being installed in the beam worker 
> causing the job to fail due to a reference which does not exist in the beam 
> worker, since it didn't install the new package.
>  
> I started using Flink from beam version 2.27 and it has been an issue all the 
> time.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to