I noticed that the Python Dataflow runner appends some "uniqueness" (a
timestamp) [1] to the staging directory when staging artifacts for a
Dataflow job.  This is quite suboptimal, because it defeats any caching
of staged artifacts between job runs.

The JVM runner doesn't do this.  Is there a good reason the Python one
does, or is this just an oversight that hasn't been fixed yet?

[1]
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py#L467