I noticed that the Python Dataflow runner appends some "uniqueness" (a
timestamp) [1] to the staging directory when staging artifacts for a
Dataflow job.  This is quite suboptimal, because it defeats any caching
of staged artifacts between job runs.

The JVM runner doesn't do this.  Is there a good reason the Python one
does, or is this just an oversight that hasn't been fixed yet?

[1]
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py#L467