Hi Beam devs,

I have a question/proposal about the default tempDirectory setting for
file-based IOs. AvroIO, FileIO, and TextIO all provide Builders with an
optional tempDirectory setter, and when the transforms are expanded,
tempDirectory defaults to the final output directory if it is null (see
AvroIO
<https://github.com/apache/beam/blob/1d4a9ccd11c14ac6f0a2de1cc438a881244ede0a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java#L1634-L1637>,
FileIO
<https://github.com/apache/beam/blob/f759a5c7fe34c3d9e39cc21bb78cdc5da0a13eb1/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java#L1284-L1290>,
TextIO
<https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L992-L995>).

I think it would make sense to default to the value of
PipelineOptions#getTempLocation instead, which is accessible inside the
expand(PCollection<T> input) method. It seems reasonable for the user to
expect that their PipelineOptions#getTempLocation will be honored.
Additionally, their final output locations may have locks or retention
policies set that make the temp file renaming step fail. Plus, this
pattern looks like it's already being used in BigQueryIO
<https://github.com/apache/beam/blob/e76b4db30a90d8f351e807cb247a707e7a3c566c/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L944>.
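To make the proposed precedence concrete, here is a minimal sketch in plain
Java. The class TempDirectoryDefaults and its resolve method are
hypothetical and not part of Beam. The sketch also assumes that today's
output-directory default would be kept as a fallback when no pipeline temp
location is configured; that fallback is my assumption, not part of the
proposal above.

```java
import java.util.Objects;

/** Hypothetical helper (not a Beam API) showing the proposed temp-directory precedence. */
final class TempDirectoryDefaults {
  private TempDirectoryDefaults() {}

  /**
   * @param explicitTempDirectory value passed to the builder's tempDirectory setter, or null
   * @param pipelineTempLocation  value of PipelineOptions#getTempLocation, or null if unset
   * @param outputDirectory       final output directory (the current default)
   */
  static String resolve(
      String explicitTempDirectory, String pipelineTempLocation, String outputDirectory) {
    // 1. A temp directory set explicitly on the builder always wins.
    if (explicitTempDirectory != null) {
      return explicitTempDirectory;
    }
    // 2. Proposed change: honor the pipeline-wide temp location when one is set.
    if (pipelineTempLocation != null && !pipelineTempLocation.isEmpty()) {
      return pipelineTempLocation;
    }
    // 3. Assumed fallback: keep the current behavior (use the output directory).
    return Objects.requireNonNull(outputDirectory, "outputDirectory");
  }
}
```

Inside a real expand(PCollection<T> input), the second argument would
presumably come from input.getPipeline().getOptions().getTempLocation(),
which may be null if the user never set a temp location.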

What do you think?

Thanks!
Claire
