Adding relevant folks +Chamikara Jayalath <[email protected]> +Pablo
Estrada <[email protected]>

This proposal makes sense to me. It makes it easier for users to reason
about why a given temp directory is chosen, and it would lead to unified
code across all the IOs that do this.
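
To check my reading of the proposal quoted below, here is a minimal sketch
of the lookup order in plain Java. It does not use the Beam SDK: the class
TempDirectoryDefaults and the method resolve are hypothetical names made up
for illustration, and plain strings stand in for Beam's resource types. I
am assuming the output directory stays as a last resort when no pipeline
temp location is set; the proposal itself does not spell that out. In a
real transform the second argument would presumably come from
input.getPipeline().getOptions().getTempLocation() inside expand.

```java
import java.util.Objects;

/** Hypothetical helper for illustration only; not part of the Beam SDK. */
final class TempDirectoryDefaults {
  private TempDirectoryDefaults() {}

  /**
   * Picks the temp directory for a file-based write.
   *
   * Order: the value set explicitly on the builder, then the pipeline-level
   * temp location, then (assumption) the final output directory.
   */
  static String resolve(
      String explicitTempDir, String pipelineTempLocation, String outputDirectory) {
    Objects.requireNonNull(outputDirectory, "outputDirectory");
    if (explicitTempDir != null) {
      // The user called the tempDirectory setter; always honor it.
      return explicitTempDir;
    }
    if (pipelineTempLocation != null) {
      // Proposed new default: the pipeline-level temp location.
      return pipelineTempLocation;
    }
    // Current behavior, kept here only as a last resort.
    return outputDirectory;
  }
}
```

With this ordering, a user who sets only the pipeline temp location gets
temp files there, so a retention policy or lock on the output location no
longer affects the temp-file rename step.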

On Thu, Sep 9, 2021 at 11:37 AM Claire McGinty <[email protected]>
wrote:

> Hi Beam devs,
>
> I have a question/proposal about the default tempDirectory setting for
> file-based IOs. AvroIO, FileIO, TextIO all provide Builders with an
> optional tempDirectory setter, and when the transforms are expanded,
> tempDirectory will default to the value of the final output directory if
> null [AvroIO
> <https://github.com/apache/beam/blob/1d4a9ccd11c14ac6f0a2de1cc438a881244ede0a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java#L1634-L1637>
> /FileIO
> <https://github.com/apache/beam/blob/f759a5c7fe34c3d9e39cc21bb78cdc5da0a13eb1/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java#L1284-L1290>
> /TextIO
> <https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L992-L995>
> ].
>
> I think it would make sense to default to the value of
> PipelineOptions#getTempLocation instead, which is accessible inside the
> expand(PCollection<T> input) method; it seems reasonable for the user to
> expect that their
> PipelineOptions#getTempLocation will be honored, and additionally, their
> final output locations may have locks/retention policies set that make the
> temp file renaming step fail. Plus, this pattern looks like it's already
> being used in BigQueryIO
> <https://github.com/apache/beam/blob/e76b4db30a90d8f351e807cb247a707e7a3c566c/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L944>
> .
>
> What do you think?
>
> Thanks!
> Claire
>
>
>
