I think it may have been an API design mistake to put the S3 region into
PipelineOptions. PipelineOptions are global per pipeline, whereas it's
totally reasonable to access S3 files in different regions even from the
code of a single DoFn running on a single element. The same applies to
"setS3StorageClass".

Jacob: what do you think? Why is it necessary to specify the S3 region at
all - can AWS infer it automatically? Per
https://github.com/aws/aws-sdk-java/issues/1107 it seems that this is
possible via a setting on the client, so that the specified region is used
as the default but if the bucket is in a different region things still work.
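
To make the "default region, but other regions still work" behaviour concrete, here is a minimal plain-Java sketch. It deliberately does not call the AWS SDK, and it is not a description of how the SDK implements this. BucketRegionResolver and the lookup function are hypothetical names introduced for illustration; the lookup stands in for whatever mechanism discovers a bucket's real region.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper: resolves a region per bucket, with a configured default. */
final class BucketRegionResolver {
    private final String defaultRegion;
    // Assumption: returns the bucket's region, or null when it cannot be determined.
    private final Function<String, String> lookup;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    BucketRegionResolver(String defaultRegion, Function<String, String> lookup) {
        this.defaultRegion = defaultRegion;
        this.lookup = lookup;
    }

    /** Returns the discovered region for the bucket, else the default; the result is cached. */
    String regionFor(String bucket) {
        return cache.computeIfAbsent(bucket, b -> {
            String found = lookup.apply(b);
            return found != null ? found : defaultRegion;
        });
    }
}
```

With something like this, a single DoFn could touch buckets in several regions while the pipeline-level region only acts as a fallback.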

As for the storage class: so far nobody has complained ;) but it should
probably be specified via
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/CreateOptions.java
instead of a pipeline option.

On Thu, Mar 8, 2018 at 9:16 PM Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

> The "hint" would probably be to use hints :) - indeed, this joke refers to
> the hint thread.
>
> Long story short, with hints you should be able to say "use that
> specialized config here".
>
> Now, personally, I'd like to see a way to specialize config per transform.
> With a hint, an easy way is to use a prefix: --s3-region would become
> --prefix_transform1-s3-region. But to implement it I have
> https://github.com/apache/beam/pull/4683, which needs to be merged first
> ;).
>
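
A minimal sketch of the prefix idea quoted above, in plain Java and independent of Beam. PrefixedOptions and resolve are hypothetical names, not part of Beam or of the pull request mentioned; the sketch only assumes flags of the form --name=value and that a transform-specific flag is the global flag name preceded by the transform's prefix and a dash.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: per-transform option lookup with a global fallback. */
final class PrefixedOptions {
    private final Map<String, String> flags = new HashMap<>();

    PrefixedOptions(String[] args) {
        for (String arg : args) {
            int eq = arg.indexOf('=');
            // Only "--name=value" arguments are considered; anything else is ignored.
            if (arg.startsWith("--") && eq > 2) {
                flags.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
    }

    /** Looks up "<prefix>-<name>" first, then falls back to the global "<name>". */
    Optional<String> resolve(String transformPrefix, String name) {
        String specialized = flags.get(transformPrefix + "-" + name);
        if (specialized != null) {
            return Optional.of(specialized);
        }
        return Optional.ofNullable(flags.get(name));
    }
}
```

Given the arguments --s3-region=us-east-1 and --prefix_transform1-s3-region=eu-west-1, resolve("prefix_transform1", "s3-region") picks the specialized value, while any other prefix falls back to the global one.
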
> On 8 March 2018 at 23:03, "Ismaël Mejía" <ieme...@gmail.com> wrote:
>
>> I was trying to create a really simple pipeline that reads from a
>> bucket in a filesystem (s3) and writes to a different bucket in the
>> same filesystem.
>>
>>     S3Options options =
>>         PipelineOptionsFactory.fromArgs(args).create().as(S3Options.class);
>>     Pipeline pipeline = Pipeline.create(options);
>>     pipeline
>>       .apply("ReadLines", TextIO.read().from("s3://src-bucket/*"))
>>       // .apply("AllOtherMagic", ...)
>>       .apply("WriteCounts", TextIO.write().to("s3://dst-bucket/"));
>>     pipeline.run().waitUntilFinish();
>>
>> I discovered that my original bucket was in a different region, so I
>> needed to pass a different S3Options object to the Write
>> 'options.setAwsRegion("dst-region")', but I could not find a way to do
>> it. Can somebody give me a hint on how to do this?
>>
>> Since file-based IOs use the configuration implied by the Filesystem, I
>> was wondering whether this is possible at all. With non-file-based IOs
>> all the configuration details are explicit in each specific transform,
>> but this is not the case for these file-based transforms.
>>
>> Note: I know this question probably belongs more on user@, but since I
>> couldn’t find an easy way to do it I was wondering if this is an issue
>> we should consider on dev@ from an API point of view.
>>
>
