Yes, I agree with all of this.

Jacob
On Thu, Mar 8, 2018 at 9:52 PM, Robert Bradshaw <rober...@google.com> wrote:

> On Thu, Mar 8, 2018 at 9:38 PM Eugene Kirpichov <kirpic...@google.com> wrote:
>
>> I think it may have been an API design mistake to put the S3 region into
>> PipelineOptions.
>
> +1, IMHO it's generally a mistake to put any transform configuration into
> PipelineOptions for exactly this reason.
>
>> PipelineOptions are global per pipeline, whereas it's totally reasonable
>> to access S3 files in different regions even from the code of a single
>> DoFn running on a single element. The same applies to "setS3StorageClass".
>>
>> Jacob: what do you think? Why is it necessary to specify the S3 region at
>> all - can AWS infer it automatically? Per
>> https://github.com/aws/aws-sdk-java/issues/1107 it seems that this is
>> possible via a setting on the client, so that the specified region is
>> used as the default, but if the bucket is in a different region things
>> still work.
>>
>> As for the storage class: so far nobody complained ;) but it should
>> probably be specified via
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/CreateOptions.java
>> instead of a pipeline option.
>>
>> On Thu, Mar 8, 2018 at 9:16 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>
>>> The "hint" would probably be to use hints :) - indeed, this joke refers
>>> to the hint thread.
>>>
>>> Long story short, with hints you should be able to say "use that
>>> specialized config here".
>>>
>>> Now, personally, I'd like to see a way to specialize config per
>>> transform. With a hint, an easy way is to use a prefix: --s3-region
>>> would become --prefix_transform1-s3-region. But to implement it I have
>>> https://github.com/apache/beam/pull/4683 which needs to be merged
>>> first ;).
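[For reference, the client setting discussed in aws-sdk-java#1107 appears to be the force-global-bucket-access flag on the v1 client builder; a minimal sketch, with placeholder bucket and key names, of how the raw SDK handles a bucket outside the configured default region:]

```java
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class CrossRegionS3Client {
  public static void main(String[] args) {
    // us-west-2 acts only as the default region here. With
    // force-global-bucket-access enabled, the SDK follows the
    // redirect for a bucket living in a different region and
    // retries against that region instead of failing.
    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withRegion(Regions.US_WEST_2)
        .withForceGlobalBucketAccessEnabled(true)
        .build();

    // Works even if "some-bucket" is not in us-west-2.
    s3.getObject("some-bucket", "some-key");
  }
}
```

[If Beam's S3FileSystem built its client this way, the region in S3Options would become a default rather than a hard constraint.]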
>>> On 8 Mar 2018 at 23:03, "Ismaël Mejía" <ieme...@gmail.com> wrote:
>>>
>>>> I was trying to create a really simple pipeline that reads from a
>>>> bucket in a filesystem (S3) and writes to a different bucket in the
>>>> same filesystem.
>>>>
>>>> S3Options options =
>>>>     PipelineOptionsFactory.fromArgs(args).create().as(S3Options.class);
>>>> Pipeline pipeline = Pipeline.create(options);
>>>> pipeline
>>>>     .apply("ReadLines", TextIO.read().from("s3://src-bucket/*"))
>>>>     // .apply("AllOtherMagic", ...)
>>>>     .apply("WriteCounts", TextIO.write().to("s3://dst-bucket/"));
>>>> pipeline.run().waitUntilFinish();
>>>>
>>>> I discovered that my original bucket was in a different region, so I
>>>> needed to pass a different S3Options object to the write
>>>> ('options.setAwsRegion("dst-region")'), but I could not find a way to
>>>> do it. Can somebody give me a hint on how to do this?
>>>>
>>>> I was wondering, since file-based IOs use the configuration implied by
>>>> the filesystem, whether this is possible at all. With non-file-based
>>>> IOs all the configuration details are explicit in each specific
>>>> transform, but this is not the case for these file-based transforms.
>>>>
>>>> Note: I know this question probably belongs more to user@ but since I
>>>> couldn't find an easy way to do it I was wondering if this is an issue
>>>> we should consider at dev@ from an API point of view.
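[To make the CreateOptions suggestion above concrete: per-file options are already passed at creation time through FileSystems rather than through PipelineOptions. A minimal sketch of the existing mechanism, with a placeholder bucket and path; note that today's StandardCreateOptions only carries a MIME type, so a storage class (or region) would have to be a new field there, not an existing one:]

```java
import java.nio.channels.WritableByteChannel;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.CreateOptions;
import org.apache.beam.sdk.io.fs.ResourceId;

public class CreateOptionsSketch {
  public static void main(String[] args) throws Exception {
    // Resolve the destination file (placeholder path).
    ResourceId resource =
        FileSystems.matchNewResource("s3://dst-bucket/out.txt", /* isDirectory= */ false);

    // Options scoped to this single create() call, not to the whole
    // pipeline. This is where a per-file S3 storage class or region
    // could live if CreateOptions grew such fields.
    WritableByteChannel channel =
        FileSystems.create(
            resource,
            CreateOptions.StandardCreateOptions.builder()
                .setMimeType("text/plain")
                .build());
    channel.close();
  }
}
```

[The same per-call scoping is what would let a single DoFn touch buckets in two regions, which a global S3Options value cannot express.]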