Hmm, it doesn't solve the issue that Beam doesn't let you configure a transform from its "config" (let's say the CLI).
So if I have a generic pipeline taking one file as input and another as output, must I register two filesystems in all cases? If the pipeline is dynamic, must I make that registration dynamic too? That sounds pretty bad for end users and not generic: every transform hits this issue, since Beam can't assume the implementation. Using a prefix (a namespace, which can be implicit or not) is simple and straightforward, and lets all cases be handled smoothly for end users.

What is the blocker to fixing this design issue? I fail to see why we end up with workarounds for a few particular cases right now :s.

On 9 March 2018 at 19:00, "Jacob Marble" <jacobmar...@gmail.com> wrote:

> I think when I wrote the S3 code, I couldn't see how to set storage class
> per-bucket, so put it in a flag. It's easy to imagine a use case where
> storage class differs per filespec, not only per bucket.
>
> Jacob
>
> On Fri, Mar 9, 2018 at 9:51 AM, Jacob Marble <jacobmar...@gmail.com>
> wrote:
>
>> Yes, I agree with all of this.
>>
>> Jacob
>>
>> On Thu, Mar 8, 2018 at 9:52 PM, Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> On Thu, Mar 8, 2018 at 9:38 PM Eugene Kirpichov <kirpic...@google.com>
>>> wrote:
>>>
>>>> I think it may have been an API design mistake to put the S3 region
>>>> into PipelineOptions.
>>>>
>>>
>>> +1, IMHO it's generally a mistake to put any transform configuration
>>> into PipelineOptions for exactly this reason.
>>>
>>>
>>>> PipelineOptions are global per pipeline, whereas it's totally
>>>> reasonable to access S3 files in different regions even from the code of a
>>>> single DoFn running on a single element. The same applies to
>>>> "setS3StorageClass".
>>>>
>>>> Jacob: what do you think? Why is it necessary to specify the S3 region
>>>> at all - can AWS infer it automatically?
>>>> Per https://github.com/aws/aws-sdk-java/issues/1107 it seems that
>>>> this is possible via a setting on the client, so that the specified
>>>> region is used as the default but if the bucket is in a different
>>>> region things still work.
>>>>
>>>> As for the storage class: so far nobody complained ;) but it should
>>>> probably be specified via
>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/CreateOptions.java
>>>> instead of a pipeline option.
>>>>
>>>> On Thu, Mar 8, 2018 at 9:16 PM Romain Manni-Bucau <
>>>> rmannibu...@gmail.com> wrote:
>>>>
>>>>> The "hint" would probably be to use hints :) - indeed, this joke
>>>>> refers to the hint thread.
>>>>>
>>>>> Long story short, with hints you should be able to say "use that
>>>>> specialized config here".
>>>>>
>>>>> Now, personally, I'd like to see a way to specialize config per
>>>>> transform. With a hint, an easy way is to use a prefix: --s3-region would
>>>>> become --prefix_transform1-s3-region. But to implement it I have
>>>>> https://github.com/apache/beam/pull/4683 which needs to be merged
>>>>> first ;).
>>>>>
>>>>> On 8 March 2018 at 23:03, "Ismaël Mejía" <ieme...@gmail.com> wrote:
>>>>>
>>>>>> I was trying to create a really simple pipeline that reads from a
>>>>>> bucket in a filesystem (s3) and writes to a different bucket in the
>>>>>> same filesystem.
>>>>>>
>>>>>> S3Options options =
>>>>>>     PipelineOptionsFactory.fromArgs(args).create().as(S3Options.class);
>>>>>> Pipeline pipeline = Pipeline.create(options);
>>>>>> pipeline
>>>>>>     .apply("ReadLines", TextIO.read().from("s3://src-bucket/*"))
>>>>>>     // .apply("AllOtherMagic", ...)
>>>>>>     .apply("WriteCounts", TextIO.write().to("s3://dst-bucket/"));
>>>>>> pipeline.run().waitUntilFinish();
>>>>>>
>>>>>> I discovered that my original bucket was in a different region, so I
>>>>>> needed to pass a different S3Options object to the Write
>>>>>> 'options.setAwsRegion("dst-region")', but I could not find a way to
>>>>>> do it. Can somebody give me a hint on how to do this?
>>>>>>
>>>>>> I was wondering whether this is possible, given that file-based IOs
>>>>>> use the configuration implied by the Filesystem. With non-file-based
>>>>>> IOs all the configuration details are explicit in each specific
>>>>>> transform, but this is not the case for these file-based transforms.
>>>>>>
>>>>>> Note: I know this question probably belongs more to user@, but since I
>>>>>> couldn't find an easy way to do it I was wondering if this is an issue
>>>>>> we should consider at dev@ from an API point of view.
>>>>>>
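
To make the prefix idea from this thread concrete, here is a minimal plain-Java sketch of how a prefixed flag could override a global one for a single transform. Nothing in it is Beam API: the class PrefixedOptions, its methods, the "write" prefix, and the "<prefix>-<key>" naming rule are all hypothetical and chosen only for illustration. The flag naming actually proposed above (--prefix_transform1-s3-region) may well differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper, NOT part of Beam: resolves a flag for a named transform. */
final class PrefixedOptions {
    private final Map<String, String> values = new HashMap<>();

    /** Parses flags of the form --key=value; anything else is ignored. */
    static PrefixedOptions fromArgs(String... args) {
        PrefixedOptions options = new PrefixedOptions();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("--") && eq > 2) {
                options.values.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return options;
    }

    /** Looks up "<prefix>-<key>" first, then falls back to the global "<key>". */
    Optional<String> resolve(String prefix, String key) {
        String scoped = values.get(prefix + "-" + key);
        return Optional.ofNullable(scoped != null ? scoped : values.get(key));
    }

    public static void main(String[] args) {
        // Assumed flags: one global region and one override for the "write" transform.
        PrefixedOptions options = PrefixedOptions.fromArgs(
            "--s3-region=us-east-1", "--write-s3-region=eu-west-1");
        // The "read" transform has no override, so it gets the global value.
        System.out.println(options.resolve("read", "s3-region").orElse("unset"));
        // The "write" transform gets its scoped value.
        System.out.println(options.resolve("write", "s3-region").orElse("unset"));
    }
}
```

With this kind of lookup, a generic read-then-write pipeline would need no extra filesystem registration: the user only adds a prefixed flag for the transform whose bucket lives in a different region.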