On Mon, May 6, 2019 at 2:13 PM Lukasz Cwik <lc...@google.com> wrote:

> There were also discussions[1] in the past about scoping PipelineOptions
> to specific PTransforms. Would scoping PipelineOptions to PTransforms make
> this a more general solution?
>
> 1:
> https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E
>
Is this just for pipeline construction time or also for runtime? Trying to scope options for transforms at runtime might complicate things in the presence of optimizations such as fusion.

>
> On Mon, May 6, 2019 at 12:02 PM Ankur Goenka <goe...@google.com> wrote:
>
>> Having namespaces for options makes sense.
>> I think a help command that prints all the options for a given
>> runner name would also be useful.
>> As for the scope of namespacing, I think that assigning a logical
>> namespace gives more flexibility around how and where we declare options. It
>> also makes future refactoring possible.
>>
>>
>> On Mon, May 6, 2019 at 7:50 AM Maximilian Michels <m...@apache.org> wrote:
>>
>>> Good points. As already mentioned there is no namespacing between the
>>> different pipeline option classes. In particular, there is no separate
>>> namespace for system and user options which is most concerning.
>>>
>>> I'm in favor of an optional namespace using the class name of the
>>> defining pipeline option class. That way we would at least be able to
>>> resolve duplicate option names. For example, if there was an "optionX"
>>> in class A and B, we could use "A#optionX" to refer to it from class A.
>>>
>>

I think this solves the original problem. Runner specific options will have unique names that include the runner (in the options class). I guess to be complete we also have to include the package (module for Python)? If an option is globally unique, users should be able to specify it without qualifying (at least for backwards compatibility).

>
>>> -Max
>>>
>>> On 04.05.19 02:23, Reza Rokni wrote:
>>> > Great point Lukasz, worker machine could be relevant to multiple
>>> > runners.
>>> >
>>> > Perhaps for parameters that could have multiple runner relevance, the
>>> > doc could be rephrased to reflect its potential multiple uses.
>>> > For example, change the help information to start with a generic
>>> > reference, "worker type on the runner", followed by the runner specific
>>> > behavior expected for RunnerA, RunnerB etc...
>>> >
>>> > But I do worry that without a prefix even generic options could cause
>>> > confusion. For example if the use of --network is substantially
>>> > different between runnerA vs runnerB then the user will only have this
>>> > information by reading the help. It will also mean that a pipeline which
>>> > is expected to work both on-premise on RunnerA and in the cloud on
>>> > RunnerB could fail because the format of the options to pass to
>>> > --network is different.
>>> >
>>> > Cheers
>>> >
>>> > Reza
>>> >
>>> > *From: *Kenneth Knowles <k...@apache.org>
>>> > *Date: *Sat, 4 May 2019 at 03:54
>>> > *To: *dev
>>> >
>>> > Even though they are in classes named for specific runners, they are
>>> > not namespaced. All PipelineOptions exist in a global namespace so
>>> > they need to be careful to be very precise.
>>> >
>>> > It is a good point that even though there may be multiple uses for
>>> > "machine type" they are probably not going to both happen at the
>>> > same time.
>>> >
>>> > If it becomes an issue, another thing we could do would be to add
>>> > namespacing support so options have less spooky action, or at least
>>> > have a way to resolve it when it happens by accident.
>>> >
>>> > Kenn
>>> >
>>> > On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath
>>> > <chamik...@google.com> wrote:
>>> >
>>> > Also, we do have runner specific options classes where truly
>>> > runner specific options can go.
>>> >
>>> > https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
>>> > https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>>> >
>>> > On Fri, May 3, 2019 at 9:50 AM Ahmet Altay <al...@google.com> wrote:
>>> >
>>> > I agree, that is a good point.
>>> >
>>> > *From: *Lukasz Cwik <lc...@google.com>
>>> > *Date: *Fri, May 3, 2019 at 9:37 AM
>>> > *To: *dev
>>> >
>>> > The concept of a machine type isn't necessarily limited
>>> > to Dataflow. If it made sense for a runner, they could
>>> > use AWS/Azure machine types as well.
>>> >
>>> > On Fri, May 3, 2019 at 9:32 AM Ahmet Altay <al...@google.com> wrote:
>>> >
>>> > This idea was discussed in a PR a few months ago,
>>> > and a JIRA was filed as a follow up [1]. IMO, it makes
>>> > sense to use a namespace prefix. The primary issue
>>> > here is that such a change will very likely be a
>>> > backward incompatible change and would be hard to do
>>> > before the next major version.
>>> >
>>> > [1] https://issues.apache.org/jira/browse/BEAM-6531
>>> >
>>> > *From: *Reza Rokni <r...@google.com>
>>> > *Date: *Thu, May 2, 2019 at 8:00 PM
>>> > *To: * <dev@beam.apache.org>
>>> >
>>> > Hi,
>>> >
>>> > Was reading this SO question:
>>> >
>>> > https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>>> >
>>> > And noticed that in
>>> >
>>> > https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>> >
>>> > The option is called --worker_machine_type.
>>> >
>>> > I wonder if runner specific options should have
>>> > the runner in the prefix?
>>> > Something like --dataflow_worker_machine_type?
>>> >
>>> > Cheers
>>> > Reza
>>> >
>>> > --
>>> >
>>> > This email may be confidential and privileged.
>>> > If you received this communication by mistake,
>>> > please don't forward it to anyone else, please
>>> > erase all copies and attachments, and please let
>>> > me know that it has gone to the wrong person.
>>> >
>>> > The above terms reflect a potential business
>>> > arrangement, are provided solely as a basis for
>>> > further discussion, and are not intended to be
>>> > and do not constitute a legally binding
>>> > obligation. No legally binding obligations will
>>> > be created, implied, or inferred until an
>>> > agreement in final form is executed in writing
>>> > by all parties involved.
>>> >
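
To make the namespacing proposal in this thread a bit more concrete, here is a minimal sketch in plain Python of how optional "Class#option" qualification could resolve duplicate names while still accepting an unqualified name when it is globally unique. This is not Beam's PipelineOptions API: `OptionRegistry`, `AmbiguousOptionError`, `RunnerAOptions`, `RunnerBOptions` and the `parallelism` option are hypothetical names chosen for illustration, and the package/module qualifier mentioned above is left out.

```python
from collections import defaultdict


class AmbiguousOptionError(ValueError):
    """Raised when an unqualified option name is defined by several classes."""


class OptionRegistry:
    """Hypothetical registry of option names per options class (not Beam code)."""

    def __init__(self):
        # Maps an option name to the set of class names that define it.
        self._owners = defaultdict(set)

    def register(self, class_name, option_names):
        for name in option_names:
            self._owners[name].add(class_name)

    def options_for(self, class_name):
        """List the options of one class, e.g. for a per-runner help command."""
        return sorted(n for n, owners in self._owners.items() if class_name in owners)

    def resolve(self, flag):
        """Return (class_name, option_name) for 'Class#option' or plain 'option'."""
        if "#" in flag:
            class_name, name = flag.split("#", 1)
            if class_name not in self._owners.get(name, ()):
                raise KeyError(f"{class_name} does not define {name!r}")
            return class_name, name
        owners = self._owners.get(flag, ())
        if not owners:
            raise KeyError(f"unknown option {flag!r}")
        if len(owners) > 1:
            # Duplicate name: the caller has to qualify it with a class name.
            raise AmbiguousOptionError(
                f"{flag!r} is defined by {sorted(owners)}; use Class#{flag}"
            )
        # Globally unique: accepted unqualified for backwards compatibility.
        return next(iter(owners)), flag


registry = OptionRegistry()
registry.register("RunnerAOptions", ["worker_machine_type", "network"])
registry.register("RunnerBOptions", ["parallelism", "network"])

unique = registry.resolve("worker_machine_type")
qualified = registry.resolve("RunnerBOptions#network")
```

In this sketch a plain `network` raises `AmbiguousOptionError` because both classes define it, whereas `worker_machine_type` keeps working without a qualifier, which is the backwards-compatible behavior suggested in the reply above.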