On Mon, May 6, 2019 at 3:01 PM Ahmet Altay <[email protected]> wrote:

> There is RunnerOptions already. Its options are populated by querying the
> job service. Any portable runner is able to provide a list of
> runner-specific options through that mechanism.
>
> *From: *Reza Rokni <[email protected]>
> *Date: *Mon, May 6, 2019 at 2:57 PM
> *To: * <[email protected]>
>
> So the options here would be moved to runner options?
>>
>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>
>
In theory at least, many options specified in WorkerOptions can apply to
all runners and hence are probably not truly runner-specific (num_workers,
zone, worker_machine_type, etc.). Also, moving existing options might be
hard for backwards-compatibility reasons.

Some of the truly runner-specific options are in XYZRunnerOptions classes.
But because there is no namespace, names there have to be globally unique,
which could be addressed by introducing the class name as a namespace.
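
To make the idea concrete, here is a rough sketch of how class-name
namespacing could resolve option names. It is not Beam's actual
PipelineOptions code: OptionRegistry, AmbiguousOptionError, register and
resolve are hypothetical names made up for illustration. The "Class#option"
syntax follows Max's "A#optionX" example, and unqualified names keep working
as long as they are globally unique.

```python
from collections import defaultdict


class AmbiguousOptionError(ValueError):
    """An unqualified option name is defined by more than one options class."""


class OptionRegistry:
    """Hypothetical registry mapping option names to their defining classes."""

    def __init__(self):
        # option name -> list of options-class names that define it
        self._owners = defaultdict(list)

    def register(self, class_name, option_name):
        self._owners[option_name].append(class_name)

    def resolve(self, flag):
        """Return (class_name, option_name) for 'Class#option' or 'option'."""
        if "#" in flag:
            # Qualified form: the class name acts as the namespace.
            class_name, option_name = flag.split("#", 1)
            if class_name not in self._owners.get(option_name, []):
                raise KeyError(flag)
            return class_name, option_name
        # Unqualified form: allowed only when the name is globally unique.
        owners = self._owners.get(flag, [])
        if not owners:
            raise KeyError(flag)
        if len(owners) > 1:
            raise AmbiguousOptionError(
                "%r is defined by %s; qualify it as Class#%s"
                % (flag, ", ".join(owners), flag))
        return owners[0], flag


registry = OptionRegistry()
registry.register("WorkerOptions", "num_workers")
registry.register("RunnerAOptions", "network")  # hypothetical runner classes
registry.register("RunnerBOptions", "network")

registry.resolve("num_workers")             # unique, so no qualifier needed
registry.resolve("RunnerAOptions#network")  # qualifier picks one definition
```

Here resolving plain "network" would raise AmbiguousOptionError, since two
classes define it. Existing pipelines that only use unique names would not
need to change.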


>
>>
>> In Java they are in DataflowPipelineWorkerPoolOptions and of course we
>> have FlinkPipelineOptions etc...
>>
>> *From: *Chamikara Jayalath <[email protected]>
>> *Date: *Tue, 7 May 2019 at 05:29
>> *To: *dev
>>
>>
>>> On Mon, May 6, 2019 at 2:13 PM Lukasz Cwik <[email protected]> wrote:
>>>
>>>> There were also discussions[1] in the past about scoping
>>>> PipelineOptions to specific PTransforms. Would scoping PipelineOptions to
>>>> PTransforms make this a more general solution?
>>>>
>>>> 1:
>>>> https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E
>>>>
>>>
>>> Is this just for pipeline construction time or also for runtime? Trying
>>> to scope options for transforms at runtime might complicate things in the
>>> presence of optimizations such as fusion.
>>>
>>>
>>>>
>>>> On Mon, May 6, 2019 at 12:02 PM Ankur Goenka <[email protected]> wrote:
>>>>
>>>>> Having namespaces for options makes sense.
>>>>> I think a help command that prints all the options for a given runner
>>>>> name would also be useful.
>>>>> As for the scope of namespacing, I think that assigning a logical
>>>>> namespace gives more flexibility around how and where we declare options.
>>>>> It also makes future refactoring possible.
>>>>>
>>>>>
>>>>> On Mon, May 6, 2019 at 7:50 AM Maximilian Michels <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Good points. As already mentioned, there is no namespacing between the
>>>>>> different pipeline option classes. In particular, there is no separate
>>>>>> namespace for system and user options, which is most concerning.
>>>>>>
>>>>>> I'm in favor of an optional namespace using the class name of the
>>>>>> defining pipeline option class. That way we would at least be able to
>>>>>> resolve duplicate option names. For example, if there was "optionX" in
>>>>>> classes A and B, we could use "A#optionX" to refer to the one from
>>>>>> class A.
>>>>>>
>>>>>
>>> I think this solves the original problem. Runner-specific options will
>>> have unique names that include the runner (in the options class). I guess
>>> to be complete we also have to include the package (module for Python)?
>>> If an option is globally unique, users should be able to specify it
>>> without qualifying it (at least for backwards compatibility).
>>>
>>>
>>>>
>>>>>> -Max
>>>>>>
>>>>>> On 04.05.19 02:23, Reza Rokni wrote:
>>>>>> > Great point Lukasz, worker machine could be relevant to multiple
>>>>>> > runners.
>>>>>> >
>>>>>> > Perhaps for parameters that could have multiple runner relevance, the
>>>>>> > doc could be rephrased to reflect its potential multiple uses. For
>>>>>> > example, change the help information to start with a generic reference
>>>>>> > "worker type on the runner" followed by runner specific behavior
>>>>>> > expected for RunnerA, RunnerB etc...
>>>>>> >
>>>>>> > But I do worry that without prefix even generic options could cause
>>>>>> > confusion. For example if the use of --network is substantially
>>>>>> > different between runnerA vs runnerB then the user will only have this
>>>>>> > information by reading the help. It will also mean that a pipeline
>>>>>> > which is expected to work both on-premise on RunnerA and in the cloud
>>>>>> > on RunnerB could fail because the format of the options to pass to
>>>>>> > --network is different.
>>>>>> >
>>>>>> > Cheers
>>>>>> >
>>>>>> > Reza
>>>>>> >
>>>>>> > *From: *Kenneth Knowles <[email protected]>
>>>>>> > *Date: *Sat, 4 May 2019 at 03:54
>>>>>> > *To: *dev
>>>>>> >
>>>>>> >     Even though they are in classes named for specific runners, they
>>>>>> >     are not namespaced. All PipelineOptions exist in a global
>>>>>> >     namespace, so option names need to be chosen carefully and be
>>>>>> >     very precise.
>>>>>> >
>>>>>> >     It is a good point that even though there may be multiple uses for
>>>>>> >     "machine type", they are probably not going to both happen at the
>>>>>> >     same time.
>>>>>> >
>>>>>> >     If it becomes an issue, another thing we could do would be to add
>>>>>> >     namespacing support so options have less spooky action, or at
>>>>>> >     least have a way to resolve it when it happens by accident.
>>>>>> >
>>>>>> >     Kenn
>>>>>> >
>>>>>> >     On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath
>>>>>> >     <[email protected]> wrote:
>>>>>> >
>>>>>> >         Also, we do have runner specific options classes where truly
>>>>>> >         runner specific options can go.
>>>>>> >
>>>>>> >
>>>>>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
>>>>>> >
>>>>>> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>>>>>> >
>>>>>> >         On Fri, May 3, 2019 at 9:50 AM Ahmet Altay <[email protected]> wrote:
>>>>>> >
>>>>>> >             I agree, that is a good point.
>>>>>> >
>>>>>> >             *From: *Lukasz Cwik <[email protected]>
>>>>>> >             *Date: *Fri, May 3, 2019 at 9:37 AM
>>>>>> >             *To: *dev
>>>>>> >
>>>>>> >                 The concept of a machine type isn't necessarily limited
>>>>>> >                 to Dataflow. If it made sense for a runner, they could
>>>>>> >                 use AWS/Azure machine types as well.
>>>>>> >
>>>>>> >                 On Fri, May 3, 2019 at 9:32 AM Ahmet Altay
>>>>>> >                 <[email protected]> wrote:
>>>>>> >
>>>>>> >                     This idea was discussed in a PR a few months ago,
>>>>>> >                     and a JIRA was filed as a follow-up [1]. IMO, it
>>>>>> >                     makes sense to use a namespace prefix. The primary
>>>>>> >                     issue here is that such a change will very likely
>>>>>> >                     be backward incompatible and would be hard to do
>>>>>> >                     before the next major version.
>>>>>> >
>>>>>> >                     [1] https://issues.apache.org/jira/browse/BEAM-6531
>>>>>> >
>>>>>> >                     *From: *Reza Rokni <[email protected]>
>>>>>> >                     *Date: *Thu, May 2, 2019 at 8:00 PM
>>>>>> >                     *To: * <[email protected]>
>>>>>> >
>>>>>> >                         Hi,
>>>>>> >
>>>>>> >                         Was reading this SO question:
>>>>>> >
>>>>>> >
>>>>>> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>>>>>> >
>>>>>> >                         And noticed that in
>>>>>> >
>>>>>> >
>>>>>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>>>>> >
>>>>>> >                         The option is called --worker_machine_type.
>>>>>> >
>>>>>> >                         I wonder if runner specific options should have
>>>>>> >                         the runner in the prefix? Something like
>>>>>> >                         --dataflow_worker_machine_type?
>>>>>> >
>>>>>> >                         Cheers
>>>>>> >                         Reza
>>>>>> >
>>>>>> >                         --
>>>>>> >
>>>>>> >                         This email may be confidential and
>>>>>> privileged.
>>>>>> >                         If you received this communication by
>>>>>> mistake,
>>>>>> >                         please don't forward it to anyone else,
>>>>>> please
>>>>>> >                         erase all copies and attachments, and
>>>>>> please let
>>>>>> >                         me know that it has gone to the wrong
>>>>>> person.
>>>>>> >
>>>>>> >                         The above terms reflect a potential business
>>>>>> >                         arrangement, are provided solely as a basis
>>>>>> for
>>>>>> >                         further discussion, and are not intended to
>>>>>> be
>>>>>> >                         and do not constitute a legally binding
>>>>>> >                         obligation. No legally binding obligations
>>>>>> will
>>>>>> >                         be created, implied, or inferred until an
>>>>>> >                         agreement in final form is executed in
>>>>>> writing
>>>>>> >                         by all parties involved.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>
>>
>
