Re: Better naming for runner specific options

2019-05-22 Thread Maximilian Michels

+1

On 22.05.19 04:28, Reza Rokni wrote:

Hi,

Coming back to this, is the general consensus that this can be addressed 
via https://issues.apache.org/jira/browse/BEAM-6531 in Beam 3.0?


Cheers
Reza

On Tue, 7 May 2019 at 23:15, Valentyn Tymofieiev wrote:


I think using RunnerOptions was an idea at some point, but in
Python, we ended up parsing options from the Runner API without
populating RunnerOptions, and RunnerOptions was eventually removed [1].

If we decide to rename options, a path forward may be to have
runners recognize both old and new names until Beam 3.0, but update
codebase, examples and documentation to use new names.

[1]

https://github.com/apache/beam/commit/f3623e8ba2257f7659ccb312dc2574f862ef41b5#diff-525d5d65bedd7ea5e6fce6e4cd57e153L815

*From:* Ahmet Altay <al...@google.com>
*Date:* Mon, May 6, 2019, 6:01 PM
*To:* dev

There is RunnerOptions already. Its options are populated by
querying the job service. Any portable runner is able to provide
a list of options that is runner specific through that mechanism.

*From: *Reza Rokni <r...@google.com>
*Date: *Mon, May 6, 2019 at 2:57 PM
*To: *<dev@beam.apache.org>

So the options here would be moved to runner options?

https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions

In Java they are in DataflowPipelineWorkerPoolOptions and of
course we have FlinkPipelineOptions etc...

*From: *Chamikara Jayalath <chamik...@google.com>
*Date: *Tue, 7 May 2019 at 05:29
*To: *dev


On Mon, May 6, 2019 at 2:13 PM Lukasz Cwik
<lc...@google.com> wrote:

There were also discussions[1] in the past about
scoping PipelineOptions to specific PTransforms.
Would scoping PipelineOptions to PTransforms make
this a more general solution?

1:

https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E


Is this just for pipeline construction time or also for
runtime ? Trying to scope options for transforms at
runtime might complicate things in the presence of
optimizations such as fusion.


On Mon, May 6, 2019 at 12:02 PM Ankur Goenka
<goe...@google.com> wrote:

Having namespaces for options makes sense.
I think a help command that prints all the options
for a given runner name would also be useful.
As for the scope of namespacing, I think that
assigning a logical namespace gives more
flexibility around how and where we declare
options. It also makes future refactoring possible.


On Mon, May 6, 2019 at 7:50 AM Maximilian Michels
<m...@apache.org> wrote:

Good points. As already mentioned there is
no namespacing between the
different pipeline option classes. In
particular, there is no separate
namespace for system and user options which
is most concerning.

I'm in favor of an optional namespace using
the class name of the
defining pipeline option class. That way we
would at least be able to
resolve duplicate option names. For example,
if there were an "optionX" in both class A and
class B, we could use "A#optionX" to refer to
the one in class A.


I think this solves the original problem. Runner-specific
options will have unique names that include the runner
(in the options class). I guess to be complete we
also have to include the package (module for Python)?
If an option is globally unique, users should be able to
specify it without qualifying it (at least for backwards
compatibility).


-Max

On 04.05.19 02:23, Reza Rokni wrote:
 > Great point Lukasz, worker machine could
be relevant to multiple runners.
 >
 > Perhaps for parameters that could have

Re: Better naming for runner specific options

2019-05-21 Thread Reza Rokni
Hi,

Coming back to this, is the general consensus that this can be addressed
via https://issues.apache.org/jira/browse/BEAM-6531 in Beam 3.0?

Cheers
Reza

On Tue, 7 May 2019 at 23:15, Valentyn Tymofieiev 
wrote:

> I think using RunnerOptions was an idea at some point, but in Python, we
> ended up parsing options from the runner api without populating
> RunnerOptions, and  RunnerOptions was eventually removed [1].
>
> If we decide to rename options, a path forward may be to have runners
> recognize both old and new names until Beam 3.0, but update codebase,
> examples and documentation to use new names.
>
> [1]
> https://github.com/apache/beam/commit/f3623e8ba2257f7659ccb312dc2574f862ef41b5#diff-525d5d65bedd7ea5e6fce6e4cd57e153L815
>
> *From:*Ahmet Altay 
> *Date:*Mon, May 6, 2019, 6:01 PM
> *To:*dev
>
> There is RunnerOptions already. Its options are populated by querying the
>> job service. Any portable runner is able to provide a list of options that
>> is runner specific through that mechanism.
>>
>> *From: *Reza Rokni 
>> *Date: *Mon, May 6, 2019 at 2:57 PM
>> *To: * 
>>
>> So the options here would be moved to runner options?
>>>
>>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>>
>>> In Java they are in DataflowPipelineWorkerPoolOptions and of course we
>>> have FlinkPipelineOptions etc...
>>>
>>> *From: *Chamikara Jayalath 
>>> *Date: *Tue, 7 May 2019 at 05:29
>>> *To: *dev
>>>
>>>
 On Mon, May 6, 2019 at 2:13 PM Lukasz Cwik  wrote:

> There were also discussions[1] in the past about scoping
> PipelineOptions to specific PTransforms. Would scoping PipelineOptions to
> PTransforms make this a more general solution?
>
> 1:
> https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E
>

 Is this just for pipeline construction time or also for runtime ?
 Trying to scope options for transforms at runtime might complicate things
 in the presence of optimizations such as fusion.


>
> On Mon, May 6, 2019 at 12:02 PM Ankur Goenka 
> wrote:
>
>> Having namespaces for options makes sense.
>> I think a help command that prints all the options for a given
>> runner name would also be useful.
>> As for the scope of namespacing, I think that assigning a logical
>> namespace gives more flexibility around how and where we declare
>> options.
>> It also makes future refactoring possible.
>>
>>
>> On Mon, May 6, 2019 at 7:50 AM Maximilian Michels 
>> wrote:
>>
>>> Good points. As already mentioned there is no namespacing between
>>> the
>>> different pipeline option classes. In particular, there is no
>>> separate
>>> namespace for system and user options which is most concerning.
>>>
>>> I'm in favor of an optional namespace using the class name of the
>>> defining pipeline option class. That way we would at least be able
>>> to
>>> resolve duplicate option names. For example, if there were an
>>> "optionX" in both class A and class B, we could use "A#optionX" to
>>> refer to the one in class A.
>>>
>>
 I think this solves the original problem. Runner-specific options will
 have unique names that include the runner (in the options class). I guess
 to be complete we also have to include the package (module for Python)?
 If an option is globally unique, users should be able to specify it
 without qualifying it (at least for backwards compatibility).


>
>>> -Max
>>>
>>> On 04.05.19 02:23, Reza Rokni wrote:
>>> > Great point Lukasz, worker machine could be relevant to multiple
>>> runners.
>>> >
>>> > Perhaps for parameters that could have multiple runner relevance,
>>> the
>>> > doc could be rephrased to reflect its potential multiple uses. For
>>> > example change the help information to start with a generic
>>> reference "
>>> > worker type on the runner" followed by runner specific behavior
>>> expected
>>> > for RunnerA, RunnerB etc...
>>> >
>>> > But I do worry that without prefix even generic options could
>>> cause
>>> > confusion. For example if the use of --network is substantially
>>> > different between runnerA vs runnerB then the user will only have
>>> this
>>> > information by reading the help. It will also mean that a pipeline
>>> which
>>> > is expected to work both on-premise on RunnerA and in the cloud on
>>> > RunnerB could fail because the format of the options to pass to
>>> > --network are different.
>>> >
>>> > Cheers
>>> >
>>> > Reza
>>> >
>>> > *From: *Kenneth Knowles <k...@apache.org>
>>> > *Date: *Sat, 4 May 2019 at 03:54
>>> > *To: *dev
>>> >
>>> > Even though they are in classes named for specific runners,

Re: Better naming for runner specific options

2019-05-07 Thread Valentyn Tymofieiev
I think using RunnerOptions was an idea at some point, but in Python, we
ended up parsing options from the Runner API without populating
RunnerOptions, and RunnerOptions was eventually removed [1].

If we decide to rename options, a path forward may be to have runners
recognize both old and new names until Beam 3.0, but update codebase,
examples and documentation to use new names.

[1]
https://github.com/apache/beam/commit/f3623e8ba2257f7659ccb312dc2574f862ef41b5#diff-525d5d65bedd7ea5e6fce6e4cd57e153L815
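
One way to picture the "recognize both old and new names" path is a flag alias with a deprecation warning. This is only a sketch: it uses plain argparse rather than any Beam class, and the flag names and the DeprecatedAlias helper are made up for illustration.

```python
import argparse
import warnings


class DeprecatedAlias(argparse.Action):
    """Stores the value and warns when an old spelling of the flag is used."""

    def __init__(self, option_strings, dest, old_names=(), **kwargs):
        super().__init__(option_strings, dest, **kwargs)
        self.old_names = set(old_names)

    def __call__(self, parser, namespace, values, option_string=None):
        if option_string in self.old_names:
            warnings.warn(
                f"{option_string} is deprecated; use {self.option_strings[0]}",
                DeprecationWarning,
            )
        setattr(namespace, self.dest, values)


def build_parser():
    parser = argparse.ArgumentParser()
    # Hypothetical names: the new, runner-prefixed flag is listed first and
    # the old flag stays accepted as an alias until the next major release.
    parser.add_argument(
        "--dataflow_worker_machine_type",
        "--worker_machine_type",
        dest="worker_machine_type",
        action=DeprecatedAlias,
        old_names=["--worker_machine_type"],
        default=None,
    )
    return parser
```

Both spellings write to the same destination, so code that reads the option does not need to change while examples and documentation move to the new name.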

*From:*Ahmet Altay 
*Date:*Mon, May 6, 2019, 6:01 PM
*To:*dev

There is RunnerOptions already. Its options are populated by querying the
> job service. Any portable runner is able to provide a list of options that
> is runner specific through that mechanism.
>
> *From: *Reza Rokni 
> *Date: *Mon, May 6, 2019 at 2:57 PM
> *To: * 
>
> So the options here would be moved to runner options?
>>
>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>
>> In Java they are in DataflowPipelineWorkerPoolOptions and of course we
>> have FlinkPipelineOptions etc...
>>
>> *From: *Chamikara Jayalath 
>> *Date: *Tue, 7 May 2019 at 05:29
>> *To: *dev
>>
>>
>>> On Mon, May 6, 2019 at 2:13 PM Lukasz Cwik  wrote:
>>>
 There were also discussions[1] in the past about scoping
 PipelineOptions to specific PTransforms. Would scoping PipelineOptions to
 PTransforms make this a more general solution?

 1:
 https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E

>>>
>>> Is this just for pipeline construction time or also for runtime ? Trying
>>> to scope options for transforms at runtime might complicate things in the
>>> presence of optimizations such as fusion.
>>>
>>>

 On Mon, May 6, 2019 at 12:02 PM Ankur Goenka  wrote:

> Having namespaces for options makes sense.
> I think a help command that prints all the options for a given
> runner name would also be useful.
> As for the scope of namespacing, I think that assigning a logical
> namespace gives more flexibility around how and where we declare options.
> It also makes future refactoring possible.
>
>
> On Mon, May 6, 2019 at 7:50 AM Maximilian Michels 
> wrote:
>
>> Good points. As already mentioned there is no namespacing between the
>> different pipeline option classes. In particular, there is no
>> separate
>> namespace for system and user options which is most concerning.
>>
>> I'm in favor of an optional namespace using the class name of the
>> defining pipeline option class. That way we would at least be able to
>> resolve duplicate option names. For example, if there were an
>> "optionX" in both class A and class B, we could use "A#optionX" to
>> refer to the one in class A.
>>
>
>>> I think this solves the original problem. Runner-specific options will
>>> have unique names that include the runner (in the options class). I guess
>>> to be complete we also have to include the package (module for Python)?
>>> If an option is globally unique, users should be able to specify it
>>> without qualifying it (at least for backwards compatibility).
>>>
>>>

>> -Max
>>
>> On 04.05.19 02:23, Reza Rokni wrote:
>> > Great point Lukasz, worker machine could be relevant to multiple
>> runners.
>> >
>> > Perhaps for parameters that could have multiple runner relevance,
>> the
>> > doc could be rephrased to reflect its potential multiple uses. For
>> > example change the help information to start with a generic
>> reference "
>> > worker type on the runner" followed by runner specific behavior
>> expected
>> > for RunnerA, RunnerB etc...
>> >
>> > But I do worry that without prefix even generic options could cause
>> > confusion. For example if the use of --network is substantially
>> > different between runnerA vs runnerB then the user will only have
>> this
>> > information by reading the help. It will also mean that a pipeline
>> which
>> > is expected to work both on-premise on RunnerA and in the cloud on
>> > RunnerB could fail because the format of the options to pass to
>> > --network are different.
>> >
>> > Cheers
>> >
>> > Reza
>> >
>> > *From: *Kenneth Knowles <k...@apache.org>
>> > *Date: *Sat, 4 May 2019 at 03:54
>> > *To: *dev
>> >
>> > Even though they are in classes named for specific runners,
>> they are
>> > not namespaced. All PipelineOptions exist in a global namespace
>> so
>> > they need to be careful to be very precise.
>> >
>> > It is a good point that even though they may be multiple uses
>> for
>> > "machine type" they are probably not going to both happen at the
>> > same time.
>> >
>> > If it 

Re: Better naming for runner specific options

2019-05-06 Thread Chamikara Jayalath
On Mon, May 6, 2019 at 3:01 PM Ahmet Altay  wrote:

> There is RunnerOptions already. Its options are populated by querying the
> job service. Any portable runner is able to provide a list of options that
> is runner specific through that mechanism.
>
> *From: *Reza Rokni 
> *Date: *Mon, May 6, 2019 at 2:57 PM
> *To: * 
>
> So the options here would be moved to runner options?
>>
>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>
>
In theory at least, many options specified in WorkerOptions can apply to
all runners and hence are probably not truly runner-specific (num_workers,
zone, worker_machine_type, etc.). Also, moving existing options might be
hard for backwards compatibility reasons.

Some of the truly runner-specific options are in XYZRunnerOptions classes.
But because there is no namespace, names there have to be globally unique,
which can be addressed by introducing the class name as a namespace.


>
>>
>> In Java they are in DataflowPipelineWorkerPoolOptions and of course we
>> have FlinkPipelineOptions etc...
>>
>> *From: *Chamikara Jayalath 
>> *Date: *Tue, 7 May 2019 at 05:29
>> *To: *dev
>>
>>
>>> On Mon, May 6, 2019 at 2:13 PM Lukasz Cwik  wrote:
>>>
 There were also discussions[1] in the past about scoping
 PipelineOptions to specific PTransforms. Would scoping PipelineOptions to
 PTransforms make this a more general solution?

 1:
 https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E

>>>
>>> Is this just for pipeline construction time or also for runtime ? Trying
>>> to scope options for transforms at runtime might complicate things in the
>>> presence of optimizations such as fusion.
>>>
>>>

 On Mon, May 6, 2019 at 12:02 PM Ankur Goenka  wrote:

> Having namespaces for options makes sense.
> I think a help command that prints all the options for a given
> runner name would also be useful.
> As for the scope of namespacing, I think that assigning a logical
> namespace gives more flexibility around how and where we declare options.
> It also makes future refactoring possible.
>
>
> On Mon, May 6, 2019 at 7:50 AM Maximilian Michels 
> wrote:
>
>> Good points. As already mentioned there is no namespacing between the
>> different pipeline option classes. In particular, there is no
>> separate
>> namespace for system and user options which is most concerning.
>>
>> I'm in favor of an optional namespace using the class name of the
>> defining pipeline option class. That way we would at least be able to
>> resolve duplicate option names. For example, if there were an
>> "optionX" in both class A and class B, we could use "A#optionX" to
>> refer to the one in class A.
>>
>
>>> I think this solves the original problem. Runner-specific options will
>>> have unique names that include the runner (in the options class). I guess
>>> to be complete we also have to include the package (module for Python)?
>>> If an option is globally unique, users should be able to specify it
>>> without qualifying it (at least for backwards compatibility).
>>>
>>>

>> -Max
>>
>> On 04.05.19 02:23, Reza Rokni wrote:
>> > Great point Lukasz, worker machine could be relevant to multiple
>> runners.
>> >
>> > Perhaps for parameters that could have multiple runner relevance,
>> the
>> > doc could be rephrased to reflect its potential multiple uses. For
>> > example change the help information to start with a generic
>> reference "
>> > worker type on the runner" followed by runner specific behavior
>> expected
>> > for RunnerA, RunnerB etc...
>> >
>> > But I do worry that without prefix even generic options could cause
>> > confusion. For example if the use of --network is substantially
>> > different between runnerA vs runnerB then the user will only have
>> this
>> > information by reading the help. It will also mean that a pipeline
>> which
>> > is expected to work both on-premise on RunnerA and in the cloud on
>> > RunnerB could fail because the format of the options to pass to
>> > --network are different.
>> >
>> > Cheers
>> >
>> > Reza
>> >
>> > *From: *Kenneth Knowles <k...@apache.org>
>> > *Date: *Sat, 4 May 2019 at 03:54
>> > *To: *dev
>> >
>> > Even though they are in classes named for specific runners,
>> they are
>> > not namespaced. All PipelineOptions exist in a global namespace
>> so
>> > they need to be careful to be very precise.
>> >
>> > It is a good point that even though they may be multiple uses
>> for
>> > "machine type" they are probably not going to both happen at the
>> > same time.
>> >
>> > If it becomes an issue, another thing we 

Re: Better naming for runner specific options

2019-05-06 Thread Reza Rokni
So the options here would be moved to runner options?
https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions

In Java they are in DataflowPipelineWorkerPoolOptions and of course we
have FlinkPipelineOptions etc...

*From: *Chamikara Jayalath 
*Date: *Tue, 7 May 2019 at 05:29
*To: *dev


> On Mon, May 6, 2019 at 2:13 PM Lukasz Cwik  wrote:
>
>> There were also discussions[1] in the past about scoping PipelineOptions
>> to specific PTransforms. Would scoping PipelineOptions to PTransforms make
>> this a more general solution?
>>
>> 1:
>> https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E
>>
>
> Is this just for pipeline construction time or also for runtime ? Trying
> to scope options for transforms at runtime might complicate things in the
> presence of optimizations such as fusion.
>
>
>>
>> On Mon, May 6, 2019 at 12:02 PM Ankur Goenka  wrote:
>>
>>> Having namespaces for options makes sense.
>>> I think a help command that prints all the options for a given
>>> runner name would also be useful.
>>> As for the scope of namespacing, I think that assigning a logical
>>> namespace gives more flexibility around how and where we declare options.
>>> It also makes future refactoring possible.
>>>
>>>
>>> On Mon, May 6, 2019 at 7:50 AM Maximilian Michels 
>>> wrote:
>>>
 Good points. As already mentioned there is no namespacing between the
 different pipeline option classes. In particular, there is no separate
 namespace for system and user options which is most concerning.

 I'm in favor of an optional namespace using the class name of the
 defining pipeline option class. That way we would at least be able to
 resolve duplicate option names. For example, if there were an
 "optionX" in both class A and class B, we could use "A#optionX" to
 refer to the one in class A.

>>>
> I think this solves the original problem. Runner-specific options will
> have unique names that include the runner (in the options class). I guess
> to be complete we also have to include the package (module for Python)?
> If an option is globally unique, users should be able to specify it
> without qualifying it (at least for backwards compatibility).
>
>
>>
 -Max

 On 04.05.19 02:23, Reza Rokni wrote:
 > Great point Lukasz, worker machine could be relevant to multiple
 runners.
 >
 > Perhaps for parameters that could have multiple runner relevance, the
 > doc could be rephrased to reflect its potential multiple uses. For
 > example change the help information to start with a generic reference
 "
 > worker type on the runner" followed by runner specific behavior
 expected
 > for RunnerA, RunnerB etc...
 >
 > But I do worry that without prefix even generic options could cause
 > confusion. For example if the use of --network is substantially
 > different between runnerA vs runnerB then the user will only have
 this
 > information by reading the help. It will also mean that a pipeline
 which
 > is expected to work both on-premise on RunnerA and in the cloud on
 > RunnerB could fail because the format of the options to pass to
 > --network are different.
 >
 > Cheers
 >
 > Reza
 >
 > *From: *Kenneth Knowles <k...@apache.org>
 > *Date: *Sat, 4 May 2019 at 03:54
 > *To: *dev
 >
 > Even though they are in classes named for specific runners, they
 are
 > not namespaced. All PipelineOptions exist in a global namespace so
 > they need to be careful to be very precise.
 >
 > It is a good point that even though they may be multiple uses for
 > "machine type" they are probably not going to both happen at the
 > same time.
 >
 > If it becomes an issue, another thing we could do would be to add
 > namespacing support so options have less spooky action, or at
 least
 > have a way to resolve it when it happens on accident.
 >
 > Kenn
 >
 > On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath
 > <chamik...@google.com> wrote:
 >
 > Also, we do have runner specific options classes where truly
 > runner specific options can go.
 >
 >
 https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
 >
 https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
 >
 > On Fri, May 3, 2019 at 9:50 AM Ahmet Altay wrote:
 >
 > I agree, that is a good point.
 >
 > *From: *Lukasz Cwik <lc...@google.com>
 > *Date: *Fri, May 3, 2019 at 9:37 AM

Re: Better naming for runner specific options

2019-05-06 Thread Ahmet Altay
There is RunnerOptions already. Its options are populated by querying the
job service. Any portable runner is able to provide a list of its
runner-specific options through that mechanism.
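
As a rough illustration of that mechanism (not the actual job service API, which is not shown in this thread), the sketch below turns a list of option descriptors into command-line flags. OptionDescriptor, fake_job_service_describe and the option names are hypothetical stand-ins; a real runner would return its own list from a remote call.

```python
import argparse
from dataclasses import dataclass
from typing import Optional


@dataclass
class OptionDescriptor:
    # Hypothetical shape of what a job service might report per option.
    name: str
    description: str
    default: Optional[str] = None


def fake_job_service_describe():
    # Stand-in for the remote query; the values are invented for the example.
    return [
        OptionDescriptor("parallelism", "Degree of parallelism", "1"),
        OptionDescriptor("checkpoint_interval", "Checkpoint interval in ms"),
    ]


def add_runner_options(parser, descriptors):
    # Register every reported option under one help group, so that
    # --help lists exactly what the selected runner understands.
    group = parser.add_argument_group("runner-specific options")
    for d in descriptors:
        group.add_argument(f"--{d.name}", help=d.description, default=d.default)
    return parser
```

Because the flags are registered from what the runner reports, the same parser can also print the runner's options through its normal help output.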

*From: *Reza Rokni 
*Date: *Mon, May 6, 2019 at 2:57 PM
*To: * 

So the options here would be moved to runner options?
>
> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>
> In Java they are in DataflowPipelineWorkerPoolOptions and of course we
> have FlinkPipelineOptions etc...
>
> *From: *Chamikara Jayalath 
> *Date: *Tue, 7 May 2019 at 05:29
> *To: *dev
>
>
>> On Mon, May 6, 2019 at 2:13 PM Lukasz Cwik  wrote:
>>
>>> There were also discussions[1] in the past about scoping PipelineOptions
>>> to specific PTransforms. Would scoping PipelineOptions to PTransforms make
>>> this a more general solution?
>>>
>>> 1:
>>> https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E
>>>
>>
>> Is this just for pipeline construction time or also for runtime ? Trying
>> to scope options for transforms at runtime might complicate things in the
>> presence of optimizations such as fusion.
>>
>>
>>>
>>> On Mon, May 6, 2019 at 12:02 PM Ankur Goenka  wrote:
>>>
 Having namespaces for options makes sense.
 I think a help command that prints all the options for a given
 runner name would also be useful.
 As for the scope of namespacing, I think that assigning a logical
 namespace gives more flexibility around how and where we declare options.
 It also makes future refactoring possible.


 On Mon, May 6, 2019 at 7:50 AM Maximilian Michels 
 wrote:

> Good points. As already mentioned there is no namespacing between the
> different pipeline option classes. In particular, there is no separate
> namespace for system and user options which is most concerning.
>
> I'm in favor of an optional namespace using the class name of the
> defining pipeline option class. That way we would at least be able to
> resolve duplicate option names. For example, if there were an
> "optionX" in both class A and class B, we could use "A#optionX" to
> refer to the one in class A.
>

>> I think this solves the original problem. Runner-specific options will
>> have unique names that include the runner (in the options class). I guess
>> to be complete we also have to include the package (module for Python)?
>> If an option is globally unique, users should be able to specify it
>> without qualifying it (at least for backwards compatibility).
>>
>>
>>>
> -Max
>
> On 04.05.19 02:23, Reza Rokni wrote:
> > Great point Lukasz, worker machine could be relevant to multiple
> runners.
> >
> > Perhaps for parameters that could have multiple runner relevance,
> the
> > doc could be rephrased to reflect its potential multiple uses. For
> > example change the help information to start with a generic
> reference "
> > worker type on the runner" followed by runner specific behavior
> expected
> > for RunnerA, RunnerB etc...
> >
> > But I do worry that without prefix even generic options could cause
> > confusion. For example if the use of --network is substantially
> > different between runnerA vs runnerB then the user will only have
> this
> > information by reading the help. It will also mean that a pipeline
> which
> > is expected to work both on-premise on RunnerA and in the cloud on
> > RunnerB could fail because the format of the options to pass to
> > --network are different.
> >
> > Cheers
> >
> > Reza
> >
> > *From: *Kenneth Knowles <k...@apache.org>
> > *Date: *Sat, 4 May 2019 at 03:54
> > *To: *dev
> >
> > Even though they are in classes named for specific runners, they
> are
> > not namespaced. All PipelineOptions exist in a global namespace
> so
> > they need to be careful to be very precise.
> >
> > It is a good point that even though they may be multiple uses for
> > "machine type" they are probably not going to both happen at the
> > same time.
> >
> > If it becomes an issue, another thing we could do would be to add
> > namespacing support so options have less spooky action, or at
> least
> > have a way to resolve it when it happens on accident.
> >
> > Kenn
> >
> > On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath
> > <chamik...@google.com> wrote:
> >
> > Also, we do have runner specific options classes where truly
> > runner specific options can go.
> >
> >
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
> >
> 

Re: Better naming for runner specific options

2019-05-06 Thread Chamikara Jayalath
On Mon, May 6, 2019 at 2:13 PM Lukasz Cwik  wrote:

> There were also discussions[1] in the past about scoping PipelineOptions
> to specific PTransforms. Would scoping PipelineOptions to PTransforms make
> this a more general solution?
>
> 1:
> https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E
>

Is this just for pipeline construction time or also for runtime? Trying to
scope options for transforms at runtime might complicate things in the
presence of optimizations such as fusion.


>
> On Mon, May 6, 2019 at 12:02 PM Ankur Goenka  wrote:
>
>> Having namespaces for options makes sense.
>> I think a help command that prints all the options for a given
>> runner name would also be useful.
>> As for the scope of namespacing, I think that assigning a logical
>> namespace gives more flexibility around how and where we declare options.
>> It also makes future refactoring possible.
>>
>>
>> On Mon, May 6, 2019 at 7:50 AM Maximilian Michels  wrote:
>>
>>> Good points. As already mentioned there is no namespacing between the
>>> different pipeline option classes. In particular, there is no separate
>>> namespace for system and user options which is most concerning.
>>>
>>> I'm in favor of an optional namespace using the class name of the
>>> defining pipeline option class. That way we would at least be able to
>>> resolve duplicate option names. For example, if there were an "optionX"
>>> in both class A and class B, we could use "A#optionX" to refer to the
>>> one in class A.
>>>
>>
I think this solves the original problem. Runner-specific options will have
unique names that include the runner (in the options class). I guess to be
complete we also have to include the package (module for Python)?
If an option is globally unique, users should be able to specify it without
qualifying it (at least for backwards compatibility).
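
To make the lookup rule concrete, here is a small sketch of how "A#optionX" and bare names could be resolved. The registry, the function name and the error type are hypothetical and only illustrate the idea discussed here; they do not describe existing Beam code.

```python
class AmbiguousOptionError(KeyError):
    """Raised when a bare option name is defined by more than one class."""


# Hypothetical registry: options class name -> option names it defines.
REGISTRY = {
    "A": {"optionX", "optionY"},
    "B": {"optionX"},
}


def resolve_option(name, registry):
    """Return (class_name, option) for 'Class#option' or a bare 'option'."""
    if "#" in name:
        # Qualified form: the namespace is the defining options class.
        cls, opt = name.split("#", 1)
        if opt not in registry.get(cls, ()):
            raise KeyError(f"{cls} does not define {opt}")
        return cls, opt
    # Bare form: accepted only when exactly one class defines the name,
    # which keeps existing, globally unique flags working unchanged.
    owners = sorted(cls for cls, opts in registry.items() if name in opts)
    if not owners:
        raise KeyError(f"unknown option {name}")
    if len(owners) > 1:
        choices = ", ".join(f"{cls}#{name}" for cls in owners)
        raise AmbiguousOptionError(f"{name} is ambiguous; use one of: {choices}")
    return owners[0], name
```

With this registry, "optionY" resolves without a qualifier, "A#optionX" resolves to class A, and a bare "optionX" is rejected with a hint listing the qualified spellings. Including the package or module would only change the key used in the registry.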


>
>>> -Max
>>>
>>> On 04.05.19 02:23, Reza Rokni wrote:
>>> > Great point Lukasz, worker machine could be relevant to multiple
>>> runners.
>>> >
>>> > Perhaps for parameters that could have multiple runner relevance, the
>>> > doc could be rephrased to reflect its potential multiple uses. For
>>> > example change the help information to start with a generic reference
>>> "
>>> > worker type on the runner" followed by runner specific behavior
>>> expected
>>> > for RunnerA, RunnerB etc...
>>> >
>>> > But I do worry that without prefix even generic options could cause
>>> > confusion. For example if the use of --network is substantially
>>> > different between runnerA vs runnerB then the user will only have this
>>> > information by reading the help. It will also mean that a pipeline
>>> which
>>> > is expected to work both on-premise on RunnerA and in the cloud on
>>> > RunnerB could fail because the format of the options to pass to
>>> > --network are different.
>>> >
>>> > Cheers
>>> >
>>> > Reza
>>> >
>>> > *From: *Kenneth Knowles <k...@apache.org>
>>> > *Date: *Sat, 4 May 2019 at 03:54
>>> > *To: *dev
>>> >
>>> > Even though they are in classes named for specific runners, they
>>> are
>>> > not namespaced. All PipelineOptions exist in a global namespace so
>>> > they need to be careful to be very precise.
>>> >
>>> > It is a good point that even though they may be multiple uses for
>>> > "machine type" they are probably not going to both happen at the
>>> > same time.
>>> >
>>> > If it becomes an issue, another thing we could do would be to add
>>> > namespacing support so options have less spooky action, or at least
>>> > have a way to resolve it when it happens on accident.
>>> >
>>> > Kenn
>>> >
>>> > On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath
>>> > <chamik...@google.com> wrote:
>>> >
>>> > Also, we do have runner specific options classes where truly
>>> > runner specific options can go.
>>> >
>>> >
>>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
>>> >
>>> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>>> >
>>> > On Fri, May 3, 2019 at 9:50 AM Ahmet Altay wrote:
>>> >
>>> > I agree, that is a good point.
>>> >
>>> > *From: *Lukasz Cwik <lc...@google.com>
>>> > *Date: *Fri, May 3, 2019 at 9:37 AM
>>> > *To: *dev
>>> >
>>> > The concept of a machine type isn't necessarily limited
>>> > to Dataflow. If it made sense for a runner, they could
>>> > use AWS/Azure machine types as well.
>>> >
>>> > On Fri, May 3, 2019 at 9:32 AM Ahmet Altay
>>> > <al...@google.com> wrote:
>>> >
>>> > This idea was discussed in a PR a few months ago,
>>> > and 

Re: Better naming for runner specific options

2019-05-06 Thread Lukasz Cwik
There were also discussions[1] in the past about scoping PipelineOptions to
specific PTransforms. Would scoping PipelineOptions to PTransforms make
this a more general solution?

1:
https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E

On Mon, May 6, 2019 at 12:02 PM Ankur Goenka  wrote:

> Having namespaces for option makes sense.
> I think, along with a help command to print all the options given the
> runner name will be useful.
> As for the scope of name spacing, I think that assigning a logical name
> space gives more flexibility around how and where we declare options. It
> also make future refactoring possible.
>
>
> On Mon, May 6, 2019 at 7:50 AM Maximilian Michels  wrote:
>
>> Good points. As already mentioned there is no namespacing between the
>> different pipeline option classes. In particular, there is no separate
>> namespace for system and user options which is most concerning.
>>
>> I'm in favor of an optional namespace using the class name of the
>> defining pipeline option class. That way we would at least be able to
>> resolve duplicate option names. For example, if there were was "optionX"
>> in class A and B, we could use "A#optionX" to refer to it from class A.
>>
>> -Max
>>
>> On 04.05.19 02:23, Reza Rokni wrote:
>> > Great point Lukasz, worker machine could be relevant to multiple
>> runners.
>> >
>> > Perhaps for parameters that could have multiple runner relevance, the
>> > doc could be rephrased to reflect its potential multiple uses. For
>> > example change the help information to start with a generic reference "
>> > worker type on the runner" followed by runner specific behavior
>> expected
>> > for RunnerA, RunnerB etc...
>> >
>> > But I do worry that without prefix even generic options could cause
>> > confusion. For example if the use of --network is substantially
>> > different between runnerA vs runnerB then the user will only have this
>> > information by reading the help. It will also mean that a pipeline
>> which
>> > is expected to work both on-premise on RunnerA and in the cloud on
>> > RunnerB could fail because the format of the options to pass to
>> > --network are different.
>> >
>> > Cheers
>> >
>> > Reza
>> >
>> > *From: *Kenneth Knowles <k...@apache.org>
>> > *Date: *Sat, 4 May 2019 at 03:54
>> > *To: *dev
>> >
>> > Even though they are in classes named for specific runners, they are
>> > not namespaced. All PipelineOptions exist in a global namespace so
>> > they need to be careful to be very precise.
>> >
>> > It is a good point that even though they may be multiple uses for
>> > "machine type" they are probably not going to both happen at the
>> > same time.
>> >
>> > If it becomes an issue, another thing we could do would be to add
>> > namespacing support so options have less spooky action, or at least
>> > have a way to resolve it when it happens on accident.
>> >
>> > Kenn
>> >
>> > On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath
>> > <chamik...@google.com> wrote:
>> >
>> > Also, we do have runner specific options classes where truly
>> > runner specific options can go.
>> >
>> >
>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
>> >
>> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>> >
>> > On Fri, May 3, 2019 at 9:50 AM Ahmet Altay <al...@google.com> wrote:
>> >
>> > I agree, that is a good point.
>> >
>> > *From: *Lukasz Cwik <lc...@google.com>
>> > *Date: *Fri, May 3, 2019 at 9:37 AM
>> > *To: *dev
>> >
>> > The concept of a machine type isn't necessarily limited
>> > to Dataflow. If it made sense for a runner, they could
>> > use AWS/Azure machine types as well.
>> >
>> > On Fri, May 3, 2019 at 9:32 AM Ahmet Altay
>> > <al...@google.com> wrote:
>> >
>> > This idea was discussed in a PR a few months ago,
>> > and JIRA was filed as a follow up [1]. IMO, it makes
>> > sense to use a namespace prefix. The primary issue
>> > here is that, such a change will very likely be a
>> > backward incompatible change and would be hard to do
>> > before the next major version.
>> >
>> > [1] https://issues.apache.org/jira/browse/BEAM-6531
>> >
>> > *From: *Reza Rokni <r...@google.com>
>> > *Date: *Thu, May 2, 2019 at 8:00 PM
>> > *To: *<dev@beam.apache.org>
>> >
>> > Hi,
>> >
>> >  

Re: Better naming for runner specific options

2019-05-06 Thread Ankur Goenka
Having namespaces for options makes sense.
I think a help command that prints all the options for a given runner name
would also be useful.
As for the scope of namespacing, I think that assigning a logical namespace
gives more flexibility around how and where we declare options. It also
makes future refactoring possible.
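
To illustrate the help-command idea, here is a minimal sketch using only
Python's argparse. It assumes options are registered under a logical
namespace per runner; the registry, the option names, and the help strings
are hypothetical and are not Beam's actual option set.

```python
import argparse

# Hypothetical registry: logical namespace -> options declared under it.
# These names and help strings are illustrative, not Beam's real options.
OPTIONS_BY_NAMESPACE = {
    "dataflow": {"worker_machine_type": "Machine type for Dataflow workers."},
    "flink": {"parallelism": "Default parallelism for Flink operators."},
}


def build_parser(namespace):
    """Build a parser holding only the options declared under one namespace."""
    parser = argparse.ArgumentParser(prog="pipeline", add_help=False)
    for name, help_text in sorted(OPTIONS_BY_NAMESPACE[namespace].items()):
        # Each flag carries its namespace as a prefix.
        parser.add_argument(f"--{namespace}_{name}", help=help_text)
    return parser


def help_for_runner(namespace):
    """Return the help text listing every option of the given namespace."""
    return build_parser(namespace).format_help()
```

Calling print(help_for_runner("flink")) would then list only the options
registered under the "flink" namespace.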


On Mon, May 6, 2019 at 7:50 AM Maximilian Michels  wrote:

> Good points. As already mentioned there is no namespacing between the
> different pipeline option classes. In particular, there is no separate
> namespace for system and user options which is most concerning.
>
> I'm in favor of an optional namespace using the class name of the
> defining pipeline option class. That way we would at least be able to
> resolve duplicate option names. For example, if there were was "optionX"
> in class A and B, we could use "A#optionX" to refer to it from class A.
>
> -Max
>
> On 04.05.19 02:23, Reza Rokni wrote:
> > Great point Lukasz, worker machine could be relevant to multiple runners.
> >
> > Perhaps for parameters that could have multiple runner relevance, the
> > doc could be rephrased to reflect its potential multiple uses. For
> > example change the help information to start with a generic reference "
> > worker type on the runner" followed by runner specific behavior expected
> > for RunnerA, RunnerB etc...
> >
> > But I do worry that without prefix even generic options could cause
> > confusion. For example if the use of --network is substantially
> > different between runnerA vs runnerB then the user will only have this
> > information by reading the help. It will also mean that a pipeline which
> > is expected to work both on-premise on RunnerA and in the cloud on
> > RunnerB could fail because the format of the options to pass to
> > --network are different.
> >
> > Cheers
> >
> > Reza
> >
> > *From: *Kenneth Knowles <k...@apache.org>
> > *Date: *Sat, 4 May 2019 at 03:54
> > *To: *dev
> >
> > Even though they are in classes named for specific runners, they are
> > not namespaced. All PipelineOptions exist in a global namespace so
> > they need to be careful to be very precise.
> >
> > It is a good point that even though they may be multiple uses for
> > "machine type" they are probably not going to both happen at the
> > same time.
> >
> > If it becomes an issue, another thing we could do would be to add
> > namespacing support so options have less spooky action, or at least
> > have a way to resolve it when it happens on accident.
> >
> > Kenn
> >
> > On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath
> > <chamik...@google.com> wrote:
> >
> > Also, we do have runner specific options classes where truly
> > runner specific options can go.
> >
> >
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
> >
> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
> >
> > On Fri, May 3, 2019 at 9:50 AM Ahmet Altay <al...@google.com> wrote:
> >
> > I agree, that is a good point.
> >
> > *From: *Lukasz Cwik <lc...@google.com>
> > *Date: *Fri, May 3, 2019 at 9:37 AM
> > *To: *dev
> >
> > The concept of a machine type isn't necessarily limited
> > to Dataflow. If it made sense for a runner, they could
> > use AWS/Azure machine types as well.
> >
> > On Fri, May 3, 2019 at 9:32 AM Ahmet Altay
> > <al...@google.com> wrote:
> >
> > This idea was discussed in a PR a few months ago,
> > and JIRA was filed as a follow up [1]. IMO, it makes
> > sense to use a namespace prefix. The primary issue
> > here is that, such a change will very likely be a
> > backward incompatible change and would be hard to do
> > before the next major version.
> >
> > [1] https://issues.apache.org/jira/browse/BEAM-6531
> >
> > *From: *Reza Rokni <r...@google.com>
> > *Date: *Thu, May 2, 2019 at 8:00 PM
> > *To: *<dev@beam.apache.org>
> >
> > Hi,
> >
> > Was reading this SO question:
> >
> >
> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
> >
> > And noticed that in
> >
> >
> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
> >
> > The option is called --worker_machine_type.
> >
> > I wonder if runner 

Re: Better naming for runner specific options

2019-05-06 Thread Maximilian Michels
Good points. As already mentioned, there is no namespacing between the
different pipeline option classes. In particular, there is no separate
namespace for system and user options, which is most concerning.


I'm in favor of an optional namespace using the class name of the 
defining pipeline option class. That way we would at least be able to 
resolve duplicate option names. For example, if there was an "optionX" in
both class A and class B, we could use "A#optionX" to refer to the one from
class A.
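
A minimal sketch of how such an optional class-name namespace might be
resolved, in plain Python. The classes "A" and "B" and the option "optionX"
follow the example above; "optionY", the registry, and the resolve() helper
are hypothetical and do not correspond to any existing Beam API.

```python
# Hypothetical registry of option classes and the options each one declares.
DECLARED = {
    "A": {"optionX"},
    "B": {"optionX", "optionY"},
}


def resolve(name):
    """Resolve 'Class#option' or a bare 'option' to a (class, option) pair.

    A bare name is accepted only when exactly one class declares it, so the
    namespace stays optional for unambiguous options.
    """
    if "#" in name:
        cls, option = name.split("#", 1)
        if option not in DECLARED.get(cls, set()):
            raise KeyError(f"{cls} does not declare {option}")
        return cls, option
    owners = sorted(cls for cls, opts in DECLARED.items() if name in opts)
    if not owners:
        raise KeyError(f"unknown option {name}")
    if len(owners) > 1:
        choices = ", ".join(f"{cls}#{name}" for cls in owners)
        raise ValueError(f"{name} is ambiguous; qualify it as one of {choices}")
    return owners[0], name
```

With this scheme existing unambiguous names keep working unqualified, and
only a duplicate such as "optionX" has to be written as "A#optionX" or
"B#optionX".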


-Max

On 04.05.19 02:23, Reza Rokni wrote:

Great point Lukasz, worker machine could be relevant to multiple runners.

Perhaps for parameters that could have multiple runner relevance, the 
doc could be rephrased to reflect its potential multiple uses. For 
example change the help information to start with a generic reference " 
worker type on the runner" followed by runner specific behavior expected 
for RunnerA, RunnerB etc...


But I do worry that without prefix even generic options could cause 
confusion. For example if the use of --network is substantially 
different between runnerA vs runnerB then the user will only have this 
information by reading the help. It will also mean that a pipeline which 
is expected to work both on-premise on RunnerA and in the cloud on 
RunnerB could fail because the format of the options to pass to 
--network are different.


Cheers

Reza

*From: *Kenneth Knowles <k...@apache.org>
*Date: *Sat, 4 May 2019 at 03:54
*To: *dev

Even though they are in classes named for specific runners, they are
not namespaced. All PipelineOptions exist in a global namespace so
they need to be careful to be very precise.

It is a good point that even though they may be multiple uses for
"machine type" they are probably not going to both happen at the
same time.

If it becomes an issue, another thing we could do would be to add
namespacing support so options have less spooky action, or at least
have a way to resolve it when it happens on accident.

Kenn

On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath
<chamik...@google.com> wrote:

Also, we do have runner specific options classes where truly
runner specific options can go.


https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java

https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java

On Fri, May 3, 2019 at 9:50 AM Ahmet Altay <al...@google.com> wrote:

I agree, that is a good point.

*From: *Lukasz Cwik <lc...@google.com>
*Date: *Fri, May 3, 2019 at 9:37 AM
*To: *dev

The concept of a machine type isn't necessarily limited
to Dataflow. If it made sense for a runner, they could
use AWS/Azure machine types as well.

On Fri, May 3, 2019 at 9:32 AM Ahmet Altay
<al...@google.com> wrote:

This idea was discussed in a PR a few months ago,
and JIRA was filed as a follow up [1]. IMO, it makes
sense to use a namespace prefix. The primary issue
here is that, such a change will very likely be a
backward incompatible change and would be hard to do
before the next major version.

[1] https://issues.apache.org/jira/browse/BEAM-6531

*From: *Reza Rokni <r...@google.com>
*Date: *Thu, May 2, 2019 at 8:00 PM
*To: *<dev@beam.apache.org>

Hi,

Was reading this SO question:


https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has

And noticed that in


https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions

The option is called --worker_machine_type.

I wonder if runner specific options should have
the runner in the prefix? Something like
--dataflow_worker_machine_type?

Cheers
Reza

-- 


This email may be confidential and privileged.
If you received this communication by mistake,
please don't forward it to anyone else, please
erase all copies and attachments, and please let
me know that it has gone to the wrong person.

The above terms reflect a potential business
arrangement, 

Re: Better naming for runner specific options

2019-05-03 Thread Reza Rokni
Great point Lukasz, worker machine could be relevant to multiple runners.

Perhaps for parameters that could be relevant to multiple runners, the doc
could be rephrased to reflect the potential multiple uses. For example,
change the help information to start with a generic reference, "worker type
on the runner", followed by the runner-specific behavior expected for
RunnerA, RunnerB, etc.

But I do worry that without a prefix even generic options could cause
confusion. For example, if the use of --network is substantially different
between RunnerA and RunnerB, then the user will only learn this by reading
the help. It also means that a pipeline which is expected to work both
on-premise on RunnerA and in the cloud on RunnerB could fail because the
format of the value passed to --network is different.
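
A small sketch of that failure mode, in plain Python. The two value formats
below are hypothetical and are not taken from any real runner: RunnerA is
assumed to expect a bare network name, and RunnerB a
"projects/<project>/networks/<name>" style path.

```python
import re

# Hypothetical per-runner formats for the value of one shared --network flag.
NETWORK_FORMATS = {
    "RunnerA": re.compile(r"[a-z][a-z0-9-]*"),
    "RunnerB": re.compile(r"projects/[^/]+/networks/[^/]+"),
}


def network_is_valid(runner, value):
    """Return True if the --network value matches the runner's assumed format."""
    return NETWORK_FORMATS[runner].fullmatch(value) is not None
```

Under these assumptions the same command line, --network=default, is
accepted by RunnerA and rejected by RunnerB, although the flag name is
identical.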

Cheers

Reza

*From: *Kenneth Knowles 
*Date: *Sat, 4 May 2019 at 03:54
*To: *dev

Even though they are in classes named for specific runners, they are not
> namespaced. All PipelineOptions exist in a global namespace so they need to
> be careful to be very precise.
>
> It is a good point that even though they may be multiple uses for "machine
> type" they are probably not going to both happen at the same time.
>
> If it becomes an issue, another thing we could do would be to add
> namespacing support so options have less spooky action, or at least have a
> way to resolve it when it happens on accident.
>
> Kenn
>
> On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath 
> wrote:
>
>> Also, we do have runner specific options classes where truly runner
>> specific options can go.
>>
>>
>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
>>
>> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>>
>> On Fri, May 3, 2019 at 9:50 AM Ahmet Altay  wrote:
>>
>>> I agree, that is a good point.
>>>
>>> *From: *Lukasz Cwik 
>>> *Date: *Fri, May 3, 2019 at 9:37 AM
>>> *To: *dev
>>>
>>> The concept of a machine type isn't necessarily limited to Dataflow. If
 it made sense for a runner, they could use AWS/Azure machine types as well.

 On Fri, May 3, 2019 at 9:32 AM Ahmet Altay  wrote:

> This idea was discussed in a PR a few months ago, and JIRA was filed
> as a follow up [1]. IMO, it makes sense to use a namespace prefix. The
> primary issue here is that, such a change will very likely be a backward
> incompatible change and would be hard to do before the next major version.
>
> [1] https://issues.apache.org/jira/browse/BEAM-6531
>
> *From: *Reza Rokni 
> *Date: *Thu, May 2, 2019 at 8:00 PM
> *To: * 
>
> Hi,
>>
>> Was reading this SO question:
>>
>>
>> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>>
>> And noticed that in
>>
>>
>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>
>> The option is called --worker_machine_type.
>>
>> I wonder if runner specific options should have the runner in the
>> prefix? Something like --dataflow_worker_machine_type?
>>
>> Cheers
>> Reza
>>
>

-- 

This email may be confidential and privileged. If you received this
communication by mistake, please don't forward it to anyone else, please
erase all copies and attachments, and please let me know that it has gone
to the wrong person.

The above terms reflect a potential business arrangement, are provided
solely as a basis for further discussion, and are not intended to be and do
not constitute a legally binding obligation. No legally binding obligations
will be created, implied, or inferred until an agreement in final form is
executed in writing by all parties involved.


Re: Better naming for runner specific options

2019-05-03 Thread Kenneth Knowles
Even though they are in classes named for specific runners, they are not
namespaced. All PipelineOptions exist in a global namespace, so their names
need to be very precise.

It is a good point that even though there may be multiple uses for "machine
type", they are probably not going to both happen at the same time.

If it becomes an issue, another thing we could do would be to add
namespacing support so options have less spooky action, or at least have a
way to resolve it when it happens by accident.
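
As an analogy for the global namespace, here is a minimal sketch using
Python's argparse rather than Beam itself. Two hypothetical option groups
both declare --machine_type on one shared parser; the function names and the
flag are assumptions made for illustration only.

```python
import argparse


def add_runner_a_options(parser):
    # Hypothetical option declared by a "RunnerA" options class.
    parser.add_argument("--machine_type", help="RunnerA machine type")


def add_runner_b_options(parser):
    # A second, unrelated class reusing the same flag name.
    parser.add_argument("--machine_type", help="RunnerB machine type")


def register_all():
    """Register both groups on one shared parser, i.e. one global namespace.

    Returns the error message if the second registration collides with the
    first, or None if both registrations succeed.
    """
    parser = argparse.ArgumentParser()
    add_runner_a_options(parser)
    try:
        add_runner_b_options(parser)
    except argparse.ArgumentError as err:
        return str(err)
    return None
```

argparse rejects the second registration by default, which is the kind of
collision that namespacing support would give a way to resolve.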

Kenn

On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath 
wrote:

> Also, we do have runner specific options classes where truly runner
> specific options can go.
>
>
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
>
> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>
> On Fri, May 3, 2019 at 9:50 AM Ahmet Altay  wrote:
>
>> I agree, that is a good point.
>>
>> *From: *Lukasz Cwik 
>> *Date: *Fri, May 3, 2019 at 9:37 AM
>> *To: *dev
>>
>> The concept of a machine type isn't necessarily limited to Dataflow. If
>>> it made sense for a runner, they could use AWS/Azure machine types as well.
>>>
>>> On Fri, May 3, 2019 at 9:32 AM Ahmet Altay  wrote:
>>>
 This idea was discussed in a PR a few months ago, and JIRA was filed as
 a follow up [1]. IMO, it makes sense to use a namespace prefix. The primary
 issue here is that, such a change will very likely be a backward
 incompatible change and would be hard to do before the next major version.

 [1] https://issues.apache.org/jira/browse/BEAM-6531

 *From: *Reza Rokni 
 *Date: *Thu, May 2, 2019 at 8:00 PM
 *To: * 

 Hi,
>
> Was reading this SO question:
>
>
> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>
> And noticed that in
>
>
> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>
> The option is called --worker_machine_type.
>
> I wonder if runner specific options should have the runner in the
> prefix? Something like --dataflow_worker_machine_type?
>
> Cheers
> Reza
>



Re: Better naming for runner specific options

2019-05-03 Thread Chamikara Jayalath
Also, we do have runner specific options classes where truly runner
specific options can go.

https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java

On Fri, May 3, 2019 at 9:50 AM Ahmet Altay  wrote:

> I agree, that is a good point.
>
> *From: *Lukasz Cwik 
> *Date: *Fri, May 3, 2019 at 9:37 AM
> *To: *dev
>
> The concept of a machine type isn't necessarily limited to Dataflow. If it
>> made sense for a runner, they could use AWS/Azure machine types as well.
>>
>> On Fri, May 3, 2019 at 9:32 AM Ahmet Altay  wrote:
>>
>>> This idea was discussed in a PR a few months ago, and JIRA was filed as
>>> a follow up [1]. IMO, it makes sense to use a namespace prefix. The primary
>>> issue here is that, such a change will very likely be a backward
>>> incompatible change and would be hard to do before the next major version.
>>>
>>> [1] https://issues.apache.org/jira/browse/BEAM-6531
>>>
>>> *From: *Reza Rokni 
>>> *Date: *Thu, May 2, 2019 at 8:00 PM
>>> *To: * 
>>>
>>> Hi,

 Was reading this SO question:


 https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has

 And noticed that in


 https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions

 The option is called --worker_machine_type.

 I wonder if runner specific options should have the runner in the
 prefix? Something like --dataflow_worker_machine_type?

 Cheers
 Reza


>>>


Re: Better naming for runner specific options

2019-05-03 Thread Ahmet Altay
I agree, that is a good point.

*From: *Lukasz Cwik 
*Date: *Fri, May 3, 2019 at 9:37 AM
*To: *dev

The concept of a machine type isn't necessarily limited to Dataflow. If it
> made sense for a runner, they could use AWS/Azure machine types as well.
>
> On Fri, May 3, 2019 at 9:32 AM Ahmet Altay  wrote:
>
>> This idea was discussed in a PR a few months ago, and JIRA was filed as a
>> follow up [1]. IMO, it makes sense to use a namespace prefix. The primary
>> issue here is that, such a change will very likely be a backward
>> incompatible change and would be hard to do before the next major version.
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-6531
>>
>> *From: *Reza Rokni 
>> *Date: *Thu, May 2, 2019 at 8:00 PM
>> *To: * 
>>
>> Hi,
>>>
>>> Was reading this SO question:
>>>
>>>
>>> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>>>
>>> And noticed that in
>>>
>>>
>>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>>
>>> The option is called --worker_machine_type.
>>>
>>> I wonder if runner specific options should have the runner in the
>>> prefix? Something like --dataflow_worker_machine_type?
>>>
>>> Cheers
>>> Reza
>>>
>>


Re: Better naming for runner specific options

2019-05-03 Thread Lukasz Cwik
The concept of a machine type isn't necessarily limited to Dataflow. If it
made sense for a runner, they could use AWS/Azure machine types as well.

On Fri, May 3, 2019 at 9:32 AM Ahmet Altay  wrote:

> This idea was discussed in a PR a few months ago, and JIRA was filed as a
> follow up [1]. IMO, it makes sense to use a namespace prefix. The primary
> issue here is that, such a change will very likely be a backward
> incompatible change and would be hard to do before the next major version.
>
> [1] https://issues.apache.org/jira/browse/BEAM-6531
>
> *From: *Reza Rokni 
> *Date: *Thu, May 2, 2019 at 8:00 PM
> *To: * 
>
> Hi,
>>
>> Was reading this SO question:
>>
>>
>> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>>
>> And noticed that in
>>
>>
>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>
>> The option is called --worker_machine_type.
>>
>> I wonder if runner specific options should have the runner in the prefix?
>> Something like --dataflow_worker_machine_type?
>>
>> Cheers
>> Reza
>>
>


Re: Better naming for runner specific options

2019-05-03 Thread Ahmet Altay
This idea was discussed in a PR a few months ago, and a JIRA was filed as a
follow-up [1]. IMO, it makes sense to use a namespace prefix. The primary
issue here is that such a change will very likely be backward incompatible
and would be hard to do before the next major version.

[1] https://issues.apache.org/jira/browse/BEAM-6531
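
One way the incompatibility might be softened is to accept both spellings
during a transition. The following is a minimal sketch with Python's
argparse, not Beam's actual option parsing; --dataflow_worker_machine_type
is the prefixed name proposed in this thread, and the parse_machine_type()
helper and the example value are hypothetical.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--dataflow_worker_machine_type",  # proposed, prefixed name
    "--worker_machine_type",           # existing name kept as an alias
    dest="worker_machine_type",        # both spellings fill the same field
    default=None,
)


def parse_machine_type(argv):
    """Return the machine type given under either the old or the new flag."""
    return parser.parse_args(argv).worker_machine_type
```

Existing command lines keep working with --worker_machine_type, while docs
and examples could already switch to the prefixed spelling.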

*From: *Reza Rokni 
*Date: *Thu, May 2, 2019 at 8:00 PM
*To: * 

Hi,
>
> Was reading this SO question:
>
>
> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>
> And noticed that in
>
>
> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>
> The option is called --worker_machine_type.
>
> I wonder if runner specific options should have the runner in the prefix?
> Something like --dataflow_worker_machine_type?
>
> Cheers
> Reza
>