Hi,
Coming back to this, is the general consensus that this can be addressed
via https://issues.apache.org/jira/browse/BEAM-6531 in Beam 3.0?
Cheers
Reza
On Tue, 7 May 2019 at 23:15, Valentyn Tymofieiev <valen...@google.com> wrote:
I think using RunnerOptions was an idea at some point, but in
Python, we ended up parsing options from the Runner API without
populating RunnerOptions, and RunnerOptions was eventually removed [1].
If we decide to rename options, a path forward may be to have
runners recognize both old and new names until Beam 3.0, but update
the codebase, examples and documentation to use the new names.
[1]
https://github.com/apache/beam/commit/f3623e8ba2257f7659ccb312dc2574f862ef41b5#diff-525d5d65bedd7ea5e6fce6e4cd57e153L815
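A minimal sketch of the "recognize both old and new names" idea, using only Python's standard argparse module rather than Beam's own option parsing. The prefixed name --dataflow_worker_machine_type is the rename floated at the start of this thread and is hypothetical; the _DeprecatedAlias action and build_parser helper are illustrative, not Beam APIs. Both spellings write to the same destination, and only the old spelling emits a DeprecationWarning.

```python
import argparse
import warnings


class _DeprecatedAlias(argparse.Action):
    """Stores the value normally, but warns when a non-preferred spelling is used.

    Hypothetical helper for illustration only; not part of Beam.
    """

    def __init__(self, option_strings, dest, new_name=None, **kwargs):
        self.new_name = new_name
        super().__init__(option_strings, dest, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        # option_string is the flag the user actually typed.
        if option_string != self.new_name:
            warnings.warn(
                f"{option_string} is deprecated; use {self.new_name} instead",
                DeprecationWarning,
                stacklevel=2,
            )
        setattr(namespace, self.dest, values)


def build_parser():
    parser = argparse.ArgumentParser()
    # Both spellings share one dest, so runner code reads a single attribute
    # no matter which flag was passed.
    parser.add_argument(
        "--dataflow_worker_machine_type",  # hypothetical new, prefixed name
        "--worker_machine_type",           # existing name, kept until Beam 3.0
        dest="worker_machine_type",
        action=_DeprecatedAlias,
        new_name="--dataflow_worker_machine_type",
        default=None,
    )
    return parser
```

With this approach, dropping the old name at the major version bump only means removing it from the option strings; code that reads worker_machine_type does not change.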
*From: *Ahmet Altay <al...@google.com>
*Date: *Mon, May 6, 2019, 6:01 PM
*To: *dev
There is RunnerOptions already. Its options are populated by
querying the job service. Any portable runner can provide a list
of runner-specific options through that mechanism.
*From: *Reza Rokni <r...@google.com>
*Date: *Mon, May 6, 2019 at 2:57 PM
*To: *<dev@beam.apache.org>
So the options here would be moved to runner options?
https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
In Java they are in DataflowPipelineWorkerPoolOptions, and of
course we have FlinkPipelineOptions, etc.
*From: *Chamikara Jayalath <chamik...@google.com>
*Date: *Tue, 7 May 2019 at 05:29
*To: *dev
On Mon, May 6, 2019 at 2:13 PM Lukasz Cwik <lc...@google.com> wrote:
There were also discussions [1] in the past about
scoping PipelineOptions to specific PTransforms.
Would scoping PipelineOptions to PTransforms make
this a more general solution?
1:
https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E
Is this just for pipeline construction time or also for
runtime? Trying to scope options for transforms at
runtime might complicate things in the presence of
optimizations such as fusion.
On Mon, May 6, 2019 at 12:02 PM Ankur Goenka <goe...@google.com> wrote:
Having namespaces for options makes sense.
I think a help command that prints all the options
for a given runner name would also be useful.
As for the scope of namespacing, I think that
assigning a logical namespace gives more
flexibility around how and where we declare
options. It also makes future refactoring possible.
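A rough sketch of a per-runner help command, assuming a hypothetical registry that maps a runner name to the functions that declare its options. This is not Beam's actual mechanism: register_runner_options, runner_help and the registry are illustrative names, the flags are only examples, and an argparse argument group stands in for a logical namespace.

```python
import argparse

# Hypothetical registry: runner name -> functions that add that runner's
# options to an argparse group. Real Beam option classes are not modelled.
_RUNNER_OPTIONS = {}


def register_runner_options(runner_name):
    """Decorator that records an option-declaring function under runner_name."""
    def decorator(add_fn):
        _RUNNER_OPTIONS.setdefault(runner_name, []).append(add_fn)
        return add_fn
    return decorator


@register_runner_options("DataflowRunner")
def _dataflow_options(group):
    group.add_argument("--dataflow_worker_machine_type",
                       help="Machine type for Dataflow workers (illustrative).")


@register_runner_options("FlinkRunner")
def _flink_options(group):
    group.add_argument("--flink_master",
                       help="Address of the Flink master (illustrative).")


def runner_help(runner_name):
    """Returns help text listing only the options registered for runner_name."""
    parser = argparse.ArgumentParser(prog=f"options for {runner_name}",
                                     add_help=False)
    group = parser.add_argument_group(runner_name)
    for add_fn in _RUNNER_OPTIONS.get(runner_name, []):
        add_fn(group)
    return parser.format_help()
```

Because options are declared against a logical name rather than a concrete class, the declaring functions could move between modules without changing what the help command prints.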
On Mon, May 6, 2019 at 7:50 AM Maximilian Michels <m...@apache.org> wrote:
Good points. As already mentioned, there is no namespacing between the
different pipeline option classes. In particular, there is no separate
namespace for system and user options, which is most concerning.
I'm in favor of an optional namespace using the class name of the
defining pipeline option class. That way we would at least be able to
resolve duplicate option names. For example, if there were an "optionX"
in both class A and class B, we could use "A#optionX" to refer to the
one from class A.
I think this solves the original problem. Runner-specific
options will have unique names that include the runner (in the
options class name). I guess, to be complete, we also have to include
the package (module for Python)?
If an option is globally unique, users should be able to
specify it without qualifying it (at least for backwards
compatibility).
-Max
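A minimal sketch of the resolution rule discussed above, assuming options are identified by a plain string name plus the name of the declaring class. The DECLARED table, build_index and resolve are hypothetical and do not model Beam's real PipelineOptions machinery. A qualified "A#optionX" always picks the declaring class; a bare name is accepted only when exactly one class declares it, which keeps existing unqualified flags working.

```python
from collections import defaultdict

# Hypothetical model: each options "class" declares a set of option names.
DECLARED = {
    "A": {"optionX", "optionY"},
    "B": {"optionX"},
}


def build_index(declared):
    """Maps each bare option name to the set of classes that declare it."""
    index = defaultdict(set)
    for cls, names in declared.items():
        for name in names:
            index[name].add(cls)
    return index


def resolve(flag, index):
    """Resolves 'Cls#name' or a bare 'name' to a (class, name) pair.

    A bare name is accepted only if exactly one class declares it.
    """
    if "#" in flag:
        cls, name = flag.split("#", 1)
        if cls not in index.get(name, ()):
            raise KeyError(f"{cls} does not declare {name}")
        return cls, name
    owners = index.get(flag, set())
    if not owners:
        raise KeyError(f"unknown option {flag}")
    if len(owners) > 1:
        choices = ", ".join(f"{c}#{flag}" for c in sorted(owners))
        raise ValueError(f"{flag} is ambiguous; qualify it as one of {choices}")
    (cls,) = owners
    return cls, flag
```

Including the package or module, as suggested above, would only change the keys in DECLARED (for example "some.module.A"), since everything before the "#" is treated as the class qualifier.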
On 04.05.19 02:23, Reza Rokni wrote:
> Great point Lukasz, worker machine could be relevant to multiple runners.
>
> Perhaps for parameters that could be relevant to multiple runners, the
> doc could be rephrased to reflect their potential multiple uses. For
> example, change the help information to start with a generic reference,
> "worker type on the runner", followed by the runner-specific behavior
> expected for RunnerA, RunnerB, etc.
>
> But I do worry that without a prefix even generic options could cause
> confusion. For example, if the use of --network is substantially
> different between RunnerA and RunnerB, then the user will only have this
> information by reading the help. It will also mean that a pipeline which
> is expected to work both on-premise on RunnerA and in the cloud on
> RunnerB could fail because the format of the values passed to
> --network is different.
>
> Cheers
>
> Reza
>
> *From: *Kenneth Knowles <k...@apache.org>
> *Date: *Sat, 4 May 2019 at 03:54
> *To: *dev
>
> Even though they are in classes named for specific runners, they are
> not namespaced. All PipelineOptions exist in a global namespace, so
> we need to be careful to name them very precisely.
>
> It is a good point that even though there may be multiple uses for
> "machine type", they are probably not going to both happen at the
> same time.
>
> If it becomes an issue, another thing we could do would be to add
> namespacing support so options have less spooky action, or at least
> have a way to resolve it when it happens by accident.
>
> Kenn
>
> On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath
> <chamik...@google.com> wrote:
>
> Also, we do have runner-specific options classes where truly
> runner-specific options can go.
>
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>
> On Fri, May 3, 2019 at 9:50 AM Ahmet Altay <al...@google.com> wrote:
>
> I agree, that is a good point.
>
> *From: *Lukasz Cwik <lc...@google.com>
> *Date: *Fri, May 3, 2019 at 9:37 AM
> *To: *dev
>
> The concept of a machine type isn't necessarily limited
> to Dataflow. If it made sense for a runner, they could
> use AWS/Azure machine types as well.
>
> On Fri, May 3, 2019 at 9:32 AM Ahmet Altay <al...@google.com> wrote:
>
> This idea was discussed in a PR a few months ago,
> and a JIRA was filed as a follow-up [1]. IMO, it makes
> sense to use a namespace prefix. The primary issue
> here is that such a change will very likely be
> backward incompatible and would be hard to do
> before the next major version.
>
> [1] https://issues.apache.org/jira/browse/BEAM-6531
>
> *From: *Reza Rokni <r...@google.com>
> *Date: *Thu, May 2, 2019 at 8:00 PM
> *To: *<dev@beam.apache.org>
>
> Hi,
>
> Was reading this SO question:
>
> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>
> And noticed that in
>
> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>
> the option is called --worker_machine_type.
>
> I wonder if runner-specific options should have
> the runner in the prefix? Something like
> --dataflow_worker_machine_type?
>
> Cheers
> Reza
>
> --
>
> This email may be confidential and privileged. If you received
> this communication by mistake, please don't forward it to anyone
> else, please erase all copies and attachments, and please let me
> know that it has gone to the wrong person.
>
> The above terms reflect a potential business arrangement, are
> provided solely as a basis for further discussion, and are not
> intended to be and do not constitute a legally binding obligation.
> No legally binding obligations will be created, implied, or
> inferred until an agreement in final form is executed in writing
> by all parties involved.
>