On Fri, Oct 12, 2018 at 11:31 AM, Charles Chen <c...@google.com> wrote:
> What I mean is that a user may find that it works for them to pass > "--myarg blah" and access it as "options.myarg" without explicitly defining > a "my_arg" flag due to the added logic. This is not the intended behavior > and we may want to change this implementation detail in the future. > However, having this logic in a released version makes it hard to change > this behavior since users may erroneously depend on this undocumented > behavior. Instead, we should namespace / scope this so that it is obvious > that this is meant for runner (and not Beam user) consumption. > > On Fri, Oct 12, 2018 at 10:48 AM Thomas Weise <t...@apache.org> wrote: > >> Can you please elaborate more what practical problems this introduces for >> users? >> >> I can see that this change allows a user to specify a runner specific >> option, which in the future may change because we decide to scope >> differently. If this only affects users of the portable Flink runner (like >> us), then no need to revert, because at this early stage we prefer >> something that works over being blocked. >> >> It would also be really great if some of the core Python SDK developers >> could help out with the design aspects and PR reviews of changes that >> affect common Python code. Anyone who specifically wants to be tagged on >> relevant JIRAs and PRs? >> > I would be happy to be tagged, and I can also help with including other relevant folks whenever possible. In general I think Robert, Charles, myself are good candidates. > >> Thanks >> >> >> On Fri, Oct 12, 2018 at 10:20 AM Ahmet Altay <al...@google.com> wrote: >> >>> >>> >>> On Fri, Oct 12, 2018 at 10:11 AM, Charles Chen <c...@google.com> wrote: >>> >>>> For context, I made comments on https://github.com/apache/ >>>> beam/pull/6600 noting that the changes being made were not good for >>>> Beam backwards-compatibility. The change as is allows users to use >>>> pipeline options without explicitly defining them, which is not the type of >>>> usage we would like to encourage since we prefer to be explicit whenever >>>> possible. If users write pipelines with this sort of pattern, they will >>>> potentially encounter pain when upgrading to a later version since this is >>>> an implementation detail and not an officially supported pattern. I agree >>>> with the comments above that this is ultimately a scoping issue. I would >>>> not have a problem with these changes if they were explicitly scoped under >>>> either a runner or unparsed options namespace. >>>> >>>> As a second note, since the 2.8.0 release is being cut right now, >>>> because of these backwards-compatibility concerns, I would suggest >>>> reverting these changes, at least until 2.8.0 is cut, so we can have a >>>> discussion here before committing to and releasing any API-level changes. >>>> >>> >>> +1 I would like to revert the changes in order not rush this into the >>> release. Once this discussion results in an agreement changes can be >>> brought back. >>> >>> >>>> >>>> On Fri, Oct 12, 2018 at 9:26 AM Henning Rohde <hero...@google.com> >>>> wrote: >>>> >>>>> Agree that pipeline options lack some mechanism for scoping. It is >>>>> also not always possible distinguish options meant to be consumed at >>>>> pipeline construction time, by the runner, by the SDK harness, by the user >>>>> code or any combination -- and this causes confusion every now and then. >>>>> >>>>> For Dataflow, we have been using "experiments" for arbitrary >>>>> runner-specific options. It's simply a string list pipeline option that >>>>> all >>>>> SDKs support and, for Go at least, is sent to portable runners. Flink can >>>>> do the same in the short term to move forward. >>>>> >>>>> Henning >>>>> >>>>> >>>>> On Fri, Oct 12, 2018 at 8:50 AM Thomas Weise <t...@apache.org> wrote: >>>>> >>>>>> [moving to the list] >>>>>> >>>>>> The requirement driving this part of the change was to allow a user >>>>>> to specify pipeline options that a runner supports without having to >>>>>> declare those in each language SDK. >>>>>> >>>>>> In the specific scenario, we have options that the Flink runner >>>>>> supports (and can validate), that are not enumerated in the Python SDK. >>>>>> >>>>>> I think we have a bigger problem scoping pipeline options. For >>>>>> example, the runner options are dumped into the SDK worker. There is >>>>>> also a >>>>>> possibility of name collisions. So I think this would benefit from >>>>>> broader >>>>>> feedback. >>>>>> >>>>>> Thanks, >>>>>> Thomas >>>>>> >>>>>> >>>>>> ---------- Forwarded message --------- >>>>>> From: Charles Chen <notificati...@github.com> >>>>>> Date: Fri, Oct 12, 2018 at 8:36 AM >>>>>> Subject: Re: [apache/beam] [BEAM-5442] Store duplicate unknown >>>>>> options in a list argument (#6600) >>>>>> To: apache/beam <b...@noreply.github.com> >>>>>> Cc: Thomas Weise <thomas.we...@gmail.com>, Mention < >>>>>> ment...@noreply.github.com> >>>>>> >>>>>> >>>>>> CC: @tweise <https://github.com/tweise> >>>>>> >>>>>> — >>>>>> You are receiving this because you were mentioned. >>>>>> Reply to this email directly, view it on GitHub >>>>>> <https://github.com/apache/beam/pull/6600#issuecomment-429367754>, >>>>>> or mute the thread >>>>>> <https://github.com/notifications/unsubscribe-auth/AAQGDwwt15R85eq9pySUisyxq2HYz-Vyks5ukLcLgaJpZM4XMo-T> >>>>>> . >>>>>> >>>>> >>>