On Fri, Oct 12, 2018 at 11:31 AM, Charles Chen <c...@google.com> wrote:

> What I mean is that a user may find that it works for them to pass
> "--myarg blah" and access it as "options.myarg" without explicitly defining
> a "my_arg" flag due to the added logic.  This is not the intended behavior
> and we may want to change this implementation detail in the future.
> However, having this logic in a released version makes it hard to change
> this behavior since users may erroneously depend on this undocumented
> behavior.  Instead, we should namespace / scope this so that it is obvious
> that this is meant for runner (and not Beam user) consumption.
>
> On Fri, Oct 12, 2018 at 10:48 AM Thomas Weise <t...@apache.org> wrote:
>
>> Can you please elaborate more what practical problems this introduces for
>> users?
>>
>> I can see that this change allows a user to specify a runner specific
>> option, which in the future may change because we decide to scope
>> differently. If this only affects users of the portable Flink runner (like
>> us), then no need to revert, because at this early stage we prefer
>> something that works over being blocked.
>>
>> It would also be really great if some of the core Python SDK developers
>> could help out with the design aspects and PR reviews of changes that
>> affect common Python code. Anyone who specifically wants to be tagged on
>> relevant JIRAs and PRs?
>>
>
I would be happy to be tagged, and I can also help with including other
relevant folks whenever possible. In general I think Robert, Charles,
myself are good candidates.



>
>> Thanks
>>
>>
>> On Fri, Oct 12, 2018 at 10:20 AM Ahmet Altay <al...@google.com> wrote:
>>
>>>
>>>
>>> On Fri, Oct 12, 2018 at 10:11 AM, Charles Chen <c...@google.com> wrote:
>>>
>>>> For context, I made comments on https://github.com/apache/
>>>> beam/pull/6600 noting that the changes being made were not good for
>>>> Beam backwards-compatibility.  The change as is allows users to use
>>>> pipeline options without explicitly defining them, which is not the type of
>>>> usage we would like to encourage since we prefer to be explicit whenever
>>>> possible.  If users write pipelines with this sort of pattern, they will
>>>> potentially encounter pain when upgrading to a later version since this is
>>>> an implementation detail and not an officially supported pattern.  I agree
>>>> with the comments above that this is ultimately a scoping issue.  I would
>>>> not have a problem with these changes if they were explicitly scoped under
>>>> either a runner or unparsed options namespace.
>>>>
>>>> As a second note, since the 2.8.0 release is being cut right now,
>>>> because of these backwards-compatibility concerns, I would suggest
>>>> reverting these changes, at least until 2.8.0 is cut, so we can have a
>>>> discussion here before committing to and releasing any API-level changes.
>>>>
>>>
>>> +1 I would like to revert the changes in order not rush this into the
>>> release. Once this discussion results in an agreement changes can be
>>> brought back.
>>>
>>>
>>>>
>>>> On Fri, Oct 12, 2018 at 9:26 AM Henning Rohde <hero...@google.com>
>>>> wrote:
>>>>
>>>>> Agree that pipeline options lack some mechanism for scoping. It is
>>>>> also not always possible distinguish options meant to be consumed at
>>>>> pipeline construction time, by the runner, by the SDK harness, by the user
>>>>> code or any combination -- and this causes confusion every now and then.
>>>>>
>>>>> For Dataflow, we have been using "experiments" for arbitrary
>>>>> runner-specific options. It's simply a string list pipeline option that 
>>>>> all
>>>>> SDKs support and, for Go at least, is sent to portable runners. Flink can
>>>>> do the same in the short term to move forward.
>>>>>
>>>>> Henning
>>>>>
>>>>>
>>>>> On Fri, Oct 12, 2018 at 8:50 AM Thomas Weise <t...@apache.org> wrote:
>>>>>
>>>>>> [moving to the list]
>>>>>>
>>>>>> The requirement driving this part of the change was to allow a user
>>>>>> to specify pipeline options that a runner supports without having to
>>>>>> declare those in each language SDK.
>>>>>>
>>>>>> In the specific scenario, we have options that the Flink runner
>>>>>> supports (and can validate), that are not enumerated in the Python SDK.
>>>>>>
>>>>>> I think we have a bigger problem scoping pipeline options. For
>>>>>> example, the runner options are dumped into the SDK worker. There is 
>>>>>> also a
>>>>>> possibility of name collisions. So I think this would benefit from 
>>>>>> broader
>>>>>> feedback.
>>>>>>
>>>>>> Thanks,
>>>>>> Thomas
>>>>>>
>>>>>>
>>>>>> ---------- Forwarded message ---------
>>>>>> From: Charles Chen <notificati...@github.com>
>>>>>> Date: Fri, Oct 12, 2018 at 8:36 AM
>>>>>> Subject: Re: [apache/beam] [BEAM-5442] Store duplicate unknown
>>>>>> options in a list argument (#6600)
>>>>>> To: apache/beam <b...@noreply.github.com>
>>>>>> Cc: Thomas Weise <thomas.we...@gmail.com>, Mention <
>>>>>> ment...@noreply.github.com>
>>>>>>
>>>>>>
>>>>>> CC: @tweise <https://github.com/tweise>
>>>>>>
>>>>>> —
>>>>>> You are receiving this because you were mentioned.
>>>>>> Reply to this email directly, view it on GitHub
>>>>>> <https://github.com/apache/beam/pull/6600#issuecomment-429367754>,
>>>>>> or mute the thread
>>>>>> <https://github.com/notifications/unsubscribe-auth/AAQGDwwt15R85eq9pySUisyxq2HYz-Vyks5ukLcLgaJpZM4XMo-T>
>>>>>> .
>>>>>>
>>>>>
>>>

Reply via email to