As a follow-up to the proposed changes from my first email, I've worked on
a doc with a more detailed changelist, including details still up for
discussion:
https://docs.google.com/document/d/1UeG5uNO00xCByGEZzDXk0m0LghX6HBWlMfRbMv_Xiyc/edit?usp=sharing

The doc is mostly full of my brainstorming on what the next version of the
user-facing Go SDF API will look like, so it's not too polished. But if
anyone's interested in this, I welcome any and all feedback!

On Mon, Jan 13, 2020 at 2:22 PM Luke Cwik <[email protected]> wrote:

> Thanks for the update and I agree with the points that you have made.
>
> On Fri, Jan 10, 2020 at 5:58 PM Robert Burke <[email protected]> wrote:
>
>> Thank you for sharing Daniel!
>>
>> Resolving SplittableDoFns for the Go SDK even just as far as initial
>> splitting will take the SDK that much closer to exiting its experimental
>> status.
>>
>> It's especially exciting seeing this work on Flink and on the Python
>> direct runner!
>>
>> On Fri, Jan 10, 2020, 5:36 PM Daniel Oliveira <[email protected]>
>> wrote:
>>
>>> Hey Beam devs,
>>>
>>> So several months ago I posted my Go SDF proposal and got a lot of good
>>> feedback (thread
>>> <https://lists.apache.org/thread.html/327bc72a0b30e18c6152b562bac2613c0edc942465d67b215830819e%40%3Cdev.beam.apache.org%3E>,
>>> doc <https://s.apache.org/beam-go-sdf>). Since then I've been working
>>> on implementing it and I've got an initial prototype ready to show off! It
>>> works with initial splitting on Flink, and has a decently documented API.
>>> Also in the second part of the email I'll also be proposing changes to the
>>> original doc, based on my experience working on this prototype.
>>>
>>> To be clear, this is *not* ready to officially go into Beam yet; the
>>> API is still likely to go through changes. Rather, I'm showing this off to
>>> show that progress is being made on SDFs, and to provide some context to
>>> the changes I'll be proposing below.
>>>
>>> Here's a link to the repo and branch so you can download it, and a link
>>> to the changes specifically:
>>> Repo: https://github.com/youngoli/beam/tree/gosdf
>>> Changes:
>>> https://github.com/apache/beam/commit/28140ee3471d6cb80e74a16e6fd108cc380d4831
>>>
>>> If you give it a try and have any thoughts, please let me know! I'm open
>>> to any and all feedback.
>>>
>>> ==================================
>>>
>>> Proposed Changes
>>> Doc: https://s.apache.org/beam-go-sdf (Select "Version 1" from version
>>> history.)
>>>
>>> For anyone reading this who hasn't already read the doc above, I suggest
>>> reading it first, since I'll be referring to concepts from it.
>>>
>>> After working on the prototype I've changed my mind on the original
>>> decisions to go with an interface approach and a combined restriction +
>>> tracker. But I don't want to go all in and create another doc with a
>>> detailed proposal, so I've laid out a brief summary of the changes to get
>>> some initial feedback before I go ahead and start working on these changes
>>> in detail. Please let me know what you think!
>>>
>>> *1. Change from native Go interfaces to dynamic reflection-based API.*
>>>
>>> Instead of the native Go interfaces (SplittableDoFn, RProvider, and
>>> RTracker) described in the doc and implemented in the prototype, use the
>>> same dynamic approach that the Go SDK already uses for DoFns: Use the
>>> reflection system to examine the names and signatures of methods in the
>>> user's DoFn, RProvider, and RTracker.
>>>
>>> Original approach reasoning:
>>>
>>>    - Simpler, so faster to implement and less bug-prone.
>>>    - The extra burden on the user to keep types consistent is ok since
>>>    most users of SDFs are more advanced
>>>
>>> Change reasoning:
>>>
>>>    - In the prototype, I found interfaces to require too much extra
>>>    boilerplate which added more complexity than expected. (Examples: 
>>> Constant
>>>    casting,
>>>    - More consistent API: Inconsistency between regular DoFns (dynamic)
>>>    and SDF API (interfaces) was jarring and unintuitive when implementing 
>>> SDFs
>>>    as a user.
>>>
>>> Implementation: Full details are up for discussion, but the goal is to
>>> make the RProvider and  RTracker interfaces dynamic, so we can replace all
>>> instances of interface{} in the methods with the actual element types
>>> (i.e. fake generics). Also uses of the RProvider and RTracker interfaces in
>>> signatures can be replaced with the implementations of those
>>> providers/trackers. This will require a good amount of additional work in
>>> the DoFn validation codebase and the code generator. Plus a fair amount of
>>> additional user code validation will be needed and more testing since the
>>> new code is more complex.
>>>
>>> *2. Seperate the restriction tracker and restriction.*
>>>
>>> Currently the API has the restriction combined with the tracker. In most
>>> other SDKs and within the SDF model, the two are usually separate concepts,
>>> and this change is to follow that approach and split the two.
>>>
>>> Original approach reasoning:
>>>
>>>    - It was considered simpler to avoid another level of type casting
>>>    in the API with the interface approach.
>>>
>>> Change reasoning:
>>>
>>>    - We are no longer going with the interface approach. With "fake
>>>    generics", it is simpler to keep the two concepts separate.
>>>    - Requiring users to specify custom coders in order to only encode
>>>    the restriction and not the tracker ended up adding additional complexity
>>>    anyway.
>>>
>>> Implementation: In the API have the restriction tracker initialized with
>>> a restriction object accessible via a getter. The restriction itself will
>>> be the only thing serialized, so it will be wrapped and unwrapped with the
>>> tracker before the user code is invoked. This wouldn't add very little work
>>> as it would mostly be bundled with the interface->dynamic approach change.
>>>
>>>
>>> Thanks,
>>> Daniel Oliveira
>>>
>>

Reply via email to