+1 to declaring Golang support out of experimental once the Go Modules
issues are solved. I don't think an SDK needs to support every feature
to be accepted, especially now that we can do cross-language
transforms, and Go definitely supports enough to be quite useful. (WRT
streaming, my understanding is that Go supports the streaming model
with windows and timestamps, and runs fine on a streaming runner, even
if more advanced features like state and timers aren't yet available.)

This is a great milestone.

On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <[email protected]> wrote:
>
> WOW! Big news.
>
> I'm supportive of leaving experimental status after Go Modules are completed 
> and the LICENSE issue is resolved. I don't think that lacking streaming 
> support is a blocker. The other thing I checked to see was if there were 
> metrics available on metrics.beam.apache.org, specifically for measuring code 
> health via post-commit over time, which there are and the passing test rate 
> is high (Huzzah!). The one thing that surprised me from your summary is that 
> when Go introduces generics it won't result in any backwards incompatible 
> changes in Apache Beam. That's great news, but does it mean there will be a 
> need to support both non-generic and generic APIs moving forward? It seems 
> like generics will be introduced in the Go 1.17 release (optimistically) in 
> August this year.
>
>
>
> On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <[email protected]> wrote:
>>
>> Hello Beam Community!
>>
>> I propose we stop calling the Apache Beam Go SDK experimental.
>>
>> This thread is to discuss it as a community, and any conditions that remain 
>> that would prevent the exit.
>>
>> tl;dr;
>> Ask Questions for answers and links! I have both.
>> This entails including it officially in the Release process, removing the 
>> various "experimental" text throughout the repo etc,
>> and otherwise treating it like Python and Java. Some Go specific tasks 
>> around dep versioning.
>>
>> The Go SDK implements the beam model efficiently for most batch tasks, 
>> including basic windowing.
>> Apache Beam Go jobs can execute, and are tested on all Portable runners.
>> The core APIs are not going to change in incompatible ways going forward.
>> Scalable transforms can be written through SplittableDoFns or via Cross 
>> Language transforms.
>>
>> The SDK isn't 100% feature complete, but keeping it experimental doesn't 
>> help with that any further.
>> Communities grow through contributions and use, and experimental markers 
>> dissuade users.
>> There's plenty to do in order to expand what can be done with the SDK. 
>> (Contributions welcome)
>>
>> Why Exit Experimental now?
>>
>> Typically when we call an SDK or API Experimental, it's because there's a 
>> risk that API or behaviors may change significantly.
>> This, in turn, leads to additional work for users of the SDK on every release 
>> which leads to sticking to older versions or forking
>> to preserve behavior. Version updates should be looked forward to, and 
>> viewed as having little risk. Further, while there's been
>> previous discussion about what the "low bar" is for a new SDK, it hasn't been 
>> summarily applied to the Go SDK. I feel this has
>> hurt development and contribution of new SDK languages (inherent difficulty 
>> of SDK development notwithstanding).
>>
>> When the SDK was designed, it wasn't entirely clear what the Beam Model 
>> should look like in an opinionated language like Go.
>> Their initial take (see https://s.apache.org/beam-go-sdk-design-rfc [0]) 
>> goes into detail on what it means for a language without
>> Generics, or overloading, or inheritance to implement the beam model. One 
>> could largely throw away static types (like Python),
>> but this approach rings hollow for Go. It would not do if the approach 
>> couldn't grow and scale to the Beam Model. It's also hard
>> to tell if an API is any good before there are users.
>>
>> Further, in the early days of Portability, there wasn't a way to write 
>> scalable DoFns, dynamically or otherwise. It's an incredible
>> bottleneck to need to do all initial fanout of work on a single machine, 
>> write everything to a Reshuffle, just in order to scale up.
>> Without being able to scale, Beam is little more than overhead.
>>
>> At this point, both of these needs are met within the Go SDK for open source.
>>
>> Background
>>
>> The Go SDK has been a part of the beam repo for a few years now, since it 
>> was accidentally merged into master.
>> Since then it's been called experimental, and not officially part of the 
>> releases.
>>
>> Of the SDKs, it was always designed around Beam Portability first. It 
>> never had any "Legacy" (SDK x Runner specific ) workers.
>> It's always used the Beam Pipeline protos and FnAPI to execute jobs, first 
>> with some very experimental code on Dataflow, but now
>> on all portable supported runners, like Flink, Spark, the Python Portable 
>> runner, and Dataflow.
>>
>> API Stability
>>
>> The Go SDK hasn't meaningfully changed its user API for DoFn and pipeline 
>> construction since it was first merged in, and there are no
>> changes to that on the horizon that can't be made in a backwards compatible 
>> manner. Largely these are related to New Features, or
>> usability improvements enabled by the advent of Go Generics (think of "real" 
>> KV, emitter, and iterator types).
>>
>> It's an open secret that the Go SDK has largely been under work for use 
>> within Google. Its use there is called FlumeGo, representing
>> the Apache Beam Go SDK, running on top of Flume, Google's batch pipeline 
>> processing engine. Thus most of the focus has been on improving
>> batch execution. FlumeGo sees ample use today, and there hasn't been a call 
>> for fundamental changes to the API for ergonomic or
>> usability concerns.
>>
>> Scalability
>>
>> Google could get away without the Go SDK having an SDK side scalability 
>> solution as a result of its integration with Flume.
>> However, those days are now past.
>>
>> The Go SDK now supports SplittableDoFns along with Dynamic Splitting, which 
>> supports writing scalable batch transforms natively
>> in the Go SDK.
>> The SDK also supports Cross Language Transforms, with Beam Schema encodings. 
>> With it, production hardened transforms
>> from Java and Python are a wrapper away.
>>
>> Presently, Daniel Oliveira (who implemented the SDF side work and completed 
>> the Xlang work) is adding a wrapper for the
>> Java Kafka IO using Cross Language Transforms, which has often been 
>> requested. This will also enable use of the Beam SQL
>> transforms that Java enables.
>>
>> Features
>>
>> The Go SDK implements the Beam core. The Go SDK implements standard 
>> coders, allows for user DoFns, and CombineFns and access
>> to core transforms like Flatten, GroupByKey, and features like Side Inputs, 
>> Windowing, and User Metrics.
>> Basic windowing will be fully supported for batch even through lifted 
>> combines in the 2.32.0 release.
>>
>> All of the above enables Beam Go to be versatile for batch execution on 
>> portable runners, and for simple streaming pipelines.
>>
>> Repo Testing
>>
>> On precommit the Go SDK runs all its unit tests. On top of that, it runs 
>> all its integration tests against the Python Portable runner,
>> making it quick and robust to detect breaking changes without overspending 
>> community resources. Those same tests are also
>> run against Dataflow, Flink, and Spark.
>>
>> The tests are executable against all runners via the appropriate Go commands 
>> (if you've stood up your own job management server),
>> or Gradle commands (which will spin up runner instances for you). 
>> Documentation for executing tests and adding new ones
>> is on the wiki. [2] They are accessible to Go developers as they're 
>> implemented with the standard Go testing tools.
>>
>> Shortcomings
>> That said, there's still much to do. Let me briefly tell you what doesn't 
>> work, and it's up to you to weigh whether they block
>> being out of experimental.
>>
>> At present, only textio has been implemented as a Splittable DoFn.
>> Once the Kafka wrapper is merged in, it will serve as the first example 
>> for future contributions for
>> new transform wrappers for the Go SDK.
>> Transforms and IOs are lacking, but at this point users are empowered to 
>> write their own DoFns or wrap existing transforms for Cross Language use.
>>
>> In the core SDK, more streaming focused features have yet to be implemented, 
>> but they're largely additions to what exists already
>> rather than total rebuilds. Much of the work is defining how a user 
>> specifies their desires, and turning those into the appropriate
>> FnAPI requests at execution time. Back in October I wrote at length on the 
>> wiki [1] what's missing for additional streaming features.
>>
>> While we have bolstered our testing recently, there's likely still more we 
>> could test to improve our confidence in the SDK,
>> in particular regarding the included transforms libraries and examples.
>>
>> Moving Forward
>>
>> My immediate plan is to work on incorporating the Go SDK fully into the Beam 
>> Programming Guide. I've audited the guide [3], and
>> am beginning to add missing content and filling in the Go specific gaps. 
>> This will be tied to improving the Go Doc with more Go
>> specific user documentation that isn't appropriate for the BPG.
>> And resolving the LICENSE issue around the public display of that GoDoc.
>>
>> If this proposal is accepted by a binding vote, I will incorporate the SDK 
>> into the release process, and remove the "experimental"
>> language around the SDK. This largely entails updating the release scripts 
>> to also build and publish the Go SDK Docker containers.
>> As for releasing the code, we're technically already doing so whenever we 
>> tag a release branch [4].
>>
>> The clearest signal to the Go community however will be migrating the SDK to 
>> use Go Modules for dependency version control,
>> which Daniel is planning on working on after his Kafka task. This will put 
>> our repo infrastructure, SDK contributors, and users
>> on the same footing when it comes to dependency management. It will remove 
>> the "+incompatible" tags one sees on the
>> pkg.go.dev list at [4].
>>
>> I'm very happy to answer any questions you might have about the SDK, and 
>> provide additional links as needed. I intentionally avoided
>> a link barrage in this email, as they can distract from the point: The SDK 
>> is ready for folks to use it, we need to tell them that they can
>> rather than they shouldn't.
>>
>> Robert Burke
>> De facto Beam Go TL
>>
>> [0] https://s.apache.org/beam-go-sdk-design-rfc
>> [1] 
>> https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
>> [2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
>> [3] 
>> https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
>>  (SDK Audit sheet)
>> [4] https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
