Hello Beam Community! I propose we stop calling the Apache Beam Go SDK experimental.
This thread is to discuss that as a community, along with any remaining conditions that would prevent the exit.

*tl;dr;*
*Ask Questions for answers and links! I have both.*

- This entails including it officially in the Release process, removing the various "experimental" text throughout the repo, and otherwise treating it like Python and Java. There are also some Go specific tasks around dependency versioning.
- The Go SDK implements the Beam model efficiently for most batch tasks, including basic windowing.
- Apache Beam Go jobs can execute on, and are tested against, all Portable runners.
- The core APIs are not going to change in incompatible ways going forward.
- Scalable transforms can be written through SplittableDoFns or via Cross Language transforms.
- The SDK isn't 100% feature complete, but keeping it experimental doesn't help with that any further.
- Communities grow through contributions and use, and experimental markers dissuade users.
- There's plenty to do in order to expand what can be done with the SDK. (Contributions welcome)

*Why Exit Experimental now?*

Typically when we call an SDK or API Experimental, it's because there's a risk that the API or its behaviors may change significantly. This in turn leads to additional work for users of the SDK on every release, which leads to sticking to older versions or forking to preserve behavior. Version updates should be looked forward to, and viewed as having little risk.

Further, while there's been previous discussion about what the "low bar" is for a new SDK, it hasn't been summarily applied to the Go SDK. I feel this has hurt development and contribution of new SDK languages (inherent difficulty of SDK development notwithstanding).

When the SDK was designed, it wasn't entirely clear what the Beam Model should look like in an opinionated language like Go. The designers' initial take (see https://s.apache.org/beam-go-sdk-design-rfc [0]) goes into detail on what it means for a language without Generics, overloading, or inheritance to implement the Beam model.
One could largely throw away static types (like Python), but this approach rings hollow for Go. It would not do if the approach couldn't grow and scale to the Beam Model. It's also hard to tell if an API is any good before there are users.

Further, in the early days of Portability, there wasn't a way to write scalable DoFns, dynamically or otherwise. It's an incredible bottleneck to need to do all initial fanout of work on a single machine and write everything to a Reshuffle just in order to scale up. Without being able to scale, Beam is little more than overhead.

At this point, both of these needs are met within the Go SDK for open source.

*Background*

The Go SDK has been a part of the Beam repo for a few years now, since it was accidentally merged into master. Since then it's been called experimental, and has not officially been part of the releases.

Of the SDKs, it was always designed around Beam Portability first. It never had any "Legacy" (SDK x Runner specific) workers. It's always used the Beam Pipeline protos and FnAPI to execute jobs, first with some very experimental code on Dataflow, but now on all portable supported runners, like Flink, Spark, the Python Portable runner, and Dataflow.

*API Stability*

The Go SDK hasn't meaningfully changed its user API for DoFn and pipeline construction since it was first merged in, and there are no changes on the horizon that can't be made in a backwards compatible manner. Largely these are related to New Features, or usability improvements enabled by the advent of Go Generics (think of "real" KV, emitter, and iterator types).

It's an open secret that the Go SDK has largely been under work for use within Google. Its use there is called FlumeGo, representing the Apache Beam Go SDK running on top of Flume, Google's batch pipeline processing engine. Thus most of the focus has been on improving batch execution.
FlumeGo sees ample use today, and there hasn't been a call for fundamental changes to the API for ergonomic or usability concerns.

*Scalability*

Google could get away without the Go SDK having an SDK side scalability solution as a result of its integration with Flume. However, those days are now past.

The Go SDK now supports SplittableDoFns along with Dynamic Splitting, which supports writing scalable batch transforms natively in the Go SDK.

The SDK also supports Cross Language Transforms, with Beam Schema encodings. With it, production hardened transforms from Java and Python are a wrapper away. Presently, Daniel Oliveira (who implemented the SDF side work, and completed the Xlang work) is adding a wrapper for the Java Kafka IO using Cross Language Transforms, which has often been requested. This will also enable use of the Beam SQL transforms that Java enables.

*Features*

The Go SDK implements the Beam core. It implements standard coders, allows for user DoFns and CombineFns, and provides access to core transforms like Flatten and GroupByKey, and features like Side Inputs, Windowing, and User Metrics. Basic windowing will be fully supported for batch, even through lifted combines, in the 2.32.0 release.

All of the above enables Beam Go to be versatile for batch execution on portable runners, and for simple streaming pipelines.

*Repo Testing*

On precommit the Go SDK runs all its unit tests. On top of that, it runs all its integration tests against the Python Portable runner, making it quick and robust to detect breaking changes without overspending community resources. Those same tests are also run against Dataflow, Flink, and Spark.

The tests are executable against all runners via the appropriate Go commands (if you've stood up your own job management server), or Gradle commands (which will spin up runner instances for you). Documentation for executing tests and adding new ones is on the wiki.
[2] They are accessible to Go developers as they're implemented with the standard Go testing tools.

*Shortcomings*

That said, there's still much to do. Let me briefly tell you what doesn't work, and it's up to you to weigh whether these gaps block exiting experimental.

At present, only textio has been implemented as a Splittable DoFn. Once the Kafka wrapper is merged in, it will serve as the first example for future contributions of new transform wrappers for the Go SDK. Transforms and IOs are lacking, but at this point users are empowered to write their own DoFns or wrap existing transforms for Cross Language use.

In the core SDK, more streaming focused features have yet to be implemented, but they're largely additions to what exists already rather than total rebuilds. Much of the work is defining how a user specifies their desires, and turning those into the appropriate FnAPI requests at execution time. Back in October I wrote at length on the wiki [1] about what's missing for additional streaming features.

While we have bolstered our testing recently, there's likely still more we could test to improve our confidence in the SDK, in particular regarding the included transform libraries and examples.

*Moving Forward*

My immediate plan is to work on incorporating the Go SDK fully into the Beam Programming Guide (BPG). I've audited the guide [3], and am beginning to add missing content and fill in the Go specific gaps. This will be tied to improving the GoDoc with more Go specific user documentation that isn't appropriate for the BPG, and to resolving the LICENSE issue around the public display of that GoDoc.

If this proposal is accepted by a binding vote, I will incorporate the SDK into the release process, and remove the "experimental" language around the SDK. This largely entails updating the release scripts to also build and publish the Go SDK Docker containers. As for releasing the code, we're technically already doing so whenever we tag a release branch [4].
The clearest signal to the Go community, however, will be migrating the SDK to use Go Modules for dependency version control, which Daniel is planning on working on after his Kafka task. This will put our repo infrastructure, SDK contributors, and users on the same footing when it comes to dependency management. It will also remove the "+incompatible" tags one sees on the pkg.go.dev list at [4].

I'm very happy to answer any questions you might have about the SDK, and provide additional links as needed. I intentionally avoided a link barrage in this email, as they can distract from the point: the SDK is ready for folks to use, and we need to tell them that they can, rather than that they shouldn't.

Robert Burke
Defacto Beam Go TL

[0] https://s.apache.org/beam-go-sdk-design-rfc
[1] https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
[2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
[3] https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090 (SDK Audit sheet)
[4] https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
