+1 to declaring Golang support out of experimental once the Go Modules issues are solved. I don't think an SDK needs to support every feature to be accepted, especially now that we can do cross-language transforms, and Go definitely supports enough to be quite useful. (WRT streaming, my understanding is that Go supports the streaming model with windows and timestamps, and runs fine on a streaming runner, even if more advanced features like state and timers aren't yet available.)
This is a great milestone.

On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <[email protected]> wrote:
>
> WOW! Big news.
>
> I'm supportive of leaving experimental status after Go Modules are completed
> and the LICENSE issue is resolved. I don't think that lacking streaming
> support is a blocker. The other thing I checked to see was if there were
> metrics available on metrics.beam.apache.org, specifically for measuring code
> health via post-commit over time, which there are and the passing test rate
> is high (Huzzah!). The one thing that surprised me from your summary is that
> when Go introduces generics it won't result in any backwards incompatible
> changes in Apache Beam. That's great news, but does it mean there will be a
> need to support both non-generic and generic APIs moving forward? It seems
> like generics will be introduced in the Go 1.17 release (optimistically) in
> August this year.
>
>
>
> On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <[email protected]> wrote:
>>
>> Hello Beam Community!
>>
>> I propose we stop calling the Apache Beam Go SDK experimental.
>>
>> This thread is to discuss it as a community, and any conditions that remain
>> that would prevent the exit.
>>
>> tl;dr;
>> Ask Questions for answers and links! I have both.
>> This entails including it officially in the Release process, removing the
>> various "experimental" text throughout the repo etc,
>> and otherwise treating it like Python and Java. Some Go specific tasks
>> around dep versioning.
>>
>> The Go SDK implements the beam model efficiently for most batch tasks,
>> including basic windowing.
>> Apache Beam Go jobs can execute, and are tested on all Portable runners.
>> The core APIs are not going to change in incompatible ways going forward.
>> Scalable transforms can be written through SplittableDoFns or via Cross
>> Language transforms.
>>
>> The SDK isn't 100% feature complete, but keeping it experimental doesn't
>> help with that any further.
>> Communities grow through contributions and use, and experimental markers
>> dissuade users.
>> There's plenty to do in order to expand what can be done with the SDK.
>> (Contributions welcome)
>>
>> Why Exit Experimental now?
>>
>> Typically when we call an SDK or API Experimental, it's because there's a
>> risk that API or behaviors may change significantly.
>> This, in turn, leads to additional work for users of the SDK on every release
>> which leads to sticking to older versions or forking
>> to preserve behavior. Version updates should be looked forward to, and
>> viewed as having little risk. Further, while there's been
>> previous discussion about what the "low bar" is for a new SDK, it hasn't been
>> summarily applied to the Go SDK. I feel this has
>> hurt development and contribution of new SDK languages (inherent difficulty
>> of SDK development notwithstanding).
>>
>> When the SDK was designed, it wasn't entirely clear what the Beam Model
>> should look like in an opinionated language like Go.
>> Their initial take (see https://s.apache.org/beam-go-sdk-design-rfc [0])
>> goes into detail on what it means for a language without
>> Generics, or overloading, or inheritance to implement the beam model. One
>> could largely throw away static types (like Python),
>> but this approach rings hollow for Go. It would not do if the approach
>> couldn't grow and scale to the Beam Model. It's also hard
>> to tell if an API is any good before there are users.
>>
>> Further, in the early days of Portability, there wasn't a way to write
>> scalable DoFns, dynamically or otherwise. It's an incredible
>> bottleneck to need to do all initial fanout of work on a single machine,
>> write everything to a Reshuffle, just in order to scale up.
>> Without being able to scale, Beam is little more than overhead.
>>
>> At this point, both of these needs are met within the Go SDK for open source.
>>
>> Background
>>
>> The Go SDK has been a part of the beam repo for a few years now, since it
>> was accidentally merged into master.
>> Since then it's been called experimental, and not officially part of the
>> releases.
>>
>> Of the SDKs, it was always designed around Beam Portability first. It
>> never had any "Legacy" (SDK x Runner specific) workers.
>> It's always used the Beam Pipeline protos and FnAPI to execute jobs, first
>> with some very experimental code on Dataflow, but now
>> on all portable supported runners, like Flink, Spark, the Python Portable
>> runner, and Dataflow.
>>
>> API Stability
>>
>> The Go SDK hasn't meaningfully changed its user API for DoFn and pipeline
>> construction since it was first merged in, and there are no
>> changes to that on the horizon that can't be made in a backwards compatible
>> manner. Largely these are related to New Features, or
>> usability improvements enabled by the advent of Go Generics (think of "real"
>> KV, emitter, and iterator types).
>>
>> It's an open secret that the Go SDK has largely been under work for use
>> within Google. Its use is called FlumeGo, representing
>> the Apache Beam Go SDK, running on top of Flume, Google's batch pipeline
>> processing engine. Thus most of the focus has been on improving
>> batch execution. FlumeGo sees ample use today, and there hasn't been a call
>> for fundamental changes to the API for ergonomic or
>> usability concerns.
>>
>> Scalability
>>
>> Google could get away without the Go SDK having an SDK side scalability
>> solution as a result of its integration with Flume.
>> However, those days are now past.
>>
>> The Go SDK now supports SplittableDoFns along with Dynamic Splitting, which
>> supports writing scalable batch transforms natively
>> in the Go SDK.
>> The SDK also supports Cross Language Transforms, with Beam Schema encodings.
>> With it, production hardened transforms
>> from Java and Python are a wrapper away.
>>
>> Presently, Daniel Oliveira (who implemented the SDF side work, and completed
>> the Xlang work) is adding a wrapper for the
>> Java Kafka IO using Cross Language Transforms, which has often been
>> requested. This will also enable use of the Beam SQL
>> transforms that Java enables.
>>
>> Features
>>
>> The Go SDK implements the Beam core. The Go SDK implements standard
>> coders, allows for user DoFns, and CombineFns and access
>> to core transforms like Flatten, GroupByKey, and features like Side Inputs,
>> Windowing, and User Metrics.
>> Basic windowing will be fully supported for batch even through lifted
>> combines in the 2.32.0 release.
>>
>> All of the above enables Beam Go to be versatile for batch execution on
>> portable runners, and for simple streaming pipelines.
>>
>> Repo Testing
>>
>> On precommit the Go SDK runs all its unit tests. On top of that, it runs
>> all its integration tests against the Python Portable runner,
>> making it quick and robust to detect breaking changes without overspending
>> community resources. Those same tests are also
>> run against Dataflow, Flink, and Spark.
>>
>> The tests are executable against all runners via the appropriate Go commands
>> (if you've stood up your own job management server),
>> or Gradle commands (which will spin up runner instances for you).
>> Documentation for executing tests and adding new ones
>> is on the wiki. [2] They are accessible to Go developers as they're
>> implemented with the standard Go testing tools.
>>
>> Shortcomings
>> That said, there's still much to do. Let me briefly tell you what doesn't
>> work, and it's up to you to weigh whether they block
>> being out of experimental.
>>
>> At present, only a textio has been implemented as a Splittable DoFn.
>> Once the Kafka wrapper is merged in, it will serve as the first example
>> for future contributions for
>> new transform wrappers for the Go SDK.
>> Transforms and IOs are lacking, but at this point users are empowered to
>> write their own DoFns or wrap existing transforms for Cross Language use.
>>
>> In the core SDK, more streaming focused features have yet to be implemented,
>> but they're largely additions to what exists already
>> rather than total rebuilds. Much of the work is defining how a user
>> specifies their desires, and turning those into the appropriate
>> FnAPI requests at execution time. Back in October I wrote at length on the
>> wiki [1] what's missing for additional streaming features.
>>
>> While we have bolstered our testing recently, there's likely still more we
>> could test to improve our confidence in the SDK,
>> in particular regarding the included transforms libraries and examples.
>>
>> Moving Forward
>>
>> My immediate plan is to work on incorporating the Go SDK fully into the Beam
>> Programming Guide. I've audited the guide [3], and
>> am beginning to add missing content and fill in the Go specific gaps.
>> This will be tied to improving the Go Doc with more Go
>> specific user documentation that isn't appropriate for the BPG.
>> And resolving the LICENSE issue around the public display of that GoDoc.
>>
>> If this proposal is accepted by a binding vote, I will incorporate the SDK
>> into the release process, and remove the "experimental"
>> language around the SDK. This largely entails updating the release scripts
>> to also build and publish the Go SDK Docker containers.
>> As for releasing the code, we're technically already doing so whenever we
>> tag a release branch [4].
>>
>> The clearest signal to the Go community however will be migrating the SDK to
>> use Go Modules for dependency version control,
>> which Daniel is planning on working on after his Kafka task. This will put
>> our repo infrastructure, SDK contributors, and users
>> on the same footing when it comes to dependency management.
>> It will remove
>> the "+incompatible" tags one sees on the
>> pkg.go.dev list at [4].
>>
>> I'm very happy to answer any questions you might have about the SDK, and
>> provide additional links as needed. I intentionally avoided
>> a link barrage in this email, as they can distract from the point: The SDK
>> is ready for folks to use it, we need to tell them that they can
>> rather than they shouldn't.
>>
>> Robert Burke
>> De facto Beam Go TL
>>
>> [0] https://s.apache.org/beam-go-sdk-design-rfc
>> [1] https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
>> [2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
>> [3] https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
>> (SDK Audit sheet)
>> [4] https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
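
On the question of generics raised in the thread: the '"real" KV, emitter, and iterator types' mentioned under API Stability could look roughly like the sketch below. This is only an illustration using the standard library, not the Beam Go SDK API. The names KV, Emitter, Iterator, SliceIter, and SumPerKey are all hypothetical, and the code assumes a Go toolchain with type parameters (Go 1.18 or later).

```go
package main

import "fmt"

// KV is a hypothetical statically typed key/value pair (not a Beam type).
type KV[K comparable, V any] struct {
	Key   K
	Value V
}

// Emitter is a hypothetical typed output callback.
type Emitter[T any] func(T)

// Iterator is a hypothetical pull-style iterator: it returns the next
// value, or false once the values are exhausted.
type Iterator[T any] func() (T, bool)

// SliceIter adapts a slice to an Iterator so the sketch runs standalone.
func SliceIter[T any](xs []T) Iterator[T] {
	i := 0
	return func() (T, bool) {
		var zero T
		if i >= len(xs) {
			return zero, false
		}
		x := xs[i]
		i++
		return x, true
	}
}

// SumPerKey drains the values for one key and emits a single typed KV
// holding their total. The compiler checks the element types end to end.
func SumPerKey[K comparable](key K, values Iterator[int], emit Emitter[KV[K, int]]) {
	total := 0
	for v, ok := values(); ok; v, ok = values() {
		total += v
	}
	emit(KV[K, int]{Key: key, Value: total})
}

func main() {
	var out []KV[string, int]
	collect := func(kv KV[string, int]) { out = append(out, kv) }
	SumPerKey[string]("beam", SliceIter([]int{1, 2, 3}), collect)
	fmt.Println(out[0].Key, out[0].Value) // prints "beam 6"
}
```

Typed helpers of this kind can sit next to existing non-generic functions without altering them, which is one way additions like this could stay backwards compatible.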
