Sounds reasonable to me. I agree. We'll aim to get those (Go modules and 
LICENSE issue) done before the 2.32 cut, and certainly before the 2.33 cut if 
release images aren't added to the 2.32 process.

Regarding Go Generics: at some point in the future, we may want a harder break 
between a newer Generics-first API and the current version, but there's no 
rush. Generics/Type Parameters in Go aren't identical to the feature referred to 
by that term in Java, C++, Rust, etc., so it'll take a bit of time for that 
expertise to develop.

However, by the current nature of Go, we had to build fairly sophisticated 
reflective analysis to handle DoFns and map them to their graph inputs. So 
adding new helpers like KV, emitter, and iterator types shouldn't be too 
difficult. Changing Go SDK internals to use generics (like the implementation 
of Stats DoFns such as Min, Max, etc.) could also be done transparently to 
most users, and certainly any of the framework for execution-time handling 
(the "worker's SDK harness") could be cleaned up if need be. Finally, more 
sophisticated DoFn registration and code generation could replace the optional 
code generator entirely, saving some users a `go generate` step and making 
improved execution performance simpler to get.
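
To make the helper idea concrete, here is a minimal sketch, assuming the 
type-parameter syntax from the generics proposal (Go 1.18). The names KV, 
Emitter, Iter, sliceIter, and sumPerKey are hypothetical and are not part of 
the Go SDK; the slice-backed iterator only stands in for whatever a runner 
would actually supply.

```go
package main

import "fmt"

// KV is a hypothetical typed key-value pair (not the SDK's API).
type KV[K comparable, V any] struct {
	Key   K
	Value V
}

// Emitter is a hypothetical typed output function.
type Emitter[T any] func(T)

// Iter is a hypothetical typed iterator: it returns the next value and
// true, or the zero value and false once the values are exhausted.
type Iter[T any] func() (T, bool)

// sliceIter adapts a slice to an Iter, standing in for runner-provided data.
func sliceIter[T any](xs []T) Iter[T] {
	i := 0
	return func() (T, bool) {
		var zero T
		if i >= len(xs) {
			return zero, false
		}
		v := xs[i]
		i++
		return v, true
	}
}

// sumPerKey has the shape of a DoFn that follows a GroupByKey: it reads
// every value for one key and emits a single typed KV.
func sumPerKey(key string, values Iter[int], emit Emitter[KV[string, int]]) {
	total := 0
	for v, ok := values(); ok; v, ok = values() {
		total += v
	}
	emit(KV[string, int]{Key: key, Value: total})
}

func main() {
	var out []KV[string, int]
	collect := func(kv KV[string, int]) { out = append(out, kv) }
	sumPerKey("a", sliceIter([]int{1, 2, 3}), collect)
	fmt.Println(out) // prints [{a 6}]
}
```

The point of the sketch is only that element types become visible in the 
function signature, rather than being recovered by reflection.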

Changes like a type-parameterized PCollection would be far more involved, as 
would trying to use some kind of Apply format. The lack of Method Overrides 
prevents the apply chaining approach, or at least prevents it from working 
simply.

Finally, Go Generics won't be available until Go 1.18, which isn't until next 
year. See https://blog.golang.org/generics-proposal for details. 

Go 1.17 https://tip.golang.org/doc/go1.17 does include a register-based 
calling convention, leading to a modest performance improvement across the board.

Cheers,
Robert Burke

On 2021/06/15 18:10:46, Robert Bradshaw <[email protected]> wrote: 
> +1 to declaring Golang support out of experimental once the Go Modules
> issues are solved. I don't think an SDK needs to support every feature
> to be accepted, especially now that we can do cross-language
> transforms, and Go definitely supports enough to be quite useful. (WRT
> streaming, my understanding is that Go supports the streaming model
> with windows and timestamps, and runs fine on a streaming runner, even
> if more advanced features like state and timers aren't yet available.)
> 
> This is a great milestone.
> 
> On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <[email protected]> wrote:
> >
> > WOW! Big news.
> >
> > I'm supportive of leaving experimental status after Go Modules are 
> > completed and the LICENSE issue is resolved. I don't think that lacking 
> > streaming support is a blocker. The other thing I checked to see was if 
> > there were metrics available on metrics.beam.apache.org, specifically for 
> > measuring code health via post-commit over time, which there are and the 
> > passing test rate is high (Huzzah!). The one thing that surprised me from 
> > your summary is that when Go introduces generics it won't result in any 
> > backwards incompatible changes in Apache Beam. That's great news, but does 
> > it mean there will be a need to support both non-generic and generic APIs 
> > moving forward? It seems like generics will be introduced in the Go 1.17 
> > release (optimistically) in August this year.
> >
> >
> >
> > On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <[email protected]> wrote:
> >>
> >> Hello Beam Community!
> >>
> >> I propose we stop calling the Apache Beam Go SDK experimental.
> >>
> >> This thread is to discuss it as a community, and any conditions that 
> >> remain that would prevent the exit.
> >>
> >> tl;dr;
> >> Ask Questions for answers and links! I have both.
> >> This entails including it officially in the Release process, removing the 
> >> various "experimental" text throughout the repo etc,
> >> and otherwise treating it like Python and Java. Some Go specific tasks 
> >> around dep versioning.
> >>
> >> The Go SDK implements the beam model efficiently for most batch tasks, 
> >> including basic windowing.
> >> Apache Beam Go jobs can execute, and are tested on all Portable runners.
> >> The core APIs are not going to change in incompatible ways going forward.
> >> Scalable transforms can be written through SplittableDoFns or via Cross 
> >> Language transforms.
> >>
> >> The SDK isn't 100% feature complete, but keeping it experimental doesn't 
> >> help with that any further.
> >> Communities grow through contributions and use, and experimental markers 
> >> dissuade users.
> >> There's plenty to do in order to expand what can be done with the SDK. 
> >> (Contributions welcome)
> >>
> >> Why Exit Experimental now?
> >>
> >> Typically when we call an SDK or API Experimental, it's because there's a 
> >> risk that API or behaviors may change significantly.
> >> This, in turn, leads to additional work for users of the SDK on every 
> >> release, which leads to sticking to older versions or forking
> >> to preserve behavior. Version updates should be looked forward to, and 
> >> viewed as having little risk. Further, while there's been
> >> previous discussion about what the "low bar" is for a new SDK, it hasn't 
> >> been summarily applied to the Go SDK. I feel this has
> >> hurt development and contribution of new SDK languages (inherent 
> >> difficulty of SDK development notwithstanding).
> >>
> >> When the SDK was designed, it wasn't entirely clear what the Beam Model 
> >> should look like in an opinionated language like Go.
> >> Their initial take (see https://s.apache.org/beam-go-sdk-design-rfc [0]) 
> >> goes into detail what it means for a language without
> >> Generics, or overloading, or inheritance to implement the beam model. One 
> >> could largely throw away static types (like Python),
> >> but this approach rings hollow for Go. It would not do if the approach 
> >> couldn't grow and scale to the Beam Model. It's also hard
> >> to tell if an API is any good before there are users.
> >>
> >> Further, in the early days of Portability, there wasn't a way to write 
> >> scalable DoFns, dynamically or otherwise. It's an incredible
> >> bottleneck to need to do all initial fanout of work on a single machine, 
> >> write everything to a Reshuffle, just in order to scale up.
> >> Without being able to scale, Beam is little more than overhead.
> >>
> >> At this point, both of these needs are met within the Go SDK for open 
> >> source.
> >>
> >> Background
> >>
> >> The Go SDK has been a part of the beam repo for a few years now, since it 
> >> was accidentally merged into master.
> >> Since then it's been called experimental, and not officially part of the 
> >> releases.
> >>
> >> Of the SDKs, it was always designed around Beam Portability first. It 
> >> never had any "Legacy" (SDK x Runner specific ) workers.
> >> It's always used the Beam Pipeline protos and FnAPI to execute jobs, first 
> >> with some very experimental code on Dataflow, but now
> >> on all portable supported runners, like Flink, Spark, the Python Portable 
> >> runner, and Dataflow.
> >>
> >> API Stability
> >>
> >> The Go SDK hasn't meaningfully changed its user API for DoFn and pipeline 
> >> construction since it was first merged in, and there are no
> >> changes to that on the horizon that can't be made in a backwards 
> >> compatible manner. Largely these are related to New Features, or
> >> usability improvements enabled by the advent of Go Generics (think of 
> >> "real" KV, emitter, and iterator types).
> >>
> >> It's an open secret that the Go SDK has largely been under work for use 
> >> within Google. Its use there is called FlumeGo, representing
> >> the Apache Beam Go SDK, running on top of Flume, Google's batch pipeline 
> >> processing engine. Thus most of the focus has been on improving
> >> batch execution. FlumeGo sees ample use today, and there hasn't been a 
> >> call for fundamental changes to the API for ergonomic or
> >> usability concerns.
> >>
> >> Scalability
> >>
> >> Google could get away without the Go SDK having an SDK side scalability 
> >> solution as a result of its integration with Flume.
> >> However, those days are now past.
> >>
> >> The Go SDK now supports SplittableDoFns along with Dynamic Splitting, 
> >> which supports writing scalable batch transforms natively
> >> in the Go SDK.
> >> The SDK also supports Cross Language Transforms, with Beam Schema 
> >> encodings. With it, production hardened transforms
> >> from Java and Python are a wrapper away.
> >>
> >> Presently, Daniel Oliveira (who implemented the SDF side work, and 
> >> completed the Xlang work) is adding a wrapper for the
> >> Java Kafka IO using Cross Language Transforms, which has often been 
> >> requested. This will also enable use of the Beam SQL
> >> transforms that Java enables.
> >>
> >> Features
> >>
> >> The Go SDK implements the Beam core. The Go SDK implements standard 
> >> coders, allows for user DoFns, and CombineFns and access
> >> to core transforms like Flatten, GroupByKey, and features like Side 
> >> Inputs, Windowing, and User Metrics.
> >> Basic windowing will be fully supported for batch even through lifted 
> >> combines in the 2.32.0 release.
> >>
> >> All of the above enables Beam Go to be versatile for batch execution on 
> >> portable runners, and for simple streaming pipelines.
> >>
> >> Repo Testing
> >>
> >> On precommit the Go SDK runs all its unit tests. On top of that, it runs 
> >> all its integration tests against the Python Portable runner,
> >> making it quick and robust to detect breaking changes without overspending 
> >> community resources. Those same tests are also
> >> run against Dataflow, Flink, and Spark.
> >>
> >> The tests are executable against all runners via the appropriate Go 
> >> commands (if you've stood up your own job management server),
> >> or Gradle commands (which will spin up runner instances for you). 
> >> Documentation for executing tests and adding new ones
> >> is on the wiki. [2] They are accessible to Go developers as they're 
> >> implemented with the standard Go testing tools.
> >>
> >> Shortcomings
> >> That said, there's still much to do. Let me briefly tell you what doesn't 
> >> work, and it's up to you to weigh whether they block
> >> being out of experimental.
> >>
> >> At present, only a textio has been implemented as Splittable DoFn.
> >> Once the Kafka wrapper is merged in, it will serve as the first example 
> >> for future contributions for
> >> new transform wrappers for the Go SDK.
> >> Transforms and IOs are lacking, but at this point users are empowered to 
> >> write their own DoFns or wrap existing transforms for Cross Language use.
> >>
> >> In the core SDK, more streaming focused features have yet to be 
> >> implemented, but they're largely additions to what exists already
> >> rather than total rebuilds. Much of the work is defining how a user 
> >> specifies their desires, and turning those into the appropriate
> >> FnAPI requests at execution time. Back in October I wrote at length on the 
> >> wiki [1] what's missing for additional streaming features.
> >>
> >> While we have bolstered our testing recently, there's likely still more we 
> >> could test to improve our confidence in the SDK,
> >> in particular regarding the included transforms libraries and examples.
> >>
> >> Moving Forward
> >>
> >> My immediate plan is to work on incorporating the Go SDK fully into the 
> >> Beam Programming Guide. I've audited the guide [3], and
> >> am beginning to add missing content and filling in the Go specific gaps. 
> >> This will be tied to improving the Go Doc with more Go
> >> specific user documentation that isn't appropriate for the BPG.
> >> And resolving the LICENSE issue around the public display of that GoDoc.
> >>
> >> If this proposal is accepted by a binding vote, I will incorporate the SDK 
> >> into the release process, and remove the "experimental"
> >> language around the SDK. This largely entails updating the release scripts 
> >> to also build and publish the Go SDK Docker containers.
> >> As for releasing the code, we're technically already doing so whenever we 
> >> tag a release branch [4].
> >>
> >> The clearest signal to the Go community however will be migrating the SDK 
> >> to use Go Modules for dependency version control,
> >> which Daniel is planning on working on after his Kafka task. This will put 
> >> our repo infrastructure, SDK contributors, and users
> >> on the same footing when it comes to dependency management. It will remove 
> >> the "+incompatible" tags one sees on the
> >> pkg.go.dev list at [4].
> >>
> >> I'm very happy to answer any questions you might have about the SDK, and 
> >> provide additional links as needed. I intentionally avoided
> >> a link barrage in this email, as they can distract from the point: The SDK 
> >> is ready for folks to use it, we need to tell them that they can
> >> rather than they shouldn't.
> >>
> >> Robert Burke
> >> De facto Beam Go TL
> >>
> >> [0] https://s.apache.org/beam-go-sdk-design-rfc
> >> [1] 
> >> https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
> >> [2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
> >> [3] 
> >> https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
> >>  (SDK Audit sheet)
> >> [4] https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
> 
