Oops, I forgot to ask one question: will this come with revamped website instructions/docs for Go too?
On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía wrote:
>
> Huge +1
>
> This is definitely something many people have asked about, so it is
> great to see it finally happening.
>
> On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles wrote:
> >
> > +1 awesome
> >
> > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke wrote:
> >>
> >> Sounds reasonable to me. I agree. We'll aim to get those (Go modules and
> >> LICENSE issue) done before the 2.32 cut, and certainly before the 2.33 cut
> >> if release images aren't added to the 2.32 process.
> >>
> >> Regarding Go Generics: at some point in the future, we may want a harder
> >> break between a newer Generic-first API and the current version, but
> >> there's no rush. Generics/TypeParameters in Go aren't identical to the
> >> feature referred to by that term in Java, C++, Rust, etc., so it'll take a
> >> bit of time for that expertise to develop.
> >>
> >> However, by the current nature of Go, we had to have pretty sophisticated
> >> reflective analysis to handle DoFns and map them to their graph inputs.
> >> So, adding new helpers like KV, emitter, and iterator types shouldn't
> >> be too difficult. Changing Go SDK internals to use generics (like the
> >> implementation of Stats DoFns like Min, Max, etc.) could also be
> >> made transparently to most users, and certainly any of the framework for
> >> execution time handling (the "worker's SDK harness") could be
> >> cleaned up if need be. Finally, adding more sophisticated DoFn
> >> registration and code generation could replace the optional
> >> code generator entirely, saving some users a `go generate` step and
> >> simplifying getting improved execution performance.
> >>
> >> Changing things like making a Type Parameterized PCollection would be far
> >> more involved, as would trying to use some kind of Apply format.
> >> The lack of Method Overrides prevents the apply chaining approach. Or at
> >> least prevents it from working simply.
> >>
> >> Finally, Go Generics won't be available until Go 1.18, which isn't until
> >> next year. See https://blog.golang.org/generics-proposal for details.
> >>
> >> Go 1.17 https://tip.golang.org/doc/go1.17 does include a Register calling
> >> convention, leading to a modest performance improvement across the board.
> >>
> >> Cheers,
> >> Robert Burke
> >>
> >> On 2021/06/15 18:10:46, Robert Bradshaw wrote:
> >> > +1 to declaring Golang support out of experimental once the Go Modules
> >> > issues are solved. I don't think an SDK needs to support every feature
> >> > to be accepted, especially now that we can do cross-language
> >> > transforms, and Go definitely supports enough to be quite useful. (WRT
> >> > streaming, my understanding is that Go supports the streaming model
> >> > with windows and timestamps, and runs fine on a streaming runner, even
> >> > if more advanced features like state and timers aren't yet available.)
> >> >
> >> > This is a great milestone.
> >> >
> >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton wrote:
> >> > >
> >> > > WOW! Big news.
> >> > >
> >> > > I'm supportive of leaving experimental status after Go Modules are
> >> > > completed and the LICENSE issue is resolved. I don't think that
> >> > > lacking streaming support is a blocker. The other thing I checked to
> >> > > see was if there were metrics available on metrics.beam.apache.org,
> >> > > specifically for measuring code health via post-commit over time,
> >> > > which there are and the passing test rate is high (Huzzah!). The one
> >> > > thing that surprised me from your summary is that when Go introduces
> >> > > generics it won't result in any backwards incompatible changes in
> >> > > Apache Beam.
> >> > > That's great news, but does it mean there will be a need
> >> > > to support both non-generic and generic APIs moving forward? It seems
> >> > > like generics will be introduced in the Go 1.17 release
> >> > > (optimistically) in August this year.
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Jun 10, 2021 at 5:04 PM Robert Burke wrote:
> >> > >>
> >> > >> Hello Beam Community!
> >> > >>
> >> > >> I propose we stop calling the Apache Beam Go SDK experimental.
> >> > >>
> >> > >> This thread is to discuss it as a community, and any conditions that
> >> > >> remain that would prevent the exit.
> >> > >>
> >> > >> tl;dr;
> >> > >> Ask Questions for answers and links! I have both.
> >> > >> This entails including it officially in the Release process, removing
> >> > >> the various "experimental" text throughout the repo etc.,
> >> > >> and otherwise treating it like Python and Java. Some Go specific
> >> > >> tasks around dep versioning.
> >> > >>
> >> > >> The Go SDK implements the Beam model efficiently for most batch
> >> > >> tasks, including basic windowing.
> >> > >> Apache Beam Go jobs can execute, and are tested on all Portable
> >> > >> runners.
> >> > >> The core APIs are not going to change in incompatible ways going
> >> > >> forward.
> >> > >> Scalable transforms can be written through SplittableDoFns or via
> >> > >> Cross Language transforms.
> >> > >>
> >> > >> The SDK isn't 100% feature complete, but keeping it experimental
> >> > >> doesn't help with that any further.
> >> > >> Communities grow through contributions and use, and experimental
> >> > >> markers dissuade users.
> >> > >> There's plenty to do in order to expand what can be done with the SDK.
> >> > >> (Contributions welcome)
> >> > >>
> >> > >> Why Exit Experimental now?
> >> > >>
> >> > >> Typically when we call an SDK or API Experimental, it's because
> >> > >> there's a risk that API or behaviors may change significantly.
> >> > >> This, in turn, leads to additional work for users of the SDK on every
> >> > >> release, which leads to sticking to older versions or forking
> >> > >> to preserve behavior. Version updates should be looked forward to,
> >> > >> and viewed as having little risk. Further, while there's been
> >> > >> previous discussion about what the "low bar" is for a new SDK, it
> >> > >> hasn't been summarily applied to the Go SDK. I feel this has
> >> > >> hurt development and contribution of new SDK languages (inherent
> >> > >> difficulty of SDK development notwithstanding).
> >> > >>
> >> > >> When the SDK was designed, it wasn't entirely clear what the Beam
> >> > >> Model should look like in an opinionated language like Go.
> >> > >> Their initial take (see https://s.apache.org/beam-go-sdk-design-rfc
> >> > >> [0]) goes into detail on what it means for a language without
> >> > >> Generics, or overloading, or inheritance to implement the Beam model.
> >> > >> One could largely throw away static types (like Python),
> >> > >> but this approach rings hollow for Go. It would not do if the
> >> > >> approach couldn't grow and scale to the Beam Model. It's also hard
> >> > >> to tell if an API is any good before there are users.
> >> > >>
> >> > >> Further, in the early days of Portability, there wasn't a way to
> >> > >> write scalable DoFns, dynamically or otherwise. It's an incredible
> >> > >> bottleneck to need to do all initial fanout of work on a single
> >> > >> machine, write everything to a Reshuffle, just in order to scale up.
> >> > >> Without being able to scale, Beam is little more than overhead.
> >> > >>
> >> > >> At this point, both of these needs are met within the Go SDK for open
> >> > >> source.
> >> > >>
> >> > >> Background
> >> > >>
> >> > >> The Go SDK has been a part of the Beam repo for a few years now,
> >> > >> since it was accidentally merged into master.
> >> > >> Since then it's been called experimental, and not officially part of
> >> > >> the releases.
> >> > >>
> >> > >> Of the SDKs, it was always designed around Beam Portability first.
> >> > >> It never had any "Legacy" (SDK x Runner specific) workers.
> >> > >> It's always used the Beam Pipeline protos and FnAPI to execute jobs,
> >> > >> first with some very experimental code on Dataflow, but now
> >> > >> on all portable supported runners, like Flink, Spark, the Python
> >> > >> Portable runner, and Dataflow.
> >> > >>
> >> > >> API Stability
> >> > >>
> >> > >> The Go SDK hasn't meaningfully changed its user API for DoFn and
> >> > >> pipeline construction since it was first merged in, and there are no
> >> > >> changes to that on the horizon that can't be made in a backwards
> >> > >> compatible manner. Largely these are related to New Features, or
> >> > >> usability improvements enabled by the advent of Go Generics (think of
> >> > >> "real" KV, emitter, and iterator types).
> >> > >>
> >> > >> It's an open secret that the Go SDK has largely been under work for
> >> > >> use within Google. Its use is called FlumeGo, representing
> >> > >> the Apache Beam Go SDK, running on top of Flume, Google's batch
> >> > >> pipeline processing engine. Thus most of the focus has been on improving
> >> > >> batch execution. FlumeGo sees ample use today, and there hasn't been
> >> > >> a call for fundamental changes to the API for ergonomic or
> >> > >> usability concerns.
> >> > >>
> >> > >> Scalability
> >> > >>
> >> > >> Google could get away without the Go SDK having an SDK side
> >> > >> scalability solution as a result of its integration with Flume.
> >> > >> However, those days are now past.
> >> > >>
> >> > >> The Go SDK now supports SplittableDoFns along with Dynamic Splitting,
> >> > >> which supports writing scalable batch transforms natively
> >> > >> in the Go SDK.
> >> > >> The SDK also supports Cross Language Transforms, with Beam Schema
> >> > >> encodings. With it, production hardened transforms
> >> > >> from Java and Python are a wrapper away.
> >> > >>
> >> > >> Presently, Daniel Oliveira (who implemented the SDF side work, and
> >> > >> completed the Xlang work) is adding a wrapper for the
> >> > >> Java Kafka IO using Cross Language Transforms, which has often been
> >> > >> requested. This will also enable use of the Beam SQL
> >> > >> transforms that Java enables.
> >> > >>
> >> > >> Features
> >> > >>
> >> > >> The Go SDK implements the Beam core. The Go SDK implements standard
> >> > >> coders, allows for user DoFns and CombineFns, and gives access
> >> > >> to core transforms like Flatten, GroupByKey, and features like Side
> >> > >> Inputs, Windowing, and User Metrics.
> >> > >> Basic windowing will be fully supported for batch even through lifted
> >> > >> combines in the 2.32.0 release.
> >> > >>
> >> > >> All of the above enables Beam Go to be versatile for batch execution
> >> > >> on portable runners, and for simple streaming pipelines.
> >> > >>
> >> > >> Repo Testing
> >> > >>
> >> > >> On precommit the Go SDK runs all its unit tests. On top of that, it
> >> > >> runs all its integration tests against the Python Portable runner,
> >> > >> making it quick and robust to detect breaking changes without
> >> > >> overspending community resources. Those same tests are also
> >> > >> run against Dataflow, Flink, and Spark.
> >> > >>
> >> > >> The tests are executable against all runners via the appropriate Go
> >> > >> commands (if you've stood up your own job management server),
> >> > >> or Gradle commands (which will spin up runner instances for you).
> >> > >> Documentation for executing tests and adding new ones
> >> > >> is on the wiki. [2] They are accessible to Go developers as they're
> >> > >> implemented with the standard Go testing tools.
> >> > >>
> >> > >> Shortcomings
> >> > >> That said, there's still much to do. Let me briefly tell you what
> >> > >> doesn't work, and it's up to you to weigh whether they block
> >> > >> being out of experimental.
> >> > >>
> >> > >> At present, only a textio has been implemented as a Splittable DoFn.
> >> > >> Once the Kafka wrapper is merged in, it will serve as the first
> >> > >> example for future contributions of
> >> > >> new transform wrappers for the Go SDK.
> >> > >> Transforms and IOs are lacking, but at this point users are empowered
> >> > >> to write their own DoFns or wrap existing transforms for Cross
> >> > >> Language use.
> >> > >>
> >> > >> In the core SDK, more streaming focused features have yet to be
> >> > >> implemented, but they're largely additions to what exists already
> >> > >> rather than total rebuilds. Much of the work is defining how a user
> >> > >> specifies their desires, and turning those into the appropriate
> >> > >> FnAPI requests at execution time. Back in October I wrote at length
> >> > >> on the wiki [1] what's missing for additional streaming features.
> >> > >>
> >> > >> While we have bolstered our testing recently, there's likely still
> >> > >> more we could test to improve our confidence in the SDK,
> >> > >> in particular regarding the included transforms libraries and
> >> > >> examples.
> >> > >>
> >> > >> Moving Forward
> >> > >>
> >> > >> My immediate plan is to work on incorporating the Go SDK fully into
> >> > >> the Beam Programming Guide. I've audited the guide [3], and
> >> > >> am beginning to add missing content and fill in the Go specific
> >> > >> gaps. This will be tied to improving the Go Doc with more Go
> >> > >> specific user documentation that isn't appropriate for the BPG,
> >> > >> and to resolving the LICENSE issue around the public display of that
> >> > >> GoDoc.
> >> > >>
> >> > >> If this proposal is accepted by a binding vote, I will incorporate
> >> > >> the SDK into the release process, and remove the "experimental"
> >> > >> language around the SDK. This largely entails updating the release
> >> > >> scripts to also build and publish the Go SDK Docker containers.
> >> > >> As for releasing the code, we're technically already doing so
> >> > >> whenever we tag a release branch [4].
> >> > >>
> >> > >> The clearest signal to the Go community, however, will be migrating the
> >> > >> SDK to use Go Modules for dependency version control,
> >> > >> which Daniel is planning on working on after his Kafka task. This
> >> > >> will put our repo infrastructure, SDK contributors, and users
> >> > >> on the same footing when it comes to dependency management. It will
> >> > >> remove the "+incompatible" tags one sees on the
> >> > >> pkg.go.dev list at [4].
> >> > >>
> >> > >> I'm very happy to answer any questions you might have about the SDK,
> >> > >> and provide additional links as needed. I intentionally avoided
> >> > >> a link barrage in this email, as they can distract from the point:
> >> > >> The SDK is ready for folks to use it, we need to tell them that they
> >> > >> can
> >> > >> rather than they shouldn't.
> >> > >>
> >> > >> Robert Burke
> >> > >> De facto Beam Go TL
> >> > >>
> >> > >> [0] https://s.apache.org/beam-go-sdk-design-rfc
> >> > >> [1]
> >> > >> https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
> >> > >> [2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
> >> > >> [3]
> >> > >> https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
> >> > >> (SDK Audit sheet)
> >> > >> [4]
> >> > >> https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
> >> >
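For anyone following the generics part of the thread, here is a rough sketch of what "real" KV types and a generic Min might look like once type parameters land in Go 1.18. This is plain Go, not Beam code: the names `Ordered`, `KV`, `Min`, and `MinPerKey` are hypothetical and chosen only for illustration, and nothing here reflects the SDK's actual or planned API.

```go
package main

import "fmt"

// Ordered is a hypothetical constraint listing a few types that support <.
// It is defined locally so the sketch has no dependencies.
type Ordered interface {
	~int | ~int64 | ~float64 | ~string
}

// KV is a hypothetical statically typed key/value pair, the kind of helper
// the thread calls a "real" KV type.
type KV[K, V any] struct {
	Key   K
	Value V
}

// Min returns the smaller of two ordered values; one generic function
// covers every type admitted by Ordered.
func Min[T Ordered](a, b T) T {
	if b < a {
		return b
	}
	return a
}

// MinPerKey folds a slice of KVs into the minimum value seen for each key.
// It runs in memory and only illustrates the typing, not Beam's execution.
func MinPerKey[K comparable, V Ordered](kvs []KV[K, V]) map[K]V {
	out := make(map[K]V)
	for _, kv := range kvs {
		if cur, ok := out[kv.Key]; !ok || kv.Value < cur {
			out[kv.Key] = kv.Value
		}
	}
	return out
}

func main() {
	kvs := []KV[string, int]{{"a", 3}, {"b", 7}, {"a", 1}}
	fmt.Println(Min(2.5, 1.5))
	fmt.Println(MinPerKey(kvs))
}
```

The point of the sketch is that key and value types are checked at compile time, where the current approach relies on reflective analysis of function signatures.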
