The current draft of the exit blog post is https://github.com/apache/beam/pull/15894. Comments are very welcome. I'm going to continue looking for known issues (which will be linked to their respective JIRAs) tomorrow.
Since RC1 is getting cycled, I can also go back to the original plan of publishing alongside v2.33.0, if we'd like to get the post out this week.

On Wed, 3 Nov 2021 at 10:17, Robert Burke <[email protected]> wrote:
> Investigation yielded that there's no way around the prefixed tags. The JIRA has been commented with the explanation.
>
> https://github.com/apache/beam/pull/15881 has the release script updates.
>
> I'm working on the Exit blog post and the updated Go SDK roadmap. The draft PR will be linked here.
>
> Since 2.34.0 is almost out (assuming RC1 verification goes well), I'm inclined to wait for that release to finish before publishing the blog post. I'll link the draft PR here as soon as it's ready.
>
> Once 2.34.0 is released, I'm inclined to also prefix-tag 2.33.0, so there isn't a gap in versions between the unmoduled code and the moduled code.
>
> Once published, that'll be the end of this thread.
>
> Thank you very much, everyone.
>
> Robert Burke
> Beam Go Busybody
>
> On Tue, Oct 26, 2021, 5:36 PM Kyle Weaver <[email protected]> wrote:
>
>> +1 to extra tags. They'll be trivial to add to our release process, and git tags are lightweight by design, so I don't foresee any problems.
>>
>> On Tue, Oct 26, 2021 at 5:27 PM Robert Bradshaw <[email protected]> wrote:
>>
>>> Glad you were able to figure it out. The extra tags are certainly worth it to make this work, if it's what we have to do, and shouldn't be too much of a problem (until, hopefully, it's fixed on the Go side).
>>>
>>> On Tue, Oct 26, 2021 at 4:53 PM Robert Burke <[email protected]> wrote:
>>> >
>>> > With Kyle's help adding the extra tags to the next RC, we have validated that this is currently the correct approach.
>>> >
>>> > https://pkg.go.dev/github.com/apache/beam/sdks/[email protected]/go/pkg/beam?tab=versions
>>> > https://pkg.go.dev/github.com/apache/beam/sdks/[email protected]/go/pkg/beam
>>> >
>>> > Or even:
>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam (links to the latest tagged version)
>>> >
>>> > The main cost of this approach is doubling the number of tags in the tags list (https://github.com/apache/beam/tags), which is not ideal, but overall a small cost. There's no need for a "full publish" of these additional tags, so we won't be doubling our "releases" (see https://github.com/apache/beam/releases).
>>> >
>>> > I'll still be filing a bug against the Go commands, since the mandatory prefixing is unintuitive and seems unnecessary. If prefixing ever becomes unnecessary, we can always delete the tags from the affected branches and cease the behavior going forward. I'll search through the existing Go issues first, however, to see if this has been previously discussed, and report my findings here either way.
>>> >
>>> > This does require 2 small changes to the release guide: the RC tagging script, and the final tagging:
>>> > https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/release/src/main/scripts/choose_rc_commit.sh
>>> > https://github.com/apache/beam/blob/f8660d343fb218cb7acce81ddcc49de0710a0d14/website/www/site/content/en/contribute/release-guide.md#git-tag
>>> >
>>> > I'll make this change later this week (or early next), assuming there are no objections.
>>> >
>>> > Thank you all very much for your patience,
>>> > Robert Burke
>>> > Beam Go Busybody
>>> >
>>> > On 2021/10/26 23:01:00, Robert Burke <[email protected]> wrote:
>>> > > After much reading of the Go Modules documentation, I have confirmed what the issue is.
>>> > >
>>> > > We added the go.mod file to sdks/ under the repo root because it's a cleaner spot for the change: it captures the Java and Python container boot code (written in Go) into the module, and it avoids conflicting interpretations of the vendor directory that lives at the root level.
>>> > >
>>> > > However, we missed that, when doing so, the standard version tags would only apply to modules at the root level, not to modules in subdirectories. See https://golang.org/ref/mod#vcs-version, but quoting the important paragraph:
>>> > >
>>> > > > If a module is defined in a subdirectory within the repository, that is, the module subdirectory portion of the module path is not empty, then each tag name must be prefixed with the module subdirectory, followed by a slash. For example, the module golang.org/x/tools/gopls is defined in the gopls subdirectory of the repository with root path golang.org/x/tools. The version v0.4.0 of that module must have the tag named gopls/v0.4.0 in that repository.
>>> > >
>>> > > Specifically, for the Go SDK to be fetchable at the right version, we need prefixed tags like "sdks/v2.33.0" or "sdks/v2.34.0-RC1".
>>> > >
>>> > > So, the fix for the Go versioning issue is to amend our release process (including generating Release Candidate builds) to also add a prefixed version tag with the same version.
>>> > >
>>> > > I can work with Kyle to validate this for 2.34.0 RC1, and if there are no objections we can backfill the 2.33.0 release branch with such a prefixed tag. At that point I can also write the official Experimental Exit blog post.
>>> > >
>>> > > Thank you all for your patience.
>>> > > Robert Burke
>>> > >
>>> > > On 2021/10/14 00:00:53, Ahmet Altay <[email protected]> wrote:
>>> > > > Thank you for the detailed update! Let us know if we can help.
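The tag-naming rule quoted from the Go modules reference can be illustrated with a small Go sketch. It assumes, as the thread states, that the go.mod for `github.com/apache/beam/sdks/v2` lives in the `sdks/` directory, so the tag prefix is that directory (`sdks`), not the full module path. `moduleTag` and `releaseTags` are hypothetical helpers for illustration only; the real change lives in the release shell scripts linked above.

```go
package main

import (
	"fmt"
	"strings"
)

// moduleTag returns the VCS tag name the go command looks up for a given
// version of a module whose go.mod lives in subdir (relative to the
// repository root). A module at the root (subdir == "") uses the bare version.
func moduleTag(subdir, version string) string {
	subdir = strings.Trim(subdir, "/")
	if subdir == "" {
		return version
	}
	return subdir + "/" + version
}

// releaseTags lists the tags a release would push under the scheme discussed
// in this thread: the existing repository-wide tag plus the module-prefixed
// tag. Hypothetical helper, not part of Beam's tooling.
func releaseTags(subdir, version string) []string {
	return []string{version, moduleTag(subdir, version)}
}

func main() {
	fmt.Println(moduleTag("sdks", "v2.33.0"))        // sdks/v2.33.0
	fmt.Println(moduleTag("gopls", "v0.4.0"))        // gopls/v0.4.0
	fmt.Println(releaseTags("sdks/", "v2.34.0-RC1")) // [v2.34.0-RC1 sdks/v2.34.0-RC1]
}
```

The design choice mirrors the thread's conclusion: keep the existing plain tags for everything else and add the prefixed tag alongside them, rather than replacing one with the other.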
>>> > > >
>>> > > > On Wed, Oct 13, 2021 at 2:42 PM Robert Burke <[email protected]> wrote:
>>> > > >
>>> > > > > This is a status update.
>>> > > > >
>>> > > > > At this point 2.33.0 is released, but there are difficulties with accessing the tagged versions using the standard Go tools. It's currently under investigation.
>>> > > > >
>>> > > > > Using the v2 path in a Go program and then running `go mod tidy` will populate the go.mod file with a pseudo-version rather than the latest tag (v2.33.0). For example, the line looks like:
>>> > > > > require github.com/apache/beam/sdks/v2 v2.0.0-20211013181004-a9120e083008
>>> > > > >
>>> > > > > While this will work, it's not the desired experience for users at this point. The current downside is that, for some reason, the releases are not meaningful targets. However, we retain the other benefits of Go Modules (actual dependency versioning, management by the Go tools).
>>> > > > >
>>> > > > > The issue is some combination of the Go tooling [A], the fact that we added a go.mod file outside of the repo root [B], and the fact that we did not increment the major version (v2 -> v3) when adding the go.mod file [C].
>>> > > > >
>>> > > > > [B] From the Go documentation, this should be legal and fine, even if it's not recommended. This is fortunate, because placing it at the root of the repo would have played poorly with the root vendor directory, which the Go tools have opinions on.
>>> > > > >
>>> > > > > [C] Incrementing the major version is recommended in the Go Modules documentation when transitioning to Go Modules. However, it never said it was required, nor did it indicate this current failure mode. If anything, this should be documented in those docs, if it's not another bug.
>>> > > > > We would not necessarily want to declare a global v3 for Beam at this time for just the Go SDK; it would become confusing rather quickly. Notionally, there are some larger breaking changes the Java and Python SDKs would want to make in such an event, so it's a larger conversation that is out of scope at this time.
>>> > > > >
>>> > > > > This leaves [A], where some misunderstanding of the documented semantics occurred. I certainly expected the tagged version of the non-root Go module to be inherited from the parent, not wholesale ignored. As a result, I'll be filing a bug against the Go tools to determine this, and see what paths forward exist.
>>> > > > >
>>> > > > > It's my hope to resolve this before we write a proper Experimental Exit blog post for the Go SDK.
>>> > > > >
>>> > > > > Thank you for your patience and time.
>>> > > > > Robert Burke
>>> > > > > Beam Go Busybody
>>> > > > >
>>> > > > > On 2021/08/23 18:12:00, Robert Burke <[email protected]> wrote:
>>> > > > > > With 2.32 the LICENSE issue has been fixed [1], and the SDK now uses Go Modules for dependency management, simplifying Go SDK contributions [2].
>>> > > > > >
>>> > > > > > The module file lives in the sdks/ directory, so there's a single Go module for the whole SDK, tests, examples, and any support code for the container boot builds. This excludes the Go modules of the Go SDK code katas [3], which can be updated once 2.33.0 has been released.
>>> > > > > >
>>> > > > > > PR 15365 [4] adds the SDK containers back to the release builds, and by default uses the release-specific container for Docker execution jobs.
>>> > > > > > For at least the 2.33.0 release, this does mean that manual validation will need to explicitly specify RC versions of containers. However, given that the Go SDK container and worker boot process rarely change, this is unlikely to be an issue.
>>> > > > > >
>>> > > > > > At present I'm cleaning up some of the references to experimental, and making it clear that 2.33.0 is the first non-experimental release (even though that's 4-6 weeks out from actual release). CHANGES.md will be updated to note the event, but a larger blog post will happen after the release goes public.
>>> > > > > >
>>> > > > > > Cheers,
>>> > > > > > Robert Burke
>>> > > > > > De facto Beam Go TL.
>>> > > > > >
>>> > > > > > [1] https://pkg.go.dev/github.com/apache/[email protected]+incompatible/sdks/go/pkg/beam
>>> > > > > > [2] https://github.com/apache/beam/pull/15323
>>> > > > > > [3] https://github.com/apache/beam/tree/master/learning/katas/go
>>> > > > > > [4] https://github.com/apache/beam/pull/15365
>>> > > > > >
>>> > > > > > On 2021/06/28 23:12:19, Ahmet Altay <[email protected]> wrote:
>>> > > > > > > +1, congratulations & thank you!
>>> > > > > > >
>>> > > > > > > On Tue, Jun 22, 2021 at 3:15 PM Robert Burke <[email protected]> wrote:
>>> > > > > > >
>>> > > > > > > > Regarding the documentation update: the initial PR is https://github.com/apache/beam/pull/15057, which goes up to section ~4.3.
>>> > > > > > > > JIRA link for Programming Guide changes: https://issues.apache.org/jira/browse/BEAM-12513
>>> > > > > > > >
>>> > > > > > > > On 2021/06/17 14:58:54, Robert Burke <[email protected]> wrote:
>>> > > > > > > > > Yup!
>>> > > > > > > > >
>>> > > > > > > > > My immediate plan is to work on incorporating the Go SDK fully into the Beam Programming Guide. I've audited the guide, and am beginning to add missing content and fill in the Go-specific gaps. This will be tied to improving the GoDoc with more Go-specific user documentation that isn't appropriate for the BPG.
>>> > > > > > > > >
>>> > > > > > > > > My audit of the guide is here:
>>> > > > > > > > > https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
>>> > > > > > > > >
>>> > > > > > > > > The other sheets focus on features and tests. The feature page looks worse than it is, as it was more productive to focus on what isn't available than on what is. That's a snapshot of my actual working sheet, but I'll be updating it as needed.
>>> > > > > > > > >
>>> > > > > > > > > On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <[email protected]> wrote:
>>> > > > > > > > >
>>> > > > > > > > > > Oops, forgot to write one question. Will this come with revamped website instructions/docs for Golang too?
>>> > > > > > > > > >
>>> > > > > > > > > > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <[email protected]> wrote:
>>> > > > > > > > > > >
>>> > > > > > > > > > > Huge +1
>>> > > > > > > > > > >
>>> > > > > > > > > > > This is definitely something many people have asked about, so it is great to see it finally happening.
>>> > > > > > > > > > >
>>> > > > > > > > > > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <[email protected]> wrote:
>>> > > > > > > > > > > > +1 awesome
>>> > > > > > > > > > > >
>>> > > > > > > > > > > > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <[email protected]> wrote:
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> Sounds reasonable to me. I agree. We'll aim to get those (Go modules and the LICENSE issue) done before the 2.32 cut, and certainly before the 2.33 cut if release images aren't added to the 2.32 process.
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> Regarding Go generics: at some point in the future, we may want a harder break between a newer generics-first API and the current version, but there's no rush. Generics/type parameters in Go aren't identical to the feature referred to by that term in Java, C++, Rust, etc., so it'll take a bit of time for that expertise to develop.
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> However, given the current nature of Go, we had to have pretty sophisticated reflective analysis to handle DoFns and map them to their graph inputs. So, adding new helpers like KV, emitter, and iterator types shouldn't be too difficult.
>>> > > > > > > > > > > >> Changing Go SDK internals to use generics (like the implementation of stats DoFns such as Min, Max, etc.) could also be done transparently to most users, and certainly any of the framework for execution-time handling (the "worker's SDK harness") could be cleaned up if need be. Finally, adding more sophisticated DoFn registration and code generation could replace the optional code generator entirely, saving some users a `go generate` step and making improved execution performance simpler to get.
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> Changing things like making a type-parameterized PCollection would be far more involved, as would trying to use some kind of Apply format. The lack of method overrides prevents the apply-chaining approach, or at least prevents it from working simply.
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> Finally, Go generics won't be available until Go 1.18, which isn't until next year. See https://blog.golang.org/generics-proposal for details.
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> Go 1.17 (https://tip.golang.org/doc/go1.17) does include a register-based calling convention, leading to a modest performance improvement across the board.
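To make the "real KV" and generic Min/Max ideas concrete, here is a minimal sketch of what a type-parameterized key/value pair and a Max combiner could look like with Go 1.18+ generics. Every name here (`Ordered`, `KV`, `maxAccum`, `addInput`, `mergeAccums`, `maxPerKey`) is hypothetical and is not the Beam Go SDK API; `maxPerKey` runs over an in-memory slice only to show the accumulate/merge shape a combiner needs.

```go
package main

import "fmt"

// Ordered is a hypothetical constraint for types that support <; Go 1.21+
// ships an equivalent as cmp.Ordered.
type Ordered interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 |
		~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 |
		~float32 | ~float64 | ~string
}

// KV is a hypothetical type-parameterized key/value pair.
type KV[K comparable, V any] struct {
	Key   K
	Value V
}

// maxAccum is the accumulator of a Max combiner; set distinguishes
// "no input yet" from a genuine zero value.
type maxAccum[T Ordered] struct {
	val T
	set bool
}

// addInput folds one element into the accumulator.
func addInput[T Ordered](a maxAccum[T], x T) maxAccum[T] {
	if !a.set || x > a.val {
		return maxAccum[T]{val: x, set: true}
	}
	return a
}

// mergeAccums combines two partial results, as a lifted combine would.
func mergeAccums[T Ordered](a, b maxAccum[T]) maxAccum[T] {
	if !b.set {
		return a
	}
	return addInput(a, b.val)
}

// maxPerKey applies the combiner locally over an in-memory slice; a pipeline
// would instead apply it per key after grouping.
func maxPerKey[K comparable, V Ordered](in []KV[K, V]) map[K]V {
	accs := make(map[K]maxAccum[V])
	for _, kv := range in {
		accs[kv.Key] = addInput(accs[kv.Key], kv.Value)
	}
	out := make(map[K]V, len(accs))
	for k, a := range accs {
		out[k] = a.val
	}
	return out
}

func main() {
	in := []KV[string, int]{{"a", 1}, {"b", 3}, {"a", 5}}
	fmt.Println(maxPerKey(in)) // map[a:5 b:3]
	left := addInput(maxAccum[float64]{}, -2.5)
	fmt.Println(mergeAccums(left, maxAccum[float64]{}).val) // -2.5
}
```

With type parameters, a single definition covers every ordered element type, which is the kind of internal simplification the message expects could be made without changing user-facing APIs.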
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> Cheers,
>>> > > > > > > > > > > >> Robert Burke
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> On 2021/06/15 18:10:46, Robert Bradshaw <[email protected]> wrote:
>>> > > > > > > > > > > >> > +1 to declaring Golang support out of experimental once the Go Modules issues are solved. I don't think an SDK needs to support every feature to be accepted, especially now that we can do cross-language transforms, and Go definitely supports enough to be quite useful. (WRT streaming, my understanding is that Go supports the streaming model with windows and timestamps, and runs fine on a streaming runner, even if more advanced features like state and timers aren't yet available.)
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > This is a great milestone.
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <[email protected]> wrote:
>>> > > > > > > > > > > >> > >
>>> > > > > > > > > > > >> > > WOW! Big news.
>>> > > > > > > > > > > >> > >
>>> > > > > > > > > > > >> > > I'm supportive of leaving experimental status after Go Modules are completed and the LICENSE issue is resolved. I don't think that lacking streaming support is a blocker.
>>> > > > > > > > > > > >> > > The other thing I checked was whether there were metrics available on metrics.beam.apache.org, specifically for measuring code health via post-commit over time, which there are, and the passing test rate is high (Huzzah!). The one thing that surprised me from your summary is that when Go introduces generics, it won't result in any backwards-incompatible changes in Apache Beam. That's great news, but does it mean there will be a need to support both non-generic and generic APIs moving forward? It seems like generics will be introduced in the Go 1.17 release (optimistically) in August this year.
>>> > > > > > > > > > > >> > >
>>> > > > > > > > > > > >> > > On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <[email protected]> wrote:
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> Hello Beam Community!
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> I propose we stop calling the Apache Beam Go SDK experimental.
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> This thread is to discuss it as a community, along with any remaining conditions that would prevent the exit.
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> tl;dr;
>>> > > > > > > > > > > >> > >> Ask questions for answers and links! I have both.
>>> > > > > > > > > > > >> > >> This entails including it officially in the release process, removing the various "experimental" text throughout the repo, etc., and otherwise treating it like Python and Java. There are also some Go-specific tasks around dependency versioning.
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> The Go SDK implements the Beam model efficiently for most batch tasks, including basic windowing.
>>> > > > > > > > > > > >> > >> Apache Beam Go jobs can execute, and are tested, on all portable runners.
>>> > > > > > > > > > > >> > >> The core APIs are not going to change in incompatible ways going forward.
>>> > > > > > > > > > > >> > >> Scalable transforms can be written through SplittableDoFns or via cross-language transforms.
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> The SDK isn't 100% feature complete, but keeping it experimental doesn't help with that any further.
>>> > > > > > > > > > > >> > >> Communities grow through contributions and use, and experimental markers dissuade users.
>>> > > > > > > > > > > >> > >> There's plenty to do in order to expand what can be done with the SDK. (Contributions welcome.)
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> Why Exit Experimental now?
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> Typically, when we call an SDK or API experimental, it's because there's a risk that the API or its behaviors may change significantly.
>>> > > > > > > > > > > >> > >> This, in turn, leads to additional work for users of the SDK on every release, which leads to sticking with older versions or forking to preserve behavior. Version updates should be looked forward to, and viewed as having little risk. Further, while there's been previous discussion about what the "low bar" is for a new SDK, it hasn't been summarily applied to the Go SDK. I feel this has hurt development of, and contribution to, new SDK languages (the inherent difficulty of SDK development notwithstanding).
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> When the SDK was designed, it wasn't entirely clear what the Beam Model should look like in an opinionated language like Go. The initial design (see https://s.apache.org/beam-go-sdk-design-rfc [0]) goes into detail on what it means for a language without generics, overloading, or inheritance to implement the Beam model. One could largely throw away static types (like Python), but this approach rings hollow for Go. It would not do if the approach couldn't grow and scale to the Beam Model. It's also hard to tell if an API is any good before there are users.
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> Further, in the early days of portability, there wasn't a way to write scalable DoFns, dynamically or otherwise. It's an incredible bottleneck to need to do all initial fanout of work on a single machine and write everything to a Reshuffle just in order to scale up. Without being able to scale, Beam is little more than overhead.
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> At this point, both of these needs are met within the Go SDK for open source.
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> Background
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> The Go SDK has been a part of the Beam repo for a few years now, since it was accidentally merged into master. Since then it's been called experimental, and hasn't officially been part of the releases.
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> Of the SDKs, it is the one that was always designed around Beam Portability first. It never had any "legacy" (SDK x Runner specific) workers. It has always used the Beam pipeline protos and the FnAPI to execute jobs, first with some very experimental code on Dataflow, but now on all portable supported runners, like Flink, Spark, the Python Portable runner, and Dataflow.
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> API Stability
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> The Go SDK hasn't meaningfully changed its user API for DoFns and pipeline construction since it was first merged in, and there are no changes on the horizon that can't be made in a backwards-compatible manner. Largely these are related to new features, or to usability improvements enabled by the advent of Go generics (think of "real" KV, emitter, and iterator types).
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> It's an open secret that the Go SDK has largely been developed for use within Google. That use is called FlumeGo: the Apache Beam Go SDK running on top of Flume, Google's batch pipeline processing engine. Thus, most of the focus has been on improving batch execution. FlumeGo sees ample use today, and there hasn't been a call for fundamental changes to the API for ergonomic or usability concerns.
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> Scalability
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> Google could get away without the Go SDK having an SDK-side scalability solution as a result of its integration with Flume. However, those days are now past.
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> The Go SDK now supports SplittableDoFns along with dynamic splitting, which allows writing scalable batch transforms natively in the Go SDK.
>>> > > > > > > > > > > >> > >> The SDK also supports cross-language transforms, with Beam Schema encodings. With these, production-hardened transforms from Java and Python are a wrapper away.
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> Presently, Daniel Oliveira (who implemented the SDF-side work and completed the Xlang work) is adding a wrapper for the Java Kafka IO using cross-language transforms, which has often been requested. This will also enable use of the Beam SQL transforms that Java provides.
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> Features
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> The Go SDK implements the Beam core. It implements the standard coders, allows for user DoFns and CombineFns, and provides access to core transforms like Flatten and GroupByKey, and to features like side inputs, windowing, and user metrics.
>>> > > > > > > > > > > >> > >> Basic windowing will be fully supported for batch, even through lifted combines, in the 2.32.0 release.
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> All of the above makes Beam Go versatile for batch execution on portable runners, and for simple streaming pipelines.
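The SplittableDoFn and dynamic splitting support described above rests on restrictions that a runner can divide up front and split again while they are being processed. The sketch below illustrates that idea with a half-open offset range: an even initial split, and a dynamic split that keeps a fraction of the remaining work in the primary and hands the rest back as a residual. All names are hypothetical and standalone; this is not the Beam Go SDK's restriction or tracker API, and it omits the concurrency and claim bookkeeping a real restriction tracker needs.

```go
package main

import "fmt"

// Restriction is a hypothetical half-open offset range [Start, End).
type Restriction struct {
	Start, End int64
}

// EvenSplits divides r into at most n contiguous, near-equal sub-ranges:
// an initial split a runner could request before processing begins.
func EvenSplits(r Restriction, n int64) []Restriction {
	size := r.End - r.Start
	if n <= 1 || size <= 1 {
		return []Restriction{r}
	}
	if n > size {
		n = size // never produce empty sub-ranges
	}
	out := make([]Restriction, 0, n)
	for i := int64(0); i < n; i++ {
		out = append(out, Restriction{
			Start: r.Start + size*i/n,
			End:   r.Start + size*(i+1)/n,
		})
	}
	return out
}

// TrySplit sketches a dynamic split while r is being processed. pos is the
// first offset not yet claimed (assumed to lie within r); fraction is the
// share of the remaining work [pos, End) the primary keeps, so fraction == 0
// behaves like a checkpoint. ok is false when no residual can be split off.
func TrySplit(r Restriction, pos int64, fraction float64) (primary, residual Restriction, ok bool) {
	remaining := r.End - pos
	if remaining <= 0 || fraction < 0 || fraction >= 1 {
		return r, Restriction{}, false
	}
	splitAt := pos + int64(float64(remaining)*fraction)
	if splitAt >= r.End {
		return r, Restriction{}, false
	}
	return Restriction{r.Start, splitAt}, Restriction{splitAt, r.End}, true
}

func main() {
	fmt.Println(EvenSplits(Restriction{0, 10}, 3)) // [{0 3} {3 6} {6 10}]
	p, res, ok := TrySplit(Restriction{0, 100}, 40, 0.5)
	fmt.Println(p, res, ok) // {0 70} {70 100} true
}
```

Splitting on the remaining work rather than the whole range is what lets a runner rebalance a straggling element mid-flight, which is the "scale without fanning out on one machine" property the proposal emphasizes.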
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> Repo Testing
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> On precommit, the Go SDK runs all its unit tests. On top of that, it runs all its integration tests against the Python Portable runner, which makes detecting breaking changes quick and reliable without overspending community resources. Those same tests are also run against Dataflow, Flink, and Spark.
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> The tests are executable against all runners via the appropriate Go commands (if you've stood up your own job management server), or via Gradle commands (which will spin up runner instances for you). Documentation for executing tests and adding new ones is on the wiki [2]. The tests are accessible to Go developers, as they're implemented with the standard Go testing tools.
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> Shortcomings
>>> > > > > > > > > > > >> > >> That said, there's still much to do. Let me briefly tell you what doesn't work; it's up to you to weigh whether these gaps block leaving experimental status.
>>> > > > > > > > > > > >> > >>
>>> > > > > > > > > > > >> > >> At present, only textio has been implemented as a Splittable DoFn.
>>> > > > > > > > > > > >> > >> Once the Kafka wrapper is merged in, it will serve as the first example for future contributions of new transform wrappers for the Go SDK.

Transforms and IOs are lacking, but at this point users are empowered to write their own DoFns or wrap existing transforms for Cross Language use.

In the core SDK, more streaming-focused features have yet to be implemented, but they're largely additions to what already exists rather than total rebuilds. Much of the work is defining how a user specifies their desires, and turning those into the appropriate FnAPI requests at execution time. Back in October, I wrote at length on the wiki [1] about what's missing for additional streaming features.

While we have bolstered our testing recently, there's likely still more we could test to improve our confidence in the SDK, in particular regarding the included transform libraries and examples.

Moving Forward

My immediate plan is to work on fully incorporating the Go SDK into the Beam Programming Guide. I've audited the guide [3], and am beginning to add missing content and fill in the Go-specific gaps.
This will be tied to improving the GoDoc with more Go-specific user documentation that isn't appropriate for the BPG, and to resolving the LICENSE issue around the public display of that GoDoc.

If this proposal is accepted by a binding vote, I will incorporate the SDK into the release process, and remove the "experimental" language around the SDK. This largely entails updating the release scripts to also build and publish the Go SDK Docker containers. As for releasing the code, we're technically already doing so whenever we tag a release branch [4].

The clearest signal to the Go community, however, will be migrating the SDK to use Go Modules for dependency version control, which Daniel is planning to work on after his Kafka task. This will put our repo infrastructure, SDK contributors, and users on the same footing when it comes to dependency management. It will remove the "+incompatible" tags one sees on the pkg.go.dev list at [4].

I'm very happy to answer any questions you might have about the SDK, and provide additional links as needed.
I intentionally avoided a link barrage in this email, as links can distract from the point: the SDK is ready for folks to use, and we need to tell them that they can rather than that they shouldn't.

Robert Burke
De facto Beam Go TL

[0] https://s.apache.org/beam-go-sdk-design-rfc
[1] https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
[2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
[3] https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090 (SDK Audit sheet)
[4] https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
