Regarding documentation update: Initial PR is 
https://github.com/apache/beam/pull/15057 which goes up to section ~4.3. JIRA 
link for Programming Guide changes: 
https://issues.apache.org/jira/browse/BEAM-12513


On 2021/06/17 14:58:54, Robert Burke <[email protected]> wrote: 
> Yup!
> 
> My immediate plan is to work on incorporating the Go SDK fully into the
> Beam Programming Guide. I've audited the guide, and
> am beginning to add missing content and fill in the Go-specific gaps.
> This will be tied to improving the Go Doc with more Go
> specific user documentation that isn't appropriate for the BPG.
> 
> My audit of the guide is here:
> https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
> 
> The other sheets focus on features and tests. The feature page looks worse
> than it is, as it was more productive to focus on what isn't available than
> what is. That's a snapshot of my actual working sheet but I'll be updating
> it as needed.
> 
> On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <[email protected]> wrote:
> 
> > Oops, I forgot to ask one question. Will this come with revamped
> > website instructions/doc for golang too?
> >
> > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <[email protected]> wrote:
> > >
> > > Huge +1
> > >
> > > This is definitely something many people have asked about, so it is
> > > great to see it finally happening.
> > >
> > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <[email protected]> wrote:
> > > >
> > > > +1 awesome
> > > >
> > > > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <[email protected]>
> > wrote:
> > > >>
> > > >> Sounds reasonable to me. I agree. We'll aim to get those (Go modules
> > and LICENSE issue) done before the 2.32 cut, and certainly before the 2.33
> > cut if release images aren't added to the 2.32 process.
> > > >>
> > > >> Regarding Go Generics: at some point in the future, we may want a
> > harder break between a newer generics-first API and the current version,
> > but there's no rush. Generics/TypeParameters in Go aren't identical to the
> > feature referred to by that term in Java, C++, Rust, etc, so it'll take a
> > bit of time for that expertise to develop.
> > > >>
> > > >> However, by the current nature of Go, we had to have pretty
> > sophisticated reflective analysis to handle DoFns and map them to their
> > graph inputs. So, adding new helper types like KV, emitter, and iterator
> > shouldn't be too difficult. Changes to Go SDK internals to use
> > generics (such as the implementation of Stats DoFns like Min, Max, etc.) could
> > also be made transparently to most users, and certainly any of
> > the framework for execution time handling (the "worker's SDK harness")
> > would be able to be cleaned up if need be. Finally, adding more
> > sophisticated DoFn registration and code generation would be able to
> > replace the optional code generator entirely, saving some users a `go
> > generate` step, simplifying getting improved execution performance.
> > > >>
> > > >> Changing things like making a Type Parameterized PCollection, would
> > be far more involved, as would trying to use some kind of Apply format. The
> > lack of Method Overrides prevents the apply chaining approach. Or at least
> > prevents it from working simply.
> > > >>
> > > >> Finally, Go Generics won't be available until Go 1.18, which isn't
> > until next year. See https://blog.golang.org/generics-proposal for
> > details.
> > > >>
> > > > >> Go 1.17 https://tip.golang.org/doc/go1.17 does include a register-based
> > calling convention, leading to a modest performance improvement across the
> > board.
> > > >>
> > > >> Cheers,
> > > >> Robert Burke
> > > >>
> > > >> On 2021/06/15 18:10:46, Robert Bradshaw <[email protected]> wrote:
> > > >> > +1 to declaring Golang support out of experimental once the Go
> > Modules
> > > >> > issues are solved. I don't think an SDK needs to support every
> > feature
> > > >> > to be accepted, especially now that we can do cross-language
> > > >> > transforms, and Go definitely supports enough to be quite useful.
> > (WRT
> > > >> > streaming, my understanding is that Go supports the streaming model
> > > >> > with windows and timestamps, and runs fine on a streaming runner,
> > even
> > > >> > if more advanced features like state and timers aren't yet
> > available.)
> > > >> >
> > > >> > This is a great milestone.
> > > >> >
> > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <[email protected]>
> > wrote:
> > > >> > >
> > > >> > > WOW! Big news.
> > > >> > >
> > > >> > > I'm supportive of leaving experimental status after Go Modules
> > are completed and the LICENSE issue is resolved. I don't think that lacking
> > streaming support is a blocker. The other thing I checked to see was if
> > there were metrics available on metrics.beam.apache.org, specifically for
> > measuring code health via post-commit over time, which there are and the
> > passing test rate is high (Huzzah!). The one thing that surprised me from
> > your summary is that when Go introduces generics it won't result in any
> > backwards incompatible changes in Apache Beam. That's great news, but does
> > it mean there will be a need to support both non-generic and generic APIs
> > moving forward? It seems like generics will be introduced in the Go 1.17
> > release (optimistically) in August this year.
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <[email protected]>
> > wrote:
> > > >> > >>
> > > >> > >> Hello Beam Community!
> > > >> > >>
> > > >> > >> I propose we stop calling the Apache Beam Go SDK experimental.
> > > >> > >>
> > > >> > >> This thread is to discuss it as a community, and any conditions
> > that remain that would prevent the exit.
> > > >> > >>
> > > >> > >> tl;dr;
> > > >> > >> Ask Questions for answers and links! I have both.
> > > >> > >> This entails including it officially in the Release process,
> > removing the various "experimental" text throughout the repo etc,
> > > >> > >> and otherwise treating it like Python and Java. Some Go specific
> > tasks around dep versioning.
> > > >> > >>
> > > >> > >> The Go SDK implements the beam model efficiently for most batch
> > tasks, including basic windowing.
> > > >> > >> Apache Beam Go jobs can execute, and are tested on all Portable
> > runners.
> > > >> > >> The core APIs are not going to change in incompatible ways going
> > forward.
> > > >> > >> Scalable transforms can be written through SplittableDoFns or
> > via Cross Language transforms.
> > > >> > >>
> > > >> > >> The SDK isn't 100% feature complete, but keeping it experimental
> > doesn't help with that any further.
> > > >> > >> Communities grow through contributions and use, and experimental
> > markers dissuade users.
> > > > >> > >> There's plenty to do in order to expand what can be done with the
> > SDK. (Contributions welcome)
> > > >> > >>
> > > >> > >> Why Exit Experimental now?
> > > >> > >>
> > > >> > >> Typically when we call an SDK or API Experimental, it's because
> > there's a risk that API or behaviors may change significantly.
> > > >> > >> This in turn, leads to additional work for users of the SDK on
> > every release which leads to sticking to older versions or forking
> > > >> > >> to preserve behavior. Version updates should be looked forward
> > to, and viewed as having little risk. Further while there's been
> > > > >> > >> previous discussion about what the "low bar" is for a new SDK, it
> > hasn't been summarily applied to the Go SDK. I feel this has
> > > >> > >> hurt development and contribution of new SDK languages (inherent
> > difficulty of SDK development notwithstanding).
> > > >> > >>
> > > >> > >> When the SDK was designed, it wasn't entirely clear what the
> > Beam Model should look like in an opinionated language like Go.
> > > >> > >> Their initial take (see
> > https://s.apache.org/beam-go-sdk-design-rfc [0]) goes into detail what it
> > means for a language without
> > > >> > >> Generics, or overloading, or inheritance to implement the beam
> > model. One could largely throw away static types (like Python),
> > > >> > >> but this approach rings hollow for Go. It would not do if the
> > approach couldn't grow and scale to the Beam Model. It's also hard
> > > >> > >> to tell if an API is any good before there are users.
> > > >> > >>
> > > >> > >> Further, in the early days of Portability, there wasn't a way to
> > write scalable DoFns, dynamically or otherwise. It's an incredible
> > > >> > >> bottleneck to need to do all initial fanout of work on a single
> > machine, write everything to a Reshuffle, just in order to scale up.
> > > >> > >> Without being able to scale, Beam is little more than overhead.
> > > >> > >>
> > > >> > >> At this point, both of these needs are met within the Go SDK for
> > open source.
> > > >> > >>
> > > >> > >> Background
> > > >> > >>
> > > >> > >> The Go SDK has been a part of the beam repo for a few years now,
> > since it was accidentally merged into master.
> > > >> > >> Since then it's been called experimental, and not officially
> > part of the releases.
> > > >> > >>
> > > > >> > >> Of the SDKs, it was always designed around Beam Portability
> > first. It never had any "Legacy" (SDK x Runner specific ) workers.
> > > >> > >> It's always used the Beam Pipeline protos and FnAPI to execute
> > jobs, first with some very experimental code on Dataflow, but now
> > > >> > >> on all portable supported runners, like Flink, Spark, the Python
> > Portable runner, and Dataflow.
> > > >> > >>
> > > >> > >> API Stability
> > > >> > >>
> > > > >> > >> The Go SDK hasn't meaningfully changed its user API for DoFn
> > and pipeline construction since it was first merged in, and there are no
> > > >> > >> changes to that on the horizon that can't be made in a backwards
> > compatible manner. Largely these are related to New Features, or
> > > >> > >> usability improvements enabled by the advent of Go Generics
> > (think of "real" KV, emitter, and iterator types).
> > > >> > >>
> > > >> > >> It's an open secret that the Go SDK has largely been under work
> > for use within Google. Its use is called FlumeGo, representing
> > > > >> > >> the Apache Beam Go SDK, running on top of Flume, Google's batch
> > pipeline processing engine. Thus most of the focus has been on improving
> > > >> > >> batch execution. FlumeGo sees ample use today, and there hasn't
> > been a call for fundamental changes to the API for ergonomic or
> > > >> > >> usability concerns.
> > > >> > >>
> > > >> > >> Scalability
> > > >> > >>
> > > >> > >> Google could get away without the Go SDK having an SDK side
> > scalability solution as a result of its integration with Flume.
> > > >> > >> However, those days are now past.
> > > >> > >>
> > > >> > >> The Go SDK now supports SplittableDoFns along with Dynamic
> > Splitting, which supports writing scalable batch transforms natively
> > > >> > >> in the Go SDK.
> > > >> > >> The SDK also supports Cross Language Transforms, with Beam
> > Schema encodings. With it, production hardened transforms
> > > >> > >> from Java and Python are a wrapper away.
> > > >> > >>
> > > >> > >> Presently, Daniel Oliveira (who implemented the SDF side work,
> > and completed the Xlang work,) is adding a wrapper for the
> > > > >> > >> Java Kafka IO using Cross Language Transforms, which has often
> > been requested. This will also enable use of the Beam SQL
> > > > >> > >> transforms that Java enables.
> > > >> > >>
> > > >> > >> Features
> > > >> > >>
> > > > >> > >> The Go SDK implements the Beam core. The Go SDK implements
> > standard coders, allows for user DoFns, and CombineFns and access
> > > >> > >> to core transforms like Flatten, GroupByKey, and features like
> > Side Inputs, Windowing, and User Metrics.
> > > >> > >> Basic windowing will be fully supported for batch even through
> > lifted combines in the 2.32.0 release.
> > > >> > >>
> > > >> > >> All of the above enables Beam Go to be versatile for batch
> > execution on portable runners, and for simple streaming pipelines.
> > > >> > >>
> > > >> > >> Repo Testing
> > > >> > >>
> > > > >> > >> On precommit the Go SDK runs all its unit tests. On top of
> > that, it runs all its integration tests against the Python Portable runner,
> > > >> > >> making it quick and robust to detect breaking changes without
> > overspending community resources. Those same tests are also
> > > >> > >> run against Dataflow, Flink, and Spark.
> > > >> > >>
> > > >> > >> The tests are executable against all runners via the appropriate
> > Go commands (if you've stood up your own job management server),
> > > >> > >> or Gradle commands (which will spin up runner instances for
> > you). Documentation for executing tests and adding new ones
> > > >> > >> is on the wiki. [2] They are accessible to Go developers as
> > they're implemented with the standard Go testing tools.
> > > >> > >>
> > > >> > >> Shortcomings
> > > >> > >> That said, there's still much to do. Let me briefly tell you
> > what doesn't work, and it's up to you to weigh whether they block
> > > >> > >> being out of experimental.
> > > >> > >>
> > > >> > >> At present, only a textio has been implemented as Splittable
> > DoFn.
> > > > >> > >> Once the Kafka wrapper is merged in, it will serve as the
> > first example for future contributions for
> > > >> > >> new transform wrappers for the Go SDK.
> > > >> > >> Transforms and IOs are lacking, but at this point users are
> > empowered to write their own DoFns or wrap existing transforms for Cross
> > Language use.
> > > >> > >>
> > > >> > >> In the core SDK, more streaming focused features have yet to be
> > implemented, but they're largely additions to what exists already
> > > > >> > >> rather than total rebuilds. Much of the work is defining how a
> > user specifies their desires, and turning those into the appropriate
> > > >> > >> FnAPI requests at execution time. Back in October I wrote at
> > length on the wiki [1] what's missing for additional streaming features.
> > > >> > >>
> > > >> > >> While we have bolstered our testing recently, there's likely
> > still more we could test to improve our confidence in the SDK,
> > > >> > >> in particular regarding the included transforms libraries and
> > examples.
> > > >> > >>
> > > >> > >> Moving Forward
> > > >> > >>
> > > >> > >> My immediate plan is to work on incorporating the Go SDK fully
> > into the Beam Programming Guide. I've audited the guide [3], and
> > > > >> > >> am beginning to add missing content and fill in the Go
> > specific gaps. This will be tied to improving the Go Doc with more Go
> > > >> > >> specific user documentation that isn't appropriate for the BPG.
> > > >> > >> And resolving the LICENSE issue around the public display of
> > that GoDoc.
> > > >> > >>
> > > >> > >> If this proposal is accepted by a binding vote, I will
> > incorporate the SDK into the release process, and remove the "experimental"
> > > >> > >> language around the SDK. This largely entails updating the
> > release scripts to also build and publish the Go SDK Docker containers.
> > > >> > >> As for releasing the code, we're technically already doing so
> > whenever we tag a release branch [4].
> > > >> > >>
> > > >> > >> The clearest signal to the Go community however will be
> > migrating the SDK to use Go Modules for dependency version control,
> > > >> > >> which Daniel is planning on working on after his Kafka task.
> > This will put our repo infrastructure, SDK contributors, and users
> > > >> > >> on the same footing when it comes to dependency management. It
> > will remove the "+incompatible" tags one sees on the
> > > >> > >> pkg.go.dev list at [4].
> > > >> > >>
> > > >> > >> I'm very happy to answer any questions you might have about the
> > SDK, and provide additional links as needed. I intentionally avoided
> > > >> > >> a link barrage in this email, as they can distract from the
> > point: The SDK is ready for folks to use it, we need to tell them that they
> > can
> > > >> > >> rather than they shouldn't.
> > > >> > >>
> > > >> > >> Robert Burke
> > > > >> > >> De facto Beam Go TL
> > > >> > >>
> > > >> > >> [0] https://s.apache.org/beam-go-sdk-design-rfc
> > > >> > >> [1]
> > https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
> > > >> > >> [2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
> > > >> > >> [3]
> > https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
> > (SDK Audit sheet)
> > > >> > >> [4]
> > https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
> > > >> >
> >
> 
