Did we end up updating the documentation (in [1] and elsewhere)? There was a (not major, but still) breaking change in Python related to type hints in the 2.24.0 release [2].
/cc +Udi Meiri <[email protected]>

[1] https://beam.apache.org/get-started/downloads/
[2] https://github.com/apache/beam/pull/12745/commits/222cd448fe0262fcc5557186b58013ec7bf26622

On Thu, Jun 4, 2020 at 5:07 PM Robert Bradshaw <[email protected]> wrote:

That tool looks great; we should use it more often! (In fact, there's a pending RC right now :). It looks like we don't generally do too badly, but it can help prevent accidental slippage.

As for whether we should provide semantic versioning, that's a really difficult question. I don't think Beam is at a point where we can or should provide 100% semantic versioning, but exceptions should be few and far between, and hopefully appropriately called out. Technically bugfixes can be "backwards incompatible", and I think discouraging/disallowing dangerous (or unexpected) behavior can also be fair game. @Experimental annotations can be useful, but when something has carried that label for years I think it loses its meaning.

If we're claiming strict semantic versioning, we should update that, or at least add some caveats.

On Fri, May 29, 2020 at 9:57 PM Kenneth Knowles <[email protected]> wrote:

Ismaël, this is awesome!! Can we incorporate it into our processes? No matter what our policy, this is great data. (FWIW we also hit 100% compatibility a lot of the time... so there! :-p)

But anyhow, I completely agree that strict semver is too hard and not valuable enough. I think we should use semver "in spirit". We "basically" and "mostly" and "most reasonably" expect not to break users when they upgrade minor versions. Patch versions are mostly irrelevant, but if we released one, it "should" also be reversible without breakage.

To add my own spin on Ismaël's point about treating this as a tradeoff: backwards compatibility is always more vague than it seems at first, and usually prioritized wrong, IMO.

- Breaking someone's compile is usually easy to detect (hence can be automated) but also usually easy to fix, so the user burden is minimal except when you are a diamond dependency (and then semver doesn't help, because transitive deps will not have compatible major versions... see Guava or Protobuf).
- Breaking someone via a runtime error is a bigger deal. It happens often, of course, and we fix it pretty urgently.
- Breaking someone via a performance degradation is just as broken, but hard to repro if it isn't widespread. We may never fix those.
- Breaking someone via giving a wrong answer is the worst possible breakage. It is silent, looks fine, and simply breaks their results.
- Breaking someone via upgrading a transitive dep where there is a backwards-incompatible change somewhere deep is often unfixable.
- Breaking someone via turning down a version of the client library. Cloud providers do this because they cannot afford not to. Services have shorter lifespans than shipped libraries (notably the opposite for build-every-time libraries).
- Breaking someone via keeping transitive deps stable, so that the services behind them eventually turn down those versions of their client library. This is a direct conflict between "shipped library" style compatibility guarantees and "service" style compatibility guarantees.
- Breaking someone via some field they found via reflection, or sdk/util, or context.stateInternals, or Python where private stuff sort of exists but not really, and <insert other language's encapsulation limitations>.
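(As a small illustration of that last bullet, here is a hedged sketch — hypothetical class and field names, not real Beam internals — of how reflection lets users depend on state that was never meant to be public:)

    import java.lang.reflect.Field;

    // Hypothetical "internal" class; imagine it lives deep inside an SDK.
    class InternalCounter {
      private long count = 42; // never meant to be part of the public API
    }

    public class ReflectionEscapeHatch {
      public static void main(String[] args) throws Exception {
        InternalCounter counter = new InternalCounter();
        // A determined user can still read (or write) the "private" state,
        // so renaming or removing this field breaks them even though no
        // public API changed.
        Field f = InternalCounter.class.getDeclaredField("count");
        f.setAccessible(true);
        System.out.println("count = " + f.getLong(counter));
      }
    }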
Really, only very strict documentation can define a supported API surface, and we are not even close to having that. And even if we had it, IDEs would autocomplete things we don't want them to. This is pretty hard.

Examples where I am not that happy with how Beam went:

- https://s.apache.org/finishing-triggers-drop-data: Because it was a construction-time breaking change, this was pushed back for years, even though leaving it in caused data loss. (Based on the StackOverflow and email volume for this, plenty of users had finishing triggers.)
- https://issues.apache.org/jira/browse/BEAM-6906: mutable accumulators are unsafe to use unless cloned, given our CombineFn API and standard fusion techniques (a rough sketch of the hazard appears further below). We introduced a new API and documented that users should probably use it, leaving the data loss risk in place to avoid a compile-time breaking change. It is still there.
- Beam 2.0.0: we waited a long time to have a first "stable" release, and then rushed to make all the breaking changes, because we were going to freeze everything forever. It is bad to wait a long time and also bad to rush in the breakages.
- Beam 0.x: we only get to do it once. That is a mistake. Between 2.x.y and 3.0.0 you need a period to mature the breaking APIs. You need a "0.x" period in between each major version. Our @Experimental tag is one idea. Another is setting up an LTS and making breaking changes in between; the LTS would ideally be a _major_ version, not a minor version. Linux's alternating version scheme was an interesting take. (Caveat: my experience is as a Linux 2.4/2.6 user while 2.7 was in development, and they may have changed everything since then.)

All of this will feed into a renewed Beam 3 brainstorm at some point. But now is actually the wrong time for all of that. Right now, even more than usual, what people need is stability and reliability in every form mentioned. We / our users don't have a lot of surplus capacity for making surprise urgent fixes. It would be great to focus entirely on testing, additional static analyses, etc. I think temporarily going super strict on semver, using tools to ensure it, would serve our users well.

Kenn

On Thu, May 28, 2020 at 11:28 AM Luke Cwik <[email protected]> wrote:

Updating our documentation makes sense.

The backwards compat discussion is an interesting read. One of the points they mention is that they like Spark users to be on the latest Spark. I can say that this is also true for Dataflow, where we want users to be on the latest version of Beam. In Beam, I have seen that backwards compatibility is hard because the APIs that users use to construct their pipeline, and what their functions use while the pipeline is executing, reach into the internals of Beam and/or the runners. I was wondering whether Spark was hitting the same issues in this regard?

With portability and the no-knobs philosophy, I can see that we should be able to decouple which version of a runner is being used from which version of Beam is being used a lot more, so we might want to go in a different direction than what was proposed in the Spark thread, since we may be able to achieve a greater level of decoupling.
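(A minimal illustrative sketch of the kind of hazard behind the second example in Kenn's list above — a hypothetical CombineFn, not the code from BEAM-6906 or its actual fix — showing why a mutable accumulator needs to be cloned before it escapes:)

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.beam.sdk.transforms.Combine;

    // Illustrative only: a CombineFn whose accumulator is a mutable list and
    // whose extractOutput hands that same list downstream. Fused stages
    // receive elements by reference, so any later mutation of the
    // accumulator silently changes an already-emitted result -- the
    // "wrong answer" class of breakage. Cloning the accumulator (or
    // copying in extractOutput) avoids this.
    class CollectToListFn extends Combine.CombineFn<String, List<String>, List<String>> {
      @Override
      public List<String> createAccumulator() {
        return new ArrayList<>();
      }

      @Override
      public List<String> addInput(List<String> acc, String input) {
        acc.add(input); // mutates the accumulator in place
        return acc;
      }

      @Override
      public List<String> mergeAccumulators(Iterable<List<String>> accumulators) {
        List<String> merged = createAccumulator();
        for (List<String> acc : accumulators) {
          merged.addAll(acc);
        }
        return merged;
      }

      @Override
      public List<String> extractOutput(List<String> acc) {
        return acc; // unsafe: exposes the mutable accumulator itself;
                    // returning new ArrayList<>(acc) would be the safe pattern
      }
    }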
On Thu, May 28, 2020 at 9:18 AM Ismaël Mejía <[email protected]> wrote:

I am surprised that we are claiming on the Beam website that we use semantic versioning (semver) [1] in Beam [2]. We have NEVER really followed semantic versioning, and we have broken both internal and external APIs multiple times (at least for Java), as you can see in this analysis of source and binary compatibility between Beam versions that I did for ‘sdks/java/core’ two months ago:

https://cloudflare-ipfs.com/ipfs/QmQSkWYmzerpUjT7fhE9CF7M9hm2uvJXNpXi58mS8RKcNi/

This report was produced by running the following script, which excludes both the @Experimental and @Internal annotations as well as many internal packages like ‘sdk/util/’, ‘transforms/reflect/’ and ‘sdk/testing/’, among others; for more details on the exclusions, refer to the script code:

https://gist.github.com/iemejia/5277fc269c63c4e49f1bb065454a895e

Respecting semantic versioning is REALLY HARD and a strong commitment that may bring both positive and negative impacts to the project; as usual, it is all about trade-offs. Semver requires tooling, which we do not yet have in place, to find regressions before releases so we can fix them (or bump the major version to respect the semver contract). As a polyglot project we need these tools for every supported language, and since all our languages live in the same repository and are released simultaneously, an incompatible change in one language may trigger a full new major version number for the whole project, which does not look like a desirable outcome.

For these reasons I think we should soften the claim of using semantic versioning and produce our own Beam versioning policy that is consistent with our reality, where we can also highlight the lack of guarantees for code marked as @Internal and @Experimental, as well as for some modules where we may still want the freedom of not guaranteeing stability, like runners/core* or any class in the different runners that is not a PipelineOptions one.

In general, whatever we decide, we should probably not be as strict as full semver, but rather consider the trade-offs of the policy in detail. There is an ongoing discussion on versioning in the Apache Spark community that is really worth the read and proposes an analysis of the costs to break an API vs. the costs to maintain an API [3]. I think we can use it as inspiration for an initial version.

WDYT?

[1] https://semver.org/
[2] https://beam.apache.org/get-started/downloads/
[3] https://lists.apache.org/thread.html/r82f99ad8c2798629eed66d65f2cddc1ed196dddf82e8e9370f3b7d32%40%3Cdev.spark.apache.org%3E

On Thu, May 28, 2020 at 4:36 PM Reuven Lax <[email protected]> wrote:

Most of those items are either in APIs marked @Experimental (the definition of Experimental in Beam is that we can make breaking changes to the API) or are changes in a specific runner - not the Beam API.
Reuven

On Thu, May 28, 2020 at 7:19 AM Ashwin Ramaswami <[email protected]> wrote:

There's a "Breaking Changes" section on this blog post: https://beam.apache.org/blog/beam-2.21.0/ (and really, for earlier minor versions too).

Ashwin Ramaswami
Student
Find me on my: LinkedIn <https://www.linkedin.com/in/ashwin-r> | Website <https://epicfaace.github.io/> | GitHub <https://github.com/epicfaace>

On Thu, May 28, 2020 at 10:01 AM Reuven Lax <[email protected]> wrote:

What did we break?

On Thu, May 28, 2020, 6:31 AM Ashwin Ramaswami <[email protected]> wrote:

Do we really use semantic versioning? It appears we introduced breaking changes from 2.20.0 -> 2.21.0. If not, we should update the documentation under "API Stability" on this page: https://beam.apache.org/get-started/downloads/

What would be a better way to word how we decide version numbering?
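(For reference, a tiny hedged sketch, using a hypothetical helper class, of what the strict semver contract would promise for an upgrade such as 2.20.0 -> 2.21.0; the thread above is about whether Beam can actually honor that promise:)

    // Hypothetical helper: under strict semver, an upgrade that stays within
    // the same MAJOR version (and does not go backwards) is expected to
    // contain no breaking changes.
    public final class SemverCheck {

      public static boolean upgradeExpectedCompatible(String from, String to) {
        int[] a = parse(from);
        int[] b = parse(to);
        return a[0] == b[0]
            && (b[1] > a[1] || (b[1] == a[1] && b[2] >= a[2]));
      }

      private static int[] parse(String version) {
        String[] parts = version.split("\\.");
        return new int[] {
          Integer.parseInt(parts[0]),
          Integer.parseInt(parts[1]),
          Integer.parseInt(parts[2])
        };
      }

      public static void main(String[] args) {
        // The upgrade discussed in this thread: 2.20.0 -> 2.21.0.
        System.out.println(upgradeExpectedCompatible("2.20.0", "2.21.0")); // true
      }
    }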
