Updating our documentation makes sense.

The backwards-compatibility discussion is an interesting read. One of the
points they mention is that they want Spark users to be on the latest Spark.
The same is true for Dataflow, where we want users to be on the latest
version of Beam. In Beam, I have seen that backwards compatibility is hard
because the APIs users call to construct their pipelines, and the APIs
their functions touch while a pipeline is executing, reach into the
internals of Beam and/or the runners. I was wondering whether Spark is
hitting the same issues in this regard?

With portability and the no-knobs philosophy, I can see that we should be
able to relax the coupling between the runner version and the Beam version
much more, so we might want to go in a different direction than what was
proposed in the Spark thread, since we may be able to achieve a greater
level of decoupling.
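For reference, the contract semver.org defines is: given a version
MAJOR.MINOR.PATCH, incompatible API changes bump MAJOR, backwards-compatible
additions bump MINOR, and bug fixes bump PATCH. A minimal sketch of that
decision rule (purely illustrative; the function name is mine, not Beam
tooling):

```python
def next_version(current, breaking=False, feature=False):
    """Return the next version string under semver.org rules.

    breaking -> bump MAJOR, feature -> bump MINOR, otherwise bump PATCH.
    """
    major, minor, patch = (int(p) for p in current.split("."))
    if breaking:
        return f"{major + 1}.0.0"
    if feature:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

Under this rule, a release containing any binary-incompatible change could
not ship as a minor bump: next_version("2.20.0", breaking=True) yields
"3.0.0", not "2.21.0".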


On Thu, May 28, 2020 at 9:18 AM Ismaël Mejía <ieme...@gmail.com> wrote:

> I am surprised that we claim on the Beam website to use semantic
> versioning (semver) [1] in Beam [2]. We have NEVER really followed semantic
> versioning: we have broken both internal and external APIs multiple times
> (at least for Java), as you can see in this analysis of source and binary
> compatibility between Beam versions that I did for ‘sdks/java/core’ two
> months ago, at the following link:
>
>
> https://cloudflare-ipfs.com/ipfs/QmQSkWYmzerpUjT7fhE9CF7M9hm2uvJXNpXi58mS8RKcNi/
>
> This report was produced by running the following script, which excludes
> both the @Experimental and @Internal annotations as well as many internal
> packages such as ‘sdk/util/’, ‘transforms/reflect/’ and ‘sdk/testing/’;
> for more details on the exclusions, refer to the script code:
>
> https://gist.github.com/iemejia/5277fc269c63c4e49f1bb065454a895e
>
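(To make those exclusion rules concrete, here is an illustrative sketch of
that kind of filtering; this is my own re-implementation of the idea, not
the actual gist code, and the names are mine. The package and annotation
lists are the ones named above.)

```python
# Packages the compatibility report treats as internal (from the
# exclusion list above); classes under them are skipped.
EXCLUDED_PACKAGES = ("sdk/util/", "transforms/reflect/", "sdk/testing/")

# Annotations whose presence exempts an API from compatibility checks.
EXCLUDED_ANNOTATIONS = {"Experimental", "Internal"}

def is_checked(class_path, annotations):
    """Return True if a class is held to the compatibility contract."""
    if any(pkg in class_path for pkg in EXCLUDED_PACKAGES):
        return False
    if EXCLUDED_ANNOTATIONS & set(annotations):
        return False
    return True
```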
> Respecting semantic versioning is REALLY HARD and a strong commitment that
> may bring both positive and negative impact to the project; as usual, it
> is all about trade-offs. Semver requires tooling that we do not yet have
> in place to find regressions before releases so we can fix them (or bump
> the major version to respect the semver contract). As a polyglot project
> we need these tools for every supported language, and since all our
> languages live in the same repository and are released simultaneously, an
> incompatible change in one language may trigger a new major version for
> the whole project, which does not look like a desirable outcome.
>
> For these reasons I think we should soften the claim of using semantic
> versioning and produce our own Beam versioning policy that is consistent
> with our reality. There we can also highlight the lack of guarantees for
> code marked as @Internal and @Experimental, as well as for some modules
> where we may still want the freedom of not guaranteeing stability, like
> runners/core* or any class in the different runners that is not a
> PipelineOptions one.
>
> In general, whatever we decide, we should probably not be too strict, but
> consider the trade-offs of the policy in detail. There is an ongoing
> discussion on versioning in the Apache Spark community that is really
> worth the read; it weighs the cost of breaking an API against the cost of
> maintaining it [3]. I think we can use it as inspiration for an initial
> version.
>
> WDYT?
>
> [1] https://semver.org/
> [2] https://beam.apache.org/get-started/downloads/
> [3]
> https://lists.apache.org/thread.html/r82f99ad8c2798629eed66d65f2cddc1ed196dddf82e8e9370f3b7d32%40%3Cdev.spark.apache.org%3E
>
>
> On Thu, May 28, 2020 at 4:36 PM Reuven Lax <re...@google.com> wrote:
>
>> Most of those items are either in APIs marked @Experimental (the
>> definition of Experimental in Beam is that we can make breaking changes to
>> the API) or are changes in a specific runner - not the Beam API.
>>
>> Reuven
>>
>> On Thu, May 28, 2020 at 7:19 AM Ashwin Ramaswami <aramaswa...@gmail.com>
>> wrote:
>>
>>> There's a "Breaking Changes" section on this blogpost:
>>> https://beam.apache.org/blog/beam-2.21.0/ (and really, for earlier
>>> minor versions too)
>>>
>>> Ashwin Ramaswami
>>> Student
>>> *Find me on my:* LinkedIn <https://www.linkedin.com/in/ashwin-r> |
>>> Website <https://epicfaace.github.io/> | GitHub
>>> <https://github.com/epicfaace>
>>>
>>>
>>> On Thu, May 28, 2020 at 10:01 AM Reuven Lax <re...@google.com> wrote:
>>>
>>>> What did we break?
>>>>
>>>> On Thu, May 28, 2020, 6:31 AM Ashwin Ramaswami <aramaswa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Do we really use semantic versioning? It appears we introduced
>>>>> breaking changes from 2.20.0 -> 2.21.0. If not, we should update the
>>>>> documentation under "API Stability" on this page:
>>>>> https://beam.apache.org/get-started/downloads/
>>>>>
>>>>> What would be a better way to word the way in which we decide version
>>>>> numbering?
>>>>>
>>>>
