I strongly believe that we should continue to have Beam optimize for the user - and while having separate components would allow those of us who are contributors and committers to move faster, the downsides of not having everything "in one box" for a new user, where the components are all relatively guaranteed to work together at that version level, are very high.
Beam having everything included is absolutely a competitive advantage for Beam and I would not want to lose that.

On Wed, Dec 14, 2022 at 9:31 AM Byron Ellis via dev <dev@beam.apache.org> wrote:

> Take it with a grain of salt since I'm not even a committer, but is perhaps the reorganization of Beam into smaller components the real work of a 3.0 effort? Splitting of Beam into smaller more independently managed components would be a pretty huge breaking change from a dependency management perspective which would potentially be largely separate from any code changes.
>
> Best,
> B
>
> On Wed, Dec 14, 2022 at 9:23 AM Alexey Romanenko <aromanenko....@gmail.com> wrote:
>
>> On 12 Dec 2022, at 22:23, Robert Bradshaw via dev <dev@beam.apache.org> wrote:
>>
>> Saving up all the breaking changes until a major release definitely has its downsides (look at Python 3). The migration path is often as important (if not more so) than the final destination.
>>
>> Actually, it proves that the major releases *should not* be delayed for a long period of time and *should* be issued more often to reduce the number of breaking changes (that, of course, likely may happen). That will help users to do much more smooth and less risky upgrades, and developers to not keep burden forever. Beam 2.0.0 was released back in May 2017 and we've almost never talked about Beam 3.0 and what are the criteria for it. I understand that it’s a completely different discussion but seems that this time has come =)
>>
>> As for this particular change, I would question how the benefit (it's unclear what the exact benefit is--better internal organization?) exceeds the pain of making every user refactor their code. I think a stronger case can be made for things like the Avro dependency that cause real pain.
>>
>> Agree.
>> I think that if it doesn’t bring any pain with additional external dependencies and this code is used in almost every other SDK module, then there are no reasons for such breaking changes. On the other hand, the Avro case that you mentioned above is a good example of why sometimes it would be better to keep such code outside of “core”.
>>
>> As for the pipeline update feature, we've long discussed having "pick-your-implementation" transforms that specify alternative, equivalent implementations. Upgrades can choose the old one whereas new pipelines can get the latest and greatest. It won't solve all issues, and requires keeping old codepaths around, but could be an important step forward.
>>
>> On Mon, Dec 12, 2022 at 10:20 AM Kenneth Knowles <k...@apache.org> wrote:
>>
>> I agree with Moritz. To answer a few specifics in my own words:
>>
>> - It is a perfectly sensible refactor, but as a counterpoint without file-based IO the SDK isn't functional so it is also a reasonable design point to have this included. There are other things in the core SDK that are far less "core" and could be moved out with greater benefit. The main goal for any separation of modules would be lighter weight transitive dependencies, IMO.
>>
>> - No, Beam has not made any deliberate breaking changes of this nature. Hence we are still on major version 2. We have made some bugfixes for data loss risks that could be called "breaking changes" but since the feature was unsafe to use in the first place we did not bump the major version.
>>
>> - It is sometimes possible to do such a refactor and have the deprecated location proxy to the new location. In this case that seems hard to achieve.
>>
>> - It is not actually necessary to maintain both locations, as we can declare the old location will be unmaintained (but left alone) and all new development goes to the new location.
>> That isn't a great choice for users who may simply upgrade their SDK version and not notice that their old code is now pointing at a version that will not receive e.g. security updates.
>>
>> - I like the style where if/when we transition from Beam 2 to Beam 3 we should have the exact functionality of Beam 3 available as an opt-in flag first. So if a user passes --beam-3 they get exactly what will be the default functionality when we bump the major version. It really is a problem to do a whole bunch of stuff feverishly before a major version bump. The other style that I think works well is the Linux kernel style where major versions alternate between stable and unstable (in other words, returning to the 0.x style with every alternating version).
>>
>> - I do think Beam suffers from fear and inability to do significant code gardening. I don't think backwards compatibility in the code sense is the biggest blocker. I think the "pipeline update" feature is perhaps the thing most holding Beam back from making radical rapid forward progress.
>>
>> Kenn
>>
>> On Mon, Dec 12, 2022 at 2:25 AM Moritz Mack <mm...@talend.com> wrote:
>>
>> Hi Damon,
>>
>> I fear the current release / versioning strategy of Beam doesn’t lend itself well for such breaking changes. Alexey and I have spent quite some time discussing how to proceed with the problematic Avro dependency in core (and respectively AvroIO, of course).
>>
>> Such changes essentially always require duplicating code to continue supporting a deprecated legacy code path to not break users’ code. But this comes at a very high price. Until the deprecated code path can be finally removed again, it must be maintained in two places.
>>
>> Unfortunately, the removal of deprecated code is rather problematic without a major version release as it would break semantic versioning and people’s expectations.
>> With that, deprecations bear the inherent risk of unintentionally depleting quality rather than improving it.
>>
>> I’d therefore recommend against such efforts unless there are very strong reasons to do so.
>>
>> Best, Moritz
>>
>> On 07.12.22, 18:05, "Damon Douglas via dev" <dev@beam.apache.org> wrote:
>>
>> Hello Everyone,
>>
>> If you identify yourself on the Beam learning journey, even if this is your first day, please see yourself as a welcome participant in this conversation and consider reviewing the bottom portion of this email for guidance.
>>
>> The Short Version (For those with Java Beam SDK knowledge):
>>
>> Should we migrate FileIO / TextIO and related classes from :sdks:java:core to :sdks:java:io:file? If so, should we target such a migration to a future Beam version with repeated announcements? Does the Beam repository have any example of a similar change in the past? What learnings from said past change could be potentially applied to this one?
>>
>> The Long Version (For those on the learning path):
>>
>> This email is more about our repository organization rather than Beam. The proposal is to move two highly used classes (and anything related) in our Java SDK called FileIO [1] and TextIO [2]. The Beam GitHub repository uses software called Gradle [3] to automate routine code tasks such as build and test. Gradle projects, such as Beam, organize code in what are called modules [4].
>> The three main ingredients that make a module are 1) a unique directory path, 2) a file called build.gradle (or build.gradle.kts) in this directory, 3) a reference to the Gradle module in a settings.gradle (or settings.gradle.kts) file at the root of the repository.
>>
>> The Gradle documentation discusses why such organization might matter and how to achieve this with large projects [5]. Essentially, modules allow us to have mini-projects inside our large project and focus related automations on this one portion of our larger repository. In Beam, we have the module :sdks:java:core [6] with all things related to the core of Beam, whereas we have separate modules related to reading from and writing to various resources within :sdks:java:io [7].
>>
>> The proposal suggests moving the aforementioned file reading and writing classes, FileIO and TextIO, and anything related, to their own :sdks:java:io:file module. This would correspond to a new sdks/java/io/file directory and moving these classes into sdks/java/io/file/main/java/org/apache/beam/sdk/io/file.
>>
>> Definitions / References:
>>
>> 1. FileIO - General-purpose transforms for working with files: listing files (matching), reading and writing. See https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.html
>>
>> 2. TextIO - Similar to FileIO but focused on text files. See https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/TextIO.html
>>
>> 3. Gradle - a build automation tool used by the Apache Beam repository to automate code-related tasks. See https://docs.gradle.org/current/userguide/what_is_gradle.html
>>
>> 4. Gradle Module - a subsection of your larger repository. See https://docs.gradle.org/current/userguide/dependency_management_terminology.html#sub:terminology_module
>>
>> 5. Structuring Large Projects with Gradle - See https://docs.gradle.org/current/userguide/structuring_software_products.html
>>
>> 6. sdks:java:core - Corresponds to the sdks/java/core repository directory. See https://github.com/apache/beam/tree/master/sdks/java/core
>>
>> 7. sdks:java:io - Corresponds to the sdks/java/io repository directory. See https://github.com/apache/beam/tree/master/sdks/java/io
>>
>> Best,
>>
>> Damon
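
For anyone on the learning path, the three module ingredients listed in the original proposal (a directory, a build file in it, and an entry in the root settings file) might look roughly like the following in Gradle's Kotlin DSL. This is only a minimal sketch and not Beam's actual build configuration: the module paths come from the proposal, while the `java-library` plugin and the dependency on :sdks:java:core are assumptions made for illustration.

```kotlin
// --- settings.gradle.kts (repository root) ---
// Ingredient 3: register the module so Gradle knows it is part of the build.
// The path :sdks:java:io:file maps to the directory sdks/java/io/file.
include(":sdks:java:io:file")

// --- sdks/java/io/file/build.gradle.kts ---
// Ingredients 1 and 2: a unique directory containing its own build file.
plugins {
    // Assumption: a plain Java library plugin; Beam's real modules may
    // apply different, project-specific build logic.
    `java-library`
}

dependencies {
    // Assumption: the relocated file IO classes would still build on top
    // of the core SDK module, so the new module depends on it.
    api(project(":sdks:java:core"))
}
```

With something like this in place, a task scoped to the module path (for example `./gradlew :sdks:java:io:file:build`) would build that module together with the modules it depends on, which is the "mini-project" focus described above.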