Hi Nimrod,

Quick clarification: my proposal will not touch API-specific documentation, for the specific reasons you mentioned (signatures, behavior, etc.). It only aims to make the *programming guides* versionless. Programming guides should teach the fundamentals of Spark, and the fundamentals of Spark should not change between releases.
There are a few issues with updating documentation multiple times after Spark releases. First, fixes that apply to all existing versions' programming guides need backport PRs. For example, this change <https://github.com/apache/spark/pull/46797/files> applies to all versions of the SS programming guide, but is likely to be fixed only in Spark 4.0. Second, any such update within a Spark release will require rebuilding the static sites in the spark repo and copying those files to spark-website via a commit in spark-website. Making a typo fix like the one I linked would then require <number of versions we want to update> + 1 PRs, as opposed to 1 PR in the versionless programming guide world.

Neil

On Tue, Jun 4, 2024 at 1:32 PM Nimrod Ofek <ofek.nim...@gmail.com> wrote:

> Hi,
>
> While I think that the documentation needs a lot of improvement and
> important details are missing - and detaching the documentation from the
> main project can help iterate faster on documentation-specific tasks - I
> don't think we can nor should move to versionless documentation.
>
> Documentation is version specific: parameters are added and removed, new
> features are added, behaviours sometimes change, etc.
>
> I think the documentation should be version specific, but separate from
> the Spark release cadence, and can be updated multiple times after a Spark
> release.
> The way I see it, the documentation should be updated only for the
> latest version, and some time before a new release it should be archived and
> the updated documentation should reflect the new version.
>
> Thanks,
> Nimrod
>
> On Tue, Jun 4, 2024, 18:34, Praveen Gattu
> <praveen.ga...@databricks.com.invalid> wrote:
>
>> +1. This helps for greater velocity in improving docs. However, we might
>> still need a way to provide version-specific information, isn't it? I.e.
>> what features are available in which version, etc.
>>
>> On Mon, Jun 3, 2024 at 3:08 PM Neil Ramaswamy <n...@ramaswamy.org> wrote:
>>
>>> Hi all,
>>>
>>> I've written up a proposal to migrate all the Apache Spark programming
>>> guides to be versionless. You can find the proposal here
>>> <https://docs.google.com/document/d/1OqeQ71zZleUa1XRZrtaPDFnJ-gVJdGM80o42yJVg9zg/>.
>>> Please leave comments, or reply in this DISCUSS thread.
>>>
>>> TL;DR: by making the programming guides versionless, we can make updates
>>> to them whenever we'd like, instead of at the Spark release cadence. This
>>> increased update velocity will enable us to make gradual improvements,
>>> including breaking up the Structured Streaming programming guide into
>>> smaller sub-guides. The proposal does not break *any* existing URLs,
>>> and it does not affect our versioned API docs in any way.
>>>
>>> Thanks!
>>> Neil