Hi Nimrod,

Quick clarification: my proposal will not touch API-specific documentation
for the specific reasons you mentioned (signatures, behavior, etc.). It
just aims to make the *programming guides* versionless. Programming guides
should teach fundamentals of Spark, and the fundamentals of Spark should
not change between releases.

There are a few issues with updating documentation multiple times after
Spark releases. First, fixes that apply to all existing versions'
programming guides need backport PRs. For example, this change
<https://github.com/apache/spark/pull/46797/files> applies to all the
versions of the SS programming guide, but is likely to be fixed only in
Spark 4.0. Additionally, any such update within a Spark release will require
re-building the static sites in the spark repo, and copying those files to
spark-website via a commit in spark-website. Making a typo fix like the one
I linked would then require <number of versions we want to update> + 1 PRs,
as opposed to 1 PR in the versionless programming guide world.

Neil

On Tue, Jun 4, 2024 at 1:32 PM Nimrod Ofek <ofek.nim...@gmail.com> wrote:

> Hi,
>
> While I think that the documentation needs a lot of improvement and
> important details are missing - and detaching the documentation from the
> main project can help iterating faster on documentation specific tasks, I
> don't think we can or should move to versionless documentation.
>
> Documentation is version specific: parameters are added and removed, new
> features are added, behaviours sometimes change etc.
>
> I think the documentation should be version specific, but separate from
> the Spark release cadence, so that it can be updated multiple times after
> a Spark release.
> The way I see it is that the documentation should be updated only for the
> latest version and some time before a new release should be archived and
> the updated documentation should reflect the new version.
>
> Thanks,
> Nimrod
>
> On Tue, Jun 4, 2024, 18:34, Praveen Gattu
> <praveen.ga...@databricks.com.invalid> wrote:
>
>> +1. This helps with greater velocity in improving docs. However, we might
>> still need a way to provide version-specific information, i.e.,
>> which features are available in which version.
>>
>> On Mon, Jun 3, 2024 at 3:08 PM Neil Ramaswamy <n...@ramaswamy.org> wrote:
>>
>>> Hi all,
>>>
>>> I've written up a proposal to migrate all the Apache Spark programming
>>> guides to be versionless. You can find the proposal here
>>> <https://docs.google.com/document/d/1OqeQ71zZleUa1XRZrtaPDFnJ-gVJdGM80o42yJVg9zg/>.
>>> Please leave comments, or reply in this DISCUSS thread.
>>>
>>> TLDR: by making the programming guides versionless, we can make updates
>>> to them whenever we'd like, instead of at the Spark release cadence. This
>>> increased update velocity will enable us to make gradual improvements,
>>> including breaking up the Structured Streaming programming guide into
>>> smaller sub-guides. The proposal does not break *any* existing URLs,
>>> and it does not affect our versioned API docs in any way.
>>>
>>> Thanks!
>>> Neil
>>>
>>
