While I have no practical knowledge of how documentation is maintained in the 
Spark project, I must agree with Nimrod. For users on older versions, having a 
programming guide that refers to features or API methods that do not exist in 
that version is confusing and detrimental.

Surely there must be a better way to allow updating documentation more often?

Best Regards,
Martin

________________________________
From: Nimrod Ofek <ofek.nim...@gmail.com>
Sent: Wednesday, June 5, 2024 08:26
To: Neil Ramaswamy <n...@ramaswamy.org>
Cc: Praveen Gattu <praveen.ga...@databricks.com.invalid>; dev 
<dev@spark.apache.org>
Subject: Re: [DISCUSS] Versionless Spark Programming Guide Proposal

Hi Neil,


While you wrote that you don't mean the API docs (of course), the programming 
guides also differ between versions, since features are added, configs are 
added, removed, or changed, defaults are changed, etc.

I know of "backport hell", which is why I wrote that once a version is 
released it is frozen, and the documentation will be updated for the new version 
only.

I think of it as facing forward: keeping older versions, but focusing on the 
new releases to keep the community updating.
While Spark has a support window of 18 months until EOL, we can have only a 
6-month support cycle until EOL for documentation, since there are no major 
security concerns for documentation...

Nimrod

On Wed, Jun 5, 2024 at 08:28, Neil Ramaswamy 
<n...@ramaswamy.org<mailto:n...@ramaswamy.org>> wrote:
Hi Nimrod,

Quick clarification—my proposal will not touch API-specific documentation for 
the specific reasons you mentioned (signatures, behavior, etc.). It just aims 
to make the programming guides versionless. Programming guides should teach 
fundamentals of Spark, and the fundamentals of Spark should not change between 
releases.

There are a few issues with updating documentation multiple times after Spark 
releases. First, fixes that apply to all existing versions' programming guides 
need backport PRs. For example, this 
change<https://github.com/apache/spark/pull/46797/files> applies to all the 
versions of the SS programming guide, but is likely to be fixed only in Spark 
4.0. Additionally, any such update within a Spark release will require 
re-building the static sites in the spark repo, and copying those files to 
spark-website via a commit in spark-website. Making a typo fix like the one I 
linked would then require <number of versions we want to update> + 1 PRs, 
as opposed to 1 PR in the versionless programming guide world.

Neil

On Tue, Jun 4, 2024 at 1:32 PM Nimrod Ofek 
<ofek.nim...@gmail.com<mailto:ofek.nim...@gmail.com>> wrote:
Hi,

While I think that the documentation needs a lot of improvement and important 
details are missing, and detaching the documentation from the main project can 
help us iterate faster on documentation-specific tasks, I don't think we can or 
should move to versionless documentation.

Documentation is version-specific: parameters are added and removed, new 
features are added, behaviours sometimes change, etc.

I think the documentation should be version-specific but separate from the 
Spark release cadence, so that it can be updated multiple times after a Spark 
release.
The way I see it, the documentation should be updated only for the latest 
version; some time before a new release, it should be archived, and the 
updated documentation should then reflect the new version.

Thanks,
Nimrod

On Tue, Jun 4, 2024 at 18:34, Praveen Gattu 
<praveen.ga...@databricks.com.invalid> wrote:
+1. This allows for greater velocity in improving docs. However, we might still 
need a way to provide version-specific information, i.e. which features are 
available in which version, etc.

On Mon, Jun 3, 2024 at 3:08 PM Neil Ramaswamy 
<n...@ramaswamy.org<mailto:n...@ramaswamy.org>> wrote:
Hi all,

I've written up a proposal to migrate all the Apache Spark programming guides 
to be versionless. You can find the proposal 
here<https://docs.google.com/document/d/1OqeQ71zZleUa1XRZrtaPDFnJ-gVJdGM80o42yJVg9zg/>.
Please leave comments, or reply in this DISCUSS thread.

TLDR: by making the programming guides versionless, we can make updates to them 
whenever we'd like, instead of at the Spark release cadence. This increased 
update velocity will enable us to make gradual improvements, including breaking 
up the Structured Streaming programming guide into smaller sub-guides. The 
proposal does not break any existing URLs, and it does not affect our versioned 
API docs in any way.

Thanks!
Neil