Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-04 Thread Nimrod Ofek
Hi Neil,


While you wrote that you don't mean the API docs (of course), the programming
guides also differ between versions, since features are added, configs are
added, removed, or changed, defaults are changed, etc.

I know of "backport hell" - which is why I wrote that once a version is
released it's frozen and the documentation will be updated for the new
version only.

I think of it as facing forward: keeping older versions, but focusing on
the new releases to keep the community updating.
While Spark has a support window of 18 months until EOL, we can have only a
6-month support cycle until EOL for documentation - there are no major
security concerns for documentation...

Nimrod

On Wed, Jun 5, 2024, 08:28, Neil Ramaswamy wrote:

> Hi Nimrod,
>
> Quick clarification—my proposal will not touch API-specific documentation
> for the specific reasons you mentioned (signatures, behavior, etc.). It
> just aims to make the *programming guides* versionless. Programming
> guides should teach fundamentals of Spark, and the fundamentals of Spark
> should not change between releases.
>
> There are a few issues with updating documentation multiple times after
> Spark releases. First, fixes that apply to all existing versions'
> programming guides need backport PRs. For example, this change
>  applies to all the
> versions of the SS programming guide, but is likely to be fixed only in
> Spark 4.0. Additionally, any such update within a Spark release will require
> re-building the static sites in the spark repo, and copying those files to
> spark-website via a commit in spark-website. Making a typo fix like the one
> I linked would then require  + 1 PRs,
> as opposed to 1 PR in the versionless programming guide world.
>
> Neil
>
> On Tue, Jun 4, 2024 at 1:32 PM Nimrod Ofek  wrote:
>
>> Hi,
>>
>> While I think that the documentation needs a lot of improvement and
>> important details are missing - and detaching the documentation from the
>> main project can help us iterate faster on documentation-specific tasks, I
>> don't think we can or should move to versionless documentation.
>>
>> Documentation is version-specific: parameters are added and removed, new
>> features are added, behaviours sometimes change, etc.
>>
>> I think the documentation should be version-specific - but separate from
>> the Spark release cadence - and can be updated multiple times after a Spark
>> release.
>> The way I see it, the documentation should be updated only for the latest
>> version; some time before a new release, it should be archived, and the
>> updated documentation should then reflect the new version.
>>
>> Thanks,
>> Nimrod
>>
>> On Tue, Jun 4, 2024, 18:34, Praveen Gattu wrote:
>>
>>> +1. This helps with greater velocity in improving docs. However, we might
>>> still need a way to provide version-specific information, right? I.e.,
>>> which features are available in which version, etc.
>>>
>>> On Mon, Jun 3, 2024 at 3:08 PM Neil Ramaswamy 
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I've written up a proposal to migrate all the Apache Spark programming
>>>> guides to be versionless. You can find the proposal here
>>>> .
>>>> Please leave comments, or reply in this DISCUSS thread.
>>>>
>>>> TLDR: by making the programming guides versionless, we can make updates
>>>> to them whenever we'd like, instead of at the Spark release cadence. This
>>>> increased update velocity will enable us to make gradual improvements,
>>>> including breaking up the Structured Streaming programming guide into
>>>> smaller sub-guides. The proposal does not break *any* existing URLs,
>>>> and it does not affect our versioned API docs in any way.
>>>>
>>>> Thanks!
>>>> Neil
>>>>
>>>


Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-04 Thread Neil Ramaswamy
Hi Nimrod,

Quick clarification—my proposal will not touch API-specific documentation
for the specific reasons you mentioned (signatures, behavior, etc.). It
just aims to make the *programming guides* versionless. Programming guides
should teach fundamentals of Spark, and the fundamentals of Spark should
not change between releases.

There are a few issues with updating documentation multiple times after
Spark releases. First, fixes that apply to all existing versions'
programming guides need backport PRs. For example, this change
 applies to all the
versions of the SS programming guide, but is likely to be fixed only in
Spark 4.0. Additionally, any such update within a Spark release will require
re-building the static sites in the spark repo, and copying those files to
spark-website via a commit in spark-website. Making a typo fix like the one
I linked would then require  + 1 PRs,
as opposed to 1 PR in the versionless programming guide world.

Neil

On Tue, Jun 4, 2024 at 1:32 PM Nimrod Ofek  wrote:

> Hi,
>
> While I think that the documentation needs a lot of improvement and
> important details are missing - and detaching the documentation from the
> main project can help us iterate faster on documentation-specific tasks, I
> don't think we can or should move to versionless documentation.
>
> Documentation is version-specific: parameters are added and removed, new
> features are added, behaviours sometimes change, etc.
>
> I think the documentation should be version-specific - but separate from
> the Spark release cadence - and can be updated multiple times after a Spark
> release.
> The way I see it, the documentation should be updated only for the latest
> version; some time before a new release, it should be archived, and the
> updated documentation should then reflect the new version.
>
> Thanks,
> Nimrod
>
> On Tue, Jun 4, 2024, 18:34, Praveen Gattu wrote:
>
>> +1. This helps with greater velocity in improving docs. However, we might
>> still need a way to provide version-specific information, right? I.e.,
>> which features are available in which version, etc.
>>
>> On Mon, Jun 3, 2024 at 3:08 PM Neil Ramaswamy  wrote:
>>
>>> Hi all,
>>>
>>> I've written up a proposal to migrate all the Apache Spark programming
>>> guides to be versionless. You can find the proposal here
>>> .
>>> Please leave comments, or reply in this DISCUSS thread.
>>>
>>> TLDR: by making the programming guides versionless, we can make updates
>>> to them whenever we'd like, instead of at the Spark release cadence. This
>>> increased update velocity will enable us to make gradual improvements,
>>> including breaking up the Structured Streaming programming guide into
>>> smaller sub-guides. The proposal does not break *any* existing URLs,
>>> and it does not affect our versioned API docs in any way.
>>>
>>> Thanks!
>>> Neil
>>>
>>


Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-04 Thread Nimrod Ofek
Hi,

While I think that the documentation needs a lot of improvement and
important details are missing - and detaching the documentation from the
main project can help us iterate faster on documentation-specific tasks, I
don't think we can or should move to versionless documentation.

Documentation is version-specific: parameters are added and removed, new
features are added, behaviours sometimes change, etc.

I think the documentation should be version-specific - but separate from
the Spark release cadence - and can be updated multiple times after a Spark
release.
The way I see it, the documentation should be updated only for the latest
version; some time before a new release, it should be archived, and the
updated documentation should then reflect the new version.

Thanks,
Nimrod

On Tue, Jun 4, 2024, 18:34, Praveen Gattu wrote:

> +1. This helps with greater velocity in improving docs. However, we might
> still need a way to provide version-specific information, right? I.e.,
> which features are available in which version, etc.
>
> On Mon, Jun 3, 2024 at 3:08 PM Neil Ramaswamy  wrote:
>
>> Hi all,
>>
>> I've written up a proposal to migrate all the Apache Spark programming
>> guides to be versionless. You can find the proposal here
>> .
>> Please leave comments, or reply in this DISCUSS thread.
>>
>> TLDR: by making the programming guides versionless, we can make updates
>> to them whenever we'd like, instead of at the Spark release cadence. This
>> increased update velocity will enable us to make gradual improvements,
>> including breaking up the Structured Streaming programming guide into
>> smaller sub-guides. The proposal does not break *any* existing URLs, and
>> it does not affect our versioned API docs in any way.
>>
>> Thanks!
>> Neil
>>
>


Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-04 Thread Praveen Gattu
+1. This helps with greater velocity in improving docs. However, we might
still need a way to provide version-specific information, right? I.e.,
which features are available in which version, etc.

On Mon, Jun 3, 2024 at 3:08 PM Neil Ramaswamy  wrote:

> Hi all,
>
> I've written up a proposal to migrate all the Apache Spark programming
> guides to be versionless. You can find the proposal here
> .
> Please leave comments, or reply in this DISCUSS thread.
>
> TLDR: by making the programming guides versionless, we can make updates to
> them whenever we'd like, instead of at the Spark release cadence. This
> increased update velocity will enable us to make gradual improvements,
> including breaking up the Structured Streaming programming guide into
> smaller sub-guides. The proposal does not break *any* existing URLs, and
> it does not affect our versioned API docs in any way.
>
> Thanks!
> Neil
>