Thanks for the context, Wing and Anton! My main concern was around feature
parity between the different Spark versions, especially when a feature is
only implemented in an older version of Spark.

> I believe the general practice is to implement a feature in the latest
> supported version (currently Spark 4.0). Once the PR is merged, the
> author may choose to backport the feature to older supported versions,
> but is not obligated to. If the author does not backport it, others who
> want the feature in an older version can choose to backport it.

This is great! This process addresses my concern. I do want to point out
that Spark 4.0 is in an interesting state right now: it is not yet the
"latest supported version", since Iceberg 1.10 should be the first release
that works with Spark 4.0, according to #13162
<https://github.com/apache/iceberg/issues/13162#issuecomment-2912307091>.
So until the next release, Spark 3.5 is still the "latest supported
version". I'll do a pass to make sure that features newly added to Spark
3.5 are also available in Spark 4.0, so there's no discrepancy between the
two versions.
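Concretely, I'm imagining something along these lines (a rough, untested
sketch in Python; it matches commits by subject line, which assumes a
backport keeps the original commit title -- anything retitled would need a
manual look):

    # List commits on main that touched spark/v3.5 but have no commit
    # with the same subject under spark/v4.0.
    # Run from the root of an apache/iceberg checkout.
    import subprocess

    def subjects(path):
        # Return the set of commit subject lines that touched `path`.
        out = subprocess.run(
            ["git", "log", "--format=%s", "main", "--", path],
            capture_output=True, text=True, check=True,
        ).stdout
        return set(out.splitlines())

    missing = subjects("spark/v3.5") - subjects("spark/v4.0")
    for subject in sorted(missing):
        print(subject)

Subject matching will also flag 3.5-only changes that intentionally have
no 4.0 counterpart, so I'd treat the output as a starting point for manual
review rather than a definitive list.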
> Shall we be more aggressive with dropping old Spark versions if we feel
> the quality of those integrations is not on the expected level? For
> instance, can we deprecate 3.4 in the upcoming 1.10 release? Spark 3.4.0
> was released in April of 2023 and is not maintained by Spark as of
> October of 2024. I can imagine maintaining 3.5 for a bit longer as it is
> the last 3.x release.

That makes sense to me. I think we should keep at least one of the Spark
3.x versions, but I'm not sure there's value in keeping both 3.4 and 3.5.
Let's start a separate thread to solicit some feedback from the community.

Best,
Kevin Liu

On Fri, Jul 11, 2025 at 11:38 AM Anton Okolnychyi <aokolnyc...@gmail.com>
wrote:

> I agree with what Wing Yew said. It has always been the agreement to
> actively develop against the latest supported version of Spark, and
> folks who are interested in older Spark versions can backport features
> of their interest. That said, we do try to fix correctness bugs and
> prevent corruptions across all maintained Spark versions.
>
> Shall we be more aggressive with dropping old Spark versions if we feel
> the quality of those integrations is not on the expected level? For
> instance, can we deprecate 3.4 in the upcoming 1.10 release? Spark 3.4.0
> was released in April of 2023 and is not maintained by Spark as of
> October of 2024. I can imagine maintaining 3.5 for a bit longer as it is
> the last 3.x release.
>
> - Anton
>
> On Fri, Jul 11, 2025 at 11:13 AM Wing Yew Poon
> <wyp...@cloudera.com.invalid> wrote:
>
>> Hi Kevin,
>>
>> I believe the general practice is to implement a feature in the latest
>> supported version (currently Spark 4.0). Once the PR is merged, the
>> author may choose to backport the feature to older supported versions,
>> but is not obligated to. If the author does not backport it, others who
>> want the feature in an older version can choose to backport it.
>>
>> Sometimes a change is simple enough that it makes sense to implement it
>> for all supported versions at once (in one PR). In addition, if a change
>> requires changes in core Iceberg that then require the same change in
>> other Spark versions, the change is implemented for all Spark versions
>> in one PR.
>>
>> Sometimes a feature depends on changes in the latest supported Spark
>> version and so cannot be backported.
>>
>> Finally, sometimes a PR has already been in progress for a long time
>> and the latest supported Spark version changes in the meantime. It may
>> still get merged and then be forward-ported.
>>
>> I understand that your intent is to ensure that features/fixes that
>> *can* be backported do get backported.
>> A diff of git logs by itself cannot tell you whether a missing change
>> is portable or not. How and when do you propose to do this diff, and
>> does the result of the diff cause anything to be blocked or any action
>> to be taken? Do you perhaps envision this being done as a kind of
>> pre-release audit (with enough time to address missing backports)?
>>
>> - Wing Yew
>>
>>
>> On Thu, Jul 10, 2025 at 6:43 PM Kevin Liu <kevinjq...@apache.org>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> We currently maintain 3 different versions of Spark under
>>> https://github.com/apache/iceberg/tree/main/spark
>>> I've seen this issue a couple of times, where a feature is implemented
>>> for only one of the Spark versions; for example, see
>>> https://github.com/apache/iceberg/pull/13324 and
>>> https://github.com/apache/iceberg/pull/13459. It's hard to remember
>>> that there are 3 different versions of Spark.
>>>
>>> Do we want to verify that features are implemented across all 3
>>> versions where possible? If so, we can diff the git logs between spark
>>> 3.4 <https://github.com/apache/iceberg/commits/main/spark/v3.4>, 3.5
>>> <https://github.com/apache/iceberg/commits/main/spark/v3.5>, and 4.0
>>> <https://github.com/apache/iceberg/commits/main/spark/v4.0>.
>>>
>>> Best,
>>> Kevin Liu