Kevin,

Just a minor clarification:
> I want to point out that Spark 4.0 is in an interesting state right now.
> Spark 4.0 is not yet the "latest supported version" since Iceberg 1.10
> should be the first version that works with Spark 4.0 according to #13162
> <https://github.com/apache/iceberg/issues/13162#issuecomment-2912307091>.
> So before the next release, Spark 3.5 is the "latest supported version".

By "latest supported version", I did not mean in released Iceberg; I meant
in the branch under development. So once Huaxin added support for Spark 4.0
and Spark 4.0 was released
(https://github.com/apache/iceberg/commit/b504f9c51c6c0e0a5c0c5ff53f295e69b67d8e59),
the latest supported version became 4.0.

- Wing Yew

On Tue, Jul 15, 2025 at 11:44 AM Kevin Liu <kevinjq...@apache.org> wrote:

> Thanks for the context, Wing and Anton!
>
> My main concern was around feature parity between the different Spark
> versions, especially if a feature is only implemented in an older version
> of Spark.
>
> > I believe the general practice is to implement a feature in the latest
> > supported version (currently Spark 4.0). Once the PR is merged, the
> > author may choose to backport the feature to older supported versions,
> > but is not obligated to. If the author does not backport it, others who
> > want the feature in an older version can choose to backport it.
>
> This is great! This process addresses my concern.
>
> I want to point out that Spark 4.0 is in an interesting state right now.
> Spark 4.0 is not yet the "latest supported version" since Iceberg 1.10
> should be the first version that works with Spark 4.0 according to #13162
> <https://github.com/apache/iceberg/issues/13162#issuecomment-2912307091>.
> So before the next release, Spark 3.5 is the "latest supported version".
> I'll do a pass to make sure that newly added features in Spark 3.5 are
> also available in Spark 4.0 so there's no discrepancy between the two
> versions.
>
> > Shall we be more aggressive with dropping old Spark versions if we feel
> > the quality of those integrations is not on the expected level? For
> > instance, can we deprecate 3.4 in the upcoming 1.10 release? Spark 3.4.0
> > was released in April of 2023 and has not been maintained by Spark since
> > October of 2024. I can imagine maintaining 3.5 for a bit longer as it is
> > the last 3.x release.
>
> That makes sense to me. I think we should keep at least one of the Spark
> 3.x versions; I'm not sure there's value in keeping both 3.4 and 3.5.
> Let's start a separate thread to solicit feedback from the community.
>
> Best,
> Kevin Liu
>
> On Fri, Jul 11, 2025 at 11:38 AM Anton Okolnychyi <aokolnyc...@gmail.com>
> wrote:
>
>> I agree with what Wing Yew said. It has always been the agreement to
>> actively develop against the latest supported version of Spark, and
>> folks who are interested in older Spark versions can backport features
>> of their interest. That said, we do try to fix correctness bugs and
>> prevent corruptions across all maintained Spark versions.
>>
>> Shall we be more aggressive with dropping old Spark versions if we feel
>> the quality of those integrations is not on the expected level? For
>> instance, can we deprecate 3.4 in the upcoming 1.10 release? Spark 3.4.0
>> was released in April of 2023 and has not been maintained by Spark since
>> October of 2024. I can imagine maintaining 3.5 for a bit longer as it is
>> the last 3.x release.
>>
>> - Anton
>>
>> On Fri, Jul 11, 2025 at 11:13 AM Wing Yew Poon
>> <wyp...@cloudera.com.invalid> wrote:
>>
>>> Hi Kevin,
>>>
>>> I believe the general practice is to implement a feature in the latest
>>> supported version (currently Spark 4.0). Once the PR is merged, the
>>> author may choose to backport the feature to older supported versions,
>>> but is not obligated to. If the author does not backport it, others who
>>> want the feature in an older version can choose to backport it.
>>>
>>> Sometimes a change is simple enough that it makes sense to implement it
>>> for all supported versions at once (in one PR). In addition, if a
>>> change requires changes in core Iceberg that in turn require the same
>>> change in other Spark versions, the change is implemented for all Spark
>>> versions in one PR.
>>>
>>> Sometimes a feature depends on changes in the latest supported Spark
>>> version and so cannot be backported.
>>>
>>> Finally, sometimes a PR has already been in progress for a long time
>>> and the latest supported Spark version changes in the meantime. It may
>>> still get merged and then be forward-ported.
>>>
>>> I understand that your intent is to ensure that features/fixes that
>>> *can* be backported are backported. A diff of git logs by itself cannot
>>> tell you whether a missing change is portable or not. How and when do
>>> you propose to do this diff, and does the result of the diff cause
>>> anything to be blocked or any action to be taken? Do you perhaps
>>> envision this being done as a kind of pre-release audit (with enough
>>> time to address missing backports)?
>>>
>>> - Wing Yew
>>>
>>> On Thu, Jul 10, 2025 at 6:43 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> We currently maintain 3 different versions of Spark under
>>>> https://github.com/apache/iceberg/tree/main/spark
>>>> I've seen this issue a couple of times, where a feature is implemented
>>>> for only one of the Spark versions. For example, see
>>>> https://github.com/apache/iceberg/pull/13324 and
>>>> https://github.com/apache/iceberg/pull/13459. It's hard to remember
>>>> that there are 3 different versions of Spark.
>>>>
>>>> Do we want to verify that features are implemented across all 3
>>>> versions where possible? If so, we can diff the git logs between Spark
>>>> 3.4 <https://github.com/apache/iceberg/commits/main/spark/v3.4>, 3.5
>>>> <https://github.com/apache/iceberg/commits/main/spark/v3.5>, and 4.0
>>>> <https://github.com/apache/iceberg/commits/main/spark/v4.0>.
>>>>
>>>> Best,
>>>> Kevin Liu
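
[Editor's note] For reference, below is a minimal sketch of the "diff the
git logs" audit Kevin proposes above, assuming it is run from the root of
an apache/iceberg checkout. The directory paths are the ones linked in the
thread; the subject-line matching and the assumption that backport commits
reuse the original subject behind a "Spark x.y:" prefix are illustrative
guesses rather than an agreed-upon process, and, as Wing Yew points out,
such a diff can only flag candidates, not tell you whether a change is
actually portable.

import re
import subprocess

# Maintained Spark version directories, per the links in the thread.
SPARK_DIRS = ["spark/v3.4", "spark/v3.5", "spark/v4.0"]

def subjects(path):
    # Commit subjects of every commit that touched this directory.
    log = subprocess.run(
        ["git", "log", "--format=%s", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    # Strip an assumed "Spark x.y:" prefix so a change and its backports
    # compare equal; real subjects may need more normalization than this.
    return {re.sub(r"^Spark \d+\.\d+:\s*", "", s) for s in log.splitlines()}

by_dir = {path: subjects(path) for path in SPARK_DIRS}
for path, subs in by_dir.items():
    elsewhere = set().union(*(s for p, s in by_dir.items() if p != path))
    for subject in sorted(elsewhere - subs):
        # Only a candidate: the change may be version-specific or unportable.
        print(f"{path}: possibly missing '{subject}'")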