Kevin,

Just a minor clarification:
> I want to point out that Spark 4.0 is in an interesting state right now.
> Spark 4.0 is not yet the "latest supported version" since Iceberg 1.10
> should be the first version that works with Spark 4.0 according to #13162
> <https://github.com/apache/iceberg/issues/13162#issuecomment-2912307091>.
> So before the next release, Spark 3.5 is the "latest supported version".

By "latest supported version", I did not mean in released Iceberg; I meant
in the branch under development. So once Huaxin added support for Spark 4.0
and Spark 4.0 was released
(https://github.com/apache/iceberg/commit/b504f9c51c6c0e0a5c0c5ff53f295e69b67d8e59),
the latest supported version became 4.0.

- Wing Yew

On Tue, Jul 15, 2025 at 11:44 AM Kevin Liu <kevinjq...@apache.org> wrote:

> Thanks for the context, Wing and Anton!
>
> My main concern was around feature parity between the different Spark
> versions, especially if a feature is only implemented in an older version
> of Spark.
>
> > I believe the general practice is to implement a feature in the latest
> > supported version (currently Spark 4.0). Once the PR is merged, the
> > author may choose to backport the feature to older supported versions,
> > but is not obligated to. If the author does not backport it, others who
> > want the feature in an older version can choose to backport it.
>
> This is great! This process addresses my concern.
>
> I want to point out that Spark 4.0 is in an interesting state right now.
> Spark 4.0 is not yet the "latest supported version" since Iceberg 1.10
> should be the first version that works with Spark 4.0 according to #13162
> <https://github.com/apache/iceberg/issues/13162#issuecomment-2912307091>.
> So before the next release, Spark 3.5 is the "latest supported version".
> I'll do a pass to make sure that newly added features in Spark 3.5 are
> also available in Spark 4.0 so there's no discrepancy between the two
> versions.
>
> > Shall we be more aggressive with dropping old Spark versions if we feel
> > the quality of those integrations is not on the expected level? For
> > instance, can we deprecate 3.4 in the upcoming 1.10 release? Spark 3.4.0
> > was released in April of 2023 and has not been maintained by Spark since
> > October of 2024. I can imagine maintaining 3.5 for a bit longer as it is
> > the last 3.x release.
>
> That makes sense to me. I think we should keep at least one of the Spark
> 3.x versions; I'm not sure there's value in keeping both 3.4 and 3.5.
> Let's start a separate thread to solicit feedback from the community.
>
> Best,
> Kevin Liu
>
> On Fri, Jul 11, 2025 at 11:38 AM Anton Okolnychyi <aokolnyc...@gmail.com>
> wrote:
>
>> I agree with what Wing Yew said. It has always been the agreement to
>> actively develop against the latest supported version of Spark, and
>> folks who are interested in older Spark versions can backport features
>> of their interest. That said, we do try to fix correctness bugs and
>> prevent corruptions across all maintained Spark versions.
>>
>> Shall we be more aggressive with dropping old Spark versions if we feel
>> the quality of those integrations is not on the expected level? For
>> instance, can we deprecate 3.4 in the upcoming 1.10 release? Spark 3.4.0
>> was released in April of 2023 and has not been maintained by Spark since
>> October of 2024. I can imagine maintaining 3.5 for a bit longer as it is
>> the last 3.x release.
>>
>> - Anton
>>
>> On Fri, Jul 11, 2025 at 11:13 AM Wing Yew Poon
>> <wyp...@cloudera.com.invalid> wrote:
>>
>>> Hi Kevin,
>>>
>>> I believe the general practice is to implement a feature in the latest
>>> supported version (currently Spark 4.0). Once the PR is merged, the
>>> author may choose to backport the feature to older supported versions,
>>> but is not obligated to. If the author does not backport it, others who
>>> want the feature in an older version can choose to backport it.
>>>
>>> Sometimes a change is simple enough that it makes sense to implement it
>>> for all supported versions at once (in one PR). In addition, if a
>>> change requires changes in core Iceberg that in turn require the same
>>> change in other Spark versions, the change is implemented for all Spark
>>> versions in one PR.
>>>
>>> Sometimes a feature depends on changes in the latest supported Spark
>>> version and so cannot be backported.
>>>
>>> Finally, sometimes a PR has already been in progress for a long time
>>> and the latest supported Spark version changes in the meantime. It may
>>> still get merged and then be forward-ported.
>>>
>>> I understand that your intent is to ensure that features/fixes that
>>> *can* be backported are backported. A diff of git logs by itself cannot
>>> tell you whether a missing change is portable or not. How and when do
>>> you propose to do this diff, and does the result of the diff cause
>>> anything to be blocked or any action to be taken? Do you perhaps
>>> envision this being done as a kind of pre-release audit (with enough
>>> time to address missing backports)?
>>>
>>> - Wing Yew
>>>
>>> On Thu, Jul 10, 2025 at 6:43 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> We currently maintain 3 different versions of Spark under
>>>> https://github.com/apache/iceberg/tree/main/spark
>>>> I've seen this issue a couple of times, where a feature is implemented
>>>> for only one of the Spark versions. For example, see
>>>> https://github.com/apache/iceberg/pull/13324 and
>>>> https://github.com/apache/iceberg/pull/13459. It's hard to remember
>>>> that there are 3 different versions of Spark.
>>>>
>>>> Do we want to verify that features are implemented across all 3
>>>> versions where possible? If so, we can diff the git logs between Spark
>>>> 3.4 <https://github.com/apache/iceberg/commits/main/spark/v3.4>, 3.5
>>>> <https://github.com/apache/iceberg/commits/main/spark/v3.5>, and 4.0
>>>> <https://github.com/apache/iceberg/commits/main/spark/v4.0>.
>>>>
>>>> Best,
>>>> Kevin Liu
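
[Editor's note] For reference, below is a minimal sketch of the "diff the
git logs" audit Kevin proposes above, assuming it is run from the root of
an apache/iceberg checkout. The directory paths are the ones linked in the
thread; the subject-line matching and the assumption that backport commits
reuse the original subject behind a "Spark x.y:" prefix are illustrative
guesses rather than an agreed-upon process, and, as Wing Yew points out,
such a diff can only flag candidates, not tell you whether a change is
actually portable.

import re
import subprocess

# Maintained Spark version directories, per the links in the thread.
SPARK_DIRS = ["spark/v3.4", "spark/v3.5", "spark/v4.0"]

def subjects(path):
    # Commit subjects of every commit that touched this directory.
    log = subprocess.run(
        ["git", "log", "--format=%s", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    # Strip an assumed "Spark x.y:" prefix so a change and its backports
    # compare equal; real subjects may need more normalization than this.
    return {re.sub(r"^Spark \d+\.\d+:\s*", "", s) for s in log.splitlines()}

by_dir = {path: subjects(path) for path in SPARK_DIRS}
for path, subs in by_dir.items():
    elsewhere = set().union(*(s for p, s in by_dir.items() if p != path))
    for subject in sorted(elsewhere - subs):
        # Only a candidate: the change may be version-specific or unportable.
        print(f"{path}: possibly missing '{subject}'")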