+1 for copying the spec into our repository. I think we need to own it fully as part of the table spec, and we can build compatibility through tests.
-Jack

On Wed, Aug 14, 2024 at 12:52 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

> I'm not really in favor of linking and annotating as that just makes
> things more complicated and still is essentially forking just with more
> steps. If we just track our annotations / modifications to a single
> commit/version then we have the same issue again but now you have to go to
> multiple sources to get the actual Spec. *In addition, our very copy of
> the Spec is going to require new types which don't exist in the Spark Spec
> which necessarily means diverging.* We will need to take up new primitive
> id's (as noted in my first email)
>
> The other issue I have is I don't think the Spark Spec is really going
> through a thorough review process from all members of the Spark community,
> I believe it probably should have gone through the SPIP but instead seems
> to have been merged without broad community involvement.
>
> The only way to truly avoid diverging is to only have a single copy of the
> spec, in our previous discussions the vast majority of Apache Iceberg
> community want it to exist here.
>
> On Wed, Aug 14, 2024 at 2:19 PM Daniel Weeks <dwe...@apache.org> wrote:
>
>> I'm really excited about the introduction of variant type to Iceberg, but
>> I want to raise concerns about forking the spec.
>>
>> I feel like preemptively forking would create the situation where we end
>> up diverging because there's little reason to work with both communities to
>> evolve in a way that benefits everyone.
>>
>> I would much rather point to a specific version of the spec and annotate
>> any variance in Iceberg's handling. This would allow us to continue
>> without dividing the communities.
>>
>> If at any point there are irreconcilable differences, I would support
>> forking, but I don't feel like that should be the initial step.
>>
>> No one is excited about the possibility that the physical representations
>> end up diverging, but it feels like we're setting ourselves up for that
>> exact scenario.
>>
>> -Dan
>>
>> On Wed, Aug 14, 2024 at 6:54 AM Fokko Driesprong <fo...@apache.org> wrote:
>>
>>> +1 to what's already being said here. It is good to copy the spec to
>>> Iceberg and add context that's specific to Iceberg, but at the same time,
>>> we should maintain compatibility.
>>>
>>> Kind regards,
>>> Fokko
>>>
>>> On Wed, Aug 14, 2024 at 15:30, Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>
>>>> +1 to copy the spec into our repository. I think the best way to keep
>>>> compatibility is building integration tests.
>>>>
>>>> Thanks,
>>>> Manu
>>>>
>>>> On Wed, Aug 14, 2024 at 8:27 PM Péter Váry <peter.vary.apa...@gmail.com> wrote:
>>>>
>>>>> Thanks Russell and Aihua for pushing Variant support!
>>>>>
>>>>> Given the differences between the supported types and the lack of
>>>>> interest from the other project, I think it is reasonable to duplicate the
>>>>> specification to our repository.
>>>>> I would give very strong emphasis on sticking to the Spark spec as
>>>>> much as possible, to keep compatibility as much as possible. Maybe even
>>>>> revert to a shared specification if the situation changes.
>>>>>
>>>>> Thanks,
>>>>> Peter
>>>>>
>>>>> Aihua Xu <aihu...@gmail.com> wrote (on Tue, Aug 13, 2024, 19:52):
>>>>>
>>>>>> Thanks Russell for bringing this up.
>>>>>>
>>>>>> This is the main blocker to move forward with the Variant support in
>>>>>> Iceberg and hopefully we can have a consensus. To me, I also feel it makes
>>>>>> more sense to move the spec into Iceberg rather than Spark engine owns it
>>>>>> and we try to keep it compatible with Spark spec.
>>>>>>
>>>>>> Thanks,
>>>>>> Aihua
>>>>>>
>>>>>> On Mon, Aug 12, 2024 at 6:50 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Y’all,
>>>>>>>
>>>>>>> We’ve hit a bit of a roadblock with the Variant Proposal, while we
>>>>>>> were hoping to move the Variant and Shredding specifications from Spark
>>>>>>> into Iceberg there doesn’t seem to be a lot of interest in that.
>>>>>>> Unfortunately, I think we have a number of issues with just linking to the
>>>>>>> Spark project directly from within Iceberg and *I believe we need
>>>>>>> to copy the specifications into our repository*.
>>>>>>>
>>>>>>> There are a few reasons why I think this is necessary
>>>>>>>
>>>>>>> First, we have a divergence of types already. The Spark
>>>>>>> Specification already includes types which Iceberg has no definition
>>>>>>> for (19, 20
>>>>>>> <https://github.com/apache/spark/blob/master/common/variant/README.md#encoding-types>
>>>>>>> - Interval Types) and Iceberg already has a type which is not included
>>>>>>> within the Spark Specification (Time) and will soon have more with
>>>>>>> TimestampNS, and Geo.
>>>>>>>
>>>>>>> Second, We would like to make sure that Spark is not a hard
>>>>>>> dependency for other engines. We are working with several implementers of
>>>>>>> the Iceberg spec and it has previously been agreed that it would be best if
>>>>>>> the source of truth for Variant existed in an engine and file format
>>>>>>> neutral location. The Iceberg project has a good open model of governance
>>>>>>> and, as we have seen so far discussing Variant
>>>>>>> <https://lists.apache.org/thread/xcyytoypgplfr74klg1z2rgjo6k5b0sq>,
>>>>>>> open and active collaboration. This would also help as we can strictly
>>>>>>> version our changes in-line with the rest of the Iceberg spec.
>>>>>>>
>>>>>>> Third, The Shredding spec is not quite finished and requires some
>>>>>>> group analysis and discussion before we commit it.
>>>>>>> I think again the Iceberg community is probably the right place for
>>>>>>> this to happen as we have already started discussions here on these
>>>>>>> topics.
>>>>>>>
>>>>>>> For these reasons I think we should go with a direct copy of the
>>>>>>> existing specification from the Spark Project and move ahead with our
>>>>>>> discussions and modifications within Iceberg. That said, *I do not
>>>>>>> want to diverge if possible from the Spark proposal*. For example,
>>>>>>> although we do not use the Interval types above, I think we should
>>>>>>> not reuse those type ids within our spec. Iceberg's Variant Spec
>>>>>>> types 19 and 20 would remain unused along with any other types we think
>>>>>>> are not applicable. We should strive whenever possible to allow for
>>>>>>> compatibility.
>>>>>>>
>>>>>>> In the interest of moving forward with this proposal I am hoping to
>>>>>>> see if anyone in the community objects to this plan going forward or
>>>>>>> has a better alternative.
>>>>>>>
>>>>>>> As always I am thankful for your time and am eager to hear back from
>>>>>>> everyone,
>>>>>>> Russ