+1 to what's already being said here. It is good to copy the spec to Iceberg and add context that's specific to Iceberg, but at the same time, we should maintain compatibility.

Kind regards,
Fokko

On Wed, 14 Aug 2024 at 15:30, Manu Zhang <owenzhang1...@gmail.com> wrote:

> +1 to copy the spec into our repository. I think the best way to keep
> compatibility is building integration tests.
>
> Thanks,
> Manu
>
> On Wed, Aug 14, 2024 at 8:27 PM Péter Váry <peter.vary.apa...@gmail.com>
> wrote:
>
>> Thanks Russell and Aihua for pushing Variant support!
>>
>> Given the differences between the supported types and the lack of
>> interest from the other project, I think it is reasonable to duplicate the
>> specification to our repository.
>> I would give very strong emphasis on sticking to the Spark spec as much
>> as possible, to keep compatibility as much as possible. Maybe even revert
>> to a shared specification if the situation changes.
>>
>> Thanks,
>> Peter
>>
>> Aihua Xu <aihu...@gmail.com> wrote (on Tue, Aug 13, 2024, 19:52):
>>
>>> Thanks Russell for bringing this up.
>>>
>>> This is the main blocker to moving forward with the Variant support in
>>> Iceberg, and hopefully we can reach a consensus. To me, it also makes
>>> more sense to move the spec into Iceberg rather than have the Spark engine
>>> own it while we try to keep ours compatible with the Spark spec.
>>>
>>> Thanks,
>>> Aihua
>>>
>>> On Mon, Aug 12, 2024 at 6:50 PM Russell Spitzer <
>>> russell.spit...@gmail.com> wrote:
>>>
>>>> Hi Y’all,
>>>>
>>>> We’ve hit a bit of a roadblock with the Variant Proposal. While we were
>>>> hoping to move the Variant and Shredding specifications from Spark into
>>>> Iceberg, there doesn’t seem to be a lot of interest in that. Unfortunately,
>>>> I think we have a number of issues with just linking to the Spark project
>>>> directly from within Iceberg and *I believe we need to copy the
>>>> specifications into our repository*.
>>>>
>>>> There are a few reasons why I think this is necessary.
>>>>
>>>> First, we have a divergence of types already.
>>>> The Spark Specification
>>>> already includes types which Iceberg has no definition for (19, 20
>>>> <https://github.com/apache/spark/blob/master/common/variant/README.md#encoding-types>
>>>> - Interval Types) and Iceberg already has a type which is not included
>>>> within the Spark Specification (Time) and will soon have more with
>>>> TimestampNS and Geo.
>>>>
>>>> Second, we would like to make sure that Spark is not a hard dependency
>>>> for other engines. We are working with several implementers of the Iceberg
>>>> spec and it has previously been agreed that it would be best if the source
>>>> of truth for Variant existed in an engine- and file-format-neutral location.
>>>> The Iceberg project has a good open model of governance and, as we have
>>>> seen so far discussing Variant
>>>> <https://lists.apache.org/thread/xcyytoypgplfr74klg1z2rgjo6k5b0sq>,
>>>> open and active collaboration. This would also help as we can strictly
>>>> version our changes in line with the rest of the Iceberg spec.
>>>>
>>>> Third, the Shredding spec is not quite finished and requires some group
>>>> analysis and discussion before we commit it. I think again the Iceberg
>>>> community is probably the right place for this to happen as we have already
>>>> started discussions here on these topics.
>>>>
>>>> For these reasons I think we should go with a direct copy of the
>>>> existing specification from the Spark Project and move ahead with our
>>>> discussions and modifications within Iceberg. That said, *I do not
>>>> want to diverge if possible from the Spark proposal*. For example,
>>>> although we do not use the Interval types above, I think we should not
>>>> reuse those type ids within our spec. Iceberg's Variant Spec types 19 and
>>>> 20 would remain unused along with any other types we think are not
>>>> applicable. We should strive whenever possible to allow for compatibility.
>>>>
>>>> In the interest of moving forward with this proposal I am hoping to see
>>>> if anyone in the community objects to this plan going forward or has a
>>>> better alternative.
>>>>
>>>> As always I am thankful for your time and am eager to hear back from
>>>> everyone,
>>>> Russ