I propose two things: 1. We officially release the specs with major and minor versions. With some experience of implementing against both the table and the catalog spec, I would say that such error is not as obvious as we think when there are hundreds of models and property fields you need to implement, and you just work with the latest text spec at a point of time to implement it. Having released spec versions marks that the spec text you are looking at is a version that can be consumed, and if there are any changes or fixes in existing parts of the spec, it will be in a compatible and predictable way.
2. We have an official reference implementation of the REST server. There have been a lot of debates about it already, but I am feeling increasingly strong about it to ensure the quality of the spec. This is another case where the error would have been easily caught if there was validation of end-to-end client-server interaction using the generated models. We had a similar issue previously with error models that the definition was not matching the implementation for a long time: https://github.com/apache/iceberg/pull/8914. As the spec evolves to be increasingly complex, it will be more and more challenging to spot errors just by looking at the spec text and the generated model shape without actually using it. -Jack On Tue, Jul 9, 2024 at 1:10 PM Ryan Blue <b...@databricks.com.invalid> wrote: > What I'm asking is whether we want to have a vote for a substantive change > and not for bug fixes. I'm not suggesting anything beyond that, like there > is some ability for committers and PMC members to not follow the same rules. > > While this is a change to a property name, the name that is being fixed > conflicts with both the implementation > <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L111-L112> > and > the table spec > <https://github.com/apache/iceberg/blob/main/format/spec.md#table-metadata-fields>. > I think that's fairly clearly a bug or typo in the spec that is being > updated. (The generated Python code is an example) > > Raising awareness is definitely something we should do. It's also clear > that we want to make this change -- I don't think there are arguments > against it -- and I'm saying we may not want or need a vote to confirm. > Those can be separate things, like sending a [ANNOUNCE] message on this > list rather than having a [VOTE] thread open for 3 days before fixing it. > > Again, I'm asking a question about how we want to handle situations like > this moving forward. As I said originally, I think it's fine to have a > vote when people think it's needed. > > On Tue, Jul 9, 2024 at 12:44 PM Jack Ye <yezhao...@gmail.com> wrote: > >> I agree with Robert here that we need to get into the habit of doing >> votes on the spec changes. >> >> There are typos that could be in sections like description, that does not >> affect the overall spec usage at all, maybe those changes do not need an >> official vote. However, this is a change of an existing field name in a >> data model, and we do not know if there is a dependency of the model >> somewhere. We are assuming that the feature is likely not used, which might >> be true in this case, but might not be true going forward. >> >> Related to this, we should think about officially releasing the spec and >> properly updating global spec versions for this purpose. One important use >> case of a spec is that there are engines that implement against the spec >> directly, instead of using any provided SDKs in Java/Python/Rust. For >> example, we implemented all the format v2 support directly in Amazon >> Redshift against the table v2 spec without using any existing Iceberg >> library. I cannot imagine how such engines could implement against the >> catalog spec if somehow a field name can just change silently, and we don't >> know when we can actually take a dependency on it. A spec release seems >> like the proper way to inform that external parties can start to take a >> dependency on it, and future changes will be backwards compatible in minor >> version updates, or will require a deprecation cycle until the next major >> version update. >> >> -Jack >> >> >> >> >> >> On Tue, Jul 9, 2024 at 11:25 AM Robert Stupp <sn...@snazy.de> wrote: >> >>> I think it is needed, because of the reasons emphasized either by Daniel >>> or you yesterday in the call: people have to be aware of changes in >>> specifications. >>> >>> Maybe I'm alone and maybe it's perceived "pedantic", but I think you >>> missed the point: the rule mentioned in yesterday's call (not sure where >>> it's written tho) is to communicate every spec change. If Iceberg >>> committers and PMCs don't follow this rule, you cannot expect others to do >>> so. >>> >>> This PR changes the schema in the REST spec. Clients that have been >>> implemented relying on the REST spec (I think pyiceberg generates code from >>> it) are impacted. Other implementers might have just relied on the >>> _specification_. >>> >>> >>> On 09.07.24 17:28, Ryan Blue wrote: >>> >>> I think it's fine to have a vote for this if anyone thinks that it is >>> needed. But since this is just fixing the part of the REST spec that >>> duplicates >>> the table spec and correcting a typo >>> <https://github.com/apache/iceberg/blob/main/format/spec.md#table-metadata-fields>, >>> it seems like more of a correction than a substantive change. >>> >>> On Tue, Jul 9, 2024 at 3:14 AM Robert Stupp <sn...@snazy.de> wrote: >>> >>>> Hi Eduard, >>>> >>>> this needs to be a formal code-change vote, because it's a change to a >>>> spec (this was emphasized during yesterday's call). Can you add some >>>> background about the change? >>>> >>>> Robert >>>> >>>> >>>> On 09.07.24 11:26, Eduard Tudenhöfner wrote: >>>> >>>> Hey everyone, >>>> >>>> I've opened #10662 <https://github.com/apache/iceberg/pull/10662> to >>>> fix property names for statistics / partition statistics in the REST spec. >>>> I can start a separate VOTE thread if there is agreement around the >>>> proposed Spec change. >>>> >>>> Thanks >>>> Eduard >>>> >>>> -- >>>> Robert Stupp >>>> @snazy >>>> >>>> >>> >>> -- >>> Ryan Blue >>> Databricks >>> >>> -- >>> Robert Stupp >>> @snazy >>> >>> > > -- > Ryan Blue > Databricks >