I just want to point out that the Iceberg Improvement Proposal <https://iceberg.apache.org/contribute/#apache-iceberg-improvement-proposals> does use the language ". . . or significant additions/changes to any of the existing Iceberg implementations", which is consistent with other similar Apache improvement proposals, so that every minor clarification/fix/adjustment doesn't require a heavy weight process.
I'm sure we can argue what "significant" means in that context, but I'm fine if we want to hold a vote for something like this just to be consistent. We've made small clarifications to the Iceberg table spec without a vote, but I'm open to either handling these on a case by case basis or just holding a quick vote. -Dan On Tue, Jul 9, 2024 at 1:36 PM Jack Ye <yezhao...@gmail.com> wrote: > I propose two things: > > 1. We officially release the specs with major and minor versions. With > some experience of implementing against both the table and the catalog > spec, I would say that such error is not as obvious as we think when there > are hundreds of models and property fields you need to implement, and you > just work with the latest text spec at a point of time to implement it. > Having released spec versions marks that the spec text you are looking at > is a version that can be consumed, and if there are any changes or fixes in > existing parts of the spec, it will be in a compatible and predictable way. > > 2. We have an official reference implementation of the REST server. There > have been a lot of debates about it already, but I am feeling increasingly > strong about it to ensure the quality of the spec. This is another case > where the error would have been easily caught if there was validation of > end-to-end client-server interaction using the generated models. We had a > similar issue previously with error models that the definition was not > matching the implementation for a long time: > https://github.com/apache/iceberg/pull/8914. As the spec evolves to be > increasingly complex, it will be more and more challenging to spot errors > just by looking at the spec text and the generated model shape without > actually using it. > > -Jack > > > On Tue, Jul 9, 2024 at 1:10 PM Ryan Blue <b...@databricks.com.invalid> > wrote: > >> What I'm asking is whether we want to have a vote for a substantive >> change and not for bug fixes. I'm not suggesting anything beyond that, like >> there is some ability for committers and PMC members to not follow the same >> rules. >> >> While this is a change to a property name, the name that is being fixed >> conflicts with both the implementation >> <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L111-L112> >> and >> the table spec >> <https://github.com/apache/iceberg/blob/main/format/spec.md#table-metadata-fields>. >> I think that's fairly clearly a bug or typo in the spec that is being >> updated. (The generated Python code is an example) >> >> Raising awareness is definitely something we should do. It's also clear >> that we want to make this change -- I don't think there are arguments >> against it -- and I'm saying we may not want or need a vote to confirm. >> Those can be separate things, like sending a [ANNOUNCE] message on this >> list rather than having a [VOTE] thread open for 3 days before fixing it. >> >> Again, I'm asking a question about how we want to handle situations like >> this moving forward. As I said originally, I think it's fine to have a >> vote when people think it's needed. >> >> On Tue, Jul 9, 2024 at 12:44 PM Jack Ye <yezhao...@gmail.com> wrote: >> >>> I agree with Robert here that we need to get into the habit of doing >>> votes on the spec changes. >>> >>> There are typos that could be in sections like description, that does >>> not affect the overall spec usage at all, maybe those changes do not need >>> an official vote. However, this is a change of an existing field name in a >>> data model, and we do not know if there is a dependency of the model >>> somewhere. We are assuming that the feature is likely not used, which might >>> be true in this case, but might not be true going forward. >>> >>> Related to this, we should think about officially releasing the spec and >>> properly updating global spec versions for this purpose. One important use >>> case of a spec is that there are engines that implement against the spec >>> directly, instead of using any provided SDKs in Java/Python/Rust. For >>> example, we implemented all the format v2 support directly in Amazon >>> Redshift against the table v2 spec without using any existing Iceberg >>> library. I cannot imagine how such engines could implement against the >>> catalog spec if somehow a field name can just change silently, and we don't >>> know when we can actually take a dependency on it. A spec release seems >>> like the proper way to inform that external parties can start to take a >>> dependency on it, and future changes will be backwards compatible in minor >>> version updates, or will require a deprecation cycle until the next major >>> version update. >>> >>> -Jack >>> >>> >>> >>> >>> >>> On Tue, Jul 9, 2024 at 11:25 AM Robert Stupp <sn...@snazy.de> wrote: >>> >>>> I think it is needed, because of the reasons emphasized either by >>>> Daniel or you yesterday in the call: people have to be aware of changes in >>>> specifications. >>>> >>>> Maybe I'm alone and maybe it's perceived "pedantic", but I think you >>>> missed the point: the rule mentioned in yesterday's call (not sure where >>>> it's written tho) is to communicate every spec change. If Iceberg >>>> committers and PMCs don't follow this rule, you cannot expect others to do >>>> so. >>>> >>>> This PR changes the schema in the REST spec. Clients that have been >>>> implemented relying on the REST spec (I think pyiceberg generates code from >>>> it) are impacted. Other implementers might have just relied on the >>>> _specification_. >>>> >>>> >>>> On 09.07.24 17:28, Ryan Blue wrote: >>>> >>>> I think it's fine to have a vote for this if anyone thinks that it is >>>> needed. But since this is just fixing the part of the REST spec that >>>> duplicates >>>> the table spec and correcting a typo >>>> <https://github.com/apache/iceberg/blob/main/format/spec.md#table-metadata-fields>, >>>> it seems like more of a correction than a substantive change. >>>> >>>> On Tue, Jul 9, 2024 at 3:14 AM Robert Stupp <sn...@snazy.de> wrote: >>>> >>>>> Hi Eduard, >>>>> >>>>> this needs to be a formal code-change vote, because it's a change to a >>>>> spec (this was emphasized during yesterday's call). Can you add some >>>>> background about the change? >>>>> >>>>> Robert >>>>> >>>>> >>>>> On 09.07.24 11:26, Eduard Tudenhöfner wrote: >>>>> >>>>> Hey everyone, >>>>> >>>>> I've opened #10662 <https://github.com/apache/iceberg/pull/10662> to >>>>> fix property names for statistics / partition statistics in the REST spec. >>>>> I can start a separate VOTE thread if there is agreement around the >>>>> proposed Spec change. >>>>> >>>>> Thanks >>>>> Eduard >>>>> >>>>> -- >>>>> Robert Stupp >>>>> @snazy >>>>> >>>>> >>>> >>>> -- >>>> Ryan Blue >>>> Databricks >>>> >>>> -- >>>> Robert Stupp >>>> @snazy >>>> >>>> >> >> -- >> Ryan Blue >> Databricks >> >