I agree with Robert here that we need to get into the habit of voting on
spec changes.

Typos in sections such as descriptions do not affect how the spec is used
at all, so maybe those changes do not need an official vote. However, this
is a change to an existing field name in a data model, and we do not know
whether anything depends on that model somewhere. We are assuming that the
feature is likely not used, which might be true in this case, but might not
be true going forward.

Related to this, we should think about officially releasing the spec and
properly updating global spec versions for this purpose. One important use
case of a spec is that there are engines that implement against the spec
directly, instead of using any provided SDKs in Java/Python/Rust. For
example, we implemented all the format v2 support directly in Amazon
Redshift against the table v2 spec without using any existing Iceberg
library. I cannot imagine how such engines could implement against the
catalog spec if somehow a field name can just change silently, and we don't
know when we can actually take a dependency on it. A spec release seems
like the proper way to inform external parties that they can start to take
a dependency on it, and that future changes will either be backwards
compatible in minor version updates or require a deprecation cycle until
the next major version update.

-Jack

On Tue, Jul 9, 2024 at 11:25 AM Robert Stupp <sn...@snazy.de> wrote:

> I think it is needed, because of the reasons emphasized either by Daniel
> or you yesterday in the call: people have to be aware of changes in
> specifications.
>
> Maybe I'm alone and maybe it's perceived "pedantic", but I think you
> missed the point: the rule mentioned in yesterday's call (not sure where
> it's written tho) is to communicate every spec change. If Iceberg
> committers and PMCs don't follow this rule, you cannot expect others to do
> so.
>
> This PR changes the schema in the REST spec. Clients that have been
> implemented relying on the REST spec (I think pyiceberg generates code from
> it) are impacted. Other implementers might have just relied on the
> _specification_.
>
>
> On 09.07.24 17:28, Ryan Blue wrote:
>
> I think it's fine to have a vote for this if anyone thinks that it is
> needed. But since this is just fixing the part of the REST spec that
> duplicates the table spec and correcting a typo
> <https://github.com/apache/iceberg/blob/main/format/spec.md#table-metadata-fields>,
> it seems like more of a correction than a substantive change.
>
> On Tue, Jul 9, 2024 at 3:14 AM Robert Stupp <sn...@snazy.de> wrote:
>
>> Hi Eduard,
>>
>> this needs to be a formal code-change vote, because it's a change to a
>> spec (this was emphasized during yesterday's call). Can you add some
>> background about the change?
>>
>> Robert
>>
>>
>> On 09.07.24 11:26, Eduard Tudenhöfner wrote:
>>
>> Hey everyone,
>>
>> I've opened #10662 <https://github.com/apache/iceberg/pull/10662> to fix
>> property names for statistics / partition statistics in the REST spec. I
>> can start a separate VOTE thread if there is agreement around the proposed
>> Spec change.
>>
>> Thanks
>> Eduard
>>
>> --
>> Robert Stupp
>> @snazy
>>
>>
>
> --
> Ryan Blue
> Databricks
>
> --
> Robert Stupp
> @snazy
>
>
