What I'm asking is whether we want to have a vote for a substantive change
and not for bug fixes. I'm not suggesting anything beyond that, like there
is some ability for committers and PMC members to not follow the same rules.

While this is a change to a property name, the name that is being fixed
conflicts with both the implementation
<https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L111-L112>
and
the table spec
<https://github.com/apache/iceberg/blob/main/format/spec.md#table-metadata-fields>.
I think that's fairly clearly a bug or typo in the spec that is being
updated. (The generated Python code is an example)

Raising awareness is definitely something we should do. It's also clear
that we want to make this change -- I don't think there are arguments
against it -- and I'm saying we may not want or need a vote to confirm.
Those can be separate things, like sending a [ANNOUNCE] message on this
list rather than having a [VOTE] thread open for 3 days before fixing it.

Again, I'm asking a question about how we want to handle situations like
this moving forward. As I said originally, I think it's fine to have a
vote when people think it's needed.

On Tue, Jul 9, 2024 at 12:44 PM Jack Ye <yezhao...@gmail.com> wrote:

> I agree with Robert here that we need to get into the habit of doing votes
> on the spec changes.
>
> There are typos that could be in sections like description, that does not
> affect the overall spec usage at all, maybe those changes do not need an
> official vote. However, this is a change of an existing field name in a
> data model, and we do not know if there is a dependency of the model
> somewhere. We are assuming that the feature is likely not used, which might
> be true in this case, but might not be true going forward.
>
> Related to this, we should think about officially releasing the spec and
> properly updating global spec versions for this purpose. One important use
> case of a spec is that there are engines that implement against the spec
> directly, instead of using any provided SDKs in Java/Python/Rust. For
> example, we implemented all the format v2 support directly in Amazon
> Redshift against the table v2 spec without using any existing Iceberg
> library. I cannot imagine how such engines could implement against the
> catalog spec if somehow a field name can just change silently, and we don't
> know when we can actually take a dependency on it. A spec release seems
> like the proper way to inform that external parties can start to take a
> dependency on it, and future changes will be backwards compatible in minor
> version updates, or will require a deprecation cycle until the next major
> version update.
>
> -Jack
>
>
>
>
>
> On Tue, Jul 9, 2024 at 11:25 AM Robert Stupp <sn...@snazy.de> wrote:
>
>> I think it is needed, because of the reasons emphasized either by Daniel
>> or you yesterday in the call: people have to be aware of changes in
>> specifications.
>>
>> Maybe I'm alone and maybe it's perceived "pedantic", but I think you
>> missed the point: the rule mentioned in yesterday's call (not sure where
>> it's written tho) is to communicate every spec change. If Iceberg
>> committers and PMCs don't follow this rule, you cannot expect others to do
>> so.
>>
>> This PR changes the schema in the REST spec. Clients that have been
>> implemented relying on the REST spec (I think pyiceberg generates code from
>> it) are impacted. Other implementers might have just relied on the
>> _specification_.
>>
>>
>> On 09.07.24 17:28, Ryan Blue wrote:
>>
>> I think it's fine to have a vote for this if anyone thinks that it is
>> needed. But since this is just fixing the part of the REST spec that 
>> duplicates
>> the table spec and correcting a typo
>> <https://github.com/apache/iceberg/blob/main/format/spec.md#table-metadata-fields>,
>> it seems like more of a correction than a substantive change.
>>
>> On Tue, Jul 9, 2024 at 3:14 AM Robert Stupp <sn...@snazy.de> wrote:
>>
>>> Hi Eduard,
>>>
>>> this needs to be a formal code-change vote, because it's a change to a
>>> spec (this was emphasized during yesterday's call). Can you add some
>>> background about the change?
>>>
>>> Robert
>>>
>>>
>>> On 09.07.24 11:26, Eduard Tudenhöfner wrote:
>>>
>>> Hey everyone,
>>>
>>> I've opened #10662 <https://github.com/apache/iceberg/pull/10662> to
>>> fix property names for statistics / partition statistics in the REST spec.
>>> I can start a separate VOTE thread if there is agreement around the
>>> proposed Spec change.
>>>
>>> Thanks
>>> Eduard
>>>
>>> --
>>> Robert Stupp
>>> @snazy
>>>
>>>
>>
>> --
>> Ryan Blue
>> Databricks
>>
>> --
>> Robert Stupp
>> @snazy
>>
>>

-- 
Ryan Blue
Databricks

Reply via email to