I just want to point out that the Iceberg Improvement Proposal
<https://iceberg.apache.org/contribute/#apache-iceberg-improvement-proposals>
does use the language ". . . or significant additions/changes to any of the
existing Iceberg implementations", which is consistent with other similar
Apache improvement proposals, so that every minor
clarification/fix/adjustment doesn't require a heavy weight process.

I'm sure we can argue what "significant" means in that context, but I'm
fine if we want to hold a vote for something like this just to be
consistent.  We've made small clarifications to the Iceberg table spec
without a vote, but I'm open to either handling these on a case by case
basis or just holding a quick vote.

-Dan



On Tue, Jul 9, 2024 at 1:36 PM Jack Ye <yezhao...@gmail.com> wrote:

> I propose two things:
>
> 1. We officially release the specs with major and minor versions. With
> some experience of implementing against both the table and the catalog
> spec, I would say that such error is not as obvious as we think when there
> are hundreds of models and property fields you need to implement, and you
> just work with the latest text spec at a point of time to implement it.
> Having released spec versions marks that the spec text you are looking at
> is a version that can be consumed, and if there are any changes or fixes in
> existing parts of the spec, it will be in a compatible and predictable way.
>
> 2. We have an official reference implementation of the REST server. There
> have been a lot of debates about it already, but I am feeling increasingly
> strong about it to ensure the quality of the spec. This is another case
> where the error would have been easily caught if there was validation of
> end-to-end client-server interaction using the generated models. We had a
> similar issue previously with error models that the definition was not
> matching the implementation for a long time:
> https://github.com/apache/iceberg/pull/8914. As the spec evolves to be
> increasingly complex, it will be more and more challenging to spot errors
> just by looking at the spec text and the generated model shape without
> actually using it.
>
> -Jack
>
>
> On Tue, Jul 9, 2024 at 1:10 PM Ryan Blue <b...@databricks.com.invalid>
> wrote:
>
>> What I'm asking is whether we want to have a vote for a substantive
>> change and not for bug fixes. I'm not suggesting anything beyond that, like
>> there is some ability for committers and PMC members to not follow the same
>> rules.
>>
>> While this is a change to a property name, the name that is being fixed
>> conflicts with both the implementation
>> <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L111-L112>
>>  and
>> the table spec
>> <https://github.com/apache/iceberg/blob/main/format/spec.md#table-metadata-fields>.
>> I think that's fairly clearly a bug or typo in the spec that is being
>> updated. (The generated Python code is an example)
>>
>> Raising awareness is definitely something we should do. It's also clear
>> that we want to make this change -- I don't think there are arguments
>> against it -- and I'm saying we may not want or need a vote to confirm.
>> Those can be separate things, like sending a [ANNOUNCE] message on this
>> list rather than having a [VOTE] thread open for 3 days before fixing it.
>>
>> Again, I'm asking a question about how we want to handle situations like
>> this moving forward. As I said originally, I think it's fine to have a
>> vote when people think it's needed.
>>
>> On Tue, Jul 9, 2024 at 12:44 PM Jack Ye <yezhao...@gmail.com> wrote:
>>
>>> I agree with Robert here that we need to get into the habit of doing
>>> votes on the spec changes.
>>>
>>> There are typos that could be in sections like description, that does
>>> not affect the overall spec usage at all, maybe those changes do not need
>>> an official vote. However, this is a change of an existing field name in a
>>> data model, and we do not know if there is a dependency of the model
>>> somewhere. We are assuming that the feature is likely not used, which might
>>> be true in this case, but might not be true going forward.
>>>
>>> Related to this, we should think about officially releasing the spec and
>>> properly updating global spec versions for this purpose. One important use
>>> case of a spec is that there are engines that implement against the spec
>>> directly, instead of using any provided SDKs in Java/Python/Rust. For
>>> example, we implemented all the format v2 support directly in Amazon
>>> Redshift against the table v2 spec without using any existing Iceberg
>>> library. I cannot imagine how such engines could implement against the
>>> catalog spec if somehow a field name can just change silently, and we don't
>>> know when we can actually take a dependency on it. A spec release seems
>>> like the proper way to inform that external parties can start to take a
>>> dependency on it, and future changes will be backwards compatible in minor
>>> version updates, or will require a deprecation cycle until the next major
>>> version update.
>>>
>>> -Jack
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jul 9, 2024 at 11:25 AM Robert Stupp <sn...@snazy.de> wrote:
>>>
>>>> I think it is needed, because of the reasons emphasized either by
>>>> Daniel or you yesterday in the call: people have to be aware of changes in
>>>> specifications.
>>>>
>>>> Maybe I'm alone and maybe it's perceived "pedantic", but I think you
>>>> missed the point: the rule mentioned in yesterday's call (not sure where
>>>> it's written tho) is to communicate every spec change. If Iceberg
>>>> committers and PMCs don't follow this rule, you cannot expect others to do
>>>> so.
>>>>
>>>> This PR changes the schema in the REST spec. Clients that have been
>>>> implemented relying on the REST spec (I think pyiceberg generates code from
>>>> it) are impacted. Other implementers might have just relied on the
>>>> _specification_.
>>>>
>>>>
>>>> On 09.07.24 17:28, Ryan Blue wrote:
>>>>
>>>> I think it's fine to have a vote for this if anyone thinks that it is
>>>> needed. But since this is just fixing the part of the REST spec that 
>>>> duplicates
>>>> the table spec and correcting a typo
>>>> <https://github.com/apache/iceberg/blob/main/format/spec.md#table-metadata-fields>,
>>>> it seems like more of a correction than a substantive change.
>>>>
>>>> On Tue, Jul 9, 2024 at 3:14 AM Robert Stupp <sn...@snazy.de> wrote:
>>>>
>>>>> Hi Eduard,
>>>>>
>>>>> this needs to be a formal code-change vote, because it's a change to a
>>>>> spec (this was emphasized during yesterday's call). Can you add some
>>>>> background about the change?
>>>>>
>>>>> Robert
>>>>>
>>>>>
>>>>> On 09.07.24 11:26, Eduard Tudenhöfner wrote:
>>>>>
>>>>> Hey everyone,
>>>>>
>>>>> I've opened #10662 <https://github.com/apache/iceberg/pull/10662> to
>>>>> fix property names for statistics / partition statistics in the REST spec.
>>>>> I can start a separate VOTE thread if there is agreement around the
>>>>> proposed Spec change.
>>>>>
>>>>> Thanks
>>>>> Eduard
>>>>>
>>>>> --
>>>>> Robert Stupp
>>>>> @snazy
>>>>>
>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Databricks
>>>>
>>>> --
>>>> Robert Stupp
>>>> @snazy
>>>>
>>>>
>>
>> --
>> Ryan Blue
>> Databricks
>>
>

Reply via email to