> An alternative view: Would it make sense to start releasing the table
specification on a regular cadence (e.g. quarterly, every 6 months or
yearly)?

I have been a big advocate for releasing all the Iceberg specs regularly,
following a normal product release cycle with major and minor
releases. I touched on some of the reasoning in the thread about fixing stats
fields in the REST spec [1]. This helps a lot with engines that do not use any
Iceberg open source library and just read a spec and implement it. With
regular releases, they have a stable version to refer to, rather than
a spec that keeps changing within the same version.

It is important to note that implementations would not branch on minor spec
versions the way they currently switch behavior depending on major
versions. Minor versions would exist purely to make more incremental
progress on the spec and to provide stable spec versions for other
reference implementations. Otherwise, the branches in the codebase to
handle different versions easily get out of control.

I think Fokko brought up a point that "this will introduce a process that
will slow the evolution down", which is true, because each release takes
additional effort. And without a reference implementation,
it is hard to say if the spec is mature enough to be released, which again
potentially ties it to the release cycle of at least the Java library.

Curious what people think.

Best,
Jack Ye

[1] https://lists.apache.org/thread/v6x772v9sgo0xhpwmh4br756zhbgomtf

On Wed, Jul 31, 2024 at 10:19 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> It sounds like most of the opinions so far favor waiting for the scope of
> work to finish before finalizing the specification.
>
> An alternative view: Would it make sense to start releasing the table
> specification on a regular cadence (e.g. quarterly, every 6 months or
> yearly)?  I think the problem with waiting for features to get in is that
> priorities change and things take longer than expected, which leaves the
> actual finalization of the specification in limbo and probably adds to
> project management overhead.  If the specification is released regularly,
> features can always be included in the next release, hopefully without
> too much delay.  The main downside I can think of in this
> approach is having to have more branches in code to handle different
> versions.
>
> One corollary to this approach is spec changes shouldn't be merged before
> their implementations are ready.
>
>   - At least one complete reference implementation should exist.
>
>
> For more complicated features I think at some point soon it might be worth
> considering two implementations (or at least 1 full implementation and 1
> read only implementation) to make sure there aren't compatibility
> issues/misunderstandings in the specification (e.g. I think Variant and
> Geography fall into this category).
>
> Cheers,
> Micah
>
> On Wed, Jul 31, 2024 at 12:47 PM Russell Spitzer <
> russell.spit...@gmail.com> wrote:
>
>> I think this all sounds good, the real question is whether or not we have
>> someone to actively work on the proposals. I think for things like Default
>> Values and Geo Types we have folks actively working on them so it's not a
>> big deal.
>>
>> On Wed, Jul 31, 2024 at 2:09 PM Szehon Ho <szehon.apa...@gmail.com>
>> wrote:
>>
>>> Sorry I missed the sync this morning (sick), I'd like to push for geo
>>> too.
>>>
>>> On this front, as per the last sync, Ryan recommended waiting for
>>> Parquet support to land, to avoid having two versions on Iceberg side
>>> (Iceberg-native vs Parquet-native).  Parquet support is being actively
>>> worked on iiuc: https://github.com/apache/parquet-format/pull/240 .
>>> But it would bind V3 to the parquet-format release timeline, unless we
>>> start with iceberg-native support first and move later (as we originally
>>> proposed).
>>>
>>> Thanks,
>>> Szehon
>>>
>>> On Wed, Jul 31, 2024 at 10:58 AM Walaa Eldin Moustafa <
>>> wa.moust...@gmail.com> wrote:
>>>
>>>> Another feature that was planned for V3 is support for default values.
>>>> Spec doc update was already merged a while ago [1]. Implementation is
>>>> ongoing in this PR [2].
>>>>
>>>> [1] https://iceberg.apache.org/spec/#default-values
>>>> [2] https://github.com/apache/iceberg/pull/9502
>>>>
>>>> Thanks,
>>>> Walaa.
>>>>
>>>> On Wed, Jul 31, 2024 at 10:52 AM Russell Spitzer
>>>> <russell.spit...@gmail.com> wrote:
>>>> >
>>>> > Thanks for bringing this up. From my perspective, I have time to
>>>> > really push through hopefully two things:
>>>> >
>>>> > Variant Type and
>>>> > Row Lineage (which I will have a proposal for on the mailing list
>>>> > next week)
>>>> >
>>>> > I'm using the Project to try to track logistics and minutiae required
>>>> > for the new spec version, but I would like to bring other work in there
>>>> > as well so we can get a clear picture of what is actually being actively
>>>> > worked on.
>>>> >
>>>> > On Wed, Jul 31, 2024 at 12:27 PM Jacob Marble <
>>>> jacobmar...@influxdata.com> wrote:
>>>> >>
>>>> >> Good morning,
>>>> >>
>>>> >> This is to continue the discussion from today's community sync, where
>>>> >> format version 3 was discussed.
>>>> >>
>>>> >> Questions answered by consensus:
>>>> >> - Format version releases should _not_ be tied to Iceberg version
>>>> >>   releases.
>>>> >> - Several planned features will require format version releases; the
>>>> >>   process shouldn't be onerous.
>>>> >>
>>>> >> Unanswered questions:
>>>> >> - What will be included in format version 3?
>>>> >>   - What is a reasonable target date?
>>>> >>   - How to track progress? Today, there are two public lists:
>>>> >>     - GH milestone: https://github.com/apache/iceberg/milestone/42
>>>> >>     - GH project: https://github.com/orgs/apache/projects/377
>>>> >> - What is required of a feature in order to be included in any
>>>> >>   adopted format version?
>>>> >>   - At least one complete reference implementation should exist.
>>>> >>     - Java is the reference implementation by convention; that's OK,
>>>> >>       but not perfect. Should Java be the reference implementation by
>>>> >>       mandate?
>>>> >>
>>>> >> Have I missed anything?
>>>> >>
>>>> >> --
>>>> >> Jacob Marble
>>>>
>>>
