It sounds like most of the opinions so far favor waiting for the scoped
work to finish before finalizing the specification.

An alternative view: would it make sense to start releasing the table
specification on a regular cadence (e.g. quarterly, every 6 months, or
yearly)? I think the problem with waiting for features to get in is that
priorities change and things take longer than expected, which leaves the
actual finalization of the specification in limbo and probably adds to
project management overhead. If the specification is released regularly,
then features can always be included in the next release, hopefully
without too much delay. The main downside I can think of in this approach
is needing more branches in code to handle different versions.

One corollary to this approach is that spec changes shouldn't be merged
before their implementations are ready.

>   - At least one complete reference implementation should exist.

For more complicated features, I think at some point soon it might be worth
considering two implementations (or at least one full implementation and
one read-only implementation) to make sure there aren't compatibility
issues or misunderstandings in the specification (e.g. I think Variant and
Geography fall into this category).

Cheers,
Micah

On Wed, Jul 31, 2024 at 12:47 PM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> I think this all sounds good, the real question is whether or not we have
> someone to actively work on the proposals. I think for things like Default
> Values and Geo Types we have folks actively working on them so it's not a
> big deal.
>
> On Wed, Jul 31, 2024 at 2:09 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>
>> Sorry I missed the sync this morning (sick), I'd like to push for geo
>> too.
>>
>> I think on this front as per the last sync, Ryan recommended to wait for
>> Parquet support to land, to avoid having two versions on Iceberg side
>> (Iceberg-native vs Parquet-native).  Parquet support is being actively
>> worked on iiuc: https://github.com/apache/parquet-format/pull/240 .  But
>> it would bind V3 to the parquet-format release timeline, unless we start
>> with iceberg-native support first and move later (as we originally
>> proposed).
>>
>> Thanks,
>> Szehon
>>
>> On Wed, Jul 31, 2024 at 10:58 AM Walaa Eldin Moustafa <
>> wa.moust...@gmail.com> wrote:
>>
>>> Another feature that was planned for V3 is support for default values.
>>> Spec doc update was already merged a while ago [1]. Implementation is
>>> ongoing in this PR [2].
>>>
>>> [1] https://iceberg.apache.org/spec/#default-values
>>> [2] https://github.com/apache/iceberg/pull/9502
>>>
>>> Thanks,
>>> Walaa.
>>>
>>> On Wed, Jul 31, 2024 at 10:52 AM Russell Spitzer
>>> <russell.spit...@gmail.com> wrote:
>>> >
>>> > Thanks for bringing this up. From my perspective, I have time to
>>> > really push through hopefully two things:
>>> >
>>> > - Variant Type
>>> > - Row Lineage (which I will have a proposal for on the mailing list
>>> >   next week)
>>> >
>>> > I'm using the Project to try to track logistics and minutiae required
>>> > for the new spec version, but I would like to bring other work in there
>>> > as well so we can get a clear picture of what is actually being
>>> > actively worked on.
>>> >
>>> > On Wed, Jul 31, 2024 at 12:27 PM Jacob Marble <
>>> jacobmar...@influxdata.com> wrote:
>>> >>
>>> >> Good morning,
>>> >>
>>> >> Following up on today's community sync, where format version 3 was
>>> >> discussed.
>>> >>
>>> >> Questions answered by consensus:
>>> >> - Format version releases should _not_ be tied to Iceberg version
>>> >>   releases.
>>> >> - Several planned features will require format version releases; the
>>> >>   process shouldn't be onerous.
>>> >>
>>> >> Unanswered questions:
>>> >> - What will be included in format version 3?
>>> >>   - What is a reasonable target date?
>>> >>   - How to track progress? Today, there are two public lists:
>>> >>     - GH milestone: https://github.com/apache/iceberg/milestone/42
>>> >>     - GH project: https://github.com/orgs/apache/projects/377
>>> >> - What is required of a feature in order to be included in any
>>> >>   adopted format version?
>>> >>   - At least one complete reference implementation should exist.
>>> >>     - Java is the reference implementation by convention; that's OK,
>>> >>       but not perfect. Should Java be the reference implementation by
>>> >>       mandate?
>>> >>
>>> >> Have I missed anything?
>>> >>
>>> >> --
>>> >> Jacob Marble
>>>
>>
