FWIW, iceberg-cpp also produces a date type for the day transform, so
we are happy with the consensus here.
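
For readers skimming the thread, here is a minimal sketch of the two manifest encodings under discussion and of a reader that tolerates both. It is an illustration only, not code from iceberg-cpp or any other implementation: the helper names (`day_transform`, `physical_type`, `decode_day`) are hypothetical, and it assumes the partition field schema is a bare Avro `int` with an optional `logicalType` annotation (no union wrapping).

```python
import datetime
import json

EPOCH = datetime.date(1970, 1, 1)

# The two Avro schemas quoted in the thread for a `day` partition field.
PLAIN_INT = json.loads('{ "type": "int" }')
DATE = json.loads('{ "type": "int", "logicalType": "date" }')


def day_transform(d: datetime.date) -> int:
    """Hypothetical helper: days since 1970-01-01, the value both schemas store."""
    return (d - EPOCH).days


def physical_type(schema) -> str:
    """Return the underlying Avro type, ignoring any logicalType annotation."""
    if isinstance(schema, str):
        return schema
    return schema["type"]


def decode_day(schema, raw: int) -> datetime.date:
    """Decode a day partition value written under either schema.

    Only the physical type is checked, so plain int and int annotated
    as date are read the same way.
    """
    if physical_type(schema) != "int":
        raise ValueError(f"unexpected partition schema: {schema!r}")
    return EPOCH + datetime.timedelta(days=raw)


if __name__ == "__main__":
    value = day_transform(datetime.date(2026, 5, 23))
    # The stored integer is identical; only the schema annotation differs.
    print(decode_day(PLAIN_INT, value) == decode_day(DATE, value))
```

The point of the sketch is that the writer-side choice only changes the schema annotation, so a reader that keys off the physical type handles historical plain-int manifests without special casing.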

On Sat, May 23, 2026 at 12:14 AM Kevin Liu <[email protected]> wrote:
>
> Good to know about the Avro spec behavior, thanks Ryan.
>
> And thank you Andrei for driving the spec clarification. I'll comment on the 
> PR. I don't think we need a vote since this is a clarification and not a 
> change.
>
> On Thu, May 21, 2026 at 1:42 PM Andrei Tserakhau via dev 
> <[email protected]> wrote:
>>
>> Thanks Kevin, Fokko, and Ryan, looks like we've converged.
>>
>> Summary of where this lands:
>>
>>   - Result type for day becomes date, matching Java/PyIceberg/Rust's
>>   default behavior and the Avro types table in Appendix A.
>>   - Reader tolerance for historical plain-int manifests is inherited
>>   from the Avro spec itself (thanks Ryan for surfacing that; it saves
>>   us an Iceberg-side MUST clause).
>>   - A short note is added under the partition transforms table
>>   capturing the historical context, so this doesn't get re-litigated
>>   the next time someone reads the spec without the back-story.
>>
>> PR is updated accordingly: https://github.com/apache/iceberg/pull/16446
>>
>> Fokko, Kevin, Ryan -- would appreciate a look when you have a moment.
>> Happy to iterate further on the note wording if anything reads off.
>>
>> For iceberg-go, I'll follow up with the writer + reader alignment
>> (PR #915 in iceberg-go is already in flight) once the spec change
>> lands.
>>
>> Best,
>> Andrei
>>
>> On Thu, May 21, 2026 at 9:41 PM Ryan Blue <[email protected]> wrote:
>>>
>>> Ugh, I think I sent from the wrong email address and my reply didn't go 
>>> through.
>>>
>>> Other people have covered the same things here, except for one point: the 
>>> Avro spec states that readers that don't support an annotation are required 
>>> to ignore it. So the behavior to read either date or int correctly is 
>>> inherited from the Avro spec.
>>>
>>> Ryan
>>>
>>> On Thu, May 21, 2026 at 10:17 AM Kevin Liu <[email protected]> wrote:
>>>>
>>>> I wasn’t aware of the previous back-and-forth changes to this line in the 
>>>> spec. Thanks for the extra context!
>>>>
>>>> A couple of points I want to align on:
>>>> 1. All implementations except Go, including Java, Python, and Rust, write 
>>>> the day transform result as an Iceberg date type. That maps to the Avro 
>>>> date type and is serialized as { "type": "int", "logicalType": "date" }.
>>>> 2. The Go implementation writes the day transform result as an Iceberg int 
>>>> type. That maps to the Avro int type and is serialized as { "type": "int" }.
>>>> 3. Java, Python, and Rust can read Avro manifest partition values as 
>>>> either an Avro int type or an Avro date type.
>>>> 4. The Go implementation can currently read Avro manifest partition values 
>>>> only as an Avro int type. This is the original issue that sparked this 
>>>> conversation.
>>>>
>>>> Since the spec has gone back and forth between writing this as an Iceberg 
>>>> int and an Iceberg date, I think readers must accept both. We can include 
>>>> that as an implementation note.
>>>>
>>>> I support changing the spec back to date so it matches the default 
>>>> behavior for day partition values in our implementations. Go is also 
>>>> making the change to write date instead of int.
>>>> The other approach, updating all implementations to match the current 
>>>> spec, would be a lot of work for little value.
>>>>
>>>> Hopefully this is the last time we make this change to the spec :)
>>>> Would love to hear from others.
>>>>
>>>> Best,
>>>> Kevin Liu
>>>>
>>>> On Wed, May 20, 2026 at 10:39 AM Fokko Driesprong <[email protected]> wrote:
>>>>>
>>>>> > It wouldn't be the first time we've retroactively updated the spec when 
>>>>> > finding inconsistencies with the current implementations :P
>>>>>
>>>>> I think generally we try to avoid this, but in this case it was changed 
>>>>> a few times :P Maybe we should revert the spec change:
>>>>>
>>>>> https://github.com/apache/iceberg/pull/5980/changes#diff-36347a47c3bf67ea2ef6309ea96201814032d21bb5f162dfae4045508c15588a
>>>>>
>>>>> Curious to hear what others think.
>>>>>
>>>>> Kind regards,
>>>>> Fokko
>>>>>
>>>>>
>>>>> On 2026/05/20 17:24:22 Matt Topol wrote:
>>>>> > It wouldn't be the first time we've retroactively updated the spec
>>>>> > when finding inconsistencies with the current implementations :P
>>>>> >
>>>>> > Particularly, in this case even the "reference implementation" (i.e.
>>>>> > Java) is technically not spec-compliant since the spec says that it
>>>>> > should be an "int", not an Avro "date" type. If all the
>>>>> > implementations currently write a "date" type, then it's silly to have
>>>>> > to say that every implementation is violating the spec.
>>>>> >
>>>>> > If we want the spec to say it should be an int, but tolerate reading
>>>>> > an Avro "date" type, that's fine. But that would mean we should update
>>>>> > Java, Rust, and PyIceberg to all write plain "int" and no longer write
>>>>> > the "date" type, again: it would be silly to say that the reference
>>>>> > implementation and 2 other implementations are not following the spec.
>>>>> > :P
>>>>> >
>>>>> > I agree that it would be a big change for little value to update the
>>>>> > implementations, so my opinion is that the spec should be updated to
>>>>> > either say that "either" is allowed to be written, or that "date"
>>>>> > should be written but "int" should be allowed to be read.
>>>>> >
>>>>> > --Matt
>>>>> >
>>>>> > On Wed, May 20, 2026 at 1:05 PM Fokko Driesprong <[email protected]> 
>>>>> > wrote:
>>>>> > >
>>>>> > > Thanks for the quick PR Andrei.
>>>>> > >
>>>>> > > The problem is that the note conflicts with the Avro/Iceberg types 
>>>>> > > table: https://iceberg.apache.org/spec/#avro
>>>>> > >
>>>>> > > I don't think we want to update the implementations as I agree that 
>>>>> > > it would be a big change for little value. At the same time, I don't 
>>>>> > > think we can retroactively update the spec. Maybe an implementation 
>>>>> > > note would be a better solution to halt the tradition?
>>>>> > >
>>>>> > > Kind regards,
>>>>> > > Fokko
>>>>> > >
>>>>> > >
>>>>> > > On 2026/05/20 16:49:29 Andrei Tserakhau via dev wrote:
>>>>> > > > Thanks, Fokko, for the historical context!
>>>>> > > >
>>>>> > > > Quick check that we're aligned, since I think we may be closer than
>>>>> > > > it reads:
>>>>> > > >
>>>>> > > > My PR leaves the result type table as `int` -- no change to the
>>>>> > > > transform table, no impact on hour/month/etc., no change to the
>>>>> > > > type model.
>>>>> > > >
>>>>> > > > What the PR clarifies is the Avro encoding used when serializing a
>>>>> > > > `day` partition field into a manifest. Empirically today, Java,
>>>>> > > > PyIceberg, and Rust all write
>>>>> > > > `{ "type": "int", "logicalType": "date" }`
>>>>> > > > there (TypeToSchema in Java, DayTransform.result_type in PyIceberg,
>>>>> > > > Transform::Day.result_type in Rust all produce a Date). Only
>>>>> > > > iceberg-go produces plain Avro `int`. The PR codifies the de facto
>>>>> > > > writer behavior as SHOULD and makes reader tolerance MUST.
>>>>> > > >
>>>>> > > > If your "stick with int" also covers the Avro annotation, then we'd
>>>>> > > > effectively be reverting three writers and orphaning every existing
>>>>> > > > manifest, which I don't think is a decent path; it's quite a big change
>>>>> > > > for small benefits.
>>>>> > > >
>>>>> > > > Either way, I'm super happy to adjust the spec wording. The goal is to
>>>>> > > > stop this tradition of re-litigating the issue every year because
>>>>> > > > someone misread this part of the spec.
>>>>> > > >
>>>>> > > > Best,
>>>>> > > > Andrei
>>>>> > > >
>>>>> > > > On Wed, May 20, 2026 at 6:37 PM Fokko Driesprong <[email protected]> 
>>>>> > > > wrote:
>>>>> > > >
>>>>> > > > > Thanks for bringing this up Kevin, a gift that keeps on giving :)
>>>>> > > > > https://github.com/apache/iceberg/issues/10616#issuecomment-2200191427
>>>>> > > > >
>>>>> > > > > 1. I think we should stick with the int type as defined in the spec.
>>>>> > > > > 2. It feels to me that some readers are more permissive here than
>>>>> > > > > others. I believe some allow reading date as an int without
>>>>> > > > > throwing. Practically, readers should read both.
>>>>> > > > > 3. Unfortunately, I think this is water under the bridge. As shown
>>>>> > > > > above in the GitHub Issue, we went back and forth, so I don't see
>>>>> > > > > a lot of value in switching this to date. All OSS implementations
>>>>> > > > > handle this as an int internally, and this also aligns with
>>>>> > > > > hour/month/etc.
>>>>> > > > >
>>>>> > > > > Hope this historical context helps.
>>>>> > > > >
>>>>> > > > > Kind regards,
>>>>> > > > > Fokko
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > On 2026/05/20 16:33:51 Andrei Tserakhau via dev wrote:
>>>>> > > > > > Here is a fast follow with a PR:
>>>>> > > > > > https://github.com/apache/iceberg/pull/16446
>>>>> > > > > >
>>>>> > > > > > Best,
>>>>> > > > > > Andrei
>>>>> > > > > >
>>>>> > > > > > On Wed, May 20, 2026 at 6:11 PM Andrei Tserakhau <
>>>>> > > > > > [email protected]> wrote:
>>>>> > > > > >
>>>>> > > > > > > Thanks for raising this, Kevin.
>>>>> > > > > > >
>>>>> > > > > > > Speaking as an iceberg-go maintainer, even though Go is the
>>>>> > > > > > > implementation that has to move, I'd vote:
>>>>> > > > > > >
>>>>> > > > > > > 1. Writers SHOULD emit { "type": "int", "logicalType": "date" }.
>>>>> > > > > > > 2. Readers MUST accept both plain `int` and `int` annotated with
>>>>> > > > > > >    `logicalType: date`.
>>>>> > > > > > > 3. Keep the transform result type table as-is (`int` as the
>>>>> > > > > > >    logical Iceberg type). Don't change it to `date`. Add a
>>>>> > > > > > >    separate, normative manifest-encoding clause so projection
>>>>> > > > > > >    and expression-evaluation semantics that depend on the type
>>>>> > > > > > >    model stay untouched.
>>>>> > > > > > >
>>>>> > > > > > > Reasoning: when Java, PyIceberg, and Rust all write logical
>>>>> > > > > > > `date`, that's the de facto wire format. Forcing them to switch
>>>>> > > > > > > to plain `int` to match a literal reading of the transform table
>>>>> > > > > > > would churn three implementations and leave every existing
>>>>> > > > > > > manifest "non-conforming" forever. Aligning Go with the dominant
>>>>> > > > > > > writer convention costs one implementation change (PR #915
>>>>> > > > > > > already proposes it) and zero historical churn.
>>>>> > > > > > >
>>>>> > > > > > > The underlying ambiguity is that "result type" (logical Iceberg
>>>>> > > > > > > type) and "Avro manifest encoding" (wire format) were conflated.
>>>>> > > > > > > Separating them in spec text removes the ambiguity without
>>>>> > > > > > > changing the type system.
>>>>> > > > > > >
>>>>> > > > > > > Happy to drive the spec PR and then iceberg-go writer + reader
>>>>> > > > > > > alignment.
>>>>> > > > > > >
>>>>> > > > > > > Best,
>>>>> > > > > > > Andrei
>>>>> > > > > > >
>>>>> > > > > > > On Tue, May 19, 2026 at 5:45 PM Kevin Liu <[email protected]> wrote:
>>>>> > > > > > >
>>>>> > > > > > >> Hi all,
>>>>> > > > > > >>
>>>>> > > > > > >> I'd like to invite the community to discuss a spec ambiguity in
>>>>> > > > > > >> Apache Iceberg that has caused some confusion across
>>>>> > > > > > >> implementations. We've seen this come up in Python, Rust, and
>>>>> > > > > > >> now Go.
>>>>> > > > > > >>
>>>>> > > > > > >> The issue: the spec documents the `day` partition transform's
>>>>> > > > > > >> result type as plain `int`, but Java, PyIceberg, and Rust all
>>>>> > > > > > >> write manifest partition fields using Avro's logical `date`
>>>>> > > > > > >> type. Go currently writes plain `int`, which is the strict
>>>>> > > > > > >> reading of the spec. Since both forms have the same physical
>>>>> > > > > > >> representation, the difference is only the Avro schema
>>>>> > > > > > >> annotation -- but it's worth clarifying the spec so all
>>>>> > > > > > >> implementations are aligned.
>>>>> > > > > > >>
>>>>> > > > > > >> The full analysis, including a breakdown of each
>>>>> > > > > > >> implementation's writer/reader behavior and proposed resolution
>>>>> > > > > > >> options, is here:
>>>>> > > > > > >> https://github.com/apache/iceberg/issues/16414
>>>>> > > > > > >>
>>>>> > > > > > >> At a high level, the questions for the community are:
>>>>> > > > > > >> 1. What should implementations write: Avro `int` (plain
>>>>> > > > > > >> integer) or Avro `date` (integer with a date logical type)?
>>>>> > > > > > >> 2. Should implementations be required to read both forms, or
>>>>> > > > > > >> just encouraged to?
>>>>> > > > > > >> 3. Should the spec's transform result type table be updated
>>>>> > > > > > >> from `int` to `date`?
>>>>> > > > > > >>
>>>>> > > > > > >> I'd love to hear your thoughts. Thanks!
>>>>> > > > > > >>
>>>>> > > > > > >> Best,
>>>>> > > > > > >> Kevin Liu
>>>>> > > > > > >>
>>>>> > > > > > >
>>>>> > > > > >
>>>>> > > > >
>>>>> > > >
>>>>> >
