Hi,

This would not be retrofitting existing partition transforms, but just
allowing for the creation of new multi-arg transforms.  Is the concern that
some implementations are never expecting new transforms to be added?  Old
implementations would indeed not be able to read Iceberg tables created
with the new transforms (this is the case even today without allowing
multi-arg transforms).

For the change, there were discussions about this from:
1. Github:  https://github.com/apache/iceberg/issues/8258 (author made an
end-to-end reference implementation there for a sample transform)
2. Google Doc Dicussion:
https://docs.google.com/document/d/1aDoZqRgvDOOUVAGhvKZbp5vFstjsAMY4EFCyjlxpaaw/edit#heading=h.si1nr6ftu79b
3. August 2023 meetup :
https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit#heading=h.5jsm89ozy58
(there was a general consensus there)

As there seemed to be enough consensus at the design discussion and meetup,
we did not think to go for a vote.  Though I realize this will have missed
some in the community who did not attend.  I am still not entirely sure
this is a large enough spec change for a vote (given my understanding of
the impact), but we definitely should have sent an email to dev list to get
more eyes on the above discussions to collect further concerns.

Thanks,
Szehon


On Sat, Jan 27, 2024 at 10:06 AM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> I think this is a good idea but I have concerns about compatibility. IMO,
> I think changing the cardinality of input columns is a large enough change
> that trying to retrofit it into V1 or V2 of the specification will cause
> pain for implementations not relying on reference implementation.  I
>
> As a secondary concern, I think it would be worthwhile for PMC to
> formalize the process around specification changes as these have broader
> implications for Iceberg adoption.  A model that I've seen work reasonably
> well in other communities is the following:
>
> 1.  Discussion of overall features on the mailing list (this can also be a
> pointer to the GitHub issue).
> 2.  2 reference implementations demonstrating the change is viable (it
> seems like PyIceberg is close to being fully functional enough that this
> will be viable in the near term).
> 3.  A formal vote adopting the change.
>
> But really any statement of policy around how specification changes
> occur (and what changes will be considered for backporting to finalized
> specifications) would be useful.
>
> Thanks,
> Micah
>
> On Sat, Jan 27, 2024 at 2:55 AM 叶先进 <advance...@gmail.com> wrote:
>
>> Hi,
>>
>> This is just a heads up. Szehon and I just make a spec change to include
>> multi-arg transform: https://github.com/apache/iceberg/pull/8579 recently.
>> I am sending this to get input from others who did not review the pr before
>> Iceberg 1.5 release. Any concerns/suggestions are appreciated.
>>
>> After this change, we are working to get the API/Core and engine changes
>> into the iceberg and more importantly the concrete multi-arg transforms,
>> such as bucketV2 or zorder, etc.
>>
>

Reply via email to