Thanks for working on this, Szehon and AdvanceXY! I'm glad to see this
picking up for the v3 work.

I also want to address Micah's comments and suggest how we can do better
next time. From Micah's suggestion, there are 3 steps: 1. Discuss the
feature, 2. Build 2 reference implementations, and 3. hold a vote. That's
very similar to what we typically do. The only difference is that for step
2, we typically just build one reference implementation in the Java
library. We do vote on the large spec updates, but in this case you haven't
seen one since we haven't built the reference implementation yet.

I think the confusion here comes from updating the spec markdown doc
prematurely. I think the PR that was merged is missing clarity around the
version of the spec that requires these changes and how to handle them in
v1 and v2 tables. It should be clear that this is a v3 feature and that v3
has not been formally adopted by a vote. We'll clean that up.

While this is a v3 feature and must be supported for v3 compatibility, the
community usually also has guidelines for using features like this with
older spec versions. For example, before releasing v2 we allowed snapshot
ID inheritance in v1 by enabling a table property. That allowed people that
could ensure their versions supported it or who were okay with errors to
use the feature before v2 was released. I think we'd want to do the same
thing here. The reference implementation can read but not write tables that
have unknown partition transforms. We need to be clear about the details,
but I think this is generally a good idea -- I'm curious what you think
about it, Micah.

Ryan

On Sun, Jan 28, 2024 at 8:01 AM Szehon Ho <szehon.apa...@gmail.com> wrote:

> Hi,
>
> This would not be retrofitting existing partition transforms, but just
> allowing for the creation of new multi-arg transforms.  Is the concern that
> some implementations are never expecting new transforms to be added?  Old
> implementations would indeed not be able to read Iceberg tables created
> with the new transforms (this is the case even today without allowing
> multi-arg transforms).
>
> For the change, there were discussions about this from:
> 1. Github:  https://github.com/apache/iceberg/issues/8258 (author made an
> end-to-end reference implementation there for a sample transform)
> 2. Google Doc Dicussion:
> https://docs.google.com/document/d/1aDoZqRgvDOOUVAGhvKZbp5vFstjsAMY4EFCyjlxpaaw/edit#heading=h.si1nr6ftu79b
> 3. August 2023 meetup :
> https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit#heading=h.5jsm89ozy58
> (there was a general consensus there)
>
> As there seemed to be enough consensus at the design discussion and
> meetup, we did not think to go for a vote.  Though I realize this will have
> missed some in the community who did not attend.  I am still not entirely
> sure this is a large enough spec change for a vote (given my understanding
> of the impact), but we definitely should have sent an email to dev list to
> get more eyes on the above discussions to collect further concerns.
>
> Thanks,
> Szehon
>
>
> On Sat, Jan 27, 2024 at 10:06 AM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> I think this is a good idea but I have concerns about compatibility. IMO,
>> I think changing the cardinality of input columns is a large enough change
>> that trying to retrofit it into V1 or V2 of the specification will cause
>> pain for implementations not relying on reference implementation.  I
>>
>> As a secondary concern, I think it would be worthwhile for PMC to
>> formalize the process around specification changes as these have broader
>> implications for Iceberg adoption.  A model that I've seen work reasonably
>> well in other communities is the following:
>>
>> 1.  Discussion of overall features on the mailing list (this can also be
>> a pointer to the GitHub issue).
>> 2.  2 reference implementations demonstrating the change is viable (it
>> seems like PyIceberg is close to being fully functional enough that this
>> will be viable in the near term).
>> 3.  A formal vote adopting the change.
>>
>> But really any statement of policy around how specification changes
>> occur (and what changes will be considered for backporting to finalized
>> specifications) would be useful.
>>
>> Thanks,
>> Micah
>>
>> On Sat, Jan 27, 2024 at 2:55 AM 叶先进 <advance...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> This is just a heads up. Szehon and I just make a spec change to include
>>> multi-arg transform: https://github.com/apache/iceberg/pull/8579 recently.
>>> I am sending this to get input from others who did not review the pr before
>>> Iceberg 1.5 release. Any concerns/suggestions are appreciated.
>>>
>>> After this change, we are working to get the API/Core and engine changes
>>> into the iceberg and more importantly the concrete multi-arg transforms,
>>> such as bucketV2 or zorder, etc.
>>>
>>

-- 
Ryan Blue
Tabular

Reply via email to