Hey Wing Yew,

> I see that you have been updating the Google doc containing the proposal.


That's correct, I've been working with Talat to update the doc based on
feedback from the comments and first round of discussion we had on this
topic.

> Looking through it now, as far as I can tell, the basic idea (from the
> original proposal) of inferring the table location from the path to the
> current metadata.json has not changed. Is my reading correct?


So far, nothing has changed about table location inference, but we will
probably be revisiting this with respect to other updates/clarifications.
There are still a couple open comments related to this point, but it is one
of the main goals.

> You have added clarification around how the path to the metadata is
> constructed from table location (from which the table location is thus
> reverse engineered) and around path relativization, but the original idea
> does not appear to have changed. In that case, the use case of having a
> single copy of metadata but more than one copy of data (two or more
> locations) is not supported by the proposal. This was the sticking point in
> the last sync to discuss the proposal.


I don't believe this was the sticking point from the original discussion.
Having multiple copies/locations of the same data files under a single
table's management is explicitly a non-goal.  It was discussed in the
comments of the doc for caching/fallback use cases, but I think that's
better handled by specific engine/fileio implementations.

The main sticking points were confusion around the complexity of how paths
are constructed/persisted and the interplay between table/metadata/data
locations depending on how those values are set in the table metadata.
Based on that feedback, we're suggesting some changes, which primarily
consist of: 1) defining path construction, resolution, and relativization
separately, 2) making all paths relative to the table location (which
simplifies resolution/relativization), and 3) addressing confusing/complex
issues like path separators and expectations around separators.
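
To make the three operations concrete, here is a minimal Python sketch. It
is an illustration only, not the proposal's definition: the function names,
the single "/" separator, and the assumed layout
<table-location>/metadata/<name>.metadata.json are all assumptions made for
this example, and the bucket paths are made up.

```python
SEP = "/"  # assumption: one forward-slash separator, regardless of scheme


def infer_table_location(metadata_path: str) -> str:
    """Recover the table location from the current metadata.json path.

    Assumes the layout <table-location>/metadata/<name>.metadata.json.
    """
    metadata_dir, _, file_name = metadata_path.rpartition(SEP)
    table_location, _, dir_name = metadata_dir.rpartition(SEP)
    if (
        dir_name != "metadata"
        or not file_name.endswith(".metadata.json")
        or not table_location
    ):
        raise ValueError(f"unexpected metadata path: {metadata_path}")
    return table_location


def relativize(table_location: str, absolute_path: str) -> str:
    """Turn an absolute path under the table location into a relative one."""
    prefix = table_location.rstrip(SEP) + SEP
    if not absolute_path.startswith(prefix):
        raise ValueError(f"{absolute_path} is not under {table_location}")
    return absolute_path[len(prefix):]


def resolve(table_location: str, relative_path: str) -> str:
    """Turn a stored relative path back into an absolute one."""
    return table_location.rstrip(SEP) + SEP + relative_path


# Hypothetical example paths.
location = infer_table_location(
    "s3://bucket/db/tbl/metadata/00003-abc.metadata.json"
)
stored = relativize(location, "s3://bucket/db/tbl/data/part-0.parquet")
moved = resolve("gs://other/db/tbl", stored)
```

In this sketch the stored value is "data/part-0.parquet", so resolving it
against a different table location yields a path under that new location
without rewriting any stored paths.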

We're still in the process of updating the document, but we will schedule
another sync to discuss these updates in detail and address a few points
that are still outstanding.

Thanks,
Dan

On Thu, Jul 31, 2025 at 5:47 PM Wing Yew Poon <[email protected]>
wrote:

> Hi Daniel Weeks,
> I see that you have been updating the Google doc containing the proposal.
> Looking through it now, as far as I can tell, the basic idea (from the
> original proposal) of inferring the table location from the path to the
> current metadata.json has not changed. Is my reading correct?
> You have added clarification around how the path to the metadata is
> constructed from table location (from which the table location is thus
> reverse engineered) and around path relativization, but the original idea
> does not appear to have changed. In that case, the use case of having a
> single copy of metadata but more than one copy of data (two or more
> locations) is not supported by the proposal. This was the sticking point in
> the last sync to discuss the proposal.
> Do you intend to have another sync to continue the discussion?
> Thanks,
> Wing Yew
>
>
> On Thu, Jul 10, 2025 at 4:41 PM Anurag Mantripragada
> <[email protected]> wrote:
>
>> Thanks Kevin, yes, I see the recording link too but don’t have access. I
>> have requested access.
>>
>>
>> ~ Anurag Mantripragada
>>
>>
>> On Jul 10, 2025, at 2:43 PM, Kevin Liu <[email protected]> wrote:
>>
>> Yes it was recorded. Dan or Talat should have the recording. I see
>> there's already a link for the recording associated with the gcal event but
>> I don't have access to it.
>>
>> Best,
>> Kevin Liu
>>
>> On Thu, Jul 10, 2025 at 12:37 PM Anurag Mantripragada
>> <[email protected]> wrote:
>>
>>> Hey folks, was the sync recorded? I missed it due to calendar sync
>>> issues :(
>>>
>>>
>>> ~ Anurag Mantripragada
>>>
>>> On Jul 7, 2025, at 6:27 PM, ally heev <[email protected]> wrote:
>>>
>>> Thanks. I can see it now
>>>
>>> On Tue, Jul 8, 2025 at 12:37 AM Kevin Liu <[email protected]> wrote:
>>>
>>>>
>>>> I can see the new event on the dev calendar.
>>>> [image: Screenshot 2025-07-07 at 12.04.08 PM.png]
>>>>
>>>> Subscribe to the "Iceberg Dev Events" calendar here:
>>>> https://iceberg.apache.org/community/#iceberg-community-events
>>>>
>>>> Best,
>>>> Kevin Liu
>>>>
>>>>
>>>>
>>>> On Mon, Jul 7, 2025 at 11:38 AM Daniel Weeks <[email protected]> wrote:
>>>>
>>>>> Hey Ally (and everyone else).
>>>>>
>>>>> We hadn't scheduled the discussion for relative paths, but I just
>>>>> added an event to the dev calendar for Thursday at 9am (PT).
>>>>>
>>>>> Let me know if you still don't see it on the calendar.
>>>>>
>>>>> -Dan
>>>>>
>>>>> On Sat, Jul 5, 2025 at 9:37 PM Jean-Baptiste Onofré <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Talat
>>>>>>
>>>>>> Thanks for the update. I will do a new pass on the doc.
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> On Wed, May 28, 2025 at 12:13 AM Talat Uyarer
>>>>>> <[email protected]> wrote:
>>>>>> >
>>>>>> > Hi, Iceberg Community,
>>>>>> >
>>>>>> > As mentioned at the last sync, Dan and I have been working on a
>>>>>> > proposal to add support for relative paths, which has been a long
>>>>>> > requested feature. There have been a number of
>>>>>> > discussions/proposals over the years, but we'd like to scope down
>>>>>> > and refocus effort to make some meaningful progress on this issue.
>>>>>> >
>>>>>> > Please take a look at the linked doc and provide feedback. We'd
>>>>>> > love to open up discussion on this topic at the next community
>>>>>> > sync and we can hold one-off syncs on the topic if there's a lot
>>>>>> > of interest.
>>>>>> >
>>>>>> > You can access Iceberg's First V4 Spec change from here :)
>>>>>> >
>>>>>> > Proposal Issue: https://github.com/apache/iceberg/issues/13141
>>>>>> > Doc: https://s.apache.org/iceberg-spec-relative-path
>>>>>> >
>>>>>> > Talat
>>>>>>
>>>>>
>>>
>>
