I agree with Peter. I will bring up the relative path changes at the next community 
sync and help drive this.


~ Anurag Mantripragada






> On Jul 8, 2024, at 10:50 PM, Péter Váry <peter.vary.apa...@gmail.com> wrote:
> 
> I think in most cases the copy table action doesn't require a query engine to 
> read and generate the new metadata files. This means that it would be nice 
> to provide a pure Java implementation in the core, which could be 
> extended/reused by different engines, like Spark, to execute it in a 
> distributed manner when distributed execution is needed.
> 
> About the copy vs. relative path debate:
> - I have seen the relative path requirement come up multiple times in the 
> past. It seems to be a feature requested by multiple users, so I think it would 
> be best to discuss it in a separate thread. The Copy Table Action might 
> be used to move absolute path tables to relative path tables when migration 
> is needed.
> 
> On Mon, Jul 8, 2024, 21:52 Anurag Mantripragada 
> <amantriprag...@apple.com.invalid> wrote:
>> Hi Yufei. 
>> 
>> Thanks for the proposal. While the actions are great, they still need to do 
>> a lot of work, which could be reduced if we had the relative path changes. I 
>> still support adding these actions, as moving data was out of scope for the 
>> relative path design and we can use these actions as helpers once the spec 
>> change is done. 
>> 
>> Anurag Mantripragada
>> 
>>> On Jul 8, 2024, at 10:55 AM, Pucheng Yang <pucheng.yo...@gmail.com> wrote:
>>> 
>>> Thanks for picking this up. I think this is a very valuable addition.
>>> 
>>> On Mon, Jul 8, 2024 at 10:48 AM Yufei Gu <flyrain...@gmail.com> wrote:
>>>> Hi folks,
>>>> 
>>>> I'd like to share recent progress on adding actions to copy tables 
>>>> across different places.
>>>> 
>>>> There is a constant need to copy tables across different places for 
>>>> purposes such as disaster recovery and testing. Due to the absolute file 
>>>> paths in Iceberg metadata, it doesn't work automatically. There are three 
>>>> generic solutions:
>>>> 1. Rebuild the metadata: This is a proven approach widely used across 
>>>> various companies.
>>>> 2. S3 access point: Effective when both the source and target locations 
>>>> are in S3, but not applicable to other storage systems.
>>>> 3. Relative path: It requires changes to the table specification.
>>>> 
>>>> We focus on the first approach in this thread. While the code was 
>>>> shared two years ago here <https://github.com/apache/iceberg/pull/4705>, it 
>>>> was never merged. We picked it up recently. Here are the active PRs 
>>>> related to this action. We would really appreciate any feedback and review:
>>>> PR to add CopyTable action: https://github.com/apache/iceberg/pull/10024
>>>> PR to add CheckSnapshotIntegrity action: 
>>>> https://github.com/apache/iceberg/pull/10642
>>>> PR to add RemoveExpiredFiles action: 
>>>> https://github.com/apache/iceberg/pull/10643
>>>> Here is a Google Doc with more details to clarify the goals and approach: 
>>>> https://docs.google.com/document/d/15oPj7ylgWQG8bhk_5aTjzHl7mlc-9f4OAH-oEpKavSc/edit?usp=sharing
>>>> 
>>>> Yufei
>> 
