Thanks for picking this up, I think this is a very valuable addition.

On Mon, Jul 8, 2024 at 10:48 AM Yufei Gu <flyrain...@gmail.com> wrote:

> Hi folks,
>
> I'd like to share some recent progress on adding actions to copy tables
> across different locations.
>
> There is a constant need to copy tables across different locations for
> purposes such as disaster recovery and testing. Because Iceberg metadata
> stores absolute file paths, copying the files alone does not produce a
> working table at the new location. There are three generic solutions:
> 1. Rebuild the metadata: This is a proven approach widely used across
> various companies.
> 2. S3 access point: Effective when both the source and target locations
> are in S3, but not applicable to other storage systems.
> 3. Relative path: It requires changes to the table specification.
>
> We focus on the first approach in this thread. Although the code was
> shared 2 years ago here <https://github.com/apache/iceberg/pull/4705>, it
> was never merged. We picked it up recently. Here are the active PRs
> related to these actions. We'd really appreciate any feedback and reviews:
>
>    - PR to add CopyTable action:
>    https://github.com/apache/iceberg/pull/10024
>    - PR to add CheckSnapshotIntegrity action:
>    https://github.com/apache/iceberg/pull/10642
>    - PR to add RemoveExpiredFiles action:
>    https://github.com/apache/iceberg/pull/10643
>
> Here is a google doc with more details to clarify the goals and approach:
> https://docs.google.com/document/d/15oPj7ylgWQG8bhk_5aTjzHl7mlc-9f4OAH-oEpKavSc/edit?usp=sharing
>
> Yufei
>
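To make the absolute-path problem behind approach 1 concrete: every location recorded in the copied metadata has to be pointed at the new place. The Java sketch below shows only that core prefix substitution on plain strings. The class and method names and the bucket/cluster URIs are hypothetical, and this is not the API of the CopyTable action in the PRs. A real copy would also have to read and rewrite metadata.json, the manifest lists, and the manifests, then write those rewritten files to the target.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: rewrite absolute paths from a source prefix to a target prefix.
public class PathRewriteSketch {

    // Replaces sourcePrefix with targetPrefix, but only when the path really sits
    // under sourcePrefix (so ".../warehouse" does not match ".../warehouse2/...").
    static String rewritePath(String path, String sourcePrefix, String targetPrefix) {
        boolean underPrefix = path.startsWith(sourcePrefix)
                && (sourcePrefix.endsWith("/")
                    || path.length() == sourcePrefix.length()
                    || path.charAt(sourcePrefix.length()) == '/');
        if (!underPrefix) {
            throw new IllegalArgumentException("Path " + path + " is not under " + sourcePrefix);
        }
        return targetPrefix + path.substring(sourcePrefix.length());
    }

    // Applies rewritePath to every path, e.g. all file locations listed in one manifest.
    static List<String> rewriteAll(List<String> paths, String sourcePrefix, String targetPrefix) {
        return paths.stream()
                .map(p -> rewritePath(p, sourcePrefix, targetPrefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical source and target locations.
        List<String> paths = List.of(
                "s3://prod-bucket/warehouse/db/events/metadata/v3.metadata.json",
                "s3://prod-bucket/warehouse/db/events/data/part-00000.parquet");
        List<String> rewritten =
                rewriteAll(paths, "s3://prod-bucket/warehouse", "hdfs://dr-cluster/warehouse");
        rewritten.forEach(System.out::println);
    }
}
```

The boundary check is the main design point: a plain `startsWith` would silently rewrite paths from a sibling directory that merely shares the prefix, whereas failing loudly makes an unexpected location visible before any rewritten metadata is written.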
