Just to put another alternative on the table: in S3FileIO, we implemented
support for S3 access points and bucket aliases, which accidentally enabled
a form of "relative path" if you are only switching the bucket name.

At read time, you can supply a catalog property
"s3.access-points.<bucket-name>=<bucket-alias-name>" indicating data in
<bucket-name> should be read using <bucket-alias-name> which comes from an
access point. However, a bucket alias name is basically the same as a bucket
name, so there is nothing preventing me from saying something like
"s3.access-points.my-bucket-us-east-1=my-bucket-us-west-2".

If I configure that, then any file path like
"s3://my-bucket-us-east-1/some/path" will be converted to
"s3://my-bucket-us-west-2/some/path" during read, achieving technically the
same effect without the need to change the Iceberg spec.
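
To make the substitution concrete, here is a minimal Python sketch of the
lookup described above. It is not the actual S3FileIO code: the function name
resolve_bucket and the plain dict of properties are assumptions made for
illustration, and only the "s3.access-points." property prefix comes from the
description above.

```python
# Hypothetical sketch of the bucket substitution; not the S3FileIO implementation.
ACCESS_POINTS_PREFIX = "s3.access-points."


def resolve_bucket(path: str, properties: dict) -> str:
    """Return the path with its bucket swapped if a mapping is configured."""
    scheme, sep, rest = path.partition("://")
    if not sep or scheme not in ("s3", "s3a", "s3n"):
        return path  # not an S3-style URI, leave untouched
    bucket, slash, key = rest.partition("/")
    target = properties.get(ACCESS_POINTS_PREFIX + bucket)
    if target is None:
        return path  # no mapping configured for this bucket
    return f"{scheme}://{target}{slash}{key}"


props = {"s3.access-points.my-bucket-us-east-1": "my-bucket-us-west-2"}
print(resolve_bucket("s3://my-bucket-us-east-1/some/path", props))
```

Paths in buckets without a configured mapping pass through unchanged, which is
why the metadata itself never has to be rewritten.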

Is it possible to extend this feature so that, instead of supporting relative
paths, we support some form of absolute path replacement, so the Iceberg
metadata tree remains self-contained without the need to reference external
information like a prefix in a catalog?

For example, a user can provide a map saying that any path with prefix
"my-bucket-us-east-1/table1" should now be read through
"my-bucket-us-west-2/table1-backup". We already have built-in integration
for a catalog to set customized catalog properties per table. For
example, this is achieved in REST through the config field in
LoadTableResponse, which is used to vend S3 access credentials today. There
were also thoughts about allowing similar features in Glue to provide these
configs through Glue table parameters, as an implementation for non-REST
catalogs. We just did not add that feature because Glue already supports S3
access credentials vending through LakeFormation.
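
As a rough illustration of the prefix map idea, here is a minimal Python
sketch. Nothing here exists in Iceberg today: the function name
rewrite_location, the shape of the map, and the longest-prefix-wins rule are
all assumptions for the sake of discussion.

```python
# Hypothetical sketch of absolute path prefix replacement; not an Iceberg API.
def rewrite_location(path: str, prefix_map: dict) -> str:
    """Replace the longest matching "bucket/key-prefix" with its mapped value."""
    scheme, sep, rest = path.partition("://")
    if not sep:
        return path  # no scheme, leave untouched
    # Try longer prefixes first so the most specific mapping wins.
    for old_prefix in sorted(prefix_map, key=len, reverse=True):
        old = old_prefix.rstrip("/")
        # Match only on a path-segment boundary, so "table1" does not match "table10".
        if rest == old or rest.startswith(old + "/"):
            new = prefix_map[old_prefix].rstrip("/")
            return f"{scheme}{sep}{new}{rest[len(old):]}"
    return path


mapping = {"my-bucket-us-east-1/table1": "my-bucket-us-west-2/table1-backup"}
print(rewrite_location("s3://my-bucket-us-east-1/table1/data/f.parquet", mapping))
```

In this sketch the map would arrive as per-table config from the catalog, so
the file paths stored in the metadata stay absolute and unchanged.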

Has this option been considered? I quickly scanned through the linked doc
and it does not seem to be discussed, but I might have missed it.

Best,
Jack Ye

On Tue, Feb 20, 2024 at 9:21 AM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi Ryan
>
> Ah ok, I thought that an Iceberg release is "based"/implement a spec
> (I assumed the opposite is wrong).
>
> Thanks for the explanation!
>
> Regards
> JB
>
> On Tue, Feb 20, 2024 at 6:04 PM Ryan Blue <b...@tabular.io> wrote:
> >
> > JB,
> >
> > The spec and the reference implementation are released separately so v3
> and 2.0 are independent. There's no requirement that v3 is completed for
> Iceberg Java 2.0 and the goal of a 2.0 is to have an opportunity to
> deprecate and remove things so that we don't continue to carry forward and
> maintain older interfaces.
> >
> > Ryan
> >
> > On Tue, Feb 20, 2024 at 1:58 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
> >>
> >> Hi Manu
> >>
> >> Thanks for the reminder. It sounds like a good feature and worth
> >> discussing it :).
> >>
> >> It was my intention to define what we plan to include (or not) in Spec
> >> v3 / Iceberg 2.0.0 (I sent a message about that last week).
> >>
> >> Regards
> >> JB
> >>
> >> On Tue, Feb 20, 2024 at 10:36 AM Manu Zhang <owenzhang1...@gmail.com>
> wrote:
> >> >
> >> > Do we still want to move forward with this feature? It's on the
> roadmap for Spec V3 but it hasn't appeared in our discussion for a while.
> >> >
> >> > Manu
> >> >
> >> > On Sat, Aug 26, 2023 at 2:43 AM Mohit Garg <mohitga...@gmail.com>
> wrote:
> >> >>
> >> >> hi
> >> >>
> >> >> Please review the approach captured here: Iceberg Table Portability.
> This is a continuation from the previous effort here: Support relative
> paths and multiple root locations.
> >> >>
> >> >> --
> >> >>
> >> >> kind regards
> >> >> Mohit
> >
> >
> >
> > --
> > Ryan Blue
> > Tabular
>
