Hi Jack,

Thanks for sharing this idea.

Our typical usage of "relative path" is distcp between two HDFS clusters
for disaster recovery. It looks to me that, if we extend this feature, we
should always take the scheme and authority from the HDFS configuration of
the current cluster for any path.
The downside is there could be confusion when we read files directly. I'm
not sure about other side effects and how much effort it will take to
implement. It would be best verified with a PoC.
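
As a rough illustration of the idea only (not Iceberg or Hadoop code), the
sketch below rewrites a stored path so that its scheme and authority come
from the cluster's own configuration. The function name rebase_path and the
cluster addresses are hypothetical, and it assumes the cluster's default
filesystem URI (for example, the value of fs.defaultFS) is the source of the
scheme and authority.

```python
from urllib.parse import urlsplit, urlunsplit


def rebase_path(path: str, default_fs: str) -> str:
    """Return `path` with its scheme and authority replaced by those of `default_fs`.

    `default_fs` stands in for the local cluster's default filesystem URI
    (an assumption for this sketch). Only the path component of the stored
    location is kept; query and fragment are passed through unchanged.
    """
    target = urlsplit(default_fs)
    source = urlsplit(path)
    return urlunsplit(
        (target.scheme, target.netloc, source.path, source.query, source.fragment)
    )


# Hypothetical example: metadata written on cluster-a, read on cluster-b.
stored = "hdfs://cluster-a:8020/warehouse/db/t/data/f.parquet"
print(rebase_path(stored, "hdfs://cluster-b:8020"))
```

A path that carries no scheme or authority at all would be handled the same
way, which is where the confusion about reading files directly could come
from: the same stored string resolves differently depending on the cluster.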

Regards,
Manu

On Wed, Feb 21, 2024 at 4:26 AM Jack Ye <yezhao...@gmail.com> wrote:

> Just to put another alternative solution on the table. In S3FileIO, we
> implemented the support for S3 access point and bucket alias, which
> actually accidentally enabled "relative path" if you are just switching
> bucket name.
>
> At read time, you can supply a catalog property
> "s3.access-points.<bucket-name>=<bucket-alias-name>" indicating data in
> <bucket-name> should be read using <bucket-alias-name> which comes from an
> access point. However, a bucket alias name is basically the same as a bucket
> name, so there is nothing preventing me from saying something like
> "s3.access-points.my-bucket-us-east-1=my-bucket-us-west-2".
>
> If I configure that, then any file path like
> "s3://my-bucket-us-east-1/some/path" will be converted to
> "s3://my-bucket-us-west-2/some/path" during read, achieving technically the
> same effect without the need to change the Iceberg spec.
>
> Is it possible to extend this feature, so instead of supporting relative
> path, we can support some form of replacing absolute path, so the Iceberg
> metadata tree is still self-complete without the need to reference external
> information like a prefix in a catalog?
>
> For example, a user can provide a map saying that any path with prefix
> "my-bucket-us-east-1/table1" should now be read through
> "my-bucket-us-west-2/table1-backup". And we already have built-in
> integration for a catalog to set customized catalog properties per table. For
> example, this is achieved in REST through the config field in
> LoadTableResponse, which is used to vend S3 access credentials today. There
> were also thoughts about allowing similar features in Glue to provide these
> configs through Glue table parameters, as an implementation for non-REST
> catalogs. We just did not add that feature because Glue already supports S3
> access credentials vending through LakeFormation.
>
> Has this option been considered? I quickly scanned through the linked doc
> and it does not seem to be discussed there, but I might have missed it.
>
> Best,
> Jack Ye
>
> On Tue, Feb 20, 2024 at 9:21 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> Hi Ryan
>>
>> Ah ok, I thought that an Iceberg release is "based" on / implements a spec
>> (I assumed the opposite is wrong).
>>
>> Thanks for the explanation!
>>
>> Regards
>> JB
>>
>> On Tue, Feb 20, 2024 at 6:04 PM Ryan Blue <b...@tabular.io> wrote:
>> >
>> > JB,
>> >
>> > The spec and the reference implementation are released separately so v3
>> > and 2.0 are independent. There's no requirement that v3 is completed for
>> > Iceberg Java 2.0 and the goal of a 2.0 is to have an opportunity to
>> > deprecate and remove things so that we don't continue to carry forward and
>> > maintain older interfaces.
>> >
>> > Ryan
>> >
>> > On Tue, Feb 20, 2024 at 1:58 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>> > wrote:
>> >>
>> >> Hi Manu
>> >>
>> >> Thanks for the reminder. It sounds like a good feature and worth
>> >> discussing :).
>> >>
>> >> It was my intention to define what we plan to include (or not) in Spec
>> >> v3 / Iceberg 2.0.0 (I sent a message about that last week).
>> >>
>> >> Regards
>> >> JB
>> >>
>> >> On Tue, Feb 20, 2024 at 10:36 AM Manu Zhang <owenzhang1...@gmail.com>
>> >> wrote:
>> >> >
>> >> > Do we still want to move forward with this feature? It's on the
>> >> > roadmap for Spec V3 but it hasn't appeared in our discussion for a while.
>> >> >
>> >> > Manu
>> >> >
>> >> > On Sat, Aug 26, 2023 at 2:43 AM Mohit Garg <mohitga...@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> hi
>> >> >>
>> >> >> Please review the approach captured here: Iceberg Table Portability.
>> >> >> This is a continuation of the previous effort here: Support relative
>> >> >> paths and multiple root locations.
>> >> >>
>> >> >> --
>> >> >>
>> >> >> kind regards
>> >> >> Mohit
>> >
>> >
>> >
>> > --
>> > Ryan Blue
>> > Tabular
>>
>
