We took a different approach by modifying the metadata. It is a bit heavy
compared to the relative path and S3 access point approaches, but it can be
used for any type of storage and any location. I shared it here:
https://github.com/apache/iceberg/pull/4705.

Yufei


On Tue, Feb 20, 2024 at 6:25 PM Manu Zhang <owenzhang1...@gmail.com> wrote:

> Hi Jack,
>
> Thanks for sharing this idea.
>
> Our typical usage of "relative path" is distcp between two HDFS clusters
> for disaster recovery. It looks to me that by extending this feature, we
> should always take the authority and scheme from HDFS configurations in
> that cluster for any path.
> The downside is there could be confusion when we read files directly. I'm
> not sure about other side effects and how much effort it will take to
> implement. It would be best verified with a PoC.
>
> Regards,
> Manu
>
> On Wed, Feb 21, 2024 at 4:26 AM Jack Ye <yezhao...@gmail.com> wrote:
>
>> Just to put another alternative solution on the table: in S3FileIO, we
>> implemented support for S3 access points and bucket aliases, which
>> accidentally enabled "relative paths" if you are just switching the
>> bucket name.
>>
>> At read time, you can supply a catalog property
>> "s3.access-points.<bucket-name>=<bucket-alias-name>" indicating data in
>> <bucket-name> should be read using <bucket-alias-name> which comes from an
>> access point. However, a bucket alias name is basically the same as a bucket
>> name, so there is nothing preventing me from saying something like
>> "s3.access-points.my-bucket-us-east-1=my-bucket-us-west-2".
>>
>> If I configure that, then any file path like
>> "s3://my-bucket-us-east-1/some/path" will be converted to
>> "s3://my-bucket-us-west-2/some/path" during read, achieving technically the
>> same effect without the need to change the Iceberg spec.
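A minimal sketch of the bucket substitution described above. It is not the S3FileIO code: the function name `resolve_bucket` and the plain-dict representation of catalog properties are assumptions made for illustration. Only the property key format "s3.access-points.<bucket-name>" comes from the message.

```python
# Hypothetical helper, not part of Iceberg: swaps the bucket of a path
# when a matching "s3.access-points.<bucket-name>" property is present.
ACCESS_POINT_PREFIX = "s3.access-points."


def resolve_bucket(path: str, properties: dict) -> str:
    """Return the path with its bucket replaced by the configured alias."""
    scheme, sep, rest = path.partition("://")
    if not sep:
        # Not a URI with a scheme; leave it untouched.
        return path
    bucket, slash, key = rest.partition("/")
    alias = properties.get(ACCESS_POINT_PREFIX + bucket)
    if alias is None:
        # No mapping configured for this bucket.
        return path
    return f"{scheme}://{alias}{slash}{key}"


props = {"s3.access-points.my-bucket-us-east-1": "my-bucket-us-west-2"}
resolved = resolve_bucket("s3://my-bucket-us-east-1/some/path", props)
```

With the property above, `resolved` is "s3://my-bucket-us-west-2/some/path", while paths in buckets without a mapping are returned unchanged.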
>>
>> Is it possible to extend this feature so that, instead of supporting
>> relative paths, we support some form of absolute path replacement, so the
>> Iceberg metadata tree is still self-complete without the need to reference
>> external information like a prefix in a catalog?
>>
>> For example, a user can provide a map saying that any path with prefix
>> "my-bucket-us-east-1/table1" should now be read through
>> "my-bucket-us-west-2/table1-backup". And we already have built-in
>> integration for catalog to set customized catalog properties per table. For
>> example, this is achieved in REST through the config field in
>> LoadTableResponse, which is used to vend S3 access credentials today. There
>> were also thoughts about allowing similar features in Glue to provide these
>> configs through Glue table parameters, as an implementation for non-REST
>> catalogs. We just did not add that feature because Glue already supports S3
>> access credentials vending through LakeFormation.
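A rough sketch of the prefix replacement idea in the paragraph above, under stated assumptions: the function name `rewrite_location`, the dict-based prefix map, the longest-prefix-wins rule, and matching only on whole path segments are all illustrative choices, not an existing Iceberg feature or a proposed spec.

```python
# Hypothetical helper: rewrites the "bucket/prefix" part of an absolute
# path using a user-supplied map of old prefix -> new prefix.
def rewrite_location(path: str, prefix_map: dict) -> str:
    scheme, sep, rest = path.partition("://")
    if not sep:
        # No scheme, so there is nothing to rewrite.
        return path
    best = None
    for old in prefix_map:
        old_clean = old.rstrip("/")
        # Match only on whole path segments, so "table1" does not
        # accidentally capture "table10".
        if rest == old_clean or rest.startswith(old_clean + "/"):
            if best is None or len(old_clean) > len(best.rstrip("/")):
                best = old  # Assumption: the longest matching prefix wins.
    if best is None:
        return path
    old_clean = best.rstrip("/")
    new_clean = prefix_map[best].rstrip("/")
    return f"{scheme}://{new_clean}{rest[len(old_clean):]}"


mapping = {"my-bucket-us-east-1/table1": "my-bucket-us-west-2/table1-backup"}
moved = rewrite_location(
    "s3://my-bucket-us-east-1/table1/data/file.parquet", mapping
)
```

Here `moved` points at "s3://my-bucket-us-west-2/table1-backup/data/file.parquet". The paths stored in metadata stay absolute; only the reader applies the map, which could be supplied through per-table catalog properties as the message suggests.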
>>
>> Has this option been considered? I quickly scanned through the linked
>> doc and it does not seem to be discussed, but I might have missed it.
>>
>> Best,
>> Jack Ye
>>
>> On Tue, Feb 20, 2024 at 9:21 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> Hi Ryan
>>>
>>> Ah ok, I thought that an Iceberg release is based on (implements) a spec
>>> version (I assumed the opposite was wrong).
>>>
>>> Thanks for the explanation!
>>>
>>> Regards
>>> JB
>>>
>>> On Tue, Feb 20, 2024 at 6:04 PM Ryan Blue <b...@tabular.io> wrote:
>>> >
>>> > JB,
>>> >
>>> > The spec and the reference implementation are released separately, so
>>> > v3 and 2.0 are independent. There's no requirement that v3 is completed
>>> > for Iceberg Java 2.0, and the goal of a 2.0 is to have an opportunity to
>>> > deprecate and remove things so that we don't continue to carry forward
>>> > and maintain older interfaces.
>>> >
>>> > Ryan
>>> >
>>> > On Tue, Feb 20, 2024 at 1:58 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>> >>
>>> >> Hi Manu
>>> >>
>>> >> Thanks for the reminder. It sounds like a good feature and worth
>>> >> discussing :).
>>> >>
>>> >> It was my intention to define what we plan to include (or not) in Spec
>>> >> v3 / Iceberg 2.0.0 (I sent a message about that last week).
>>> >>
>>> >> Regards
>>> >> JB
>>> >>
>>> >> On Tue, Feb 20, 2024 at 10:36 AM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>> >> >
>>> >> > Do we still want to move forward with this feature? It's on the
>>> >> > roadmap for Spec V3 but it hasn't appeared in our discussion for a while.
>>> >> >
>>> >> > Manu
>>> >> >
>>> >> > On Sat, Aug 26, 2023 at 2:43 AM Mohit Garg <mohitga...@gmail.com> wrote:
>>> >> >>
>>> >> >> hi
>>> >> >>
>>> >> >> Please review the approach captured here: "Iceberg Table Portability".
>>> >> >> This is a continuation of the previous effort here: "Support relative
>>> >> >> paths and multiple root locations".
>>> >> >>
>>> >> >> --
>>> >> >>
>>> >> >> kind regards
>>> >> >> Mohit
>>> >
>>> >
>>> >
>>> > --
>>> > Ryan Blue
>>> > Tabular
>>>
>>
