We took a different approach: modifying the metadata. It is a bit heavier than the relative path and S3 access point approaches, but it works with any type of storage and any location. I shared it in https://github.com/apache/iceberg/pull/4705.
Yufei

On Tue, Feb 20, 2024 at 6:25 PM Manu Zhang <[email protected]> wrote:

> Hi Jack,
>
> Thanks for sharing this idea.
>
> Our typical usage of "relative path" is distcp between two HDFS clusters
> for disaster recovery. It looks to me that by extending this feature, we
> should always take the authority and scheme from HDFS configurations in
> that cluster for any path.
> The downside is there could be confusion when we read files directly. I'm
> not sure about other side effects and how much effort it will take to
> implement. It would be best verified with a PoC.
>
> Regards,
> Manu
>
> On Wed, Feb 21, 2024 at 4:26 AM Jack Ye <[email protected]> wrote:
>
>> Just to put another alternative solution on the table. In S3FileIO, we
>> implemented the support for S3 access point and bucket alias, which
>> actually accidentally enabled "relative path" if you are just switching
>> bucket name.
>>
>> At read time, you can supply a catalog property
>> "s3.access-points.<bucket-name>=<bucket-alias-name>" indicating data in
>> <bucket-name> should be read using <bucket-alias-name> which comes from an
>> access point. However, bucket alias name is basically the same as bucket
>> name, so there is nothing preventing me to say something like
>> "s3.access-points.my-bucket-us-east-1=my-bucket-us-west-2".
>>
>> If I configure that, then any file path like
>> "s3://my-bucket-us-east-1/some/path" will be converted to
>> "s3://my-bucket-us-west-2/some/path" during read, achieving technically the
>> same effect without the need to change the Iceberg spec.
>>
>> Is it possible to extend this feature, so instead of supporting relative
>> path, we can support some form of replacing absolute path, so the Iceberg
>> metadata tree is still self-complete without the need to reference external
>> information like a prefix in a catalog?
>>
>> For example, user can provide a map saying that any path with prefix
>> "my-bucket-us-east-1/table1" should now be read through
>> "my-bucket-us-west-2/table1-backup". And we already have built-in
>> integration for catalog to set customized catalog properties per table. For
>> example, this is achieved in REST through the config field in
>> LoadTableResponse, which is used to vend S3 access credentials today. There
>> were also thoughts about allowing similar features in Glue to provide these
>> configs through Glue table parameters, as an implementation for non-REST
>> catalogs. We just did not add that feature because Glue already supports S3
>> access credentials vending through LakeFormation.
>>
>> Has this option been considered? I quickly scanned through the linked
>> doc, it seems to be not discussed, but I might have missed it.
>>
>> Best,
>> Jack Ye
>>
>> On Tue, Feb 20, 2024 at 9:21 AM Jean-Baptiste Onofré <[email protected]> wrote:
>>
>>> Hi Ryan
>>>
>>> Ah ok, I thought that an Iceberg release is "based"/implement a spec
>>> (I assumed the opposite is wrong).
>>>
>>> Thanks for the explanation!
>>>
>>> Regards
>>> JB
>>>
>>> On Tue, Feb 20, 2024 at 6:04 PM Ryan Blue <[email protected]> wrote:
>>> >
>>> > JB,
>>> >
>>> > The spec and the reference implementation are released separately so
>>> > v3 and 2.0 are independent. There's no requirement that v3 is completed
>>> > for Iceberg Java 2.0 and the goal of a 2.0 is to have an opportunity to
>>> > deprecate and remove things so that we don't continue to carry forward
>>> > and maintain older interfaces.
>>> >
>>> > Ryan
>>> >
>>> > On Tue, Feb 20, 2024 at 1:58 AM Jean-Baptiste Onofré <[email protected]> wrote:
>>> >>
>>> >> Hi Manu
>>> >>
>>> >> Thanks for the reminder. It sounds like a good feature and worth
>>> >> discussing it :).
>>> >>
>>> >> It was my intention to define what we plan to include (or not) in Spec
>>> >> v3 / Iceberg 2.0.0 (I sent a message about that last week).
>>> >>
>>> >> Regards
>>> >> JB
>>> >>
>>> >> On Tue, Feb 20, 2024 at 10:36 AM Manu Zhang <[email protected]> wrote:
>>> >> >
>>> >> > Do we still want to move forward with this feature? It's on the
>>> >> > roadmap for Spec V3 but it hasn't appeared in our discussion for a
>>> >> > while.
>>> >> >
>>> >> > Manu
>>> >> >
>>> >> > On Sat, Aug 26, 2023 at 2:43 AM Mohit Garg <[email protected]> wrote:
>>> >> >>
>>> >> >> hi
>>> >> >>
>>> >> >> Please review the approach captured here Iceberg Table Portability
>>> >> >> This is a continuation from the previous effort here - Support
>>> >> >> relative paths and multiple root locations.
>>> >> >>
>>> >> >> --
>>> >> >>
>>> >> >> kind regards
>>> >> >> Mohit
>>> >
>>> > --
>>> > Ryan Blue
>>> > Tabular
>>>
>>
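
For readers of the archive, the path substitution Jack describes in the quoted message can be sketched roughly as below. This is only an illustration in plain Python, not Iceberg's S3FileIO code. The property prefix "s3.access-points." and the example bucket names come from the thread; the function names apply_bucket_alias and apply_prefix_map, and the shape of the prefix map, are hypothetical.

```python
# Illustrative sketch only; this does not call or reproduce Iceberg code.
ACCESS_POINT_PREFIX = "s3.access-points."  # property prefix quoted in the thread


def apply_bucket_alias(path, catalog_props):
    """Swap the bucket of an s3:// path when a matching property is set."""
    scheme, sep, rest = path.partition("://")
    if sep != "://" or scheme != "s3":
        return path  # not an S3 path, leave untouched
    bucket, slash, key = rest.partition("/")
    alias = catalog_props.get(ACCESS_POINT_PREFIX + bucket)
    if alias is None:
        return path  # no mapping configured for this bucket
    return f"s3://{alias}{slash}{key}"


def apply_prefix_map(path, prefix_map):
    """Hypothetical extension: replace the longest matching 'bucket/prefix'."""
    scheme, sep, location = path.partition("://")
    if not sep:
        return path
    # Try longer prefixes first so the most specific mapping wins.
    for old in sorted(prefix_map, key=len, reverse=True):
        base = old.rstrip("/")
        # Match only on whole path segments ("table1" must not match "table10").
        if location == base or location.startswith(base + "/"):
            new_base = prefix_map[old].rstrip("/")
            return f"{scheme}://{new_base}{location[len(base):]}"
    return path


props = {"s3.access-points.my-bucket-us-east-1": "my-bucket-us-west-2"}
print(apply_bucket_alias("s3://my-bucket-us-east-1/some/path", props))

mapping = {"my-bucket-us-east-1/table1": "my-bucket-us-west-2/table1-backup"}
print(apply_prefix_map("s3://my-bucket-us-east-1/table1/data/f.parquet", mapping))
```

The first function mirrors the bucket-only swap, and the second mirrors the proposed prefix map. In both cases the paths stored in the metadata stay absolute and only the reader's view of them changes.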
