Re: Questions regarding RestCatalog adoption

Ryan Blue Thu, 09 Feb 2023 17:15:04 -0800

There were a few different reasons for building the REST catalog:

   1. Standardize an interface for catalogs, like the Hive Thrift API. This
   makes customization easy: you don’t need to get Dremio or Athena to put
   your custom catalog’s Jar in their classpath. It also makes catalogs across
   languages more reliable and easier to manage. For example, having a
   slightly different JDBC implementation in every language is going to lead
   to inconsistencies and problems.
   2. Enable new catalog features: It has been difficult to fit Iceberg
   into existing catalogs and this has limited better features. For example,
   we want to be able to un-drop tables for a certain period of time, to
   support multi-table transactions, and to enable authentication and
   authorization. I don’t think the right strategy is to try to adapt the Hive
   MetaStore for it.
   3. Fix some problems: There are some boring things we can fix like the
   time it takes to load metadata files with lots of snapshots. There are also
   issues with the way catalogs handle metadata locations and FileIO with
   respect to tables. Ideally, these would be configured for each table, but
   you have to create a FileIO to read the metadata file before any per-table
   configuration is available.
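
To make the first point concrete, here is a minimal sketch of how a client in any language could address a table through a shared HTTP interface. It assumes the load-table route from the REST catalog OpenAPI spec (`/v1/{prefix}/namespaces/{namespace}/tables/{table}`) and that multi-level namespaces are joined with the 0x1F unit separator. The host name, the `load_table_url` helper, and the example identifiers are hypothetical and only for illustration.

```python
from urllib.parse import quote

# Assumption: namespace levels are joined with the 0x1F unit separator,
# following the REST catalog OpenAPI spec.
NAMESPACE_SEPARATOR = "\x1f"


def load_table_url(base_url, namespace, table, prefix=""):
    """Build the URL a client would GET to load a table (hypothetical helper)."""
    # Percent-encode every path segment so separators and slashes are safe.
    encoded_namespace = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    parts = [base_url.rstrip("/"), "v1"]
    if prefix:
        # Optional prefix segment that a server may assign to clients.
        parts.append(quote(prefix, safe=""))
    parts += ["namespaces", encoded_namespace, "tables", quote(table, safe="")]
    return "/".join(parts)


if __name__ == "__main__":
    # Hypothetical catalog endpoint and table identifier.
    print(load_table_url("https://catalog.example.com", ["analytics", "events"], "clicks"))
```

Because the request is plain HTTP, the same few lines translate directly to other languages without sharing a Jar or a language-specific driver.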

I think Jack’s answer validates that the first goal has a lot of value for
the community. But your question sounds like it is mainly addressing the
second goal, where new features and investment will be.

No one intends to stop development on other catalogs or to limit new
features to only the REST catalog. In fact, the upcoming 1.2.0 release has
a new catalog contributed by Snowflake. But, I think that it will be much,
much easier to develop new features for the REST catalog because one of the
goals was to avoid needing to go to crazy lengths to make things fit with
the Hive MetaStore. That will naturally lead to new features that can’t or
won’t be implemented in the other catalogs.

Ryan

On Thu, Feb 9, 2023 at 1:34 PM Jack Ye <[email protected]> wrote:

> Most of the development of the REST catalog comes from Tabular at the
> moment, so I will let them comment more about this.
>
> Speaking from an AWS perspective, we have been recommending the REST catalog
> for organizations that have their own in-house catalog systems. The REST
> catalog provides a well-designed, standardized API spec, and organizations
> can translate requests between it and their existing catalog system so that
> the system can (1) work with Iceberg tables, or (2) even expose their
> non-Iceberg tables as Iceberg tables for queries. That lets them standardize
> their readers and writers on Iceberg alone and reduce the maintenance burden
> of multiple readers and writers for different table and file formats.
>
> I cannot say that new feature development will happen ONLY in the REST
> catalog, because, for example, we will likely continue to support the AWS
> Glue catalog integration, and it has an active roadmap. But overall, interest
> in REST is strong compared to the other catalog types like Hive, JDBC and
> DynamoDB.
>
> Best,
> Jack Ye
>
> On Thu, Feb 9, 2023 at 1:24 PM Xinyi Lu <[email protected]>
> wrote:
>
>> Hi Community,
>>
>> We’ve been evaluating the RestCatalog and would like your feedback on
>> the best scenarios for using RestCatalog and its current adoption
>> status in the industry. Will this be the Iceberg catalog standard going
>> forward, to encourage users to move metadata transactions to the server
>> side? Are we looking to add more features that are supported only by
>> the RestCatalog?
>>
>> Thanks,
>> Xinyi
>
>

-- 
Ryan Blue
Tabular
