If a catalog implements backup/restore, it can easily expose client APIs to end users (e.g. a REST API); I don't see a strong reason to expose these APIs through Spark. Do you plan to add new SQL commands in Spark to back up/restore a catalog?
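(Purely as a hypothetical illustration, I mean something along the lines of a "BACKUP CATALOG TO '<path>'" / "RESTORE CATALOG FROM '<path>'" command pair; no such commands exist in Spark today.)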
On Tue, May 4, 2021 at 2:39 AM Tianchen Zhang <dustinzhang2...@gmail.com> wrote:
> Hi all,
>
> Currently the user-facing Catalog API doesn't support backup/restore of
> metadata. Our customers are asking for this functionality. Here is a
> usage example:
> 1. Read all metadata of one Spark cluster
> 2. Save it into a Parquet file on DFS
> 3. Read the Parquet file and restore all metadata in another Spark cluster
>
> In the current implementation, the Catalog API has the list methods
> (listDatabases, listFunctions, etc.), but they don't return enough
> information to restore an entity (for example, listDatabases loses the
> "properties" of a database, and we need "describe database extended" to
> get them). And it only supports createTable (no other entity creations).
> The only way we can back up/restore an entity is through Spark SQL.
>
> We want to introduce backup and restore at the API level. We are
> thinking of doing this simply by adding backup() and restore() in
> CatalogImpl, as ExternalCatalog already includes all the methods we need
> to retrieve and recreate entities. We are wondering if there is any
> concern or drawback with this approach. Please advise.
>
> Thank you in advance,
> Tianchen
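For concreteness, here is a minimal sketch of the kind of backup()/restore() being discussed, restricted to databases and using only ExternalCatalog methods that already exist. Note that spark.sharedState.externalCatalog is internal API, and the names backupDatabases, restoreDatabases, and backupPath are illustrative, not part of any existing Spark interface.

import java.net.URI
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.catalog.CatalogDatabase

// Sketch: capture full database definitions (including the "properties"
// that listDatabases alone does not return) and persist them to Parquet
// so that another cluster can restore them later.
def backupDatabases(spark: SparkSession, backupPath: String): Unit = {
  val catalog = spark.sharedState.externalCatalog  // internal API
  val dbs = catalog.listDatabases().map(catalog.getDatabase)
  import spark.implicits._
  dbs.map(db => (db.name, db.description, db.locationUri.toString, db.properties))
    .toDF("name", "description", "locationUri", "properties")
    .write.mode("overwrite").parquet(backupPath)
}

// Sketch of the restore side: read the Parquet file back and replay
// createDatabase against the target cluster's catalog.
def restoreDatabases(spark: SparkSession, backupPath: String): Unit = {
  val catalog = spark.sharedState.externalCatalog
  spark.read.parquet(backupPath).collect().foreach { row =>
    val db = CatalogDatabase(
      name = row.getAs[String]("name"),
      description = row.getAs[String]("description"),
      locationUri = new URI(row.getAs[String]("locationUri")),
      properties = row.getMap[String, String](row.fieldIndex("properties")).toMap)
    catalog.createDatabase(db, ignoreIfExists = true)
  }
}

Tables and functions would follow the same pattern with getTable/createTable and getFunction/createFunction.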