For now we are thinking of adding two methods to the Catalog API, not SQL commands:
1. spark.catalog.backup, which backs up the current catalog.
2. spark.catalog.restore(file), which reads the DFS file and recreates the
entities described in that file.
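
To make the shape of those two calls concrete, here is a toy sketch in plain
Python. It does not use Spark at all: ToyCatalog, its methods, and the JSON
file format are hypothetical stand-ins (the real proposal would write to DFS,
e.g. as Parquet). It only illustrates the round trip, including database
properties surviving the backup.

```python
import json
import os
import tempfile


class ToyCatalog:
    """In-memory stand-in for a catalog; NOT Spark's Catalog API."""

    def __init__(self):
        # database name -> {"properties": {...}, "tables": [table names]}
        self.databases = {}

    def create_database(self, name, properties=None):
        self.databases[name] = {"properties": dict(properties or {}), "tables": []}

    def create_table(self, database, table):
        self.databases[database]["tables"].append(table)

    def backup(self, path):
        """Write every entity, including database properties, to one file."""
        with open(path, "w") as f:
            json.dump(self.databases, f)

    def restore(self, path):
        """Read the file and recreate each entity it describes."""
        with open(path) as f:
            snapshot = json.load(f)
        for name, db in snapshot.items():
            self.create_database(name, db["properties"])
            for table in db["tables"]:
                self.create_table(name, table)


# Back up one catalog and restore it into a second, empty one.
source = ToyCatalog()
source.create_database("sales", {"owner": "etl"})
source.create_table("sales", "orders")

path = os.path.join(tempfile.mkdtemp(), "catalog_backup.json")
source.backup(path)

target = ToyCatalog()
target.restore(path)
```

The user only ever sees backup(path) and restore(path); how entities are
listed and recreated stays inside the catalog.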

Can you please give an example of exposing client APIs to the end users in
this approach? The users can only call backup or restore, right?


On Fri, May 7, 2021 at 12:27 PM Wenchen Fan <> wrote:

> If a catalog implements backup/restore, it can easily expose some client
> APIs to the end-users (e.g. REST API), I don't see a strong reason to
> expose the APIs to Spark. Do you plan to add new SQL commands in Spark to
> backup/restore a catalog?
>
> On Tue, May 4, 2021 at 2:39 AM Tianchen Zhang <> wrote:
>
>> Hi all,
>> Currently the user-facing Catalog API doesn't support backing up or
>> restoring metadata. Our customers are asking for this functionality. Here
>> is a usage example:
>> 1. Read all metadata of one Spark cluster
>> 2. Save them into a Parquet file on DFS
>> 3. Read the Parquet file and restore all metadata in another Spark cluster
>> In the current implementation, the Catalog API has list methods
>> (listDatabases, listFunctions, etc.), but they don't return enough
>> information to restore an entity (for example, listDatabases loses the
>> "properties" of the database, and we need "describe database extended" to
>> get them). It also supports only createTable, with no creation methods for
>> other entities. The only way we can back up/restore an entity is using
>> Spark SQL.
>> We want to introduce backup and restore at the API level. We are
>> thinking of doing this simply by adding backup() and restore() to
>> CatalogImpl, as ExternalCatalog already includes all the methods we need to
>> retrieve and recreate entities. We are wondering whether there are any
>> concerns or drawbacks with this approach. Please advise.
>> Thank you in advance,
>> Tianchen
