If a catalog implements backup/restore, it can easily expose client APIs to end users (e.g. a REST API); I don't see a strong reason to expose these APIs through Spark. Do you plan to add new SQL commands in Spark to back up/restore a catalog?
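(Purely as a hypothetical illustration, I mean something along the lines of a "BACKUP CATALOG TO '<path>'" / "RESTORE CATALOG FROM '<path>'" command pair; no such commands exist in Spark today.)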
On Tue, May 4, 2021 at 2:39 AM Tianchen Zhang <dustinzhang2...@gmail.com> wrote:
> Hi all,
>
> Currently the user-facing Catalog API doesn't support backup/restore of
> metadata. Our customers are asking for this functionality. Here is a
> usage example:
> 1. Read all metadata of one Spark cluster
> 2. Save it into a Parquet file on DFS
> 3. Read the Parquet file and restore all metadata in another Spark cluster
>
> In the current implementation, the Catalog API has the list methods
> (listDatabases, listFunctions, etc.), but they don't return enough
> information to restore an entity (for example, listDatabases loses the
> "properties" of a database, and we need "describe database extended" to
> get them). And it only supports createTable (no other entity creations).
> The only way we can back up/restore an entity is through Spark SQL.
>
> We want to introduce backup and restore at the API level. We are
> thinking of doing this simply by adding backup() and restore() in
> CatalogImpl, as ExternalCatalog already includes all the methods we need
> to retrieve and recreate entities. We are wondering if there is any
> concern or drawback with this approach. Please advise.
>
> Thank you in advance,
> Tianchen
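For concreteness, here is a minimal sketch of the kind of backup()/restore() being discussed, restricted to databases and using only ExternalCatalog methods that already exist. Note that spark.sharedState.externalCatalog is internal API, and the names backupDatabases, restoreDatabases, and backupPath are illustrative, not part of any existing Spark interface.

import java.net.URI
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.catalog.CatalogDatabase

// Sketch: capture full database definitions (including the "properties"
// that listDatabases alone does not return) and persist them to Parquet
// so that another cluster can restore them later.
def backupDatabases(spark: SparkSession, backupPath: String): Unit = {
  val catalog = spark.sharedState.externalCatalog  // internal API
  val dbs = catalog.listDatabases().map(catalog.getDatabase)
  import spark.implicits._
  dbs.map(db => (db.name, db.description, db.locationUri.toString, db.properties))
    .toDF("name", "description", "locationUri", "properties")
    .write.mode("overwrite").parquet(backupPath)
}

// Sketch of the restore side: read the Parquet file back and replay
// createDatabase against the target cluster's catalog.
def restoreDatabases(spark: SparkSession, backupPath: String): Unit = {
  val catalog = spark.sharedState.externalCatalog
  spark.read.parquet(backupPath).collect().foreach { row =>
    val db = CatalogDatabase(
      name = row.getAs[String]("name"),
      description = row.getAs[String]("description"),
      locationUri = new URI(row.getAs[String]("locationUri")),
      properties = row.getMap[String, String](row.fieldIndex("properties")).toMap)
    catalog.createDatabase(db, ignoreIfExists = true)
  }
}

Tables and functions would follow the same pattern with getTable/createTable and getFunction/createFunction.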