Re: [Spark Catalog API] Support for metadata Backup/Restore

2021-05-11 Thread Tianchen Zhang
Thanks everyone for the input. Yes, it makes sense that metadata backup/restore should be done outside Spark. We will provide customers with documentation on how that can be done and leave the implementation to them. Thanks, Tianchen On Tue, May 11, 2021 at 1:14 AM Mich Talebzadeh wrote:

Re: [Spark Catalog API] Support for metadata Backup/Restore

2021-05-11 Thread Mich Talebzadeh
From my experience of dealing with metadata for other applications like Hive, an external database for Spark metadata would be useful if needed. However, the maintenance and upgrade of that database should be external to Spark (left to the user) and as usual some form of reliable API or JDBC
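Mich's suggestion of a user-maintained external database reached over JDBC is, in practice, what an external Hive metastore already provides for Spark. As a hedged sketch (the hostname, database name, and credentials below are placeholders, not details from the thread), the relevant configuration might look like:

```
# hive-site.xml properties (shown as "property = value" for brevity),
# pointing the metastore at an external MySQL database:
javax.jdo.option.ConnectionURL        = jdbc:mysql://metastore-host:3306/hive_metastore
javax.jdo.option.ConnectionDriverName = com.mysql.cj.jdbc.Driver
javax.jdo.option.ConnectionUserName   = hive
javax.jdo.option.ConnectionPassword   = ********

# spark-defaults.conf: have Spark use the Hive catalog implementation
spark.sql.catalogImplementation   hive
```

With this setup, backing up the catalog reduces to backing up the external database itself, which is maintained outside Spark exactly as the thread suggests.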

Re: [Spark Catalog API] Support for metadata Backup/Restore

2021-05-11 Thread Wenchen Fan
That's my expectation as well. Spark needs a reliable catalog; backup/restore is just an implementation detail of how you make your catalog reliable, which should be transparent to Spark. On Sat, May 8, 2021 at 6:54 AM ayan guha wrote: > Just a consideration: > > Is there a value in

Re: [Spark Catalog API] Support for metadata Backup/Restore

2021-05-07 Thread ayan guha
Just a consideration: is there value in backup/restore of metadata within Spark? I would strongly argue that if the metadata is valuable enough and persistent enough, why don't you just use an external metastore? It is a fairly straightforward process. Also, regardless of whether you are in the cloud or not, database backup is a

Re: [Spark Catalog API] Support for metadata Backup/Restore

2021-05-07 Thread Tianchen Zhang
For now we are thinking about adding two methods to the Catalog API, not SQL commands: 1. spark.catalog.backup, which backs up the current catalog. 2. spark.catalog.restore(file), which reads the DFS file and recreates the entities described in that file. Can you please give an example of exposing
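The semantics of the two proposed methods can be sketched with plain Python objects standing in for a live Spark catalog. The method names (backup, restore) follow the proposal; the Catalog class, its fields, and the JSON file format are purely illustrative, not Spark's actual implementation:

```python
# Hypothetical sketch of the proposed spark.catalog.backup / restore
# semantics. A dict of databases stands in for real catalog entities,
# and a local JSON file stands in for a file on DFS.
import json
import os
import tempfile


class Catalog:
    def __init__(self):
        # database name -> list of table descriptors
        self.databases = {}

    def backup(self, path):
        # Serialize all catalog entities to a single file.
        with open(path, "w") as f:
            json.dump(self.databases, f)

    def restore(self, path):
        # Read the file and recreate the entities described in it.
        with open(path) as f:
            self.databases = json.load(f)


# Usage: back up one catalog and restore it into a fresh one.
cat = Catalog()
cat.databases["sales"] = [{"name": "orders", "format": "parquet"}]
path = os.path.join(tempfile.mkdtemp(), "catalog_backup.json")
cat.backup(path)

fresh = Catalog()
fresh.restore(path)
print(fresh.databases["sales"][0]["name"])  # prints "orders"
```

The point of the sketch is the API shape the thread debates: backup/restore live on the catalog object itself, rather than being exposed as SQL commands.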

Re: [Spark Catalog API] Support for metadata Backup/Restore

2021-05-07 Thread Wenchen Fan
If a catalog implements backup/restore, it can easily expose some client APIs to the end-users (e.g. REST API), I don't see a strong reason to expose the APIs to Spark. Do you plan to add new SQL commands in Spark to backup/restore a catalog? On Tue, May 4, 2021 at 2:39 AM Tianchen Zhang wrote:

[Spark Catalog API] Support for metadata Backup/Restore

2021-05-03 Thread Tianchen Zhang
Hi all, Currently the user-facing Catalog API doesn't support backup/restore metadata. Our customers are asking for such functionalities. Here is a usage example: 1. Read all metadata of one Spark cluster 2. Save them into a Parquet file on DFS 3. Read the Parquet file and restore all metadata in

[Spark Catalog API] Support for metadata Backup/Restore

2021-04-30 Thread Tianchen Zhang
Hi, Currently the user-facing Catalog API doesn't support backup/restore metadata. Our customers are asking for such functionalities. Here is a usage example: 1. Read all metadata of one Spark cluster 2. Save them into a Parquet file on DFS 3. Read the Parquet file and restore all metadata in