Hi Everyone,
Currently, the Spark procedures *expire_snapshots* and *remove_orphan_files*
can only be run per table.
Today, anyone who needs to run GC on an entire catalog has to run these
procedures manually for every table.
Would it be a good idea to support running them in bulk, either per catalog,
per namespace, or for a list of tables?
Current syntax:
CALL hive_prod.system.expire_snapshots(table => 'db.sample', <Options>)
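To illustrate the manual workaround with the current syntax, here is a rough sketch that expands a list of tables into one CALL statement per table. The helper name build_expire_calls and the table names are hypothetical and only for illustration; older_than is the existing expire_snapshots argument, and the table list is assumed to come from wherever you already track your tables.

```python
from typing import List, Optional


def build_expire_calls(catalog: str,
                       tables: List[str],
                       older_than: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build one per-table CALL statement for each table.

    This only generates SQL text using the current per-table syntax;
    it does not talk to Spark or to any catalog.
    """
    calls = []
    for table in tables:
        # Named-argument form, as in the current syntax above.
        args = ["table => '{}'".format(table)]
        if older_than is not None:
            args.append("older_than => TIMESTAMP '{}'".format(older_than))
        joined = ", ".join(args)
        calls.append("CALL {}.system.expire_snapshots({})".format(catalog, joined))
    return calls


if __name__ == "__main__":
    # Illustrative table names only.
    for stmt in build_expire_calls("hive_prod", ["db.sample", "db.other"]):
        print(stmt)
```

In a Spark session each generated statement would then be passed to spark.sql(...) one at a time, which is the per-table loop the proposal is meant to avoid.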
Proposed syntax (or something similar):
Per Namespace/Database
CALL hive_prod.system.expire_snapshots(database => 'db', <Options>)
Per Catalog
CALL hive_prod.system.expire_snapshots(<Options>)
Multiple Tables
CALL hive_prod.system.expire_snapshots(tables => Array('db1.table1',
'db2.table2'), <Options>)
PS: Individual catalogs may need exceptions. For example, Nessie doesn't
support GC other than through the Nessie CLI, and the Hadoop catalog can't
list all the namespaces.
Regards,
Naveen Kumar