rdblue commented on a change in pull request #2129: URL: https://github.com/apache/iceberg/pull/2129#discussion_r572300491
########## File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalogs.java ########## @@ -36,4 +38,11 @@ public static HiveCatalog loadCatalog(Configuration conf) { String metastoreUri = conf.get(HiveConf.ConfVars.METASTOREURIS.varname, ""); return CATALOG_CACHE.get(metastoreUri, uri -> new HiveCatalog(conf)); } + + public static HiveCatalog loadCatalog(String catalogName, Map<String, String> properties, Configuration conf) { + // metastore URI can be null in local mode + String metastoreUri = conf.get(HiveConf.ConfVars.METASTOREURIS.varname, ""); + return CATALOG_CACHE.get(metastoreUri, uri -> (HiveCatalog) CatalogUtil.loadCatalog(HiveCatalog.class.getName(), + catalogName, properties, conf)); + } Review comment: > For writes TezAM (for Tez) or YARN Application Master (for MR) calls the commitJob which still needs to create the Catalog. This is later in the job, though. I wonder if we should introduce a thread to close connections if they are not used for some period of time. Garbage collecting connections could help if we think that the extra connection from the HS2 side is a problem. The AM connection probably wouldn't last long though? > I think sharing HMS Clients between the Catalogs is not really a good practice. Sounds reasonable. > I think the best solution would be a different HiveCatalog implementation reusing the HMS Client from the Hive session. This sounds okay, but should probably be done in a follow-up instead of here. It also doesn't affect name-based caching. The only catalog that could share a connection with the Hive session is the unnamed catalog used when there is no catalog set. We cannot share connections to named catalogs because there is no reason to think those are hitting the same metastore. > ll that said, I think the name based caching is flawed if the configuration can be changed I don't think that configuration can be changed, for the reason you cite. Once a catalog is created, it is configured and that can't be changed. We _could_ look up settings each time they are used but that seems like a lot of work for a problem we fundamentally don't need to solve. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org