deniskuzZ commented on code in PR #6020:
URL: https://github.com/apache/hive/pull/6020#discussion_r2278589972
##########
standalone-metastore/metastore-client/src/main/java/org/apache/hadoop/hive/metastore/utils/TableFetcher.java:
##########
@@ -102,21 +104,47 @@ public List<TableName> getTables() throws Exception {
List<String> databases = client.getDatabases(catalogName, dbPattern);
for (String db : databases) {
- Database database = client.getDatabase(catalogName, db);
- if (MetaStoreUtils.checkIfDbNeedsToBeSkipped(database)) {
- LOG.debug("Skipping table under database: {}", db);
- continue;
- }
- if (MetaStoreUtils.isDbBeingPlannedFailedOver(database)) {
- LOG.info("Skipping table that belongs to database {} being failed over.", db);
- continue;
- }
- List<String> tablesNames = client.listTableNamesByFilter(catalogName, db, tableFilter, -1);
+ List<String> tablesNames = getTableNamesForDatabase(catalogName, db);
tablesNames.forEach(tablesName ->
candidates.add(TableName.fromString(tablesName, catalogName, db)));
}
return candidates;
}
+ public List<Table> getTables(int maxBatchSize) throws Exception {
+ List<Table> candidates = new ArrayList<>();
+
+ // if tableTypes is empty, then a list with single empty string has to specified to scan no tables.
+ if (tableTypes.isEmpty()) {
+ LOG.info("Table fetcher returns empty list as no table types specified");
+ return candidates;
+ }
+
+ List<String> databases = client.getDatabases(catalogName, dbPattern);
+
+ for (String db : databases) {
+ List<String> tablesNames = getTableNamesForDatabase(catalogName, db);
Review Comment:
@Neer393 I don't understand what you have optimized here.
You are still making multiple calls: one to get the table names and another to
get the table objects. Why not fetch the table objects directly?
Also, have you considered memory usage when you load everything onto the heap?
I don't think this is a robust solution; it could lead to an OOM.
cc @dengzhhu653
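To make the memory concern concrete, here is a minimal sketch of one possible alternative: resolve table objects in bounded batches and hand each batch to a callback, so that at most `maxBatchSize` objects are held at once instead of one list for the whole catalog. Everything in it is hypothetical and not taken from the PR: the class `BatchedTableScan`, the method `forEachBatch`, and the `fetchBatch` function, which stands in for whatever metastore call returns table objects for a list of names. It addresses only the heap usage, not the separate question of skipping the name lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical helper for illustration; not part of TableFetcher or the PR.
public class BatchedTableScan {

  /**
   * Splits names into chunks of at most maxBatchSize, resolves each chunk
   * with fetchBatch, and passes the result to the consumer before the next
   * chunk is fetched. Returns the number of objects processed.
   */
  public static <T> int forEachBatch(List<String> names, int maxBatchSize,
      Function<List<String>, List<T>> fetchBatch, Consumer<List<T>> consumer) {
    if (maxBatchSize <= 0) {
      throw new IllegalArgumentException("maxBatchSize must be positive");
    }
    int processed = 0;
    for (int start = 0; start < names.size(); start += maxBatchSize) {
      int end = Math.min(start + maxBatchSize, names.size());
      // Copy the slice so the fetcher never holds a view of the caller's list.
      List<String> slice = new ArrayList<>(names.subList(start, end));
      List<T> batch = fetchBatch.apply(slice);
      // The batch becomes unreachable after this call unless the consumer keeps it.
      consumer.accept(batch);
      processed += batch.size();
    }
    return processed;
  }

  public static void main(String[] args) {
    List<String> names = List.of("t1", "t2", "t3", "t4", "t5");
    List<Integer> batchSizes = new ArrayList<>();
    // The identity-style fetcher below is a stand-in for a real metastore lookup.
    int total = BatchedTableScan.<String>forEachBatch(names, 2,
        slice -> new ArrayList<>(slice),
        batch -> batchSizes.add(batch.size()));
    System.out.println(total + " tables, batch sizes " + batchSizes);
  }
}
```

With this shape the caller decides what to keep from each batch, so peak memory is bounded by the batch size rather than by the number of tables in the catalog.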
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]