Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/23155 )
Change subject: IMPALA-14074: Warmup metadata cache in catalogd for critical tables ...................................................................... Patch Set 4: (12 comments) Addressed the comments and add support for * wildcards like "tpch.*". * ignore comments in config file * schedule table loading based on their order in the config file. http://gerrit.cloudera.org:8080/#/c/23155/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23155/2//COMMIT_MSG@34 PS2, Line 34: The flag is --warmup_tables_config_file. Yeah, table list could be huge so using a config file is better. I'll add the wildcard support to specify all tables in a db. > It looks like we're only supporting loading tables, what about functions? Functions are always loaded and there is no way to invalidate a function from the cache. http://gerrit.cloudera.org:8080/#/c/23155/2/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/23155/2/be/src/catalog/catalog-server.cc@298 PS2, Line 298: keeps_warmup_tables_loaded > Create validator for this flag against invalidate_tables_on_memory_pressure It depends on the heap size and how large of the memory the warmup tables need. If users are confident that JVM heap size is large enough, they can turn this on. http://gerrit.cloudera.org:8080/#/c/23155/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/23155/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3325 PS2, Line 3325: msDb = msClient.getHiveClient().getDatabase(dbName); : catalogTimeline.markEvent(FETCHED_HMS_DB); : Preconditions.checkNotNull(msDb); : addDb(dbName, msDb); : > Create common function for both invalidateTable() and invalidateTableIfExis Done http://gerrit.cloudera.org:8080/#/c/23155/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/23155/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2613 PS3, Line 2613: LOG.info("Scheduled warmup on table {}.{}", tTblName.db_name, > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/23155/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3351 PS3, Line 3351: // The database should always have a lower catalog version than the table because > line too long (113 > 90) Done http://gerrit.cloudera.org:8080/#/c/23155/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3390 PS3, Line 3390: if (loadInBackground_ > line too long (113 > 90) Done http://gerrit.cloudera.org:8080/#/c/23155/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/23155/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@1281 PS2, Line 1281: line = line.trim(); > skip a line starting with '#' so that user could temporarily comment out a Good point! Done. http://gerrit.cloudera.org:8080/#/c/23155/2/fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java File fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java: http://gerrit.cloudera.org:8080/#/c/23155/2/fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java@243 PS2, Line 243: for (String table : tables) { > could we read table names from warmup_table_list.txt? otherwise this tables We need to parse the config file which is actually the code we are testing here. So I think it's OK to hard-code the results. http://gerrit.cloudera.org:8080/#/c/23155/2/testdata/bin/create-load-data.sh File testdata/bin/create-load-data.sh: http://gerrit.cloudera.org:8080/#/c/23155/2/testdata/bin/create-load-data.sh@229 PS2, Line 229: This should be ${TMP_DIR}/warmup_table_list.txt. The wrong patch causes test failures in https://jenkins.impala.io/job/gerrit-verify-dryrun/12138/ http://gerrit.cloudera.org:8080/#/c/23155/2/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: http://gerrit.cloudera.org:8080/#/c/23155/2/tests/common/impala_test_suite.py@612 PS2, Line 612: def get_var_current_val(self, var): : """Returns the current value of a given Impalad flag variable.""" : # Parse the /varz endpoint to get the flag information. : varz = self.get_debug_page(VARZ_URL) : assert 'flags' in varz.keys() : filtered_varz = [flag for flag in varz['flags'] if flag['name'] == var] : assert len(filtered_varz) == 1 : assert 'current' in filtered_varz[0].keys() : return filtered_varz[0]['current'].strip() > Can you embed this into CatalogdService in impala_cluster.py? Good point! Done. http://gerrit.cloudera.org:8080/#/c/23155/2/tests/custom_cluster/test_catalogd_ha.py File tests/custom_cluster/test_catalogd_ha.py: http://gerrit.cloudera.org:8080/#/c/23155/2/tests/custom_cluster/test_catalogd_ha.py@533 PS2, Line 533: " > use FILESYSTEM_PREFIX ? Good catch! http://gerrit.cloudera.org:8080/#/c/23155/2/tests/custom_cluster/test_restart_services.py File tests/custom_cluster/test_restart_services.py: http://gerrit.cloudera.org:8080/#/c/23155/2/tests/custom_cluster/test_restart_services.py@1132 PS2, Line 1132: > What if test environment is against Ozone or S3? Replaced this with FILESYSTEM_PREFIX. -- To view, visit http://gerrit.cloudera.org:8080/23155 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2d09eae1f12a8acd2de945984d956d11eeee1ab6 Gerrit-Change-Number: 23155 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Fri, 11 Jul 2025 12:03:00 +0000 Gerrit-HasComments: Yes
