[ https://issues.apache.org/jira/browse/IMPALA-9139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971035#comment-16971035 ]
Jiawei Wang commented on IMPALA-9139: ------------------------------------- cc [~stigahuang], Please correct me if I am wrong. So I read the code here...It seems like the logic here is correct. It's definitely confusing if we did not dive into the code... I will add some comments here in the IMPALA-9110 patch. The flag checking loadInBackground_ actually happens... It's just happened in invalidateDb() function... So it seems like this invalidateDb() function returns all the tables under a DB that need to be reloaded. {code:java} Pair<Db, List<TTableName>> invalidatedDb = invalidateDb(msClient, dbName, oldDb); if (invalidatedDb == null) continue; newDbCache.put(dbName, invalidatedDb.first); tblsToBackgroundLoad.addAll(invalidatedDb.second);{code} In that function. It checks whether we need to reload tables here: [https://github.com/apache/impala/blob/f9cf70d0352196e7b1b8465d6507ae4b16a3e82c/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L1421] So if the loadInBackground_ is set to false, this method invalidateDb() will always return null. So the background loading in reset() actually did not add any tables to the tableLoadingDeque_... > Invalidate metadata adds all the tables to background loading pool > unnecessarily > -------------------------------------------------------------------------------- > > Key: IMPALA-9139 > URL: https://issues.apache.org/jira/browse/IMPALA-9139 > Project: IMPALA > Issue Type: Bug > Reporter: Vihang Karajgaonkar > Priority: Major > > I see the following code in the reset() method of CatalogServiceCatalog > {code:java} > // Build a new DB cache, populate it, and replace the existing cache in > one > // step. > Map<String, Db> newDbCache = new ConcurrentHashMap<String, Db>(); > List<TTableName> tblsToBackgroundLoad = new ArrayList<>(); > try (MetaStoreClient msClient = getMetaStoreClient()) { > List<String> allDbs = msClient.getHiveClient().getAllDatabases(); > int numComplete = 0; > for (String dbName: allDbs) { > if (isBlacklistedDb(dbName)) { > LOG.info("skip blacklisted db: " + dbName); > continue; > } > String annotation = String.format("invalidating metadata - %s/%s > dbs complete", > numComplete++, allDbs.size()); > try (ThreadNameAnnotator tna = new ThreadNameAnnotator(annotation)) > { > dbName = dbName.toLowerCase(); > Db oldDb = oldDbCache.get(dbName); > Pair<Db, List<TTableName>> invalidatedDb = invalidateDb(msClient, > dbName, oldDb); > if (invalidatedDb == null) continue; > newDbCache.put(dbName, invalidatedDb.first); > tblsToBackgroundLoad.addAll(invalidatedDb.second); > } > } > } > dbCache_.set(newDbCache); > // Identify any deleted databases and add them to the delta log. > Set<String> oldDbNames = oldDbCache.keySet(); > Set<String> newDbNames = newDbCache.keySet(); > oldDbNames.removeAll(newDbNames); > for (String dbName: oldDbNames) { > Db removedDb = oldDbCache.get(dbName); > updateDeleteLog(removedDb); > } > // Submit tables for background loading. > for (TTableName tblName: tblsToBackgroundLoad) { > tableLoadingMgr_.backgroundLoad(tblName); > } > {code} > If you notice above, the tables are being added to the backgroundLoad with > checking the flag {{loadInBackground_}}. This means that even if the flag is > unset, after we issue a invalidate metadata command, all the tables in the > system are being loaded in the background. Note that this code is only > loading the tables, not adding the loaded tables to the catalog which is good > otherwise the memory footprint of catalog would be increased after every > invalidate metadata command. > This bug has 2 implications: > 1. We are obviously wasting a lot of cpu cycles without getting anything out > of it. > 2. The more subtle side-effect is that this would fill up the > {{tableLoadingDeque_}}. This means any other background load task will take a > longer duration to complete. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org