[ 
https://issues.apache.org/jira/browse/IMPALA-9139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971035#comment-16971035
 ] 

Jiawei Wang commented on IMPALA-9139:
-------------------------------------

cc [~stigahuang], Please correct me if I am wrong.

So I read the code here...It seems like the logic here is correct. It's 
definitely confusing if we did not dive into the code... I will add some 
comments here in the IMPALA-9110 patch. 

The flag checking loadInBackground_ actually happens... It's just happened in 
invalidateDb() function... So it seems like this invalidateDb() function 
returns all the tables under a DB that need to be reloaded.
{code:java}
            Pair<Db, List<TTableName>> invalidatedDb = invalidateDb(msClient,
                dbName, oldDb);
            if (invalidatedDb == null) continue;
            newDbCache.put(dbName, invalidatedDb.first);
            tblsToBackgroundLoad.addAll(invalidatedDb.second);{code}
In that function. It checks whether we need to reload tables here:

[https://github.com/apache/impala/blob/f9cf70d0352196e7b1b8465d6507ae4b16a3e82c/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L1421]

So if the loadInBackground_ is set to false, this method invalidateDb() will 
always return null. So the background loading in reset() actually did not add 
any tables to the tableLoadingDeque_...

> Invalidate metadata adds all the tables to background loading pool 
> unnecessarily
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-9139
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9139
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Vihang Karajgaonkar
>            Priority: Major
>
> I see the following code in the reset() method of CatalogServiceCatalog
> {code:java}
>       // Build a new DB cache, populate it, and replace the existing cache in 
> one
>       // step.
>       Map<String, Db> newDbCache = new ConcurrentHashMap<String, Db>();
>       List<TTableName> tblsToBackgroundLoad = new ArrayList<>();
>       try (MetaStoreClient msClient = getMetaStoreClient()) {
>         List<String> allDbs = msClient.getHiveClient().getAllDatabases();
>         int numComplete = 0;
>         for (String dbName: allDbs) {
>           if (isBlacklistedDb(dbName)) {
>             LOG.info("skip blacklisted db: " + dbName);
>             continue;
>           }
>           String annotation = String.format("invalidating metadata - %s/%s 
> dbs complete",
>               numComplete++, allDbs.size());
>           try (ThreadNameAnnotator tna = new ThreadNameAnnotator(annotation)) 
> {
>             dbName = dbName.toLowerCase();
>             Db oldDb = oldDbCache.get(dbName);
>             Pair<Db, List<TTableName>> invalidatedDb = invalidateDb(msClient,
>                 dbName, oldDb);
>             if (invalidatedDb == null) continue;
>             newDbCache.put(dbName, invalidatedDb.first);
>             tblsToBackgroundLoad.addAll(invalidatedDb.second);
>           }
>         }
>       }
>       dbCache_.set(newDbCache);
>       // Identify any deleted databases and add them to the delta log.
>       Set<String> oldDbNames = oldDbCache.keySet();
>       Set<String> newDbNames = newDbCache.keySet();
>       oldDbNames.removeAll(newDbNames);
>       for (String dbName: oldDbNames) {
>         Db removedDb = oldDbCache.get(dbName);
>         updateDeleteLog(removedDb);
>       }
>       // Submit tables for background loading.
>       for (TTableName tblName: tblsToBackgroundLoad) {
>         tableLoadingMgr_.backgroundLoad(tblName);
>       }
> {code}
> If you notice above, the tables are being added to the backgroundLoad with 
> checking the flag {{loadInBackground_}}. This means that even if the flag is 
> unset, after we issue a invalidate metadata command, all the tables in the 
> system are being loaded in the background. Note that this code is only 
> loading the tables, not adding the loaded tables to the catalog which is good 
> otherwise the memory footprint of catalog would be increased after every 
> invalidate metadata command.
> This bug has 2 implications:
> 1. We are obviously wasting a lot of cpu cycles without getting anything out 
> of it.
> 2. The more subtle side-effect is that this would fill up the 
> {{tableLoadingDeque_}}. This means any other background load task will take a 
> longer duration to complete.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to