Balazs Jeszenszky created IMPALA-8606:
-----------------------------------------

             Summary: GET_TABLES performance in local catalog mode
                 Key: IMPALA-8606
                 URL: https://issues.apache.org/jira/browse/IMPALA-8606
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
    Affects Versions: Impala 3.2.0
            Reporter: Balazs Jeszenszky


With local catalog mode enabled, GET_TABLES JDBC requests return more than the 
table information that is always available. Any request for additional metadata 
about a table triggers a full load of that table on the catalogd side, so a 
single GET_TABLES call ends up loading the entire catalog. Also, as far as I 
can see, the requests for additional metadata are made one table at a time. 

Once the tables are loaded, the coordinator needs three round trips to the 
catalog to fetch all the details about a single table. My test case had around 
57k tables, 1700 databases, and ~120k partitions. 
GET_TABLES on a cold catalog takes 18 minutes. With a warm catalog but a cold 
impalad, it still takes ~70 seconds.
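
For a rough sense of scale, the figures above can be turned into per-table 
costs. This is only a back-of-the-envelope sketch: it treats "around 57k" as 
exactly 57,000 tables, and it assumes the requests are strictly serial and that 
all of the measured time goes to per-table work, neither of which has been 
verified. The names below are illustrative and do not come from Impala.

```python
# Figures taken from the report; "around 57k tables" is rounded to 57,000.
TABLES = 57_000
ROUND_TRIPS_PER_TABLE = 3          # coordinator -> catalog, per loaded table
COLD_CATALOG_SECONDS = 18 * 60     # GET_TABLES on a cold catalog
WARM_CATALOG_SECONDS = 70          # warm catalog, cold impalad


def per_table_ms(total_seconds, tables=TABLES):
    """Average wall-clock time per table in milliseconds.

    Assumes all of the time is spent on per-table work done serially.
    """
    return total_seconds * 1000 / tables


total_round_trips = TABLES * ROUND_TRIPS_PER_TABLE
cold_ms = per_table_ms(COLD_CATALOG_SECONDS)
warm_ms = per_table_ms(WARM_CATALOG_SECONDS)
warm_ms_per_round_trip = warm_ms / ROUND_TRIPS_PER_TABLE

print(f"round trips once tables are loaded: {total_round_trips}")
print(f"cold catalog: {cold_ms:.1f} ms per table")
print(f"warm catalog: {warm_ms:.2f} ms per table, "
      f"{warm_ms_per_round_trip:.2f} ms per round trip")
```

Under those assumptions this works out to 171,000 round trips, roughly 19 ms 
per table on a cold catalog and a little over 1 ms per table on a warm one, 
which suggests the cost comes from the number of requests rather than from any 
single slow request.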

Many tools use GET_TABLES to populate dropdowns and similar UI elements, so 
this hurts both end-user experience and catalog memory usage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
