Nezih Yigitbasi created HIVE-10319:
--------------------------------------

             Summary: Hive CLI startup takes a long time with a large number of databases
                 Key: HIVE-10319
                 URL: https://issues.apache.org/jira/browse/HIVE-10319
             Project: Hive
          Issue Type: Improvement
          Components: CLI
    Affects Versions: 1.0.0
            Reporter: Nezih Yigitbasi


The Hive CLI takes a long time to start when there is a large number of 
databases in the data warehouse. I think the root cause is the way permanent 
UDFs are loaded from the metastore. Looking at the logs and the source code, I 
see that at startup Hive first gets all the databases from the metastore and 
then, for each database, makes a separate metastore call to get the permanent 
functions for that database [see Hive.java | 
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185].
So the number of metastore calls is on the order of the number of databases. 
In production we have several hundred databases, so Hive makes several hundred 
RPC calls during startup, which takes 30+ seconds.
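
To make the call pattern concrete, here is a minimal, self-contained sketch of 
the access pattern described above. It does not use the real Hive or metastore 
classes: CountingMetastore, perDatabaseCalls, and bulkCalls are hypothetical 
names, and the single bulk lookup is only an assumed alternative for 
comparison, not something this report proposes or that is known to exist in 
the affected version.

```java
import java.util.ArrayList;
import java.util.List;

public class StartupCallCount {

    /** Hypothetical stand-in for a metastore client; it only counts round trips. */
    static class CountingMetastore {
        private final List<String> databases = new ArrayList<>();
        int rpcCalls = 0;

        CountingMetastore(int numDatabases) {
            for (int i = 0; i < numDatabases; i++) {
                databases.add("db" + i);
            }
        }

        /** One round trip that returns every database name. */
        List<String> getAllDatabases() {
            rpcCalls++;
            return new ArrayList<>(databases);
        }

        /** One round trip per database (no functions are registered in this sketch). */
        List<String> getFunctions(String dbName) {
            rpcCalls++;
            return new ArrayList<>();
        }

        /** Assumed bulk lookup: one round trip for all databases. Hypothetical. */
        List<String> getAllFunctions() {
            rpcCalls++;
            return new ArrayList<>();
        }
    }

    /** Pattern from the report: list the databases, then one call per database. */
    static int perDatabaseCalls(int numDatabases) {
        CountingMetastore metastore = new CountingMetastore(numDatabases);
        for (String db : metastore.getAllDatabases()) {
            metastore.getFunctions(db);
        }
        return metastore.rpcCalls;
    }

    /** Assumed alternative: a single bulk call regardless of database count. */
    static int bulkCalls(int numDatabases) {
        CountingMetastore metastore = new CountingMetastore(numDatabases);
        metastore.getAllFunctions();
        return metastore.rpcCalls;
    }

    public static void main(String[] args) {
        int numDatabases = 300; // illustrative value, not a measured figure
        System.out.println("per-database calls: " + perDatabaseCalls(numDatabases));
        System.out.println("bulk calls: " + bulkCalls(numDatabases));
    }
}
```

With N databases the first pattern issues N + 1 round trips, so startup time 
grows linearly with the number of databases even when no database defines any 
permanent functions.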



