[ https://issues.apache.org/jira/browse/DRILL-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036677#comment-15036677 ]
Jinfeng Ni commented on DRILL-4127:
-----------------------------------

For a Hive storage plugin with about 8 schemas/databases, if I run a simple query like this:

select count(*) from hive.table1;

From hive.log, we saw the following numbers of Hive metastore API calls.

Without the patch, with impersonation turned on:
1. Number of get_all_databases API calls: 31
2. Number of get_all_tables API calls: 30
3. Number of get_table API calls: 2

That explains why some Drill users report that Drill spent 20-30 seconds on planning for such a simple query, making the query not "interactive" at all.

> HiveSchema.getSubSchema() should use lazy loading of all the table names
> ------------------------------------------------------------------------
>
>                 Key: DRILL-4127
>                 URL: https://issues.apache.org/jira/browse/DRILL-4127
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>
> Currently, HiveSchema.getSubSchema() pre-loads all the table names when
> it constructs the subschema, even though those table names are not requested
> at all. This can cause considerable performance overhead, especially
> when the Hive schema contains a large number of objects (thousands of
> tables/views are not uncommon in some use cases).
> Instead, we should change the loading of table names to on-demand. Only when
> there is a request to get all table names do we load them into the Hive schema.
> This should help "show schemas", since it only requires the schema name, not
> the table names in the schema.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
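
The on-demand loading proposed in the issue description could look roughly like the sketch below. It is an illustration only, not the actual DRILL-4127 patch: the class LazySubSchema and the tableNameLoader supplier are hypothetical names, and the supplier merely stands in for whatever metastore call (such as get_all_tables) fetches the names. The table-name list is fetched on the first call to getTableNames() and cached afterwards, so constructing the subschema or reading its name triggers no load.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical stand-in for a Hive sub-schema that defers loading its
 * table names. This is an assumption-laden sketch, not Drill's HiveSchema.
 */
public class LazySubSchema {
    private final String name;
    // Assumed to wrap the expensive metastore request for table names.
    private final Supplier<List<String>> tableNameLoader;
    // Stays null until table names are requested for the first time.
    private List<String> tableNames;

    public LazySubSchema(String name, Supplier<List<String>> tableNameLoader) {
        // Construction does NOT invoke the loader.
        this.name = name;
        this.tableNameLoader = tableNameLoader;
    }

    /** Needs only the schema name, so no table names are loaded here. */
    public String getName() {
        return name;
    }

    /** Loads the table names on first use and returns the cached copy afterwards. */
    public synchronized List<String> getTableNames() {
        if (tableNames == null) {
            tableNames = Collections.unmodifiableList(
                    new ArrayList<>(tableNameLoader.get()));
        }
        return tableNames;
    }
}
```

With this shape, an operation that only needs schema names (as "show schemas" does, per the description) would call getName() on each subschema and never invoke the loader.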