[ 
https://issues.apache.org/jira/browse/DRILL-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036677#comment-15036677
 ] 

Jinfeng Ni commented on DRILL-4127:
-----------------------------------

For a Hive storage plugin with about 8 schemas/databases, I ran a simple 
query like this:

select count(*) from hive.table1;

From hive.log, the numbers of Hive metastore API calls were as follows:

Without the patch, with impersonation turned on:
1. # of get_all_databases API calls: 31
2. # of get_all_tables API calls: 30
3. # of get_table API calls: 2

That explains why some Drill users report that Drill spends 20-30 seconds 
on planning such a simple query, making the query not "interactive" 
at all.

 


> HiveSchema.getSubSchema() should use lazy loading of all the table names
> ------------------------------------------------------------------------
>
>                 Key: DRILL-4127
>                 URL: https://issues.apache.org/jira/browse/DRILL-4127
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>
> Currently, HiveSchema.getSubSchema() pre-loads all the table names when 
> it constructs the subschema, even though those table names are not requested 
> at all. This can cause considerable performance overhead, especially 
> when the Hive schema contains a large number of objects (thousands of 
> tables/views are not uncommon in some use cases). 
> Instead, we should load table names on demand: only when all table names 
> are requested do we load them into the Hive schema.
> This should help "show schemas", since it only requires the schema name, not 
> the table names in the schema. 
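
The on-demand loading described above could be sketched roughly as follows. 
This is only an illustration, not Drill's actual code: CountingMetastore and 
LazySubSchema are hypothetical stand-ins, and the real HiveSchema and 
metastore client look different. The point is that constructing the subschema 
makes no metastore call, and the table names are fetched once, on first 
request.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for a metastore client; counts the expensive calls. */
class CountingMetastore {
    int getAllTablesCalls = 0;

    Set<String> getAllTables(String dbName) {
        getAllTablesCalls++;
        // Fake data; a real client would go to the Hive metastore here.
        return new TreeSet<>(List.of(dbName + "_t1", dbName + "_t2"));
    }
}

/** Hypothetical subschema that defers loading table names until requested. */
class LazySubSchema {
    private final String name;
    private final CountingMetastore metastore;
    private Set<String> tableNames; // stays null until first requested

    LazySubSchema(String name, CountingMetastore metastore) {
        // No metastore call here: building the subschema is cheap.
        this.name = name;
        this.metastore = metastore;
    }

    String getName() {
        return name; // all that "show schemas" would need
    }

    synchronized Set<String> getTableNames() {
        if (tableNames == null) {
            // First request: load once and keep the result.
            tableNames = metastore.getAllTables(name);
        }
        return tableNames;
    }
}

class LazyLoadingDemo {
    public static void main(String[] args) {
        CountingMetastore metastore = new CountingMetastore();
        LazySubSchema schema = new LazySubSchema("db1", metastore);

        System.out.println("schema: " + schema.getName());
        System.out.println("calls after construction: " + metastore.getAllTablesCalls);

        schema.getTableNames();
        schema.getTableNames();
        System.out.println("calls after two requests: " + metastore.getAllTablesCalls);
    }
}
```

With a plugin that has several databases, a query touching a single table 
would then skip the table listing for every other database, assuming nothing 
else in the planning path asks for all table names.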



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
