Padma Penumarthy created DRILL-5972:
---------------------------------------
Summary: Slow performance for query on INFORMATION_SCHEMA.TABLE
Key: DRILL-5972
URL: https://issues.apache.org/jira/browse/DRILL-5972
Project: Apache Drill
Issue Type: Bug
Components: Storage - Information Schema
Affects Versions: 1.11.0
Reporter: Padma Penumarthy
Assignee: Padma Penumarthy
Fix For: 1.13.0
A query like the following on INFORMATION_SCHEMA takes a long time to execute.
select TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE from
INFORMATION_SCHEMA.`TABLES` WHERE TABLE_NAME LIKE '%' AND ( TABLE_SCHEMA =
'hive.default' ) ORDER BY TABLE_TYPE, TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME;
Reason being we fetch table information for all schemas instead of just
'hive.default' schema.
If we change the predicate like this, it executes very fast.
select TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE from
INFORMATION_SCHEMA.`TABLES` WHERE ( TABLE_SCHEMA = 'hive.default' ) AND
TABLE_NAME LIKE '%' ORDER BY TABLE_TYPE, TABLE_CATALOG, TABLE_SCHEMA,
TABLE_NAME;
The difference is in the order in which we evaluate the expressions in the
predicate.
In the first case, we first evaluate TABLE_NAME LIKE '%' and decide that it is
inconclusive (since we do not know the schema). So, we go get all tables for
all the schemas.
In the second case, we first evaluate TABLE_SCHEMA = 'hive.default' and decide
that we need to fetch only tables for that schema.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)