[ https://issues.apache.org/jira/browse/DRILL-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Volodymyr Vysotskyi resolved DRILL-5773.
----------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.16.0

Looks like it was fixed in the scope of DRILL-6118.

> Project pushdown into a subquery with select *
> ----------------------------------------------
>
>                 Key: DRILL-5773
>                 URL: https://issues.apache.org/jira/browse/DRILL-5773
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Jinfeng Ni
>            Assignee: Hanumath Rao Maduri
>            Priority: Major
>             Fix For: 1.16.0
>
>
> If a subquery / table expression / view has a `select *` and the outer query
> requests only a subset of columns/fields, Drill currently does not do project
> pushdown into the subquery. As a result, the scan operator returns every
> column/field in the table, which can significantly impact query performance,
> especially if the number of columns/fields is large.
> For instance,
> {code}
> SELECT n_regionkey, count(*) AS cnt
> FROM (SELECT * FROM cp.`tpch/nation.parquet`) AS n
> GROUP BY n_regionkey;
> {code}
> Here is the plan
> {code}
> 00-00    Screen
> 00-01      Project(n_regionkey=[$0], cnt=[$1])
> 00-02        Project(n_regionkey=[$0], cnt=[$1])
> 00-03          HashAgg(group=[{0}], cnt=[COUNT()])
> 00-04            Project(n_regionkey=[ITEM($0, 'n_regionkey')])
> 00-05              Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]],
> selectionRoot=classpath:/tpch/nation.parquet, numFiles=1,
> usedMetadataFile=false, columns=[`*`]]])
> {code}
> Notice that the Scan operator shows columns=[`*`], indicating that it will
> read every column.
> From a performance perspective, Drill should push the project into a subquery
> with select *.
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
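
A rough way to check for this behavior on a given Drill build, using the query from the issue description, is to run it under EXPLAIN PLAN FOR and look at the columns list of the Scan operator. The second statement is a manual rewrite that names the needed column in the subquery instead of using select *. It is an assumed workaround for builds without the fix, not something stated in the issue.

```sql
-- Inspect the plan. If the project is pushed into the subquery, the Scan
-- operator's columns list should name only n_regionkey rather than `*`.
EXPLAIN PLAN FOR
SELECT n_regionkey, count(*) AS cnt
FROM (SELECT * FROM cp.`tpch/nation.parquet`) AS n
GROUP BY n_regionkey;

-- Manual rewrite (assumed workaround, not taken from the issue):
-- project the needed column explicitly inside the subquery.
SELECT n_regionkey, count(*) AS cnt
FROM (SELECT n_regionkey FROM cp.`tpch/nation.parquet`) AS n
GROUP BY n_regionkey;
```

Comparing the Scan line of the two plans should show whether the select * subquery still forces a full-column read.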