[ https://issues.apache.org/jira/browse/DRILL-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617062#comment-14617062 ]
Jinfeng Ni commented on DRILL-2802:
-----------------------------------

Two places seem to need fixes for this incorrect query result issue.

1) The execution side (Parquet Reader) seems to ignore the column list in the query plan and substitutes the * column instead, which leads to all the regular columns appearing in the final query result.
{code}
Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/test/bigtable/2015/01/4_0_0.parquet], ..... selectionRoot=/test/bigtable, numFiles=11, columns=[`dir0`]]])
{code}

2) Even if the Parquet Reader returns all the regular columns plus dir0 and dir1, the query will still get the correct result as long as the query planner inserts a final Project to select the desired columns. For now, the query planner does not do that when only one column is requested. That explains why "select dir0" produces a wrong result, while "select dir0, dir1" gets the correct result.
{code}
explain plan for select dir0 from parquet limit 1;
Screen
00-01 SelectionVectorRemover
00-02 Limit(fetch=[1])
00-03 Scan
{code}
{code}
explain plan for select dir0, dir1 from parquet limit 1;
Screen
00-01 Project(dir0=[$0], dir1=[$1])
00-02 SelectionVectorRemover
00-03 Limit(fetch=[1])
00-04 Scan
{code}

Once the query planner fixes the issue by adding the final Project, the Parquet Reader issue becomes a performance issue, since the reader reads more columns than necessary and the extra columns are only pruned by the final Project. (Essentially, project pushdown does not work in this case.)

I'm going to use this JIRA to fix the planner side. We may need to file another JIRA for the Parquet Reader issue.
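As a rough illustration of point 2, here is a toy Python model of the two plans. This is not Drill code: the function names, the REGULAR_COLUMNS list, and the row layout are all made up for this sketch, and the scan is only assumed to behave the way the query output in the issue description suggests (regular columns plus whichever dir column was requested).

```python
# Toy model only; none of these names exist in Drill.
REGULAR_COLUMNS = ["a1", "b1", "c1"]  # assumed regular columns of the table


def scan_ignoring_columns(rows, requested):
    """Mimic the reported reader behaviour: the requested column list is
    not honoured, so every regular column comes back along with the
    requested dir columns."""
    dir_columns = [c for c in requested if c.startswith("dir")]
    keep = REGULAR_COLUMNS + dir_columns
    return [{name: row[name] for name in keep} for row in rows]


def project(rows, requested):
    """Mimic a final Project: keep only the requested columns."""
    return [{name: row[name] for name in requested} for row in rows]


rows = [{"a1": 1, "b1": "aaaaa", "c1": "2015-01-01", "dir0": "2015", "dir1": "01"}]

# Plan without a final Project: the extra columns leak into the result.
without_project = scan_ignoring_columns(rows, ["dir0"])

# Plan with a final Project: the result is correct, but the scan still
# read columns that were then thrown away (the performance concern).
with_project = project(without_project, ["dir0"])

print(list(without_project[0]))
print(with_project)
```

In this model the Project fixes the result but not the wasted reads, which is why the reader side would still deserve its own fix.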

> Projecting dir[n] by itself results in projecting of all columns
> ----------------------------------------------------------------
>
> Key: DRILL-2802
> URL: https://issues.apache.org/jira/browse/DRILL-2802
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 0.9.0
> Reporter: Victoria Markman
> Assignee: Jinfeng Ni
> Priority: Critical
> Fix For: 1.2.0
>
>
> {code}
> 0: jdbc:drill:schema=dfs> select dir1 from bigtable limit 1;
> +------------+------------+------------+------------+
> | a1 | b1 | c1 | dir1 |
> +------------+------------+------------+------------+
> | 1 | aaaaa | 2015-01-01 | 01 |
> +------------+------------+------------+------------+
> 1 row selected (0.189 seconds)
> 0: jdbc:drill:schema=dfs> select dir0 from bigtable limit 1;
> +------------+------------+------------+------------+
> | a1 | b1 | c1 | dir0 |
> +------------+------------+------------+------------+
> | 1 | aaaaa | 2015-01-01 | 2015 |
> +------------+------------+------------+------------+
> 1 row selected (0.193 seconds)
> {code}
> In explain plan, I don't see project:
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select dir0 from bigtable;
> +------------+------------+
> | text | json |
> +------------+------------+
> | 00-00 Screen
> 00-01 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=maprfs:/test/bigtable/2015/01/4_0_0.parquet], ReadEntryWithPath
> [path=maprfs:/test/bigtable/2015/01/3_0_0.parquet], ReadEntryWithPath
> [path=maprfs:/test/bigtable/2015/01/5_0_0.parquet], ReadEntryWithPath
> [path=maprfs:/test/bigtable/2015/01/1_0_0.parquet], ReadEntryWithPath
> [path=maprfs:/test/bigtable/2015/01/2_0_0.parquet], ReadEntryWithPath
> [path=maprfs:/test/bigtable/2015/01/0_0_0.parquet], ReadEntryWithPath
> [path=maprfs:/test/bigtable/2015/02/0_0_0.parquet], ReadEntryWithPath
> [path=maprfs:/test/bigtable/2015/03/0_0_0.parquet], ReadEntryWithPath
> [path=maprfs:/test/bigtable/2015/04/0_0_0.parquet], ReadEntryWithPath
> [path=maprfs:/test/bigtable/2016/01/parquet.file], ReadEntryWithPath
> [path=maprfs:/test/bigtable/2016/parquet.file]],
> selectionRoot=/test/bigtable, numFiles=11, columns=[`dir0`]]])
> {code}
> If you project both dir0 and dir1, both columns are projected with the
> correct result:
> {code}
> 0: jdbc:drill:schema=dfs> select dir0, dir1 from bigtable;
> +------------+------------+
> | dir0 | dir1 |
> +------------+------------+
> | 2015 | 01 |
> | 2015 | 01 |
> | 2015 | 01 |
> | 2015 | 01 |
> | 2015 | 01 |
> | 2015 | 01 |
> | 2015 | 01 |
> | 2015 | 01 |
> | 2015 | 01 |
> {code}
> {code}
> [Wed Apr 15 14:09:47 root@/mapr/vmarkman.cluster.com/test/bigtable ] # ls -R
> .:
> 2015 2016
> ./2015:
> 01 02 03 04
> ./2015/01:
> 0_0_0.parquet 1_0_0.parquet 2_0_0.parquet 3_0_0.parquet 4_0_0.parquet
> 5_0_0.parquet
> ./2015/02:
> 0_0_0.parquet
> ./2015/03:
> 0_0_0.parquet
> ./2015/04:
> 0_0_0.parquet
> ./2016:
> 01 parquet.file
> ./2016/01:
> parquet.file
> {code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)