[ 
https://issues.apache.org/jira/browse/DRILL-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617062#comment-14617062
 ] 

Jinfeng Ni commented on DRILL-2802:
-----------------------------------

Two places seem to need a fix for this incorrect query result.

1) The execution side (Parquet Reader) seems to ignore the column list in the 
query plan and reads the * column instead, which puts all the regular columns 
into the final query result. The Scan in the plan lists only `dir0`:

{code}
Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=maprfs:/test/bigtable/2015/01/4_0_0.parquet], ..... 
selectionRoot=/test/bigtable, numFiles=11, columns=[`dir0`]]])
{code}

2) Even if the Parquet Reader returns all the regular columns plus dir0 and 
dir1, the query will still produce the correct result as long as the query 
planner inserts a final Project to keep only the requested columns. Currently 
the planner does not do that when only one column is requested. That explains 
why "select dir0" produces the wrong result, while "select dir0, dir1" 
produces the correct one.

{code}
 explain plan for select dir0 from parquet limit 1;

Screen
00-01      SelectionVectorRemover
00-02        Limit(fetch=[1])
00-03          Scan
{code}

{code}
explain plan for select dir0, dir1 from parquet limit 1;

Screen
00-01      Project(dir0=[$0], dir1=[$1])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[1])
00-04            Scan
{code}

Once the query planner fixes the issue by adding the final Project, the 
Parquet Reader issue becomes a performance issue: it reads more columns than 
necessary, and the extra columns are pruned by the final Project. 
(Essentially, project pushdown does not work in this case.)
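
To illustrate the reasoning, here is a toy model in Python (not Drill code; 
the `scan` and `project` helpers and the `honor_columns` flag are hypothetical 
names made up for this sketch). It assumes the reader behaves as described in 
1): the requested dir column is returned, but the column list is otherwise 
ignored and every regular column comes back with it. The sample values are 
taken from the query output in the issue description.

```python
# Toy model only: scan() and project() are hypothetical helpers, not Drill APIs.

# One row of regular (file) columns and its directory-derived columns,
# using the values shown in the issue description.
REGULAR = {"a1": 1, "b1": "aaaaa", "c1": "2015-01-01"}
DIRS = {"dir0": "2015", "dir1": "01"}


def scan(requested, honor_columns):
    """Simulate the reader for a single row.

    When honor_columns is False, the requested list is ignored for regular
    columns and all of them are returned (the '*' behaviour described above).
    """
    dir_part = {c: DIRS[c] for c in requested if c in DIRS}
    if honor_columns:
        regular_part = {c: REGULAR[c] for c in requested if c in REGULAR}
    else:
        regular_part = dict(REGULAR)
    return [{**regular_part, **dir_part}]


def project(rows, columns):
    """Simulate a final Project that keeps only the requested columns."""
    return [{c: row[c] for c in columns} for row in rows]


# Plan without a final Project: the extra regular columns leak into the result.
without_project = scan(["dir0"], honor_columns=False)

# Plan with a final Project: the result is correct, but the scan still
# produced every regular column before they were pruned.
with_project = project(scan(["dir0"], honor_columns=False), ["dir0"])

print(without_project)
print(with_project)
```

In this sketch the Project fixes correctness only; the scan still returns all 
regular columns, which is the performance concern mentioned above.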

I'm going to use this JIRA to fix the planner side. We may need to file 
another JIRA for the Parquet Reader issue.


> Projecting dir[n] by itself results in projecting of all columns
> ----------------------------------------------------------------
>
>                 Key: DRILL-2802
>                 URL: https://issues.apache.org/jira/browse/DRILL-2802
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.9.0
>            Reporter: Victoria Markman
>            Assignee: Jinfeng Ni
>            Priority: Critical
>             Fix For: 1.2.0
>
>
> {code}
> 0: jdbc:drill:schema=dfs> select dir1 from bigtable limit 1;
> +------------+------------+------------+------------+
> |     a1     |     b1     |     c1     |    dir1    |
> +------------+------------+------------+------------+
> | 1          | aaaaa      | 2015-01-01 | 01         |
> +------------+------------+------------+------------+
> 1 row selected (0.189 seconds)
> 0: jdbc:drill:schema=dfs> select dir0 from bigtable limit 1;
> +------------+------------+------------+------------+
> |     a1     |     b1     |     c1     |    dir0    |
> +------------+------------+------------+------------+
> | 1          | aaaaa      | 2015-01-01 | 2015       |
> +------------+------------+------------+------------+
> 1 row selected (0.193 seconds)
> {code}
> In explain plan, I don't see project:
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select dir0 from bigtable;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:/test/bigtable/2015/01/4_0_0.parquet], ReadEntryWithPath 
> [path=maprfs:/test/bigtable/2015/01/3_0_0.parquet], ReadEntryWithPath 
> [path=maprfs:/test/bigtable/2015/01/5_0_0.parquet], ReadEntryWithPath 
> [path=maprfs:/test/bigtable/2015/01/1_0_0.parquet], ReadEntryWithPath 
> [path=maprfs:/test/bigtable/2015/01/2_0_0.parquet], ReadEntryWithPath 
> [path=maprfs:/test/bigtable/2015/01/0_0_0.parquet], ReadEntryWithPath 
> [path=maprfs:/test/bigtable/2015/02/0_0_0.parquet], ReadEntryWithPath 
> [path=maprfs:/test/bigtable/2015/03/0_0_0.parquet], ReadEntryWithPath 
> [path=maprfs:/test/bigtable/2015/04/0_0_0.parquet], ReadEntryWithPath 
> [path=maprfs:/test/bigtable/2016/01/parquet.file], ReadEntryWithPath 
> [path=maprfs:/test/bigtable/2016/parquet.file]], 
> selectionRoot=/test/bigtable, numFiles=11, columns=[`dir0`]]])
> {code}
> If you project both dir0 and dir1, both columns are projected with the 
> correct result:
> {code}
> 0: jdbc:drill:schema=dfs> select dir0, dir1 from bigtable;
> +------------+------------+
> |    dir0    |    dir1    |
> +------------+------------+
> | 2015       | 01         |
> | 2015       | 01         |
> | 2015       | 01         |
> | 2015       | 01         |
> | 2015       | 01         |
> | 2015       | 01         |
> | 2015       | 01         |
> | 2015       | 01         |
> | 2015       | 01         |
> {code}
> {code}
> [Wed Apr 15 14:09:47 root@/mapr/vmarkman.cluster.com/test/bigtable ] # ls -R
> .:
> 2015  2016
> ./2015:
> 01  02  03  04
> ./2015/01:
> 0_0_0.parquet  1_0_0.parquet  2_0_0.parquet  3_0_0.parquet  4_0_0.parquet  
> 5_0_0.parquet
> ./2015/02:
> 0_0_0.parquet
> ./2015/03:
> 0_0_0.parquet
> ./2015/04:
> 0_0_0.parquet
> ./2016:
> 01  parquet.file
> ./2016/01:
> parquet.file
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
