[ 
https://issues.apache.org/jira/browse/DRILL-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16697079#comment-16697079
 ] 

Anton Gozhiy edited comment on DRILL-6865 at 11/26/18 10:30 AM:
----------------------------------------------------------------

The issue is also reproduced with the following case:
{code:sql}
select * from dfs.tmp.`multi` where n_nationkey > 5 and n_nationkey/2 < 5
{code}


was (Author: angozhiy):
The issue is also reproduced with the following case:
{code:sql}
select * from dfs.tmp.`multy` where n_nationkey > 5 and n_nationkey/2 < 5
{code}


> Query returns wrong result when filter pruning happens
> ------------------------------------------------------
>
>                 Key: DRILL-6865
>                 URL: https://issues.apache.org/jira/browse/DRILL-6865
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.14.0
>            Reporter: Volodymyr Vysotskyi
>            Assignee: Volodymyr Vysotskyi
>            Priority: Blocker
>             Fix For: 1.15.0
>
>
> In DRILL-5796 was implemented removing the filter from the plan when some (or 
> all) row groups of parquet table fully match the filter.
> For the case when filter has some predicates which parquet filter predicate 
> does not support, they can be omitted for some cases from the resulting 
> filter predicate. When row groups fully match predicates which left in the 
> filter, the whole filter is removed from the plan and the wrong result is 
> returned.
> Example of the query for reproducing this bug:
> {code:sql}
> create table dfs.tmp.`multi/t1` as select * from cp.`tpch/nation.parquet` 
> where n_nationkey > 5;
> create table dfs.tmp.`multi/t2` as select * from cp.`tpch/nation.parquet` 
> where n_nationkey < 5;
> select * from dfs.tmp.`multi` where n_nationkey > 5 and n_nationkey like 
> '%10%';
> {code}
> returns
> {noformat}
> +-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
> | dir0  | n_nationkey  |     n_name      | n_regionkey  |                     
>                                  n_comment                                    
>                   |
> +-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
> | t1    | 6            | FRANCE          | 3            | refully final 
> requests. regular, ironi                                                      
>                         |
> | t1    | 7            | GERMANY         | 3            | l platelets. 
> regular accounts x-ray: unusual, regular acco                                 
>                          |
> | t1    | 8            | INDIA           | 2            | ss excuses cajole 
> slyly across the packages. deposits print aroun                               
>                     |
> | t1    | 9            | INDONESIA       | 2            |  slyly express 
> asymptotes. regular deposits haggle slyly. carefully ironic hockey players 
> sleep blithely. carefull  |
> | t1    | 10           | IRAN            | 4            | efully alongside of 
> the slyly final dependencies.                                                 
>                   |
> | t1    | 11           | IRAQ            | 4            | nic deposits boost 
> atop the quickly final requests? quickly regula                               
>                    |
> | t1    | 12           | JAPAN           | 2            | ously. final, 
> express gifts cajole a                                                        
>                         |
> | t1    | 13           | JORDAN          | 4            | ic deposits are 
> blithely about the carefully regular pa                                       
>                       |
> | t1    | 14           | KENYA           | 0            |  pending excuses 
> haggle furiously deposits. pending, express pinto beans wake fluffily past t  
>                      |
> | t1    | 15           | MOROCCO         | 0            | rns. blithely bold 
> courts among the closely regular packages use furiously bold platelets?       
>                    |
> | t1    | 16           | MOZAMBIQUE      | 0            | s. ironic, unusual 
> asymptotes wake blithely r                                                    
>                    |
> | t1    | 17           | PERU            | 1            | platelets. blithely 
> pending dependencies use fluffily across the even pinto beans. carefully 
> silent accoun          |
> | t1    | 18           | CHINA           | 2            | c dependencies. 
> furiously express notornis sleep slyly regular accounts. ideas sleep. depos   
>                       |
> | t1    | 19           | ROMANIA         | 3            | ular asymptotes are 
> about the furious multipliers. express dependencies nag above the ironically 
> ironic account     |
> | t1    | 20           | SAUDI ARABIA    | 4            | ts. silent requests 
> haggle. closely express packages sleep across the blithely                    
>                   |
> | t1    | 21           | VIETNAM         | 2            | hely enticingly 
> express accounts. even, final                                                 
>                       |
> | t1    | 22           | RUSSIA          | 3            |  requests against 
> the platelets use never according to the quickly regular pint                 
>                     |
> | t1    | 23           | UNITED KINGDOM  | 3            | eans boost 
> carefully special requests. accounts are. carefull                            
>                            |
> | t1    | 24           | UNITED STATES   | 1            | y final packages. 
> slow foxes cajole quickly. quickly silent platelets breach ironic accounts. 
> unusual pinto be      |
> +-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
> {noformat}
> but single row should be returned:
> {noformat}
> +-------+--------------+---------+--------------+-----------------------------------------------------+
> | dir0  | n_nationkey  | n_name  | n_regionkey  |                      
> n_comment                      |
> +-------+--------------+---------+--------------+-----------------------------------------------------+
> | t1    | 10           | IRAN    | 4            | efully alongside of the 
> slyly final dependencies.   |
> +-------+--------------+---------+--------------+-----------------------------------------------------+
> {noformat}
> Filter is removed from the plan, but it contains a predicate which wasn't 
> applied:
> {noformat}
> 00-00    Screen : rowType = RecordType(DYNAMIC_STAR **): rowcount = 19.0, 
> cumulative cost = {77.9 rows, 115.9 cpu, 38.0 io, 0.0 network, 0.0 memory}, 
> id = 400
> 00-01      Project(**=[$0]) : rowType = RecordType(DYNAMIC_STAR **): rowcount 
> = 19.0, cumulative cost = {76.0 rows, 114.0 cpu, 38.0 io, 0.0 network, 0.0 
> memory}, id = 399
> 00-02        Project(T1¦¦**=[$0]) : rowType = RecordType(DYNAMIC_STAR 
> T1¦¦**): rowcount = 19.0, cumulative cost = {57.0 rows, 95.0 cpu, 38.0 io, 
> 0.0 network, 0.0 memory}, id = 398
> 00-03          Project(T1¦¦**=[$0], n_nationkey=[$1]) : rowType = 
> RecordType(DYNAMIC_STAR T1¦¦**, ANY n_nationkey): rowcount = 19.0, cumulative 
> cost = {38.0 rows, 76.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 397
> 00-04            Scan(table=[[dfs, tmp, multi]], groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=/tmp/multi/t1/0_0_0.parquet]], 
> selectionRoot=file:/tmp/multi, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`**`, `n_nationkey`]]]) : rowType = 
> RecordType(DYNAMIC_STAR **, ANY n_nationkey): rowcount = 19.0, cumulative 
> cost = {19.0 rows, 38.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 396
> {noformat}
> ----
> Additionally, a filter is not removed from the plan when parquet table with 
> single row group is queried:
> {code:sql}
> create table dfs.tmp.`singleRowGroupTable` as select * from 
> cp.`tpch/nation.parquet`;
> explain plan for select * from dfs.tmp.`singleRowGroupTable` where 
> n_nationkey > -1;
> {code}
> returns plan
> {noformat}
> 00-00    Screen
> 00-01      Project(**=[$0])
> 00-02        Project(T0¦¦**=[$0])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[>($1, -1)])
> 00-05              Project(T0¦¦**=[$0], n_nationkey=[$1])
> 00-06                Scan(table=[[dfs, tmp, singleRowGroupTable]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=file:/tmp/singleRowGroupTable]], 
> selectionRoot=file:/tmp/singleRowGroupTable, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`**`, `n_nationkey`]]])
> {noformat}
> *Also, for the case when a table has multiple files, and filter matches all 
> the table, it is not removed from the plan:*
> {code:sql}
> select * from dfs.tmp.`multi` where n_nationkey > -1;
> {code}
> has plan
> {noformat}
> 00-00    Screen : rowType = RecordType(DYNAMIC_STAR **): rowcount = 12.0, 
> cumulative cost = {109.2 rows, 277.2 cpu, 48.0 io, 0.0 network, 0.0 memory}, 
> id = 196
> 00-01      Project(**=[$0]) : rowType = RecordType(DYNAMIC_STAR **): rowcount 
> = 12.0, cumulative cost = {108.0 rows, 276.0 cpu, 48.0 io, 0.0 network, 0.0 
> memory}, id = 195
> 00-02        Project(T0¦¦**=[$0]) : rowType = RecordType(DYNAMIC_STAR 
> T0¦¦**): rowcount = 12.0, cumulative cost = {96.0 rows, 264.0 cpu, 48.0 io, 
> 0.0 network, 0.0 memory}, id = 194
> 00-03          SelectionVectorRemover : rowType = RecordType(DYNAMIC_STAR 
> T0¦¦**, ANY n_nationkey): rowcount = 12.0, cumulative cost = {84.0 rows, 
> 252.0 cpu, 48.0 io, 0.0 network, 0.0 memory}, id = 193
> 00-04            Filter(condition=[>($1, -1)]) : rowType = 
> RecordType(DYNAMIC_STAR T0¦¦**, ANY n_nationkey): rowcount = 12.0, cumulative 
> cost = {72.0 rows, 240.0 cpu, 48.0 io, 0.0 network, 0.0 memory}, id = 192
> 00-05              Project(T0¦¦**=[$0], n_nationkey=[$1]) : rowType = 
> RecordType(DYNAMIC_STAR T0¦¦**, ANY n_nationkey): rowcount = 24.0, cumulative 
> cost = {48.0 rows, 96.0 cpu, 48.0 io, 0.0 network, 0.0 memory}, id = 191
> 00-06                Scan(table=[[dfs, tmp, multi]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=file:/tmp/multi/t2/0_0_0.parquet], ReadEntryWithPath 
> [path=file:/tmp/multi/t1/0_0_0.parquet]], selectionRoot=file:/tmp/multi, 
> numFiles=2, numRowGroups=2, usedMetadataFile=false, columns=[`**`, 
> `n_nationkey`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY n_nationkey): 
> rowcount = 24.0, cumulative cost = {24.0 rows, 48.0 cpu, 48.0 io, 0.0 
> network, 0.0 memory}, id = 190
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to