[
https://issues.apache.org/jira/browse/DRILL-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16697079#comment-16697079
]
Anton Gozhiy edited comment on DRILL-6865 at 11/26/18 10:30 AM:
----------------------------------------------------------------
The issue is also reproduced with the following case:
{code:sql}
select * from dfs.tmp.`multi` where n_nationkey > 5 and n_nationkey/2 < 5
{code}
was (Author: angozhiy):
The issue is also reproduced with the following case:
{code:sql}
select * from dfs.tmp.`multy` where n_nationkey > 5 and n_nationkey/2 < 5
{code}
> Query returns wrong result when filter pruning happens
> ------------------------------------------------------
>
> Key: DRILL-6865
> URL: https://issues.apache.org/jira/browse/DRILL-6865
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.14.0
> Reporter: Volodymyr Vysotskyi
> Assignee: Volodymyr Vysotskyi
> Priority: Blocker
> Fix For: 1.15.0
>
>
> In DRILL-5796 was implemented removing the filter from the plan when some (or
> all) row groups of parquet table fully match the filter.
> For the case when filter has some predicates which parquet filter predicate
> does not support, they can be omitted for some cases from the resulting
> filter predicate. When row groups fully match predicates which left in the
> filter, the whole filter is removed from the plan and the wrong result is
> returned.
> Example of the query for reproducing this bug:
> {code:sql}
> create table dfs.tmp.`multi/t1` as select * from cp.`tpch/nation.parquet`
> where n_nationkey > 5;
> create table dfs.tmp.`multi/t2` as select * from cp.`tpch/nation.parquet`
> where n_nationkey < 5;
> select * from dfs.tmp.`multi` where n_nationkey > 5 and n_nationkey like
> '%10%';
> {code}
> returns
> {noformat}
> +-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
> | dir0 | n_nationkey | n_name | n_regionkey |
> n_comment
> |
> +-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
> | t1 | 6 | FRANCE | 3 | refully final
> requests. regular, ironi
> |
> | t1 | 7 | GERMANY | 3 | l platelets.
> regular accounts x-ray: unusual, regular acco
> |
> | t1 | 8 | INDIA | 2 | ss excuses cajole
> slyly across the packages. deposits print aroun
> |
> | t1 | 9 | INDONESIA | 2 | slyly express
> asymptotes. regular deposits haggle slyly. carefully ironic hockey players
> sleep blithely. carefull |
> | t1 | 10 | IRAN | 4 | efully alongside of
> the slyly final dependencies.
> |
> | t1 | 11 | IRAQ | 4 | nic deposits boost
> atop the quickly final requests? quickly regula
> |
> | t1 | 12 | JAPAN | 2 | ously. final,
> express gifts cajole a
> |
> | t1 | 13 | JORDAN | 4 | ic deposits are
> blithely about the carefully regular pa
> |
> | t1 | 14 | KENYA | 0 | pending excuses
> haggle furiously deposits. pending, express pinto beans wake fluffily past t
> |
> | t1 | 15 | MOROCCO | 0 | rns. blithely bold
> courts among the closely regular packages use furiously bold platelets?
> |
> | t1 | 16 | MOZAMBIQUE | 0 | s. ironic, unusual
> asymptotes wake blithely r
> |
> | t1 | 17 | PERU | 1 | platelets. blithely
> pending dependencies use fluffily across the even pinto beans. carefully
> silent accoun |
> | t1 | 18 | CHINA | 2 | c dependencies.
> furiously express notornis sleep slyly regular accounts. ideas sleep. depos
> |
> | t1 | 19 | ROMANIA | 3 | ular asymptotes are
> about the furious multipliers. express dependencies nag above the ironically
> ironic account |
> | t1 | 20 | SAUDI ARABIA | 4 | ts. silent requests
> haggle. closely express packages sleep across the blithely
> |
> | t1 | 21 | VIETNAM | 2 | hely enticingly
> express accounts. even, final
> |
> | t1 | 22 | RUSSIA | 3 | requests against
> the platelets use never according to the quickly regular pint
> |
> | t1 | 23 | UNITED KINGDOM | 3 | eans boost
> carefully special requests. accounts are. carefull
> |
> | t1 | 24 | UNITED STATES | 1 | y final packages.
> slow foxes cajole quickly. quickly silent platelets breach ironic accounts.
> unusual pinto be |
> +-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
> {noformat}
> but single row should be returned:
> {noformat}
> +-------+--------------+---------+--------------+-----------------------------------------------------+
> | dir0 | n_nationkey | n_name | n_regionkey |
> n_comment |
> +-------+--------------+---------+--------------+-----------------------------------------------------+
> | t1 | 10 | IRAN | 4 | efully alongside of the
> slyly final dependencies. |
> +-------+--------------+---------+--------------+-----------------------------------------------------+
> {noformat}
> Filter is removed from the plan, but it contains a predicate which wasn't
> applied:
> {noformat}
> 00-00 Screen : rowType = RecordType(DYNAMIC_STAR **): rowcount = 19.0,
> cumulative cost = {77.9 rows, 115.9 cpu, 38.0 io, 0.0 network, 0.0 memory},
> id = 400
> 00-01 Project(**=[$0]) : rowType = RecordType(DYNAMIC_STAR **): rowcount
> = 19.0, cumulative cost = {76.0 rows, 114.0 cpu, 38.0 io, 0.0 network, 0.0
> memory}, id = 399
> 00-02 Project(T1¦¦**=[$0]) : rowType = RecordType(DYNAMIC_STAR
> T1¦¦**): rowcount = 19.0, cumulative cost = {57.0 rows, 95.0 cpu, 38.0 io,
> 0.0 network, 0.0 memory}, id = 398
> 00-03 Project(T1¦¦**=[$0], n_nationkey=[$1]) : rowType =
> RecordType(DYNAMIC_STAR T1¦¦**, ANY n_nationkey): rowcount = 19.0, cumulative
> cost = {38.0 rows, 76.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 397
> 00-04 Scan(table=[[dfs, tmp, multi]], groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=/tmp/multi/t1/0_0_0.parquet]],
> selectionRoot=file:/tmp/multi, numFiles=1, numRowGroups=1,
> usedMetadataFile=false, columns=[`**`, `n_nationkey`]]]) : rowType =
> RecordType(DYNAMIC_STAR **, ANY n_nationkey): rowcount = 19.0, cumulative
> cost = {19.0 rows, 38.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 396
> {noformat}
> ----
> Additionally, a filter is not removed from the plan when parquet table with
> single row group is queried:
> {code:sql}
> create table dfs.tmp.`singleRowGroupTable` as select * from
> cp.`tpch/nation.parquet`;
> explain plan for select * from dfs.tmp.`singleRowGroupTable` where
> n_nationkey > -1;
> {code}
> returns plan
> {noformat}
> 00-00 Screen
> 00-01 Project(**=[$0])
> 00-02 Project(T0¦¦**=[$0])
> 00-03 SelectionVectorRemover
> 00-04 Filter(condition=[>($1, -1)])
> 00-05 Project(T0¦¦**=[$0], n_nationkey=[$1])
> 00-06 Scan(table=[[dfs, tmp, singleRowGroupTable]],
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=file:/tmp/singleRowGroupTable]],
> selectionRoot=file:/tmp/singleRowGroupTable, numFiles=1, numRowGroups=1,
> usedMetadataFile=false, columns=[`**`, `n_nationkey`]]])
> {noformat}
> *Also, for the case when a table has multiple files, and filter matches all
> the table, it is not removed from the plan:*
> {code:sql}
> select * from dfs.tmp.`multi` where n_nationkey > -1;
> {code}
> has plan
> {noformat}
> 00-00 Screen : rowType = RecordType(DYNAMIC_STAR **): rowcount = 12.0,
> cumulative cost = {109.2 rows, 277.2 cpu, 48.0 io, 0.0 network, 0.0 memory},
> id = 196
> 00-01 Project(**=[$0]) : rowType = RecordType(DYNAMIC_STAR **): rowcount
> = 12.0, cumulative cost = {108.0 rows, 276.0 cpu, 48.0 io, 0.0 network, 0.0
> memory}, id = 195
> 00-02 Project(T0¦¦**=[$0]) : rowType = RecordType(DYNAMIC_STAR
> T0¦¦**): rowcount = 12.0, cumulative cost = {96.0 rows, 264.0 cpu, 48.0 io,
> 0.0 network, 0.0 memory}, id = 194
> 00-03 SelectionVectorRemover : rowType = RecordType(DYNAMIC_STAR
> T0¦¦**, ANY n_nationkey): rowcount = 12.0, cumulative cost = {84.0 rows,
> 252.0 cpu, 48.0 io, 0.0 network, 0.0 memory}, id = 193
> 00-04 Filter(condition=[>($1, -1)]) : rowType =
> RecordType(DYNAMIC_STAR T0¦¦**, ANY n_nationkey): rowcount = 12.0, cumulative
> cost = {72.0 rows, 240.0 cpu, 48.0 io, 0.0 network, 0.0 memory}, id = 192
> 00-05 Project(T0¦¦**=[$0], n_nationkey=[$1]) : rowType =
> RecordType(DYNAMIC_STAR T0¦¦**, ANY n_nationkey): rowcount = 24.0, cumulative
> cost = {48.0 rows, 96.0 cpu, 48.0 io, 0.0 network, 0.0 memory}, id = 191
> 00-06 Scan(table=[[dfs, tmp, multi]],
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=file:/tmp/multi/t2/0_0_0.parquet], ReadEntryWithPath
> [path=file:/tmp/multi/t1/0_0_0.parquet]], selectionRoot=file:/tmp/multi,
> numFiles=2, numRowGroups=2, usedMetadataFile=false, columns=[`**`,
> `n_nationkey`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY n_nationkey):
> rowcount = 24.0, cumulative cost = {24.0 rows, 48.0 cpu, 48.0 io, 0.0
> network, 0.0 memory}, id = 190
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)