[
https://issues.apache.org/jira/browse/DRILL-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Volodymyr Vysotskyi updated DRILL-6865:
---------------------------------------
Description:
In DRILL-5796 was implemented removing the filter from the plan when some (or
all) row groups of parquet table fully match the filter.
For the case when filter has some predicates which parquet filter predicate
does not support, they can be omitted for some cases from the resulting filter
predicate. When row groups fully match predicates which left in the filter, the
whole filter is removed from the plan and the wrong result is returned.
Example of the query for reproducing this bug:
{code:sql}
create table dfs.tmp.`multi/t1` as select * from cp.`tpch/nation.parquet` where
n_nationkey > 5;
create table dfs.tmp.`multi/t2` as select * from cp.`tpch/nation.parquet` where
n_nationkey < 5;
select * from dfs.tmp.`multi` where n_nationkey > 5 and n_nationkey like '%10%';
{code}
returns
{noformat}
+-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
| dir0 | n_nationkey | n_name | n_regionkey |
n_comment
|
+-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
| t1 | 6 | FRANCE | 3 | refully final
requests. regular, ironi
|
| t1 | 7 | GERMANY | 3 | l platelets. regular
accounts x-ray: unusual, regular acco
|
| t1 | 8 | INDIA | 2 | ss excuses cajole
slyly across the packages. deposits print aroun
|
| t1 | 9 | INDONESIA | 2 | slyly express
asymptotes. regular deposits haggle slyly. carefully ironic hockey players
sleep blithely. carefull |
| t1 | 10 | IRAN | 4 | efully alongside of
the slyly final dependencies.
|
| t1 | 11 | IRAQ | 4 | nic deposits boost
atop the quickly final requests? quickly regula
|
| t1 | 12 | JAPAN | 2 | ously. final, express
gifts cajole a
|
| t1 | 13 | JORDAN | 4 | ic deposits are
blithely about the carefully regular pa
|
| t1 | 14 | KENYA | 0 | pending excuses
haggle furiously deposits. pending, express pinto beans wake fluffily past t
|
| t1 | 15 | MOROCCO | 0 | rns. blithely bold
courts among the closely regular packages use furiously bold platelets?
|
| t1 | 16 | MOZAMBIQUE | 0 | s. ironic, unusual
asymptotes wake blithely r
|
| t1 | 17 | PERU | 1 | platelets. blithely
pending dependencies use fluffily across the even pinto beans. carefully silent
accoun |
| t1 | 18 | CHINA | 2 | c dependencies.
furiously express notornis sleep slyly regular accounts. ideas sleep. depos
|
| t1 | 19 | ROMANIA | 3 | ular asymptotes are
about the furious multipliers. express dependencies nag above the ironically
ironic account |
| t1 | 20 | SAUDI ARABIA | 4 | ts. silent requests
haggle. closely express packages sleep across the blithely
|
| t1 | 21 | VIETNAM | 2 | hely enticingly
express accounts. even, final
|
| t1 | 22 | RUSSIA | 3 | requests against the
platelets use never according to the quickly regular pint
|
| t1 | 23 | UNITED KINGDOM | 3 | eans boost carefully
special requests. accounts are. carefull
|
| t1 | 24 | UNITED STATES | 1 | y final packages.
slow foxes cajole quickly. quickly silent platelets breach ironic accounts.
unusual pinto be |
+-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
{noformat}
but single row should be returned:
{noformat}
+-------+--------------+---------+--------------+-----------------------------------------------------+
| dir0 | n_nationkey | n_name | n_regionkey |
n_comment |
+-------+--------------+---------+--------------+-----------------------------------------------------+
| t1 | 10 | IRAN | 4 | efully alongside of the slyly
final dependencies. |
+-------+--------------+---------+--------------+-----------------------------------------------------+
{noformat}
Filter is removed from the plan, but it contains a predicate which wasn't
applied:
{noformat}
00-00 Screen : rowType = RecordType(DYNAMIC_STAR **): rowcount = 19.0,
cumulative cost = {77.9 rows, 115.9 cpu, 38.0 io, 0.0 network, 0.0 memory}, id
= 400
00-01 Project(**=[$0]) : rowType = RecordType(DYNAMIC_STAR **): rowcount =
19.0, cumulative cost = {76.0 rows, 114.0 cpu, 38.0 io, 0.0 network, 0.0
memory}, id = 399
00-02 Project(T1¦¦**=[$0]) : rowType = RecordType(DYNAMIC_STAR T1¦¦**):
rowcount = 19.0, cumulative cost = {57.0 rows, 95.0 cpu, 38.0 io, 0.0 network,
0.0 memory}, id = 398
00-03 Project(T1¦¦**=[$0], n_nationkey=[$1]) : rowType =
RecordType(DYNAMIC_STAR T1¦¦**, ANY n_nationkey): rowcount = 19.0, cumulative
cost = {38.0 rows, 76.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 397
00-04 Scan(table=[[dfs, tmp, multi]], groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=/tmp/multi/t1/0_0_0.parquet]],
selectionRoot=file:/tmp/multi, numFiles=1, numRowGroups=1,
usedMetadataFile=false, columns=[`**`, `n_nationkey`]]]) : rowType =
RecordType(DYNAMIC_STAR **, ANY n_nationkey): rowcount = 19.0, cumulative cost
= {19.0 rows, 38.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 396
{noformat}
----
Additionally, a filter is not removed from the plan when parquet table with
single row group is queried
But for the case, when parquet table has a single row group, the filter is left
in the plan:
{code:sql}
create table dfs.tmp.`singleRowGroupTable` as select * from
cp.`tpch/nation.parquet`;
explain plan for select * from dfs.tmp.`singleRowGroupTable` where n_nationkey
> -1;
{code}
returns plan
{noformat}
00-00 Screen
00-01 Project(**=[$0])
00-02 Project(T0¦¦**=[$0])
00-03 SelectionVectorRemover
00-04 Filter(condition=[>($1, -1)])
00-05 Project(T0¦¦**=[$0], n_nationkey=[$1])
00-06 Scan(table=[[dfs, tmp, singleRowGroupTable]],
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=file:/tmp/singleRowGroupTable]],
selectionRoot=file:/tmp/singleRowGroupTable, numFiles=1, numRowGroups=1,
usedMetadataFile=false, columns=[`**`, `n_nationkey`]]])
{noformat}
was:
In DRILL-5796 was implemented removing the filter from the plan when some (or
all) row groups of parquet table fully match the filter.
For the case when filter has some predicates which parquet filter predicate
does not support, they can be omitted for some cases from the resulting filter
predicate. When row groups fully match predicates which left in the filter, the
whole filter is removed from the plan and the wrong result is returned.
Example of the query for reproducing this bug:
{code:sql}
create table dfs.tmp.`multi/t1` as select * from cp.`tpch/nation.parquet` where
n_nationkey > 5;
create table dfs.tmp.`multi/t2` as select * from cp.`tpch/nation.parquet` where
n_nationkey < 5;
select * from dfs.tmp.`multi` where n_nationkey > 5 and n_nationkey like '%10%';
{code}
returns
{noformat}
+-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
| dir0 | n_nationkey | n_name | n_regionkey |
n_comment
|
+-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
| t1 | 6 | FRANCE | 3 | refully final
requests. regular, ironi
|
| t1 | 7 | GERMANY | 3 | l platelets. regular
accounts x-ray: unusual, regular acco
|
| t1 | 8 | INDIA | 2 | ss excuses cajole
slyly across the packages. deposits print aroun
|
| t1 | 9 | INDONESIA | 2 | slyly express
asymptotes. regular deposits haggle slyly. carefully ironic hockey players
sleep blithely. carefull |
| t1 | 10 | IRAN | 4 | efully alongside of
the slyly final dependencies.
|
| t1 | 11 | IRAQ | 4 | nic deposits boost
atop the quickly final requests? quickly regula
|
| t1 | 12 | JAPAN | 2 | ously. final, express
gifts cajole a
|
| t1 | 13 | JORDAN | 4 | ic deposits are
blithely about the carefully regular pa
|
| t1 | 14 | KENYA | 0 | pending excuses
haggle furiously deposits. pending, express pinto beans wake fluffily past t
|
| t1 | 15 | MOROCCO | 0 | rns. blithely bold
courts among the closely regular packages use furiously bold platelets?
|
| t1 | 16 | MOZAMBIQUE | 0 | s. ironic, unusual
asymptotes wake blithely r
|
| t1 | 17 | PERU | 1 | platelets. blithely
pending dependencies use fluffily across the even pinto beans. carefully silent
accoun |
| t1 | 18 | CHINA | 2 | c dependencies.
furiously express notornis sleep slyly regular accounts. ideas sleep. depos
|
| t1 | 19 | ROMANIA | 3 | ular asymptotes are
about the furious multipliers. express dependencies nag above the ironically
ironic account |
| t1 | 20 | SAUDI ARABIA | 4 | ts. silent requests
haggle. closely express packages sleep across the blithely
|
| t1 | 21 | VIETNAM | 2 | hely enticingly
express accounts. even, final
|
| t1 | 22 | RUSSIA | 3 | requests against the
platelets use never according to the quickly regular pint
|
| t1 | 23 | UNITED KINGDOM | 3 | eans boost carefully
special requests. accounts are. carefull
|
| t1 | 24 | UNITED STATES | 1 | y final packages.
slow foxes cajole quickly. quickly silent platelets breach ironic accounts.
unusual pinto be |
+-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
{noformat}
but single row should be returned:
{noformat}
+-------+--------------+---------+--------------+-----------------------------------------------------+
| dir0 | n_nationkey | n_name | n_regionkey |
n_comment |
+-------+--------------+---------+--------------+-----------------------------------------------------+
| t1 | 10 | IRAN | 4 | efully alongside of the slyly
final dependencies. |
+-------+--------------+---------+--------------+-----------------------------------------------------+
{noformat}
Filter is removed from the plan, but it contains a predicate which wasn't
applied:
{noformat}
00-00 Screen : rowType = RecordType(DYNAMIC_STAR **): rowcount = 19.0,
cumulative cost = {77.9 rows, 115.9 cpu, 38.0 io, 0.0 network, 0.0 memory}, id
= 400
00-01 Project(**=[$0]) : rowType = RecordType(DYNAMIC_STAR **): rowcount =
19.0, cumulative cost = {76.0 rows, 114.0 cpu, 38.0 io, 0.0 network, 0.0
memory}, id = 399
00-02 Project(T1¦¦**=[$0]) : rowType = RecordType(DYNAMIC_STAR T1¦¦**):
rowcount = 19.0, cumulative cost = {57.0 rows, 95.0 cpu, 38.0 io, 0.0 network,
0.0 memory}, id = 398
00-03 Project(T1¦¦**=[$0], n_nationkey=[$1]) : rowType =
RecordType(DYNAMIC_STAR T1¦¦**, ANY n_nationkey): rowcount = 19.0, cumulative
cost = {38.0 rows, 76.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 397
00-04 Scan(table=[[dfs, tmp, multy]], groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=/tmp/multy/t1/0_0_0.parquet]],
selectionRoot=file:/tmp/multy, numFiles=1, numRowGroups=1,
usedMetadataFile=false, columns=[`**`, `n_nationkey`]]]) : rowType =
RecordType(DYNAMIC_STAR **, ANY n_nationkey): rowcount = 19.0, cumulative cost
= {19.0 rows, 38.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 396
{noformat}
----
Additionally, a filter is not removed from the plan when parquet table with
single row group is queried
But for the case, when parquet table has a single row group, the filter is left
in the plan:
{code:sql}
create table dfs.tmp.`singleRowGroupTable` as select * from
cp.`tpch/nation.parquet`;
explain plan for select * from dfs.tmp.`singleRowGroupTable` where n_nationkey
> -1;
{code}
returns plan
{noformat}
00-00 Screen
00-01 Project(**=[$0])
00-02 Project(T0¦¦**=[$0])
00-03 SelectionVectorRemover
00-04 Filter(condition=[>($1, -1)])
00-05 Project(T0¦¦**=[$0], n_nationkey=[$1])
00-06 Scan(table=[[dfs, tmp, singleRowGroupTable]],
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=file:/tmp/singleRowGroupTable]],
selectionRoot=file:/tmp/singleRowGroupTable, numFiles=1, numRowGroups=1,
usedMetadataFile=false, columns=[`**`, `n_nationkey`]]])
{noformat}
> Query returns wrong result when filter pruning happens
> ------------------------------------------------------
>
> Key: DRILL-6865
> URL: https://issues.apache.org/jira/browse/DRILL-6865
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.14.0
> Reporter: Volodymyr Vysotskyi
> Assignee: Volodymyr Vysotskyi
> Priority: Blocker
> Fix For: 1.15.0
>
>
> In DRILL-5796 was implemented removing the filter from the plan when some (or
> all) row groups of parquet table fully match the filter.
> For the case when filter has some predicates which parquet filter predicate
> does not support, they can be omitted for some cases from the resulting
> filter predicate. When row groups fully match predicates which left in the
> filter, the whole filter is removed from the plan and the wrong result is
> returned.
> Example of the query for reproducing this bug:
> {code:sql}
> create table dfs.tmp.`multi/t1` as select * from cp.`tpch/nation.parquet`
> where n_nationkey > 5;
> create table dfs.tmp.`multi/t2` as select * from cp.`tpch/nation.parquet`
> where n_nationkey < 5;
> select * from dfs.tmp.`multi` where n_nationkey > 5 and n_nationkey like
> '%10%';
> {code}
> returns
> {noformat}
> +-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
> | dir0 | n_nationkey | n_name | n_regionkey |
> n_comment
> |
> +-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
> | t1 | 6 | FRANCE | 3 | refully final
> requests. regular, ironi
> |
> | t1 | 7 | GERMANY | 3 | l platelets.
> regular accounts x-ray: unusual, regular acco
> |
> | t1 | 8 | INDIA | 2 | ss excuses cajole
> slyly across the packages. deposits print aroun
> |
> | t1 | 9 | INDONESIA | 2 | slyly express
> asymptotes. regular deposits haggle slyly. carefully ironic hockey players
> sleep blithely. carefull |
> | t1 | 10 | IRAN | 4 | efully alongside of
> the slyly final dependencies.
> |
> | t1 | 11 | IRAQ | 4 | nic deposits boost
> atop the quickly final requests? quickly regula
> |
> | t1 | 12 | JAPAN | 2 | ously. final,
> express gifts cajole a
> |
> | t1 | 13 | JORDAN | 4 | ic deposits are
> blithely about the carefully regular pa
> |
> | t1 | 14 | KENYA | 0 | pending excuses
> haggle furiously deposits. pending, express pinto beans wake fluffily past t
> |
> | t1 | 15 | MOROCCO | 0 | rns. blithely bold
> courts among the closely regular packages use furiously bold platelets?
> |
> | t1 | 16 | MOZAMBIQUE | 0 | s. ironic, unusual
> asymptotes wake blithely r
> |
> | t1 | 17 | PERU | 1 | platelets. blithely
> pending dependencies use fluffily across the even pinto beans. carefully
> silent accoun |
> | t1 | 18 | CHINA | 2 | c dependencies.
> furiously express notornis sleep slyly regular accounts. ideas sleep. depos
> |
> | t1 | 19 | ROMANIA | 3 | ular asymptotes are
> about the furious multipliers. express dependencies nag above the ironically
> ironic account |
> | t1 | 20 | SAUDI ARABIA | 4 | ts. silent requests
> haggle. closely express packages sleep across the blithely
> |
> | t1 | 21 | VIETNAM | 2 | hely enticingly
> express accounts. even, final
> |
> | t1 | 22 | RUSSIA | 3 | requests against
> the platelets use never according to the quickly regular pint
> |
> | t1 | 23 | UNITED KINGDOM | 3 | eans boost
> carefully special requests. accounts are. carefull
> |
> | t1 | 24 | UNITED STATES | 1 | y final packages.
> slow foxes cajole quickly. quickly silent platelets breach ironic accounts.
> unusual pinto be |
> +-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
> {noformat}
> but single row should be returned:
> {noformat}
> +-------+--------------+---------+--------------+-----------------------------------------------------+
> | dir0 | n_nationkey | n_name | n_regionkey |
> n_comment |
> +-------+--------------+---------+--------------+-----------------------------------------------------+
> | t1 | 10 | IRAN | 4 | efully alongside of the
> slyly final dependencies. |
> +-------+--------------+---------+--------------+-----------------------------------------------------+
> {noformat}
> Filter is removed from the plan, but it contains a predicate which wasn't
> applied:
> {noformat}
> 00-00 Screen : rowType = RecordType(DYNAMIC_STAR **): rowcount = 19.0,
> cumulative cost = {77.9 rows, 115.9 cpu, 38.0 io, 0.0 network, 0.0 memory},
> id = 400
> 00-01 Project(**=[$0]) : rowType = RecordType(DYNAMIC_STAR **): rowcount
> = 19.0, cumulative cost = {76.0 rows, 114.0 cpu, 38.0 io, 0.0 network, 0.0
> memory}, id = 399
> 00-02 Project(T1¦¦**=[$0]) : rowType = RecordType(DYNAMIC_STAR
> T1¦¦**): rowcount = 19.0, cumulative cost = {57.0 rows, 95.0 cpu, 38.0 io,
> 0.0 network, 0.0 memory}, id = 398
> 00-03 Project(T1¦¦**=[$0], n_nationkey=[$1]) : rowType =
> RecordType(DYNAMIC_STAR T1¦¦**, ANY n_nationkey): rowcount = 19.0, cumulative
> cost = {38.0 rows, 76.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 397
> 00-04 Scan(table=[[dfs, tmp, multi]], groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=/tmp/multi/t1/0_0_0.parquet]],
> selectionRoot=file:/tmp/multi, numFiles=1, numRowGroups=1,
> usedMetadataFile=false, columns=[`**`, `n_nationkey`]]]) : rowType =
> RecordType(DYNAMIC_STAR **, ANY n_nationkey): rowcount = 19.0, cumulative
> cost = {19.0 rows, 38.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 396
> {noformat}
> ----
> Additionally, a filter is not removed from the plan when parquet table with
> single row group is queried
> But for the case, when parquet table has a single row group, the filter is
> left in the plan:
> {code:sql}
> create table dfs.tmp.`singleRowGroupTable` as select * from
> cp.`tpch/nation.parquet`;
> explain plan for select * from dfs.tmp.`singleRowGroupTable` where
> n_nationkey > -1;
> {code}
> returns plan
> {noformat}
> 00-00 Screen
> 00-01 Project(**=[$0])
> 00-02 Project(T0¦¦**=[$0])
> 00-03 SelectionVectorRemover
> 00-04 Filter(condition=[>($1, -1)])
> 00-05 Project(T0¦¦**=[$0], n_nationkey=[$1])
> 00-06 Scan(table=[[dfs, tmp, singleRowGroupTable]],
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=file:/tmp/singleRowGroupTable]],
> selectionRoot=file:/tmp/singleRowGroupTable, numFiles=1, numRowGroups=1,
> usedMetadataFile=false, columns=[`**`, `n_nationkey`]]])
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)