Anisha Reddy created DRILL-7129:
-----------------------------------
Summary: Join with more than 1 condition is not using stats to
compute row count estimate
Key: DRILL-7129
URL: https://issues.apache.org/jira/browse/DRILL-7129
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.16.0
Reporter: Anisha Reddy
Fix For: 1.17.0
Below are the details:
{code:java}
0: jdbc:drill:drillbit=10.10.101.108> select count(*) from
`table_stats/Tpch0.01/parquet/lineitem`; +---------+ | EXPR$0 | +---------+ |
57068 | +---------+ 1 row selected (0.179 seconds)
0: jdbc:drill:drillbit=10.10.101.108> select count(*) from
`table_stats/Tpch0.01/parquet/partsupp`; +---------+ | EXPR$0 | +---------+ |
7474 | +---------+ 1 row selected (0.171 seconds)
0: jdbc:drill:drillbit=10.10.101.108> select count(*) from
`table_stats/Tpch0.01/parquet/lineitem` l,
`table_stats/Tpch0.01/parquet/partsupp` ps where l.l_partkey = ps.ps_partkey
and l.l_suppkey = ps.ps_suppkey; +---------+ | EXPR$0 | +---------+ | 53401 |
+---------+ 1 row selected (0.769 seconds)
0: jdbc:drill:drillbit=10.10.101.108> explain plan including all attributes
for select * from `table_stats/Tpch0.01/parquet/lineitem` l,
`table_stats/Tpch0.01/parquet/partsupp` ps where l.l_partkey = ps.ps_partkey
and l.l_suppkey = ps.ps_suppkey;
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| text | json |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| 00-00 Screen : rowType = RecordType(DYNAMIC_STAR **, DYNAMIC_STAR **0):
rowcount = 57068.0, cumulative cost = {313468.8 rows, 2110446.8 cpu, 193626.0
io, 0.0 network, 197313.6 memory}, id = 107578 00-01 ProjectAllowDup(**=[$0],
**0=[$1]) : rowType = RecordType(DYNAMIC_STAR **, DYNAMIC_STAR **0): rowcount =
57068.0, cumulative cost = {307762.0 rows, 2104740.0 cpu, 193626.0 io, 0.0
network, 197313.6 memory}, id = 107577 00-02 Project(T10¦¦**=[$0],
T11¦¦**=[$3]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, DYNAMIC_STAR
T11¦¦**): rowcount = 57068.0, cumulative cost = {250694.0 rows, 1990604.0 cpu,
193626.0 io, 0.0 network, 197313.6 memory}, id = 107576 00-03
HashJoin(condition=[AND(=($1, $4), =($2, $5))], joinType=[inner], semi-join:
=[false]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, ANY l_partkey, ANY
l_suppkey, DYNAMIC_STAR T11¦¦**, ANY ps_partkey, ANY ps_suppkey): rowcount =
57068.0, cumulative cost = {193626.0 rows, 1876468.0 cpu, 193626.0 io, 0.0
network, 197313.6 memory}, id = 107575 00-05 Project(T10¦¦**=[$0],
l_partkey=[$1], l_suppkey=[$2]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**,
ANY l_partkey, ANY l_suppkey): rowcount = 57068.0, cumulative cost = {114136.0
rows, 342408.0 cpu, 171204.0 io, 0.0 network, 0.0 memory}, id = 107572 00-07
Scan(table=[[dfs, drilltestdir, table_stats/Tpch0.01/parquet/lineitem]],
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:///drill/testdata/table_stats/Tpch0.01/parquet/lineitem]],
selectionRoot=maprfs:/drill/testdata/table_stats/Tpch0.01/parquet/lineitem,
numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`, `l_partkey`,
`l_suppkey`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY l_partkey, ANY
l_suppkey): rowcount = 57068.0, cumulative cost = {57068.0 rows, 171204.0 cpu,
171204.0 io, 0.0 network, 0.0 memory}, id = 107571 00-04 Project(T11¦¦**=[$0],
ps_partkey=[$1], ps_suppkey=[$2]) : rowType = RecordType(DYNAMIC_STAR T11¦¦**,
ANY ps_partkey, ANY ps_suppkey): rowcount = 7474.0, cumulative cost = {14948.0
rows, 44844.0 cpu, 22422.0 io, 0.0 network, 0.0 memory}, id = 107574 00-06
Scan(table=[[dfs, drilltestdir, table_stats/Tpch0.01/parquet/partsupp]],
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:///drill/testdata/table_stats/Tpch0.01/parquet/partsupp]],
selectionRoot=maprfs:/drill/testdata/table_stats/Tpch0.01/parquet/partsupp,
numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`,
`ps_partkey`, `ps_suppkey`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY
ps_partkey, ANY ps_suppkey): rowcount = 7474.0, cumulative cost = {7474.0 rows,
22422.0 cpu, 22422.0 io, 0.0 network, 0.0 memory}, id = 107573
{code}
The ndv for l_partkey = 2000
ps_partkey = 1817
l_supkey = 100
ps_suppkey = 100
We see that such joins is just taking the max of left side and the right side
table.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)