alamb commented on pull request #1339:
URL: https://github.com/apache/arrow-datafusion/pull/1339#issuecomment-981570431
@xudong963 thank you for the examples
Given this input:
```sql
❯ create table table1 as SELECT * FROM (VALUES (1), (2), (null)) as t;
0 rows in set. Query took 0.006 seconds.
❯ create table table2 as SELECT * FROM (VALUES (1), (3), (null)) as t;
0 rows in set. Query took 0.003 seconds.
```
The plan that comes out is incorrect:
```sql
❯ explain SELECT *
FROM table1
LEFT JOIN table2
ON table1.column1 = table2.column1
WHERE table2.column1 IS NOT NULL;
+---------------+---------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+---------------+---------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: #table1.column1, #table2.column1
|
| | Join: #table1.column1 = #table2.column1
|
| | Filter: #table1.column1 IS NOT NULL
|
| | TableScan: table1 projection=Some([0])
|
| | Filter: #table2.column1 IS NOT NULL
|
| | TableScan: table2 projection=Some([0])
|
```
I think the problem is this line:
```
| | Filter: #table1.column1 IS NOT NULL
|
```
Namely something has added a predicate to `table1.column1` (where the
original query has only a predicate on `table2`).
The same problem happens for `>5` type predicates too:
```sql
explain SELECT *
FROM table1
LEFT JOIN table2
ON table1.column1 = table2.column1
WHERE table2.column1 > 5;
+---------------+---------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+---------------+---------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: #table1.column1, #table2.column1
|
| | Join: #table1.column1 = #table2.column1
|
| | Filter: #table1.column1 > Int64(5)
|
| | TableScan: table1 projection=Some([0])
|
| | Filter: #table2.column1 > Int64(5)
|
| | TableScan: table2 projection=Some([0])
```
The equivalent plan from postgres only pushes the filter down for `table2`
(there is no filter on `table1.column1`):
```sql
alamb=# explain SELECT *
FROM table1
LEFT JOIN table2
ON table1.column1 = table2.column1
WHERE table2.column1 > 5;
QUERY PLAN
----------------------------------------------------------------------
Merge Join (cost=263.02..429.84 rows=10838 width=8)
Merge Cond: (table2.column1 = table1.column1)
-> Sort (cost=83.23..85.36 rows=850 width=4)
Sort Key: table2.column1
-> Seq Scan on table2 (cost=0.00..41.88 rows=850 width=4)
Filter: (column1 > 5)
-> Sort (cost=179.78..186.16 rows=2550 width=4)
Sort Key: table1.column1
-> Seq Scan on table1 (cost=0.00..35.50 rows=2550 width=4)
(9 rows)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]