sarahyurick opened a new issue, #6432:
URL: https://github.com/apache/arrow-datafusion/issues/6432
### Describe the bug
In the Dask-SQL project, we rely on DataFusion to add `IS NOT NULL` filters
at the `TableScan` level whenever a column is used as a join key. Recent
changes appear to have removed this behavior for joins written with `JOIN ... ON`.
### To Reproduce
The query
```
SELECT d_col
FROM c_table
JOIN d_table ON d_col=c_col
```
has the following `LogicalPlan`, with no `IS NOT NULL` filter on either `TableScan`:
```
Projection: d_table.d_col
  Inner Join: Filter: d_table.d_col = c_table.c_col
    TableScan: c_table projection=[c_col]
    TableScan: d_table projection=[d_col]
```
### Expected behavior
We expected both scans to carry an `IS NOT NULL` filter, as they still do when the join condition is written in a `WHERE` clause. The query
```
SELECT d_col
FROM c_table, d_table WHERE d_col=c_col
```
produces
```
Projection: d_table.d_col
  Inner Join: c_table.c_col = d_table.d_col
    TableScan: c_table projection=[c_col], full_filters=[c_table.c_col IS NOT NULL]
    TableScan: d_table projection=[d_col], full_filters=[d_table.d_col IS NOT NULL]
```
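For context, here is a minimal sketch of why these filters are safe for an inner equi-join: `NULL = NULL` is never true in SQL, so rows with a NULL join key cannot match and can be dropped at the scan. It uses Python's built-in `sqlite3` rather than DataFusion, and the sample rows and the `join_results` helper are made up for illustration.

```python
import sqlite3


def join_results(prefilter_nulls: bool) -> list:
    """Run the join from this issue on toy data (hypothetical rows).

    When prefilter_nulls is True, NULL join keys are removed from each
    input before joining, mimicking an IS NOT NULL filter at the scan.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE c_table (c_col INTEGER)")
    conn.execute("CREATE TABLE d_table (d_col INTEGER)")
    conn.executemany("INSERT INTO c_table VALUES (?)", [(1,), (2,), (None,)])
    conn.executemany("INSERT INTO d_table VALUES (?)", [(2,), (3,), (None,)])

    if prefilter_nulls:
        query = """
            SELECT d_col
            FROM (SELECT c_col FROM c_table WHERE c_col IS NOT NULL) AS c
            JOIN (SELECT d_col FROM d_table WHERE d_col IS NOT NULL) AS d
              ON d_col = c_col
        """
    else:
        query = "SELECT d_col FROM c_table JOIN d_table ON d_col = c_col"

    rows = sorted(conn.execute(query).fetchall())
    conn.close()
    return rows


plain = join_results(prefilter_nulls=False)
filtered = join_results(prefilter_nulls=True)
# Both variants return the same rows; the filter only shrinks the join inputs.
print(plain == filtered)
```

The filter does not change the result of an inner join; its value is that each scan hands fewer rows to the join.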
### Additional context
I'm not sure when this change was introduced, or whether it was intentional.
Is this something that DataFusion would be willing to fix, or would it be
preferable for Dask-SQL to re-add the optimizer rule on our side?
cc @ayushdg @jdye64