suneet-s commented on issue #9792:
URL: https://github.com/apache/druid/issues/9792#issuecomment-623834181
I was able to re-produce this on the sample wikipedia dataset.
```
with abc as
(
SELECT page, "__time", sum_added, sum_deleted from wikipedia WHERE
"countryName" = 'United States' and "__time" > '2016'
)
, klm AS
(
SELECT
t1.page
from
abc as t1 left join abc as t2 on t1.page = t2.page
where t1.sum_added < t2.sum_added
group by 1
)
SELECT count(*) from klm
```
The query above [count(*)] returns 306 results. This is the same as the
number of results if there is no filter applied (ie remove where t1.sum_added <
t2.sum_added)
If we change the query to select *, it returns 18 results, which appears to
be correct.
This issue only happens if the filter contains an expression from t2. For
example, using a filter `t1.sum_added > t1.sum_deleted` or `t1.sum_added >
1000` returns the expected consistent results, but `t2.sum_added > 1999`
returns inconsistent results.
My hypothesis is that something in calcite is trying to deciding that we
don't need to apply the filters for the aggregate query. I believe this to be
the case because I ran the queries with `enableJoinFilterPushDown` set to false.
I could not reproduce this against a druid datasource
```
SELECT t1.page from wikipedia as t1 left join wikipedia as t2 on t1.page =
t2.page WHERE t1."countryName" = 'United States' and t1.sum_added <
t2.sum_added group by 1
```
Next steps:
* Look at the calcite plans more closely to find the difference in the
generated plans
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]