pepijnve opened a new issue, #18496:
URL: https://github.com/apache/datafusion/issues/18496
### Is your feature request related to a problem or challenge?
Histograms in SQL can be computed fairly easily using `case` and `order by`.
An example of this kind of query is:
```
select
case
when salary between 75000 and 90000 then '75000-90000'
when salary between 90000 and 120000 then '90000-120000'
else '120000+'
end as salary_band,
count(*)
from employee_salary
group by 1
```
The default evaluation logic (which is logically equivalent to a chain of
`if/else` statements) does not leverage the fact that the ranges being tested
here are static and disjunct. In the example above it would be more optimal to
compute the boundaries as an array `[75000, 90000, 120000]` and then try to
find the appropriate bucket index by scanning that array.
It would be useful to have a dedicated case evaluation code path for this
that does this more optimally. This is similar in idea to the work that's being
done in https://github.com/apache/datafusion/pull/18183 for equality checks.
### Describe the solution you'd like
In the `case` physical expression implementation, analyse the when
conditions. If they are all range tests for the same input value using literals
for the range boundaries and the order of the when expressions permits it,
extract the range boundaries to an array and test the input value against the
array to determine the appropriate then expression index.
If the then expressions are all literals, then there's also no need to
actually evaluate those. Instead a lookup table can be used to map each row to
the appropriate output value.
### Describe alternatives you've considered
_No response_
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]