adriangb opened a new issue, #14054:
URL: https://github.com/apache/datafusion/issues/14054

   Follow-up to #507.
   
   Predicate pruning is a powerful technique to speed up queries by skipping 
entire files or other units of work based on summary statistics of the data 
(for example, per-file min/max values of a column).
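As a minimal sketch of the basic idea for a plain `col = 'abc'` predicate (in Rust, since DataFusion is written in Rust), a file can only contain matching rows if its min/max range for `col` contains the literal. `FileStats`, `eq_may_match`, and `files_to_scan` are hypothetical names for illustration, not DataFusion APIs.

```rust
/// Hypothetical per-file min/max statistics for a single string column.
struct FileStats {
    name: &'static str,
    min: &'static str,
    max: &'static str,
}

/// `col = lit` can only hold for some row if `min <= lit <= max`
/// (Rust compares `str` values byte-wise, i.e. lexicographically).
fn eq_may_match(stats: &FileStats, lit: &str) -> bool {
    stats.min <= lit && lit <= stats.max
}

/// Keeps only the files whose statistics cannot rule out `col = lit`.
fn files_to_scan<'a>(files: &'a [FileStats], lit: &str) -> Vec<&'a str> {
    files
        .iter()
        .filter(|f| eq_may_match(f, lit))
        .map(|f| f.name)
        .collect()
}
```

For example, given one file with min/max `aaa`/`abz` and another with `YYY`/`ZZZ`, only the first needs to be scanned for `col = 'abc'`, because `'abc'` sorts after `'ZZZ'` in byte order.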
   
   This issue proposes implementing predicate pruning for expressions such as 
`lower(col) = 'abc'`. The idea is that if we have a min statistic such as `AbC`, 
we should be able to transform it to `'abc'` and push down the predicate (in 
this case the file might match). Conversely, given min/max statistics of 
`YYY`/`ZZZ`, `lower(col) = 'abc'` can never match, so the file can be skipped.
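One caution when turning this into code: lowercasing the min/max bounds directly is not always safe, because `lower` does not preserve byte order. With min/max `Ab`/`aa`, the value `Abc` lies in range and `lower('Abc') = 'abc'`, yet the lowercased bounds `ab`/`aa` do not contain `'abc'`. A conservative alternative, sketched below under the assumption of ASCII-only data (so `lower` only maps `A`-`Z` to `a`-`z`), is to ask whether any upper/lower-case variant of the literal falls within the raw min/max. `lower_eq_may_match` is a hypothetical helper, not an existing DataFusion function.

```rust
/// Hypothetical helper: could any row with `min <= col <= max` satisfy
/// `lower(col) = target`? ASSUMES ASCII-only data, so `lower` only maps
/// b'A'..=b'Z' to b'a'..=b'z'. Returns `true` when it cannot rule out a match.
fn lower_eq_may_match(min: &str, max: &str, target: &str) -> bool {
    // Outside the sketch's assumptions: keep the file.
    if !target.is_ascii() {
        return true;
    }
    // `lower(col)` never contains uppercase letters, so it can't equal this literal.
    if target.bytes().any(|b| b.is_ascii_uppercase()) {
        return false;
    }
    // Positions where the raw value could hold either `x` or `X`.
    let letters: Vec<usize> = target
        .bytes()
        .enumerate()
        .filter(|(_, b)| b.is_ascii_lowercase())
        .map(|(i, _)| i)
        .collect();
    // There are 2^n variants: give up (conservatively) for long literals.
    if letters.len() > 16 {
        return true;
    }
    // A matching row must equal one of the case variants of `target`,
    // so check whether any variant lies in [min, max].
    (0u32..(1u32 << letters.len())).any(|mask| {
        let mut variant = target.as_bytes().to_vec();
        for (bit, &pos) in letters.iter().enumerate() {
            if mask & (1u32 << bit) != 0 {
                variant[pos] = variant[pos].to_ascii_uppercase();
            }
        }
        min.as_bytes() <= variant.as_slice() && variant.as_slice() <= max.as_bytes()
    })
}
```

With this check the `YYY`/`ZZZ` file is skipped, while a file whose min is `AbC` (or the `Ab`/`aa` file above) is kept. Enumerating variants is exponential in the number of letters, hence the fallback to "may match" for long literals; a real implementation would probably want a smarter search.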
   
   To implement this, you'll need to make a PR similar to 
https://github.com/apache/datafusion/pull/12978 and add fuzz tests (see 
https://github.com/apache/datafusion/pull/13253).
   
   One thing to think about is how we can make this work in concert with other 
predicate pushdown. That is, it would be ideal if something like 
`lower(col) like 'abc%'` could be pushed down. That may require a lot of 
refactoring and might need to be done in a series of PRs; an initial PR that 
just implements the `=` case would be a good start to prove that it's possible. 
But it may also be worth exploring a generalization, e.g. rewriting 
`lower(col) like 'abc%'` to `col ilike 'abc%'`, which we then push down. A 
discussion of the pros and cons is warranted.
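For the `like 'abc%'` case, here is a sketch of how `col ilike 'abc%'` might be pruned from min/max statistics, again assuming ASCII-only data and a literal prefix already extracted from the pattern (no `%`, `_`, or escapes inside it). Strings that share a prefix form one contiguous range in byte order, so `prefix_may_match` checks whether that range overlaps `[min, max]`, and `ilike_prefix_may_match` tries each case variant of the prefix. Both names are hypothetical. Note that the rewrite to `ilike` is only equivalent when the pattern is already lowercase: `lower(col) like 'Abc%'` can never match, while `col ilike 'Abc%'` can.

```rust
/// Could any value in [min, max] start with `prefix`? Strings sharing a prefix
/// form one contiguous range in byte order whose smallest member is `prefix`
/// itself, so we only need to check that the two ranges overlap.
fn prefix_may_match(min: &[u8], max: &[u8], prefix: &[u8]) -> bool {
    let k = prefix.len().min(min.len());
    max >= prefix && &min[..k] <= prefix
}

/// Hypothetical helper: could any value in [min, max] satisfy
/// `col ILIKE '<prefix>%'`? ASSUMES ASCII-only data and a literal prefix
/// (no `%`, `_`, or escapes). Returns `true` when it cannot rule out a match.
fn ilike_prefix_may_match(min: &str, max: &str, prefix: &str) -> bool {
    if !prefix.is_ascii() {
        return true; // outside the sketch's assumptions: keep the file
    }
    // Positions where a matching value could hold either case of a letter.
    let letters: Vec<usize> = prefix
        .bytes()
        .enumerate()
        .filter(|(_, b)| b.is_ascii_alphabetic())
        .map(|(i, _)| i)
        .collect();
    if letters.len() > 16 {
        return true; // too many case variants to enumerate
    }
    let lowered = prefix.to_ascii_lowercase().into_bytes();
    // Keep the file if any case variant of the prefix overlaps [min, max].
    (0u32..(1u32 << letters.len())).any(|mask| {
        let mut variant = lowered.clone();
        for (bit, &pos) in letters.iter().enumerate() {
            if mask & (1u32 << bit) != 0 {
                variant[pos] = variant[pos].to_ascii_uppercase();
            }
        }
        prefix_may_match(min.as_bytes(), max.as_bytes(), &variant)
    })
}
```

Under these assumptions, the `YYY`/`ZZZ` file is skipped for `col ilike 'abc%'`, while a file with min/max `AbCdef`/`AbCxyz` is kept.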
