adriangb commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3917416683
> 15 of the regressing ClickBench queries (Q10-Q22, Q25, Q27) filter on a column that is also in the `SELECT` projection. When all filter columns are already projected, the RowFilter provides no I/O savings, those columns must be decoded regardless. The overhead is pure loss.

Is this true if more than one column is selected and the filter is very selective? E.g. `select id, long_message from t where id = 123 and long_message like '%foo%'`. If we push `id = 123` down as a row filter we can avoid 99% of the decode (we only have to decode 1 row / page / minimum unit of `long_message`). IMO in a case like this the ideal would be to evaluate `id = 123` as a row filter and then `long_message like '%foo%'` as a remainder.
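
To make the scenario concrete, here is a rough toy model in Python. It is not DataFusion's or the Parquet reader's API: `PAGE_SIZE`, `pages_touched`, and both `scan_*` functions are hypothetical names, and the model assumes that decoding any row of a column means decoding the whole page containing it. It only counts how many pages of each column would be decoded with and without evaluating `id = 123` first.

```python
# Toy model only: not DataFusion's or the parquet crate's API.
# Assumption: decoding any row of a column requires decoding the
# whole page that contains that row.

PAGE_SIZE = 100  # hypothetical page size, in rows


def pages_touched(row_indices, page_size=PAGE_SIZE):
    """Return the set of page numbers containing the given row indices."""
    return {i // page_size for i in row_indices}


def scan_without_row_filter(ids, messages, target_id, needle):
    """Decode both columns in full, then apply both predicates."""
    n = len(ids)
    decoded_pages = {
        "id": len(pages_touched(range(n))),
        "long_message": len(pages_touched(range(n))),
    }
    rows = [
        (ids[i], messages[i])
        for i in range(n)
        if ids[i] == target_id and needle in messages[i]
    ]
    return rows, decoded_pages


def scan_with_row_filter(ids, messages, target_id, needle):
    """Evaluate `id = target_id` first (the row filter), then decode only
    the `long_message` pages holding surviving rows and apply the
    remaining predicate to those rows."""
    n = len(ids)
    survivors = [i for i in range(n) if ids[i] == target_id]
    decoded_pages = {
        "id": len(pages_touched(range(n))),  # id is still fully decoded
        "long_message": len(pages_touched(survivors)),
    }
    rows = [(ids[i], messages[i]) for i in survivors if needle in messages[i]]
    return rows, decoded_pages


if __name__ == "__main__":
    ids = list(range(10_000))
    messages = [f"msg {i} foo" if i % 2 else f"msg {i}" for i in ids]
    print(scan_without_row_filter(ids, messages, 123, "foo")[1])
    print(scan_with_row_filter(ids, messages, 123, "foo")[1])
```

Both scans return the same rows. In this model `id` is decoded in full either way, so any saving comes entirely from skipping pages of `long_message`, and it shrinks as the filter becomes less selective.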
