alamb commented on PR #103: URL: https://github.com/apache/datafusion-site/pull/103#issuecomment-3274751989
> However to get the filter value (it doesn't have to be super accurate, just close to reduce the reading scope) it is possible to scan `select min(ts) from t1` first, and this refers to a single column which might be cheap, and even cheaper if min/max can be derived from the footer, and then apply the value for TopK filter.

I think one major idea is to *reuse state / information that is already present in the operators*. For example, the TopK operator already has a topK heap, and the dynamic filter concept allows this information to be passed down to the scan.

> How it makes sure we dont need to scan 100M rows as before, is it for any scenario, or when underlying files data are sorted?

I don't think the dynamic filter has any guarantees that it will filter rows. For example, in the pathological case where the data is scanned in reverse order, it will not filter any rows.

However, the idea is that updating the dynamic filter is cheap and it does help in many real-world settings, so it is overall a good optimization to do.
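
To make the point about reusing the TopK heap concrete, here is a minimal sketch in Python. It is not DataFusion's implementation; `topk_smallest`, `threshold`, and `skipped` are hypothetical names, and the "scan" is just a loop over in-memory batches. It assumes a query shaped like `ORDER BY ts ASC LIMIT k`, so the heap keeps the `k` smallest values and its current largest element acts as the dynamic filter.

```python
import heapq


def topk_smallest(batches, k):
    """Return (the k smallest values, number of rows the filter skipped).

    Illustrative only: a real engine would push the threshold into the
    scan so that whole files or row groups can be pruned.
    """
    heap = []         # max-heap via negated values; holds the current best k
    threshold = None  # the "dynamic filter": rows >= threshold cannot qualify
    skipped = 0
    for batch in batches:
        for value in batch:
            # The scan consults the filter before handing the row to TopK.
            if threshold is not None and value >= threshold:
                skipped += 1
                continue
            if len(heap) < k:
                heapq.heappush(heap, -value)
            else:
                # Insert the new value and evict the current largest.
                heapq.heappushpop(heap, -value)
            # Once the heap is full, its largest element bounds future rows.
            if len(heap) == k:
                threshold = -heap[0]
    return sorted(-v for v in heap), skipped


# Favorable order: after the first 3 rows, everything else is filtered.
ascending = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
print(topk_smallest(ascending, 3))   # ([1, 2, 3], 7)

# Pathological order: every row beats the threshold, so nothing is filtered.
descending = [[10, 9, 8, 7, 6], [5, 4, 3, 2, 1]]
print(topk_smallest(descending, 3))  # ([1, 2, 3], 0)
```

Both runs return the same answer; only the amount of skipped work differs. Updating `threshold` costs one heap peek per accepted row, which is why the filter is cheap to maintain even when it prunes nothing.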
