adriangb commented on PR #18449: URL: https://github.com/apache/datafusion/pull/18449#issuecomment-3534978594
> One possibility is that you replace the hashes buffer with a "hasher" similar to the comparator and then hash the value inside the iteration in `StaticFilter::contains` after the null check. (More similar to the original implementation which just called `.hash_one`). Yeah I think that makes a ton of sense here & in other places as well, but it would be a new thing to add into `hash_utils.rs` and I think more code than we have here. I also think there are going to be 3 patterns in the end: 1) You want to hash the whole thing and have a buffer 2) You want to hash the whole thing and don't have a buffer, re-using a thread local is nice 3) You want to hash one at a time (i.e. using a hasher like you suggest) I'd also guess if there are few nulls (2) will beat (3) but if there are a lot of nulls (3) will beat 2... So for now I'm going to stick with the current approach if perf is acceptable, this is already a tangent on a larger change, we can make a ticket to consider doing (3) in the future. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
