tobixdev opened a new pull request, #16977:
URL: https://github.com/apache/datafusion/pull/16977

   ## Which issue does this PR close?
   
   Tries to improve the hashing performance for UDFs.
   
   ## Rationale for this change
   
   Planning time regressed when we updated to DF 49.0. Improving the hashing 
performance resolves that regression.
   
   ## What changes are included in this PR?
   
   - Do not hash the signature of UDFs: it is usually constant per `UDFImpl`, 
hash values do not need to be unique, and hashing the data types can take 
some time.
   - Switch from `DefaultHasher` to `AHasher` (also for consistency reasons).
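
To illustrate the first point, here is a minimal sketch of a type whose `Hash` implementation skips a signature-like field while equality still compares it. `ToyUdf`, `toy_udf`, `hash_of`, and `distinct_count` are hypothetical names made up for this example and are not DataFusion APIs. The sketch uses the standard library's `DefaultHasher` only to stay dependency-free, whereas the PR itself switches to `AHasher`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a UDF; not a DataFusion type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToyUdf {
    pub name: String,
    /// Stand-in for the signature (here just argument type names).
    pub signature: Vec<String>,
}

impl Hash for ToyUdf {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Only the name is hashed; the signature is deliberately skipped.
        // This is still consistent with `Eq`: equal values have equal names,
        // so equal values always produce equal hashes.
        self.name.hash(state);
    }
}

/// Hashes a value with a freshly constructed `DefaultHasher`.
pub fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Convenience constructor for the example.
pub fn toy_udf(name: &str, signature: &[&str]) -> ToyUdf {
    ToyUdf {
        name: name.to_string(),
        signature: signature.iter().map(|s| s.to_string()).collect(),
    }
}

/// Counts distinct values by collecting them into a `HashSet`.
pub fn distinct_count(udfs: &[ToyUdf]) -> usize {
    udfs.iter().cloned().collect::<HashSet<_>>().len()
}
```

Under these assumptions, `toy_udf("my_func", &["Int32"])` and `toy_udf("my_func", &["Utf8"])` produce the same hash but compare unequal, so a `HashSet` or `HashMap` still keeps them apart. The cost is that the collision is only resolved by the equality check.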
   
   ## Are these changes tested?
   
   Yes, by existing tests that use `HashMap` etc.
   
   ## Are there any user-facing changes?
   
   Users could observe new hash values.
   
   In theory, `HashMap` performance for UDFs with a dynamic signature can 
worsen: UDFs that differ only in their signature now hash to the same value, so 
the difference is only detected in the equality check. However, UDFs should 
provide a custom hash/eq implementation if they contain state, so I think this 
is fine (especially with #16677 coming).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

