alamb opened a new issue, #18824:
URL: https://github.com/apache/datafusion/issues/18824

   The idea is to improve `IN` list performance by using a specialized HashSet for each data type, thereby avoiding dynamic dispatch over the data type.
   
   In https://github.com/apache/datafusion/pull/18449 we implemented such a specialization for `Int32`, but we should probably do the same for all the types that previously had a [specialization](https://github.com/apache/datafusion/pull/18449/files#diff-ff8086fafbfe5021e5f7d51d96aaae2cf65f779ac3fae5fc182f87e956bb0550L186):
   1. All primitive types (Int8, Int32, etc.)
   2. Boolean
   3. Utf8/LargeUtf8/Utf8View
   4. Binary/LargeBinary/BinaryView
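A minimal sketch of the per-type idea, under simplifying assumptions: a plain `Vec<T>` stands in for a typed Arrow array, and `&dyn Any` stands in for a type-erased `ArrayRef`. `StaticFilter` and `TypedInList` are hypothetical names, not DataFusion's actual API. The only dynamic call is `contains_batch` itself (one vtable call plus one downcast per batch); the membership loop inside it is monomorphized for each element type.

```rust
use std::any::Any;
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical type-erased interface, called once per batch.
pub trait StaticFilter {
    /// For each row of `batch`, reports whether it is in the IN list.
    /// Returns `None` if `batch` does not have the filter's element type.
    fn contains_batch(&self, batch: &dyn Any) -> Option<Vec<bool>>;
}

/// Hypothetical per-type set: the compiler generates one copy per `T`
/// (i32, i64, bool, String, ...), so lookups in the inner loop are
/// statically dispatched and can be inlined.
pub struct TypedInList<T> {
    set: HashSet<T>,
}

impl<T: Hash + Eq + Clone + 'static> TypedInList<T> {
    pub fn new(values: &[T]) -> Self {
        Self { set: values.iter().cloned().collect() }
    }
}

impl<T: Hash + Eq + Clone + 'static> StaticFilter for TypedInList<T> {
    fn contains_batch(&self, batch: &dyn Any) -> Option<Vec<bool>> {
        // A single downcast per batch (stand-in for downcasting an ArrayRef).
        let values = batch.downcast_ref::<Vec<T>>()?;
        // Typed loop over the whole batch: no dynamic dispatch per element.
        Some(values.iter().map(|v| self.set.contains(v)).collect())
    }
}
```

Note that this sketch ignores nulls, and that `f32`/`f64` are not `Hash + Eq` in Rust, so floating-point types would need a wrapper (for example, hashing `to_bits()`).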
   
   
   As @adriangb says:
   
   I'm surprised that doing dynamic dispatch once per batch we evaluate as 
opposed to twice per batch we evaluate makes that much of a difference. What 
would make sense that makes a difference to me is doing it once per element vs. 
once per batch. But I guess that's what benchmarks say!
   
   That does leave me with a question... could we squeeze out even more 
performance if we specialize for ~ all scalar types? It wouldn't be that hard 
to write a macro and have AI do the copy pasta of implementing it for all of 
the types... I'll open a follow up ticket.
   
   _Originally posted by @adriangb in 
https://github.com/apache/datafusion/issues/18449#issuecomment-3546450771_
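The macro approach suggested in the quoted comment could look roughly like the hypothetical sketch below. `Column` is a toy enum standing in for Arrow arrays and their `DataType`, and `InListFilter`, `TypedSet`, `specialize_in_list!` and `build_filter` are illustrative names only. The macro generates one `InListFilter` impl per (variant, native type) pair, plus the matching arms of the constructor, so `eval` remains a single dynamic call per batch.

```rust
use std::collections::HashSet;

/// Toy stand-in for Arrow arrays: one variant per supported type.
pub enum Column {
    Int8(Vec<i8>),
    Int32(Vec<i32>),
    Int64(Vec<i64>),
    Boolean(Vec<bool>),
    Utf8(Vec<String>),
}

/// Built once per IN list; `eval` is the only dynamic call per batch.
pub trait InListFilter {
    /// Returns `None` if `batch` has a different type than the IN list.
    fn eval(&self, batch: &Column) -> Option<Vec<bool>>;
}

/// Generic set; the macro below implements `InListFilter` for each native type.
struct TypedSet<T>(HashSet<T>);

macro_rules! specialize_in_list {
    ($($variant:ident => $native:ty),* $(,)?) => {
        $(
            impl InListFilter for TypedSet<$native> {
                fn eval(&self, batch: &Column) -> Option<Vec<bool>> {
                    match batch {
                        // Typed inner loop for this one type.
                        Column::$variant(values) => {
                            Some(values.iter().map(|v| self.0.contains(v)).collect())
                        }
                        _ => None, // type mismatch
                    }
                }
            }
        )*

        /// Builds a type-specialized filter from the IN list values.
        pub fn build_filter(list: &Column) -> Box<dyn InListFilter> {
            match list {
                $(
                    Column::$variant(values) => {
                        Box::new(TypedSet::<$native>(values.iter().cloned().collect()))
                            as Box<dyn InListFilter>
                    }
                )*
            }
        }
    };
}

// Adding another type is a one-line change here (plus a `Column` variant).
specialize_in_list! {
    Int8 => i8,
    Int32 => i32,
    Int64 => i64,
    Boolean => bool,
    Utf8 => String,
}
```

A real version would match on Arrow's `DataType`, downcast `ArrayRef`s, and handle nulls and floating-point values, none of which this toy covers.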
               

