tustvold opened a new issue, #1851:
URL: https://github.com/apache/arrow-rs/issues/1851

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   A while back I implemented an optimized string dictionary builder for [IOx](https://github.com/influxdata/influxdb_iox/blob/main/arrow_util/src/dictionary.rs). It relies on two main tricks to improve performance:
   
   * Use ahash instead of SipHash; this alone provides a 40% speedup
   * Use hashbrown's `raw_entry_mut` to avoid duplicating string values into the hashmap
   
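Swapping the hasher is only a type parameter on std's `HashMap`, whose default `RandomState` currently uses SipHash 1-3. Below is a minimal, dependency-free sketch of that swap. It uses a hand-rolled FNV-1a hasher purely as a stand-in for ahash; FNV-1a is a different algorithm and, unlike SipHash, offers no protection against hash flooding. `Fnv1a` and `FastMap` are hypothetical names. With the `ahash` crate, one would pass `ahash::RandomState` as the map's third type parameter instead.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical fast, non-cryptographic hasher (64-bit FNV-1a),
/// standing in for ahash so the example needs no external crates.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A HashMap that hashes with Fnv1a instead of the default SipHash.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn main() {
    // Count occurrences of each string using the non-SipHash map.
    let mut counts: FastMap<&str, u32> = FastMap::default();
    for s in ["cpu", "mem", "cpu"] {
        *counts.entry(s).or_insert(0) += 1;
    }
    assert_eq!(counts.get("cpu"), Some(&2));
    println!("{:?}", counts.get("mem")); // prints "Some(1)"
}
```

The map API is unchanged; only the hashing cost per lookup differs, which is where the quoted speedup from the hasher swap would come from.
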
   I have an implementation of this for arrow that needs a bit more polish, but it yields a 60% speedup over the current implementation in arrow. Unfortunately, it depends on #1850, as it needs to be able to read the string data from an in-progress `StringBuilder`.
   
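The core idea behind the second trick can be sketched with std only. The hash table stores just `u32` dictionary indices, and on a collision the candidate string is compared against the builder's own string storage. That is why the builder must be able to read values back out of the in-progress string data. This is a minimal sketch under stated assumptions: `DictBuilder` is a hypothetical type, not arrow's or IOx's code, and hashbrown's `raw_entry_mut` is replaced by a small hand-written linear-probing table so the example has no dependencies.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

/// Hypothetical dictionary builder: every distinct string is stored exactly once
/// in `values`/`offsets`, and the hash table holds only `u32` indices into that
/// storage, so no string is duplicated into the map.
struct DictBuilder<S = RandomState> {
    values: String,          // concatenated bytes of all distinct strings
    offsets: Vec<usize>,     // string i is values[offsets[i]..offsets[i + 1]]
    slots: Vec<Option<u32>>, // open-addressing table, power-of-two length
    keys: Vec<u32>,          // dictionary key of every appended value
    hasher: S,
}

impl DictBuilder<RandomState> {
    fn new() -> Self {
        Self::with_hasher(RandomState::new())
    }
}

impl<S: BuildHasher> DictBuilder<S> {
    fn with_hasher(hasher: S) -> Self {
        let slots = vec![None; 16];
        DictBuilder { values: String::new(), offsets: vec![0], slots, keys: Vec::new(), hasher }
    }

    /// Number of distinct values stored so far.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Reads a stored value back out of the builder's own storage.
    fn value(&self, idx: u32) -> &str {
        let i = idx as usize;
        &self.values[self.offsets[i]..self.offsets[i + 1]]
    }

    /// Linear probing: returns the slot holding `s`, or the empty slot where it
    /// belongs. Equality is checked against stored data, not a key copy.
    fn find_slot(&self, s: &str) -> usize {
        let mut h = self.hasher.build_hasher();
        s.hash(&mut h);
        let mask = self.slots.len() - 1;
        let mut pos = (h.finish() as usize) & mask;
        loop {
            match self.slots[pos] {
                Some(idx) if self.value(idx) == s => return pos,
                Some(_) => pos = (pos + 1) & mask,
                None => return pos,
            }
        }
    }

    /// Appends `s`, returning its dictionary key (new strings get the next index).
    fn append(&mut self, s: &str) -> u32 {
        // Keep the load factor at or below 50% so probing always finds an empty slot.
        if self.len() * 2 >= self.slots.len() {
            self.grow();
        }
        let pos = self.find_slot(s);
        let key = match self.slots[pos] {
            Some(idx) => idx,
            None => {
                let idx = self.len() as u32;
                self.values.push_str(s);
                self.offsets.push(self.values.len());
                self.slots[pos] = Some(idx);
                idx
            }
        };
        self.keys.push(key);
        key
    }

    /// Doubles the table and re-inserts every stored index.
    fn grow(&mut self) {
        self.slots = vec![None; self.slots.len() * 2];
        for idx in 0..self.len() as u32 {
            let pos = self.find_slot(self.value(idx));
            self.slots[pos] = Some(idx);
        }
    }
}

fn main() {
    let mut b = DictBuilder::new();
    let keys: Vec<u32> = ["cpu", "mem", "cpu", "disk", "mem"].iter().map(|s| b.append(s)).collect();
    assert_eq!(keys, vec![0, 1, 0, 2, 1]);
    assert_eq!(b.values, "cpumemdisk"); // each distinct string stored once
}
```

Because `DictBuilder` is generic over `BuildHasher`, the same sketch could also take a faster hasher via `with_hasher`, combining both tricks.
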
   **Describe the solution you'd like**
   
   Implement #1850, then add this functionality.
   
   **Describe alternatives you've considered**
   
   We could simply choose not to do this.
