Dandandan commented on issue #16710:
URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3063754335

   > > is it using a hash table or open addressing (df doesn't have the latter)
   > 
   > [@XiangpengHao](https://github.com/XiangpengHao) has mentioned several times that we think DuckDB uses radix trees (which work like hash tables but save the expensive hash).
   > 
   > He has an implementation here https://github.com/XiangpengHao/congee
   > 
   > Maybe we could investigate using that structure rather than a hash table
   > 
   > > is the hash-repartition taking considerable time (RepartitionExec, which involves 2 copies of the input) - what are the other engines doing?
   > 
   > Not only is it copying the values twice, it also _hash_es them twice, which can be quite expensive.
   
   Ah nice, I wasn't aware of that. That seems like a nice direction!
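
To make the quoted idea concrete, here is a minimal sketch of why a radix tree can skip the hash: the key's own bytes pick the path, so a lookup is a handful of array indexings with no hash function call. Everything below (`RadixMap`, `Node`, the fixed `u32` key, the 256-way fan-out) is hypothetical and for illustration only. It is not congee's API and not DataFusion code, and a real adaptive radix tree would use compact node layouts instead of 256 slots per node.

```rust
// Hypothetical illustration: a fixed-depth, 256-way radix trie keyed by u32.
// Not congee's API and not DataFusion code.

/// One trie node: 256 child slots, one per possible key byte.
struct Node<V> {
    children: Vec<Option<Box<Node<V>>>>,
    value: Option<V>,
}

impl<V> Node<V> {
    fn new() -> Self {
        Node {
            children: (0..256).map(|_| None).collect(),
            value: None,
        }
    }
}

/// Maps u32 keys to values by walking the key's four big-endian bytes.
pub struct RadixMap<V> {
    root: Node<V>,
}

impl<V> RadixMap<V> {
    pub fn new() -> Self {
        RadixMap { root: Node::new() }
    }

    /// Inserts a value and returns the previous value for the key, if any.
    pub fn insert(&mut self, key: u32, value: V) -> Option<V> {
        let mut node = &mut self.root;
        for byte in key.to_be_bytes() {
            // The key byte indexes the child slot directly; no hash is computed.
            let child = node.children[byte as usize]
                .get_or_insert_with(|| Box::new(Node::new()));
            node = &mut **child;
        }
        node.value.replace(value)
    }

    /// Looks up a key by following one child pointer per key byte.
    pub fn get(&self, key: u32) -> Option<&V> {
        let mut node = &self.root;
        for byte in key.to_be_bytes() {
            node = node.children[byte as usize].as_deref()?;
        }
        node.value.as_ref()
    }
}
```

The trade-off in this naive form is memory: each node reserves 256 slots, which is the overhead that adaptive node sizes are meant to reduce.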


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

