Dandandan commented on pull request #786:
URL: https://github.com/apache/arrow-datafusion/pull/786#issuecomment-887731997


   > In any event we will likely need more than the existing 16 bytes per group 
key to handle null values, but we probably don't need an extra 56 bytes per key.
   
   I think that in the longer run we can instead keep the keys in a contiguous 
(mutable) array and store offsets/pointers into that array. Null values can be 
tracked in a bitmap, so they cost only 1 bit per value. That needs roughly 
8 bytes for the pointer plus the key value itself in Arrow format, and it also 
enables other optimizations.
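
   To make the idea concrete, here is a rough sketch of such a layout for 
fixed-width `u64` keys. The `GroupKeyStore` type and its methods are 
hypothetical names made up for illustration, not DataFusion's or Arrow's API, 
and a real implementation would have to handle variable-width and multi-column 
keys as well.

```rust
/// Hypothetical store for `u64` group keys (illustrative only, not DataFusion API).
#[derive(Default)]
pub struct GroupKeyStore {
    /// Key values packed back to back; a null key occupies a zeroed slot.
    values: Vec<u64>,
    /// Validity bitmap: bit `i` is set when key `i` is non-null.
    validity: Vec<u8>,
}

impl GroupKeyStore {
    pub fn new() -> Self {
        Self::default()
    }

    /// Appends a key and returns its index, which plays the role of the
    /// offset/pointer that a hash table entry would hold.
    pub fn push(&mut self, key: Option<u64>) -> usize {
        let idx = self.values.len();
        // Grow the bitmap by one byte for every eight keys.
        if idx % 8 == 0 {
            self.validity.push(0);
        }
        if key.is_some() {
            self.validity[idx / 8] |= 1u8 << (idx % 8);
        }
        self.values.push(key.unwrap_or(0));
        idx
    }

    /// Returns the key at `idx`, or `None` if it is null.
    /// Panics if `idx` is out of range.
    pub fn get(&self, idx: usize) -> Option<u64> {
        let valid = (self.validity[idx / 8] & (1u8 << (idx % 8))) != 0;
        if valid {
            Some(self.values[idx])
        } else {
            None
        }
    }

    /// Number of bytes currently used by the validity bitmap.
    pub fn validity_bytes(&self) -> usize {
        self.validity.len()
    }
}
```

   In this sketch a hash table entry would only hold the `usize` returned by 
`push`, so a `u64` key costs about 8 bytes for the index, 8 bytes for the value, 
and one bit in the bitmap.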
   
   The worst case is now something like `max(id) from t group by id`, where 
`id` is unique (so every row forms its own group) and the key has a type like 
`u64`.
   

