Dandandan commented on pull request #786:
URL: https://github.com/apache/arrow-datafusion/pull/786#issuecomment-887731997
> In any event we will likely need more than the existing 16 bytes per group key to handle null values, but we probably don't need an extra 56 bytes per key.

I think in the longer run we can instead keep the keys in a contiguous (mutable) array and keep offsets/pointers to the values in this array, with null values stored in a bitmap (only 1 bit per value). This needs only roughly 8 bytes for the pointer plus the key value in Arrow format, and it will also enable other optimizations. The worst case right now is something like `SELECT max(id) FROM t GROUP BY id`, where `id` is unique and has a type like `u64`.
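A minimal Rust sketch of the layout proposed above, assuming a single nullable `u64` group column and a `max` aggregate. `GroupedKeys`, `intern` and `max_group_by` are hypothetical names, not DataFusion APIs, and the hand-rolled linear-probing table stands in for whatever hash table DataFusion would actually use. The table stores only `usize` group indices, so each key value is stored once in a contiguous `Vec<u64>` and a null key costs one bit in a validity bitmap.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Marks an unused slot in the open-addressing table.
const EMPTY: usize = usize::MAX;

/// Hypothetical group-key store (not a DataFusion type): each distinct key is stored once
/// in a contiguous array, nulls live in a validity bitmap (1 bit per group), and the hash
/// table holds only group indices (`usize`, 8 bytes on 64-bit targets).
struct GroupedKeys {
    values: Vec<u64>,  // key per group; 0 is a placeholder for a null key
    validity: Vec<u8>, // bit i set => group i has a non-null key
    slots: Vec<usize>, // open-addressing table of group indices
}

impl GroupedKeys {
    fn new() -> Self {
        GroupedKeys { values: Vec::new(), validity: Vec::new(), slots: vec![EMPTY; 16] }
    }
    fn num_groups(&self) -> usize { self.values.len() }
    fn is_valid(&self, group: usize) -> bool {
        (self.validity[group / 8] & (1u8 << (group % 8))) != 0
    }
    fn key_of(&self, group: usize) -> Option<u64> {
        if self.is_valid(group) { Some(self.values[group]) } else { None }
    }
    fn hash_key(key: Option<u64>) -> usize {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        h.finish() as usize
    }

    /// Returns the group index for `key`, appending a new group if it is unseen.
    fn intern(&mut self, key: Option<u64>) -> usize {
        // Keep the load factor at or below 1/2 so probing always reaches an empty slot.
        if (self.num_groups() + 1) * 2 > self.slots.len() {
            self.grow();
        }
        let mask = self.slots.len() - 1;
        let mut pos = Self::hash_key(key) & mask;
        loop {
            let g = self.slots[pos];
            if g == EMPTY {
                // New group: append the key to the contiguous value and validity arrays.
                let group = self.num_groups();
                if group % 8 == 0 {
                    self.validity.push(0);
                }
                if key.is_some() {
                    self.validity[group / 8] |= 1u8 << (group % 8);
                }
                self.values.push(key.unwrap_or(0));
                self.slots[pos] = group;
                return group;
            }
            // Compare via the key array by index; the table itself stores no keys.
            if self.key_of(g) == key {
                return g;
            }
            pos = (pos + 1) & mask; // linear probing
        }
    }

    /// Doubles the table and re-inserts group indices; key values are not moved.
    fn grow(&mut self) {
        let mut slots = vec![EMPTY; self.slots.len() * 2];
        let mask = slots.len() - 1;
        for group in 0..self.num_groups() {
            let mut pos = Self::hash_key(self.key_of(group)) & mask;
            while slots[pos] != EMPTY {
                pos = (pos + 1) & mask;
            }
            slots[pos] = group;
        }
        self.slots = slots;
    }
}

/// Hypothetical sketch of `SELECT max(id) FROM t GROUP BY id`: one accumulator per group,
/// indexed by the same group index as the key array.
fn max_group_by(ids: &[Option<u64>]) -> (GroupedKeys, Vec<Option<u64>>) {
    let mut keys = GroupedKeys::new();
    let mut maxes: Vec<Option<u64>> = Vec::new();
    for &id in ids {
        let g = keys.intern(id);
        if g == maxes.len() {
            maxes.push(None); // new group starts with an empty accumulator
        }
        // Option orders None below any Some, so null inputs are skipped like SQL MAX.
        maxes[g] = maxes[g].max(id);
    }
    (keys, maxes)
}

fn main() {
    let ids: Vec<Option<u64>> = vec![Some(3), None, Some(7), Some(3), None, Some(42)];
    let (keys, maxes) = max_group_by(&ids);
    for g in 0..keys.num_groups() {
        println!("id = {:?}, max(id) = {:?}", keys.key_of(g), maxes[g]);
    }
}
```

Because the table holds indices rather than keys, a probe compares against the key array by index, and growing the table re-inserts indices without moving any key values. With unique ids, the worst case mentioned above, every row creates a new group, so this per-group footprint is what dominates memory.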