seddonm1 commented on a change in pull request #9233:
URL: https://github.com/apache/arrow/pull/9233#discussion_r575535635
##########
File path: rust/datafusion/src/physical_plan/hash_aggregate.rs
##########
@@ -398,97 +405,165 @@ fn group_aggregate_batch(
Ok(accumulators)
}
-/// Create a key `Vec<u8>` that is used as key for the hashmap
-pub(crate) fn create_key(
- group_by_keys: &[ArrayRef],
+/// Appends a sequence of [u8] bytes for the value in `col[row]` to
+/// `vec` to be used as a key into the hash map for a dictionary type
+///
+/// Note that ideally, for dictionary encoded columns, we would be
+/// able to simply use the dictionary indices themselves (no need to
+/// look up values) or possibly simply build the hash table entirely
+/// on the dictionary indexes.
+///
+/// This approach would likely work (very) well for the common case,
+/// but it also has to handle the case where the dictionary itself
+/// is not the same across all record batches (and thus indexes in one
+/// record batch may not correspond to the same index in another)
+fn dictionary_create_key_for_col<K: ArrowDictionaryKeyType>(
Review comment:
this makes sense and the comment really helps so :+1:
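
For readers following along, here is a minimal sketch of the problem the doc comment describes: two record batches whose dictionaries list the same values in a different order. It does not use the Arrow API. `DictColumn`, `append_value_key`, `count_by_value`, `count_by_index`, and `sample_batches` are hypothetical names made up for illustration, and the length prefix in the key is an assumption of this sketch, not necessarily what the PR does.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a dictionary-encoded string column:
/// `keys[row]` is an index into `values`.
struct DictColumn {
    values: Vec<&'static str>,
    keys: Vec<usize>,
}

/// Appends the bytes of the *value* at `col[row]` to `vec`.
/// The length prefix (an assumption of this sketch) keeps adjacent
/// column values from running together in a multi-column key.
fn append_value_key(col: &DictColumn, row: usize, vec: &mut Vec<u8>) {
    let value = col.values[col.keys[row]].as_bytes();
    vec.extend_from_slice(&(value.len() as u64).to_le_bytes());
    vec.extend_from_slice(value);
}

/// Counts rows per group across batches, keyed on the value bytes.
fn count_by_value(batches: &[DictColumn]) -> HashMap<Vec<u8>, usize> {
    let mut counts = HashMap::new();
    for batch in batches {
        for row in 0..batch.keys.len() {
            let mut key = Vec::new();
            append_value_key(batch, row, &mut key);
            *counts.entry(key).or_insert(0) += 1;
        }
    }
    counts
}

/// Counts rows per group keyed on the raw dictionary index.
/// This is the tempting shortcut, and it is only correct when every
/// batch shares the same dictionary.
fn count_by_index(batches: &[DictColumn]) -> HashMap<usize, usize> {
    let mut counts = HashMap::new();
    for batch in batches {
        for &index in &batch.keys {
            *counts.entry(index).or_insert(0) += 1;
        }
    }
    counts
}

/// Two batches with the same values but differently ordered dictionaries.
fn sample_batches() -> Vec<DictColumn> {
    vec![
        // Rows: a, b, a
        DictColumn { values: vec!["a", "b"], keys: vec![0, 1, 0] },
        // Rows: b, b, a
        DictColumn { values: vec!["b", "a"], keys: vec![0, 0, 1] },
    ]
}

fn main() {
    let batches = sample_batches();
    let by_value = count_by_value(&batches);
    let by_index = count_by_index(&batches);
    println!("groups keyed on value bytes: {}", by_value.len());
    println!("rows under dictionary index 0: {}", by_index[&0]);
}
```

Keyed on value bytes, the six rows split into three `a` and three `b`. Keyed on the raw index, index 0 collects four rows that mix `a` from the first batch with `b` from the second, which is the mismatch the comment warns about.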