alamb commented on code in PR #7016:
URL: https://github.com/apache/arrow-datafusion/pull/7016#discussion_r1268044930


##########
datafusion/core/src/physical_plan/aggregates/row_hash.rs:
##########
@@ -60,6 +60,151 @@ pub(crate) enum ExecutionState {
 
 use super::AggregateExec;
 
+/// An interning store for group keys
+trait GroupValues: Send {
+    /// Calculates the `groups` for each input row of `cols`
+    fn intern(&mut self, cols: &[ArrayRef], groups: &mut Vec<usize>) -> 
Result<()>;
+
+    /// Returns the number of bytes used by this [`GroupValues`]
+    fn size(&self) -> usize;
+
+    /// Returns true if this [`GroupValues`] is empty
+    fn is_empty(&self) -> bool;
+
+    /// The number of values stored in this [`GroupValues`]
+    fn len(&self) -> usize;
+
+    /// Flushes the unique [`GroupValues`]
+    fn flush(&mut self) -> Result<Vec<ArrayRef>>;
+}
+
+/// A [`GroupValues`] making use of [`Rows`]
+struct GroupValuesRows {
+    /// Converter for the group values
+    row_converter: RowConverter,
+
+    /// Logically maps group values to a group_index in
+    /// [`Self::group_values`] and in each accumulator
+    ///
+    /// Uses the raw API of hashbrown to avoid actually storing the
+    /// keys (group values) in the table
+    ///
+    /// keys: u64 hashes of the GroupValue
+    /// values: (hash, group_index)
+    map: RawTable<(u64, usize)>,

Review Comment:
   I also was thinking this structure will allow for an easy insertion of the 
FixedWidthRowFormat if that turns out to be better as well. So I think this PR 
is a step in the right direction regardless of which approach we choose to take



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to