[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #68: Experimenting with arrow2

GitBox Mon, 31 May 2021 08:12:39 -0700


Dandandan commented on a change in pull request #68:
URL: https://github.com/apache/arrow-datafusion/pull/68#discussion_r642554876




##########
File path: datafusion/src/physical_plan/hash_aggregate.rs
##########
@@ -339,6 +325,36 @@ pin_project! {
     }
 }
 
+fn hash_(group_values: &[ArrayRef]) -> Result<MutableBuffer<u64>> {
+    // compute the hashes
+    // todo: we should be able to use `MutableBuffer<u64>` to compute the hash 
and ^ them without
+    // allocating all the hashes before ^ them
+    let hashes = group_values
+        .iter()
+        .map(|x| {
+            let a = match x.data_type() {
+                DataType::Dictionary(_, d) => {
+                    // todo: think about how to perform this more efficiently
+                    // * first hash, then unpack
+                    // * do not unpack at all, and instead figure out a way to 
leverage dictionary-encoded.
+                    let unpacked = arrow2::compute::cast::cast(x.as_ref(), d)?;
+                    arrow2::compute::hash::hash(unpacked.as_ref())
+                }
+                _ => arrow2::compute::hash::hash(x.as_ref()),
+            };
+            Ok(a?)
+        })
+        .collect::<Result<Vec<_>>>()?;
+    let hash = MutableBuffer::<u64>::from(hashes[0].values());
+
+    Ok(hashes.iter().skip(1).fold(hash, |mut acc, x| {
+        acc.iter_mut()
+            .zip(x.values().iter())
+            .for_each(|(hash, other)| *hash ^= other);

Review comment:
       Simple example is that combining two same hashes always map to `0` - 
regardless of which value they have and in which order they appear.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #68: Experimenting with arrow2

Reply via email to