james727 commented on a change in pull request #1579:
URL: https://github.com/apache/arrow-datafusion/pull/1579#discussion_r788302507



##########
File path: datafusion/src/physical_plan/expressions/distinct_expressions.rs
##########
@@ -705,4 +844,151 @@ mod tests {
 
         Ok(())
     }
+
+    // Ordering is unpredictable when using ARRAY_AGG(DISTINCT). Thus we cannot test by simply
+    // checking for equality of output, and it is difficult to sort since ORD is not implemented
+    // for ScalarValue. Thus we check for equality via the following:
+    //   1. `expected` and `actual` have the same number of elements.
+    //   2. `expected` contains no duplicates.
+    //   3. `expected` and `actual` contain the same unique elements.
+    fn check_distinct_array_agg(
+        input: ArrayRef,
+        expected: ScalarValue,
+        datatype: DataType,
+    ) -> Result<()> {
+        let schema = Schema::new(vec![Field::new("a", datatype.clone(), false)]);
+        let batch = RecordBatch::try_new(Arc::new(schema.clone()), vec![input])?;
+
+        let agg = Arc::new(DistinctArrayAgg::new(
+            col("a", &schema)?,
+            "bla".to_string(),
+            datatype,
+        ));
+        let actual = aggregate(&batch, agg)?;
+
+        match (expected, actual) {
+            (ScalarValue::List(Some(e), _), ScalarValue::List(Some(a), _)) => {
+                // Check that the inputs are the same length.
+                assert_eq!(e.len(), a.len());
+
+                let h1: HashSet<ScalarValue> = HashSet::from_iter(e.clone().into_iter());
+                let h2: HashSet<ScalarValue> = HashSet::from_iter(a.into_iter());
+
+                // Check that e's elements are unique.
+                assert_eq!(h1.len(), e.len());
+
+                // Check that a contains the same unique elements as e.
+                assert_eq!(h1, h2);

Review comment:
       Thank you! This is much nicer - I noticed the `PartialOrd` 
implementation but was unsure of how to actually use it.
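
For illustration, here is a minimal sketch of how a `PartialOrd` implementation can be used to compare two lists regardless of order: sort both sides with `partial_cmp`, then compare element by element. It uses `f64` as a stand-in for `ScalarValue` (both implement `PartialOrd` but not `Ord`). The helper names `sort_partial` and `same_elements` are hypothetical and are not taken from the pull request.

```rust
/// Sorts a slice whose element type implements only `PartialOrd` (not `Ord`).
/// Assumption of this sketch: every pair of values is comparable. An
/// incomparable pair (for example, a NaN float) causes a panic.
fn sort_partial<T: PartialOrd>(values: &mut [T]) {
    values.sort_by(|a, b| {
        a.partial_cmp(b)
            .expect("values must be mutually comparable")
    });
}

/// Order-insensitive equality: sort copies of both inputs, then compare
/// them element by element. Differing lengths or differing duplicate counts
/// make the result false.
fn same_elements<T: PartialOrd + Clone>(expected: &[T], actual: &[T]) -> bool {
    let mut e = expected.to_vec();
    let mut a = actual.to_vec();
    sort_partial(&mut e);
    sort_partial(&mut a);
    e == a
}
```

With this approach a test could assert `same_elements(&expected, &actual)` in place of building two `HashSet`s. Unlike a pure set comparison, it also catches a duplicate appearing on one side only.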




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


