alamb commented on code in PR #7131:
URL: https://github.com/apache/arrow-rs/pull/7131#discussion_r1957085257
##########
arrow-ord/src/partition.rs:
##########
@@ -156,7 +157,14 @@ fn find_boundaries(v: &dyn Array) -> Result<BooleanBuffer, ArrowError> {
let slice_len = v.len() - 1;
let v1 = v.slice(0, slice_len);
let v2 = v.slice(1, slice_len);
- Ok(distinct(&v1, &v2)?.values().clone())
+
+ if !v.data_type().is_nested() {
+ return Ok(distinct(&v1, &v2)?.values().clone());
+ }
+ // Given that we're only comparing values, null ordering in the input or
Review Comment:
I wonder if using `eq` would be faster 🤔
https://docs.rs/arrow/latest/arrow/compute/kernels/cmp/fn.eq.html
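One thing to check before switching: `distinct` and `eq` differ in how they treat nulls. `distinct` never produces a null result (null vs. null is "not distinct", null vs. value is "distinct"), whereas `eq` yields null whenever either side is null. The sketch below is a plain-Rust model of those two semantics using `Option<i32>`; it does not call the arrow kernels, and the helper names are hypothetical.

```rust
/// Model of "is distinct from": two nulls are not distinct, a null and a
/// value are distinct, and the result is never null.
fn is_distinct(a: Option<i32>, b: Option<i32>) -> bool {
    a != b
}

/// Model of SQL-style equality: any null operand gives a null result.
fn sql_eq(a: Option<i32>, b: Option<i32>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None,
    }
}

/// Boundary mask: element i is true when rows i and i + 1 differ.
fn boundaries_distinct(v: &[Option<i32>]) -> Vec<bool> {
    v.windows(2).map(|w| is_distinct(w[0], w[1])).collect()
}

/// Same mask built from negated SQL-style equality; nulls propagate.
fn boundaries_sql_eq(v: &[Option<i32>]) -> Vec<Option<bool>> {
    v.windows(2)
        .map(|w| sql_eq(w[0], w[1]).map(|equal| !equal))
        .collect()
}

fn main() {
    let v = [Some(1), Some(1), None, None, Some(2)];
    // prints [false, true, false, true]
    println!("{:?}", boundaries_distinct(&v));
    // prints [Some(false), None, None, None]
    println!("{:?}", boundaries_sql_eq(&v));
}
```

If this model matches the kernels, an `eq`-based version would have to negate the result and also decide what a null comparison means for a boundary, so any speedup would need to be weighed against that extra handling.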
##########
arrow-ord/src/partition.rs:
##########
@@ -298,4 +306,23 @@ mod tests {
vec![(0..1), (1..2), (2..4), (4..5), (5..7), (7..8), (8..9)],
);
}
+
+ #[test]
+ fn test_partition_nested() {
Review Comment:
Could you also please add a test for multi-column comparison (i.e. more than
one column)?
Maybe an `Int32Array` and a `StructArray`?
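To work out the expected ranges for such a test by hand, here is a plain-Rust model of multi-column partitioning: a boundary sits between rows i and i + 1 if any column differs there. This is only a sketch under the assumption that per-column boundary masks are OR-ed together and that null equals null. The data, the tuple standing in for a struct row, and the helper names are all hypothetical, and it does not use the arrow API.

```rust
use std::ops::Range;

/// Per-column boundary mask: element i is true when rows i and i + 1 differ.
fn column_boundaries<T: PartialEq>(col: &[T]) -> Vec<bool> {
    col.windows(2).map(|w| w[0] != w[1]).collect()
}

/// OR the per-column masks together and convert them to half-open row ranges.
/// Each mask is expected to have `num_rows - 1` entries.
fn ranges_from_masks(num_rows: usize, masks: &[Vec<bool>]) -> Vec<Range<usize>> {
    if num_rows == 0 {
        return Vec::new();
    }
    let mut ranges = Vec::new();
    let mut start = 0;
    for i in 0..num_rows - 1 {
        // A partition ends after row i if any column changes between i and i + 1.
        if masks.iter().any(|mask| mask[i]) {
            ranges.push(start..i + 1);
            start = i + 1;
        }
    }
    ranges.push(start..num_rows);
    ranges
}

fn main() {
    // Hypothetical Int32-like column.
    let ints = [1, 1, 1, 2, 2];
    // Hypothetical struct-like column: (nullable int field, string field).
    let structs = [
        (Some(1), "a"),
        (Some(1), "a"),
        (None, "a"),
        (None, "a"),
        (None, "b"),
    ];
    let masks = vec![column_boundaries(&ints), column_boundaries(&structs)];
    // prints [0..2, 2..3, 3..4, 4..5]
    println!("{:?}", ranges_from_masks(ints.len(), &masks));
}
```

Here the struct column splits at rows 2 and 4 while the int column splits at row 3, so the combined result exercises boundaries contributed by each column separately.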