albertlockett commented on code in PR #9373:
URL: https://github.com/apache/arrow-rs/pull/9373#discussion_r2787977015
##########
arrow-select/src/concat.rs:
##########
@@ -1900,4 +1900,90 @@ mod tests {
assert_eq!(values.values(), &[10, 20, 30]);
assert_eq!(&[2, 3, 5], run_ends);
}
+
+ #[test]
+ fn test_concat_u8_dictionary_256_values() {
+ // Integration test: concat should work with exactly 256 unique values
+ let values = StringArray::from((0..256).map(|i| format!("v{}", i)).collect::<Vec<_>>());
+ let keys = UInt8Array::from((0..256).map(|i| i as u8).collect::<Vec<_>>());
+ let dict = DictionaryArray::<UInt8Type>::try_new(keys, Arc::new(values)).unwrap();
+
+ // Concatenate with itself - should succeed
+ let result = concat(&[&dict as &dyn Array, &dict as &dyn Array]);
+ assert!(
+ result.is_ok(),
+ "Concat should succeed with 256 unique values for u8"
+ );
+
+ let concatenated = result.unwrap();
+ assert_eq!(
+ concatenated.len(),
+ 512,
+ "Should have 512 total elements (256 * 2)"
+ );
+ }
Review Comment:
To explain why this only happens for certain types: we only seem to hit this
bug when we don't merge dictionary keys and end up in `concat_fallback` here:
https://github.com/apache/arrow-rs/blob/fb775011f9e98f7eb84c8df006f8bd9e040ec505/arrow-select/src/concat.rs#L110-L114
The reason this test fails without the fix is that both arrays we're
concatenating use the same values array, so we don't merge dictionary keys.
I'm thinking about the case of future maintenance -- if someone came along
and saw that this test, which concatenates the same array with itself, is
implemented differently than the tests below, and changed the implementation to
be consistent, then they might accidentally introduce a change such that the
test doesn't protect against regressions of the original issue.