albertlockett commented on code in PR #9373:
URL: https://github.com/apache/arrow-rs/pull/9373#discussion_r2788835747


##########
arrow-select/src/concat.rs:
##########
@@ -1900,4 +1900,90 @@ mod tests {
         assert_eq!(values.values(), &[10, 20, 30]);
         assert_eq!(&[2, 3, 5], run_ends);
     }
+
+    #[test]
+    fn test_concat_u8_dictionary_256_values() {
+        // Integration test: concat should work with exactly 256 unique values
+        let values = StringArray::from((0..256).map(|i| format!("v{}", i)).collect::<Vec<_>>());
+        let keys = UInt8Array::from((0..256).map(|i| i as u8).collect::<Vec<_>>());
+        let dict = DictionaryArray::<UInt8Type>::try_new(keys, Arc::new(values)).unwrap();
+
+        // Concatenate with itself - should succeed
+        let result = concat(&[&dict as &dyn Array, &dict as &dyn Array]);
+        assert!(
+            result.is_ok(),
+            "Concat should succeed with 256 unique values for u8"
+        );
+
+        let concatenated = result.unwrap();
+        assert_eq!(
+            concatenated.len(),
+            512,
+            "Should have 512 total elements (256 * 2)"
+        );
+    }

Review Comment:
   I see that this commit
   https://github.com/apache/arrow-rs/pull/9373/commits/4439cc5568f5d0529ff28e6ce35640ffc5dd2197
   now removes the test:
   ```rs
       #[test]
       fn test_concat_u8_dictionary_257_values_fails() {
   ```
   > - Removed 257-value overflow test (was panicking in infallible function)
   
   I assume this is because the test was panicking when we changed it to use the type `FixedSizeBinary` instead of `StringArray` for the dictionary values? If so, I think that means this fix doesn't completely solve the issue.
   
   It seems we've corrected the boundary condition for calculating the maximum number of elements, but we haven't corrected the issue where concatenating dictionaries panics on overflow instead of returning an error.
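
To make the distinction concrete, here is a minimal standalone sketch of the behavior I would expect at the boundary. `checked_u8_key` is a hypothetical helper written only for illustration; it is not part of arrow-rs and not how `concat` is implemented. It assumes only that a `u8` key can address dictionary indices 0 through 255, so 256 values fit and the 257th should produce an error rather than a panic.

```rust
use std::convert::TryFrom;

/// Hypothetical helper (not an arrow-rs API): converts a dictionary value
/// index into a `u8` key, reporting overflow as an error instead of panicking.
pub fn checked_u8_key(index: usize) -> Result<u8, String> {
    u8::try_from(index)
        .map_err(|_| format!("dictionary index {} does not fit in a u8 key", index))
}

/// Hypothetical helper: returns true if `num_values` distinct dictionary
/// values can all be addressed by `u8` keys (indices 0..=255).
pub fn fits_u8_keys(num_values: usize) -> bool {
    num_values == 0 || checked_u8_key(num_values - 1).is_ok()
}
```

With this sketch, `fits_u8_keys(256)` is true and `fits_u8_keys(257)` is false, and the overflow case surfaces as an `Err` that a caller can propagate.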
   
   FWIW - this code still seems to panic, even with the fix introduced to 
`build_extend_dictionary`:
   ```rs
       #[test]
       fn test_concat_u8_dictionary_257_values_fails_fsb() {
           let values = FixedSizeBinaryArray::try_from_iter((0..128).map(|i| vec![i as u8])).unwrap();
           let keys = UInt8Array::from((0..128).map(|i| i as u8).collect::<Vec<_>>());
           let dict1 = DictionaryArray::<UInt8Type>::try_new(keys, Arc::new(values)).unwrap();

           // Note: `256 as u8` wraps to 0, so the last value here repeats `[0]` from dict1.
           let values = FixedSizeBinaryArray::try_from_iter((128..257).map(|i| vec![i as u8])).unwrap();
           let keys = UInt8Array::from((0..129).map(|i| i as u8).collect::<Vec<_>>());
           let dict2 = DictionaryArray::<UInt8Type>::try_new(keys, Arc::new(values)).unwrap();

           // Should fail with 257 distinct values
           let result = concat(&[&dict1 as &dyn Array, &dict2 as &dyn Array]);
           assert!(
               result.is_err(),
               "Concat should fail with 257 distinct values for u8"
           );
       }
   ```


