gyanranjanpanda commented on code in PR #9373:
URL: https://github.com/apache/arrow-rs/pull/9373#discussion_r2786453766
##########
arrow-select/src/dictionary.rs:
##########
@@ -275,12 +275,12 @@ pub(crate) fn merge_dictionary_values<K:
ArrowDictionaryKeyType>(
for (value_idx, value) in values {
mapping[value_idx] =
- *interner.intern(value, || match
K::Native::from_usize(indices.len()) {
- Some(idx) => {
- indices.push((dictionary_idx, value_idx));
- Ok(idx)
- }
- None => Err(ArrowError::DictionaryKeyOverflowError),
+ *interner.intern(value, || -> Result<K::Native,
ArrowError> {
Review Comment:
Yes, this has been verified.
Previously, concatenating dictionary arrays at the exact boundary (e.g., 256
values for u8, 65,536 for u16) incorrectly returned a
DictionaryKeyOverflowError.
With this change, the full native key range is now allowed, and overflow
only occurs when exceeding the boundary.
The following tests confirm the behavior:
test_concat_u8_dictionary_256_values — now succeeds at the exact u8 limit
(previously failed).
test_concat_u8_dictionary_257_values_fails — correctly returns an error when
exceeding the limit.
test_concat_u16_dictionary_65536_values — now succeeds at the exact u16
limit (previously failed).
test_concat_returns_error_not_panic — verifies overflow returns an
ArrowError instead of panicking.
Please let me know if you’d like additional cases covered.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]