joaquinhuigomez opened a new pull request, #9623:
URL: https://github.com/apache/arrow-rs/pull/9623

   # Which issue does this PR close?
   
   - Closes #9595
   
   # Rationale for this change
   
   The [IPC specification](https://arrow.apache.org/docs/format/Columnar.html#format-ipc) states:
   
   > An edge-case for interleaved dictionary and record batches occurs when the 
record batches contain dictionary encoded arrays that are completely null. In 
this case, the dictionary for the encoded column might appear after the first 
record batch.
   
   Arrow C++ (v17+) relies on this and does not emit a dictionary batch when 
all values in a dictionary-encoded column are null.  The Rust IPC reader 
currently fails with `"Cannot find a dictionary batch with dict id: ..."` when 
reading such streams, which breaks cross-language interop for this edge case.
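
To illustrate why an all-null dictionary column can be decoded without any dictionary values, here is a minimal std-only sketch. It is a simplified model, not arrow-rs code: the `decode` helper and its use of `Option<usize>` keys are assumptions made for illustration only.

```rust
/// Hypothetical helper (not an arrow-rs API): resolve dictionary keys
/// against a slice of dictionary values.
/// A `None` key models a null slot and never touches `values`.
fn decode<'a>(
    keys: &[Option<usize>],
    values: &'a [&'a str],
) -> Result<Vec<Option<&'a str>>, String> {
    keys.iter()
        .map(|key| match key {
            // Null slot: no dictionary lookup is needed.
            None => Ok(None),
            // Non-null slot: the key must index into the dictionary.
            Some(i) => values
                .get(*i)
                .copied()
                .map(Some)
                .ok_or_else(|| format!("key {} out of bounds", i)),
        })
        .collect()
}
```

In this model, a column whose keys are all `None` decodes against an empty `values` slice, while any non-null key would still be rejected.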
   
   # What changes are included in this PR?
   
   When the IPC reader encounters a `Dictionary`-typed column whose `dict_id` 
has no corresponding entry in `dictionaries_by_id`, it now synthesizes an empty 
values array of the appropriate type (via `new_empty_array`) instead of 
returning an error.  This matches the spec's allowance for omitted dictionary 
batches on null-only columns.
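
The fallback described above can be sketched roughly as follows. This is a std-only model under stated assumptions, not the actual reader code: `ValueType`, `Values`, `lookup_strict`, and `lookup_or_empty` are hypothetical stand-ins, with the empty `Values` playing the role that `new_empty_array` plays in the PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the dictionary's value data type.
#[derive(Debug, Clone, PartialEq)]
enum ValueType {
    Utf8,
    Int32,
}

/// Hypothetical stand-in for a dictionary values array.
#[derive(Debug, Clone, PartialEq)]
struct Values {
    value_type: ValueType,
    items: Vec<String>,
}

/// Models the previous behavior: a missing dictionary id is an error.
fn lookup_strict(dicts: &HashMap<i64, Values>, dict_id: i64) -> Result<Values, String> {
    dicts
        .get(&dict_id)
        .cloned()
        .ok_or_else(|| format!("Cannot find a dictionary batch with dict id: {}", dict_id))
}

/// Models the new behavior: a missing dictionary id yields an empty
/// values array of the declared value type instead of an error.
fn lookup_or_empty(dicts: &HashMap<i64, Values>, dict_id: i64, value_type: ValueType) -> Values {
    dicts.get(&dict_id).cloned().unwrap_or_else(|| Values {
        value_type,
        items: Vec::new(),
    })
}
```

A dictionary that is present is returned unchanged in both variants; only the missing-id path differs.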
   
   # Are these changes tested?
   
   Yes.  A new test (`test_read_null_dict_without_dictionary_batch`) writes an 
IPC stream with an all-null dictionary column, strips the dictionary batch 
message from the raw bytes to simulate C++ behavior, then verifies the Rust 
reader successfully decodes the stream.
   
   # Are there any user-facing changes?
   
   IPC streams produced by C++ (or other implementations) that omit dictionary 
batches for null-only dictionary columns can now be read without error.  
Previously these streams caused a `ParseError`.

