jecsand838 commented on code in PR #8349:
URL: https://github.com/apache/arrow-rs/pull/8349#discussion_r2365872073
##########
arrow-avro/src/reader/record.rs:
##########
@@ -486,6 +734,70 @@ impl Decoder {
Self::Decimal256(_, _, _, builder) =>
builder.append_value(i256::ZERO),
Self::Enum(indices, _, _) => indices.push(0),
Self::Duration(builder) => builder.append_null(),
+ Self::Union(fields, type_ids, offsets, encodings, encoding_counts,
None) => {
+ let mut chosen = None;
+ for (i, ch) in encodings.iter().enumerate() {
+ if matches!(ch, Decoder::Null(_)) {
+ chosen = Some(i);
+ break;
+ }
+ }
+ let idx = chosen.unwrap_or(0);
+ let type_id = fields
+ .iter()
+ .nth(idx)
+ .map(|(type_id, _)| type_id)
+ .unwrap_or_else(|| i8::try_from(idx).unwrap_or(0));
Review Comment:
> Actually, maybe I do understand!
>
> If NULL is one of the union branches, emit that directly
> Otherwise, default to the first branch (whatever that is), and emit a NULL
value for it
> Is that correct?
>
> (but if so, then I don't understand why idx could ever be out of bounds?)
That's correct.
I'm working on cleaning this up now by making the invariants explicit at
construction time and erroring if the union is empty or if `encodings.len() !=
fields.len()`. Then I'll remove the `i8::try_from(idx)` fallback and only
compute `type_id` from `fields[idx]`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]