jecsand838 commented on code in PR #7954:
URL: https://github.com/apache/arrow-rs/pull/7954#discussion_r2214304249
##########
arrow-avro/src/reader/record.rs:
##########
@@ -301,9 +301,23 @@ impl Decoder {
}
Codec::Uuid => Self::Uuid(Vec::with_capacity(DEFAULT_CAPACITY)),
};
- Ok(match data_type.nullability() {
- Some(nullability) => Self::Nullable(
- nullability,
+ let union_order = match data_type.nullability() {
+ None => None,
+ Some(Nullability::NullFirst) => Some(Nullability::NullFirst),
+ Some(Nullability::NullSecond) => {
+ if strict_mode {
+ return Err(ArrowError::ParseError(
+ "Found Avro union of the form ['T','null'], which is
disallowed in strict_mode"
+ .to_string(),
+ ));
+ }
+ Some(Nullability::NullSecond)
+ }
+ };
Review Comment:
@veronica-m-ef After looking through this again, I'm beginning to think it
may make more sense to move this check to the `Schema::Union(f)` branch inside
the `make_data_type` method in codec.rs. That way we can determine whether the
Avro Schema is valid when `strict_mode` is `true` before attempting to
construct the `RecordDecoder`.
By doing this we'd get an error earlier on line 224 in mod.rs, i.e. `let
root_field = AvroField::try_from(schema)?;`
You'd need to find a way to pass down `strict_mode` into `AvroField` though.
Maybe we could implement an `AvroFieldBuilder` in codec.rs containing
`with_utf8view` and `with_strict_mode` methods to achieve this?
Then in the `make_data_type` method we could do:
```rust
Schema::Union(f) => {
// Special case the common case of nullable primitives
let null = f
.iter()
.position(|x| x ==
&Schema::TypeName(TypeName::Primitive(PrimitiveType::Null)));
match (f.len() == 2, null) {
(true, Some(0)) => {
let mut field = make_data_type(&f[1], namespace,
resolver, use_utf8view, strict_mode)?;
field.nullability = Some(Nullability::NullFirst);
Ok(field)
}
(true, Some(1)) => {
if strict_mode {
return Err(ArrowError::SchemaError(
"Found Avro union of the form ['T','null'],
which is disallowed in strict_mode"
.to_string(),
));
}
let mut field = make_data_type(&f[0], namespace,
resolver, use_utf8view, strict_mode)?;
field.nullability = Some(Nullability::NullSecond);
Ok(field)
}
_ => Err(ArrowError::NotYetImplemented(format!(
"Union of {f:?} not currently supported"
))),
}
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]