jecsand838 commented on code in PR #7954:
URL: https://github.com/apache/arrow-rs/pull/7954#discussion_r2214304249


##########
arrow-avro/src/reader/record.rs:
##########
@@ -301,9 +301,23 @@ impl Decoder {
             }
             Codec::Uuid => Self::Uuid(Vec::with_capacity(DEFAULT_CAPACITY)),
         };
-        Ok(match data_type.nullability() {
-            Some(nullability) => Self::Nullable(
-                nullability,
+        let union_order = match data_type.nullability() {
+            None => None,
+            Some(Nullability::NullFirst) => Some(Nullability::NullFirst),
+            Some(Nullability::NullSecond) => {
+                if strict_mode {
+                    return Err(ArrowError::ParseError(
+                        "Found Avro union of the form ['T','null'], which is 
disallowed in strict_mode"
+                            .to_string(),
+                    ));
+                }
+                Some(Nullability::NullSecond)
+            }
+        };

Review Comment:
   @veronica-m-ef After looking through this again, I'm beginning to think it 
may make more sense to move this check to the `Schema::Union(f)` branch inside 
the `make_data_type` method in codec.rs. That way we can determine whether the 
Avro Schema is valid when `strict_mode` is `true` before attempting to 
construct the `RecordDecoder`. 
   
   By doing this we'd get an error earlier on line 224 in mod.rs, i.e. `let 
root_field = AvroField::try_from(schema)?;`
   
   You'd need to find a way to pass down `strict_mode` into `AvroField` though. 
Maybe we could implement an `AvroFieldBuilder` in codec.rs containing 
`with_utf8view` and `with_strict_mode` methods to achieve this?
   
   Then in the `make_data_type` method we could do:
   
   ```rust
           Schema::Union(f) => {
               // Special case the common case of nullable primitives
               let null = f
                   .iter()
                   .position(|x| x == 
&Schema::TypeName(TypeName::Primitive(PrimitiveType::Null)));
               match (f.len() == 2, null) {
                   (true, Some(0)) => {
                       let mut field = make_data_type(&f[1], namespace, 
resolver, use_utf8view, strict_mode)?;
                       field.nullability = Some(Nullability::NullFirst);
                       Ok(field)
                   }
                   (true, Some(1)) => {
                       if strict_mode {
                           return Err(ArrowError::SchemaError(
                               "Found Avro union of the form ['T','null'], 
which is disallowed in strict_mode"
                                   .to_string(),
                           ));
                       }
                       let mut field = make_data_type(&f[0], namespace, 
resolver, use_utf8view, strict_mode)?;
                       field.nullability = Some(Nullability::NullSecond);
                       Ok(field)
                   }
                   _ => Err(ArrowError::NotYetImplemented(format!(
                       "Union of {f:?} not currently supported"
                   ))),
               }
           }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to