[ 
https://issues.apache.org/jira/browse/AVRO-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18079229#comment-18079229
 ] 

ASF subversion and git services commented on AVRO-4238:
-------------------------------------------------------

Commit 53d7d41042d99c0464988a969a1c898f95a926f4 in avro's branch 
refs/heads/branch-1.12 from Cedric
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=53d7d41042 ]

AVRO-4238: [java] Fix adding union fields with array default value (#3730)

When FastReader was enabled and a field of type Union<Array, ...>
with an Array as default value was added, an AvroRuntimeException
occurred during Schema Evolution.

This change resolves this bug in FastReaderBuilder.java, allowing the
Schema Migration to succeed as specified.

> FastReader fails to unbox nested type when defaulting a union<array<>> field
> ----------------------------------------------------------------------------
>
>                 Key: AVRO-4238
>                 URL: https://issues.apache.org/jira/browse/AVRO-4238
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.12.1
>         Environment: Java 21, Gradle 8.11, org.apache.avro:avro:1.12.1
>            Reporter: Cedric Holzer
>            Assignee: Cedric Holzer
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.13.0, 1.12.2
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> h2. Description
> When using the FastReader, the schema evolution fails with an 
> AvroRuntimeException when the reader schema adds a new field of type 
> union<array<T>, null> with a default value of an empty array.
> FastReader is enabled by default in 1.12.1; older versions are affected if 
> FastReader was enabled manually.
> h3. Cause
> In 
> [FastReaderBuilder.getDefaultingStep()|https://github.com/apache/avro/blob/4e376735ebbd14cc17e53116183039c8c4ced8ab/lang/java/avro/src/main/java/org/apache/avro/io/FastReaderBuilder.java#L191],
>  the fast path for empty lists calls 
> {code:java}
> data.newArray(old, 0, field.schema()) {code}
> field.schema() returns the union schema of the field. 
> [GenericData.newArray()|https://github.com/apache/avro/blob/4e376735ebbd14cc17e53116183039c8c4ced8ab/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L1527]
>  internally calls schema.getElementType(), which is only valid for Array-type 
> schemas and therefore throws the error we see.
> h2. Steps to reproduce
> Run the following minimal reproducible example:
> {code:java}
> import org.apache.avro.Schema;
> import org.apache.avro.SchemaBuilder;
> import org.apache.avro.generic.GenericData;
> import org.apache.avro.generic.GenericDatumReader;
> import org.apache.avro.generic.GenericDatumWriter;
> import org.apache.avro.io.DatumReader;
> import org.apache.avro.io.DatumWriter;
> import org.apache.avro.io.Decoder;
> import org.apache.avro.io.DecoderFactory;
> import org.apache.avro.io.Encoder;
> import org.apache.avro.io.EncoderFactory;
> import org.apache.avro.specific.SpecificData;
> import java.io.ByteArrayInputStream;
> import java.io.ByteArrayOutputStream;
> import java.io.IOException;
> import static java.util.Collections.emptyList;
> public class AvroExample {
>     final static Schema EMPTY_RECORD = SchemaBuilder
>             .record("EmptyRecord")
>             .fields()
>             .endRecord();
>     // adds union<array<EmptyRecord>, null> someField = []
>     final static Schema READ_SCHEMA = SchemaBuilder.record("EvolvedRecord")
>             .fields()
>             .name("someField")
>             .type()
>             .unionOf()
>             .array()
>             .items(EMPTY_RECORD)
>             .and()
>             .nullType()
>             .endUnion()
>             .arrayDefault(emptyList())
>             .endRecord();
>     public static void main(String... args) throws IOException {
>         // Disable fast reader -> works as specified, enable -> throws exception
>         GenericData model = SpecificData.get().setFastReaderEnabled(true);
>         // Serialize the empty record with the empty writer Schema
>         final Schema writeSchema = EMPTY_RECORD;
>         final byte[] serialized;
>         try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
>             Encoder encoder = EncoderFactory.get().binaryEncoder(baos, null);
>             DatumWriter<GenericData.Record> w = new GenericDatumWriter<>(EMPTY_RECORD, model);
>             GenericData.Record emptyRecord = new GenericData.Record(writeSchema);
>             w.write(emptyRecord, encoder);
>             encoder.flush();
>             serialized = baos.toByteArray();
>         }
>         // Deserialize with readSchema, Avro should create the new field with its default value
>         try (ByteArrayInputStream bais = new ByteArrayInputStream(serialized)) {
>             Decoder decoder = DecoderFactory.get().directBinaryDecoder(bais, null);
>             DatumReader<GenericData.Record> r = new GenericDatumReader<>(writeSchema, READ_SCHEMA, model);
>             final Object deserialized = r.read(null, decoder);
>             System.out.println(deserialized);
>         }
>     }
> }{code}
> h3. Expected Behaviour
> Deserialization succeeds, the new field someField is populated with its 
> default value, and the program prints \{"someField": []}. This is the observed 
> behavior with .setFastReaderEnabled(false).
> h3. Actual Behaviour
> {noformat}
> Exception in thread "main" org.apache.avro.AvroRuntimeException: Not an array: [{"type":"array","items":{"type":"record","name":"EmptyRecord","fields":[]}},"null"]
>     at org.apache.avro.Schema.getElementType(Schema.java:374)
>     at org.apache.avro.generic.GenericData.newArray(GenericData.java:1528)
>     at org.apache.avro.io.FastReaderBuilder.lambda$getDefaultingStep$5(FastReaderBuilder.java:199)
>     at org.apache.avro.io.FastReaderBuilder.lambda$createFieldSetter$1(FastReaderBuilder.java:181)
>     at org.apache.avro.io.FastReaderBuilder$RecordReader.read(FastReaderBuilder.java:575)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:150)
>     at AvroExample.main(AvroExample.java:61){noformat}
> h2. Suggested Change
> In 
> [FastReaderBuilder.getDefaultingStep()|https://github.com/apache/avro/blob/4e376735ebbd14cc17e53116183039c8c4ced8ab/lang/java/avro/src/main/java/org/apache/avro/io/FastReaderBuilder.java#L198],
>  when the default value is an empty list, the code could check whether the 
> type is a union and, if so, unwrap the first branch of type Array:
> {code:java}
> // Current (broken):
> (old, d) -> data.newArray(old, 0, field.schema())
> // Fix: unwrap the union to find the array branch.
> // arraySchema is assigned exactly once so that the lambda can capture it.
> final Schema fieldSchema = field.schema();
> final Schema arraySchema = fieldSchema.getType() == Schema.Type.UNION
>     ? fieldSchema.getTypes().stream()
>         .filter(s -> s.getType() == Schema.Type.ARRAY)
>         .findFirst()
>         .orElse(fieldSchema)
>     : fieldSchema;
> (old, d) -> data.newArray(old, 0, arraySchema){code}
> h2. Workaround
> Disable FastReader by setting 
> {code:java}
> -Dorg.apache.avro.fastread=false{code}
> or change your type to be 
> {code:java}
> union<null, array<T>> = null{code}
> which works correctly but changes the default value to null instead of `[]`.
>  
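
The cause and the suggested change described in the issue can be modeled without Avro on the classpath. The sketch below is only an illustration: SchemaNode, Type, and arrayBranch are hypothetical stand-ins invented here, not Avro classes, and the exception type differs from Avro's AvroRuntimeException. It shows that asking a union for its element type fails, while first selecting the array branch of the union succeeds.

```java
import java.util.List;

// Dependency-free model of the union unwrapping discussed in AVRO-4238.
// SchemaNode is a hypothetical stand-in for org.apache.avro.Schema.
class UnionArrayDefaultSketch {

    enum Type { ARRAY, NULL, RECORD, UNION }

    static final class SchemaNode {
        final Type type;
        final List<SchemaNode> branches;  // only meaningful for UNION
        final SchemaNode elementType;     // only meaningful for ARRAY

        private SchemaNode(Type type, List<SchemaNode> branches, SchemaNode elementType) {
            this.type = type;
            this.branches = branches;
            this.elementType = elementType;
        }

        static SchemaNode of(Type type) {
            return new SchemaNode(type, List.of(), null);
        }

        static SchemaNode arrayOf(SchemaNode element) {
            return new SchemaNode(Type.ARRAY, List.of(), element);
        }

        static SchemaNode unionOf(SchemaNode... branches) {
            return new SchemaNode(Type.UNION, List.of(branches), null);
        }

        // Models the behavior described in the issue: only valid on array schemas.
        SchemaNode getElementType() {
            if (type != Type.ARRAY) {
                throw new IllegalStateException("Not an array: " + type);
            }
            return elementType;
        }
    }

    // Returns the first ARRAY branch of a union; any other schema
    // (or a union without an array branch) is returned unchanged.
    static SchemaNode arrayBranch(SchemaNode schema) {
        if (schema.type != Type.UNION) {
            return schema;
        }
        return schema.branches.stream()
                .filter(s -> s.type == Type.ARRAY)
                .findFirst()
                .orElse(schema);
    }

    public static void main(String[] args) {
        // Models union<array<EmptyRecord>, null> from the reproduction.
        SchemaNode field = SchemaNode.unionOf(
                SchemaNode.arrayOf(SchemaNode.of(Type.RECORD)),
                SchemaNode.of(Type.NULL));
        try {
            field.getElementType(); // the failing call path: union passed as array
        } catch (IllegalStateException e) {
            System.out.println("without unwrapping: " + e.getMessage());
        }
        System.out.println("with unwrapping: " + arrayBranch(field).getElementType().type);
    }
}
```

Falling back to the original schema when no array branch exists follows the orElse in the suggested change, so a mismatched schema still fails in the same place as before. It is not silently replaced.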



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
