jecsand838 opened a new pull request, #8292:
URL: https://github.com/apache/arrow-rs/pull/8292

   # Which issue does this PR close?
   
   This work continues arrow-avro schema resolution support and aligns behavior 
with the Avro spec.
   
   - **Related to**: #4886 (“Add Avro Support”): ongoing work to round out the 
reader/decoder, including schema resolution and type promotion.
   - **Context**: #8124 (schema resolution & type promotion for the 
decoder), #8223 (enum mapping for schema resolution). These previous efforts 
established the foundations that this PR extends to default values and 
additional resolvable types.
   
   # Rationale for this change
   
   Avro’s **schema resolution** requires readers to reconcile differences 
between the writer and reader schemas, including:
   - Using record-field **default values** when the writer lacks a field 
present in the reader; defaults must be type-correct (e.g., union defaults 
match the first union member; bytes/fixed defaults are JSON strings).
   - Recursively resolving **arrays** (by item schema) and **maps** (by value 
schema).  
   - Resolving **fixed** types (size and unqualified name must match) and 
erroring when they do not.
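
To make the default-value rule concrete, here is a minimal sketch of how a reader could fill a field the writer never wrote. It is not the arrow-avro implementation: `Value`, `ReaderField`, and `resolve_record` are hypothetical names invented for this illustration, and the value model is reduced to three variants.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a decoded Avro value.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Long(i64),
    Str(String),
}

/// Hypothetical reader-side field: a name plus an optional default.
struct ReaderField {
    name: &'static str,
    default: Option<Value>,
}

/// Build the reader's view of one record from the fields the writer produced.
/// Writer-only fields are skipped because only reader fields are visited.
fn resolve_record(
    reader_fields: &[ReaderField],
    written: &HashMap<&str, Value>,
) -> Result<Vec<(String, Value)>, String> {
    let mut out = Vec::with_capacity(reader_fields.len());
    for field in reader_fields {
        let value = match (written.get(field.name), &field.default) {
            // The writer supplied the field: use the written value.
            (Some(v), _) => v.clone(),
            // The writer lacks the field: fall back to the reader's default.
            (None, Some(d)) => d.clone(),
            // No written value and no default: resolution must fail.
            (None, None) => {
                return Err(format!(
                    "field '{}' is missing from the writer and has no default",
                    field.name
                ))
            }
        };
        out.push((field.name.to_string(), value));
    }
    Ok(out)
}
```

Under these assumptions, a reader field `note` with default `"n/a"` is filled in when the writer only wrote `id`, while a missing `id` without a default yields an error.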
   
   Prior to this change, arrow-avro’s resolution handled some cases but lacked 
full Codec support for **default values** and for resolving **array/map/fixed** 
shapes between writer and reader. This led to gaps when reading evolved data or 
datasets produced by heterogeneous systems. This PR implements these missing 
pieces so the Arrow reader behaves per the spec in common evolution scenarios.
   
   # What changes are included in this PR?
   
   This PR modifies **`arrow-avro/src/codec.rs`** to extend the 
schema-resolution path with:
   
   - **Default value handling** for record fields  
     - Reads and applies default values when the reader expects a field absent 
from the writer, including **nested defaults**.  
     - Validates defaults per the Avro spec (e.g., union defaults match the 
first schema; bytes/fixed defaults are JSON strings).
   
   - **Array / Map / Fixed schema resolution**  
     - **Array**: recursively resolves item schemas (writer↔reader).
     - **Map**: recursively resolves value schemas.
     - **Fixed**: enforces matching size and (unqualified) name; otherwise 
signals an error, consistent with the spec. 
   
   - **Codec updates**  
     - Refactors internal codec logic to support the above during decoding, 
including resolution for **record fields** and **nested defaults**. (See commit 
message for the high-level summary.)
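
The array/map/fixed rules listed above can be sketched as a small recursive function. This is an illustration under stated assumptions, not the code in `codec.rs`: the `Schema` enum, `unqualified`, and `resolve` are hypothetical, only `int` to `long` promotion is modeled, and alias matching is left out for brevity.

```rust
/// Hypothetical, heavily simplified schema model (not the arrow-avro types).
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Int,
    Long,
    Array(Box<Schema>),
    Map(Box<Schema>),
    Fixed { name: String, size: usize },
}

/// Strip the namespace from a possibly qualified name: "ns.Md5" -> "Md5".
fn unqualified(name: &str) -> &str {
    name.rsplit('.').next().unwrap_or(name)
}

/// Resolve a writer schema against a reader schema and return the schema
/// the reader will see, or an error when the two cannot be reconciled.
fn resolve(writer: &Schema, reader: &Schema) -> Result<Schema, String> {
    match (writer, reader) {
        (Schema::Int, Schema::Int) => Ok(Schema::Int),
        // `int` written, `long` expected: promote to the reader's type.
        (Schema::Long, Schema::Long) | (Schema::Int, Schema::Long) => Ok(Schema::Long),
        // Arrays resolve recursively by item schema.
        (Schema::Array(w), Schema::Array(r)) => Ok(Schema::Array(Box::new(resolve(w, r)?))),
        // Maps resolve recursively by value schema.
        (Schema::Map(w), Schema::Map(r)) => Ok(Schema::Map(Box::new(resolve(w, r)?))),
        // Fixed types need the same size and the same unqualified name.
        (Schema::Fixed { name: wn, size: ws }, Schema::Fixed { name: rn, size: rs }) => {
            if ws == rs && unqualified(wn) == unqualified(rn) {
                Ok(reader.clone())
            } else {
                Err(format!("fixed mismatch: {}({}) vs {}({})", wn, ws, rn, rs))
            }
        }
        _ => Err(format!("cannot resolve {:?} against {:?}", writer, reader)),
    }
}
```

In this sketch, an `array<int>` writer schema resolves against an `array<long>` reader schema to `array<long>`, and a `fixed` named `com.a.Md5` of size 16 matches a reader `fixed` named `Md5` of size 16 but not one of size 8.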
   
   # Are these changes tested?
   
   **Yes.** This PR includes new unit tests in `arrow-avro/src/codec.rs` 
covering:
   
   1) **Default validation & persistence**
      - `Null`/union‑nullability rules; metadata persistence of defaults 
(`AVRO_FIELD_DEFAULT_METADATA_KEY`).
   2) **`AvroLiteral` parsing**
      - Range checks for `i32`/`f32`; correct literals for `i64`/`f64`; 
`Utf8`/`Utf8View`; `uuid` strings (RFC‑4122).
      - Byte‑range mapping for `bytes`/`fixed` defaults; `Fixed(n)` length 
enforcement; `decimal` on `fixed` vs `bytes`; `duration`/interval fixed 
**12**‑byte enforcement.
   3) **Collections & records**
      - Array/map defaults shape; enum symbol validity; record defaults for 
missing fields, required‑field errors, and honoring field‑level defaults; 
skip‑fields retained for writer‑only fields.
   4) **Resolution mechanics**
      - Element **promotion** (`int` to `long`) for arrays; **reader metadata 
precedence** for colliding attributes; `fixed` name/size match including 
**alias**.
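
As an illustration of the byte-range mapping and `Fixed(n)` length checks mentioned in the test list, the sketch below converts the text of a JSON-string default into raw bytes. The Avro spec maps Unicode code points 0 through 255 to byte values 0 through 255; the function names `bytes_default` and `fixed_default` are hypothetical and are not taken from the PR.

```rust
/// Convert the text of a JSON-string default into raw bytes.
/// Each code point in 0..=255 becomes one byte; anything larger is rejected.
fn bytes_default(text: &str) -> Result<Vec<u8>, String> {
    text.chars()
        .map(|c| {
            let cp = c as u32;
            if cp <= 0xFF {
                Ok(cp as u8)
            } else {
                Err(format!("code point U+{:04X} is outside 0..=255", cp))
            }
        })
        .collect()
}

/// Same conversion for a `fixed` default, which must also have exactly
/// `size` bytes.
fn fixed_default(text: &str, size: usize) -> Result<Vec<u8>, String> {
    let bytes = bytes_default(text)?;
    if bytes.len() == size {
        Ok(bytes)
    } else {
        Err(format!("fixed({}) default has {} bytes", size, bytes.len()))
    }
}
```

With these helpers, a 12-byte check such as the one described for `duration` would amount to calling `fixed_default(text, 12)`.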
   
   # Are there any user-facing changes?
   
   N/A
   
   

