jecsand838 opened a new pull request, #8293:
URL: https://github.com/apache/arrow-rs/pull/8293

   # Which issue does this PR close?
   
   This work continues the `arrow-avro` schema resolution effort and aligns reader behavior with the Avro specification.
   
   - **Related to**: #4886 (“Add Avro Support”): ongoing work to round out the 
reader/decoder, including schema resolution and type promotion.
   - **Follow-ups/Context**: #8292 (Add array/map/fixed schema resolution and 
default value support to arrow-avro codec), #8124 (schema resolution & type 
promotion for the decoder), #8223 (enum mapping for schema resolution). These 
previous efforts established the foundations that this PR extends to default 
values and additional resolvable types.
   
   # Rationale for this change
   
   Avro’s specification requires readers to materialize a field's default value when the field exists in the **reader** schema but not in the **writer** schema. It also constrains what a default may be (e.g., a union default must match the union's first branch, and bytes/fixed defaults must be JSON strings). Enums may additionally declare a default symbol that the reader uses when it encounters a writer symbol it does not know. Implementing these rules makes `arrow-avro` more standards-compliant and improves interoperability with evolving schemas.
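A minimal sketch of the default-validation rules above, using hypothetical, simplified `Schema` and `Json` types (not the `arrow-avro` API) so that it runs with only the standard library. It follows the Avro spec's encoding of bytes/fixed defaults as JSON strings whose characters are code points 0 to 255, each mapping to one byte, and it checks union defaults against the first branch only.

```rust
// Hypothetical, simplified Avro schema model (NOT the arrow-avro types).
#[allow(dead_code)]
#[derive(Debug)]
enum Schema {
    Null,
    Boolean,
    Long,
    String,
    Bytes,
    Fixed { size: usize },
    Union(Vec<Schema>),
}

// Hypothetical stand-in for a parsed JSON default value.
#[allow(dead_code)]
#[derive(Debug)]
enum Json {
    Null,
    Bool(bool),
    Number(i64),
    Str(String),
}

/// Decode a bytes/fixed default: each char's code point (0..=255) is one byte.
fn json_string_to_bytes(s: &str) -> Result<Vec<u8>, String> {
    s.chars()
        .map(|c| {
            let cp = c as u32;
            if cp <= 255 {
                Ok(cp as u8)
            } else {
                Err(format!("code point U+{cp:04X} is above 255"))
            }
        })
        .collect()
}

/// Check that a JSON default is legal for the given schema.
fn validate_default(schema: &Schema, value: &Json) -> Result<(), String> {
    match (schema, value) {
        (Schema::Null, Json::Null) => Ok(()),
        (Schema::Boolean, Json::Bool(_)) => Ok(()),
        (Schema::Long, Json::Number(_)) => Ok(()),
        (Schema::String, Json::Str(_)) => Ok(()),
        (Schema::Bytes, Json::Str(s)) => json_string_to_bytes(s).map(|_| ()),
        (Schema::Fixed { size }, Json::Str(s)) => {
            let bytes = json_string_to_bytes(s)?;
            if bytes.len() == *size {
                Ok(())
            } else {
                Err(format!("fixed default has {} bytes, expected {}", bytes.len(), size))
            }
        }
        // Union defaults are validated against the FIRST branch only.
        (Schema::Union(branches), v) => match branches.first() {
            Some(first) => validate_default(first, v),
            None => Err("union has no branches".to_string()),
        },
        (s, v) => Err(format!("default {:?} does not match schema {:?}", v, s)),
    }
}

fn main() {
    // ["null", "string"]: the default must be null, not a string.
    let nullable = Schema::Union(vec![Schema::Null, Schema::String]);
    assert!(validate_default(&nullable, &Json::Null).is_ok());
    assert!(validate_default(&nullable, &Json::Str("x".into())).is_err());

    // fixed(4): a 4-character JSON string decodes to exactly 4 bytes.
    let fixed4 = Schema::Fixed { size: 4 };
    assert!(validate_default(&fixed4, &Json::Str("\u{ff}\u{0}ab".into())).is_ok());
    assert!(validate_default(&fixed4, &Json::Str("abc".into())).is_err());
}
```

Modeling the union rule as a recursive call on the first branch keeps the check in one place, so nested unions and records can reuse it.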
   
   # What changes are included in this PR?
   
   **High‑level summary**
   
   * **Refactor `RecordDecoder`** around a simpler **`Projector`**-style abstraction that consumes `ResolvedRecord` to (a) skip writer-only fields and (b) materialize reader-only fields from their defaults, reducing branching in the hot path. (See the commit subject and record decoder changes.)

   **Touched files (2):**
   
   * `arrow-avro/src/reader/record.rs` - refactor decoder to use precomputed 
mappings and defaults.
   * `arrow-avro/src/reader/mod.rs` - add comprehensive tests for defaults and 
error cases (see below).
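A minimal sketch of the `Projector` idea from the summary: the writer-to-reader field mapping and the reader-only defaults are computed once, so per-row work is a flat loop with no schema lookups. `Projector`, `WriterAction`, and `Value` are hypothetical simplified types, not the actual `arrow-avro` code, and rows are modeled as already-decoded values; a real decoder would instead skip the writer-only field's bytes in the input stream.

```rust
// Hypothetical decoded value type (NOT arrow-avro's).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Long(i64),
    Str(String),
}

/// What to do with each writer field, in writer order.
enum WriterAction {
    /// Store the decoded value in this reader field slot.
    Project(usize),
    /// Writer-only field: decode (or skip its bytes) and discard.
    Skip,
}

struct Projector {
    writer_actions: Vec<WriterAction>,
    /// Reader-only fields with their precomputed default values.
    defaults: Vec<(usize, Value)>,
    reader_len: usize,
}

impl Projector {
    /// Build the plan once from writer field names and reader (name, default) pairs.
    fn new(writer: &[&str], reader: &[(&str, Option<Value>)]) -> Result<Self, String> {
        let writer_actions = writer
            .iter()
            .map(|w| match reader.iter().position(|(r, _)| r == w) {
                Some(i) => WriterAction::Project(i),
                None => WriterAction::Skip,
            })
            .collect();
        let mut defaults = Vec::new();
        for (i, (name, default)) in reader.iter().enumerate() {
            if !writer.contains(name) {
                match default {
                    Some(v) => defaults.push((i, v.clone())),
                    None => return Err(format!("reader field '{name}' is not in the writer and has no default")),
                }
            }
        }
        Ok(Projector { writer_actions, defaults, reader_len: reader.len() })
    }

    /// Map one writer-ordered row into reader order, filling in defaults.
    fn project(&self, writer_row: Vec<Value>) -> Vec<Value> {
        let mut out: Vec<Option<Value>> = vec![None; self.reader_len];
        for (action, value) in self.writer_actions.iter().zip(writer_row) {
            if let WriterAction::Project(i) = action {
                out[*i] = Some(value);
            }
        }
        for (i, v) in &self.defaults {
            out[*i] = Some(v.clone());
        }
        // Every slot is filled: shared fields by the writer, the rest by defaults.
        out.into_iter().map(|v| v.expect("reader slot filled")).collect()
    }
}

fn main() {
    // Writer has {id, legacy}; reader has {id, name = "unknown"}.
    let reader = [("id", None), ("name", Some(Value::Str("unknown".into())))];
    let p = Projector::new(&["id", "legacy"], &reader).unwrap();
    let row = p.project(vec![Value::Long(7), Value::Str("dropped".into())]);
    assert_eq!(row, vec![Value::Long(7), Value::Str("unknown".into())]);
}
```

Failing in `Projector::new` when a reader-only field has no default surfaces the schema mismatch once, up front, rather than on every row.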
   
   # Are these changes tested?
   
   Yes, new integration tests cover both the **happy path** and **validation errors**:

   * `test_schema_resolution_defaults_all_supported_types`: verifies that defaults for boolean, int, long, float, double, bytes, string, date, time, timestamp, decimal, fixed, enum, duration, uuid, array, map, nested record, and union fields are materialized correctly for every row.
   * `test_schema_resolution_default_enum_invalid_symbol_errors`: invalid enum 
default symbol is rejected.
   * `test_schema_resolution_default_fixed_size_mismatch_errors`: mismatched 
fixed/bytes default lengths are rejected.
   
   These tests assert the Avro-spec behavior (e.g., union defaults must match the first branch, and bytes/fixed defaults use JSON strings).
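To illustrate the enum rule exercised by `test_schema_resolution_default_enum_invalid_symbol_errors`, here is a minimal sketch with hypothetical helper names (not the `arrow-avro` API). It rejects a reader default that is not one of the reader's own symbols, then precomputes a writer-index to reader-index table in which unmatched writer symbols fall back to the default. As a design choice of this sketch, an unmatched symbol with no default is stored as `None` and only errors if it is actually decoded.

```rust
/// Build a writer-index -> reader-index table for an Avro enum.
/// `None` marks a writer symbol with no reader match and no default.
fn resolve_enum(
    writer: &[&str],
    reader: &[&str],
    default: Option<&str>,
) -> Result<Vec<Option<usize>>, String> {
    // The default must itself be one of the reader's symbols.
    let default_idx = match default {
        Some(d) => Some(
            reader
                .iter()
                .position(|s| *s == d)
                .ok_or_else(|| format!("default symbol '{d}' is not in the reader enum"))?,
        ),
        None => None,
    };
    Ok(writer
        .iter()
        .map(|w| reader.iter().position(|r| r == w).or(default_idx))
        .collect())
}

/// Translate one decoded writer index using the precomputed table.
fn decode_symbol(table: &[Option<usize>], writer_idx: usize) -> Result<usize, String> {
    table
        .get(writer_idx)
        .copied()
        .flatten()
        .ok_or_else(|| format!("writer symbol #{writer_idx} is out of range or has no reader match"))
}

fn main() {
    let table = resolve_enum(&["A", "B", "C"], &["C", "A", "UNKNOWN"], Some("UNKNOWN")).unwrap();
    assert_eq!(table, vec![Some(1), Some(2), Some(0)]);
    // Writer symbol "B" is unknown to the reader, so it maps to "UNKNOWN".
    assert_eq!(decode_symbol(&table, 1), Ok(2));
    // A default that is not a reader symbol is rejected up front.
    assert!(resolve_enum(&["A"], &["A"], Some("Z")).is_err());
}
```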
   
   # Are there any user-facing changes?
   
   N/A

