jecsand838 opened a new pull request, #8293: URL: https://github.com/apache/arrow-rs/pull/8293
# Which issue does this PR close?

This work continues arrow-avro schema resolution support and aligns behavior with the Avro spec.

- **Related to**: #4886 (“Add Avro Support”): ongoing work to round out the reader/decoder, including schema resolution and type promotion.
- **Follow-ups/Context**: #8292 (Add array/map/fixed schema resolution and default value support to arrow-avro codec), #8124 (schema resolution & type promotion for the decoder), #8223 (enum mapping for schema resolution). These previous efforts established the foundations that this PR extends to default values and additional resolvable types.

# Rationale for this change

Avro’s specification requires readers to materialize default values when a field exists in the **reader** schema but not in the **writer** schema, and to validate defaults (e.g., union defaults must match the first branch; bytes/fixed defaults must be JSON strings; enums may specify a default symbol for unknown writer symbols). Implementing this behavior makes `arrow-avro` more standards-compliant and improves interoperability with evolving schemas.

# What changes are included in this PR?

**High-level summary**

* **Refactor `RecordDecoder`** around a simpler **`Projector`**-style abstraction that consumes `ResolvedRecord` to: (a) skip writer-only fields, and (b) materialize reader-only defaulted fields, reducing branching in the hot path. (See commit subject and record decoder changes.)

**Touched files (2):**

* `arrow-avro/src/reader/record.rs` - refactor decoder to use precomputed mappings and defaults.
* `arrow-avro/src/reader/mod.rs` - add comprehensive tests for defaults and error cases (see below).

# Are these changes tested?
Yes, new integration tests cover both the **happy path** and **validation errors**:

* `test_schema_resolution_defaults_all_supported_types`: verifies that defaults for boolean/int/long/float/double/bytes/string/date/time/timestamp/decimal/fixed/enum/duration/uuid/array/map/nested record and unions are materialized correctly for all rows.
* `test_schema_resolution_default_enum_invalid_symbol_errors`: invalid enum default symbol is rejected.
* `test_schema_resolution_default_fixed_size_mismatch_errors`: mismatched fixed/bytes default lengths are rejected.

These tests assert the Avro-spec behavior (e.g., union defaults must match the first branch; bytes/fixed defaults use JSON strings).

# Are there any user-facing changes?

N/A
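
For readers unfamiliar with the projection idea in the high-level summary, here is a minimal, self-contained sketch of it. It is not the code in this PR: `Value`, `WriterAction`, and this `Projector` with its `build` and `project_row` methods are hypothetical stand-ins that operate on already-decoded values, whereas the real decoder works on Avro bytes and Arrow builders. It only shows the two precomputed steps: skip writer-only fields, and fill reader-only fields from their defaults.

```rust
use std::collections::HashMap;

// Hypothetical, simplified value type; not an arrow-avro type.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i32),
    Str(String),
}

/// What to do with each writer field, in writer order.
enum WriterAction {
    /// Store the value in the reader column with this index.
    Project(usize),
    /// Field exists only in the writer schema: consume and drop it.
    Skip,
}

/// Hypothetical projector: all decisions are made once, at build time.
struct Projector {
    writer_actions: Vec<WriterAction>,
    /// (reader column index, default) for fields missing from the writer.
    defaults: Vec<(usize, Value)>,
    reader_len: usize,
}

impl Projector {
    /// Resolve writer field names against reader fields and their optional defaults.
    fn build(writer: &[&str], reader: &[(&str, Option<Value>)]) -> Result<Self, String> {
        let reader_index: HashMap<&str, usize> = reader
            .iter()
            .enumerate()
            .map(|(i, (name, _))| (*name, i))
            .collect();
        let writer_actions = writer
            .iter()
            .map(|name| match reader_index.get(name) {
                Some(&i) => WriterAction::Project(i),
                None => WriterAction::Skip,
            })
            .collect();
        let mut defaults = Vec::new();
        for (i, (name, default)) in reader.iter().enumerate() {
            if writer.contains(name) {
                continue;
            }
            // A reader-only field without a default cannot be resolved.
            match default {
                Some(v) => defaults.push((i, v.clone())),
                None => return Err(format!("reader field '{name}' has no default")),
            }
        }
        Ok(Self { writer_actions, defaults, reader_len: reader.len() })
    }

    /// Project one writer row into reader order.
    /// Assumes `writer_row` has one value per writer field.
    fn project_row(&self, writer_row: &[Value]) -> Vec<Value> {
        let mut out: Vec<Option<Value>> = vec![None; self.reader_len];
        for (action, value) in self.writer_actions.iter().zip(writer_row) {
            if let WriterAction::Project(i) = action {
                out[*i] = Some(value.clone());
            }
        }
        for (i, v) in &self.defaults {
            out[*i] = Some(v.clone());
        }
        out.into_iter()
            .map(|v| v.expect("every reader column is projected or defaulted"))
            .collect()
    }
}

fn main() {
    // Writer has `id` and a writer-only `legacy`; reader adds `name` with a default.
    let writer = ["id", "legacy"];
    let reader = [("id", None), ("name", Some(Value::Str("unknown".to_string())))];
    let projector = Projector::build(&writer, &reader).expect("schemas should resolve");
    let row = projector.project_row(&[Value::Int(7), Value::Int(0)]);
    println!("{row:?}");
}
```

Because the skip/project/default decisions are made once in `build`, the per-row loop has no schema lookups, which is the kind of hot-path simplification the summary describes.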
