jecsand838 opened a new pull request, #8316: URL: https://github.com/apache/arrow-rs/pull/8316
# Which issue does this PR close?

- **Related to**: #4886 (“Add Avro Support”)

# Rationale for this change

Working, end‑to‑end examples and clearer documentation make it much easier for users to adopt `arrow-avro` for common Avro ingestion paths (OCF files, Single‑Object framing, Confluent Schema Registry). This PR adds runnable examples that demonstrate typical patterns: projection via a reader schema, schema evolution, and streaming decode. It also expands module and type docs to explain trade‑offs and performance considerations. It also centralizes a default record‑name string as a constant to reduce duplication and potential drift in the codebase.

# What changes are included in this PR?

## New examples under `arrow-avro/examples/`

* `read_avro_ocf.rs`: Read Avro OCF into Arrow RecordBatches with `ReaderBuilder`, including knobs for batch size, UTF‑8 handling, and strict mode; shows projection via a JSON reader schema.
* `read_ocf_with_resolution.rs`: Demonstrates resolving older writer schemas to a current reader schema (schema evolution/projection).
* `write_avro_ocf.rs`: Minimal example for writing Arrow data to Avro OCF.
* `decode_stream.rs`: Build a streaming `Decoder` (`ReaderBuilder::build_decoder`), register writer schemas keyed by Single‑Object Rabin fingerprints, and decode generated frames.
* `decode_kafka_stream.rs`: Decode Confluent Schema Registry–framed messages (0x00 magic, 4‑byte big‑endian schema ID, Avro body) while resolving older writer schemas against a current reader schema.

## Documentation improvements

* Expanded `arrow-avro` module‑level docs and `Decoder` docs with usage examples for OCF, Single‑Object, and Confluent wire formats; added notes on schema evolution, streaming, and performance considerations.

## Maintenance tweak

* Added `AVRO_ROOT_RECORD_DEFAULT_NAME` in `schema.rs` to centralize the default root record name. (Reduces literal duplication; no behavior change intended.)

# Are these changes tested?
* A unit test was added to `arrow-avro/src/codec.rs` to cover the addition of `AVRO_ROOT_RECORD_DEFAULT_NAME`.
* No other tests were added in this PR because the work is primarily documentation and runnable examples. The examples themselves are intended to be compiled and executed by users as living documentation.

# Are there any user-facing changes?

N/A
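
For readers unfamiliar with the Confluent framing that the `decode_kafka_stream.rs` bullet describes (0x00 magic, 4‑byte big‑endian schema ID, Avro body), the header split can be sketched in plain Rust. This is only an illustration using the standard library. It is not code from this PR and does not use the `arrow-avro` API. The names `split_confluent_frame` and `FrameError` are hypothetical, and treating the schema ID as a `u32` is an assumption made for the sketch.

```rust
use std::fmt;

/// Magic byte that starts every Confluent-framed message.
const CONFLUENT_MAGIC: u8 = 0x00;
/// One magic byte plus a 4-byte big-endian schema ID.
const HEADER_LEN: usize = 5;

/// Hypothetical error type for this sketch (not part of `arrow-avro`).
#[derive(Debug, PartialEq, Eq)]
enum FrameError {
    /// The frame is shorter than the 5-byte header; carries the actual length.
    TooShort(usize),
    /// The first byte is not 0x00; carries the byte that was found.
    BadMagic(u8),
}

impl fmt::Display for FrameError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FrameError::TooShort(n) => write!(f, "frame has {n} bytes, need at least {HEADER_LEN}"),
            FrameError::BadMagic(b) => write!(f, "expected magic 0x00, found {b:#04x}"),
        }
    }
}

/// Splits a Confluent-framed message into (schema ID, Avro body).
/// Hypothetical helper: it only validates the header and does not decode Avro.
fn split_confluent_frame(frame: &[u8]) -> Result<(u32, &[u8]), FrameError> {
    if frame.len() < HEADER_LEN {
        return Err(FrameError::TooShort(frame.len()));
    }
    if frame[0] != CONFLUENT_MAGIC {
        return Err(FrameError::BadMagic(frame[0]));
    }
    // Bytes 1..5 hold the schema ID in big-endian order.
    let schema_id = u32::from_be_bytes([frame[1], frame[2], frame[3], frame[4]]);
    Ok((schema_id, &frame[HEADER_LEN..]))
}

fn main() {
    // Header for schema ID 42 followed by two arbitrary body bytes.
    let frame = [0x00, 0x00, 0x00, 0x00, 0x2A, 0x02, 0x06];
    match split_confluent_frame(&frame) {
        Ok((id, body)) => println!("schema id {id}, {} body bytes", body.len()),
        Err(e) => eprintln!("bad frame: {e}"),
    }
}
```

In a real consumer, the schema ID would be used to look up the writer schema, and the remaining bytes would be handed to an Avro decoder. The sketch stops at the header on purpose.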
