jecsand838 opened a new issue, #9211:
URL: https://github.com/apache/arrow-rs/issues/9211
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
`arrow-avro` already provides a vectorized, column-first Avro→Arrow
reader/decoder (OCF `Reader` + streaming `Decoder`). In some production
workloads (especially high-throughput streaming / Kafka-style ingestion with
stable schemas), Avro decoding is still a dominant CPU cost.
In these workloads:
* Schemas are typically stable (the same writer schema for long periods, or
a small set of versions).
* Records can be wide (tens/hundreds of fields), with lots of `["null", T]`
optional fields and numeric promotions.
* The current decoder performs per-field runtime dispatch (i.e., matching on
a large decoder enum and calling per-type decode logic) on every row, which
incurs branch and instruction-dispatch overhead that is hard to eliminate with
incremental micro-optimizations.
Goal: add an **optional JIT Avro-to-Arrow decode path** that compiles a
schema-specialized decode kernel once per (writer, reader, options) pair and
reuses it for all subsequent batches, with an aspirational target of **~3×
higher steady-state decode throughput** on common primitive/nullable-heavy
schemas.
**Describe the solution you'd like**
Add an opt-in decode backend for `arrow-avro` that can JIT compile a
schema-specialized decoder, with the following properties:
* **Feature-flagged (Cargo)**: introduce a new feature such as `jit`
(default **off**) so the default build remains lightweight and avoids added
complexity/dependencies unless explicitly requested.
* **Runtime selectable**: add a knob on `ReaderBuilder` (applies to both OCF
`Reader` and streaming `Decoder`) such as:
* `ReaderBuilder::with_decode_engine(DecodeEngine::Jit)` /
`DecodeEngine::Interpreted`, or
* `ReaderBuilder::with_jit(true)`
Default remains the current interpreted/vectorized decoder.
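To make the proposed knob concrete, here is a minimal sketch of what the builder surface could look like. The names (`DecodeEngine`, `with_decode_engine`) mirror the proposal above but are hypothetical; the actual `ReaderBuilder` API would carry many more options.

```rust
/// Hypothetical decode-engine knob; names mirror the proposal, not a real API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DecodeEngine {
    /// Current vectorized interpreter (default).
    Interpreted,
    /// Schema-specialized compiled kernel (behind `feature = "jit"`).
    Jit,
}

impl Default for DecodeEngine {
    fn default() -> Self {
        DecodeEngine::Interpreted
    }
}

/// Stand-in for the real `ReaderBuilder`, showing only the new knob.
#[derive(Default)]
struct ReaderBuilder {
    engine: DecodeEngine,
}

impl ReaderBuilder {
    fn with_decode_engine(mut self, engine: DecodeEngine) -> Self {
        self.engine = engine;
        self
    }
}

fn main() {
    // Default build keeps today's behavior.
    assert_eq!(ReaderBuilder::default().engine, DecodeEngine::Interpreted);
    // Opting in is a single builder call.
    let b = ReaderBuilder::default().with_decode_engine(DecodeEngine::Jit);
    assert_eq!(b.engine, DecodeEngine::Jit);
}
```

Keeping the knob on `ReaderBuilder` means both the OCF `Reader` and the streaming `Decoder` inherit it without any new entry points.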
* **Compilation model**
* Reuse existing schema resolution outputs (`Codec` / resolved record &
union metadata, defaults, promotions, projection decisions) to build a
fully-resolved per-schema “decode plan” (linear IR / bytecode).
* JIT compile the plan into a single tight decode kernel (e.g., via a
Rust-friendly JIT backend like `cranelift-jit`) that:
* reads Avro binary directly from a byte slice,
* writes directly into Arrow builders or raw Arrow buffers (values +
offsets + validity),
* preserves current semantics (projection, defaults, promotions, strict
union handling, error behavior).
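As an illustration of the "decode plan" idea, the sketch below shows a flat IR derived from resolved schema metadata, plus the zigzag-varint primitive that per-field ops would bottom out in. The op names and `plan_for_schema` are invented for this example; a real plan would be built from the existing `Codec` resolution outputs.

```rust
/// Illustrative linear decode-plan IR: one op per resolved field, in writer
/// order. The JIT would compile this list into a single straight-line kernel
/// instead of interpreting it row by row. All names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum DecodeOp {
    ReadLong { field: usize },         // zigzag varint -> Int64 values buffer
    ReadDouble { field: usize },       // 8 LE bytes -> Float64 values buffer
    ReadNullableLong { field: usize }, // ["null","long"]: branch index, then value
    SkipString,                        // projected-away field: length + payload
}

/// Example plan for a 3-column projection over a 4-field writer record.
fn plan_for_schema() -> Vec<DecodeOp> {
    vec![
        DecodeOp::ReadLong { field: 0 },
        DecodeOp::ReadNullableLong { field: 1 },
        DecodeOp::SkipString,
        DecodeOp::ReadDouble { field: 2 },
    ]
}

/// Avro `long`: zigzag-encoded varint, the workhorse primitive of the plan.
fn read_zigzag_long(buf: &[u8], pos: &mut usize) -> Option<i64> {
    let mut shift = 0u32;
    let mut acc = 0u64;
    loop {
        let b = *buf.get(*pos)?;
        *pos += 1;
        acc |= ((b & 0x7f) as u64) << shift;
        if b & 0x80 == 0 {
            break;
        }
        shift += 7;
        if shift > 63 {
            return None; // malformed: varint too long
        }
    }
    Some(((acc >> 1) as i64) ^ -((acc & 1) as i64))
}

fn main() {
    assert_eq!(plan_for_schema().len(), 4);
    // zigzag encoding: 2 -> 1, 1 -> -1, 0 -> 0
    let buf = [0x02u8, 0x01, 0x00];
    let mut pos = 0;
    assert_eq!(read_zigzag_long(&buf, &mut pos), Some(1));
    assert_eq!(read_zigzag_long(&buf, &mut pos), Some(-1));
    assert_eq!(read_zigzag_long(&buf, &mut pos), Some(0));
}
```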
* **Caching / amortization**
* Cache compiled kernels keyed by (writer schema fingerprint / ID + reader
schema + decode options + projection) so compilation is paid once and reused.
* Consider compiling lazily only after a schema is “hot” (seen N times) to
avoid compiling one-off schemas.
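A sketch of the cache plus lazy "hot" compilation could look like the following. The key fields, the threshold mechanism, and the `String` stand-in for a compiled kernel are all hypothetical; the point is that a `None` return tells the caller to use the interpreted path.

```rust
use std::collections::HashMap;

/// Hypothetical cache key: writer identity plus everything the kernel is
/// specialized on (reader schema, options, projection).
#[derive(Hash, PartialEq, Eq, Clone)]
struct KernelKey {
    writer_fingerprint: u64, // e.g., Rabin fingerprint or registry schema ID
    reader_schema_hash: u64,
    options_hash: u64, // decode options + projection folded in
}

struct KernelCache {
    hot_threshold: u32,
    seen: HashMap<KernelKey, u32>,
    compiled: HashMap<KernelKey, String>, // String stands in for a compiled kernel
}

impl KernelCache {
    fn new(hot_threshold: u32) -> Self {
        KernelCache {
            hot_threshold,
            seen: HashMap::new(),
            compiled: HashMap::new(),
        }
    }

    /// Compile only once a key has been seen `hot_threshold` times; until
    /// then return `None` so one-off schemas never pay compilation cost.
    fn get_or_compile(&mut self, key: &KernelKey) -> Option<&String> {
        if self.compiled.contains_key(key) {
            return self.compiled.get(key);
        }
        let n = {
            let c = self.seen.entry(key.clone()).or_insert(0);
            *c += 1;
            *c
        };
        if n >= self.hot_threshold {
            // Real code would invoke the JIT backend here.
            self.compiled.insert(key.clone(), "compiled-kernel".to_string());
            self.compiled.get(key)
        } else {
            None
        }
    }
}

fn main() {
    let mut cache = KernelCache::new(2);
    let key = KernelKey { writer_fingerprint: 0xAB, reader_schema_hash: 1, options_hash: 0 };
    assert!(cache.get_or_compile(&key).is_none()); // first sighting: interpret
    assert!(cache.get_or_compile(&key).is_some()); // hot: compile and cache
    assert!(cache.get_or_compile(&key).is_some()); // reuse compiled kernel
}
```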
* **Fallback**
* If JIT is disabled, compilation fails, or the schema uses unsupported
shapes, fall back automatically to the existing decoder.
* Start with a conservative supported set (e.g., top-level records of
primitives + `["null", T]` + common promotions), and expand over time
(strings/bytes, arrays/maps, more union shapes).
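The conservative Phase-1 gate could be a simple structural check run before attempting compilation; anything that fails it silently takes the interpreted path. The `Shape` model below is invented for illustration (the real check would inspect the resolved `Codec`).

```rust
/// Illustrative schema shape model; the real gate would walk resolved codecs.
#[allow(dead_code)]
enum Shape {
    Long,
    Double,
    NullableLong, // ["null", "long"] optional
    Utf8,         // deferred to Phase 2
    List(Box<Shape>),
    Record(Vec<Shape>),
}

/// Phase-1 support gate: only flat, top-level records whose fields are
/// primitives or ["null", primitive] optionals are JIT-compiled.
fn jit_supported(root: &Shape) -> bool {
    match root {
        Shape::Record(fields) => fields
            .iter()
            .all(|f| matches!(f, Shape::Long | Shape::Double | Shape::NullableLong)),
        _ => false, // non-record top level: fall back
    }
}

fn main() {
    let flat = Shape::Record(vec![Shape::Long, Shape::NullableLong, Shape::Double]);
    let nested = Shape::Record(vec![Shape::Long, Shape::List(Box::new(Shape::Utf8))]);
    assert!(jit_supported(&flat));    // compiles with the JIT backend
    assert!(!jit_supported(&nested)); // falls back to the interpreted decoder
}
```

Expanding support in later phases then means only widening the `matches!` arm and the top-level cases, never changing the fallback contract.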
* **Testing + benchmarking**
* Add correctness tests that run both backends on the same inputs and
assert identical `RecordBatch` output.
* Extend existing Criterion benches (e.g., `benches/decoder.rs`) to
compare interpreted vs JIT and to measure compile-time overhead separately from
steady-state decode throughput.
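The correctness tests reduce to a differential harness: feed both engines the same input and assert identical output. A minimal sketch, with a closure standing in for a cranelift-compiled kernel and a trivial int-to-long promotion as the "decode":

```rust
/// Reference path: the existing interpreted decoder (modeled as a per-row
/// int -> long promotion for this sketch).
fn interpreted(vals: &[i32]) -> Vec<i64> {
    vals.iter().map(|&v| v as i64).collect()
}

/// "JIT" path: a pre-specialized kernel. A real one would be compiled via
/// cranelift-jit; a closure stands in here to show the harness shape.
fn make_jit_kernel() -> impl Fn(&[i32]) -> Vec<i64> {
    |vals: &[i32]| vals.iter().map(|&v| v as i64).collect()
}

fn main() {
    let jit = make_jit_kernel();
    // Both engines must agree on every input, including empty and edge cases;
    // in the real tests this would compare full `RecordBatch` output.
    for input in [vec![], vec![1, -2, 3], vec![i32::MIN, i32::MAX]] {
        assert_eq!(interpreted(&input), jit(&input));
    }
}
```

The same pair of entry points plugs directly into Criterion benches, so compile-time overhead (building the kernel) and steady-state throughput (invoking it) can be measured as separate benchmarks.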
**Describe alternatives you've considered**
* **More interpreter specialization without native JIT**
* Replace enum matching with per-field function pointers / vtables, or
build a small bytecode interpreter. This could reduce some overhead, but likely
won’t remove as much branching as native codegen.
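For comparison, the function-pointer alternative looks roughly like this sketch (field layout and decoders invented for illustration): one monomorphized decode fn per field replaces the enum match, but each call is still an indirect branch, which is the overhead native codegen would remove.

```rust
/// Per-field decoder signature: read one value, advance the cursor.
type FieldDecoder = fn(&[u8], &mut usize) -> u64;

fn decode_u8(buf: &[u8], pos: &mut usize) -> u64 {
    let v = buf[*pos] as u64;
    *pos += 1;
    v
}

fn decode_u16_le(buf: &[u8], pos: &mut usize) -> u64 {
    let v = u16::from_le_bytes([buf[*pos], buf[*pos + 1]]) as u64;
    *pos += 2;
    v
}

fn main() {
    // The "plan" is now a table of fn pointers built once per schema;
    // decoding a row is one indirect call per field, no enum match.
    let row_plan: Vec<FieldDecoder> = vec![decode_u8, decode_u16_le];
    let buf = [7u8, 0x34, 0x12];
    let mut pos = 0;
    let vals: Vec<u64> = row_plan.iter().map(|f| f(&buf, &mut pos)).collect();
    assert_eq!(vals, vec![7, 0x1234]);
}
```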
* **Build-time code generation**
* Generate schema-specific decoding code via `build.rs` / proc macros.
This doesn’t work well for dynamic schemas (Schema Registry, evolving streams)
where schemas are not known at compile time.
* **Lower-level JIT backend**
* Use a runtime assembler (e.g., `dynasm`/`dynasmrt`) to hand-assemble
x86_64/aarch64 kernels. This may be faster but is substantially harder to
maintain and test across architectures.
* **Row-centric Avro decoding**
* Decode into row values (e.g., `apache-avro` Value) and then build Arrow
arrays. This generally reintroduces row-wise overhead and is not competitive
with `arrow-avro`’s existing vectorized approach.
**Additional context**
* The JIT backend would be an optional "extra gear" for users who need
maximum decode throughput.
* JIT introduces practical constraints (executable memory / W^X policies,
WASM targets, some hardened environments). This is why it should be
**feature-flagged** and **runtime-selectable** with a transparent fallback path.
* A staged implementation could reduce risk:
1. Phase 1: primitives + nullable `["null", T]` + promotions + projection
skipping
2. Phase 2: bytes/string (including `StringViewArray`)
3. Phase 3: arrays/maps/nested records and more union shapes
* Success criteria (initial):
* No behavior change with default features / default settings
* JIT feature builds cleanly behind `--features arrow-avro/jit`
* Demonstrable steady-state speedup on representative benches, with
compile overhead amortized by schema reuse.