jecsand838 opened a new issue, #9211:
URL: https://github.com/apache/arrow-rs/issues/9211

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   `arrow-avro` already provides a vectorized, column-first Avro→Arrow 
reader/decoder (OCF `Reader` + streaming `Decoder`). In some production 
workloads (especially high-throughput streaming / Kafka-style ingestion with 
stable schemas), Avro decoding is still a dominant CPU cost.
   
   In these workloads:
   
   * Schemas are typically stable (the same writer schema for long periods, or 
a small set of versions).
   * Records can be wide (tens/hundreds of fields), with lots of `["null", T]` 
optional fields and numeric promotions.
   * The current decoder performs per-field runtime dispatch (i.e., matching on 
a large decoder enum and calling per-type decode logic) on every row, which 
introduces branch and instruction-dispatch overhead that is hard to eliminate 
with incremental micro-optimizations.
   
   Goal: add an **optional JIT Avro-to-Arrow decode path** that compiles a 
schema-specialized decode kernel once per (writer, reader, options) pair and 
reuses it for all subsequent batches, with an aspirational target of **~3× 
higher steady-state decode throughput** on common primitive/nullable-heavy 
schemas.
   
   **Describe the solution you'd like**
   
   Add an opt-in decode backend for `arrow-avro` that can JIT compile a 
schema-specialized decoder, with the following properties:
   
   * **Feature-flagged (Cargo)**: introduce a new feature such as `jit` 
(default **off**) so the default build remains lightweight and avoids added 
complexity/dependencies unless explicitly requested.
   
   * **Runtime selectable**: add a knob on `ReaderBuilder` (applies to both OCF 
`Reader` and streaming `Decoder`) such as:
   
     * `ReaderBuilder::with_decode_engine(DecodeEngine::Jit)` / 
`DecodeEngine::Interpreted`, or
     * `ReaderBuilder::with_jit(true)`
   
     Default remains the current interpreted/vectorized decoder.
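
   As a rough sketch of what the builder knob could look like (`DecodeEngine` 
and `with_decode_engine` are hypothetical names, not existing `arrow-avro` 
API, and the real `ReaderBuilder` carries much more state):

```rust
// Hypothetical API sketch: `DecodeEngine` and `with_decode_engine` do not
// exist in arrow-avro today; this only illustrates the proposed shape.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DecodeEngine {
    Interpreted, // current vectorized decoder (default)
    Jit,         // opt-in schema-specialized kernel
}

struct ReaderBuilder {
    engine: DecodeEngine,
}

impl ReaderBuilder {
    fn new() -> Self {
        // Default behavior is unchanged: interpreted decoding.
        Self { engine: DecodeEngine::Interpreted }
    }

    fn with_decode_engine(mut self, engine: DecodeEngine) -> Self {
        self.engine = engine;
        self
    }
}

fn main() {
    assert_eq!(ReaderBuilder::new().engine, DecodeEngine::Interpreted);
    let b = ReaderBuilder::new().with_decode_engine(DecodeEngine::Jit);
    assert_eq!(b.engine, DecodeEngine::Jit);
}
```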
   
   * **Compilation model**
   
     * Reuse existing schema resolution outputs (`Codec` / resolved record & 
union metadata, defaults, promotions, projection decisions) to build a 
fully-resolved per-schema “decode plan” (linear IR / bytecode).
     * JIT compile the plan into a single tight decode kernel (e.g., via a 
Rust-friendly JIT backend like `cranelift-jit`) that:
   
       * reads Avro binary directly from a byte slice,
       * writes directly into Arrow builders or raw Arrow buffers (values + 
offsets + validity),
       * preserves current semantics (projection, defaults, promotions, strict 
union handling, error behavior).
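
   The plan/kernel split might look roughly like this std-only sketch. 
`DecodeOp` and `run_plan` are illustrative stand-ins (only the zigzag-varint 
encoding is actual Avro wire format); a JIT backend would compile this same 
per-row loop into a single specialized function:

```rust
// Illustrative "decode plan" as a linear IR; a JIT backend would compile
// the per-row loop below into one kernel instead of interpreting it.
#[derive(Debug)]
enum DecodeOp {
    /// Append a zigzag-varint long to column `col`.
    ReadLong { col: usize },
    /// Read a ["null", T] union branch, then the value if non-null.
    ReadNullableLong { col: usize },
}

/// Decode one Avro zigzag varint (the int/long wire format), returning the
/// value and the number of bytes consumed.
fn read_zigzag_long(buf: &[u8]) -> Option<(i64, usize)> {
    let mut acc: u64 = 0;
    let mut shift = 0;
    for (i, &b) in buf.iter().enumerate() {
        acc |= u64::from(b & 0x7f) << shift;
        if b & 0x80 == 0 {
            // Undo zigzag: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
            return Some((((acc >> 1) as i64) ^ -((acc & 1) as i64), i + 1));
        }
        shift += 7;
    }
    None // truncated input
}

/// Interpret the plan over `rows` records; returns bytes consumed.
fn run_plan(
    plan: &[DecodeOp],
    buf: &[u8],
    rows: usize,
    cols: &mut [Vec<Option<i64>>],
) -> Option<usize> {
    let mut pos = 0;
    for _ in 0..rows {
        for op in plan {
            match op {
                DecodeOp::ReadLong { col } => {
                    let (v, n) = read_zigzag_long(&buf[pos..])?;
                    cols[*col].push(Some(v));
                    pos += n;
                }
                DecodeOp::ReadNullableLong { col } => {
                    let (branch, n) = read_zigzag_long(&buf[pos..])?;
                    pos += n;
                    if branch == 0 {
                        cols[*col].push(None); // branch 0 of ["null", T]
                    } else {
                        let (v, n) = read_zigzag_long(&buf[pos..])?;
                        cols[*col].push(Some(v));
                        pos += n;
                    }
                }
            }
        }
    }
    Some(pos)
}

fn main() {
    // Two rows of a record {a: long, b: ["null", "long"]}:
    // row 1: a=1, b=-1 (branch 1); row 2: a=3, b=null (branch 0).
    let buf = [0x02u8, 0x02, 0x01, 0x06, 0x00];
    let plan = [DecodeOp::ReadLong { col: 0 }, DecodeOp::ReadNullableLong { col: 1 }];
    let mut cols = vec![Vec::new(), Vec::new()];
    assert_eq!(run_plan(&plan, &buf, 2, &mut cols), Some(5));
    assert_eq!(cols[0], vec![Some(1), Some(3)]);
    assert_eq!(cols[1], vec![Some(-1), None]);
}
```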
   
   * **Caching / amortization**
   
     * Cache compiled kernels keyed by (writer schema fingerprint / ID + reader 
schema + decode options + projection) so compilation is paid once and reused.
     * Consider compiling lazily only after a schema is “hot” (seen N times) to 
avoid compiling one-off schemas.
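
   A minimal sketch of the lazy, hot-count-gated cache (all names 
hypothetical; a `&'static str` stands in for a compiled function pointer 
produced by a backend such as `cranelift-jit`):

```rust
use std::collections::HashMap;

// Hypothetical cache key; the fields mirror the proposed
// (writer fingerprint, reader schema, options + projection) tuple.
#[derive(Clone, Hash, PartialEq, Eq)]
struct KernelKey {
    writer_fingerprint: u64, // e.g. schema fingerprint or registry ID
    reader_schema_hash: u64,
    options_hash: u64, // decode options + projection
}

struct KernelCache {
    hot_threshold: u32,
    seen: HashMap<KernelKey, u32>,
    compiled: HashMap<KernelKey, &'static str>, // placeholder for a fn ptr
}

impl KernelCache {
    fn new(hot_threshold: u32) -> Self {
        Self { hot_threshold, seen: HashMap::new(), compiled: HashMap::new() }
    }

    /// Returns the compiled kernel if available; compiles lazily only after
    /// the key has been seen `hot_threshold` times, so one-off schemas never
    /// pay the compile cost. `None` means: use the interpreted decoder.
    fn get_or_compile(&mut self, key: &KernelKey) -> Option<&'static str> {
        if let Some(&k) = self.compiled.get(key) {
            return Some(k);
        }
        let count = self.seen.entry(key.clone()).or_insert(0);
        *count += 1;
        if *count >= self.hot_threshold {
            // Compile once; every later lookup hits the cache above.
            self.compiled.insert(key.clone(), "compiled-kernel");
            return self.compiled.get(key).copied();
        }
        None
    }
}

fn main() {
    let mut cache = KernelCache::new(3);
    let key = KernelKey { writer_fingerprint: 1, reader_schema_hash: 2, options_hash: 3 };
    assert_eq!(cache.get_or_compile(&key), None); // cold
    assert_eq!(cache.get_or_compile(&key), None); // warming
    assert_eq!(cache.get_or_compile(&key), Some("compiled-kernel")); // hot
    assert_eq!(cache.get_or_compile(&key), Some("compiled-kernel")); // cached
}
```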
   
   * **Fallback**
   
     * If JIT is disabled, compilation fails, or the schema uses unsupported 
shapes, fall back automatically to the existing decoder.
     * Start with a conservative supported set (e.g., top-level records of 
primitives + `["null", T]` + common promotions), and expand over time 
(strings/bytes, arrays/maps, more union shapes).
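
   The fallback decision could be as simple as this sketch (the shape names 
are illustrative, not real `arrow-avro` types):

```rust
// Illustrative Phase 1 support check; all names are hypothetical.
#[derive(Debug, PartialEq)]
enum Engine {
    Interpreted,
    Jit,
}

enum FieldShape {
    Primitive,         // e.g. int/long/float/double/boolean
    NullablePrimitive, // ["null", T]
    Complex,           // arrays/maps/nested records: not yet JIT-supported
}

/// Pick the JIT only when it is enabled and every field is in the supported
/// set; otherwise fall back transparently to the interpreted decoder.
fn select_engine(jit_enabled: bool, fields: &[FieldShape]) -> Engine {
    let supported = fields.iter().all(|f| !matches!(f, FieldShape::Complex));
    if jit_enabled && supported {
        Engine::Jit
    } else {
        Engine::Interpreted
    }
}

fn main() {
    use FieldShape::*;
    assert_eq!(select_engine(true, &[Primitive, NullablePrimitive]), Engine::Jit);
    assert_eq!(select_engine(true, &[Primitive, Complex]), Engine::Interpreted);
    assert_eq!(select_engine(false, &[Primitive]), Engine::Interpreted);
}
```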
   
   * **Testing + benchmarking**
   
     * Add correctness tests that run both backends on the same inputs and 
assert identical `RecordBatch` output.
     * Extend existing Criterion benches (e.g., `benches/decoder.rs`) to 
compare interpreted vs JIT and to measure compile-time overhead separately from 
steady-state decode throughput.
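
   The differential test could follow this pattern, with stand-in decode 
functions in place of the real interpreted/JIT backends and `Vec`s in place 
of `RecordBatch`:

```rust
// Differential-test pattern sketch: `Batch` and both decode fns are
// stand-ins for RecordBatch and the real interpreted/JIT backends.
type Batch = Vec<Vec<Option<i64>>>;

fn decode_interpreted(input: &[i64]) -> Batch {
    vec![input.iter().map(|&v| Some(v)).collect()]
}

fn decode_jit(input: &[i64]) -> Batch {
    // Must preserve the interpreted semantics exactly.
    decode_interpreted(input)
}

/// The core correctness assertion: both backends, same input, identical output.
fn assert_backends_agree(input: &[i64]) {
    assert_eq!(decode_interpreted(input), decode_jit(input));
}

fn main() {
    assert_backends_agree(&[1, -2, 3]);
    assert_backends_agree(&[]);
}
```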
   
   **Describe alternatives you've considered**
   
   * **More interpreter specialization without native JIT**
   
     * Replace enum matching with per-field function pointers / vtables, or 
build a small bytecode interpreter. This could reduce some overhead, but likely 
won’t remove as much branching as native codegen.
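
   For comparison, the function-pointer variant might look like this sketch 
(decoder bodies are placeholders, not real Avro decode logic):

```rust
// Sketch of the function-pointer alternative: decoders are resolved once at
// plan-build time, so the per-row loop makes indirect calls instead of
// matching on a decoder enum.
struct State {
    pos: usize,
    out: Vec<i64>,
}

type FieldDecoder = fn(&[u8], &mut State);

fn decode_byte_as_long(buf: &[u8], st: &mut State) {
    // Placeholder: a real decoder would read an Avro zigzag varint here.
    st.out.push(i64::from(buf[st.pos]));
    st.pos += 1;
}

fn decode_row(fields: &[FieldDecoder], buf: &[u8], st: &mut State) {
    for f in fields {
        f(buf, st); // indirect call: cheaper than a big match, but still a branch
    }
}

fn main() {
    let fields: [FieldDecoder; 2] = [decode_byte_as_long, decode_byte_as_long];
    let mut st = State { pos: 0, out: Vec::new() };
    decode_row(&fields, &[7u8, 9], &mut st);
    assert_eq!(st.out, vec![7, 9]);
}
```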
   
   * **Build-time code generation**
   
     * Generate schema-specific decoding code via `build.rs` / proc macros. 
This doesn’t work well for dynamic schemas (Schema Registry, evolving streams) 
where schemas are not known at compile time.
   
   * **Lower-level JIT backend**
   
     * Use a runtime assembler (e.g., `dynasm`/`dynasmrt`) to hand-assemble 
x86_64/aarch64 kernels. This may be faster but is substantially harder to 
maintain and test across architectures.
   
   * **Row-centric Avro decoding**
   
     * Decode into row values (e.g., `apache-avro` Value) and then build Arrow 
arrays. This generally reintroduces row-wise overhead and is not competitive 
with `arrow-avro`’s existing vectorized approach.
   
   **Additional context**
   
   * The JIT backend would be an optional "extra gear" for users who need 
maximum decode throughput.
   * JIT introduces practical constraints (executable memory / W^X policies, 
WASM targets, some hardened environments). This is why it should be 
**feature-flagged** and **runtime-selectable** with a transparent fallback path.
   * A staged implementation could reduce risk:
     1. Phase 1: primitives + nullable `["null", T]` + promotions + projection 
skipping
     2. Phase 2: bytes/string (including `StringViewArray`)
     3. Phase 3: arrays/maps/nested records and more union shapes
   * Success criteria (initial):
     * No behavior change with default features / default settings
     * JIT feature builds cleanly behind `--features arrow-avro/jit`
     * Demonstrable steady-state speedup on representative benches, with 
compile overhead amortized by schema reuse.
   
   

