jecsand838 commented on code in PR #8006:
URL: https://github.com/apache/arrow-rs/pull/8006#discussion_r2238323313


##########
arrow-avro/src/reader/mod.rs:
##########
@@ -116,88 +129,295 @@ fn read_header<R: BufRead>(mut reader: R) -> Result<Header, ArrowError> {
             break;
         }
     }
-    decoder.flush().ok_or_else(|| {
-        ArrowError::ParseError("Unexpected EOF while reading Avro header".to_string())
-    })
+    decoder
+        .flush()
+        .ok_or_else(|| ArrowError::ParseError("Unexpected EOF while reading Avro header".into()))
 }
 
 /// A low-level interface for decoding Avro-encoded bytes into Arrow `RecordBatch`.
+///
+/// This decoder handles both standard Avro container file data and single-object encoded
+/// messages by managing schema resolution and caching decoders.
 #[derive(Debug)]
 pub struct Decoder {
-    record_decoder: RecordDecoder,
+    /// The maximum number of rows to decode into a single batch.
     batch_size: usize,
+    /// The number of rows decoded into the current batch.
     decoded_rows: usize,
+    /// The fingerprint of the active writer schema.
+    active_fp: Option<Fingerprint>,
+    /// The `RecordDecoder` corresponding to the active writer schema.
+    active_decoder: RecordDecoder,
+    /// An LRU cache of inactive `RecordDecoder`s, keyed by schema fingerprint.
+    cache: HashMap<Fingerprint, RecordDecoder>,
+    /// A queue to maintain the least recently used order of the cache.
+    lru: VecDeque<Fingerprint>,
+    /// Maximum number of cached decoders allowed.
+    max_cache_size: usize,
+    /// The user-provided reader schema for projection.
+    reader_schema: Option<AvroSchema<'static>>,
+    /// A store of known writer schemas for single-object decoding.
+    schema_store: Option<SchemaStore<'static>>,
+    /// Whether to decode string data as `StringViewArray`.
+    utf8_view: bool,
+    /// If true, do not allow resolving schemas not already in the `SchemaStore`.
+    static_store_mode: bool,
+    /// If true, schema resolution errors will cause a failure.
+    strict_mode: bool,
+    /// The fingerprint of a schema to switch to after the current batch is flushed.
+    pending_fp: Option<Fingerprint>,
+    /// A `RecordDecoder` for a new schema, staged to become active after the current batch.
+    pending_decoder: Option<RecordDecoder>,

Review Comment:
   The `cache` + `lru` pair was the most obvious candidate for grouping. I also
think it's possible to pair `active_decoder` with `active_fp`, though that's
less clear-cut because `active_fp` is optional while `active_decoder` isn't.
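
   To make the grouping concrete, here is a rough sketch rather than a proposal for the exact shape. `LruCache` and `ActiveSchema` are hypothetical names that are not in the PR, and the types are generic so the snippet stands alone instead of depending on `Fingerprint` and `RecordDecoder`. The idea is that the map, the recency queue, and the size limit are only ever updated together, so a single type can own all three.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Hypothetical helper (not part of the PR): owns the map, the recency
/// queue, and the capacity so they cannot drift out of sync.
#[derive(Debug)]
pub struct LruCache<K, V> {
    entries: HashMap<K, V>,
    /// Keys ordered from least recently used (front) to most recent (back).
    order: VecDeque<K>,
    capacity: usize,
}

impl<K: Eq + Hash + Clone, V> LruCache<K, V> {
    pub fn new(capacity: usize) -> Self {
        Self {
            entries: HashMap::new(),
            order: VecDeque::new(),
            capacity,
        }
    }

    /// Removes and returns the value for `key`, also dropping the key
    /// from the recency queue.
    pub fn take(&mut self, key: &K) -> Option<V> {
        let value = self.entries.remove(key)?;
        self.order.retain(|k| k != key);
        Some(value)
    }

    /// Inserts `value` as the most recently used entry, evicting the
    /// least recently used entries while the cache is over capacity.
    pub fn insert(&mut self, key: K, value: V) {
        if self.entries.insert(key.clone(), value).is_some() {
            // The key was already cached: forget its old queue position.
            self.order.retain(|k| k != &key);
        }
        self.order.push_back(key);
        while self.entries.len() > self.capacity {
            match self.order.pop_front() {
                Some(oldest) => {
                    self.entries.remove(&oldest);
                }
                None => break,
            }
        }
    }

    pub fn contains(&self, key: &K) -> bool {
        self.entries.contains_key(key)
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    pub fn is_empty(&self) -> bool {
        self.entries.is_empty()
    }
}

/// Hypothetical pairing of the active decoder with its fingerprint.
/// The fingerprint stays optional because only the decoder is always present.
#[derive(Debug)]
pub struct ActiveSchema<F, D> {
    pub fingerprint: Option<F>,
    pub decoder: D,
}
```

   With something like this, `Decoder` would hold a single cache field in place of `cache`, `lru`, and `max_cache_size`. The `ActiveSchema` pairing is the less obvious of the two: it keeps the optional fingerprint next to the decoder it describes, but it does not remove the `Option`.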



