Re: [PR] Implement arrow-avro SchemaStore and Fingerprinting To Enable Schema Resolution [arrow-rs]

via GitHub Mon, 04 Aug 2025 15:55:12 -0700


jecsand838 commented on code in PR #8006:
URL: https://github.com/apache/arrow-rs/pull/8006#discussion_r2252735697



##########
arrow-avro/src/reader/mod.rs:
##########
@@ -124,23 +132,26 @@ fn read_header<R: BufRead>(mut reader: R) -> Result<Header, ArrowError> {
 /// A low-level interface for decoding Avro-encoded bytes into Arrow `RecordBatch`.
 #[derive(Debug)]
 pub struct Decoder {
-    record_decoder: RecordDecoder,
+    active_decoder: RecordDecoder,
+    active_fingerprint: Option<Fingerprint>,
     batch_size: usize,
-    decoded_rows: usize,
+    remaining_capacity: usize,
+    #[cfg(feature = "lru")]
+    cache: LruCache<Fingerprint, RecordDecoder>,
+    #[cfg(not(feature = "lru"))]
+    cache: IndexMap<Fingerprint, RecordDecoder>,
+    max_cache_size: usize,
+    reader_schema: Option<AvroSchema<'static>>,
+    writer_schema_store: Option<SchemaStore<'static>>,

Review Comment:
   > The main concern for this PR is:
   > 
   > > Does that [static lifetimes here] mean memory they [the passed-in
   > > schemas] reference must leak for all practical purposes?
   
   I haven't had to resort to any memory leaks in `arrow-avro`. The `AvroField` 
logic is bound by the same lifetime, and the `Schema` is only used to create a 
root `AvroField` with a `Codec`, which in turn is used to create a 
`RecordDecoder`. I haven't had to resort to `Box::leak` or reference cycles, 
and I was careful to bound the size of the cache. 
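
To illustrate why a `'static` lifetime parameter does not by itself force a leak, here is a minimal sketch. `ToySchema` and `make_static` are hypothetical names made up for this example and are not the `arrow-avro` types; the only assumption is that a value which owns all of its data satisfies `'static` and is still dropped normally.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a schema type with a lifetime parameter.
/// This is NOT the arrow-avro `Schema`; it only mirrors the shape of the problem.
#[derive(Debug, Clone, PartialEq)]
struct ToySchema<'a> {
    name: Cow<'a, str>,
}

impl<'a> ToySchema<'a> {
    /// Copy any borrowed data into owned storage, producing a `'static` value.
    /// The result is freed when it goes out of scope, so no `Box::leak` is needed.
    fn into_owned(self) -> ToySchema<'static> {
        ToySchema {
            name: Cow::Owned(self.name.into_owned()),
        }
    }
}

/// Build a `'static` schema from a short-lived string slice.
fn make_static(name: &str) -> ToySchema<'static> {
    ToySchema {
        name: Cow::Borrowed(name),
    }
    .into_owned()
}
```

The returned `ToySchema<'static>` can outlive the string it was built from because it holds its own `String`, and that memory is released on drop.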
   
   Also, inside the `Decoder`, when making a new `RecordDecoder` (i.e. in 
`create_decoder_for`), I don't call `clone` on either of the schemas, and no 
new `Schema` values are created. Each `RecordDecoder` can only use the same 
`reader_schema` and the same set of `writer_schema` entries.
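
As a rough sketch of what bounding a fingerprint-keyed cache can look like: `BoundedCache` and the `u64` `Fingerprint` alias below are hypothetical, and eviction here is plain insertion order. That is simpler than the `LruCache` / `IndexMap` fields shown in the diff and is not meant to reproduce the PR's actual eviction logic.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical 64-bit fingerprint; stands in for the PR's `Fingerprint` type.
type Fingerprint = u64;

/// A cache that never holds more than `max_size` entries.
/// When full, the entry that was inserted earliest is evicted first.
struct BoundedCache<V> {
    max_size: usize,
    order: VecDeque<Fingerprint>,
    entries: HashMap<Fingerprint, V>,
}

impl<V> BoundedCache<V> {
    fn new(max_size: usize) -> Self {
        Self {
            max_size,
            order: VecDeque::new(),
            entries: HashMap::new(),
        }
    }

    /// Insert or overwrite a value, then evict until the size bound holds.
    fn insert(&mut self, key: Fingerprint, value: V) {
        // Only record insertion order for keys that were not already present.
        if self.entries.insert(key, value).is_none() {
            self.order.push_back(key);
        }
        while self.entries.len() > self.max_size {
            match self.order.pop_front() {
                Some(oldest) => {
                    self.entries.remove(&oldest);
                }
                None => break,
            }
        }
    }

    fn get(&self, key: &Fingerprint) -> Option<&V> {
        self.entries.get(key)
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Because every insert re-checks the bound, memory held by cached values stays proportional to `max_size` no matter how many distinct fingerprints are seen.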
   
   If I'm missing something, however, I apologize in advance. 



