scovich commented on code in PR #8006:
URL: https://github.com/apache/arrow-rs/pull/8006#discussion_r2248768996


##########
arrow-avro/src/reader/mod.rs:
##########
@@ -182,21 +174,130 @@ impl Decoder {
             FingerprintAlgorithm::Rabin,
             SchemaStore::fingerprint_algorithm,
         );
+        // The loop stops when the batch is full, a schema change is staged,
+        // or handle_prefix indicates we need more bytes (Some(0)).
         while total_consumed < data.len() && self.remaining_capacity > 0 {
-            if let Some(prefix_bytes) = self.handle_prefix(&data[total_consumed..], hash_type)? {
-                // A batch is complete when its `remaining_capacity` is 0. It may be completed early if
-                // a schema change is detected or there are insufficient bytes to read the next prefix.
-                // A schema change requires a new batch.
-                total_consumed += prefix_bytes;
-                break;
+            match self.handle_prefix(&data[total_consumed..], hash_type)? {
+                None => {
+                    // No prefix: decode one row.
+                    let n = self.active_decoder.decode(&data[total_consumed..], 1)?;
+                    total_consumed += n;
+                    self.remaining_capacity -= 1;
+                }
+                Some(0) => {
+                    // Detected start of a prefix but need more bytes.
+                    break;
+                }

Review Comment:
   > use `Some(0)` to cover a scenario where magic bytes were read, but there
   > wasn't enough remaining bytes in the buffer for the fingerprint. My thought was
   > to then return so the caller could add more bytes.
   
   Sure, but the `Some(n)` case does exactly the same thing (just for a
   different reason). That's why I had suggested combining them and just using a
   comment to explain what's going on.
   
   > digging into the Java implementation's code however I noticed they throw
   > an error for this scenario
   
   Not sure why it should throw an error... is there some special property of
   the wire format that prevents partial reads?


