jecsand838 opened a new pull request, #7966:
URL: https://github.com/apache/arrow-rs/pull/7966

   … Avro files
   
   # Which issue does this PR close?
   
   - Part of https://github.com/apache/arrow-rs/issues/4886
   
   - Follow up to https://github.com/apache/arrow-rs/pull/7834
   
   # Rationale for this change
   
   The initial Avro reader implementation contained an underdeveloped, 
temporary safeguard to prevent infinite loops when processing records that 
consumed zero bytes from the input buffer.
   
   When the `Decoder` reported that zero bytes were consumed, the `Reader` 
would advance its cursor to the end of the current data block. While this 
successfully prevented an infinite loop, it had the critical side effect of 
silently discarding any remaining data in that block, leading to potential data 
loss.
   
   This change enhances the decoding logic to handle these zero-byte values 
correctly, ensuring that the `Reader` makes proper progress without dropping 
data and without risking an infinite loop.
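
   The difference between the two behaviours can be sketched with a toy model. This is not the arrow-avro implementation: `Block`, `decode_null_record`, `rows_with_skip_safeguard`, and `rows_incremental` are hypothetical names, and bounding the loop by the block's record count is an assumption made here only to show one way progress can be guaranteed without skipping data.

```rust
/// Hypothetical stand-in for one Avro data block: `count` records packed into `data`.
struct Block {
    count: usize,
    data: Vec<u8>,
}

/// Toy decoder for a schema whose records encode to zero bytes (e.g. `null`).
/// Returns `(rows_decoded, bytes_consumed)`.
fn decode_null_record(_buf: &[u8]) -> (usize, usize) {
    (1, 0)
}

/// Models the old safeguard: a zero-byte read abandons the rest of the block.
fn rows_with_skip_safeguard(block: &Block) -> usize {
    let mut rows = 0;
    let mut offset = 0;
    let mut remaining = block.count;
    while remaining > 0 {
        let (decoded, consumed) = decode_null_record(&block.data[offset..]);
        rows += decoded;
        remaining -= decoded;
        if consumed == 0 {
            // Jumping to the end of the block drops every record still pending.
            break;
        }
        offset += consumed;
    }
    rows
}

/// Models incremental progress: the cursor moves by exactly what was consumed,
/// and the loop ends when the block's record count is exhausted (an assumption).
fn rows_incremental(block: &Block) -> usize {
    let mut rows = 0;
    let mut offset = 0;
    let mut remaining = block.count;
    while remaining > 0 {
        let (decoded, consumed) = decode_null_record(&block.data[offset..]);
        rows += decoded;
        remaining -= decoded;
        offset += consumed; // may legitimately be zero
    }
    rows
}
```

   For a block declared as `Block { count: 3, data: Vec::new() }`, the first function yields one row and the second yields three, which is the kind of silent loss described above.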
   
   # What changes are included in this PR?
   
   - **Refined Decoder Logic**: The `Decoder` has been updated to accurately 
track and report the number of bytes consumed for all values, including valid 
zero-length records like `null` or empty `bytes`. This ensures the decoder 
always makes forward progress.
   - **Removal of Data-Skipping Safeguard**: The logic in the `Reader` that 
previously advanced to the end of a block on a zero-byte read has been removed. 
The reader now relies on the decoder to report accurate consumption and 
advances its cursor incrementally and safely.
   - **New Integration Test**: A new integration test reads a temporary 
`zero_byte.avro` file created via this Python script: 
https://gist.github.com/jecsand838/e57647d0d12853f3cf07c350a6a40395
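
   As background for the zero-length values mentioned in the first bullet, the sketch below shows how a record can occupy no bytes at all under the Avro binary encoding, where `null` is written as zero bytes and a `long` is written as a zigzag variable-length integer. The `Value` enum and the `encode` and `encoded_len` helpers are hypothetical and exist only for this illustration. They are not arrow-avro types.

```rust
/// Toy subset of Avro values, for illustration only.
enum Value {
    Null,
    Long(i64),
    Record(Vec<Value>),
}

/// Append the Avro binary encoding of `value` to `out`.
fn encode(value: &Value, out: &mut Vec<u8>) {
    match value {
        // `null` is written as zero bytes.
        Value::Null => {}
        // `long` is zigzag-encoded, then written 7 bits at a time,
        // least significant group first, with a continuation bit.
        Value::Long(n) => {
            let n = *n;
            let mut z = ((n << 1) ^ (n >> 63)) as u64;
            loop {
                let byte = (z & 0x7f) as u8;
                z >>= 7;
                if z == 0 {
                    out.push(byte);
                    break;
                }
                out.push(byte | 0x80);
            }
        }
        // A record is the concatenation of its fields' encodings.
        Value::Record(fields) => {
            for field in fields {
                encode(field, out);
            }
        }
    }
}

/// Number of bytes `value` occupies once encoded.
fn encoded_len(value: &Value) -> usize {
    let mut out = Vec::new();
    encode(value, &mut out);
    out.len()
}
```

   With these helpers, `encoded_len(&Value::Record(vec![Value::Null, Value::Null]))` is 0, while `Value::Long(64)` encodes to the two bytes `0x80 0x01`. A reader that measures progress only in bytes cannot tell the first case apart from a stalled decoder.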
   
   # Are these changes tested?
   
   Yes, a new `test_read_zero_byte_avro_file` test was added that reads the new 
`zero_byte.avro` file to verify the updated decoding logic.
   
   # Are there any user-facing changes?
   
   N/A
   
   # Follow-Up PRs
   
   1. PR to update `test_read_zero_byte_avro_file` once 
https://github.com/apache/arrow-testing/pull/109 is merged in.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

