gfeyer opened a new pull request, #3646:
URL: https://github.com/apache/avro/pull/3646

   What is the purpose of the change
   
   BinaryDecoder::arrayNext() calls doDecodeLong() directly instead of 
doDecodeItemCount(), so it mishandles negative array block counts. Per 
the Avro spec, a negative block count means that its absolute value is the 
item count and that the count is followed by an additional long giving the 
byte size of the block. When arrayNext() reads a negative count such as -100, 
static_cast<size_t>(-100) produces a huge value and the byte-size long is left 
unconsumed, corrupting the stream position.
   
   doDecodeItemCount() already handles this correctly and is used by 
arrayStart(), mapStart(), and mapNext(). Only arrayNext() bypassed it. The fix 
changes arrayNext() to call doDecodeItemCount() for consistency.
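
   To illustrate the block-count rule, here is a minimal, self-contained 
sketch. It is not the Avro C++ implementation: Reader, readLong, 
readItemCount, and readLongArray are hypothetical names, and the body of 
readItemCount only assumes what doDecodeItemCount() is described as doing 
above (take the absolute value and consume the byte-size long).

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical in-memory reader, for illustration only (not the Avro C++ API).
struct Reader {
    const std::vector<uint8_t>& buf;
    size_t pos;

    explicit Reader(const std::vector<uint8_t>& b) : buf(b), pos(0) {}

    // An Avro long is a variable-length, zig-zag encoded integer.
    int64_t readLong() {
        uint64_t raw = 0;
        int shift = 0;
        while (true) {
            if (pos >= buf.size()) throw std::runtime_error("unexpected end of input");
            if (shift >= 64) throw std::runtime_error("varint too long");
            uint8_t b = buf[pos++];
            raw |= static_cast<uint64_t>(b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
        }
        // Undo zig-zag: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
        return static_cast<int64_t>(raw >> 1) ^ -static_cast<int64_t>(raw & 1);
    }

    // Block header: a negative count means |count| items, followed by a long
    // holding the block's size in bytes, which must be consumed as well.
    size_t readItemCount() {
        int64_t count = readLong();
        if (count < 0) {
            readLong();  // consume (and here discard) the block byte size
            // Negate in unsigned arithmetic so INT64_MIN does not overflow.
            return static_cast<size_t>(0 - static_cast<uint64_t>(count));
        }
        return static_cast<size_t>(count);
    }
};

// Reads an array of longs that may span several blocks; a count of 0 ends it.
// Both the first and every following block header go through readItemCount.
std::vector<int64_t> readLongArray(Reader& r) {
    std::vector<int64_t> out;
    for (size_t n = r.readItemCount(); n != 0; n = r.readItemCount()) {
        for (size_t i = 0; i < n; ++i) out.push_back(r.readLong());
    }
    return out;
}
```

   In this sketch every block header takes the same path, so a negative count 
in the second or a later block is handled the same way as in the first one. 
Casting the raw negative count to size_t instead would skip the byte-size 
long and return a wrapped-around item count.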
   
   This affects any array large enough to be encoded in multiple blocks with 
negative counts. ClickHouse independently found the same bug 
(https://github.com/ClickHouse/ClickHouse/issues/60438, 
https://github.com/ClickHouse/avro/pull/23).
   
   Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   The fix was verified against production Avro messages containing arrays with 
266+ items encoded in multiple blocks with negative counts, which previously 
failed ~20% of the time and now decode correctly.
   
   Documentation
   - Does this pull request introduce a new feature? no
   

