gfeyer opened a new pull request, #3646: URL: https://github.com/apache/avro/pull/3646
What is the purpose of the change

BinaryDecoder::arrayNext() calls doDecodeLong() directly instead of doDecodeItemCount(), causing it to mishandle negative array block counts. Per the Avro spec, a negative block count means that its absolute value is the item count and that it is followed by an additional long giving the byte size of the block. When arrayNext() reads a negative count such as -100, static_cast<size_t>(-100) produces a huge value, and the byte-size long is left unconsumed, corrupting the stream position.

doDecodeItemCount() already handles this correctly and is used by arrayStart(), mapStart(), and mapNext(). Only arrayNext() bypassed it. The fix changes arrayNext() to call doDecodeItemCount() for consistency.

This affects any array large enough to be encoded in multiple blocks with negative counts. ClickHouse independently found the same bug (https://github.com/ClickHouse/ClickHouse/issues/60438, https://github.com/ClickHouse/avro/pull/23).

Verifying this change

This change is a trivial rework / code cleanup without any test coverage. The fix was verified against production Avro messages containing arrays with 266+ items encoded in multiple blocks with negative counts, which previously failed ~20% of the time and now decode correctly.

Documentation

- Does this pull request introduce a new feature? no
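
For reviewers who want to see the negative-block-count rule from the description in isolation, here is a minimal standalone sketch. It is not the Avro C++ implementation: the `Reader` struct and its `readByte`, `readLong`, and `readItemCount` members are hypothetical names invented for this illustration, and the sketch only assumes the spec's zigzag variable-length encoding of longs. `readItemCount` plays the role that the description attributes to doDecodeItemCount(): on a negative count it consumes the trailing byte-size long and returns the absolute value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical in-memory reader for illustration only; not the Avro C++ API.
struct Reader {
    const std::vector<uint8_t>& buf;
    size_t pos;

    explicit Reader(const std::vector<uint8_t>& b) : buf(b), pos(0) {}

    uint8_t readByte() {
        if (pos >= buf.size()) throw std::runtime_error("unexpected end of input");
        return buf[pos++];
    }

    // Avro long: base-128 variable-length bytes holding a zigzag-encoded value.
    int64_t readLong() {
        uint64_t raw = 0;
        int shift = 0;
        while (true) {
            uint8_t b = readByte();
            raw |= static_cast<uint64_t>(b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;  // high bit clear: last byte
            shift += 7;
            if (shift > 63) throw std::runtime_error("varint too long");
        }
        // Zigzag decode: even values are non-negative, odd values are negative.
        return static_cast<int64_t>(raw >> 1) ^ -static_cast<int64_t>(raw & 1);
    }

    // Block item count: a negative count is followed by the block's byte size.
    size_t readItemCount() {
        int64_t count = readLong();
        if (count < 0) {
            if (count == INT64_MIN) throw std::runtime_error("invalid block count");
            readLong();      // byte size of the block: consumed and ignored here
            count = -count;  // the item count is the absolute value
        }
        assert(count >= 0);
        return static_cast<size_t>(count);
    }
};
```

With the bytes `C7 01 90 03` (block count -100 followed by a byte size of 200), `readItemCount()` returns 100 and leaves the position just past the byte-size long. Reading the count with `readLong()` alone and casting it to `size_t` would instead yield a huge value and leave `90 03` in the stream, which is the failure mode described for arrayNext().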
