[
https://issues.apache.org/jira/browse/AVRO-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated AVRO-4228:
---------------------------------
Labels: pull-request-available (was: )
> BinaryDecoder::arrayNext() does not handle negative block counts
> ----------------------------------------------------------------
>
> Key: AVRO-4228
> URL: https://issues.apache.org/jira/browse/AVRO-4228
> Project: Apache Avro
> Issue Type: Bug
> Reporter: Gabriel Feyer
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> BinaryDecoder::arrayNext() in lang/c++/impl/BinaryDecoder.cc calls
> doDecodeLong() directly instead of doDecodeItemCount(), causing it to
> mishandle negative array block counts.
> Per the Avro spec, a negative block count indicates the absolute value is the
> item count, followed by an additional long representing the byte-size of the
> block. The helper method doDecodeItemCount() handles this correctly, and is
> already used by arrayStart(), mapStart(), and mapNext(). Only arrayNext()
> bypasses it.
> When arrayNext() reads a negative count (e.g., -100),
> static_cast<size_t>(-100) produces a huge value, the byte-size long is left
> unconsumed, and the stream position becomes corrupted. Subsequent reads
> produce garbage, typically resulting in a vector::_M_range_check crash or
> (with the validating decoder) a "Wrong number of items" error.
> This affects any array large enough to be encoded in multiple blocks. Arrays
> that fit in a single block are unaffected because arrayNext() only sees the
> terminal zero.
> Fix:
> {code:java}
> // BinaryDecoder.cc — from:
> size_t BinaryDecoder::arrayNext() {
> return static_cast<size_t>(doDecodeLong());
> }
> // to:
> size_t BinaryDecoder::arrayNext() {
> return doDecodeItemCount();
> } {code}
> This is consistent with how arrayStart(), mapStart(), and mapNext() are
> implemented.
> Note: ClickHouse independently encountered this bug
> (https://github.com/ClickHouse/ClickHouse/issues/60438) and fixed it in their
> fork (https://github.com/ClickHouse/avro/pull/23) by inlining the
> negative-count logic. The fix above is simpler as it reuses the existing
> helper.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)