Gabriel Feyer created AVRO-4228:
-----------------------------------
Summary: BinaryDecoder::arrayNext() does not handle negative block
counts
Key: AVRO-4228
URL: https://issues.apache.org/jira/browse/AVRO-4228
Project: Apache Avro
Issue Type: Bug
Reporter: Gabriel Feyer
BinaryDecoder::arrayNext() in lang/c++/impl/BinaryDecoder.cc calls
doDecodeLong() directly instead of doDecodeItemCount(), causing it to mishandle
negative array block counts.
Per the Avro spec, a negative block count indicates the absolute value is the
item count, followed by an additional long representing the byte-size of the
block. The helper method doDecodeItemCount() handles this correctly, and is
already used by arrayStart(), mapStart(), and mapNext(). Only arrayNext()
bypasses it.
When arrayNext() reads a negative count (e.g., -100), static_cast<size_t>(-100)
produces a huge value, the byte-size long is left unconsumed, and the stream
position becomes corrupted. Subsequent reads produce garbage, typically
resulting in a vector::_M_range_check crash or (with the validating decoder) a
"Wrong number of items" error.
This affects any array large enough to be encoded in multiple blocks. Arrays
that fit in a single block are unaffected because arrayNext() only sees the
terminal zero.
Fix:
{code:java}
// BinaryDecoder.cc — from:
size_t BinaryDecoder::arrayNext() {
return static_cast<size_t>(doDecodeLong());
}
// to:
size_t BinaryDecoder::arrayNext() {
return doDecodeItemCount();
} {code}
This is consistent with how arrayStart(), mapStart(), and mapNext() are
implemented.
Note: ClickHouse independently encountered this bug
(https://github.com/ClickHouse/ClickHouse/issues/60438) and fixed it in their
fork (https://github.com/ClickHouse/avro/pull/23) by inlining the
negative-count logic. The fix above is simpler as it reuses the existing helper.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)