[ 
https://issues.apache.org/jira/browse/AVRO-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18057961#comment-18057961
 ] 

Gabriel Feyer commented on AVRO-4228:
-------------------------------------

Thanks Martin! 

> BinaryDecoder::arrayNext() does not handle negative block counts
> ----------------------------------------------------------------
>
>                 Key: AVRO-4228
>                 URL: https://issues.apache.org/jira/browse/AVRO-4228
>             Project: Apache Avro
>          Issue Type: Bug
>            Reporter: Gabriel Feyer
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> BinaryDecoder::arrayNext() in lang/c++/impl/BinaryDecoder.cc calls 
> doDecodeLong() directly instead of doDecodeItemCount(), causing it to 
> mishandle negative array block counts.
> Per the Avro spec, a negative block count indicates the absolute value is the 
> item count, followed by an additional long representing the byte-size of the 
> block. The helper method doDecodeItemCount() handles this correctly, and is 
> already used by arrayStart(), mapStart(), and mapNext(). Only arrayNext() 
> bypasses it.
> When arrayNext() reads a negative count (e.g., -100), 
> static_cast<size_t>(-100) produces a huge value, the byte-size long is left 
> unconsumed, and the stream position becomes corrupted. Subsequent reads 
> produce garbage, typically resulting in a vector::_M_range_check crash or 
> (with the validating decoder) a "Wrong number of items" error.
> This affects any array large enough to be encoded in multiple blocks. Arrays 
> that fit in a single block are unaffected because arrayNext() only sees the 
> terminal zero.
> Fix:
> {code:java}
>   // BinaryDecoder.cc — from:
>   size_t BinaryDecoder::arrayNext() {
>       return static_cast<size_t>(doDecodeLong());
>   }
>   // to:
>   size_t BinaryDecoder::arrayNext() {
>       return doDecodeItemCount();
>   } {code}
> This is consistent with how arrayStart(), mapStart(), and mapNext() are 
> implemented.
> Note: ClickHouse independently encountered this bug 
> (https://github.com/ClickHouse/ClickHouse/issues/60438) and fixed it in their 
> fork (https://github.com/ClickHouse/avro/pull/23) by inlining the 
> negative-count logic. The fix above is simpler as it reuses the existing 
> helper.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to