[ 
https://issues.apache.org/jira/browse/AVRO-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18058855#comment-18058855
 ] 

ASF subversion and git services commented on AVRO-4228:
-------------------------------------------------------

Commit 3508f0ec8d90038a96133d018c93e8f52e925264 in avro's branch 
refs/heads/main from gfeyer
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=3508f0ec8d ]

AVRO-4228: [c++] Fix BinaryDecoder::arrayNext() to handle negative block counts 
(#3646)

* AVRO-4228: Fix BinaryDecoder::arrayNext() to handle negative block counts

* AVRO-4228: Add test for arrayNext() with negative block counts

* AVRO-4228: Move negative block count to second block in test

* AVRO-4228: Avoid undefined behavior when negating INT64_MIN in 
doDecodeItemCount

---------

Co-authored-by: Gabriel Feyer <[email protected]>

> BinaryDecoder::arrayNext() does not handle negative block counts
> ----------------------------------------------------------------
>
>                 Key: AVRO-4228
>                 URL: https://issues.apache.org/jira/browse/AVRO-4228
>             Project: Apache Avro
>          Issue Type: Bug
>            Reporter: Gabriel Feyer
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> BinaryDecoder::arrayNext() in lang/c++/impl/BinaryDecoder.cc calls 
> doDecodeLong() directly instead of doDecodeItemCount(), causing it to 
> mishandle negative array block counts.
> Per the Avro spec, a negative block count indicates the absolute value is the 
> item count, followed by an additional long representing the byte-size of the 
> block. The helper method doDecodeItemCount() handles this correctly, and is 
> already used by arrayStart(), mapStart(), and mapNext(). Only arrayNext() 
> bypasses it.
> When arrayNext() reads a negative count (e.g., -100), 
> static_cast<size_t>(-100) produces a huge value, the byte-size long is left 
> unconsumed, and the stream position becomes corrupted. Subsequent reads 
> produce garbage, typically resulting in a vector::_M_range_check crash or 
> (with the validating decoder) a "Wrong number of items" error.
> This affects any array large enough to be encoded in multiple blocks. Arrays 
> that fit in a single block are unaffected because arrayNext() only sees the 
> terminal zero.
> Fix:
> {code:java}
>   // BinaryDecoder.cc — from:
>   size_t BinaryDecoder::arrayNext() {
>       return static_cast<size_t>(doDecodeLong());
>   }
>   // to:
>   size_t BinaryDecoder::arrayNext() {
>       return doDecodeItemCount();
>   } {code}
> This is consistent with how arrayStart(), mapStart(), and mapNext() are 
> implemented.
> Note: ClickHouse independently encountered this bug 
> (https://github.com/ClickHouse/ClickHouse/issues/60438) and fixed it in their 
> fork (https://github.com/ClickHouse/avro/pull/23) by inlining the 
> negative-count logic. The fix above is simpler as it reuses the existing 
> helper.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to