github-actions[bot] commented on code in PR #63509:
URL: https://github.com/apache/doris/pull/63509#discussion_r3286761450


##########
be/src/format/parquet/byte_array_dict_decoder.cpp:
##########
@@ -41,7 +41,14 @@ Status 
ByteArrayDictDecoder::set_dict(DorisUniqueBufferPtr<uint8_t>& dict, int32
 
     size_t total_length = 0;
     for (int i = 0; i < num_values; ++i) {
+        if (UNLIKELY(offset_cursor > cast_set<uint32_t>(length))) {
+            return Status::Corruption("Wrong data length in dictionary");
+        }
         uint32_t l = decode_fixed32_le(_dict.get() + offset_cursor);
+        if (UNLIKELY(offset_cursor + l > cast_set<uint32_t>(length) ||

Review Comment:
   This check still allows truncated dictionary pages to be read out of bounds. 
`decode_fixed32_le(_dict.get() + offset_cursor)` consumes 4 bytes, but the 
preceding check only rejects `offset_cursor > length`; for example `length = 2` 
and `num_values = 1` reaches the decode with only two bytes available. The 
payload check also compares `offset_cursor + l` before accounting for the 
4-byte prefix, so `length = 4, l = 1` passes here, then the second loop copies 
one byte from `dict_item_address + 4` before reporting corruption. Please 
validate `length - offset_cursor >= 4` before decoding the prefix and then 
validate `l <= length - offset_cursor` after advancing past the prefix, using 
subtraction or widened arithmetic to avoid overflow.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to