etseidl commented on code in PR #8602:
URL: https://github.com/apache/arrow-rs/pull/8602#discussion_r2430393357
##########
parquet/src/column/reader.rs:
##########
@@ -569,11 +569,15 @@ fn parse_v1_level(
match encoding {
Encoding::RLE => {
let i32_size = std::mem::size_of::<i32>();
- let data_size = read_num_bytes::<i32>(i32_size, buf.as_ref()) as
usize;
- Ok((
- i32_size + data_size,
- buf.slice(i32_size..i32_size + data_size),
- ))
+ if i32_size <= buf.len() {
+ let data_size = read_num_bytes::<i32>(i32_size, buf.as_ref())
as usize;
+ let end =
+
i32_size.checked_add(data_size).ok_or(general_err!("invalid level length"))?;
+ if end <= buf.len() {
+ return Ok((end, buf.slice(i32_size..end)));
+ }
+ }
+ Err(general_err!("not enough data to read levels"))
Review Comment:
In this particular instance we're reading a buffer that *should* contain an
entire page of data. If it doesn't, that likely points to a problem with the
metadata.
Changes to `read_num_bytes` would likely need more careful consideration as
I suspect it might be used in some performance critical sections.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]