Re: [PR] Change some panics to errors in parquet decoder [arrow-rs]

via GitHub Sat, 18 Oct 2025 09:09:20 -0700


etseidl commented on code in PR #8602:
URL: https://github.com/apache/arrow-rs/pull/8602#discussion_r2430393357



##########
parquet/src/column/reader.rs:
##########
@@ -569,11 +569,15 @@ fn parse_v1_level(
     match encoding {
         Encoding::RLE => {
             let i32_size = std::mem::size_of::<i32>();
-            let data_size = read_num_bytes::<i32>(i32_size, buf.as_ref()) as usize;
-            Ok((
-                i32_size + data_size,
-                buf.slice(i32_size..i32_size + data_size),
-            ))
+            if i32_size <= buf.len() {
+                let data_size = read_num_bytes::<i32>(i32_size, buf.as_ref()) as usize;
+                let end =
+                    i32_size.checked_add(data_size).ok_or(general_err!("invalid level length"))?;
+                if end <= buf.len() {
+                    return Ok((end, buf.slice(i32_size..end)));
+                }
+            }
+            Err(general_err!("not enough data to read levels"))

Review Comment:
   In this particular instance we're reading a buffer that *should* contain an 
entire page of data. If it doesn't, that likely points to a problem with the 
metadata.
   
   Changes to `read_num_bytes` would likely need more careful consideration, as 
I suspect it might be used in some performance-critical sections.
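
For illustration, the checks in the hunk above can be sketched against a plain byte slice, leaving `read_num_bytes` itself untouched. This is only a sketch under stated assumptions: `split_rle_levels` is a hypothetical name, errors are plain `String`s rather than the crate's `general_err!`, and the 4-byte length prefix is assumed to be little-endian. Unlike the diff, it rejects a negative length outright instead of casting it to `usize`.

```rust
/// Hypothetical stand-alone version of the RLE branch (not the crate's API).
/// Reads a 4-byte length prefix, then returns (bytes consumed, level data).
fn split_rle_levels(buf: &[u8]) -> Result<(usize, &[u8]), String> {
    const PREFIX: usize = std::mem::size_of::<i32>();

    // The page must at least hold the length prefix.
    if buf.len() < PREFIX {
        return Err("not enough data to read levels".to_string());
    }

    // Assumption: the prefix is a little-endian i32.
    let len = i32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]);
    if len < 0 {
        return Err("invalid level length".to_string());
    }

    // Guard against overflow before computing the end offset.
    let end = PREFIX
        .checked_add(len as usize)
        .ok_or_else(|| "invalid level length".to_string())?;

    // `get` returns None instead of panicking when the range is out of bounds.
    let data = buf
        .get(PREFIX..end)
        .ok_or_else(|| "not enough data to read levels".to_string())?;

    Ok((end, data))
}
```

A truncated page then surfaces as an `Err` at this call site only, so any hot paths that use the unchecked read are not affected.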



