This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
     new 042d725888 Avoid infinite loop in bad parquet by checking the number of rep levels (#6232)
042d725888 is described below

commit 042d725888358c73cd2a0d58868ea5c4bad778f7
Author: Jinpeng <[email protected]>
AuthorDate: Thu Aug 15 15:13:00 2024 -0700

    Avoid infinite loop in bad parquet by checking the number of rep levels (#6232)
    
    * check the number of rep levels read from page
    
    * minor fix on typo
    
    Co-authored-by: Andrew Lamb <[email protected]>
    
    * add check on record_read as well
    
    ---------
    
    Co-authored-by: jp0317 <[email protected]>
    Co-authored-by: Andrew Lamb <[email protected]>
---
 parquet/src/column/reader.rs | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/parquet/src/column/reader.rs b/parquet/src/column/reader.rs
index b40ca2b782..0c7cbb412a 100644
--- a/parquet/src/column/reader.rs
+++ b/parquet/src/column/reader.rs
@@ -240,6 +240,12 @@ where
                     let (mut records_read, levels_read) =
                         reader.read_rep_levels(out, remaining_records, remaining_levels)?;
 
+                    if records_read == 0 && levels_read == 0 {
+                        // The fact that we're still looping implies there must be some levels to read.
+                        return Err(general_err!(
+                            "Insufficient repetition levels read from column"
+                        ));
+                    }
                     if levels_read == remaining_levels && self.has_record_delimiter {
                         // Reached end of page, which implies records_read < remaining_records
                         // as otherwise would have stopped reading before reaching the end
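
For readers without the surrounding reader.rs in front of them, the guard added above can be illustrated with a small standalone sketch. It is not the arrow-rs implementation: `FakeLevelDecoder` and `read_records` are hypothetical names, each level is assumed to count as one record, and the error type is a plain `String` rather than the crate's own error. It only shows why a read that returns zero records and zero levels has to be treated as an error when the loop still expects more levels.

```rust
/// Hypothetical stand-in for a page's repetition-level decoder.
/// `available` is how many levels the (possibly truncated) page really holds.
struct FakeLevelDecoder {
    available: usize,
}

impl FakeLevelDecoder {
    /// Returns (records_read, levels_read).
    /// Simplifying assumption: every level corresponds to exactly one record.
    fn read_rep_levels(&mut self, max_records: usize, max_levels: usize) -> (usize, usize) {
        let n = self.available.min(max_records).min(max_levels);
        self.available -= n;
        (n, n)
    }
}

/// Reads up to `max_records` records from a page whose header declares
/// `declared_levels` levels. Fails instead of spinning forever if the
/// decoder makes no progress while levels are still expected.
fn read_records(
    decoder: &mut FakeLevelDecoder,
    declared_levels: usize,
    max_records: usize,
) -> Result<usize, String> {
    let mut total_records = 0;
    let mut levels_left = declared_levels;

    while total_records < max_records && levels_left > 0 {
        let (records_read, levels_read) =
            decoder.read_rep_levels(max_records - total_records, levels_left);

        // Still looping means more levels are expected. If none were read,
        // the data is shorter than declared; without this check neither
        // counter changes and the loop never terminates.
        if records_read == 0 && levels_read == 0 {
            return Err("Insufficient repetition levels read from column".to_string());
        }

        total_records += records_read;
        levels_left -= levels_read;
    }

    Ok(total_records)
}
```

In this sketch, a decoder holding all four declared levels lets `read_records` finish normally, while one holding only two of the four returns the error on the second pass through the loop instead of hanging.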
