yordan-pavlov opened a new pull request #9588:
URL: https://github.com/apache/arrow/pull/9588


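   While looking for a way to make loading array data from Parquet files faster, I stumbled on an edge case where string and binary arrays are created with an incorrect length from an iterator whose upper size bound overstates the number of items it actually yields, for example an iterator with no upper bound of its own that is then capped with `take()`.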
   
   Here is an example of such an iterator:
   
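   ```rust
   // `scan` over an unbounded range: this iterator doesn't declare an upper size bound
   let string_iter = (0..)
       .scan(0usize, |pos, i| {
           if *pos < 10 {
               *pos += 1;
               Some(Some(format!("value {}", i)))
           } else {
               // stop after 10 values, even though the range itself is unbounded
               None
           }
       })
       // capped with take(): size_hint() now reports an upper bound of 100,
       // although only 10 items are actually yielded
       .take(100);
   ```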
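   To see the mismatch concretely, here is a small sketch that rebuilds the iterator above and compares the upper bound reported by `size_hint()` with the number of items it actually yields. The helper names `example_iter` and `hint_vs_actual` are hypothetical, used only for illustration. `take(100)` turns the missing upper bound into `Some(100)`, while only 10 items come out, so a constructor that sized the array from that upper bound would report a length of 100.

   ```rust
   /// Hypothetical helper: rebuilds the iterator from the example above.
   fn example_iter() -> impl Iterator<Item = Option<String>> {
       (0..)
           .scan(0usize, |pos, i: i32| {
               if *pos < 10 {
                   *pos += 1;
                   Some(Some(format!("value {}", i)))
               } else {
                   // ends the iteration after 10 items
                   None
               }
           })
           .take(100)
   }

   /// Hypothetical helper: returns the upper bound reported by `size_hint()`
   /// and the number of items the iterator actually yields.
   fn hint_vs_actual() -> (Option<usize>, usize) {
       let upper = example_iter().size_hint().1;
       let actual = example_iter().count();
       (upper, actual)
   }
   ```

   Only consuming the iterator (here with `count()`) reveals the real length; the size hint is meant for optimizations such as pre-allocation and may overestimate.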
   
   For more details, please see the new tests added in this PR. Fortunately, this is easy to fix by deriving the array length from the offsets array (its length minus one) instead of trusting the iterator's size hint.
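   A minimal sketch of the idea behind the fix, assuming a simplified, hypothetical `StringArraySketch` type rather than the arrow crate's actual string array and `ArrayData` internals: the size hint is used only to pre-allocate, and the length is read back from the offsets that were actually written, since n values need n + 1 offsets.

   ```rust
   /// Hypothetical, simplified stand-in for a variable-length string array
   /// (not the arrow crate's API), just enough to show where the length comes from.
   struct StringArraySketch {
       offsets: Vec<i32>,   // slot i spans values[offsets[i]..offsets[i + 1]]
       values: Vec<u8>,     // concatenated UTF-8 bytes of all non-null strings
       validity: Vec<bool>, // false marks a null slot
   }

   impl StringArraySketch {
       /// Builds the array from any iterator of optional strings
       /// without trusting `size_hint()` for the final length.
       fn from_iter_sketch<I, S>(iter: I) -> Self
       where
           I: IntoIterator<Item = Option<S>>,
           S: AsRef<str>,
       {
           let iter = iter.into_iter();
           // The hint is used for pre-allocation only; it may overestimate.
           let (lower, _) = iter.size_hint();
           let mut offsets = Vec::with_capacity(lower.saturating_add(1));
           let mut values = Vec::new();
           let mut validity = Vec::with_capacity(lower);
           offsets.push(0);
           for item in iter {
               match item {
                   Some(s) => {
                       values.extend_from_slice(s.as_ref().as_bytes());
                       validity.push(true);
                   }
                   None => validity.push(false),
               }
               // one new offset per item, null or not
               offsets.push(values.len() as i32);
           }
           StringArraySketch { offsets, values, validity }
       }

       /// Length derived from the offsets actually written: n slots, n + 1 offsets.
       fn len(&self) -> usize {
           self.offsets.len() - 1
       }

       /// Returns slot `i` as a string slice, or `None` if it is null.
       fn value(&self, i: usize) -> Option<&str> {
           if !self.validity[i] {
               return None;
           }
           let start = self.offsets[i] as usize;
           let end = self.offsets[i + 1] as usize;
           std::str::from_utf8(&self.values[start..end]).ok()
       }
   }
   ```

   Fed the iterator from the example, this sketch reports a length of 10 even though `size_hint()` claims up to 100 items, because the length never depends on the hint.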
   
   @jorgecarleitao


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

