yordan-pavlov opened a new pull request #9588:
URL: https://github.com/apache/arrow/pull/9588
While looking for a way to make loading array data from parquet files faster, I stumbled on an edge case where string and binary arrays are created with an incorrect length from an iterator that does not declare a tight upper size bound, i.e. one whose `size_hint()` upper bound is larger than the number of items it actually yields.

Here is an example of such an iterator:

```
// scan() doesn't declare an (upper) size bound
let string_iter = (0..)
    .scan(0usize, |pos, i| {
        if *pos < 10 {
            *pos += 1;
            Some(Some(format!("value {}", i)))
        } else {
            // actually returns only 10 values
            None
        }
    })
    // limited using take(), which reports an upper bound of 100
    .take(100);
```

For more details, please see the new tests I have added in this PR.

Fortunately, this is easy to fix by using the length of the child offsets array (the number of offsets minus one) as the array length.

@jorgecarleitao

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
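To illustrate the length mismatch and the offsets-based fix described in the PR, here is a minimal, dependency-free Rust sketch. `MiniStringArray`, `build`, `from_iter_using_size_hint`, `from_iter_using_offsets`, and `make_iter` are hypothetical names for illustration only, not the arrow crate's API. The sketch assumes only the usual Arrow layout in which a string array of length n stores n + 1 offsets.

```rust
// Hypothetical sketch, NOT the arrow crate's API: a string array modeled as
// (offsets, values, validity), used to compare two ways of picking its length.

struct MiniStringArray {
    offsets: Vec<i32>,   // always has len + 1 entries when built correctly
    values: Vec<u8>,     // concatenated UTF-8 bytes of all non-null values
    validity: Vec<bool>, // one entry per item actually yielded
    len: usize,
}

impl MiniStringArray {
    fn value(&self, i: usize) -> Option<&str> {
        if !self.validity[i] {
            return None;
        }
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        std::str::from_utf8(&self.values[start..end]).ok()
    }
}

// Consumes the iterator; the length is derived from the offsets (count - 1).
fn build<I, S>(iter: I) -> MiniStringArray
where
    I: Iterator<Item = Option<S>>,
    S: AsRef<str>,
{
    let mut offsets = vec![0i32];
    let mut values = Vec::new();
    let mut validity = Vec::new();
    for item in iter {
        match item {
            Some(s) => {
                values.extend_from_slice(s.as_ref().as_bytes());
                validity.push(true);
            }
            None => validity.push(false),
        }
        offsets.push(values.len() as i32);
    }
    let len = offsets.len() - 1;
    MiniStringArray { offsets, values, validity, len }
}

// Buggy variant: trusts the declared upper bound from size_hint() as the length.
fn from_iter_using_size_hint<I, S>(iter: I) -> MiniStringArray
where
    I: IntoIterator<Item = Option<S>>,
    S: AsRef<str>,
{
    let iter = iter.into_iter();
    let (_, upper) = iter.size_hint();
    let declared_len = upper.expect("iterator must declare an upper bound");
    let mut arr = build(iter);
    arr.len = declared_len; // can exceed the number of items actually yielded
    arr
}

// Fixed variant: keeps the offsets-derived length computed by build().
fn from_iter_using_offsets<I, S>(iter: I) -> MiniStringArray
where
    I: IntoIterator<Item = Option<S>>,
    S: AsRef<str>,
{
    build(iter.into_iter())
}

// Same shape as the iterator in the PR description.
fn make_iter() -> impl Iterator<Item = Option<String>> {
    (0..)
        .scan(0usize, |pos, i| {
            if *pos < 10 {
                *pos += 1;
                Some(Some(format!("value {}", i)))
            } else {
                None
            }
        })
        .take(100)
}

fn main() {
    // scan() reports no upper bound; take(100) caps the upper bound at 100.
    assert_eq!(make_iter().size_hint(), (0, Some(100)));

    let buggy = from_iter_using_size_hint(make_iter());
    let fixed = from_iter_using_offsets(make_iter());
    println!("size_hint length: {}", buggy.len); // prints "size_hint length: 100"
    println!("offsets length: {}", fixed.len); // prints "offsets length: 10"
    assert_eq!(fixed.value(9), Some("value 9"));
}
```

Only 10 values are ever written, so with the buggy variant any `buggy.value(i)` for `i >= 10` would panic on an out-of-bounds index; deriving the length from the offsets keeps it consistent with the data that was actually stored.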