thinkharderdev commented on code in PR #2552: URL: https://github.com/apache/arrow-rs/pull/2552#discussion_r951378549
########## parquet/src/arrow/record_reader/mod.rs: ########## @@ -135,7 +135,11 @@ where loop { // Try to find some records from buffers that has been read into memory // but not counted as seen records. - let end_of_column = !self.column_reader.as_mut().unwrap().has_next()?; + + // Check to see if the column is exhausted. Only peek the next page since in + // case we are reading to a page boundary and do not actually need to read + // the next page. + let end_of_column = !self.column_reader.as_mut().unwrap().peek_next()?; Review Comment: I think this should ultimately get cleaned up a bit. It is confusing since we need to `peek_next` but also call `has_next` below (since the next page needs to get loaded). It feels like the page-level logic wants to be encapsulated inside `GenericColumnReader` but it inevitable leaks out in places like this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org