thinkharderdev commented on code in PR #2271:
URL: https://github.com/apache/arrow-rs/pull/2271#discussion_r934953607

##########
parquet/src/arrow/arrow_reader.rs:
##########
@@ -283,42 +284,112 @@ pub struct ParquetRecordBatchReader {
     selection: Option<VecDeque<RowSelection>>,
 }

+impl ParquetRecordBatchReader {
+    pub fn next_selection(
+        &mut self,
+        selection: &mut VecDeque<RowSelection>,
+    ) -> Option<ArrowResult<RecordBatch>> {
+        let mut buffer: Vec<ArrayRef> = vec![];
+        let mut selected = false;
+        while let Some(front) = selection.pop_front() {
+            if front.skip {
+                let skipped = match self.array_reader.skip_records(front.row_count) {
+                    Ok(skipped) => skipped,
+                    Err(e) => {
+                        return Some(Err(e.into()));
+                    }
+                };
+
+                // TODO Why does this cause problems?
+                // if skipped != front.row_count {
+                //     return Some(Err(general_err!(
+                //         "failed to skip rows, expected {}, got {}",
+                //         front.row_count,
+                //         skipped
+                //     )
+                //     .into()));
+                // }

Review Comment:
   I'm not sure what's up here. This is the correct assertion to make here but I think there is a bug somewhere in the `skip` logic that skips the right values but gets the accounting wrong. Leaving this check in causes issues and leaving it out seems to produce the correct results (if the selections are created correctly).
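
For reference, the check discussed in the review comment above can be exercised in isolation. The following is only a minimal sketch under stated assumptions: `StubReader`, its `read_records` method, and `apply_selection` are hypothetical names that are not part of the PR or of arrow-rs, the local `RowSelection` struct only mirrors the two fields used in the diff, and errors are plain `String`s rather than the crate's error types. It shows what the commented-out assertion would enforce once `skip_records` reports an accurate count. It does not reproduce or locate the suspected accounting bug.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the PR's `RowSelection`; only the two fields
/// that appear in the diff are modeled here.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RowSelection {
    pub row_count: usize,
    pub skip: bool,
}

/// Hypothetical stub reader that only tracks how many rows remain.
pub struct StubReader {
    remaining: usize,
}

impl StubReader {
    pub fn new(rows: usize) -> Self {
        Self { remaining: rows }
    }

    /// Skips up to `n` rows and reports how many were actually skipped.
    pub fn skip_records(&mut self, n: usize) -> Result<usize, String> {
        let skipped = n.min(self.remaining);
        self.remaining -= skipped;
        Ok(skipped)
    }

    /// Reads up to `n` rows and reports how many were actually read.
    pub fn read_records(&mut self, n: usize) -> Result<usize, String> {
        let read = n.min(self.remaining);
        self.remaining -= read;
        Ok(read)
    }
}

/// Drains `selection`, returning the total number of rows read.
/// A skip that falls short of the requested row count is reported as an
/// error, which is the assertion the diff currently leaves commented out.
pub fn apply_selection(
    reader: &mut StubReader,
    selection: &mut VecDeque<RowSelection>,
) -> Result<usize, String> {
    let mut rows_read = 0;
    while let Some(front) = selection.pop_front() {
        if front.skip {
            let skipped = reader.skip_records(front.row_count)?;
            if skipped != front.row_count {
                return Err(format!(
                    "failed to skip rows, expected {}, got {}",
                    front.row_count, skipped
                ));
            }
        } else {
            rows_read += reader.read_records(front.row_count)?;
        }
    }
    Ok(rows_read)
}
```

With a stub like this, a selection that skips 4 rows and then reads 6 from a 10-row reader succeeds, while a request to skip 5 rows from a 3-row reader returns the "failed to skip rows" error. If the real `skip_records` miscounts while still skipping the right values, as the comment suspects, the same check would fail even for valid selections.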