tustvold commented on code in PR #2464:
URL: https://github.com/apache/arrow-rs/pull/2464#discussion_r947277659
##########
parquet/src/file/serialized_reader.rs:
##########
@@ -471,234 +480,232 @@ pub(crate) fn decode_page(
     Ok(result)
 }
 
-enum SerializedPages<T: Read> {
-    /// Read entire chunk
-    Chunk { buf: T },
-    /// Read operate pages which can skip.
+enum SerializedPageReaderState {
+    Values {
+        /// The current byte offset in the reader
+        offset: usize,
+
+        /// The length of the chunk in bytes
+        remaining_bytes: usize,
+    },
     Pages {
-        offset_index: Vec<PageLocation>,
-        seen_num_data_pages: usize,
-        has_dictionary_page_to_read: bool,
-        page_bufs: VecDeque<T>,
+        /// Remaining page locations
+        page_locations: VecDeque<PageLocation>,
+        /// Remaining dictionary location if any
+        dictionary_page: Option<PageLocation>,
+        /// The total number of rows in this column chunk
+        total_rows: usize,
     },
 }
 
 /// A serialized implementation for Parquet [`PageReader`].
-pub struct SerializedPageReader<T: Read> {
-    // The file source buffer which references exactly the bytes for the column trunk
-    // to be read by this page reader.
-    buf: SerializedPages<T>,
+pub struct SerializedPageReader<R: ChunkReader> {

Review Comment:
   > This is a non trivial change, right?

   It brings SerializedPageReader into line with all the other readers, so I'm not really sure I agree that this is a major change. I suspect almost all users are using `RowGroupReader::get_column_page_reader` and not calling this constructor.

   Tbh I'm not entirely sure why this method is even public... Perhaps I should take the opportunity to make it crate private?
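
For readers following the thread, here is a rough, self-contained sketch of how a two-state enum shaped like `SerializedPageReaderState` could drive which byte range is read next. It is not the code from the PR: the `PageLocation` struct below is a simplified stand-in (the real one is generated from the Parquet Thrift definitions and uses signed integer fields), and `next_range` with its `decoded_len` parameter is a hypothetical helper invented for illustration. The ordering it assumes, dictionary page first and then data pages in offset-index order, is also an assumption of the sketch.

```rust
use std::collections::VecDeque;

/// Simplified, hypothetical stand-in for the Parquet `PageLocation` struct.
#[derive(Debug, Clone, PartialEq)]
pub struct PageLocation {
    pub offset: usize,
    pub compressed_page_size: usize,
}

/// Mirrors the shape of the enum in the diff; the behaviour below is a sketch.
#[allow(dead_code)]
pub enum SerializedPageReaderState {
    /// No offset index: walk the chunk sequentially.
    Values { offset: usize, remaining_bytes: usize },
    /// Offset index available: pages can be located (and skipped) directly.
    Pages {
        page_locations: VecDeque<PageLocation>,
        dictionary_page: Option<PageLocation>,
        total_rows: usize,
    },
}

impl SerializedPageReaderState {
    /// Hypothetical helper: returns `(start, len)` of the next page, if any.
    ///
    /// In the `Values` state there is no index, so the caller passes in the
    /// length it has just decoded from the page header (`decoded_len`).
    /// In the `Pages` state `decoded_len` is ignored.
    pub fn next_range(&mut self, decoded_len: usize) -> Option<(usize, usize)> {
        match self {
            Self::Values { offset, remaining_bytes } => {
                if *remaining_bytes == 0 {
                    return None;
                }
                // Never read past the end of the column chunk.
                let len = decoded_len.min(*remaining_bytes);
                let start = *offset;
                *offset += len;
                *remaining_bytes -= len;
                Some((start, len))
            }
            Self::Pages { page_locations, dictionary_page, .. } => {
                // Serve the dictionary page first, then the data pages in order.
                let loc = dictionary_page
                    .take()
                    .or_else(|| page_locations.pop_front())?;
                Some((loc.offset, loc.compressed_page_size))
            }
        }
    }
}
```

Because the state holds only offsets and locations rather than owned buffers, the reader can be generic over a `ChunkReader` and fetch bytes lazily, which is the point of the change under discussion. If the constructor were made crate private, callers would reach this only through `RowGroupReader::get_column_page_reader`.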