[GitHub] [arrow-rs] alamb commented on a diff in pull request #2464: Push ChunkReader into SerializedPageReader (#2463)

GitBox Tue, 16 Aug 2022 14:52:07 -0700


alamb commented on code in PR #2464:
URL: https://github.com/apache/arrow-rs/pull/2464#discussion_r947279840



##########
parquet/src/file/serialized_reader.rs:
##########
@@ -471,234 +480,232 @@ pub(crate) fn decode_page(
     Ok(result)
 }
 
-enum SerializedPages<T: Read> {
-    /// Read entire chunk
-    Chunk { buf: T },
-    /// Read operate pages which can skip.
+enum SerializedPageReaderState {
+    Values {
+        /// The current byte offset in the reader
+        offset: usize,
+
+        /// The length of the chunk in bytes
+        remaining_bytes: usize,
+    },
     Pages {
-        offset_index: Vec<PageLocation>,
-        seen_num_data_pages: usize,
-        has_dictionary_page_to_read: bool,
-        page_bufs: VecDeque<T>,
+        /// Remaining page locations
+        page_locations: VecDeque<PageLocation>,
+        /// Remaining dictionary location if any
+        dictionary_page: Option<PageLocation>,
+        /// The total number of rows in this column chunk
+        total_rows: usize,
     },
 }
 
 /// A serialized implementation for Parquet [`PageReader`].
-pub struct SerializedPageReader<T: Read> {
-    // The file source buffer which references exactly the bytes for the column trunk
-    // to be read by this page reader.
-    buf: SerializedPages<T>,
+pub struct SerializedPageReader<R: ChunkReader> {

Review Comment:
   > Perhaps I should take the opportunity to make it crate private as part of slowly reducing the amount of implementation detail that leaks out of the crate?
   
   A GitHub code search (https://cs.github.com/?q=SerializedPageReader%20language%3ARust&scopeName=All%20repos&scope=) suggests that most uses of this structure are in forks of the arrow-rs codebase, in various states of divergence.
   
   GitHub code search won't find every possible user, but it is a good sanity check that this struct isn't widely used.
   
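To make the two states in the diff concrete, here is a minimal sketch of how a reader could walk the `Pages` state, yielding the dictionary page (if any) before the data pages. It is not the PR's implementation: `PageLocation` below is a simplified, hypothetical stand-in modeled on the Parquet offset index entry, and `next_page_range` is a hypothetical helper. The enum is declared without `pub`, so, like the one in the diff, it stays private to its module.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the offset-index `PageLocation`
/// (byte offset, compressed size, and first row of a page).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct PageLocation {
    offset: i64,
    compressed_page_size: i32,
    first_row_index: i64,
}

/// Mirrors the two states introduced in the diff.
#[allow(dead_code)]
enum SerializedPageReaderState {
    /// No offset index: read sequentially from `offset`.
    Values { offset: usize, remaining_bytes: usize },
    /// Offset index available: page boundaries are known up front.
    Pages {
        page_locations: VecDeque<PageLocation>,
        dictionary_page: Option<PageLocation>,
        total_rows: usize,
    },
}

/// Hypothetical helper: returns `(start, len)` of the next page to fetch,
/// yielding the dictionary page first, then data pages in order.
fn next_page_range(state: &mut SerializedPageReaderState) -> Option<(usize, usize)> {
    match state {
        SerializedPageReaderState::Pages { page_locations, dictionary_page, .. } => {
            // `take` consumes the dictionary location so it is yielded only once.
            let loc = dictionary_page.take().or_else(|| page_locations.pop_front())?;
            Some((loc.offset as usize, loc.compressed_page_size as usize))
        }
        // Without an offset index, page boundaries are only known after
        // decoding each page header, so this sketch returns None here.
        SerializedPageReaderState::Values { .. } => None,
    }
}
```

Keeping the remaining locations in a `VecDeque` lets the reader pop pages from the front as it reads or skips them, without re-indexing.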



