[GitHub] [arrow-rs] tustvold commented on pull request #1977: Enable serialized_reader read specific Page by passing row ranges.

GitBox Fri, 01 Jul 2022 09:05:07 -0700


tustvold commented on PR #1977:
URL: https://github.com/apache/arrow-rs/pull/1977#issuecomment-1172493031


   I think small incremental PRs is a good approach. However, I have concerns 
with this specific PR:
   
   * It introduces public APIs that don't have clear semantics (the skipped 
rows are somewhat arbitrary)
   * I would prefer an approach that collocates the page and row skipping 
logic, instead of treating them as separate concerns. Once RecordReader is 
skipping rows it will be incredibly confusing if pages are being skipped 
somewhere else in addition
   
   I wonder if a plan of attack akin to the following might work:
   
   * Add a skip page function to SerializedPageReader that uses the column 
index to skip the next page without reading it (we may need to change it to 
take ChunkReader instead of Read)
   * Same as the above for InMemoryPageReader
   * Add the ability to skip decoding rows to ColumnValueDecoder
   * Pass index and row selection down to RecordReader
   * Perform skipping
   
   Currently it feels like we're adding the high-level functionality before the 
necessary lower level functionality exists...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-rs] tustvold commented on pull request #1977: Enable serialized_reader read specific Page by passing row ranges.

Reply via email to