tustvold commented on PR #1977: URL: https://github.com/apache/arrow-rs/pull/1977#issuecomment-1172493031
I think small incremental PRs is a good approach. However, I have concerns with this specific PR: * It introduces public APIs that don't have clear semantics (the skipped rows are somewhat arbitrary) * I would prefer an approach that collocates the page and row skipping logic, instead of treating them as separate concerns. Once RecordReader is skipping rows it will be incredibly confusing if pages are being skipped somewhere else in addition I wonder if a plan of attack akin to the following might work: * Add a skip page function to SerializedPageReader that uses the column index to skip the next page without reading it (we may need to change it to take ChunkReader instead of Read) * Same as the above for InMemoryPageReader * Add the ability to skip decoding rows to ColumnValueDecoder * Pass index and row selection down to RecordReader * Perform skipping Currently it feels like we're adding the high-level functionality before the necessary lower level functionality exists... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
