[GitHub] [arrow] jp0317 commented on a diff in pull request #36510: PARQUET-2321: [C++] allow customized buffer size when creating ArrowInputStream for a column PageReader

via GitHub Sun, 16 Jul 2023 12:23:57 -0700


jp0317 commented on code in PR #36510:
URL: https://github.com/apache/arrow/pull/36510#discussion_r1264734613



##########
cpp/src/parquet/file_reader.h:
##########
@@ -44,7 +44,8 @@ class PARQUET_EXPORT RowGroupReader {
   // An implementation of the Contents class is defined in the .cc file
   struct Contents {
     virtual ~Contents() {}
-    virtual std::unique_ptr<PageReader> GetColumnPageReader(int i) = 0;
+    virtual std::unique_ptr<PageReader> GetColumnPageReader(

Review Comment:
   Thanks for the review. Regarding `GetColumnChunkRange`, is there any concern 
with exposing it? IIUC, users can currently rely only on `total_compressed_size`, 
which reveals no offset information and may not reflect the actual chunk size.
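
   To make the offset point concrete, here is a minimal sketch of what a chunk 
range could carry beyond a size. `ChunkRange` and `ComputeChunkRange` are 
hypothetical names for illustration, not the signature in this PR. The sketch 
assumes the chunk starts at the dictionary page when one precedes the first 
data page, and it ignores any writer-specific padding adjustments:

```cpp
#include <cstdint>

// Hypothetical value type for illustration; not part of the Parquet C++ API.
struct ChunkRange {
  int64_t offset;  // file offset of the first byte of the column chunk
  int64_t length;  // number of bytes to read starting at `offset`
};

// Assumed inputs mirror fields of the column chunk metadata.
// A dictionary_page_offset <= 0 is treated here as "no dictionary page".
inline ChunkRange ComputeChunkRange(int64_t dictionary_page_offset,
                                    int64_t data_page_offset,
                                    int64_t total_compressed_size) {
  int64_t start = data_page_offset;
  // If a dictionary page precedes the data pages, the chunk starts there.
  if (dictionary_page_offset > 0 && dictionary_page_offset < start) {
    start = dictionary_page_offset;
  }
  return ChunkRange{start, total_compressed_size};
}
```

   With only `total_compressed_size`, a caller would know `length` but not 
`offset`, so it could not pre-fetch or size a buffer against the file layout.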
   
   For the `ColumnReaderProperties`, given that the reader APIs are all index 
based, maybe we can just use the column index (as mapleFU suggested) without 
involving column paths, especially a map keyed on path strings? Initially I was 
trying to avoid keeping such a map in `ReaderProperties`, and more importantly, 
I feel it makes sense to implement this customized buffer size as "column chunk 
specific": different column chunks from the same column can have different 
buffer sizes.
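
   A rough sketch of the "column chunk specific" idea, assuming the size is 
passed per call on an index-based reader rather than stored in a path-keyed 
map. `RowGroupSketch`, `BufferSizeFor`, and `kUseDefaultBufferSize` are 
hypothetical names, and the default value used below is arbitrary:

```cpp
#include <cstdint>

// Hypothetical sentinel meaning "fall back to the reader-wide default".
constexpr int64_t kUseDefaultBufferSize = -1;

// Hypothetical stand-in for one row group's reader; not the Arrow class.
class RowGroupSketch {
 public:
  explicit RowGroupSketch(int64_t default_buffer_size)
      : default_buffer_size_(default_buffer_size) {}

  // Returns the buffer size that would be used for the column chunk at
  // `column_index` in this row group. The column is identified by index
  // only, so no map on column path strings is needed.
  int64_t BufferSizeFor(int column_index,
                        int64_t custom_size = kUseDefaultBufferSize) const {
    (void)column_index;  // the index selects the chunk; unused in this sketch
    return custom_size > 0 ? custom_size : default_buffer_size_;
  }

 private:
  int64_t default_buffer_size_;
};
```

   Because the size is supplied per call, column 2 in one row group can be 
read with a 1 MiB buffer while column 2 in another row group keeps the 
default.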



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

