[GitHub] [arrow] wgtmac commented on pull request #36510: PARQUET-2321: [C++] allow customized buffer size when creating ArrowInputStream for a column PageReader

via GitHub Sat, 08 Jul 2023 19:08:03 -0700


wgtmac commented on PR #36510:
URL: https://github.com/apache/arrow/pull/36510#issuecomment-1627572329


   Yep, I understand your overall goal is to control the total memory used in 
the buffered stream.
   
   My main concern is that parameters like `buffer_size` which should be solely 
controlled by ReaderProperties now it is at large and runs away into the public 
API of reader. It is yet not easy for end users to tune the new parameter: e.g. 
they need to call `ComputeColumnChunkRange` that you have just exposed to get a 
basic idea of column chunk size and make a decision based on that. Moreover, 
people in the future may follow the same way to implement their hacky 
optimizations which tend to make the code base difficult to maintain. It seems 
to me the benefit it brings does not worth breaking the design.
   
   It would be more favorable if we can add a new setting to ReaderProperties 
to fine tune the behavior and let the file reader to optimize this internally 
without changing any public API on the reader side.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] wgtmac commented on pull request #36510: PARQUET-2321: [C++] allow customized buffer size when creating ArrowInputStream for a column PageReader

Reply via email to