wgtmac commented on PR #36510: URL: https://github.com/apache/arrow/pull/36510#issuecomment-1627572329
Yep, I understand your overall goal is to control the total memory used in the buffered stream. My main concern is that parameters like `buffer_size` which should be solely controlled by ReaderProperties now it is at large and runs away into the public API of reader. It is yet not easy for end users to tune the new parameter: e.g. they need to call `ComputeColumnChunkRange` that you have just exposed to get a basic idea of column chunk size and make a decision based on that. Moreover, people in the future may follow the same way to implement their hacky optimizations which tend to make the code base difficult to maintain. It seems to me the benefit it brings does not worth breaking the design. It would be more favorable if we can add a new setting to ReaderProperties to fine tune the behavior and let the file reader to optimize this internally without changing any public API on the reader side. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
