wgtmac commented on PR #48468: URL: https://github.com/apache/arrow/pull/48468#issuecomment-4047682751
Our internal implementation does not count the buffered size when estimating the row group size, and the estimate is sometimes significantly off (always smaller than the real size). However, adding the buffered size could make the estimate harder to predict, especially for wide columns. So I think we can split this PR into two: one that adds the APIs, which are useful in all cases, and another for the size estimation algorithm, which is still under debate. WDYT? @wecharyu
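To make the trade-off concrete, here is a minimal sketch of the two estimation strategies under discussion. It is illustrative only: `ColumnState`, `flushed_bytes`, `buffered_bytes`, and `estimate_row_group_size` are hypothetical names, not Arrow or Parquet APIs, and the byte counts are made up. It also does not model how buffered bytes change once they are encoded and compressed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnState:
    """Hypothetical per-column writer state (not an Arrow API)."""
    flushed_bytes: int   # bytes already written out as completed pages
    buffered_bytes: int  # bytes still held in memory, not yet flushed


def estimate_row_group_size(columns: List[ColumnState], include_buffered: bool) -> int:
    """Sum per-column sizes, optionally counting data that is still buffered."""
    total = 0
    for col in columns:
        total += col.flushed_bytes
        if include_buffered:
            total += col.buffered_bytes
    return total


# Made-up example: the second column has flushed nothing yet,
# so all of its data is invisible to the flushed-only estimate.
columns = [ColumnState(flushed_bytes=1000, buffered_bytes=200),
           ColumnState(flushed_bytes=0, buffered_bytes=5000)]

without_buffered = estimate_row_group_size(columns, include_buffered=False)
with_buffered = estimate_row_group_size(columns, include_buffered=True)
```

Ignoring buffered data can only lower the estimate, which matches the "always smaller than the real size" observation. Including it narrows that gap, but the result then depends on how much each column happens to be holding in memory at that moment.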
