JigaoLuo commented on issue #8378: URL: https://github.com/apache/arrow-rs/issues/8378#issuecomment-3311611589
Thanks for this nice issue. I hope it’s okay to share a few thoughts.

In my rewriter, I’ve implemented a selector that lets me specify both the row group size and the page size (again for performance reasons). From there, I perform **a brute-force search across the encoding space** to find the optimal encoding configuration. The search space is actually quite small, especially for integer types, where encoding options are limited, so rewriting something like TPC-H SF500 takes only a few minutes. Because of that, I’ve skipped heuristics and sampling entirely for now.
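To make the idea concrete, here is a minimal sketch of a per-column brute-force search, not the actual rewriter. It assumes that "optimal" means the smallest encoded size, which the comment does not state. The four size functions are rough, hypothetical cost models for integer columns rather than Parquet's real byte layouts, and all names (`ENCODERS`, `best_encoding`, `search`) are made up for illustration. The page size is taken as fixed by the caller, mirroring the selector described above.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cost models: each returns an estimated byte count for one page
# of int64 values. They are simplifications, not Parquet's actual encodings.

def plain_size(values: List[int]) -> int:
    return 8 * len(values)  # 8 bytes per value

def rle_size(values: List[int]) -> int:
    runs, prev = 0, object()
    for v in values:
        if v != prev:
            runs, prev = runs + 1, v
    return 12 * runs  # 8-byte value plus 4-byte run length per run

def delta_size(values: List[int]) -> int:
    if not values:
        return 0
    deltas = [b - a for a, b in zip(values, values[1:])]
    if not deltas:
        return 8
    # Fixed bit width: magnitude of the largest delta plus one sign bit.
    width = max(abs(d) for d in deltas).bit_length() + 1
    return 8 + (width * len(deltas) + 7) // 8  # first value, then packed deltas

def dictionary_size(values: List[int]) -> int:
    distinct = len(set(values))
    index_bits = max((distinct - 1).bit_length(), 1)
    return 8 * distinct + (index_bits * len(values) + 7) // 8

ENCODERS: Dict[str, Callable[[List[int]], int]] = {
    "plain": plain_size,
    "rle": rle_size,
    "delta": delta_size,
    "dictionary": dictionary_size,
}

def best_encoding(values: List[int], page_size: int) -> Tuple[str, int]:
    """Try every encoder on every page and keep the smallest total size."""
    sizes = {
        name: sum(enc(values[i:i + page_size])
                  for i in range(0, len(values), page_size))
        for name, enc in ENCODERS.items()
    }
    best = min(sizes, key=sizes.get)
    return best, sizes[best]

def search(columns: Dict[str, List[int]], page_size: int) -> Dict[str, Tuple[str, int]]:
    """Exhaustive search: one independent decision per column."""
    return {name: best_encoding(vals, page_size) for name, vals in columns.items()}

if __name__ == "__main__":
    columns = {"orderkey": list(range(1, 1001)), "flag": [0] * 500 + [1] * 500}
    for column, (encoding, size) in search(columns, page_size=1000).items():
        print(column, encoding, size)
```

Because each column is decided independently, the work grows with the number of columns times the number of candidate encodings, which is why an exhaustive pass can stay cheap when only a handful of encodings apply to a type.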
