JigaoLuo commented on issue #8378:
URL: https://github.com/apache/arrow-rs/issues/8378#issuecomment-3311611589

   Thanks for this nice issue. I hope it’s okay to share a few thoughts.
   
   In my rewriter, I’ve implemented a selector that lets me specify both the row 
group size and the page size (again, for performance reasons). From there, I 
perform **a brute-force search across the encoding space** to find the optimal 
encoding configuration. The search space is actually quite small, especially for 
integer types, where encoding options are limited, so rewriting something like 
TPC-H SF500 takes only a few minutes. Because of that, I’ve skipped 
heuristics and sampling entirely for now.
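
To make the idea concrete, here is a minimal sketch of a per-column brute-force search. It is not the rewriter's actual code. It assumes the objective is encoded size (the real selector may optimize for something else, such as decode speed), and `Encoding`, `encoded_size`, and `best_encoding` are hypothetical names with toy size estimators standing in for the real Parquet encoders.

```rust
/// Toy candidate encodings. These are illustrative stand-ins (assumption),
/// not the Parquet encodings or any arrow-rs API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Encoding {
    Plain,
    RunLength,
    DeltaVarint,
}

/// The whole search space: small enough to enumerate exhaustively.
const CANDIDATES: [Encoding; 3] = [
    Encoding::Plain,
    Encoding::RunLength,
    Encoding::DeltaVarint,
];

/// Number of bytes an unsigned LEB128 varint needs for `v`.
fn varint_len(mut v: u64) -> usize {
    let mut n = 1;
    while v >= 0x80 {
        v >>= 7;
        n += 1;
    }
    n
}

/// Zigzag-map a signed value so small magnitudes become small unsigned values.
fn zigzag(v: i64) -> u64 {
    ((v << 1) ^ (v >> 63)) as u64
}

/// Estimated encoded size in bytes of `values` under `enc` (toy cost model).
fn encoded_size(values: &[i64], enc: Encoding) -> usize {
    match enc {
        // Fixed 8 bytes per value.
        Encoding::Plain => values.len() * 8,
        // Each run costs an 8-byte value plus a varint run length.
        Encoding::RunLength => {
            let mut size = 0;
            let mut i = 0;
            while i < values.len() {
                let mut j = i + 1;
                while j < values.len() && values[j] == values[i] {
                    j += 1;
                }
                size += 8 + varint_len((j - i) as u64);
                i = j;
            }
            size
        }
        // Each value is stored as a zigzag varint of its delta to the previous one.
        Encoding::DeltaVarint => {
            let mut prev = 0i64;
            let mut size = 0;
            for &v in values {
                size += varint_len(zigzag(v.wrapping_sub(prev)));
                prev = v;
            }
            size
        }
    }
}

/// Brute force: try every candidate and keep the smallest result.
fn best_encoding(values: &[i64]) -> (Encoding, usize) {
    CANDIDATES
        .iter()
        .map(|&e| (e, encoded_size(values, e)))
        .min_by_key(|&(_, size)| size)
        .expect("CANDIDATES is non-empty")
}
```

Calling `best_encoding` once per column chunk, for each row group size and page size setting under consideration, gives the full search. With only a handful of candidates per type, the cost is a small constant number of encoding passes over the data.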

