JigaoLuo commented on issue #8358: URL: https://github.com/apache/arrow-rs/issues/8358#issuecomment-3311178080
I’ll continue collecting observations here gradually. One thing I’ve noticed: after sorting a column for statistics-based pruning, the sorted integer keys become quite dense, and the encoded size often ends up under 1 MB. In such cases, compression doesn’t seem to provide much benefit.

- This leads me to consider a potential heuristic: if **the encoded size** falls below a certain threshold, it might be better to skip compression altogether. Applying compression in these scenarios could add compression and decompression overhead without meaningful space savings.

---

Example:

<img width="1948" height="332" alt="Image" src="https://github.com/user-attachments/assets/cc65aa57-0f42-4f47-99dd-48d49fa4383e" />
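The heuristic above can be sketched as a simple size check made before choosing a codec. This is a minimal illustration, not an arrow-rs API: `Codec`, `choose_codec`, and `SKIP_COMPRESSION_BELOW_BYTES` are hypothetical names (they are not part of the `parquet` crate), and the 1 MiB cutoff only mirrors the "under 1 MB" observation rather than being a tuned value.

```rust
/// Stand-in for a column chunk's compression codec (hypothetical; not the
/// `parquet` crate's `Compression` enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Codec {
    Uncompressed,
    Zstd,
}

/// Hypothetical cutoff, roughly the ~1 MB encoded size observed for sorted,
/// densely encoded integer keys. Not a tuned or recommended value.
const SKIP_COMPRESSION_BELOW_BYTES: usize = 1 << 20; // 1 MiB

/// Returns the codec to use for a chunk whose *encoded* (pre-compression)
/// size is `encoded_len` bytes: skip compression when the chunk is already
/// smaller than `threshold`, otherwise keep the preferred codec.
fn choose_codec(encoded_len: usize, preferred: Codec, threshold: usize) -> Codec {
    if encoded_len < threshold {
        Codec::Uncompressed
    } else {
        preferred
    }
}

fn main() {
    // Sorted integer keys after encoding: already small, so compression is skipped.
    let sorted_keys_encoded = 300 * 1024;
    // A larger, less regular column: the preferred codec is kept.
    let other_column_encoded = 8 * 1024 * 1024;

    for (name, len) in [("sorted_keys", sorted_keys_encoded), ("other", other_column_encoded)] {
        let codec = choose_codec(len, Codec::Zstd, SKIP_COMPRESSION_BELOW_BYTES);
        println!("{name}: {len} encoded bytes -> {codec:?}");
    }
}
```

The check only looks at the encoded length, so it costs essentially nothing. An alternative would be to compress first and keep the result only when it is meaningfully smaller; that also catches large but poorly compressible chunks, at the price of always paying the compression cost.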
