westonpace commented on pull request #9656: URL: https://github.com/apache/arrow/pull/9656#issuecomment-812077030
Also, thanks for doing all this. It's nice to see some improvement, at least in some cases. It gives some good validation that we aren't solving these tricky local cases for no reason. I also wouldn't worry too much about the low file count cases. They could, in theory, improve with better intra-file parallelism, but we aren't taking a very long time here in the first place, and I feel that intra-file parallelism will always be less efficient than inter-file parallelism because it breaks up processing of the same data across multiple threads. That cost can be overcome when the processing is expensive, but maybe not so easily here. That theory isn't finalized, though, and I may be completely wrong.
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org