wesm commented on pull request #8188:
URL: https://github.com/apache/arrow/pull/8188#issuecomment-692852532


   In terms of benchmarking, it also strikes me that it may be faster 
(especially on machines with a lot of cores -- e.g. 16/20-core servers) to 
read a 2-file dataset (or even an n-file dataset, where n is less than the 
number of cores on the machine) by reading the files one at a time rather 
than using the datasets API. How many files do you have to have before the 
performance issue goes away? This is something that would be good to 
quantify in a collection of benchmarks.
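
A minimal sketch of what such a benchmark could look like, not a definitive harness. It assumes the files are Parquet and uses the Python bindings (`pyarrow.parquet.read_table`, `pyarrow.dataset.dataset`). The helper names `time_reader`, `read_one_at_a_time`, `read_with_datasets`, and `sweep` are hypothetical and chosen only for illustration.

```python
import time
from statistics import median


def time_reader(reader, paths, repeats=3):
    """Return the median wall-clock seconds of reader(paths) over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        reader(paths)
        samples.append(time.perf_counter() - start)
    return median(samples)


def read_one_at_a_time(paths):
    # Assumption: Parquet files. Each file is read in turn, then concatenated.
    import pyarrow as pa
    import pyarrow.parquet as pq
    return pa.concat_tables([pq.read_table(p) for p in paths])


def read_with_datasets(paths):
    # Assumption: Parquet files. The same paths are read through the datasets API.
    import pyarrow.dataset as ds
    return ds.dataset(paths, format="parquet").to_table()


def sweep(all_paths, file_counts, readers, repeats=3):
    """Map each file count n to {reader name: median seconds} on the first n paths."""
    results = {}
    for n in file_counts:
        subset = all_paths[:n]
        results[n] = {
            name: time_reader(fn, subset, repeats) for name, fn in readers.items()
        }
    return results
```

Calling `sweep(paths, [1, 2, 4, 8, 16, 32], {"one_at_a_time": read_one_at_a_time, "datasets": read_with_datasets})` on a machine with a known core count would give, for each file count, the two timings to compare. The file count at which the datasets timing stops being slower is the number the question above asks for.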


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

