jonkeane commented on pull request #9615: URL: https://github.com/apache/arrow/pull/9615#issuecomment-841253893
Ok, I've run these benchmarks again, and we're seeing a massive improvement across the board (with one small exception). All of the simple types are ~50-75% faster. The naturalistic datasets are where this really shines though: those are 85-90% faster. This is fantastic!

# 🎊🚀🤯🚀🎊

There is one oddity (that I will dig into in a second) in the single-core tests: those *also* are faster (though this might be limited to only the strings case). The (relevant) way the benchmarks [set single versus multi core](https://github.com/ursacomputing/arrowbench/blob/c6453c8ef18e1f6c03b1e1542149e8adf6bbde95/R/run.R#L206-L213) is using `arrow:::SetCpuThreadPoolCapacity()` (the other values in there, like setting the option `Ncpus`, are intended to catch other libraries and shouldn't conflict, but I've highlighted them anyway). This is different from `option_use_threads`, though I would have expected that setting the thread pool capacity to one would give similar performance to before.

Like I said above, I'll dig into this and see if it's actually using more than one core or if these optimizations _also_ optimized the single-core case (though I'm a bit skeptical they could have optimized it *this much*).
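For the "is it actually using more than one core" question, one rough check is to compare process CPU time against wall-clock time for a single run. This is only a sketch in base R, not what arrowbench does: `cpu_to_wall_ratio()` and `workload()` are hypothetical names, and `workload()` is a stand-in for whichever benchmarked call is under suspicion.

```r
# Ratio of process CPU time (user + system) to wall-clock time.
# A value near 1 is consistent with one busy core; a value well above 1
# suggests several threads were running at the same time.
cpu_to_wall_ratio <- function(timing) {
  elapsed <- timing[["elapsed"]]
  if (elapsed <= 0) {
    return(NA_real_)  # run was too short for the timer to resolve
  }
  (timing[["user.self"]] + timing[["sys.self"]]) / elapsed
}

# Hypothetical stand-in for the benchmarked call; swap in the real one.
workload <- function() sum(sqrt(seq_len(5e6)))

timing <- system.time(workload())
ratio <- cpu_to_wall_ratio(timing)
print(ratio)
```

Running this once with the thread pool capacity set to one and once with the default should show whether the single-core configuration really stays close to a ratio of 1. The ratio is noisy for very short runs, so it is better treated as a hint than as proof.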