jonkeane commented on pull request #9615:
URL: https://github.com/apache/arrow/pull/9615#issuecomment-841253893


   Ok, I've run these benchmarks again, and we're seeing a massive improvement 
across the board (with one small exception). All of the simple types are 
~50-75% faster. 
   
   The naturalistic datasets are where this really shines, though: those are 
85-90% faster.
   
   This is fantastic!
   
   # 🎊🚀🤯🚀🎊
   
   There is one oddity (that I will dig into in a second) in the single-core 
tests: those are *also* faster (though this might be limited to the strings 
case). The relevant way the benchmarks [set single versus multi 
core](https://github.com/ursacomputing/arrowbench/blob/c6453c8ef18e1f6c03b1e1542149e8adf6bbde95/R/run.R#L206-L213)
 is by calling `arrow:::SetCpuThreadPoolCapacity()` (the other settings in 
there, like the option `Ncpus`, are intended to catch other libraries and 
shouldn't conflict, but I've highlighted them anyway). This is different from 
`option_use_threads`, though I would have expected that setting the thread pool 
capacity to one would give performance similar to before. Like I said above, 
I'll dig into this and see whether it's actually using more than one core or 
whether these optimizations _also_ sped up the single-core case (though I'm a 
bit skeptical they could have improved it *this much*).
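   
   One rough way to check the "is it actually using more than one core" 
question is to compare the CPU time a run consumes with its wall-clock time: a 
ratio near 1 suggests a single busy core, while a ratio well above 1 suggests 
several. The sketch below is a generic illustration in Python, not the 
arrowbench code; `cores_in_use`, `measure`, and `busy_loop` are hypothetical 
names, and `busy_loop` only stands in for the benchmarked read.

```python
import time


def cores_in_use(cpu_seconds: float, wall_seconds: float) -> float:
    """Average number of cores kept busy: CPU time divided by wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


def measure(workload):
    """Run workload() once; return (wall seconds, CPU seconds, cores in use)."""
    wall_start = time.perf_counter()
    # process_time() is user + system CPU time for the whole process,
    # so work done on extra threads in this process is included.
    cpu_start = time.process_time()
    workload()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu, cores_in_use(cpu, wall)


def busy_loop():
    # Hypothetical stand-in for the benchmarked read: pure Python, one thread.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


if __name__ == "__main__":
    wall, cpu, cores = measure(busy_loop)
    print(f"wall={wall:.3f}s cpu={cpu:.3f}s cores~{cores:.2f}")
```

   In R, `system.time()` reports the same ingredients (user and system CPU 
time plus elapsed time), so the same ratio could be computed around a 
benchmark run there.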


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

