Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19685

@srowen I tried enabling native BLAS, but the native BLAS implementation is still much slower: the average over 10 runs is 2529.922753 ms, against 515.510185 ms for the for loop. For reference, I am running on OSX with a 2.5 GHz Intel Core i7.

What is worth noting, though, is that I also ran the same code with the `toArray` performed beforehand, thus excluding its time from the measurement. In this case, the native BLAS implementation is much faster: 100.969697 ms. So the "performance killer" here is the conversion to array, as you pointed out.

@WeichenXu123 In the description of the PR and here you can see the tests I made. Do you think something else is needed?
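The two measurements above (conversion inside the timed region versus conversion done beforehand) can be separated along the following lines. This is only a minimal sketch, not the benchmark actually run for this PR: the object name `ToArrayTiming`, the helpers `timed` and `dot`, and the input size are all hypothetical, and the plain array loop merely stands in for a BLAS dot call so that the example needs no native library.

```scala
object ToArrayTiming {
  // Hypothetical helper: evaluates `body` once and returns its result
  // together with the elapsed wall-clock time in milliseconds.
  def timed[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e6)
  }

  // Dot product over two arrays of equal length.
  // Assumption: this loop is a stand-in for a BLAS dot call.
  def dot(a: Array[Float], b: Array[Float]): Float = {
    var sum = 0.0f
    var i = 0
    while (i < a.length) {
      sum += a(i) * b(i)
      i += 1
    }
    sum
  }

  def main(args: Array[String]): Unit = {
    val n = 1000000 // arbitrary size chosen for illustration
    val xs: Seq[Float] = Vector.tabulate(n)(i => (i % 7).toFloat)
    val ys: Seq[Float] = Vector.tabulate(n)(i => (i % 5).toFloat)

    // Case 1: the toArray conversion is part of the timed region.
    val (r1, withConversion) = timed(dot(xs.toArray, ys.toArray))

    // Case 2: the conversion happens beforehand and is not timed.
    val xa = xs.toArray
    val ya = ys.toArray
    val (r2, withoutConversion) = timed(dot(xa, ya))

    println(f"toArray timed:    $withConversion%.3f ms (result $r1)")
    println(f"toArray excluded: $withoutConversion%.3f ms (result $r2)")
  }
}
```

A single un-warmed run like this is noisy because of JIT compilation and garbage collection, so averaging over several runs (as done above) or using a harness such as JMH would give more trustworthy numbers.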