> If your problem is evaluating vector expressions just like the above
> (i.e. without using transcendental functions like sin, exp, etc...),
> usually the bottleneck is on memory access, so using several threads is
> simply not going to help you achieve better performance, but rather
> the contrary (you have to deal with the additional thread overhead).
> So, frankly, I would not waste more time trying to parallelize that.
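The quoted point can be checked empirically. The sketch below times a pure add/multiply expression against a transcendental one in plain NumPy; the absolute numbers are machine-dependent, but the arithmetic case mostly streams arrays through memory while the transcendental case does real per-element work:

```python
import time
import numpy as np

n = 10_000_000
a = np.random.rand(n)
b = np.random.rand(n)

def best_time(f, repeat=5):
    """Best-of-N wall-clock time for f()."""
    times = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        f()
        times.append(time.perf_counter() - t0)
    return min(times)

# Plain additions/multiplications: each pass mostly streams the arrays
# through memory, so extra threads rarely help here.
t_arith = best_time(lambda: 2.0 * a + 3.0 * b)

# Transcendental functions do far more work per element, so they are
# compute-bound and can benefit from several cores (or from VML).
t_trans = best_time(lambda: np.sin(a) + np.exp(b))

print(f"arithmetic: {t_arith:.3f}s   transcendental: {t_trans:.3f}s")
```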
I had a feeling this would be the case; I just haven't been sure at what point it comes into play. I really need to run some tests to understand exactly how CPU load and memory bandwidth interact in these situations. I have worked with GPUs before, and often the reason the GPU is faster than the CPU is simply its higher memory bandwidth.

> As an example, in the recent support of VML in numexpr we have disabled
> the use of VML (as well as the OpenMP threading support that comes with
> it) in cases like yours, where only additions and multiplications are
> performed (these operations are very fast in modern processors, and the
> sole bottleneck for this case is the memory bandwidth, as I've said).
> However, in case of expressions containing operations like division or
> transcendental functions, then VML activates automatically, and you can
> make use of several cores if you want. So, if you are in this case,
> and you have access to Intel MKL (the library that contains VML), you
> may want to give numexpr a try.

OK, this is very interesting indeed. I didn't know that numexpr has support for VML, which has OpenMP support. I will definitely have a look at this.

Thanks!

Brian

> HTH,
>
> --
> Francesc Alted
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
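For reference, a minimal sketch of using numexpr for the two kinds of expressions discussed above (this assumes numexpr and NumPy are installed; whether VML actually engages depends on numexpr being built against Intel MKL, which you can check via the `numexpr.use_vml` flag):

```python
import numpy as np
import numexpr as ne

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# Pure additions/multiplications: numexpr evaluates these blockwise
# without large temporaries, but (as described above) it does not
# route them through VML, since they are memory-bound anyway.
r1 = ne.evaluate("2*a + 3*b")

# Division and transcendental functions: with an MKL-backed build
# (ne.use_vml is True), VML can take over and use several cores.
r2 = ne.evaluate("sin(a) + exp(b) / (1 + b)")

print("VML available:", ne.use_vml)
```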