> If your problem is evaluating vector expressions just like the above
> (i.e. without using transcendental functions like sin, exp, etc...),
> usually the bottleneck is memory access, so using several threads is
> simply not going to help you achieve better performance, but rather
> the contrary (you have to deal with the additional thread overhead).
> So, frankly, I would not waste more time trying to parallelize that.

I had a feeling this would be the case; I just haven't been sure about
the point at which this comes into play.  I really need to do some
tests to understand exactly how CPU load and memory bandwidth interact
in these situations.  I have worked with GPUs before, and often the
reason the GPU is faster than the CPU is simply its higher memory
bandwidth.
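One quick way to probe where that crossover lies is to time a streaming
expression against a transcendental one on the same arrays.  A minimal
sketch (the array size and repeat count are arbitrary choices):

```python
import numpy as np
import timeit

n = 2_000_000
a = np.random.rand(n)
b = np.random.rand(n)
c = np.random.rand(n)

# Memory-bound: streams three arrays with little arithmetic per element,
# so performance is limited by how fast data moves through the bus.
t_mem = timeit.timeit(lambda: a * b + c, number=3)

# Compute-bound: the transcendental functions dominate, so extra cores
# (or vectorized math libraries) can actually pay off here.
t_cpu = timeit.timeit(lambda: np.sin(a) * np.cos(b), number=3)

print(f"a*b+c: {t_mem:.3f}s   sin(a)*cos(b): {t_cpu:.3f}s")
```

If the second timing is much larger than the first, the expression is in
the regime where threading can help.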

> As an example, in the recent support of VML in numexpr we have disabled
> the use of VML (as well as the OpenMP threading support that comes with
> it) in cases like yours, where only additions and multiplications are
> performed (these operations are very fast in modern processors, and the
> sole bottleneck for this case is the memory bandwidth, as I've said).
> However, in case of expressions containing operations like division or
> transcendental functions, then VML activates automatically, and you can
> make use of several cores if you want.  So, if you are in this case,
> and you have access to Intel MKL (the library that contains VML), you
> may want to give numexpr a try.

OK, this is very interesting indeed.  I didn't know that numexpr has
support for VML, which has OpenMP support.  I will definitely have a
look at this.  Thanks!
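For reference, a minimal sketch of what evaluating an expression through
numexpr looks like (this assumes numexpr is installed; the expression
and array are placeholders, and the sketch falls back to plain NumPy
when numexpr is absent):

```python
import numpy as np

try:
    import numexpr as ne
    have_ne = True
except ImportError:
    have_ne = False

x = np.random.rand(1_000_000)

if have_ne:
    # numexpr compiles the string and evaluates it blockwise over the
    # array; when built against Intel MKL, the VML routines handle the
    # transcendental part and can spread the work over several cores.
    y = ne.evaluate("sin(x) + 2*x")
else:
    # Plain NumPy equivalent, so the sketch runs without numexpr.
    y = np.sin(x) + 2 * x
```

Either path computes the same result; the difference is only in how the
expression is scheduled across memory blocks and cores.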

Brian

> HTH,
>
> --
> Francesc Alted
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
