Gael,

On Fri, Mar 05, 2010 at 10:51:12AM +0100, Gael Varoquaux wrote:
> On Fri, Mar 05, 2010 at 09:53:02AM +0100, Francesc Alted wrote:
> > Yeah, a 10% improvement from using multiple cores is an expected figure
> > for memory-bound problems.  This is something people must know: if their
> > computations are memory bound (and this is much more common than one
> > may initially think), then they should not expect significant speed-ups
> > from their parallel codes.
> 
> Hey Francesc,
> 
> Any chance this can be different for NUMA (non uniform memory access)
> architectures? AMD multicores used to be NUMA, when I was still following
> these problems.

As far as I can tell, NUMA architectures are best at accelerating
independent processes that run with no interaction with each other.  In
that case, the hardware takes care of placing each process's data in
memory that is 'nearer' to the processor running it.  The same *could*
happen for a truly parallel program too, but, as I said, in general NUMA
works best for independent processes (read: multiuser machines).
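To make the 10% figure above concrete, here is a toy back-of-the-envelope model (my own sketch, not anything from NumPy or the thread): assume the compute part parallelises perfectly across cores, while all memory traffic funnels through a single shared path and therefore does not scale at all.

```python
def bounded_speedup(cores, compute_time, memory_time):
    """Toy model: compute scales with the number of cores, memory
    traffic does not (single shared path to memory)."""
    serial = compute_time + memory_time
    parallel = compute_time / cores + memory_time
    return serial / parallel

# Memory-bound: 90% of the serial time is memory traffic.
print(round(bounded_speedup(4, 1.0, 9.0), 2))   # ~1.08 on 4 cores

# Compute-bound: 90% of the serial time is arithmetic.
print(round(bounded_speedup(4, 9.0, 1.0), 2))   # ~3.08 on 4 cores
```

Under this (admittedly crude) model, a job whose serial time is 90% memory traffic gains barely 8% from 4 cores, while the compute-bound mirror image gets close to the usual Amdahl-style speed-up.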

> FWIW, I observe very good speed-ups on my problems (pretty much linear
> in the number of CPUs), and I have data-parallel problems on fairly
> large data (~100 MB apiece, which doesn't fit in cache), with no
> synchronisation at all between the workers.  The CPUs are Intel Xeons.

Maybe your processes are not as memory-bound as you think.  Do you get
a much better speed-up on a NUMA machine than on a simple multi-core
machine with a single path to memory?  I don't think so, but maybe I'm
wrong here.
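For what it's worth, the kind of workload you describe can be sketched like this (a hypothetical example of mine, with made-up sizes): each worker owns its chunk of the data outright, so there is nothing to synchronise, and on a NUMA box each process's pages can end up on its local memory node.

```python
from multiprocessing import Pool

def work(chunk):
    # Each worker owns its chunk outright: no shared state, no
    # synchronisation, so NUMA hardware can keep each process's
    # pages close to the core running it.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(100000))
    chunks = [data[i::4] for i in range(4)]   # four disjoint pieces
    with Pool(4) as pool:
        partials = pool.map(work, chunks)
    # The partial sums recombine to the serial answer.
    print(sum(partials) == sum(x * x for x in data))  # True
```

Whether this scales linearly or stalls at ~10% is exactly the memory-bound question: if `work` did almost no arithmetic per byte touched, the four workers would mostly queue up on the memory path.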

Francesc
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
