Gael,

On Fri, Mar 05, 2010 at 10:51:12AM +0100, Gael Varoquaux wrote:
> On Fri, Mar 05, 2010 at 09:53:02AM +0100, Francesc Alted wrote:
> > Yeah, 10% of improvement by using multi-cores is an expected figure for
> > memory bound problems.  This is something people must know: if their
> > computations are memory bound (and this is much more common than one
> > may initially think), then they should not expect significant speed-ups
> > on their parallel codes.
>
> Hey Francesc,
>
> Any chance this can be different for NUMA (non uniform memory access)
> architectures?  AMD multicores used to be NUMA, when I was still
> following these problems.
As far as I can tell, NUMA architectures are better at accelerating
independent processes that run with no interaction between one another.
In that case, the hardware takes care of placing each process's data in
memory that is 'nearer' to its processor.  This scenario *could* arise in
truly parallel programs too, but as I said, in general NUMA works best
for independent processes (read: multiuser machines).

> FWIW, I observe very good speedups on my problems (pretty much linear in
> the number of CPUs), and I have data parallel problems on fairly large
> data (~100Mo a piece, doesn't fit in cache), with no synchronisation at
> all between the workers.  CPUs are Intel Xeons.

Maybe your processes are not as memory-bound as you think.  Do you get a
much better speed-up on a NUMA machine than on a simple multi-core
machine with a single path to memory?  I don't think so, but maybe I'm
wrong here.

Francesc

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
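The "memory bound" distinction above can be made concrete with a rough
roofline-style estimate: a kernel is memory bound when it performs fewer
FLOPs per byte of memory traffic than the machine's balance point.  Below
is a minimal sketch; the peak figures (50 GFLOP/s compute, 10 GB/s memory
bandwidth) are illustrative assumptions, not numbers from this thread.

```python
# Rough roofline-style check for whether a kernel is memory bound.
# The peak_gflops / peak_gb_per_s values are illustrative assumptions.

def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte of memory traffic."""
    return flops / bytes_moved

def is_memory_bound(flops, bytes_moved, peak_gflops, peak_gb_per_s):
    # Machine balance: FLOPs the CPU can retire per byte the memory
    # system can deliver.  Below this intensity, memory is the bottleneck
    # and adding cores (which share the same memory path) helps little.
    balance = peak_gflops / peak_gb_per_s
    return arithmetic_intensity(flops, bytes_moved) < balance

# Elementwise c = a + b over N doubles: 1 FLOP per element, and
# 3 * 8 bytes of traffic per element (read a, read b, write c).
N = 10**7
print(is_memory_bound(flops=N, bytes_moved=3 * 8 * N,
                      peak_gflops=50.0, peak_gb_per_s=10.0))  # True
```

With an intensity of 1/24 FLOP/byte against a balance of 5 FLOP/byte, an
elementwise sum is deep in memory-bound territory, which is consistent
with the ~10% multi-core improvement quoted above; a workload doing
hundreds of FLOPs per byte (e.g. large matrix multiplication) would land
on the compute-bound side and scale much better with cores.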