> I've been quite curious to try something like the f1200 as a potential
> replacement for our Altixes, which were bought predominantly for running
> single-threaded large-memory jobs.

we have an Altix as well, and I always cringe when I see a single-thread,
large-memory job running on it. ours has 128p, 256G, and I think 6M/core
caches.
so a large-mem serial job, assuming its accesses are spread uniformly over
the whole 256G, would have a cache hit rate of about .00002289 (6M/256G).
and in any case, there is >800 GB/s of memory bandwidth available, but at
best 6.4 GB/s in use. don't forget that the it2 is a fairly strict
in-order chip, as well.
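
to make the arithmetic explicit, here is a quick back-of-envelope check. it assumes binary units (6M = 6 * 2^20, 256G = 256 * 2^30) and that the job's footprint is the whole 256G; the variable names are just for illustration, and the two bandwidth figures are taken as quoted, not measured.

```python
# back-of-envelope check of the cache and bandwidth figures
# assumption: binary units, and the job touches all 256G uniformly
MiB = 2**20
GiB = 2**30

cache_bytes = 6 * MiB          # per-core cache, as quoted
footprint_bytes = 256 * GiB    # whole-machine memory, as quoted

# with uniformly random accesses, hit rate ~= cache size / footprint
hit_rate = cache_bytes / footprint_bytes

aggregate_bw = 800.0   # GB/s, lower bound quoted for the whole machine
one_thread_bw = 6.4    # GB/s, best case quoted for a single thread
bw_fraction = one_thread_bw / aggregate_bw

print(f"hit rate    ~ {hit_rate:.8f}")      # 0.00002289
print(f"footprint   ~ {1 / hit_rate:.0f}x the cache")
print(f"bandwidth   <= {bw_fraction:.1%} of what the machine offers")
```

so a single thread leaves more than 99% of the machine's memory bandwidth idle, on top of all but one of the cores.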
sure, perhaps a large-memory serial code has a small working set that
fits in cache. but doesn't it strike you as strange to have a
working set that's 1/40000 of the total footprint? I suspect that you
could reformulate such a code as a "memory-extension" MPI job and avoid
the need for custom hardware. (ie, let rank0 do all the work, and just
operate a software cache of data fed by all the other ranks. of course,
this raises the question of whether the code _has_ to be serial...)
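
a rough sketch of what the rank0 side of such a "memory-extension" scheme might look like. this is a single-process toy, not real MPI: RemoteStore, SoftwareCache, BLOCK_WORDS and the block-to-rank mapping are all hypothetical names made up for illustration, fetch() stands in for a message exchange with the rank that owns the block, and the data is synthetic (element i just holds the value i).

```python
from collections import OrderedDict

BLOCK_WORDS = 1024  # hypothetical block size, in array elements


class RemoteStore:
    """Stand-in for the memory held by the non-zero ranks.

    In a real job fetch() would be a request/reply with the owning rank;
    here it just synthesizes the block locally.
    """

    def __init__(self, n_ranks):
        self.n_ranks = n_ranks
        self.fetches = 0

    def owner(self, block_id):
        # ranks 1..n_ranks-1 hold data; rank 0 does the computing
        return 1 + block_id % (self.n_ranks - 1)

    def fetch(self, block_id):
        self.fetches += 1
        base = block_id * BLOCK_WORDS
        return [base + i for i in range(BLOCK_WORDS)]


class SoftwareCache:
    """LRU cache of blocks kept in rank 0's local memory."""

    def __init__(self, store, capacity_blocks):
        self.store = store
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, index):
        block_id, offset = divmod(index, BLOCK_WORDS)
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)    # mark as most recently used
        else:
            self.misses += 1
            self.blocks[block_id] = self.store.fetch(block_id)
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used
        return self.blocks[block_id][offset]


store = RemoteStore(n_ranks=8)
cache = SoftwareCache(store, capacity_blocks=4)

# a serial sweep with good locality: three blocks' worth of sequential reads
total = sum(cache.read(i) for i in range(3 * BLOCK_WORDS))
print(cache.hits, cache.misses, store.fetches)
```

with a sequential sweep almost every read is a hit, which is the favourable case; a code with truly uniform random access would miss on nearly every read and pay a network round trip each time.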

> It is fairly easy (barring cost issues) to get a single system image machine
> with 8-16 processor cores and 128 GB ram. Beyond that, you need something
> like ScaleMP or a "proprietary" box to get more RAM.

I'm guessing ScaleMP is approximately the same speed as a user-level
network-shared-memory implementation, but would love to see real numbers.
regards, mark hahn.