I've been quite curious to try something like the f1200 as a potential replacement for our Altixes, which were bought predominantly for running single-threaded large-memory jobs.

we have an Altix as well, and I always cringe when I see a single-thread, large-memory job running on it. ours has 128p, 256G, and I think 6M/core caches. so a large-mem serial job, assuming uniform access across its whole footprint, would have a cache hit rate of about .00002289 (6M/256G). and in any case, there is >800 GB/s of memory bandwidth available, but at best 6.4 GB/s in use. don't forget that the it2 is a fairly strict in-order chip as well, so it does little to hide the latency of all those misses.
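
for anyone who wants to redo the arithmetic above with their own machine's figures, here is a minimal sketch. the cache and memory sizes are the ones quoted for our Altix; treating the >800 GB/s aggregate as simply 128 x 6.4 GB/s is my assumption about where that number comes from, and the variable names are made up for illustration.

```python
# back-of-envelope numbers for a 128p / 256G Altix (sizes as quoted in the post)
CORES = 128
CACHE_PER_CORE = 6 * 2**20      # 6M per core
TOTAL_MEM = 256 * 2**30         # 256G
BW_PER_CORE_GBS = 6.4           # assumption: aggregate = cores x 6.4 GB/s

# fraction of a uniformly accessed footprint that one core's cache can hold
cache_fraction = CACHE_PER_CORE / TOTAL_MEM

# bandwidth the whole machine offers vs. what one serial thread can draw
aggregate_bw = CORES * BW_PER_CORE_GBS

print(f"cache/footprint     = {cache_fraction:.8f} "
      f"(about 1/{round(1 / cache_fraction)})")
print(f"aggregate bandwidth = {aggregate_bw:.1f} GB/s, "
      f"one thread uses at most {BW_PER_CORE_GBS} GB/s")
```

the ratio comes out a bit under 1/40000, which is where that figure comes from.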

sure, perhaps a large-memory serial code has a small working set that fits in cache. but doesn't it strike you as strange to have a working set that's 1/40000 of the total footprint? I suspect that you could reformulate such a code as a "memory-extension" MPI job and avoid the need for custom hardware. (ie, let rank0 do all the work, and just operate a software cache of data fed by all the other ranks. of course, this raises the question of whether the code _has_ to be serial...)
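
to make the "memory-extension" idea a little more concrete, here is a rough single-process sketch of the software cache rank0 would run. it is not real MPI: RemoteStore is a hypothetical stand-in for the memory held by the other ranks, and its fetch() is where a receive or one-sided get from the owning rank would go. the page size, LRU policy and round-robin placement are all assumptions chosen for illustration, not a description of any existing code.

```python
from collections import OrderedDict

PAGE_WORDS = 1024  # hypothetical page size, in array elements


class RemoteStore:
    """Stand-in for the memory held by the non-zero ranks (no real MPI here)."""

    def __init__(self, n_ranks):
        self.n_ranks = n_ranks
        self.fetches = 0

    def owner(self, page):
        # assumed round-robin placement of pages on helper ranks 1..n_ranks
        return 1 + page % self.n_ranks

    def fetch(self, page):
        # in a real job this would be a message to self.owner(page)
        self.fetches += 1
        base = page * PAGE_WORDS
        return [float(base + i) for i in range(PAGE_WORDS)]  # synthetic data


class SoftwareCache:
    """LRU page cache that rank0 consults before going to the network."""

    def __init__(self, store, capacity_pages):
        self.store = store
        self.capacity = capacity_pages
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, index):
        page, offset = divmod(index, PAGE_WORDS)
        if page in self.pages:
            self.hits += 1
            self.pages.move_to_end(page)        # mark as most recently used
        else:
            self.misses += 1
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)  # evict least recently used
            self.pages[page] = self.store.fetch(page)
        return self.pages[page][offset]


store = RemoteStore(n_ranks=7)
cache = SoftwareCache(store, capacity_pages=4)
total = sum(cache.read(i) for i in range(3 * PAGE_WORDS))  # sequential sweep
print(cache.hits, cache.misses, store.fetches)
```

a sequential sweep like this one only misses once per page, so nearly every read is served locally. a code with truly uniform random access would miss on nearly every read, which is the same problem the hardware cache has.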

It is fairly easy (barring cost issues) to get a single system image machine with 8-16 processor cores and 128 GB ram. Beyond that, you need something like ScaleMP or a "proprietary" box to get more RAM.

I'm guessing ScaleMP is approximately the same speed as a user-level network-shared-memory implementation, but would love to see real numbers.

regards, mark hahn.
_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
