On Thu, 15 Jul 2010 13:03:31 -0400, Jeff Squyres wrote:
> Given the oversubscription on the existing HT links, could contention
> account for the difference? (I have no idea how HT's contention
> management works.) Meaning: if the stars line up in a given run, you
> could end up with very little/no contention and you get good
> bandwidth. But if there's a [...]
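
Not part of the original exchange, but for concreteness: one crude way to probe the
contention hypothesis is a point-to-point bandwidth test between two ranks bound to
different sockets (using whatever process-binding options the launcher provides) while
other memory traffic is running. A minimal sketch, with assumed message size and
iteration count:

/* pingpong.c -- illustration only, not code from this thread.
 * Measures round-trip bandwidth between rank 0 and rank 1; each
 * iteration moves n bytes in each direction. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    const int n = 1 << 22;      /* message size in bytes (assumed) */
    const int iters = 50;

    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc(n);
    memset(buf, 0, n);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t = MPI_Wtime() - t0;
    if (rank == 0)
        printf("bandwidth: %.1f MB/s\n", 2.0 * n * iters / t / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}

If the stars-lining-up explanation holds, repeated runs of such a test should show the
same kind of spread as the Allreduce numbers, depending on where the two ranks land.
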
On Thu, 15 Jul 2010 09:36:18 -0400, Jeff Squyres wrote:
> Per my other disclaimer, I'm trolling through my disastrous inbox and
> finding some orphaned / never-answered emails. Sorry for the delay!
No problem, I should have followed up on this with further explanation.
> Just to be clear -- you [...]
Per my other disclaimer, I'm trolling through my disastrous inbox and finding
some orphaned / never-answered emails. Sorry for the delay!
On Jun 2, 2010, at 4:36 PM, Jed Brown wrote:
> The nodes of interest are 4-socket Opteron 8380 (quad core, 2.5 GHz),
> connected with QDR InfiniBand. [...]
Following up on this, I have partial resolution. The primary culprit
appears to be stale files in a ramdisk that are non-uniformly distributed
across the sockets, thus interacting poorly with NUMA. The slow runs
invariably have high numa_miss and numa_foreign counts. I still have
trouble making it [...]
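
For context (not from the original messages): numa_miss and numa_foreign are per-node
kernel counters for allocations that ended up on a different node than the one
preferred; numastat(8) reports them. A minimal sketch that dumps them directly,
assuming the Linux sysfs layout /sys/devices/system/node/nodeN/numastat and
contiguously numbered nodes:

/* numastat_dump.c -- hypothetical helper, not part of the original thread.
 * Prints each node's NUMA counters (numa_hit, numa_miss, numa_foreign, ...)
 * so they can be compared before and after a benchmark run. */
#include <stdio.h>

int main(void)
{
    char path[64], line[128];

    for (int node = 0; ; node++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/node/node%d/numastat", node);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                    /* no more NUMA nodes */
        printf("node%d:\n", node);
        while (fgets(line, sizeof line, f))
            printf("  %s", line);     /* e.g. "numa_miss 12345" */
        fclose(f);
    }
    return 0;
}

Comparing the counters around a slow run shows which nodes are having their
allocations pushed to remote memory.
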
I'm investigating some very large performance variation and have reduced
the issue to a very simple MPI_Allreduce benchmark. The variability
does not occur for serial jobs, but it does occur within single nodes.
I'm not at all convinced that this is an Open MPI-specific issue (in
fact the same variability [...])
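
The benchmark itself is not reproduced in this excerpt; a minimal sketch of the kind
of MPI_Allreduce timing loop being described (a reconstruction with assumed buffer
size and iteration count, not the actual code from the report):

/* allreduce_bench.c -- a reconstruction for illustration, not the code
 * from the original report.  Times repeated MPI_Allreduce calls on a
 * fixed-size buffer and reports the worst per-iteration time. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int n = 1 << 20;      /* doubles per rank (assumed size) */
    const int iters = 100;

    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *in  = malloc(n * sizeof(double));
    double *out = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++)
        in[i] = rank + i;

    /* Warm up once so first-touch page faults don't pollute the timing. */
    MPI_Allreduce(in, out, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int it = 0; it < iters; it++)
        MPI_Allreduce(in, out, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double elapsed = MPI_Wtime() - t0;

    double tmax;
    MPI_Reduce(&elapsed, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("%d iterations, max time per Allreduce: %g s\n",
               iters, tmax / iters);

    free(in);
    free(out);
    MPI_Finalize();
    return 0;
}

Running this repeatedly on a single node and correlating the slow runs with the
numastat counters shown above is one way to reproduce the variability being described.
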