Jeff Squyres wrote:
> On May 18, 2009, at 11:49 PM, Bryan Lally wrote:
>
>> Ralph sent me a platform file and a corresponding .conf file. I built
>> ompi from openmpi-1.3.3a1r21223.tar.gz, with these files. I've been
>> running my normal tests and have been unable to hang a job yet. I've
>> run enough that I don't expect to see a problem.
>
> That's both good and bad. :-)

Right!

> Can you point out specifically which platform file is being used? If
> that platform file is changing something from "not working" to
> "working", it bears a bit closer examination to ensure that we aren't
> just masking a bug.

Here's what we've found. It wasn't the platform file as such. I've
since built with a plain ./configure and a few standard, obvious command
line switches. What is then required is to edit the MCA parameter file,
<prefix>/etc/openmpi-mca-params.conf, and add:

coll_sync_priority = 100
coll_sync_barrier_before = 1000
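
For anyone who wants to script that edit, here is a minimal sketch. It assumes a POSIX shell; OMPI_PREFIX and set_mca_param are hypothetical names introduced only for this illustration, and the script falls back to a scratch directory so it does not touch a real install unless OMPI_PREFIX is set to the actual <prefix>.

```shell
#!/bin/sh
# OMPI_PREFIX is a hypothetical variable standing in for <prefix>.
# When it is unset, a temporary directory is used so the sketch is safe to run.
OMPI_PREFIX="${OMPI_PREFIX:-$(mktemp -d)}"
CONF="$OMPI_PREFIX/etc/openmpi-mca-params.conf"

mkdir -p "$OMPI_PREFIX/etc"
touch "$CONF"

# Hypothetical helper: append "name = value" unless the file already
# contains an assignment for that parameter name.
set_mca_param() {
    name="$1"
    value="$2"
    if ! grep -q "^[[:space:]]*$name[[:space:]]*=" "$CONF"; then
        printf '%s = %s\n' "$name" "$value" >> "$CONF"
    fi
}

set_mca_param coll_sync_priority 100
set_mca_param coll_sync_barrier_before 1000

# Show the resulting file for a quick visual check.
cat "$CONF"
```

The same two parameters can also be tried for a single run without editing the file, either with "mpirun --mca coll_sync_priority 100 --mca coll_sync_barrier_before 1000 ..." or by exporting OMPI_MCA_coll_sync_priority and OMPI_MCA_coll_sync_barrier_before in the environment. That may be a convenient way to confirm the effect before changing the installed defaults.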
--
Bryan Lally, la...@lanl.gov
505.667.9954
CCS-2
Los Alamos National Laboratory
Los Alamos, New Mexico