Oliver,

Thank you for this summary and insight. This substantially affects the
structural design of software implementations, which points to a new
analysis "opportunity" in our software.

Ken Lloyd

-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Oliver Geisler
Sent: Thursday, April 22, 2010 9:38 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times

To sum up and give an update:

The extended communication times seen with shared memory communication
between Open MPI processes are caused by the Open MPI session directory
residing on the network via NFS.

The problem is resolved by setting up a ramdisk or mounting a tmpfs on each
diskless node. With the MCA parameter orte_tmpdir_base pointing to the
corresponding mount point, shared memory communication and its files are
kept local, which decreases the communication times by orders of magnitude.
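
As an illustration of the fix above, here is a minimal Python sketch (not
part of the original setup). It checks which filesystem type backs a
candidate session directory by parsing mount-table text in the /proc/mounts
format, and it builds an mpirun command line that passes orte_tmpdir_base
through the usual "--mca <name> <value>" form. The helper names, the
/dev/shm mount point and the ./bench program are assumptions made up for the
example.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path.

    mounts_text is expected in /proc/mounts format:
    <device> <mount point> <fs type> <options> ...
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Compare with trailing slashes so /tmp does not match /tmpfoo.
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later entry for the same mount point override an earlier one.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def mpirun_command(tmpdir, nprocs, program):
    """Build an mpirun argument list that keeps the session directory in tmpdir."""
    return ["mpirun", "--mca", "orte_tmpdir_base", tmpdir,
            "-np", str(nprocs), program]


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        mounts = f.read()
    tmpdir = "/dev/shm"  # assumed local tmpfs mount point; adjust per node
    print(tmpdir, "is on", fs_type_for(tmpdir, mounts))
    print(" ".join(mpirun_command(tmpdir, 4, "./bench")))  # ./bench is hypothetical
```

If the reported type for the chosen directory is nfs rather than tmpfs or
another local filesystem, the session directory is still on the network.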

The relation of the problem to the kernel version is not really resolved,
but the kernel may not be "the problem" in this respect.
My benchmark is now running fine on a single node with 4 CPUs, kernel
2.6.33.1 and openmpi 1.4.1.
Running on multiple nodes, I still see higher (TCP) communication times
than I would expect. But that requires some deeper research into the issue
on my side (e.g. collisions on the network) and should probably be posted
to a new thread.

Thank you guys for your help.

oli
