Re: [OMPI users] busy wait in MPI_Recv
Brian Budge wrote:
> Hi all - I just ran a small test to find out the overhead of an MPI_Recv call when no communication is occurring. It seems quite high. I noticed during my google excursions that openmpi does busy waiting. I also noticed that the -mca mpi_yield_when_idle option seems not to help much (in fact, turning on the yield seems only to slow down the program). What is the best way to reduce this polling cost during low-communication intervals? Should I write my own recv loop that sleeps for short periods? I don't want to go write something that has possibly already been done much better in the library :)

I think this has been discussed a variety of times before on this list. Yes, OMPI does busy wait. Turning on the MCA yield parameter can help some. There will still be a load, but one that defers somewhat to other loads. In any case, even with yield, a wait is still relatively intrusive. You might have some luck writing something like this yourself, particularly if you know you'll be idle for long periods.
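For illustration, a minimal sketch of such a sleep-based receive loop (not from this thread; the helper name lazy_recv, the 1 ms poll interval, and the single-threaded receiver are assumptions) polls with MPI_Iprobe and sleeps between polls, trading a little latency for a mostly idle core:

// Hypothetical sketch: block until a matching message arrives, but yield the
// CPU between polls instead of spinning inside MPI_Recv.
#include <mpi.h>
#include <time.h>

int lazy_recv(void *buf, int count, MPI_Datatype type,
              int source, int tag, MPI_Comm comm, MPI_Status *status)
{
    int flag = 0;
    struct timespec pause = {0, 1000000};   // 1 ms between polls (tunable)

    while (true) {
        MPI_Iprobe(source, tag, comm, &flag, status);
        if (flag)
            break;                  // a matching message is pending locally
        nanosleep(&pause, NULL);    // give the core back to other work
    }
    // The message has already arrived, so this MPI_Recv completes quickly.
    return MPI_Recv(buf, count, type, source, tag, comm, status);
}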
[OMPI users] busy wait in MPI_Recv
Hi all - I just ran a small test to find out the overhead of an MPI_Recv call when no communication is occurring. It seems quite high. I noticed during my google excursions that openmpi does busy waiting. I also noticed that the -mca mpi_yield_when_idle option seems not to help much (in fact, turning on the yield seems only to slow down the program). What is the best way to reduce this polling cost during low-communication intervals? Should I write my own recv loop that sleeps for short periods? I don't want to go write something that has possibly already been done much better in the library :) Thanks, Brian
Re: [OMPI users] my leak or OpenMPI's leak?
yes, sorry. I did mean 1.5. In my case, going back to 1.4.3 solved my oom problem.

On Sun, Oct 17, 2010 at 4:57 PM, Ralph Castain wrote:
> There is no OMPI 2.5 - do you mean 1.5?
>
> On Oct 17, 2010, at 4:11 PM, Brian Budge wrote:
>
>> Hi Jody -
>>
>> I noticed this exact same thing the other day when I used OpenMPI v
>> 2.5 built with valgrind support. I actually ran out of memory due to
>> this. When I went back to v 2.43, my program worked fine.
>>
>> Are you also using 2.5?
>>
>> Brian
>>
>> On Wed, Oct 6, 2010 at 4:32 AM, jody wrote:
>>> Hi
>>> I regularly use valgrind to check for leaks, but i ignore the leaks
>>> clearly created by OpenMPI, because i think most of them happen for
>>> efficiency reasons (lose no time cleaning up unimportant leaks).
>>> But i want to make sure no leaks come from my own apps.
>>> In most of the cases, leaks i am responsible for have the name of one
>>> of my files at the bottom of the stack printed by valgrind, and no
>>> internal OpenMPI calls above, whereas leaks clearly caused by OpenMPI
>>> have something like ompi_mpi_init, mca_pml_base_open, PMPI_Init etc.
>>> at or very near the bottom.
>>>
>>> Now i have an application where i am completely unsure where the
>>> responsibility for a particular leak lies. valgrind shows (among
>>> others) this report:
>>>
>>> ==2756== 9,704 (8,348 direct, 1,356 indirect) bytes in 1 blocks are
>>> definitely lost in loss record 2,033 of 2,036
>>> ==2756==    at 0x4005943: malloc (vg_replace_malloc.c:195)
>>> ==2756==    by 0x4049387: ompi_free_list_grow (in /opt/openmpi-1.4.2.p/lib/libmpi.so.0.0.2)
>>> ==2756==    by 0x41CA613: ???
>>> ==2756==    by 0x41BDD91: ???
>>> ==2756==    by 0x41B0C3D: ???
>>> ==2756==    by 0x408AC9C: PMPI_Send (in /opt/openmpi-1.4.2.p/lib/libmpi.so.0.0.2)
>>> ==2756==    by 0x8123377: ConnectorBase::send(CollectionBase*, std::pair, std::pair >&) (ConnectorBase.cpp:39)
>>> ==2756==    by 0x8123CEE: TileConnector::sendTile() (TileConnector.cpp:36)
>>> ==2756==    by 0x80C6839: TDMaster::init(int, char**) (TDMaster.cpp:226)
>>> ==2756==    by 0x80C167B: main (TDMain.cpp:24)
>>> ==2756==
>>>
>>> At a first glimpse it looks like an OpenMPI-internal leak, because it
>>> happens inside PMPI_Send, but then i am using the function
>>> ConnectorBase::send() several times from other callers than
>>> TileConnector, and these don't show up in valgrind's output.
>>>
>>> Does anybody have an idea what is happening here?
>>>
>>> Thank You
>>>   jody
Re: [OMPI users] openmpi 1.5 build from rpm fails: --program-prefix now checked in configure
Thanks for the report. Someone reported pretty much the same issue to me off-list a few days ago for RHEL5.

It looks like RHEL5 / 6 ship with Autoconf 2.63, and have a /usr/lib/rpm/macros that defines %configure to include options such as --program-suffix. We bootstrapped Open MPI v1.5 with Autoconf 2.65, which does not understand the --program-suffix option. I don't know why AC 2.65 dropped the --program-suffix option, but this seems to be where we are.

I've emailed a contact at Red Hat asking for advice on what to do here -- I can't imagine Open MPI is the only package in this situation.

On Oct 19, 2010, at 4:47 AM, livelfs wrote:

> Hi,
> this is to report that building openmpi-1.5 from rpm fails on Linux SLES10sp3 x86_64,
> due to the use of the --program-prefix switch, which is now checked by the configure
> script delivered with 1.5.
>
> rpm is version 4.4.2-43.36.1
>
> rpmbuild --rebuild SRPMS/openmpi-1.5.0.src.rpm --define 'configure_options
>   CC="/softs/gcc/4.5.1/bin/gcc" CXX="/softs/gcc/4.5.1/bin/g++"
>   F77="/softs/gcc/4.5.1/bin/gfortran" FC="/softs/gcc/4.5.1/bin/gfortran"
>   --prefix=/opt/openmpi/1.5.0/gfortran-4.5.1-gcc-4.5.1-BLCR-PBS
>   --enable-static --enable-shared
>   --with-wrapper-ldflags="-Wl,-rpath -Wl,/opt/openmpi/1.5.0/gfortran-4.5.1-gcc-4.5.1-BLCR-PBS/lib64 -Wl,-rpath -Wl,/softs/blcr/0.8/lib"
>   --with-memory-manager=ptmalloc2 --enable-orterun-prefix-by-default
>   --with-openib --disable-ipv6 --with-ft=cr --enable-ft-thread
>   --enable-mpi-threads --with-blcr=/softs/blcr/0.8
>   --enable-mpirun-prefix-by-default --with-tm=/opt/pbs/default
>   --with-wrapper-libs="-lpthread -lutil -lrt"'
>   --define '_prefix /opt/openmpi/1.5.0/gfortran-4.5.1-gcc-4.5.1-BLCR-PBS'
>   --define '_name openmpi_gfortran-4.5.1-gcc-4.5.1-BLCR-PBS'
>   --define '_topdir /scratch'
>   --define '_unpackaged_files_terminate_build 0'
>   --define 'use_default_rpm_opt_flags 0'
>
> ends with:
> [...]
> configure: WARNING: *** This configure script does not support
> --program-prefix, --program-suffix or --program-transform-name. Users
> are recommended to instead use --prefix with a unique directory and make
> symbolic links as desired for renaming.
> configure: error: *** Cannot continue
>
> In the present environment (SLES10sp3 x86_64, rpm 4.4.2-43.36.1), rpmbuild --rebuild
> produces and execs a temporary shell script calling configure with an *empty*
> --program-prefix switch (--program-prefix=).
>
> It works with openmpi 1.4.3, but the configure script from openmpi 1.5 is more picky
> about the use of --program-prefix, --program-suffix or --program-transform-name:
>
> # diff /usr/src/packages/SOURCES/openmpi-1.5/configure /usr/src/packages/SOURCES/openmpi-1.4.3/configure | grep program-prefix
> < # Suggestion from Paul Hargrove to disable --program-prefix and
> < { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: *** This configure script does not support --program-prefix, --program-suffix or --program-transform-name. Users are recommended to instead use --prefix with a unique directory and make symbolic links as desired for renaming." >&5
> < $as_echo "$as_me: WARNING: *** This configure script does not support --program-prefix, --program-suffix or --program-transform-name. Users are recommended to instead use --prefix with a unique directory and make symbolic links as desired for renaming." >&2;}
>
> If I remove the new check on --program-prefix in the openmpi-1.5 configure script,
> the 1.5 build becomes OK.
>
> Regards,
> Stephane Rouberol

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
[OMPI users] Open MPI dynamic data structure error
Hi, I need to design a data structure to transfer data between nodes on an Open MPI system. Some elements of the structure have dynamic size. For example:

typedef struct {
    double data1;
    vector dataVec;
} myDataType;

The size of dataVec depends on some intermediate computing results. If I only declare it as the above myDataType, I think only a pointer is transferred. When the data receiver tries to access the elements of vector dataVec, it gets a segmentation fault error. But I also need to use myDataType to declare other data structures, such as vector newDataVec; I cannot declare myDataType in a function such as main() -- I get errors: main.cpp:200: error: main(int, char**)::myDataType; uses local type main(int, char**)::myDataType; Any help is really appreciated. Thanks, Jack Oct. 19 2010
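A common pattern for this kind of structure, shown below as a minimal hypothetical sketch (it is not from this thread; it assumes the vector holds doubles, and the function names and tags are invented), is to declare the struct at namespace scope -- which avoids the "uses local type" error -- and to serialize it explicitly: send the fixed-size members and the vector length first, then the vector's contiguous storage.

// Sketch only: ship a struct whose payload is a std::vector by sending the
// length ahead of the data so the receiver can resize before receiving.
#include <mpi.h>
#include <vector>

struct myDataType {            // declared at file scope, not inside main()
    double data1;
    std::vector<double> dataVec;
};

void sendMyData(myDataType &d, int dest, MPI_Comm comm)
{
    int n = static_cast<int>(d.dataVec.size());
    MPI_Send(&d.data1, 1, MPI_DOUBLE, dest, 0, comm);
    MPI_Send(&n, 1, MPI_INT, dest, 1, comm);
    if (n > 0)                 // vector storage is contiguous, send it directly
        MPI_Send(&d.dataVec[0], n, MPI_DOUBLE, dest, 2, comm);
}

myDataType recvMyData(int src, MPI_Comm comm)
{
    myDataType d;
    int n = 0;
    MPI_Recv(&d.data1, 1, MPI_DOUBLE, src, 0, comm, MPI_STATUS_IGNORE);
    MPI_Recv(&n, 1, MPI_INT, src, 1, comm, MPI_STATUS_IGNORE);
    d.dataVec.resize(n);       // allocate before receiving the payload
    if (n > 0)
        MPI_Recv(&d.dataVec[0], n, MPI_DOUBLE, src, 2, comm, MPI_STATUS_IGNORE);
    return d;
}

Packing everything into one buffer with MPI_Pack/MPI_Unpack would be an alternative if the extra messages per struct become a concern.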
Re: [OMPI users] a question about [MPI]IO on systems without network filesystem
As Rob mentions, there are three capabilities to consider:

1) The process (or processes) that will do the I/O are members of the file handle's hidden communicator and the call is collective.
2) The process (or processes) that will do the I/O are members of the file handle's hidden communicator but the call is non-collective and made by a remote rank.
3) The process (or processes) that will do the I/O are not members.

The MPI_COMM_SELF mention would probably be this second case. Numbers 2 & 3 are harder but still an implementation option. The standard does not require or prohibit them.

Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846  Fax (845) 433-8363

From: Rob Latham
To: Open MPI Users
Date: 10/19/2010 02:47 PM
Subject: Re: [OMPI users] a question about [MPI]IO on systems without network filesystem
Sent by: users-boun...@open-mpi.org

On Thu, Sep 30, 2010 at 09:00:31AM -0400, Richard Treumann wrote:
> It is possible for MPI-IO to be implemented in a way that lets a single
> process or the set of processes on a node act as the disk I/O agents for the
> entire job, but someone else will need to tell you if OpenMPI can do this.
> I think OpenMPI is built on the ROMIO MPI-IO implementation, and based on my
> outdated knowledge of ROMIO, I would be a bit surprised if it has this option.

SURPRISE!!! ROMIO has been able to do this since about 2002 (it was my first ROMIO project when I came to Argonne).

Now, if you do independent I/O or you do I/O on comm_self, then ROMIO can't really do anything for you. But...

- if you use collective I/O
- and you set "cb_config_list" to contain the machine name of the one node with a disk (or, if everyone has a disk, pick one to be the master)
- and you set "romio_no_indep_rw" to "enable"

then two things will happen. First, ROMIO will enter "deferred open" mode, meaning only the designated I/O aggregators will open the file. Second, your collective MPI_File_*_all calls will all go through the one node you gave in the cb_config_list.

Try it, and if it does/doesn't work, I'd like to hear.

==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
Re: [OMPI users] a question about [MPI]IO on systems without network filesystem
On Thu, Sep 30, 2010 at 09:00:31AM -0400, Richard Treumann wrote:
> It is possible for MPI-IO to be implemented in a way that lets a single
> process or the set of processes on a node act as the disk I/O agents for the
> entire job, but someone else will need to tell you if OpenMPI can do this.
> I think OpenMPI is built on the ROMIO MPI-IO implementation, and based on my
> outdated knowledge of ROMIO, I would be a bit surprised if it has this option.

SURPRISE!!! ROMIO has been able to do this since about 2002 (it was my first ROMIO project when I came to Argonne).

Now, if you do independent I/O or you do I/O on comm_self, then ROMIO can't really do anything for you. But...

- if you use collective I/O
- and you set "cb_config_list" to contain the machine name of the one node with a disk (or, if everyone has a disk, pick one to be the master)
- and you set "romio_no_indep_rw" to "enable"

then two things will happen. First, ROMIO will enter "deferred open" mode, meaning only the designated I/O aggregators will open the file. Second, your collective MPI_File_*_all calls will all go through the one node you gave in the cb_config_list.

Try it, and if it does/doesn't work, I'd like to hear.

==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
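Put into code, the recipe above could look roughly like this sketch (the aggregator host name "node0", the file name, and the write pattern are placeholders; the hint values follow Rob's description):

// Sketch: funnel all collective I/O through one node via ROMIO hints.
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Info info;
    MPI_Info_create(&info);
    // Only "node0" (the node with the disk) acts as an I/O aggregator.
    MPI_Info_set(info, "cb_config_list", "node0:1");
    // Forbid independent I/O so ROMIO can use deferred open.
    MPI_Info_set(info, "romio_no_indep_rw", "enable");

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    double value = 42.0;
    MPI_Offset offset = (MPI_Offset)rank * sizeof(double);
    // Collective call: ranks without a disk forward their data to node0.
    MPI_File_write_at_all(fh, offset, &value, 1, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}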
Re: [OMPI users] Number of processes and spawn
The fix should be there - it just didn't get mentioned. Let me know if it isn't and I'll ensure it is in the next one...but I'd be very surprised if it isn't already in there.

On Oct 19, 2010, at 3:03 AM, Federico Golfrè Andreasi wrote:

> Hi Ralph!
>
> I saw that the new release 1.5 is out. I didn't find this fix in the "list of changes"; is it present but not mentioned, since it is a minor fix?
>
> Thank you,
> Federico
>
> 2010/4/1 Ralph Castain
> Hi there!
>
> It will be in the 1.5.0 release, but not 1.4.2 (couldn't backport the fix). I understand that will come out sometime soon, but no firm date has been set.
>
> On Apr 1, 2010, at 4:05 AM, Federico Golfrè Andreasi wrote:
>
>> Hi Ralph,
>>
>> I've downloaded and tested the openmpi-1.7a1r22817 snapshot, and it works fine for (multiple) spawning more than 128 processes.
>>
>> That fix will be included in the next release of OpenMPI, right? Do you know when it will be released? Or where can I find that info?
>>
>> Thank you,
>> Federico
>>
>> 2010/3/1 Ralph Castain
>> http://www.open-mpi.org/nightly/trunk/
>>
>> I'm not sure this patch will solve your problem, but it is worth a try.
Re: [OMPI users] Number of processes and spawn
Hi Ralph!

I saw that the new release 1.5 is out. I didn't find this fix in the "list of changes"; is it present but not mentioned, since it is a minor fix?

Thank you,
Federico

2010/4/1 Ralph Castain
> Hi there!
>
> It will be in the 1.5.0 release, but not 1.4.2 (couldn't backport the fix). I understand that will come out sometime soon, but no firm date has been set.
>
> On Apr 1, 2010, at 4:05 AM, Federico Golfrè Andreasi wrote:
>
> Hi Ralph,
>
> I've downloaded and tested the openmpi-1.7a1r22817 snapshot, and it works fine for (multiple) spawning more than 128 processes.
>
> That fix will be included in the next release of OpenMPI, right? Do you know when it will be released? Or where can I find that info?
>
> Thank you,
> Federico
>
> 2010/3/1 Ralph Castain
>
>> http://www.open-mpi.org/nightly/trunk/
>>
>> I'm not sure this patch will solve your problem, but it is worth a try.
[OMPI users] openmpi 1.5 build from rpm fails: --program-prefix now checked in configure
Hi,

this is to report that building openmpi-1.5 from rpm fails on Linux SLES10sp3 x86_64, due to the use of the --program-prefix switch, which is now checked by the configure script delivered with 1.5.

rpm is version 4.4.2-43.36.1

rpmbuild --rebuild SRPMS/openmpi-1.5.0.src.rpm --define 'configure_options
  CC="/softs/gcc/4.5.1/bin/gcc" CXX="/softs/gcc/4.5.1/bin/g++"
  F77="/softs/gcc/4.5.1/bin/gfortran" FC="/softs/gcc/4.5.1/bin/gfortran"
  --prefix=/opt/openmpi/1.5.0/gfortran-4.5.1-gcc-4.5.1-BLCR-PBS
  --enable-static --enable-shared
  --with-wrapper-ldflags="-Wl,-rpath -Wl,/opt/openmpi/1.5.0/gfortran-4.5.1-gcc-4.5.1-BLCR-PBS/lib64 -Wl,-rpath -Wl,/softs/blcr/0.8/lib"
  --with-memory-manager=ptmalloc2 --enable-orterun-prefix-by-default
  --with-openib --disable-ipv6 --with-ft=cr --enable-ft-thread
  --enable-mpi-threads --with-blcr=/softs/blcr/0.8
  --enable-mpirun-prefix-by-default --with-tm=/opt/pbs/default
  --with-wrapper-libs="-lpthread -lutil -lrt"'
  --define '_prefix /opt/openmpi/1.5.0/gfortran-4.5.1-gcc-4.5.1-BLCR-PBS'
  --define '_name openmpi_gfortran-4.5.1-gcc-4.5.1-BLCR-PBS'
  --define '_topdir /scratch'
  --define '_unpackaged_files_terminate_build 0'
  --define 'use_default_rpm_opt_flags 0'

ends with:

[...]
configure: WARNING: *** This configure script does not support
--program-prefix, --program-suffix or --program-transform-name. Users
are recommended to instead use --prefix with a unique directory and make
symbolic links as desired for renaming.
configure: error: *** Cannot continue

In the present environment (SLES10sp3 x86_64, rpm 4.4.2-43.36.1), rpmbuild --rebuild produces and execs a temporary shell script calling configure with an *empty* --program-prefix switch (--program-prefix=).

It works with openmpi 1.4.3, but the configure script from openmpi 1.5 is more picky about the use of --program-prefix, --program-suffix or --program-transform-name:

# diff /usr/src/packages/SOURCES/openmpi-1.5/configure /usr/src/packages/SOURCES/openmpi-1.4.3/configure | grep program-prefix
< # Suggestion from Paul Hargrove to disable --program-prefix and
< { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: *** This configure script does not support --program-prefix, --program-suffix or --program-transform-name. Users are recommended to instead use --prefix with a unique directory and make symbolic links as desired for renaming." >&5
< $as_echo "$as_me: WARNING: *** This configure script does not support --program-prefix, --program-suffix or --program-transform-name. Users are recommended to instead use --prefix with a unique directory and make symbolic links as desired for renaming." >&2;}

If I remove the new check on --program-prefix in the openmpi-1.5 configure script, the 1.5 build becomes OK.

Regards,
Stephane Rouberol