Re: [OMPI users] busy wait in MPI_Recv

2010-10-19 Thread Eugene Loh

Brian Budge wrote:


Hi all -

I just ran a small test to find out the overhead of an MPI_Recv call
when no communication is occurring.   It seems quite high.  I noticed
during my google excursions that openmpi does busy waiting.  I also
noticed that the option to -mca mpi_yield_when_idle seems not to help
much (in fact, turning on the yield seems only to slow down the
program).  What is the best way to reduce this polling cost during
low-communication intervals?  Should I write my own recv loop that
sleeps for short periods?  I don't want to go write something that is
possibly already done much better in the library :)
 


I think this has been discussed a variety of times before on this list.

Yes, OMPI does busy wait.

Turning on the MCA yield parameter can help some.  There will still be a 
load, but one that defers somewhat to other loads.  In any case, even 
with yield, a wait is still relatively intrusive.
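
(For reference, the yield behavior is enabled with something like
"mpirun --mca mpi_yield_when_idle 1 ./your_app", or by setting the
corresponding MCA parameter in the environment, e.g.
OMPI_MCA_mpi_yield_when_idle=1.)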


You might have some luck writing something like this yourself, 
particularly if you know you'll be idle for long periods.
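
For example, a rough, untested sketch of such a loop: poll with MPI_Iprobe
and nap between polls, then do the real receive once a message is pending.
The 1 ms nap length is an arbitrary choice and bounds the extra latency per
message.

#include <mpi.h>
#include <time.h>

/* Blocking receive that sleeps between polls instead of spinning. */
static int lazy_recv(void *buf, int count, MPI_Datatype type,
                     int src, int tag, MPI_Comm comm, MPI_Status *status)
{
    int flag = 0;
    struct timespec nap = { 0, 1000000 };   /* 1 ms */

    while (!flag) {
        MPI_Iprobe(src, tag, comm, &flag, status);
        if (!flag)
            nanosleep(&nap, NULL);   /* give the CPU back while idle */
    }
    return MPI_Recv(buf, count, type, src, tag, comm, status);
}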


[OMPI users] busy wait in MPI_Recv

2010-10-19 Thread Brian Budge
Hi all -

I just ran a small test to find out the overhead of an MPI_Recv call
when no communication is occurring.   It seems quite high.  I noticed
during my google excursions that openmpi does busy waiting.  I also
noticed that the option to -mca mpi_yield_when_idle seems not to help
much (in fact, turning on the yield seems only to slow down the
program).  What is the best way to reduce this polling cost during
low-communication intervals?  Should I write my own recv loop that
sleeps for short periods?  I don't want to go write something that is
possibly already done much better in the library :)

Thanks,
  Brian


Re: [OMPI users] my leak or OpenMPI's leak?

2010-10-19 Thread Brian Budge
Yes, sorry, I did mean 1.5.  In my case, going back to 1.4.3 solved my
OOM problem.

On Sun, Oct 17, 2010 at 4:57 PM, Ralph Castain  wrote:
> There is no OMPI 2.5 - do you mean 1.5?
>
> On Oct 17, 2010, at 4:11 PM, Brian Budge wrote:
>
>> Hi Jody -
>>
>> I noticed this exact same thing the other day when I used OpenMPI v
>> 2.5 built with valgrind support.  I actually ran out of memory due to
>> this.  When I went back to v 2.43, my program worked fine.
>>
>> Are you also using 2.5?
>>
>>  Brian
>>
>> On Wed, Oct 6, 2010 at 4:32 AM, jody  wrote:
>>> Hi
>>> I regularly use valgrind to check for leaks, but I ignore the leaks
>>> clearly created by OpenMPI, because I think most of them are there for
>>> efficiency reasons (no time is lost cleaning up unimportant allocations).
>>> But I want to make sure no leaks come from my own apps.
>>> In most cases, leaks I am responsible for have the name of one of my
>>> files at the bottom of the stack printed by valgrind, and no internal
>>> OpenMPI calls above it, whereas leaks clearly caused by OpenMPI have
>>> something like ompi_mpi_init, mca_pml_base_open, PMPI_Init etc. at or
>>> very near the bottom.
>>>
>>> Now I have an application where I am completely unsure where the
>>> responsibility for a particular leak lies. Valgrind shows (among
>>> others) this report:
>>>
>>> ==2756== 9,704 (8,348 direct, 1,356 indirect) bytes in 1 blocks are
>>> definitely lost in loss record 2,033 of 2,036
>>> ==2756==    at 0x4005943: malloc (vg_replace_malloc.c:195)
>>> ==2756==    by 0x4049387: ompi_free_list_grow (in
>>> /opt/openmpi-1.4.2.p/lib/libmpi.so.0.0.2)
>>> ==2756==    by 0x41CA613: ???
>>> ==2756==    by 0x41BDD91: ???
>>> ==2756==    by 0x41B0C3D: ???
>>> ==2756==    by 0x408AC9C: PMPI_Send (in
>>> /opt/openmpi-1.4.2.p/lib/libmpi.so.0.0.2)
>>> ==2756==    by 0x8123377: ConnectorBase::send(CollectionBase*,
>>> std::pair,
>>> std::pair >&) (ConnectorBase.cpp:39)
>>> ==2756==    by 0x8123CEE: TileConnector::sendTile() (TileConnector.cpp:36)
>>> ==2756==    by 0x80C6839: TDMaster::init(int, char**) (TDMaster.cpp:226)
>>> ==2756==    by 0x80C167B: main (TDMain.cpp:24)
>>> ==2756==
>>>
>>> At first glance it looks like an OpenMPI-internal leak,
>>> because it happens inside PMPI_Send, but then again I am using the
>>> function ConnectorBase::send() several times from callers other than
>>> TileConnector, and those calls don't show up in valgrind's output.
>>>
>>> Does anybody have an idea what is happening here?
>>>
>>> Thank You
>>> jody



Re: [OMPI users] openmpi 1.5 build from rpm fails: --program-prefix now checked in configure

2010-10-19 Thread Jeff Squyres
Thanks for the report.  Someone reported pretty much the same issue to me 
off-list a few days ago for RHEL5.  

It looks like RHEL5 / 6 ship with Autoconf 2.63, and have a /usr/lib/rpm/macros 
that defines %configure to include options such as --program-suffix.  We 
bootstrapped Open MPI v1.5 with Autoconf 2.65, which does not understand the 
--program-suffix option.

I don't know why AC 2.65 dropped the --program-suffix option, but this seems to 
be where we are.
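
(One way to see exactly what a given distro's %configure macro injects is to
run "rpm --eval '%configure'" on the build host; the --program-prefix and
--program-suffix arguments show up in that expansion.)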

I've emailed a contact at Red Hat asking for advice on what to do here -- I 
can't imagine Open MPI is the only package in this situation.




On Oct 19, 2010, at 4:47 AM, livelfs wrote:

> Hi
> this is to report that building openmpi-1.5 from rpm fails on Linux
> SLES10sp3 x86_64, because use of the --program-prefix switch is now
> rejected by the configure script delivered with 1.5.
> 
> rpm is version 4.4.2-43.36.1
> 
> rpmbuild --rebuild SRPMS/openmpi-1.5.0.src.rpm --define
> 'configure_options  CC="/softs/gcc/4.5.1/bin/gcc
> "  CXX="/softs/gcc/4.5.1/bin/g++
> "  F77="/softs/gcc/4.5.1/bin/gfortran
> "  FC="/softs/gcc/4.5.1/bin/gfortran "
> --prefix=/opt/openmpi/1.5.0/gfortran-4.5.1-gcc-4.5.1-BLCR-PBS
>  --enable-static --enable-shared
> --with-wrapper-ldflags="-Wl,-rpath
> -Wl,/opt/openmpi/1.5.0/gfortran-4.5.1-gcc-4.5.1-BLCR-PBS/lib64
> -Wl,-rpath -Wl,/softs/blcr/0.8/lib"
> --with-memory-manager=ptmalloc2
> --enable-orterun-prefix-by-default --with-openib
>  --disable-ipv6 --with-ft=cr
> --enable-ft-thread --enable-mpi-threads
> --with-blcr=/softs/blcr/0.8
> --enable-mpirun-prefix-by-default
> --with-tm=/opt/pbs/default
> --with-wrapper-libs="-lpthread -lutil -lrt"' --define '_prefix
> /opt/openmpi/1.5.0/gfortran-4.5.1-gcc-4.5.1-BLCR-PBS' --define '_name
> openmpi_gfortran-4.5.1-gcc-4.5.1-BLCR-PBS' --define '_topdir /scratch'
> --define '_unpackaged_files_terminate_build 0' --define
> 'use_default_rpm_opt_flags 0'
> 
> ends with:
> [...]
> configure: WARNING: *** This configure script does not support
> --program-prefix, --program-suffix or --program-transform-name. Users
> are recommended to instead use --prefix with a unique directory and make
> symbolic links as desired for renaming.
> configure: error: *** Cannot continue
> 
> 
> In the present environment (SLES10sp3 x86_64, rpm 4.4.2-43.36.1),
> rpmbuild --rebuild produces and execs a temporary shell script calling
> configure
> with an *empty* --program-prefix switch (--program-prefix=).
> 
> It works with openmpi 1.4.3, but the configure script from openmpi 1.5 is
> stricter about --program-prefix, --program-suffix and
> --program-transform-name:
> 
> #  diff /usr/src/packages/SOURCES/openmpi-1.5/configure
> /usr/src/packages/SOURCES/openmpi-1.4.3/configure | grep program-prefix
> < # Suggestion from Paul Hargrove to disable --program-prefix and
> < { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: *** This
> configure script does not support --program-prefix, --program-suffix or
> --program-transform-name. Users are recommended to instead use --prefix
> with a unique directory and make symbolic links as desired for
> renaming." >&5
> < $as_echo "$as_me: WARNING: *** This configure script does not support
> --program-prefix, --program-suffix or --program-transform-name. Users
> are recommended to instead use --prefix with a unique directory and make
> symbolic links as desired for renaming." >&2;}
> 
> If I remove the new check on --program-prefix from the openmpi-1.5 configure
> script, the 1.5 build completes successfully.
> 
> Regards,
> Stephane Rouberol
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI users] Open MPI dynamic data structure error

2010-10-19 Thread Jack Bryan

Hi, 
I need to design a data structure to transfer data between nodes on an Open MPI 
system. 
Some elements of the structure have dynamic size. 
For example:

typedef struct {
    double data1;
    vector dataVec;
} myDataType;

The size of dataVec depends on some intermediate computing results.
If I only declare it as the above myDataType, I think only a pointer is 
transferred. 
When the data receiver tries to access the elements of vector dataVec, it 
gets a segmentation fault.
But I also need to use myDataType to declare other data structures, 
such as vector newDataVec;
I cannot declare myDataType in a function such as main(); I get errors: 
 main.cpp:200: error: main(int, char**)::myDataType; uses local type main(int, 
char**)::myDataType;

Any help is really appreciated. 
thanks
Jack
Oct. 19 2010
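
One common workaround, sketched below under the assumption that dataVec holds
doubles, is to send the variable-length part as separate messages: first the
element count, then the raw buffer. The struct also has to be defined at
namespace scope (not inside main) before it can be used as a template
argument in C++03.

#include <mpi.h>
#include <vector>

// Defined at namespace scope so std::vector<myDataType> is legal in C++03.
struct myDataType {
    double data1;
    std::vector<double> dataVec;   // element type assumed to be double
};

// Send the fixed part, then the element count, then the contiguous buffer.
void sendMyData(const myDataType &d, int dest, MPI_Comm comm)
{
    int n = static_cast<int>(d.dataVec.size());
    MPI_Send(const_cast<double *>(&d.data1), 1, MPI_DOUBLE, dest, 0, comm);
    MPI_Send(&n, 1, MPI_INT, dest, 1, comm);
    if (n > 0)   // const_cast because older MPI prototypes take non-const void*
        MPI_Send(const_cast<double *>(&d.dataVec[0]), n, MPI_DOUBLE, dest, 2, comm);
}

// Receive in the same order, resizing the vector once the count is known.
void recvMyData(myDataType &d, int src, MPI_Comm comm)
{
    int n = 0;
    MPI_Recv(&d.data1, 1, MPI_DOUBLE, src, 0, comm, MPI_STATUS_IGNORE);
    MPI_Recv(&n, 1, MPI_INT, src, 1, comm, MPI_STATUS_IGNORE);
    d.dataVec.resize(n);
    if (n > 0)
        MPI_Recv(&d.dataVec[0], n, MPI_DOUBLE, src, 2, comm, MPI_STATUS_IGNORE);
}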
  

Re: [OMPI users] a question about [MPI]IO on systems without network filesystem

2010-10-19 Thread Richard Treumann
As Rob mentions

There are three capabilities to consider:

1) The process (or processes) that will do the I/O are members of the file 
handle's hidden communicator and the call is collective

2) The process (or processes) that will do the I/O are members of the 
file handle's hidden communicator, but the call is non-collective and made 
by a remote rank

3) The process (or processes) that will do the I/O are not members of that 
communicator.  The MPI_COMM_SELF mention would probably be this last case.

Numbers 2 and 3 are harder, but still an implementation option.  The standard 
neither requires nor prohibits them.


Dick Treumann  -  MPI Team 
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363




From: Rob Latham 
To: Open MPI Users 
Date: 10/19/2010 02:47 PM
Subject: Re: [OMPI users] a question about [MPI]IO on systems without network filesystem
Sent by: users-boun...@open-mpi.org



On Thu, Sep 30, 2010 at 09:00:31AM -0400, Richard Treumann wrote:
> It is possible for MPI-IO to be implemented in a way that lets a single 
> process or the set of processes on a node act as the disk I/O agents for the 
> entire job, but someone else will need to tell you if OpenMPI can do this. 
> I think OpenMPI is built on the ROMIO MPI-IO implementation and, based on my 
> outdated knowledge of ROMIO, I would be a bit surprised if it has this 
> option.

SURPRISE!!!  ROMIO has been able to do this since about 2002 (It was
my first ROMIO project when I came to Argonne).

Now, if you do independent I/O or you do I/O on COMM_SELF, then ROMIO
can't really do anything for you. 

But... 
- if you use collective I/O 
- and you set the "cb_config_list" to contain the machine name of the
  one node with a disk (or if everyone has a disk, pick one to be the
  master)
- and you set "romio_no_indep_rw" to "enable"

Then two things will happen.  First, ROMIO will enter "deferred open"
mode, meaning only the designated I/O aggregators will open the file.
Second, your collective MPI_File_*_all calls will all go through the
one node you gave in the cb_config_list.

Try it and if it does/doesn't work, I'd like to hear. 

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA




Re: [OMPI users] a question about [MPI]IO on systems without network filesystem

2010-10-19 Thread Rob Latham
On Thu, Sep 30, 2010 at 09:00:31AM -0400, Richard Treumann wrote:
> It is possible for MPI-IO to be implemented in a way that lets a single 
> process or the set of processes on a node act as the disk I/O agents for the 
> entire job, but someone else will need to tell you if OpenMPI can do this. 
> I think OpenMPI is built on the ROMIO MPI-IO implementation and, based on my 
> outdated knowledge of ROMIO, I would be a bit surprised if it has this 
> option.

SURPRISE!!!  ROMIO has been able to do this since about 2002 (It was
my first ROMIO project when I came to Argonne).

Now, if you do independent I/O or you do I/O on COMM_SELF, then ROMIO
can't really do anything for you.  

But... 
- if you use collective I/O 
- and you set the "cb_config_list" to contain the machine name of the
  one node with a disk (or if everyone has a disk, pick one to be the
  master)
- and you set "romio_no_indep_rw" to "enable"

Then two things will happen.  First, ROMIO will enter "deferred open"
mode, meaning only the designated I/O aggregators will open the file.
Second, your collective MPI_File_*_all calls will all go through the
one node you gave in the cb_config_list.
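
In code, those hints go on the MPI_Info object passed to MPI_File_open. A
rough sketch follows; the host name "ionode01" and the file name are
placeholders, and the hint values are the ones described above.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Info info;
    MPI_Info_create(&info);
    /* the one node with a disk acts as the sole I/O aggregator */
    MPI_Info_set(info, "cb_config_list", "ionode01:1");
    /* deferred open: only aggregators actually open the file */
    MPI_Info_set(info, "romio_no_indep_rw", "enable");

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    double val = (double) rank;
    /* collective write: data is shipped to the aggregator, which does the I/O */
    MPI_File_write_at_all(fh, (MPI_Offset) rank * sizeof(double),
                          &val, 1, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}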

Try it and if it does/doesn't work, I'd like to hear.  

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA


Re: [OMPI users] Number of processes and spawn

2010-10-19 Thread Ralph Castain
The fix should be there - just didn't get mentioned.

Let me know if it isn't and I'll ensure it is in the next one...but I'd be very 
surprised if it isn't already in there.


On Oct 19, 2010, at 3:03 AM, Federico Golfrè Andreasi wrote:

> Hi Ralph!
> 
> I saw that the new release 1.5 is out. 
> I didn't find this fix in the "list of changes"; is it present but not 
> mentioned since it is a minor fix?
> 
> Thank you,
> Federico
> 
> 
> 
> 2010/4/1 Ralph Castain 
> Hi there!
> 
> It will be in the 1.5.0 release, but not 1.4.2 (couldn't backport the fix). I 
> understand that will come out sometime soon, but no firm date has been set.
> 
> 
> On Apr 1, 2010, at 4:05 AM, Federico Golfrè Andreasi wrote:
> 
>> Hi Ralph,
>> 
>> 
>>  I've downloaded and tested the openmpi-1.7a1r22817 snapshot,
>> and it works fine for (multiple) spawning more than 128 processes.
>> 
>> That fix will be included in the next release of OpenMPI, right?
>> Do you know when it will be released? Or where can I find that info?
>> 
>> Thank you,
>>  Federico
>> 
>> 
>> 
>> 2010/3/1 Ralph Castain 
>> http://www.open-mpi.org/nightly/trunk/
>> 
>> I'm not sure this patch will solve your problem, but it is worth a try.
>> 
>> 
>> 
>> 



Re: [OMPI users] Number of processes and spawn

2010-10-19 Thread Federico Golfrè Andreasi
Hi Ralph!

I saw that the new release 1.5 is out.
I didn't find this fix in the "list of changes"; is it present but not
mentioned since it is a minor fix?

Thank you,
Federico



2010/4/1 Ralph Castain 

> Hi there!
>
> It will be in the 1.5.0 release, but not 1.4.2 (couldn't backport the fix).
> I understand that will come out sometime soon, but no firm date has been
> set.
>
>
> On Apr 1, 2010, at 4:05 AM, Federico Golfrè Andreasi wrote:
>
> Hi Ralph,
>
>
>  I've downloaded and tested the openmpi-1.7a1r22817 snapshot,
> and it works fine for (multiple) spawning more than 128 processes.
>
> That fix will be included in the next release of OpenMPI, right?
> Do you know when it will be released? Or where can I find that info?
>
> Thank you,
>  Federico
>
>
>
> 2010/3/1 Ralph Castain 
>
>> http://www.open-mpi.org/nightly/trunk/
>>
>> I'm not sure this patch will solve your problem, but it is worth a try.
>>
>>
>>
>>


[OMPI users] openmpi 1.5 build from rpm fails: --program-prefix now checked in configure

2010-10-19 Thread livelfs
Hi
this is to report that building openmpi-1.5 from rpm fails on Linux
SLES10sp3 x86_64, because use of the --program-prefix switch is now
rejected by the configure script delivered with 1.5.

rpm is version 4.4.2-43.36.1

rpmbuild --rebuild SRPMS/openmpi-1.5.0.src.rpm --define
'configure_options  CC="/softs/gcc/4.5.1/bin/gcc
"  CXX="/softs/gcc/4.5.1/bin/g++
"  F77="/softs/gcc/4.5.1/bin/gfortran
"  FC="/softs/gcc/4.5.1/bin/gfortran "
--prefix=/opt/openmpi/1.5.0/gfortran-4.5.1-gcc-4.5.1-BLCR-PBS
  --enable-static --enable-shared
--with-wrapper-ldflags="-Wl,-rpath
-Wl,/opt/openmpi/1.5.0/gfortran-4.5.1-gcc-4.5.1-BLCR-PBS/lib64
-Wl,-rpath -Wl,/softs/blcr/0.8/lib"
--with-memory-manager=ptmalloc2
--enable-orterun-prefix-by-default --with-openib
  --disable-ipv6 --with-ft=cr
--enable-ft-thread --enable-mpi-threads
--with-blcr=/softs/blcr/0.8
--enable-mpirun-prefix-by-default
--with-tm=/opt/pbs/default
--with-wrapper-libs="-lpthread -lutil -lrt"' --define '_prefix
/opt/openmpi/1.5.0/gfortran-4.5.1-gcc-4.5.1-BLCR-PBS' --define '_name
openmpi_gfortran-4.5.1-gcc-4.5.1-BLCR-PBS' --define '_topdir /scratch'
--define '_unpackaged_files_terminate_build 0' --define
'use_default_rpm_opt_flags 0'

ends with:
[...]
configure: WARNING: *** This configure script does not support
--program-prefix, --program-suffix or --program-transform-name. Users
are recommended to instead use --prefix with a unique directory and make
symbolic links as desired for renaming.
configure: error: *** Cannot continue


In the present environment (SLES10sp3 x86_64, rpm 4.4.2-43.36.1),
rpmbuild --rebuild produces and execs a temporary shell script calling
configure
with an *empty* --program-prefix switch (--program-prefix=).

It works with openmpi 1.4.3, but the configure script from openmpi 1.5 is
stricter about --program-prefix, --program-suffix and
--program-transform-name:

#  diff /usr/src/packages/SOURCES/openmpi-1.5/configure
/usr/src/packages/SOURCES/openmpi-1.4.3/configure | grep program-prefix
< # Suggestion from Paul Hargrove to disable --program-prefix and
< { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: *** This
configure script does not support --program-prefix, --program-suffix or
--program-transform-name. Users are recommended to instead use --prefix
with a unique directory and make symbolic links as desired for
renaming." >&5
< $as_echo "$as_me: WARNING: *** This configure script does not support
--program-prefix, --program-suffix or --program-transform-name. Users
are recommended to instead use --prefix with a unique directory and make
symbolic links as desired for renaming." >&2;}

If I remove the new check on --program-prefix from the openmpi-1.5 configure
script, the 1.5 build completes successfully.

Regards,
Stephane Rouberol