Re: [OMPI devel] shmem error msg

2011-07-25 Thread Samuel K. Gutierrez

Hi Ralph,


On Jul 25, 2011, at 11:05 AM, Ralph Castain wrote:



On Jul 25, 2011, at 10:16 AM, Samuel K. Gutierrez wrote:


Hi Ralph,

It seems as if this issue is related to a missing shm_unlink  
wrapper within Valgrind.  I'm going to disable posix by default and  
commit later today.


Is that the right solution?


No, not really.

If the problem is something in valgrind, then let's not disable  
something just for their problem. Is there a way we can wrap it  
ourselves so the error doesn't cause the message?


I think so.  They outline the procedure in  
README_MISSING_SYSCALL_OR_IOCTL, so I'll take a look.


Stay tuned,

Sam



Like I said, everything worked just fine - the message just implied  
the proc would die, and it doesn't.
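For reference, errno 78 on Mac OS X is ENOSYS, i.e. the shm_unlink syscall simply isn't emulated under valgrind.  A minimal sketch of one way the posix component could swallow that case instead of printing the help message (hypothetical helper, not the fix that was actually committed):

#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical helper: unlink a POSIX shared memory object, but treat
 * ENOSYS (e.g. valgrind lacking an shm_unlink wrapper) as non-fatal,
 * since the segment is already attached and the job runs fine anyway. */
int sm_unlink_tolerant(const char *name)
{
    if (0 == shm_unlink(name)) {
        return 0;
    }
    if (ENOSYS == errno) {
        /* Syscall not emulated; not worth an "abort or degrade" warning. */
        fprintf(stderr, "shm_unlink(%s): not implemented here; continuing\n",
                name);
        return 0;
    }
    perror("shm_unlink");
    return -1;
}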




Thanks,
--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Jul 23, 2011, at 8:54 PM, Samuel K. Gutierrez wrote:


Hi Ralph,

That's mine - I'll take a look.

Thanks,

Sam

Whenever I run valgrind on orterun (or any OMPI tool), I get the following
error msg:

--
A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.

  Local host:  Ralph
  System call: shm_unlink(2)
  Error:       Function not implemented (errno 78)
--

It's coming out of open-rte/help-opal-shmem-posix.txt.

Everything continues, so I'm not sure what this is all about.  Anyone
recognize this???

It's on the trunk, running on a Mac, vanilla configure.
Ralph


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] shmem error msg

2011-07-23 Thread Samuel K. Gutierrez
Hi Ralph,

That's mine - I'll take a look.

Thanks,

Sam

> Whenever I run valgrind on orterun (or any OMPI tool), I get the following
> error msg:
>
> --
> A system call failed during shared memory initialization that should
> not have.  It is likely that your MPI job will now either abort or
> experience performance degradation.
>
>   Local host:  Ralph
>   System call: shm_unlink(2)
>   Error:   Function not implemented (errno 78)
> --
>
> It's coming out of open-rte/help-opal-shmem-posix.txt.
>
> Everything continues, so I'm not sure what this is all about. Anyone
> recognize this???
>
> It's on the trunk, running on a Mac, vanilla configure.
> Ralph
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>



Re: [OMPI devel] RFC: Bring in Shared Memory Backing Facility Framework (shmem)

2011-06-21 Thread Samuel K. Gutierrez

In r24795.

Thanks,

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Jun 15, 2011, at 10:01 AM, Samuel K. Gutierrez wrote:


WHAT:
Bring in new shared memory backing facility framework (shmem) and  
its components.  shmem is simply a framework for the manipulation of  
shared memory segments (create, attach, detach, unlink, etc).


WHY:
The use of shared memory is probably going to start poking up in  
other parts of Open MPI, so this simply provides the needed  
infrastructure to facilitate that work.


WHERE:

See: https://bitbucket.org/samuelkgutierrez/orte_shmem

Additions:
opal/mca/shmem

Other Modifications:
M   opal/runtime/opal_init.c
M   opal/runtime/opal_params.c
M   opal/runtime/opal_finalize.c
M   ompi/tools/ompi_info/ompi_info.c
M   ompi/tools/ompi_info/components.c
M   ompi/mca/btl/sm/btl_sm_component.c
M   ompi/mca/mpool/sm/mpool_sm_module.c
!   ompi/mca/common/sm/common_sm_mmap.c
M   ompi/mca/common/sm/common_sm_rml.c
!   ompi/mca/common/sm/common_sm_windows.c
!   ompi/mca/common/sm/common_sm_mmap.h
M   ompi/mca/common/sm/common_sm_rml.h
!   ompi/mca/common/sm/common_sm_windows.h
!   ompi/mca/common/sm/common_sm_posix.c
!   ompi/mca/common/sm/common_sm_sysv.c
M   ompi/mca/common/sm/help-mpi-common-sm.txt
!   ompi/mca/common/sm/common_sm_posix.h
M   ompi/mca/common/sm/configure.m4
!   ompi/mca/common/sm/common_sm_sysv.h
M   ompi/mca/common/sm/common_sm.c
M   ompi/mca/common/sm/Makefile.am
M   ompi/mca/common/sm/common_sm.h
M   ompi/mca/coll/sm/coll_sm_component.c
M   ompi/mca/coll/sm/coll_sm_module.c
M   orte/mca/odls/base/odls_base_default_fns.c
M   orte/tools/orte-info/orte-info.c
M   orte/tools/orte-info/components.c

WHEN:
Before 1.7.

TIMEOUT:
Teleconference, Tues 21 June 2011

Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory





___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




[OMPI devel] RFC: Bring in Shared Memory Backing Facility Framework (shmem)

2011-06-15 Thread Samuel K. Gutierrez

WHAT:
Bring in new shared memory backing facility framework (shmem) and its  
components.  shmem is simply a framework for the manipulation of  
shared memory segments (create, attach, detach, unlink, etc).
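For context, the kind of interface such a framework abstracts looks roughly like the sketch below; the names are illustrative only and are not the actual opal/mca/shmem API:

#include <stddef.h>

/* Illustrative sketch of a shared memory backing facility interface:
 * each component (mmap, posix, sysv, windows) supplies these operations
 * and the rest of the code base calls them without caring which backing
 * facility sits underneath. */
typedef struct demo_shmem_segment {
    void  *base;     /* address the segment is attached at */
    size_t size;     /* segment size in bytes */
    int    handle;   /* fd or System V shmid, component specific */
} demo_shmem_segment_t;

typedef struct demo_shmem_ops {
    int   (*create)(demo_shmem_segment_t *seg, const char *name, size_t size);
    void *(*attach)(demo_shmem_segment_t *seg);
    int   (*detach)(demo_shmem_segment_t *seg);
    int   (*unlink)(demo_shmem_segment_t *seg);
} demo_shmem_ops_t;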


WHY:
The use of shared memory is probably going to start poking up in other  
parts of Open MPI, so this simply provides the needed infrastructure  
to facilitate that work.


WHERE:

See: https://bitbucket.org/samuelkgutierrez/orte_shmem

Additions:
opal/mca/shmem

Other Modifications:
M   opal/runtime/opal_init.c
M   opal/runtime/opal_params.c
M   opal/runtime/opal_finalize.c
M   ompi/tools/ompi_info/ompi_info.c
M   ompi/tools/ompi_info/components.c
M   ompi/mca/btl/sm/btl_sm_component.c
M   ompi/mca/mpool/sm/mpool_sm_module.c
!   ompi/mca/common/sm/common_sm_mmap.c
M   ompi/mca/common/sm/common_sm_rml.c
!   ompi/mca/common/sm/common_sm_windows.c
!   ompi/mca/common/sm/common_sm_mmap.h
M   ompi/mca/common/sm/common_sm_rml.h
!   ompi/mca/common/sm/common_sm_windows.h
!   ompi/mca/common/sm/common_sm_posix.c
!   ompi/mca/common/sm/common_sm_sysv.c
M   ompi/mca/common/sm/help-mpi-common-sm.txt
!   ompi/mca/common/sm/common_sm_posix.h
M   ompi/mca/common/sm/configure.m4
!   ompi/mca/common/sm/common_sm_sysv.h
M   ompi/mca/common/sm/common_sm.c
M   ompi/mca/common/sm/Makefile.am
M   ompi/mca/common/sm/common_sm.h
M   ompi/mca/coll/sm/coll_sm_component.c
M   ompi/mca/coll/sm/coll_sm_module.c
M   orte/mca/odls/base/odls_base_default_fns.c
M   orte/tools/orte-info/orte-info.c
M   orte/tools/orte-info/components.c

WHEN:
Before 1.7.

TIMEOUT:
Teleconference, Tues 21 June 2011

Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory







Re: [OMPI devel] 1.4.4rc2 is up

2011-05-18 Thread Samuel K. Gutierrez
Here is the 'pgCC -V' output from versions that I have access to.

$ pgCC -V

pgCC 7.1-6 64-bit target on x86-64 Linux -tp gh-64 
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2007, STMicroelectronics, Inc.  All Rights Reserved.


$ pgCC -V

pgCC 9.0-3 64-bit target on x86-64 Linux -tp gh-64 
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2009, STMicroelectronics, Inc.  All Rights Reserved.


$ pgCC -V

pgCC 10.3-0 64-bit target on x86-64 Linux -tp istanbul-64 
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2010, STMicroelectronics, Inc.  All Rights Reserved.

--
Samuel Gutierrez
Los Alamos National Laboratory


On May 18, 2011, at 12:34 PM, Paul H. Hargrove wrote:

> Below is a sampling of "pgCC -V" outputs in response to Jeff's question.
> The complete output looks like:
> 
> $ pgCC -V
> 
> pgCC 11.1-0 64-bit target on x86-64 Linux -tp nehalem
> Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
> Copyright 2000-2011, STMicroelectronics, Inc.  All Rights Reserved.
> 
> Including the initial blank line.
> 
> Here is the "important" line for a range of versions I can currently access:
> 
> pgCC 7.2-5 64-bit target on x86-64 Linux -tp gh-64
> pgCC 8.0-6 64-bit target on x86-64 Linux -tp gh-64
> pgCC 9.0-3 64-bit target on x86-64 Linux -tp nehalem-64
> pgCC 10.8-0 64-bit target on x86-64 Linux -tp nehalem-64
> pgCC 11.1-0 64-bit target on x86-64 Linux -tp nehalem
> 
> I am afraid my system w/ 5.x and 6.x versions was retired last month (not 
> joking).
> However, I found the following output for the C (not C++) compiler in my bug 
> database:
> 
> pgcc 6.0-8 32-bit target on x86-64 Linux
> 
> And for their MacOSX port, there is a wrinkle.  As anybody who has dealt w/ 
> mpicc vs mpiCC knows, Apple's filesystem is case PRESERVING but 
> case-insensitive.  So, there, PGI's C++ compiler is "pgcpp" and the -V output 
> (also from my bug database) looks like:
> 
> pgcpp 7.1-5 64-bit target on Apple OS/X
> 
> 
> -Paul
> 
> 
> On 5/18/2011 5:50 AM, Jeff Squyres wrote:
>> (adding libtool-patc...@gnu.org)
>> 
>> Is this guaranteed to work for all versions of the PGI compiler?  I.e., does 
>> "pgCC -V" always return something in the form of (digit)+\. ?
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> HPC Research Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Too many open files (24)

2011-03-30 Thread Samuel K. Gutierrez
Hi Tim,

Great news!  Happy calculating :-).

--
Samuel K. Gutierrez
Los Alamos National Laboratory

> Dear Samuel,
>
> Just as you replied I was trying that on the compute nodes. Surprise,
> surprise...the value returned as the hard and soft limits is 1024.
>
> Thanks for confirming my suspicions...
>
> Regards,
>
> Tim.
>
> On Mar 30, 2011, at 7:41 PM, Samuel K. Gutierrez wrote:
>
> Hi,
>
> It sounds like Open MPI is hitting your system's open file descriptor
> limit.  If that's the case, one potential workaround is to have your
> system administrator raise file descriptor limits.
>
> On a compute node, what does "ulimit -a" show (using bash)?
>
> Hope that helps,
>
> --
> Samuel K. Gutierrez
> Los Alamos National Laboratory
>
> On Mar 30, 2011, at 5:22 PM, Timothy Stitt wrote:
>
> Dear OpenMPI developers,
>
> One of our users was running a benchmark on a 1032 core simulation. He had
> a successful run at 900 cores but when he stepped up to 1032 cores the job
> just stalled and his logs contained many occurrences of the following
> line:
>
> [d6copt368.crc.nd.edu][[25621,1],0][btl_tcp_component.c:885:mca_btl_tcp_component_accept_handler]
> accept() failed: Too many open files (24)
>
> The simulation has a single master task that communicates with all the
> other tasks to write out some I/O via the master. We are assuming the
> message is related to this bottleneck. Is there a 1024 limit on the number
> of open files/connections for instance?
>
> Can anyone confirm the meaning of this error and secondly provide a
> resolution that hopefully doesn't involve a code rewrite.
>
> Thanks in advance,
>
> Tim.
>
> Tim Stitt PhD (User Support Manager).
> Center for Research Computing | University of Notre Dame |
> P.O. Box 539, Notre Dame, IN 46556 | Phone:  574-631-5287 | Email:
> tst...@nd.edu<mailto:tst...@nd.edu>
>
> ___
> devel mailing list
> de...@open-mpi.org<mailto:de...@open-mpi.org>
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> 
>
> Tim Stitt PhD (User Support Manager).
> Center for Research Computing | University of Notre Dame |
> P.O. Box 539, Notre Dame, IN 46556 | Phone:  574-631-5287 | Email:
> tst...@nd.edu<mailto:tst...@nd.edu>
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] Too many open files (24)

2011-03-30 Thread Samuel K. Gutierrez

Hi,

It sounds like Open MPI is hitting your system's open file descriptor  
limit.  If that's the case, one potential workaround is to have your  
system administrator raise file descriptor limits.


On a compute node, what does "ulimit -a" show (using bash)?
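For illustration (the usual fix is still a limits.conf or scheduler setting applied by the administrator), a process can also inspect its descriptor limit and raise the soft limit up to the hard limit with getrlimit/setrlimit:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) { perror("getrlimit"); return 1; }
    printf("soft=%llu hard=%llu\n",
           (unsigned long long)rl.rlim_cur, (unsigned long long)rl.rlim_max);

    /* Raise the soft limit to the hard limit.  Only a privileged process
     * can raise the hard limit itself, which is why a 1024/1024 setting
     * needs the system administrator. */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) { perror("setrlimit"); return 1; }
    return 0;
}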

Hope that helps,

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Mar 30, 2011, at 5:22 PM, Timothy Stitt wrote:


Dear OpenMPI developers,

One of our users was running a benchmark on a 1032 core simulation.  
He had a successful run at 900 cores but when he stepped up to 1032  
cores the job just stalled and his logs contained many occurrences  
of the following line:


[d6copt368.crc.nd.edu][[25621,1],0][btl_tcp_component.c:885:mca_btl_tcp_component_accept_handler]
accept() failed: Too many open files (24)


The simulation has a single master task that communicates with all  
the other tasks to write out some I/O via the master. We are  
assuming the message is related to this bottleneck. Is there a 1024  
limit on the number of open files/connections for instance?


Can anyone confirm the meaning of this error and secondly provide a  
resolution that hopefully doesn't involve a code rewrite.


Thanks in advance,

Tim.

Tim Stitt PhD (User Support Manager).
Center for Research Computing | University of Notre Dame |
P.O. Box 539, Notre Dame, IN 46556 | Phone:  574-631-5287 | Email: tst...@nd.edu

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Threading

2010-10-12 Thread Samuel K. Gutierrez
Same here.

--
Samuel K. Gutierrez
Los Alamos National Laboratory


> On Oct 11, 2010, at 11:41 PM, Ralph Castain wrote:
>
>> Does anyone know of a reason why mpirun can -not- be threaded, assuming
>> that all threads block and do not continuously chew cpu? Is there an
>> environment where this would cause a problem?
>
> We don't have any machines at Sandia where I could see this being a
> problem.
>
> Brian
>
> --
>   Brian W. Barrett
>   Dept. 1423: Scalable System Software
>   Sandia National Laboratories
>
>
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>


Re: [OMPI devel] Question regarding recently common shared-memory component

2010-09-21 Thread Samuel K. Gutierrez

Hi,

Just to be clear - do you see similar checkpoint performance  
differences in 1.5rc6 and 1.4.2 with and without shared memory enabled?


Thanks,

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Sep 21, 2010, at 9:35 AM, <ananda.mu...@wipro.com> wrote:



Hello Samuel
This problem seems to be resolved after I moved to r23781. However,  
I see another discrepancy in checkpoint image creation time when I  
disable shared memory (--mca btl self,tcp,openib) vs using it. I  
mean the time to create checkpoint image for this simple program is  
about 0.4 seconds if I disable shared memory while it is close to  
6.5 seconds when I use shared memory component. I have not seen this  
behavior earlier. Do I have to tune any other parameter to reduce  
the time?

Thanks
Ananda
Hi Ananda,

This issue should be resolved in r23781. Please let me know if it is
not.

Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory
On Sep 20, 2010, at 11:26 AM, <ananda.mudar_at_[hidden]> wrote:
> I have used following options to build:
> ./configure CC=/usr/bin/gcc CXX=/usr/bin/c++ F77=/usr/bin/gfortran
> FC=/usr/bin/gfortran --prefix /users/amudar/openmpi-1.7 --with-tm=/usr/local/pbs
> --with-openib --with-threads=posix --enable-mpi-thread-multiple
> --enable-ft-thread --enable-debug --with-ft=cr --with-blcr=/usr/blcr
> --with-blcr-libdir=/usr/blcr/lib
>
> Also please note that this is with the r23756 build.
>
> Let me know if you need any other information.
>
> Thanks
> Ananda
> Let me take a look at it. How did you configure your build?
> Thanks,
>
> --
> Samuel K. Gutierrez
> Los Alamos National Laboratory
> On Sep 20, 2010, at 10:14 AM, <ananda.mudar_at_[hidden]> wrote:
> > Hi
> >
> > I believe the new common shared memory component was committed to
> > the trunk sometime towards the later part of August. I had not  
tried
> > this trunk version until last week and I have seen some  
discrepancy

> > with this component specifically related to checkpoint
> > functionality. I am not able to checkpoint any program with the
> > latest trunk version. Am I missing something here? Should I be  
using

> > any other options to enable checkpoint functionality for shared
> > memory component?
> >
> > However if I disable shared memory component and use only self,  
tcp,

> > and openib (--mca btl self,tcp,openib), I can checkpoint
> > successfully!!
> >
> > Following are the options I have used with mpirun:
> >
> > mpirun -am ft-enable-cr --mca opal_cr_enable_timer 1 --mca
> > sstore_stage_global_is_shared 1 --mca
> > sstore_base_global_snapshot_dir /scratch/hpl005/UIT_test/amudar/ 
FWI

> > --mca mpi_paffinity_alone 1  -np 32 -hostfile hostfile-32 ../
> hellompi
> >
> > Please note that hellompi is a very simple program without any
> > collective calls. When I issue checkpoint, this program fails with
> > the following messages:
> >
> > hplcnlj158:13937] Signal: Segmentation fault (11)
> > [hplcnlj158:13937] Signal code: Address not mapped (1)
> > [hplcnlj158:13937] Failing at address: 0x2aaa0001
> > [hplcnlj158:13937] [ 0] /lib64/libpthread.so.0 [0x2b4019a064c0]
> > [hplcnlj158:13937] [ 1] /users/amudar/openmpi-1.7/lib/
> > libmca_common_sm.so.0(mca_common_sm_param_register+0x262)
> > [0x2d96628a]
> > [hplcnlj158:13937] [ 2] /users/amudar/openmpi-1.7/lib/openmpi/
> > mca_btl_sm.so [0x2f0a55e8]
> > [hplcnlj158:13937] [ 3] /users/amudar/openmpi-1.7/lib/libmpi.so.0
> > [0x2b4018c3c11b]
> > [hplcnlj158:13937] [ 4] /users/amudar/openmpi-1.7/lib/libmpi.so.
> > 0(mca_base_components_open+0x3ef) [0x2b4018c3b70b]
> > [hplcnlj158:13937] [ 5] /users/amudar/openmpi-1.7/lib/libmpi.so.
> > 0(mca_btl_base_open+0xfd) [0x2b4018b620fe]
> > [hplcnlj158:13937] [ 6] /users/amudar/openmpi-1.7/lib/openmpi/
> > mca_bml_r2.so [0x2dd9e4fb]
> > [hplcnlj158:13937] [ 7] /users/amudar/openmpi-1.7/lib/openmpi/
> > mca_pml_ob1.so [0x2e5fa429]
> > [hplcnlj158:13937] [ 8] /users/amudar/openmpi-1.7/lib/openmpi/
> > mca_pml_crcpw.so [0x2dfadce6]
> > [hplcnlj158:13937] [ 9] /users/amudar/openmpi-1.7/lib/libmpi.so.0
> > [0x2b4018b01a0d]
> > [hplcnlj158:13937] [10] /users/amudar/openmpi-1.7/lib/libmpi.so.
> > 0(ompi_cr_coord+0xc0) [0x2b4018b017ba]
> > [hplcnlj158:13937] [11] /users/amudar/openmpi-1.7/lib/libmpi.so.
> > 0(opal_cr_inc_core_recover+0xed) [0x2b4018c0efab]
> > [hplcnlj158:13937] [12] /users/amudar/openmpi-1.7/lib/openmpi/
> > mca_snapc_full.so [0x2bd280fc]
> > [hplcnlj158:13937

Re: [OMPI devel] Question regarding recently common shared-memory component

2010-09-20 Thread Samuel K. Gutierrez

Hi Ananda,

This issue should be resolved in r23781.  Please let me know if it is  
not.


Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Sep 20, 2010, at 11:26 AM, <ananda.mu...@wipro.com> wrote:



I have used following options to build:
./configure CC=/usr/bin/gcc CXX=/usr/bin/c++ F77=/usr/bin/gfortran
FC=/usr/bin/gfortran --prefix /users/amudar/openmpi-1.7 --with-tm=/usr/local/pbs
--with-openib --with-threads=posix --enable-mpi-thread-multiple
--enable-ft-thread --enable-debug --with-ft=cr --with-blcr=/usr/blcr
--with-blcr-libdir=/usr/blcr/lib


Also please note that this is with the r23756 build.

Let me know if you need any other information.

Thanks
Ananda
Let me take a look at it. How did you configure your build?
Thanks,

--
Samuel K. Gutierrez
Los Alamos National Laboratory
On Sep 20, 2010, at 10:14 AM, <ananda.mudar_at_[hidden]> wrote:
> Hi
>
> I believe the new common shared memory component was committed to
> the trunk sometime towards the later part of August. I had not tried
> this trunk version until last week and I have seen some discrepancy
> with this component specifically related to checkpoint
> functionality. I am not able to checkpoint any program with the
> latest trunk version. Am I missing something here? Should I be using
> any other options to enable checkpoint functionality for shared
> memory component?
>
> However if I disable shared memory component and use only self, tcp,
> and openib (--mca btl self,tcp,openib), I can checkpoint
> successfully!!
>
> Following are the options I have used with mpirun:
>
> mpirun -am ft-enable-cr --mca opal_cr_enable_timer 1 --mca
> sstore_stage_global_is_shared 1 --mca
> sstore_base_global_snapshot_dir /scratch/hpl005/UIT_test/amudar/FWI
> --mca mpi_paffinity_alone 1  -np 32 -hostfile hostfile-32 ../hellompi

>
> Please note that hellompi is a very simple program without any
> collective calls. When I issue checkpoint, this program fails with
> the following messages:
>
> hplcnlj158:13937] Signal: Segmentation fault (11)
> [hplcnlj158:13937] Signal code: Address not mapped (1)
> [hplcnlj158:13937] Failing at address: 0x2aaa0001
> [hplcnlj158:13937] [ 0] /lib64/libpthread.so.0 [0x2b4019a064c0]
> [hplcnlj158:13937] [ 1] /users/amudar/openmpi-1.7/lib/
> libmca_common_sm.so.0(mca_common_sm_param_register+0x262)
> [0x2d96628a]
> [hplcnlj158:13937] [ 2] /users/amudar/openmpi-1.7/lib/openmpi/
> mca_btl_sm.so [0x2f0a55e8]
> [hplcnlj158:13937] [ 3] /users/amudar/openmpi-1.7/lib/libmpi.so.0
> [0x2b4018c3c11b]
> [hplcnlj158:13937] [ 4] /users/amudar/openmpi-1.7/lib/libmpi.so.
> 0(mca_base_components_open+0x3ef) [0x2b4018c3b70b]
> [hplcnlj158:13937] [ 5] /users/amudar/openmpi-1.7/lib/libmpi.so.
> 0(mca_btl_base_open+0xfd) [0x2b4018b620fe]
> [hplcnlj158:13937] [ 6] /users/amudar/openmpi-1.7/lib/openmpi/
> mca_bml_r2.so [0x2dd9e4fb]
> [hplcnlj158:13937] [ 7] /users/amudar/openmpi-1.7/lib/openmpi/
> mca_pml_ob1.so [0x2e5fa429]
> [hplcnlj158:13937] [ 8] /users/amudar/openmpi-1.7/lib/openmpi/
> mca_pml_crcpw.so [0x2dfadce6]
> [hplcnlj158:13937] [ 9] /users/amudar/openmpi-1.7/lib/libmpi.so.0
> [0x2b4018b01a0d]
> [hplcnlj158:13937] [10] /users/amudar/openmpi-1.7/lib/libmpi.so.
> 0(ompi_cr_coord+0xc0) [0x2b4018b017ba]
> [hplcnlj158:13937] [11] /users/amudar/openmpi-1.7/lib/libmpi.so.
> 0(opal_cr_inc_core_recover+0xed) [0x2b4018c0efab]
> [hplcnlj158:13937] [12] /users/amudar/openmpi-1.7/lib/openmpi/
> mca_snapc_full.so [0x2bd280fc]
> [hplcnlj158:13937] [13] /users/amudar/openmpi-1.7/lib/libmpi.so.
> 0(opal_cr_test_if_checkpoint_ready+0x11b) [0x2b4018c0ecd3]
> [hplcnlj158:13937] [14] /users/amudar/openmpi-1.7/lib/libmpi.so.0
> [0x2b4018c0f6e7]
> [hplcnlj158:13937] [15] /lib64/libpthread.so.0 [0x2b40199fe367]
> [hplcnlj158:13937] [16] /lib64/libc.so.6(clone+0x6d)  
[0x2b4019ce5f7d]

> [hplcnlj158:13937] *** End of error message ***
> [hplcnlj161:00637] *** Process received signal ***
> [hplcnlj161:00637] Signal: Segmentation fault (11)
> [hplcnlj161:00637] Signal code: Address not mapped (1)
> [hplcnlj161:00637] Failing at address: 0x2aaa0001
> [hplcnlj161:00649] *** Process received signal ***
> [hplcnlj161:00649] Signal: Segmentation fault (11)
> [hplcnlj161:00649] Signal code: Address not mapped (1)
> [hplcnlj161:00649] Failing at address: 0x2aaa0001
> /users/amudar/Fix_for_pidinuse/cr_restart: line 5: 14012
> Segmentation fault  /usr/blcr/bin/cr_restart --no-restore-pid  
"$@"

> [hplcnlj161:00643] *** Process received signal ***
> [hplcnlj161:00643] Signal: Segmentation fault (11)
> [hplcnlj161:00643] Signal code: Address not mapped (1)
> [hplcnlj161:00643] Failing at 

Re: [OMPI devel] Question regarding recently common shared-memory component

2010-09-20 Thread Samuel K. Gutierrez

Let me take a look at it.  How did you configure your build?

Thanks,

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Sep 20, 2010, at 10:14 AM, <ananda.mu...@wipro.com> wrote:



Hi

I believe the new common shared memory component was committed to  
the trunk sometime towards the later part of August. I had not tried  
this trunk version until last week and I have seen some discrepancy  
with this component specifically related to checkpoint  
functionality. I am not able to checkpoint any program with the  
latest trunk version. Am I missing something here? Should I be using  
any other options to enable checkpoint functionality for shared  
memory component?


However if I disable shared memory component and use only self, tcp,  
and openib (--mca btl self,tcp,openib), I can checkpoint  
successfully!!


Following are the options I have used with mpirun:

mpirun -am ft-enable-cr --mca opal_cr_enable_timer 1 --mca  
sstore_stage_global_is_shared 1 --mca  
sstore_base_global_snapshot_dir /scratch/hpl005/UIT_test/amudar/FWI  
--mca mpi_paffinity_alone 1  -np 32 -hostfile hostfile-32 ../hellompi


Please note that hellompi is a very simple program without any  
collective calls. When I issue checkpoint, this program fails with  
the following messages:


hplcnlj158:13937] Signal: Segmentation fault (11)
[hplcnlj158:13937] Signal code: Address not mapped (1)
[hplcnlj158:13937] Failing at address: 0x2aaa0001
[hplcnlj158:13937] [ 0] /lib64/libpthread.so.0 [0x2b4019a064c0]
[hplcnlj158:13937] [ 1] /users/amudar/openmpi-1.7/lib/libmca_common_sm.so.0(mca_common_sm_param_register+0x262) [0x2d96628a]
[hplcnlj158:13937] [ 2] /users/amudar/openmpi-1.7/lib/openmpi/mca_btl_sm.so [0x2f0a55e8]
[hplcnlj158:13937] [ 3] /users/amudar/openmpi-1.7/lib/libmpi.so.0 [0x2b4018c3c11b]
[hplcnlj158:13937] [ 4] /users/amudar/openmpi-1.7/lib/libmpi.so.0(mca_base_components_open+0x3ef) [0x2b4018c3b70b]
[hplcnlj158:13937] [ 5] /users/amudar/openmpi-1.7/lib/libmpi.so.0(mca_btl_base_open+0xfd) [0x2b4018b620fe]
[hplcnlj158:13937] [ 6] /users/amudar/openmpi-1.7/lib/openmpi/mca_bml_r2.so [0x2dd9e4fb]
[hplcnlj158:13937] [ 7] /users/amudar/openmpi-1.7/lib/openmpi/mca_pml_ob1.so [0x2e5fa429]
[hplcnlj158:13937] [ 8] /users/amudar/openmpi-1.7/lib/openmpi/mca_pml_crcpw.so [0x2dfadce6]
[hplcnlj158:13937] [ 9] /users/amudar/openmpi-1.7/lib/libmpi.so.0 [0x2b4018b01a0d]
[hplcnlj158:13937] [10] /users/amudar/openmpi-1.7/lib/libmpi.so.0(ompi_cr_coord+0xc0) [0x2b4018b017ba]
[hplcnlj158:13937] [11] /users/amudar/openmpi-1.7/lib/libmpi.so.0(opal_cr_inc_core_recover+0xed) [0x2b4018c0efab]
[hplcnlj158:13937] [12] /users/amudar/openmpi-1.7/lib/openmpi/mca_snapc_full.so [0x2bd280fc]
[hplcnlj158:13937] [13] /users/amudar/openmpi-1.7/lib/libmpi.so.0(opal_cr_test_if_checkpoint_ready+0x11b) [0x2b4018c0ecd3]
[hplcnlj158:13937] [14] /users/amudar/openmpi-1.7/lib/libmpi.so.0 [0x2b4018c0f6e7]

[hplcnlj158:13937] [15] /lib64/libpthread.so.0 [0x2b40199fe367]
[hplcnlj158:13937] [16] /lib64/libc.so.6(clone+0x6d) [0x2b4019ce5f7d]
[hplcnlj158:13937] *** End of error message ***
[hplcnlj161:00637] *** Process received signal ***
[hplcnlj161:00637] Signal: Segmentation fault (11)
[hplcnlj161:00637] Signal code: Address not mapped (1)
[hplcnlj161:00637] Failing at address: 0x2aaa0001
[hplcnlj161:00649] *** Process received signal ***
[hplcnlj161:00649] Signal: Segmentation fault (11)
[hplcnlj161:00649] Signal code: Address not mapped (1)
[hplcnlj161:00649] Failing at address: 0x2aaa0001
/users/amudar/Fix_for_pidinuse/cr_restart: line 5: 14012  
Segmentation fault  /usr/blcr/bin/cr_restart --no-restore-pid "$@"

[hplcnlj161:00643] *** Process received signal ***
[hplcnlj161:00643] Signal: Segmentation fault (11)
[hplcnlj161:00643] Signal code: Address not mapped (1)
[hplcnlj161:00643] Failing at address: 0x2aaa0001
[hplcnlj161:00640] *** Process received signal ***
[hplcnlj161:00640] Signal: Segmentation fault (11)
[hplcnlj161:00640] Signal code: Address not mapped (1)
[hplcnlj161:00640] Failing at address: 0x2aaa0001
[hplcnlj161:00636] *** Process received signal ***
[hplcnlj161:00652] *** Process received signal ***
[hplcnlj161:00652] Signal: Segmentation fault (11)
[hplcnlj161:00652] Signal code: Address not mapped (1)
[hplcnlj161:00652] Failing at address: 0x2aaa0001
[hplcnlj161:00636] Signal: Segmentation fault (11)
[hplcnlj161:00636] Signal code: Address not mapped (1)
[hplcnlj161:00636] Failing at address: 0x2aaa0001
[hplcnlj161:00637] [ 0] /lib64/libpthread.so.0 [0x2b86c74694c0]
[hplcnlj161:00637] [ 1] /users/amudar/openmpi-1.7/lib/libmca_common_sm.so.0(mca_common_sm_param_register+0x262) [0x2d96628a]
[hplcnlj161:00637] [ 2] /users/amudar/openmpi-1.7/lib/openmpi/mca_btl_sm.so [0x2f0a55e8]
[hplcnlj161:00637] [ 3] /users/amudar/openmpi-1.7/lib/libmpi.so.0 [0x2b86c669f11b]
[hplcnlj161:00637] [ 4] /users/

Re: [OMPI devel] common_sm_mmap.c: wrong args to orte_show_help() (1.5rc5 and 1.4.3rc1)

2010-08-26 Thread Samuel K. Gutierrez

Will do.

Sam

On Aug 26, 2010, at 2:08 PM, Jeff Squyres wrote:


I think Sam already submitted CMR's for 1.5:

 https://svn.open-mpi.org/trac/ompi/ticket/2545

Sam -- can you construct an equivalent for v1.4 and CC Paul so that  
he knows not to follow up on it?


Thanks!


On Aug 26, 2010, at 3:56 PM, Paul H. Hargrove wrote:

The warnings below have appeared in some of my other testing  
results.  However, I now know what they correspond to.


In both 1.5rc5 and 1.4.3rc1 there are two calls to orte_show_help()  
that are passing orte_process_info.nodename as the third argument,  
where a _Bool is expected.  It looks to me as if the third argument  
is actually just missing from these 2 calls.


-Paul

For 1.4.3rc1:

"../../../../../ompi/mca/common/sm/common_sm_mmap.c", line 111.41:  
1506-280 (W) Function argument assignment between types "_Bool" and  
"char*" is not allowed.
"../../../../../ompi/mca/common/sm/common_sm_mmap.c", line 136.45:  
1506-280 (W) Function argument assignment between types "_Bool" and  
"char*" is not allowed.


For 1.5rc5:

"../../../../../ompi/mca/common/sm/common_sm_mmap.c", line 110.41:  
1506-280 (W) Function argument assignment between types "_Bool" and  
"char*" is not allowed.
"../../../../../ompi/mca/common/sm/common_sm_mmap.c", line 135.45:  
1506-280 (W) Function argument assignment between types "_Bool" and  
"char*" is not allowed.


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Trunk Commit Heads-up: New Common Shared Memory Component

2010-08-12 Thread Samuel K. Gutierrez
Sorry, I should have included the link containing the discussion of  
the plot.


http://www.open-mpi.org/community/lists/devel/2010/06/8078.php

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Aug 12, 2010, at 11:20 AM, Terry Dontje wrote:

Sorry Rich, I didn't realize there was a graph attached at the end  
of message.  In other words my comments are not applicable because I  
really didn't know you were asking about the graph.  I agree it  
would be nice to know what the graph was plotting.


--td
Terry Dontje wrote:


Graham, Richard L. wrote:


Stupid question:
   What is being plotted, and what are the units ?

Rich

MB of Resident and Shared memory as reported by top (on Linux).
The values for each of the process counts run seem to be the same
between posix, mmap and sysv.


--td

On 8/11/10 3:15 PM, "Samuel K. Gutierrez" <sam...@lanl.gov> wrote:

Hi Terry,










On Aug 11, 2010, at 12:34 PM, Terry Dontje wrote:


 I've done some minor testing on Linux looking at resident and  
shared memory sizes for np=4, 8 and 16 jobs.  I could not see any  
appreciable differences in sizes in the process between sysv,  
posix or mmap usage in the SM btl.


 So I am still somewhat non-plussed about making this the  
default.  It seems like the biggest gain out of using posix might  
be one doesn't need to worry about the location of the backing  
file.  This seems kind of frivolous to me since there is a warning  
that happens if the backing file is not memory based.


If I'm not mistaken, the warning is only issued if the backing
file is stored on the following file systems: Lustre, NFS,
Panasas, and GPFS (see: opal_path_nfs in opal/util/path.c).
Based on the performance numbers that Sylvain provided on June 9th
of this year (see attached), there was a performance difference
between mmap on /tmp and mmap on a tmpfs-like file system (/dev/shm
in that particular case).  Using the new POSIX component
provides us with a portable way to provide similar shared memory
performance gains without having to worry about where the OMPI
session directory is rooted.


--
Samuel K. Gutierrez
Los Alamos National Laboratory

 [attachment: performance comparison plot referenced above]


 I'm still working on testing the code on Solaris but I don't
imagine I will see anything that will change my mind.


 --td

 Samuel K. Gutierrez wrote:
Hi Rich,

 It's a modification to the existing common sm component.  The  
modifications do include the addition of a new POSIX shared memory  
facility, however.


 Sam

 On Aug 11, 2010, at 10:05 AM, Graham, Richard L. wrote:


Is this a modification of the existing component, or a new  
component ?


 Rich


 On 8/10/10 10:52 AM, "Samuel K. Gutierrez" <sam...@lanl.gov> wrote:


 Hi,

 I wanted to give everyone a heads-up about a new POSIX shared  
memory

 component
 that has been in the works for a while now and is ready to be  
pushed

 into the
 trunk.

 http://bitbucket.org/samuelkgutierrez/ompi_posix_sm_new

 Some highlights:
 o New posix component now the new default.
o May address some of the shared memory performance issues  
users

 encounter
   when OMPI's session directories are inadvertently  
placed on a non-

 local
   filesystem.
 o Silent component failover.
o In the default case, if the posix component fails  
initialization,

   mmap will be selected.
 o The sysv component will only be queried for selection if it is
 placed before
the mmap component (for example, -mca mpi_common_sm
 sysv,posix,mmap).  In the
default case, sysv will never be queried/selected.
 o Per some on-list discussion, now unlinking mmaped file in both  
mmap

 and posix
components (see: "System V Shared Memory for Open MPI: Request  
for

 Community
Input and Testing" thread).
 o  Assuming local process homogeneity with respect to all utilized
 shared
 memory facilities. That is, if one local process deems a
 particular shared
 memory facility acceptable, then ALL local processes should be
 able to
 utilize that facility. As it stands, this is an important point
 because one
 process dictates to all other local processes which common sm
 component will
 be selected based on its own, local run-time test.
 o Addressed some of George's code reuse concerns.

 If there are no major objections by August 17th, I'll commit the  
code

 after the
 Tuesday morning conference call.

 Thanks!

 --
 Samuel K. Gutierrez
 Los Alamos National Laboratory





 ___
 devel mailing list
 de...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/devel

Re: [OMPI devel] Trunk Commit Heads-up: New Common Shared Memory Component

2010-08-11 Thread Samuel K. Gutierrez

Hi Terry,

One more thing...  Before testing on Solaris 10, could you please  
update (I just committed a Solaris 10 fix).


Thanks,

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Aug 11, 2010, at 1:15 PM, Samuel K. Gutierrez wrote:


Hi Terry,








On Aug 11, 2010, at 12:34 PM, Terry Dontje wrote:

I've done some minor testing on Linux looking at resident and  
shared memory sizes for np=4, 8 and 16 jobs.  I could not see any  
appreciable differences in sizes in the process between sysv, posix  
or mmap usage in the SM btl.


So I am still somewhat non-plussed about making this the default.   
It seems like the biggest gain out of using posix might be one  
doesn't need to worry about the location of the backing file.  This  
seems kind of frivolous to me since there is a warning that happens  
if the backing file is not memory based.


If I'm not mistaken, the warning is only issued if the backing file
is stored on the following file systems: Lustre, NFS, Panasas, and
GPFS (see: opal_path_nfs in opal/util/path.c).  Based on the
performance numbers that Sylvain provided on June 9th of this year
(see attached), there was a performance difference between mmap on
/tmp and mmap on a tmpfs-like file system (/dev/shm in that
particular case).  Using the new POSIX component provides us with a
portable way to provide similar shared memory performance gains
without having to worry about where the OMPI session directory is
rooted.


--
Samuel K. Gutierrez
Los Alamos National Laboratory





I'm still working on testing the code on Solaris but I don't imagine
I will see anything that will change my mind.


--td

Samuel K. Gutierrez wrote:


Hi Rich,

It's a modification to the existing common sm component.  The  
modifications do include the addition of a new POSIX shared memory  
facility, however.


Sam

On Aug 11, 2010, at 10:05 AM, Graham, Richard L. wrote:

Is this a modification of the existing component, or a new  
component ?


Rich


On 8/10/10 10:52 AM, "Samuel K. Gutierrez" <sam...@lanl.gov> wrote:

Hi,

I wanted to give everyone a heads-up about a new POSIX shared  
memory

component
that has been in the works for a while now and is ready to be  
pushed

into the
trunk.

http://bitbucket.org/samuelkgutierrez/ompi_posix_sm_new

Some highlights:
o New posix component now the new default.
   o May address some of the shared memory performance issues  
users

encounter
  when OMPI's session directories are inadvertently  
placed on a non-

local
  filesystem.
o Silent component failover.
   o In the default case, if the posix component fails  
initialization,

  mmap will be selected.
o The sysv component will only be queried for selection if it is
placed before
   the mmap component (for example, -mca mpi_common_sm
sysv,posix,mmap).  In the
   default case, sysv will never be queried/selected.
o Per some on-list discussion, now unlinking mmaped file in both  
mmap

and posix
   components (see: "System V Shared Memory for Open MPI: Request  
for

Community
   Input and Testing" thread).
o  Assuming local process homogeneity with respect to all utilized
shared
memory facilities. That is, if one local process deems a
particular shared
memory facility acceptable, then ALL local processes should be
able to
utilize that facility. As it stands, this is an important point
because one
process dictates to all other local processes which common sm
component will
be selected based on its own, local run-time test.
o Addressed some of George's code reuse concerns.

If there are no major objections by August 17th, I'll commit the  
code

after the
Tuesday morning conference call.

Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory





___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--

Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Trunk Commit Heads-up: New Common Shared Memory Component

2010-08-11 Thread Samuel K. Gutierrez

Hi Terry,








On Aug 11, 2010, at 12:34 PM, Terry Dontje wrote:

I've done some minor testing on Linux looking at resident and shared  
memory sizes for np=4, 8 and 16 jobs.  I could not see any  
appreciable differences in sizes in the process between sysv, posix  
or mmap usage in the SM btl.


So I am still somewhat non-plussed about making this the default.   
It seems like the biggest gain out of using posix might be one  
doesn't need to worry about the location of the backing file.  This  
seems kind of frivolous to me since there is a warning that happens  
if the backing file is not memory based.


If I'm not mistaken, the warning is only issued if the backing file
is stored on the following file systems: Lustre, NFS, Panasas, and
GPFS (see: opal_path_nfs in opal/util/path.c).  Based on the
performance numbers that Sylvain provided on June 9th of this year
(see attached), there was a performance difference between mmap on
/tmp and mmap on a tmpfs-like file system (/dev/shm in that particular
case).  Using the new POSIX component provides us with a portable way
to provide similar shared memory performance gains without having to
worry about where the OMPI session directory is rooted.
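In other words, the logic is roughly "complain only when the backing file lives on a network filesystem".  A small self-contained sketch of that shape, with a trivial stand-in for the opal_path_nfs() check mentioned above (the real routine inspects the actual filesystem type):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Trivial stand-in for opal_path_nfs(): pretend anything under /netfs is
 * a network filesystem (the real check looks at Lustre/NFS/Panasas/GPFS). */
static bool path_is_network_fs(const char *path)
{
    return 0 == strncmp(path, "/netfs/", 7);
}

static void maybe_warn_backing_file(const char *backing_file)
{
    /* Only complain when the mmap backing file sits on a network
     * filesystem; local /tmp or tmpfs locations stay silent. */
    if (path_is_network_fs(backing_file)) {
        fprintf(stderr, "WARNING: %s is on a network filesystem; "
                "shared memory performance may suffer\n", backing_file);
    }
}

int main(void)
{
    maybe_warn_backing_file("/tmp/openmpi-sessions/seg");  /* silent */
    maybe_warn_backing_file("/netfs/home/user/seg");       /* warns  */
    return 0;
}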


--
Samuel K. Gutierrez
Los Alamos National Laboratory





I'm still working on testing the code on Solaris but I don't imagine I
will see anything that will change my mind.


--td

Samuel K. Gutierrez wrote:


Hi Rich,

It's a modification to the existing common sm component.  The  
modifications do include the addition of a new POSIX shared memory  
facility, however.


Sam

On Aug 11, 2010, at 10:05 AM, Graham, Richard L. wrote:

Is this a modification of the existing component, or a new  
component ?


Rich


On 8/10/10 10:52 AM, "Samuel K. Gutierrez" <sam...@lanl.gov> wrote:

Hi,

I wanted to give everyone a heads-up about a new POSIX shared memory
component
that has been in the works for a while now and is ready to be pushed
into the
trunk.

http://bitbucket.org/samuelkgutierrez/ompi_posix_sm_new

Some highlights:
o New posix component now the new default.
   o May address some of the shared memory performance issues  
users

encounter
  when OMPI's session directories are inadvertently placed  
on a non-

local
  filesystem.
o Silent component failover.
   o In the default case, if the posix component fails  
initialization,

  mmap will be selected.
o The sysv component will only be queried for selection if it is
placed before
   the mmap component (for example, -mca mpi_common_sm
sysv,posix,mmap).  In the
   default case, sysv will never be queried/selected.
o Per some on-list discussion, now unlinking mmaped file in both  
mmap

and posix
   components (see: "System V Shared Memory for Open MPI: Request  
for

Community
   Input and Testing" thread).
o  Assuming local process homogeneity with respect to all utilized
shared
memory facilities. That is, if one local process deems a
particular shared
memory facility acceptable, then ALL local processes should be
able to
utilize that facility. As it stands, this is an important point
because one
process dictates to all other local processes which common sm
component will
be selected based on its own, local run-time test.
o Addressed some of George's code reuse concerns.

If there are no major objections by August 17th, I'll commit the  
code

after the
Tuesday morning conference call.

Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory





___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--

Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Trunk Commit Heads-up: New Common Shared Memory Component

2010-08-11 Thread Samuel K. Gutierrez

Hi Rich,

It's a modification to the existing common sm component.  The  
modifications do include the addition of a new POSIX shared memory  
facility, however.


Sam

On Aug 11, 2010, at 10:05 AM, Graham, Richard L. wrote:


Is this a modification of the existing component, or a new component ?

Rich


On 8/10/10 10:52 AM, "Samuel K. Gutierrez" <sam...@lanl.gov> wrote:

Hi,

I wanted to give everyone a heads-up about a new POSIX shared memory
component
that has been in the works for a while now and is ready to be pushed
into the
trunk.

http://bitbucket.org/samuelkgutierrez/ompi_posix_sm_new

Some highlights:
o New posix component now the new default.
   o May address some of the shared memory performance issues  
users

encounter
  when OMPI's session directories are inadvertently placed  
on a non-

local
  filesystem.
o Silent component failover.
   o In the default case, if the posix component fails  
initialization,

  mmap will be selected.
o The sysv component will only be queried for selection if it is
placed before
   the mmap component (for example, -mca mpi_common_sm
sysv,posix,mmap).  In the
   default case, sysv will never be queried/selected.
o Per some on-list discussion, now unlinking mmaped file in both mmap
and posix
   components (see: "System V Shared Memory for Open MPI: Request for
Community
   Input and Testing" thread).
o  Assuming local process homogeneity with respect to all utilized
shared
memory facilities. That is, if one local process deems a
particular shared
memory facility acceptable, then ALL local processes should be
able to
utilize that facility. As it stands, this is an important point
because one
process dictates to all other local processes which common sm
component will
be selected based on its own, local run-time test.
o Addressed some of George's code reuse concerns.

If there are no major objections by August 17th, I'll commit the code
after the
Tuesday morning conference call.

Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory





___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




[OMPI devel] Trunk Commit Heads-up: New Common Shared Memory Component

2010-08-10 Thread Samuel K. Gutierrez

Hi,

I wanted to give everyone a heads-up about a new POSIX shared memory
component that has been in the works for a while now and is ready to be
pushed into the trunk.

http://bitbucket.org/samuelkgutierrez/ompi_posix_sm_new

Some highlights:
o New posix component now the new default.
   o May address some of the shared memory performance issues users
     encounter when OMPI's session directories are inadvertently placed
     on a non-local filesystem.
o Silent component failover.
   o In the default case, if the posix component fails initialization,
     mmap will be selected.
o The sysv component will only be queried for selection if it is placed
   before the mmap component (for example, -mca mpi_common_sm
   sysv,posix,mmap).  In the default case, sysv will never be
   queried/selected.
o Per some on-list discussion, now unlinking the mmaped file in both the
   mmap and posix components (see: "System V Shared Memory for Open MPI:
   Request for Community Input and Testing" thread, and the sketch after
   this list).
o Assuming local process homogeneity with respect to all utilized shared
   memory facilities.  That is, if one local process deems a particular
   shared memory facility acceptable, then ALL local processes should be
   able to utilize that facility.  As it stands, this is an important
   point because one process dictates to all other local processes which
   common sm component will be selected based on its own, local run-time
   test.
o Addressed some of George's code reuse concerns.
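On the unlink point above: once every local process has attached, the name can be dropped without invalidating the existing mappings, so nothing is left lying around if the job later aborts.  A minimal sketch of the idea (illustrative, not the component code):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const char *name = "/example_seg";   /* illustrative name only */
    const size_t size = 4096;

    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, (off_t)size) != 0) { perror("ftruncate"); return 1; }

    void *seg = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (MAP_FAILED == seg) { perror("mmap"); return 1; }
    close(fd);

    /* Once everyone who needs the segment has attached, drop the name.
     * The mapping stays usable until munmap, and nothing is left behind
     * in /dev/shm (or the session directory) after a crash. */
    shm_unlink(name);

    /* ... use seg ... */
    munmap(seg, size);
    return 0;
}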

If there are no major objections by August 17th, I'll commit the code
after the Tuesday morning conference call.

Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory







Re: [OMPI devel] RFC: System V Shared Memory for Open MPI

2010-06-03 Thread Samuel K. Gutierrez

On Jun 2, 2010, at 11:58 AM, Samuel K. Gutierrez wrote:


Good point - I forgot about that.

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Jun 2, 2010, at 11:40 AM, Jeff Squyres wrote:

Don't forget that the RML is also used to broadcast the success/ 
failure of the creation of the shared memory segment.


If the RML goes away, be sure that you can still determine that  
without hanging.


Personally, I still don't see the problem with using the RML stuff...


On Jun 2, 2010, at 1:07 PM, Samuel K. Gutierrez wrote:


Hi George,

That may work - I'll try it.

Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Jun 2, 2010, at 10:59 AM, George Bosilca wrote:


How about ftok ? The init function takes a file_name as argument,
and this file name is unique per instance of the shared memory
region we want to create. We can use this file name with ftok to
create a unique key_t that can be used by shmget to retrieve the
shared memory identifier.

george.


Hi George,

I think ftok brings us back to the atomic file creation problem.  In  
particular, ftok requires pathname be an existing file.  As it stands,  
this file is created by the common sm module.


--
Samuel K. Gutierrez
Los Alamos National Laboratory
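For reference, George's ftok() suggestion amounts to something like the sketch below.  Note the caveat just raised: the pathname handed to ftok must already exist, so some process still has to create that file (atomically) first.  Illustrative code only, not the proposed patch:

#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <existing file>\n", argv[0]);
        return 1;
    }

    /* Every process derives the same key from the same (pre-existing)
     * file name, so no shared memory ID has to be exchanged over the RML. */
    key_t key = ftok(argv[1], 42);            /* 42: arbitrary project id */
    if ((key_t)-1 == key) { perror("ftok"); return 1; }

    /* The creator wins the race implicitly; everyone else just looks the
     * segment up by key. */
    int shmid = shmget(key, 4096, IPC_CREAT | 0600);
    if (shmid < 0) { perror("shmget"); return 1; }

    void *seg = shmat(shmid, NULL, 0);
    if ((void *)-1 == seg) { perror("shmat"); return 1; }

    printf("attached segment %d at %p\n", shmid, seg);
    shmdt(seg);
    return 0;
}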



On Jun 2, 2010, at 11:53 , Samuel K. Gutierrez wrote:


On Jun 2, 2010, at 8:49 AM, Jeff Squyres wrote:


On Jun 2, 2010, at 10:44 AM, George Bosilca wrote:


Not sure what you mean here.  common/sm may create new shmem
segments at any time (e.g., during coll sm).  The RML message
exchange is to ensure that only 1 process creates and
initializes the segment and then all the others just attach  
to it.


Absolutely not! The RML messaging is not about initializing the
shared memory segment. As stated on my original text it has only
one purpose: to ensure the file used by mmap is created
atomically. The code for Windows do not exchange any RML  
messages

as the function to allocate the shared memory region provided by
the OS is atomic (exactly as the sysv one).


I thought that Sam said that it was important that only 1 process
shmctl/IPC_RMID...?


Hi George,

We are using RML messaging in the sysv code to exchange the shared
memory ID (generated by exactly one process).  I'm not sure how we
would go about passing along the shared memory ID without RML, but
any ideas are greatly appreciated.

Thanks,
--
Samuel K. Gutierrez
Los Alamos National Laboratory



--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: System V Shared Memory for Open MPI

2010-06-02 Thread Samuel K. Gutierrez

Good point - I forgot about that.

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Jun 2, 2010, at 11:40 AM, Jeff Squyres wrote:

Don't forget that the RML is also used to broadcast the success/ 
failure of the creation of the shared memory segment.


If the RML goes away, be sure that you can still determine that  
without hanging.


Personally, I still don't see the problem with using the RML stuff...


On Jun 2, 2010, at 1:07 PM, Samuel K. Gutierrez wrote:


Hi George,

That may work - I'll try it.

Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Jun 2, 2010, at 10:59 AM, George Bosilca wrote:


How about ftok ? The init function takes a file_name as argument,
and this file name is unique per instance of the shared memory
region we want to create. We can use this file name with ftok to
create a unique key_t that can be used by shmget to retrieve the
shared memory identifier.

george.

On Jun 2, 2010, at 11:53 , Samuel K. Gutierrez wrote:


On Jun 2, 2010, at 8:49 AM, Jeff Squyres wrote:


On Jun 2, 2010, at 10:44 AM, George Bosilca wrote:


Not sure what you mean here.  common/sm may create new shmem
segments at any time (e.g., during coll sm).  The RML message
exchange is to ensure that only 1 process creates and
initializes the segment and then all the others just attach to  
it.


Absolutely not! The RML messaging is not about initializing the
shared memory segment. As stated on my original text it has only
one purpose: to ensure the file used by mmap is created
atomically. The code for Windows do not exchange any RML messages
as the function to allocate the shared memory region provided by
the OS is atomic (exactly as the sysv one).


I thought that Sam said that it was important that only 1 process
shmctl/IPC_RMID...?


Hi George,

We are using RML messaging in the sysv code to exchange the shared
memory ID (generated by exactly one process).  I'm not sure how we
would go about passing along the shared memory ID without RML, but
any ideas are greatly appreciated.

Thanks,
--
Samuel K. Gutierrez
Los Alamos National Laboratory



--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: System V Shared Memory for Open MPI

2010-06-02 Thread Samuel K. Gutierrez

Hi George,

That may work - I'll try it.

Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Jun 2, 2010, at 10:59 AM, George Bosilca wrote:

How about ftok ? The init function takes a file_name as argument,  
and this file name is unique per instance of the shared memory  
region we want to create. We can use this file name with ftok to  
create a unique key_t that can be used by shmget to retrieve the  
shared memory identifier.


 george.

On Jun 2, 2010, at 11:53 , Samuel K. Gutierrez wrote:


On Jun 2, 2010, at 8:49 AM, Jeff Squyres wrote:


On Jun 2, 2010, at 10:44 AM, George Bosilca wrote:

Not sure what you mean here.  common/sm may create new shmem  
segments at any time (e.g., during coll sm).  The RML message  
exchange is to ensure that only 1 process creates and  
initializes the segment and then all the others just attach to it.


Absolutely not! The RML messaging is not about initializing the  
shared memory segment. As stated on my original text it has only  
one purpose: to ensure the file used by mmap is created  
atomically. The code for Windows do not exchange any RML messages  
as the function to allocate the shared memory region provided by  
the OS is atomic (exactly as the sysv one).


I thought that Sam said that it was important that only 1 process  
shmctl/IPC_RMID...?


Hi George,

We are using RML messaging in the sysv code to exchange the shared  
memory ID (generated by exactly one process).  I'm not sure how we  
would go about passing along the shared memory ID without RML, but  
any ideas are greatly appreciated.


Thanks,
--
Samuel K. Gutierrez
Los Alamos National Laboratory



--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: System V Shared Memory for Open MPI

2010-06-02 Thread Samuel K. Gutierrez

On Jun 2, 2010, at 7:28 AM, Jeff Squyres wrote:


On Jun 2, 2010, at 5:38 AM, George Bosilca wrote:

I think adding support for sysv shared memory is a good thing.  
However, I have some strong objections over the implementation in  
the hg tree. Here are 2 of the major ones:


1) the sysv shared memory creation is __atomic__ based on the flags  
used. Therefore, all the RML messages exchange is totally useless.


Not sure what you mean here.  common/sm may create new shmem  
segments at any time (e.g., during coll sm).  The RML message  
exchange is to ensure that only 1 process creates and initializes  
the segment and then all the others just attach to it.


The initializing of the segment after it is created/attached could
be pipelined a little more.  E.g., since the init has an atomically-set
flag indicating when it's done, the root could create the seg,
signal the others that they can attach, and then do the init -- the
non-root procs can wait for the flag to change atomically to know when
the seg has been initialized.  Is that what you're referring to?


2) the whole code is replicated in the 3 files (mmap, sysv and
windows), even the common parts. However, in the sysv case most of
the comments have been modified to remove all capital letters.
I'm in favor of extracting all the common parts and moving them into
a special file. What should be kept in the particular files should
only be the really different parts (small part of the init and
finalize).


Sam -- are the common parts really common?  I.e., could they be  
factored out?  Or are they "just different enough" that factoring  
them out would be a PITA?


I'm sure some refactoring could be done - let me take a look.
--
Samuel K. Gutierrez
Los Alamos National Laboratory



--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: System V Shared Memory for Open MPI

2010-06-01 Thread Samuel K. Gutierrez

Doh!

bitbucket repository: http://bitbucket.org/samuelkgutierrez/ompi_sysv_sm

Thanks,

--
Samuel K. Gutierrez
Los Alamos National Laboratory


On Jun 1, 2010, at 11:08 AM, Samuel K. Gutierrez wrote:


WHAT: New System V shared memory component.

WHY: https://svn.open-mpi.org/trac/ompi/ticket/1320

WHERE:
M  ompi/mca/btl/sm/btl_sm.c
M  ompi/mca/btl/sm/btl_sm_component.c
M  ompi/mca/btl/sm/btl_sm.h
M  ompi/mca/mpool/sm/mpool_sm_component.c
M  ompi/mca/mpool/sm/mpool_sm.h
M  ompi/mca/mpool/sm/mpool_sm_module.c
A  ompi/mca/common/sm/configure.m4
A  ompi/mca/common/sm/common_sm_sysv.h
A  ompi/mca/common/sm/common_sm_windows.c
A  ompi/mca/common/sm/common_sm_windows.h
A  ompi/mca/common/sm/common_sm.c
A  ompi/mca/common/sm/common_sm_sysv.c
A  ompi/mca/common/sm/common_sm.h
M  ompi/mca/common/sm/common_sm_mmap.c
M  ompi/mca/common/sm/common_sm_mmap.h
M  ompi/mca/common/sm/Makefile.am
M  ompi/mca/common/sm/help-mpi-common-sm.txt
M  ompi/mca/coll/sm/coll_sm_module.c
M  ompi/mca/coll/sm/coll_sm.h

WHEN: Upon acceptance.

TIMEOUT: Tuesday, June 8, 2010 (after devel concall).

HOW:
MCA mpi: parameter "mpi_common_sm" (current value: ,
         data source: default value)
         Which shared memory support will be used.  Valid values:
         sysv,mmap - or a comma delimited combination of them (order
         dependent).  The first component that is successfully
         selected is used.

Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory





___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] System V Shared Memory forOpenMPI:Request forCommunity Input and Testing

2010-05-05 Thread Samuel K. Gutierrez

On May 5, 2010, at 6:10 AM, Jeff Squyres wrote:


On May 4, 2010, at 9:53 AM, Ashley Pittman wrote:

Point noted.  But actually -- can you give specific reasons as to  
why a user should care?  Keep in mind that this would be a short- 
lived fork'ed process -- not "spawn" in the MPI sense of the word.


You might be running the job under Valgrind or another debugger,
BLCR has some issues with fork as I remember, and traditionally
there have been IB mapping issues here as well.  I'm sure you could
make a case against any of those points if you wanted to, but I
think the argument stands: doing this kind of run-time check
shouldn't be needed.


Mmm; good points (especially Valgrind).  BLCR and OpenFabrics verbs  
shouldn't be much of an issue here, but I can see that there might  
be unexpectedness if you're running under Valgrind or some other  
debugger.


It might be possible to construct the code however so that if it  
failed to initialise it just wasn't used rather than aborted the  
job which would have much the same effect as a run-time test but  
without having to fork new processes and create short-lived shared  
memory regions.


That's how most of the network transports are in OMPI today -- if  
they fail to init, they are just skipped.


The problem here is that you really need 2 processes to do this  
test.  I suppose it could be done with local ranks 0 and 1 instead  
of forking a new process -- they would just need to communicate via  
RML to sync up, I suppose.


I need to think about it a little more, but I like this solution.

Thanks,

--
Samuel K. Gutierrez
Los Alamos National Laboratory



I should of course have said fork where I mentioned spawn above to avoid
any confusion; spawn has a specific meaning in the context of MPI.


I still think a better understanding of the issue is required
before any decision here is made, though.  I'm surprised by Samuel's
description of the problem because it's not how I remember it, and
from what Chris says it doesn't reflect what is in the Linux git code
either.  I'd like to see why there is an apparent difference in
behaviour before a decision is made to only support one.


There's no intent to only support sysv or mmap.  Samuel's work was  
to extend OMPI to support sysv in the case where it would be  
advantageous (e.g., guaranteed cleanup of the shmem segment).  The  
mmap stuff is definitely not going to be removed.


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] System V Shared Memory for Open MPI:Request for Community Input and Testing

2010-05-03 Thread Samuel K. Gutierrez

Hi All,

New configure-time test added - thanks for the suggestion, Jeff.   
Update and give it a whirl.


Ethan - could you please try again?  This time, I'm hoping sysv  
support will be disabled ;-).


Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On May 3, 2010, at 9:18 AM, Samuel K. Gutierrez wrote:


Hi Jeff,

Sounds like a plan :-).

Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On May 3, 2010, at 9:12 AM, Jeff Squyres wrote:

It might well be that you need a configure test to determine  
whether this behavior occurs or not.  Heck, it may even need to be  
a run-time test!  Hrm.


Write a small C program that does something like the following  
(this is off the top of my head):


fork a child
child goes to sleep immediately
sysv alloc a segment
attach to it
ipc rm it
parent wakes up child
child tries to attach to segment

If that succeeds, then all is good.  If not, then don't use this  
stuff.
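
A rough sketch of that probe (illustrative only: error handling is
trimmed, the segment size is arbitrary, and the steps are reordered
slightly so the child inherits the segment id instead of being woken
by the parent):

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a process can still attach to a segment that has already
 * been marked with IPC_RMID (the behavior the sysv component relies on),
 * 0 otherwise. */
int sysv_rmid_probe(void)
{
    int ok = 0;
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (-1 == shmid) {
        return 0;
    }
    void *parent_addr = shmat(shmid, NULL, 0);   /* keep one attachment alive */
    shmctl(shmid, IPC_RMID, NULL);               /* mark segment for removal  */

    pid_t pid = fork();
    if (0 == pid) {                              /* child: try a fresh attach */
        void *addr = shmat(shmid, NULL, 0);
        _exit(((void *)-1 == addr) ? 1 : 0);
    } else if (pid > 0) {
        int status;
        waitpid(pid, &status, 0);
        ok = (WIFEXITED(status) && 0 == WEXITSTATUS(status));
    }
    if ((void *)-1 != parent_addr) {
        shmdt(parent_addr);
    }
    return ok;   /* 1: sysv usable, 0: fall back to mmap */
}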



On May 3, 2010, at 10:55 AM, Samuel K. Gutierrez wrote:


Hi all,

Does anyone know of a relatively portable solution for querying a
given system for the shmctl behavior that I am relying on, or is  
this

going to be a nightmare?  Because, if I am reading this thread
correctly, the presence of shmget and Linux is not sufficient for
determining an adequate level of sysv support.

Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On May 2, 2010, at 7:48 AM, N.M. Maclaren wrote:


On May 2 2010, Ashley Pittman wrote:

On 2 May 2010, at 04:03, Samuel K. Gutierrez wrote:

As to performance there should be no difference in use between sys-V
shared memory and file-backed shared memory, the instructions
issued and the MMU flags for the page should both be the same so
the performance should be identical.


Not necessarily, and possibly not so even for far-future Linuces.
On at least one system I used, the poxious kernel wrote the complete
file to disk before returning - all right, it did that for System V
shared memory, too, just to a 'hidden' file!  But, if I recall, on
another it did that only for file-backed shared memory - however, it's
a decade ago now and I may be misremembering.

Of course, that's a serious issue mainly for large segments.  I was
using multi-GB ones.  I don't know how big the ones you need are.


The one area you do need to keep an eye on for performance is on
numa machines where it's important which process on a node touches
each page first, you can end up using different areas (pages, not
regions) for communicating in different directions between the same
pair of processes. I don't believe this is any different to mmap
backed shared memory though.


On some systems it may be, but in bizarre, inconsistent, undocumented
and unpredictable ways :-(  Also, there are usually several system (and
sometimes user) configuration options that change the behaviour, so you
have to allow for that.  My experience of trying to use those is that
different uses have incompatible requirements, and most of the critical
configuration parameters apply to ALL uses!

In my view, the configuration variability is the number one nightmare
for trying to write portable code that uses any form of shared memory.

ARMCI seem to agree.


Because of this, sysv support may be limited to Linux systems - that is,
until we can get a better sense of which systems provide the shmctl
IPC_RMID behavior that I am relying on.


And, I suggest, whether they have an evil gotcha on one of the areas that
Ashley Pittman noted.


Regards,
Nick Maclaren.


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] System V Shared Memory for Open MPI:Request for Community Input and Testing

2010-05-03 Thread Samuel K. Gutierrez

Hi Jeff,

Sounds like a plan :-).

Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On May 3, 2010, at 9:12 AM, Jeff Squyres wrote:

It might well be that you need a configure test to determine whether  
this behavior occurs or not.  Heck, it may even need to be a run- 
time test!  Hrm.


Write a small C program that does something like the following (this  
is off the top of my head):


fork a child
child goes to sleep immediately
sysv alloc a segment
attach to it
ipc rm it
parent wakes up child
child tries to attach to segment

If that succeeds, then all is good.  If not, then don't use this  
stuff.



On May 3, 2010, at 10:55 AM, Samuel K. Gutierrez wrote:


Hi all,

Does anyone know of a relatively portable solution for querying a
given system for the shmctl behavior that I am relying on, or is this
going to be a nightmare?  Because, if I am reading this thread
correctly, the presence of shmget and Linux is not sufficient for
determining an adequate level of sysv support.

Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On May 2, 2010, at 7:48 AM, N.M. Maclaren wrote:


On May 2 2010, Ashley Pittman wrote:

On 2 May 2010, at 04:03, Samuel K. Gutierrez wrote:

As to performance there should be no difference in use between sys-
V shared memory and file-backed shared memory, the instructions
issued and the MMU flags for the page should both be the same so
the performance should be identical.


Not necessarily, and possibly not so even for far-future Linuces.
On at least one system I used, the poxious kernel wrote the complete
file to disk before returning - all right, it did that for System V
shared memory, too, just to a 'hidden' file!  But, if I recall, on
another it did that only for file-backed shared memory - however,  
it's

a decade ago now and I may be misremembering.

Of course, that's a serious issue mainly for large segments.  I was
using multi-GB ones.  I don't know how big the ones you need are.


The one area you do need to keep an eye on for performance is on
numa machines where it's important which process on a node touches
each page first, you can end up using different areas (pages, not
regions) for communicating in different directions between the same
pair of processes. I don't believe this is any different to mmap
backed shared memory though.


On some systems it may be, but in bizarre, inconsistent,  
undocumented

and unpredictable ways :-(  Also, there are usually several system
(and
sometimes user) configuration options that change the behaviour, so
you
have to allow for that.  My experience of trying to use those is  
that

different uses have incompatible requirements, and most of the
critical
configuration parameters apply to ALL uses!

In my view, the configuration variability is the number one  
nightmare
for trying to write portable code that uses any form of shared  
memory.

ARMCI seem to agree.


Because of this, sysv support may be limited to Linux systems -
that is,
until we can get a better sense of which systems provide the  
shmctl

IPC_RMID behavior that I am relying on.


And, I suggest, whether they have an evil gotcha on one of the areas
that
Ashley Pittman noted.


Regards,
Nick Maclaren.


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing

2010-05-03 Thread Samuel K. Gutierrez

Hi all,

Does anyone know of a relatively portable solution for querying a  
given system for the shmctl behavior that I am relying on, or is this  
going to be a nightmare?  Because, if I am reading this thread  
correctly, the presence of shmget and Linux is not sufficient for  
determining an adequate level of sysv support.


Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On May 2, 2010, at 7:48 AM, N.M. Maclaren wrote:


On May 2 2010, Ashley Pittman wrote:

On 2 May 2010, at 04:03, Samuel K. Gutierrez wrote:

As to performance there should be no difference in use between sys- 
V shared memory and file-backed shared memory, the instructions  
issued and the MMU flags for the page should both be the same so  
the performance should be identical.


Not necessarily, and possibly not so even for far-future Linuces.
On at least one system I used, the poxious kernel wrote the complete
file to disk before returning - all right, it did that for System V
shared memory, too, just to a 'hidden' file!  But, if I recall, on
another it did that only for file-backed shared memory - however, it's
a decade ago now and I may be misremembering.

Of course, that's a serious issue mainly for large segments.  I was
using multi-GB ones.  I don't know how big the ones you need are.

The one area you do need to keep an eye on for performance is on  
numa machines where it's important which process on a node touches  
each page first, you can end up using different areas (pages, not  
regions) for communicating in different directions between the same  
pair of processes. I don't believe this is any different to mmap  
backed shared memory though.


On some systems it may be, but in bizarre, inconsistent, undocumented
and unpredictable ways :-(  Also, there are usually several system  
(and
sometimes user) configuration options that change the behaviour, so  
you

have to allow for that.  My experience of trying to use those is that
different uses have incompatible requirements, and most of the  
critical

configuration parameters apply to ALL uses!

In my view, the configuration variability is the number one nightmare
for trying to write portable code that uses any form of shared memory.
ARMCI seem to agree.

Because of this, sysv support may be limited to Linux systems -  
that is,

until we can get a better sense of which systems provide the shmctl
IPC_RMID behavior that I am relying on.


And, I suggest, whether they have an evil gotcha on one of the areas  
that

Ashley Pittman noted.


Regards,
Nick Maclaren.


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing

2010-05-02 Thread Samuel K. Gutierrez
Hi Ethan,

Sorry about the lag.

As far as I can tell, calling shmctl IPC_RMID is immediately destroying
the shared memory segment even though there is at least one process
attached to it.  This is interesting and confusing because Solaris 10's
behavior description of shmctl IPC_RMID is similar to that of Linux'.

I call shmctl IPC_RMID immediately after one process has attached to the
segment because, at least on Linux, this only marks the segment for
destruction.  The segment is only actually destroyed after all attached
processes have terminated.  I'm relying on this behavior for resource
cleanup upon application termination (normal/abnormal).
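
A compact sketch of that pattern (illustrative only, not the component
code):

#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* The creator allocates and attaches, then immediately marks the segment
 * with IPC_RMID.  On Linux this only flags the segment for destruction:
 * the other local processes can still shmat() it once they learn the id
 * (exchanged via RML in the component), and the kernel removes it for
 * good when the last attached process exits - normally or abnormally. */
void *create_seg_with_auto_cleanup(size_t seg_size, int *shmid_out)
{
    int shmid = shmget(IPC_PRIVATE, seg_size, IPC_CREAT | 0600);
    if (-1 == shmid) {
        return NULL;
    }
    void *base = shmat(shmid, NULL, 0);
    if ((void *)-1 == base) {
        shmctl(shmid, IPC_RMID, NULL);
        return NULL;
    }
    shmctl(shmid, IPC_RMID, NULL);   /* marked for destruction, not destroyed */
    *shmid_out = shmid;
    return base;
}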

Because of this, sysv support may be limited to Linux systems - that is,
until we can get a better sense of which systems provide the shmctl
IPC_RMID behavior that I am relying on.

Any other ideas are greatly appreciated.

Thanks for testing!

--
Samuel K. Gutierrez
Los Alamos National Laboratory

> On Thu, Apr/29/2010 02:52:24PM, Samuel K. Gutierrez wrote:
>>  Hi Ethan,
>>  Bummer.  What does the following command show?
>>  sysctl -a | grep shm
>
> In this case, I think the Solaris equivalent to sysctl is prctl, e.g.,
>
>   $ prctl -i project group.staff
>   project: 10: group.staff
>   NAME                    PRIVILEGE     VALUE  FLAG  ACTION  RECIPIENT
>   ...
>   project.max-shm-memory
>                           privileged   3.92GB     -  deny    -
>                           system       16.0EB   max  deny    -
>   project.max-shm-ids
>                           privileged      128     -  deny    -
>                           system        16.8M   max  deny    -
>   ...
>
> Is that the info you need?
>
> -Ethan
>
>>  Thanks!
>>  --
>>  Samuel K. Gutierrez
>>  Los Alamos National Laboratory
>>  On Apr 29, 2010, at 1:32 PM, Ethan Mallove wrote:
>> > Hi Samuel,
>> >
>> > I'm trying to run off your HG clone, but I'm seeing issues with
c_hello, e.g.,
>> >
>> >  $ mpirun -mca mpi_common_sm sysv --mca btl self,sm,tcp --host
>> > burl-ct-v440-2,burl-ct-v440-2 -np 2 ./c_hello
>> >  --
A system call failed during shared memory initialization that should not
have.  It is likely that your MPI job will now either abort or experience
performance degradation.
>> >
>> >Local host:  burl-ct-v440-2
>> >System call: shmat(2)
>> >Process: [[43408,1],1]
>> >Error:   Invalid argument (errno 22)
>> >  --
^Cmpirun: killing job...
>> >
>> >  $ uname -a
>> >  SunOS burl-ct-v440-2 5.10 Generic_118833-33 sun4u sparc
>> SUNW,Sun-Fire-V440
>> >
>> > The same test works okay if I s/sysv/mmap/.
>> >
>> > Regards,
>> > Ethan
>> >
>> >
>> > On Wed, Apr/28/2010 07:16:12AM, Samuel K. Gutierrez wrote:
>> >> Hi,
>> >>
>> >> Faster component initialization/finalization times is one of the
main
>> >> motivating factors of this work.  The general idea is to get away
>> from
>> >> creating a rather large backing file.  With respect to module
>> bandwidth
>> >> and
>> >> latency, mmap and sysv seem to be comparable - at least that is what
>> my
>> >> preliminary tests have shown.  As it stands, I have not come across
a
>> >> situation where the mmap SM component doesn't work or is slower.
>> >>
>> >> Hope that helps,
>> >>
>> >> --
>> >> Samuel K. Gutierrez
>> >> Los Alamos National Laboratory
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Apr 28, 2010, at 5:35 AM, Bogdan Costescu wrote:
>> >>
>> >>> On Tue, Apr 27, 2010 at 7:55 PM, Samuel K. Gutierrez
>> <sam...@lanl.gov>
>> >>> wrote:
>> >>>> With Jeff and Ralph's help, I have completed a System V shared
>> memory
>> >>>> component for Open MPI.
>> >>>
>> >>> What is the motivation for this work ? Are there situations where
>> the
>> >>> mmap based SM component doesn't work or is slow(er) ?
>> >>>
>> >>> Kind regards,
>> >>> Bogdan
>> >>> ___
>> >>> devel mailing list
>> >>> de...@open-mpi.org
>> >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> >>
>> >> ___
>> >> devel mailing list
>> >> de...@open-mpi.org
>> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> > ___
>> > devel mailing list
>> > de...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>  ___
>>  devel mailing list
>>  de...@open-mpi.org
>>  http://www.open-mpi.org/mailman/listinfo.cgi/devel
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>





Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing

2010-04-29 Thread Samuel K. Gutierrez

Hi Ethan,

Bummer.  What does the following command show?

sysctl -a | grep shm

Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Apr 29, 2010, at 1:32 PM, Ethan Mallove wrote:


Hi Samuel,

I'm trying to run off your HG clone, but I'm seeing issues with
c_hello, e.g.,

  $ mpirun -mca mpi_common_sm sysv --mca btl self,sm,tcp --host burl-ct-v440-2,burl-ct-v440-2 -np 2 ./c_hello
  
--

 A system call failed during shared memory initialization that should
 not have.  It is likely that your MPI job will now either abort or
 experience performance degradation.

   Local host:  burl-ct-v440-2
   System call: shmat(2)
   Process: [[43408,1],1]
   Error:   Invalid argument (errno 22)
  
--

 ^Cmpirun: killing job...

 $ uname -a
  SunOS burl-ct-v440-2 5.10 Generic_118833-33 sun4u sparc SUNW,Sun-Fire-V440


The same test works okay if I s/sysv/mmap/.

Regards,
Ethan


On Wed, Apr/28/2010 07:16:12AM, Samuel K. Gutierrez wrote:

Hi,

Faster component initialization/finalization times is one of the main
motivating factors of this work.  The general idea is to get away from
creating a rather large backing file.  With respect to module bandwidth and
latency, mmap and sysv seem to be comparable - at least that is what my
preliminary tests have shown.  As it stands, I have not come across a
situation where the mmap SM component doesn't work or is slower.

Hope that helps,

--
Samuel K. Gutierrez
Los Alamos National Laboratory





On Apr 28, 2010, at 5:35 AM, Bogdan Costescu wrote:

On Tue, Apr 27, 2010 at 7:55 PM, Samuel K. Gutierrez <sam...@lanl.gov 
>

wrote:
With Jeff and Ralph's help, I have completed a System V shared  
memory

component for Open MPI.


What is the motivation for this work ? Are there situations where  
the

mmap based SM component doesn't work or is slow(er) ?

Kind regards,
Bogdan
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing

2010-04-28 Thread Samuel K. Gutierrez

Hi,

Faster component initialization/finalization times is one of the main  
motivating factors of this work.  The general idea is to get away from  
creating a rather large backing file.  With respect to module  
bandwidth and latency, mmap and sysv seem to be comparable - at least  
that is what my preliminary tests have shown.  As it stands, I have  
not come across a  situation where the mmap SM component doesn't work  
or is slower.


Hope that helps,

--
Samuel K. Gutierrez
Los Alamos National Laboratory





On Apr 28, 2010, at 5:35 AM, Bogdan Costescu wrote:

On Tue, Apr 27, 2010 at 7:55 PM, Samuel K. Gutierrez  
<sam...@lanl.gov> wrote:

With Jeff and Ralph's help, I have completed a System V shared memory
component for Open MPI.


What is the motivation for this work ? Are there situations where the
mmap based SM component doesn't work or is slow(er) ?

Kind regards,
Bogdan
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




[OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing

2010-04-27 Thread Samuel K. Gutierrez

Hi,

With Jeff and Ralph's help, I have completed a System V shared memory  
component for Open MPI.  I have conducted some preliminary tests on  
our systems, but would like to get test results from a broader audience.


As it stands, mmap is the default, but System V shared memory can be
activated using: -mca mpi_common_sm sysv


Repository:
http://bitbucket.org/samuelkgutierrez/ompi_sysv_sm

Input is greatly appreciated!

--
Samuel K. Gutierrez
Los Alamos National Laboratory



Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times

2010-04-22 Thread Samuel K. Gutierrez


On Apr 22, 2010, at 10:08 AM, Rainer Keller wrote:


Hello Oliver,
thanks for the update.

Just my $0.02: the upcoming Open MPI v1.5 will warn users if their
session directory is on NFS (or Lustre).


... or panfs :-)

Samuel K. Gutierrez



Best regards,
Rainer


On Thursday 22 April 2010 11:37:48 am Oliver Geisler wrote:

To sum up and give an update:

The extended communication times while using shared memory communication
of openmpi processes are caused by the openmpi session directory lying on
the network via NFS.

The problem is resolved by establishing a ramdisk on each diskless node
or mounting a tmpfs. By setting the MCA parameter orte_tmpdir_base to
point to the corresponding mountpoint, shared memory communication and its
files are kept local, thus decreasing the communication times by
magnitudes.
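
For example (the mount point and executable name here are just
site-specific placeholders):

  mpirun --mca orte_tmpdir_base /local/scratch -np 4 ./benchmark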

The relation of the problem to the kernel version is not really
resolved, but the kernel version is maybe not "the problem" in this respect.
My benchmark is now running fine on a single node with 4 CPUs, kernel
2.6.33.1 and openmpi 1.4.1.
Running on multiple nodes I still experience higher (TCP) communication
times than I would expect. But that requires some deeper research into
the issue on my part (e.g. collisions on the network) and should
probably be posted to a new thread.

Thank you guys for your help.

oli



--

Rainer Keller, PhD  Tel: +1 (865) 241-6293
Oak Ridge National Lab  Fax: +1 (865) 241-4811
PO Box 2008 MS 6164   Email: kel...@ornl.gov
Oak Ridge, TN 37831-2008AIM/Skype: rusraink

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Open MPI v1.3.4rc4 is out

2009-11-05 Thread Samuel K. Gutierrez
That's interesting...  Works great now that carto is built.  Why is  
carto now required?


--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Nov 5, 2009, at 4:11 PM, David Gunter wrote:

Oh, good catch.  I'm not sure who updates the platform files or who
would have added the "carto" option to the no_build.  It's the only
difference between the 1.3.4 platform files and the previous
ones, save for some compiler flags.


-david

--
David Gunter
HPC-3: Infrastructure Team
Los Alamos National Laboratory




On Nov 5, 2009, at 3:55 PM, Jeff Squyres wrote:


I see:

enable_mca_no_build=carto,crs,routed-direct,routed-linear,snapc,pml-dr,pml-crcp2,pml-crcpw,pml-v,pml-example,crcp,pml-cm,filem


Which means that you're directing all carto components not to build  
at all.


It looks like carto is now required...?
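
A hypothetical fix in the platform file would be to simply drop carto
from that list, e.g.:

  enable_mca_no_build=crs,routed-direct,routed-linear,snapc,pml-dr,pml-crcp2,pml-crcpw,pml-v,pml-example,crcp,pml-cm,filem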


On Nov 5, 2009, at 5:38 PM, Samuel K. Gutierrez wrote:


Hi Jeff,

This is how I configured my build.

./configure --with-platform=./contrib/platform/lanl/rr-class/optimized-panasas --prefix=/usr/projects/hpctools/samuel/local/rr-dev/apps/openmpi/gcc/ompi-1.3.4rc4 --libdir=/usr/projects/hpctools/samuel/local/rr-dev/apps/openmpi/gcc/ompi-1.3.4rc4/lib64

I'll send the build log shortly.

Thanks!
--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Nov 5, 2009, at 3:07 PM, Jeff Squyres wrote:

> How did you build?
>
> I see one carto component named "auto_detect" in the 1.3.4 source
> tree, but I don't see it in your ompi_info output.
>
> Did that component not build?
>
>
> On Nov 4, 2009, at 7:20 PM, Samuel K. Gutierrez wrote:
>
>> Hi All,
>>
>> I just built OMPI 1.3.4rc4 on one of our Roadrunner machines.   
When I

>> try to launch a simple MPI job, I get the following:
>>
>> [rra011a.rr.lanl.gov:31601] mca: base: components_open: Looking  
for

>> carto components
>> [rra011a.rr.lanl.gov:31601] mca: base: components_open: opening  
carto

>> components
>> [rra011a.rr.lanl.gov:31601] mca:base:select: Auto-selecting carto
>> components
>> [rra011a.rr.lanl.gov:31601] mca:base:select:(carto) No component
>> selected!
>>  
--

>> It looks like opal_init failed for some reason; your parallel
>> process is
>> likely to abort.  There are many reasons that a parallel  
process can

>> fail during opal_init; some of which are due to configuration or
>> environment problems.  This failure appears to be an internal
>> failure;
>> here's some additional information (which may only be relevant  
to an

>> Open MPI developer):
>>
>>   opal_carto_base_select failed
>>   --> Returned value -13 instead of OPAL_SUCCESS
>>  
--
>> [rra011a.rr.lanl.gov:31601] [[INVALID],INVALID] ORTE_ERROR_LOG:  
Not

>> found in file runtime/orte_init.c at line 77
>> [rra011a.rr.lanl.gov:31601] [[INVALID],INVALID] ORTE_ERROR_LOG:  
Not

>> found in file orterun.c at line 541
>>
>> This may be an issue on our end regarding a runtime parameter  
that
>> isn't set correctly.  See attached.  Please let me know if you  
need

>> any more info.
>>
>> Thanks!
>> --
>> Samuel K. Gutierrez
>> Los Alamos National Laboratory
>>
>>
>> 
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




--
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Open MPI v1.3.4rc4 is out

2009-11-05 Thread Samuel K. Gutierrez

Hi Jeff,

This is how I configured my build.

./configure --with-platform=./contrib/platform/lanl/rr-class/optimized-panasas --prefix=/usr/projects/hpctools/samuel/local/rr-dev/apps/openmpi/gcc/ompi-1.3.4rc4 --libdir=/usr/projects/hpctools/samuel/local/rr-dev/apps/openmpi/gcc/ompi-1.3.4rc4/lib64


I'll send the build log shortly.

Thanks!
--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Nov 5, 2009, at 3:07 PM, Jeff Squyres wrote:


How did you build?

I see one carto component named "auto_detect" in the 1.3.4 source  
tree, but I don't see it in your ompi_info output.


Did that component not build?


On Nov 4, 2009, at 7:20 PM, Samuel K. Gutierrez wrote:


Hi All,

I just built OMPI 1.3.4rc4 on one of our Roadrunner machines.  When I
try to launch a simple MPI job, I get the following:

[rra011a.rr.lanl.gov:31601] mca: base: components_open: Looking for
carto components
[rra011a.rr.lanl.gov:31601] mca: base: components_open: opening carto
components
[rra011a.rr.lanl.gov:31601] mca:base:select: Auto-selecting carto
components
[rra011a.rr.lanl.gov:31601] mca:base:select:(carto) No component
selected!
--
It looks like opal_init failed for some reason; your parallel  
process is

likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal  
failure;

here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_carto_base_select failed
  --> Returned value -13 instead of OPAL_SUCCESS
--
[rra011a.rr.lanl.gov:31601] [[INVALID],INVALID] ORTE_ERROR_LOG: Not
found in file runtime/orte_init.c at line 77
[rra011a.rr.lanl.gov:31601] [[INVALID],INVALID] ORTE_ERROR_LOG: Not
found in file orterun.c at line 541

This may be an issue on our end regarding a runtime parameter that
isn't set correctly.  See attached.  Please let me know if you need
any more info.

Thanks!
--
Samuel K. Gutierrez
Los Alamos National Laboratory






--
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Open MPI v1.3.4rc4 is out

2009-11-04 Thread Samuel K. Gutierrez

Hi All,

I just built OMPI 1.3.4rc4 on one of our Roadrunner machines.  When I  
try to launch a simple MPI job, I get the following:


[rra011a.rr.lanl.gov:31601] mca: base: components_open: Looking for  
carto components
[rra011a.rr.lanl.gov:31601] mca: base: components_open: opening carto  
components
[rra011a.rr.lanl.gov:31601] mca:base:select: Auto-selecting carto  
components
[rra011a.rr.lanl.gov:31601] mca:base:select:(carto) No component  
selected!

--
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_carto_base_select failed
  --> Returned value -13 instead of OPAL_SUCCESS
--
[rra011a.rr.lanl.gov:31601] [[INVALID],INVALID] ORTE_ERROR_LOG: Not  
found in file runtime/orte_init.c at line 77
[rra011a.rr.lanl.gov:31601] [[INVALID],INVALID] ORTE_ERROR_LOG: Not  
found in file orterun.c at line 541


This may be an issue on our end regarding a runtime parameter that  
isn't set correctly.  See attached.  Please let me know if you need  
any more info.


Thanks!
--
Samuel K. Gutierrez
Los Alamos National Laboratory



lanl-rr-class-1.3.4rc4.tar.gz
Description: GNU Zip compressed data




On Nov 4, 2009, at 3:00 PM, Jeff Squyres wrote:


The latest-n-greatest is available here:

   http://www.open-mpi.org/software/ompi/v1.3/

Please beat it up and look for problems!

--
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] MPIR_Breakpoint visibility

2009-09-21 Thread Samuel K. Gutierrez

Hi Jeff,

Sorry about the ambiguity.  I just had another conversation with our  
TotalView person and the problem -seems- to be unrelated to OMPI.   
Guess I jumped the gun...


Thanks,

Samuel K. Gutierrez

On Sep 21, 2009, at 8:58 AM, Jeff Squyres wrote:


Can you more precisely define "not working properly"?

On Sep 21, 2009, at 10:26 AM, Samuel K. Gutierrez wrote:


Hi,

According to our TotalView person, PGI and Intel versions of OMPI
1.3.3 are not working properly.  She noted that 1.2.8 and 1.3.2 work
fine.

Thanks,

Samuel K. Gutierrez

On Sep 21, 2009, at 7:19 AM, Terry Dontje wrote:

> Ralph Castain wrote:
>> I see it declared "extern" in orte/tools/orterun/debuggers.h, but
>> not DECLSPEC'd
>>
>> FWIW: LANL uses intel compilers + totalview on a regular basis,  
and

>> I have yet to hear of an issue.
>>
> It actually will work if you attach to the job or if you are not
> relying on the MPIR_Breakpoint to actually stop execution.
>
> --td
>
>> On Sep 21, 2009, at 7:03 AM, Terry Dontje wrote:
>>
>>> I was kind of amazed no one else managed to run into this but it
>>> was brought to my attention that compiling OMPI with Intel
>>> compilers and visibility on that the MPIR_Breakpoint symbol was
>>> not being exposed. I am assuming this is due to MPIR_Breakpoint
>>> not being ORTE or OMPI_DECLSPEC'd
>>> Do others agree or am I missing something obvious here?
>>>
>>> Interestingly enough, it doesn't look like gcc, pgi, pathscale or
>>> sun compilers are hiding the MPIR_Breakpoint symbol.
>>> --td
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




--
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] MPIR_Breakpoint visibility

2009-09-21 Thread Samuel K. Gutierrez

Hi,

According to our TotalView person, PGI and Intel versions of OMPI  
1.3.3 are not working properly.  She noted that 1.2.8 and 1.3.2 work  
fine.


Thanks,

Samuel K. Gutierrez

On Sep 21, 2009, at 7:19 AM, Terry Dontje wrote:


Ralph Castain wrote:
I see it declared "extern" in orte/tools/orterun/debuggers.h, but  
not DECLSPEC'd


FWIW: LANL uses intel compilers + totalview on a regular basis, and  
I have yet to hear of an issue.


It actually will work if you attach to the job or if you are not  
relying on the MPIR_Breakpoint to actually stop execution.


--td


On Sep 21, 2009, at 7:03 AM, Terry Dontje wrote:

I was kind of amazed no one else managed to run into this, but it
was brought to my attention that when compiling OMPI with Intel
compilers and visibility on, the MPIR_Breakpoint symbol was
not being exposed. I am assuming this is due to MPIR_Breakpoint
not being ORTE or OMPI_DECLSPEC'd.

Do others agree or am I missing something obvious here?

Interestingly enough, it doesn't look like gcc, pgi, pathscale or  
sun compilers are hiding the MPIR_Breakpoint symbol.

--td

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: convert send to ssend

2009-08-24 Thread Samuel K. Gutierrez

Hi Ashley,

My understanding is that this behavior would not be enabled by default 
in the standard debug build.  The "always convert to synchronous sends" 
mode would be an additional configure-time option.


Samuel K. Gutierrez

Ashley Pittman wrote:

On Mon, 2009-08-24 at 13:27 -0400, Jeff Squyres wrote:
  

It's the difference between:

a. if (0) { ... convert ... }  Modern compilers will remove this code  
as part of dead-code removal.
b. if (1) { ... convert ... }  Modern compilers will remove the "if  
(1)" and always execute the code.
c. if (some_variable) { ... convert ...}  An MCA parameter can load  
some_variable with 0 or 1.


The point of b is for sysadmins (or individual developers) who want to  
force there to *always* be correct MPI applications.



But couldn't the sysadmin equally well write a config file to achieve
the same effect should they want to?

Having it enabled (and on) in the standard "debug" build is going to
change the behaviour of applications with using a debug library, may
well render bugs un-reproducible in debug mode or worse you may end up
with end-user applications that only run in debug mode and not with a
normal build.

I'm all for having as much error checking enabled in debug builds as
possible but to change the behaviour risks masking problems elsewhere
IMHO.

Ashley,

  


Re: [OMPI devel] RFC: convert send to ssend

2009-08-24 Thread Samuel K. Gutierrez

Hi Jeff,

Sounds good to me.

Samuel K. Gutierrez


Jeff Squyres wrote:
The debug builds already have quite a bit of performance overhead.  It 
might be desirable to change this RFC to have a similar tri-state as 
the MPI parameter checking:


- compiled out
- compiled in, always check
- compiled in, use MCA parameter to determine whether to check

Adapting that to this RFC, perhaps something like this:

- compiled out
- compiled in, always convert standard send to sync send
- compiled in, use MCA parameter to determine whether to convert 
standard -> sync


And we can leave the default as "compiled out".

Howzat?


On Aug 23, 2009, at 9:07 PM, Samuel K. Gutierrez wrote:


Hi all,

How about exposing this functionality as a run-time parameter that is 
only

available in debug builds?  This will make debugging easier and won't
impact the performance of optimized builds.  Just an idea...

Samuel K. Gutierrez

>
> - "Jeff Squyres" <jsquy...@cisco.com> wrote:
>
>> Does anyone have any suggestions?  Or are we stuck
>> with compile-time checking?
>
> I didn't see this until now, but I'd be happy with
> just a compile time option so we could produce an
> install just for debugging purposes and have our
> users explicitly select it with modules.
>
> I have to say that this is of interest to us as we're
> trying to help a researcher at one of our member uni's
> to track down a bug where a message appears to go missing.
>
> cheers!
> Chris
> --
> Christopher Samuel - (03) 9925 4751 - Systems Manager
>  The Victorian Partnership for Advanced Computing
>  P.O. Box 201, Carlton South, VIC 3053, Australia
> VPAC is a not-for-profit Registered Research Agency
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel






Re: [OMPI devel] RFC: convert send to ssend

2009-08-23 Thread Samuel K. Gutierrez
Hi all,

How about exposing this functionality as a run-time parameter that is only
available in debug builds?  This will make debugging easier and won't
impact the performance of optimized builds.  Just an idea...

Samuel K. Gutierrez

>
> - "Jeff Squyres" <jsquy...@cisco.com> wrote:
>
>> Does anyone have any suggestions?  Or are we stuck
>> with compile-time checking?
>
> I didn't see this until now, but I'd be happy with
> just a compile time option so we could produce an
> install just for debugging purposes and have our
> users explicitly select it with modules.
>
> I have to say that this is of interest to us as we're
> trying to help a researcher at one of our member uni's
> to track down a bug where a message appears to go missing.
>
> cheers!
> Chris
> --
> Christopher Samuel - (03) 9925 4751 - Systems Manager
>  The Victorian Partnership for Advanced Computing
>  P.O. Box 201, Carlton South, VIC 3053, Australia
> VPAC is a not-for-profit Registered Research Agency
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>



Re: [OMPI devel] OMPI 1.3 - PERUSE peruse_comm_spec_t peer Negative Value

2009-03-25 Thread Samuel K. Gutierrez
Hi All,

George - I really appreciate the quick response.

> Hi,
>
> at least for the specific test program I used, the negative values for
the peer attribute disappeared after George's modifications in 20844.

Same here for my profiling library - tested with openmpi-1.3.2a1r20855.

>
> One remark: after installation, I had to remove the '#include
> "ompi_config.h"' line  in the "include/peruse.h" header to get PERUSE
applications to compile. Otherwise I got a missing header error message
for ompi_config.h.

I did not experience this - no modifications were needed on my end.  That
being said, my peruse.h does not include ompi_config.h, only mpi.h.

Thanks again,

Samuel K. Gutierrez

>
> Regards,
> Kiril
>
>
> On Mon, 2009-03-23 at 16:34 -0400, George Bosilca wrote:
>> You are absolutely right, the peer should never be set to -1 on any of
the PERUSE callbacks. I checked the code this morning and figure out
what was the problem. We report the peer and the tag attached to a
request before setting the right values (some code moved around). I
submitted a patch and created a "move request" to have this correction
as soon as possible on one of our stable releases. The move request can
be followed using our TRAC system and the following link
>> (https://svn.open-mpi.org/trac/ompi/ticket/1845
>> ). If you want to play with this change please update your Open MPI
installation to a nightly build or a fresh checkout from the SVN with
at least revision 20844 (a nightly including this change will be posted
on our website tomorrow morning).
>>Thanks,
>>  george.
>> On Mar 23, 2009, at 13:23 , Samuel K. Gutierrez wrote:
>> > Hi Kiril,
>> >
>> > Appreciate the quick response.
>> >
>> >> Hi Samuel,
>> >>
>> >> On Sat, 21 Mar 2009 18:18:54 -0600 (MDT)
>> >>  "Samuel K. Gutierrez" <sam...@lanl.gov> wrote:
>> >>> Hi All,
>> >>>
>> >>> I'm writing a simple profiling library which utilizes
>> >>> PERUSE.  My callback
>> >>
>> >> So am I :)
>> >>
>> >>> function counts communication events (see example code
>> >>> below).  I noticed
>> >>> that in OMPI v1.3 spec->peer is sometimes a negative
>> >>> value (OMPI v1.2.6
>> >>> did not exhibit this behavior).  I added some boundary
>> >>> checks, but it
>> >>> seems as if this is a bug?  I hope I'm not missing
>> >>> something...
>> >>
>> >> It took me quite some time to reproduce the error - I also
>> >
>> > Sorry about that - I should have provided more information.
>> >
>> >> got peer value "-1" for the Peruse peruse_comm_spec_t
>> >> struct. I only managed to reproduce this with
>> >> communication of a process with itself, which is an
>> >> unusual scenario. Anyway, for all the tests I did, the
>> >> error happened only when:
>> >>
>> >> -a process communicates with itself
>> >> -the MPI receive call is made
>> >> -the Peruse event "PERUSE_COMM_MSG_REMOVE_FROM_UNEX_Q" is
>> >> triggered
>> >
>> > That's interesting... Nice work!
>> >
>> >>
>> >>
>> >> The file ompi/mca/pml/ob1/pml_ob1_recvreq.c seems to be
>> >> the place where the above event is called with a wrong
>> >> value of the peer attribute.
>> >>
>> >> I will let you know if I find something.
>> >
>> > I will also take a look.
>> >
>> >>
>> >>
>> >> Best regards,
>> >> Kiril
>> >>
>> >>>
>> >>> The peruse test provided in the OMPI v1.3 source
>> >>> exhibits similar behavior:
>> >>> mpirun -np 2 ./mpi_peruse | grep peer:-1
>> >>>
>> >>> int callback(peruse_event_h event_h, MPI_Aint unique_id,
>> >>> peruse_comm_spec_t *spec, void *param) {
>> >>>   if (spec->peer == rank) {
>> >>>   return MPI_SUCCESS;
>> >>>   }
>> >>>   rrCounts[spec->peer]++;
>> >>>   return MPI_SUCCESS;
>> >>> }
>> >>>
>> >>>
>> >>> Any insight is greatly appreciated.
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Samuel K. Gutierrez
>> >>> ___
>> >>> devel mailing list
>> >>> de...@open-mpi.org
>> >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> >>
>> >>
>> >
>> > Appreciate the help,
>> >
>> > Samuel K. Gutierrez
>> > ___
>> > devel mailing list
>> > de...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
>





Re: [OMPI devel] OMPI 1.3 - PERUSE peruse_comm_spec_t peer Negative Value

2009-03-23 Thread Samuel K. Gutierrez
Hi Kiril,

Appreciate the quick response.

> Hi Samuel,
>
> On Sat, 21 Mar 2009 18:18:54 -0600 (MDT)
>   "Samuel K. Gutierrez" <sam...@lanl.gov> wrote:
>> Hi All,
>>
>> I'm writing a simple profiling library which utilizes
>>PERUSE.  My callback
>
> So am I :)
>
>> function counts communication events (see example code
>>below).  I noticed
>> that in OMPI v1.3 spec->peer is sometimes a negative
>>value (OMPI v1.2.6
>> did not exhibit this behavior).  I added some boundary
>>checks, but it
>> seems as if this is a bug?  I hope I'm not missing
>>something...
>
> It took me quite some time to reproduce the error - I also

Sorry about that - I should have provided more information.

> got peer value "-1" for the Peruse peruse_comm_spec_t
> struct. I only managed to reproduce this with
> communication of a process with itself, which is an
> unusual scenario. Anyway, for all the tests I did, the
> error happened only when:
>
> -a process communicates with itself
> -the MPI receive call is made
> -the Peruse event "PERUSE_COMM_MSG_REMOVE_FROM_UNEX_Q" is
> triggered

That's interesting... Nice work!

>
>
> The file ompi/mca/pml/ob1/pml_ob1_recvreq.c seems to be
> the place where the above event is called with a wrong
> value of the peer attribute.
>
> I will let you know if I find something.

I will also take a look.

>
>
> Best regards,
> Kiril
>
>>
>> The peruse test provided in the OMPI v1.3 source
>>exhibits similar behavior:
>> mpirun -np 2 ./mpi_peruse | grep peer:-1
>>
>> int callback(peruse_event_h event_h, MPI_Aint unique_id,
>> peruse_comm_spec_t *spec, void *param) {
>>if (spec->peer == rank) {
>>return MPI_SUCCESS;
>>}
>>rrCounts[spec->peer]++;
>>return MPI_SUCCESS;
>> }
>>
>>
>> Any insight is greatly appreciated.
>>
>> Thanks,
>>
>> Samuel K. Gutierrez
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>

Appreciate the help,

Samuel K. Gutierrez


[OMPI devel] OMPI 1.3 - PERUSE peruse_comm_spec_t peer Negative Value

2009-03-21 Thread Samuel K. Gutierrez
Hi All,

I'm writing a simple profiling library which utilizes PERUSE.  My callback
function counts communication events (see example code below).  I noticed
that in OMPI v1.3 spec->peer is sometimes a negative value (OMPI v1.2.6
did not exhibit this behavior).  I added some boundary checks, but it
seems as if this is a bug?  I hope I'm not missing something...

The peruse test provided in the OMPI v1.3 source exhibits similar behavior:
mpirun -np 2 ./mpi_peruse | grep peer:-1

int callback(peruse_event_h event_h, MPI_Aint unique_id,
             peruse_comm_spec_t *spec, void *param) {
    if (spec->peer == rank) {
        return MPI_SUCCESS;
    }
    rrCounts[spec->peer]++;
    return MPI_SUCCESS;
}
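
For reference, the boundary check mentioned above can be as simple as
the sketch below (rank and rrCounts are assumed to be set up as in the
original callback, and commSize is a hypothetical variable holding the
communicator size used to allocate rrCounts):

int callback(peruse_event_h event_h, MPI_Aint unique_id,
             peruse_comm_spec_t *spec, void *param) {
    /* guard against the bogus negative peer values described above */
    if (spec->peer < 0 || spec->peer >= commSize || spec->peer == rank) {
        return MPI_SUCCESS;
    }
    rrCounts[spec->peer]++;
    return MPI_SUCCESS;
}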


Any insight is greatly appreciated.

Thanks,

Samuel K. Gutierrez