[OMPI users] --mca btl_openib_if_include

2008-10-16 Thread Mostyn Lewis

Hello,

Using today's SVN 1.4a1r19757

with
MCA='--mca btl_openib_verbose 1 --mca btl openib,self --mca btl_openib_if_include 
"mlx4_0:1,mlx4_1:1"'

ibstatus (OFED 1.3.1) gives:
Infiniband device 'mlx4_0' port 1 status:
default gid: fe80::::0003:ba00:0100:71a1
base lid:0x2f
sm lid:  0x1
state:   4: ACTIVE
phys state:  5: LinkUp
rate:20 Gb/sec (4X DDR)

Infiniband device 'mlx4_0' port 2 status:
default gid: fe80::::0003:ba00:0100:71a2
base lid:0x0
sm lid:  0x0
state:   1: DOWN
phys state:  2: Polling
rate:10 Gb/sec (4X)

Infiniband device 'mlx4_1' port 1 status:
default gid: fe80::::0003:ba00:0100:70b9
base lid:0x30
sm lid:  0x1
state:   4: ACTIVE
phys state:  5: LinkUp
rate:20 Gb/sec (4X DDR)

Infiniband device 'mlx4_1' port 2 status:
default gid: fe80::::0003:ba00:0100:70ba
base lid:0x0
sm lid:  0x0
state:   1: DOWN
phys state:  2: Polling
rate:10 Gb/sec (4X)

Open MPI reports the following for this command:
mpirun --prefix 
/tools/openmpi/1.4a1r19757_svn/connectx/gcc64/4.1.2/openib/rh_EL_4/x86_64/xeon -x 
LD_LIBRARY_PATH --mca btl_openib_verbose 1 --mca btl openib,self --mca 
btl_openib_if_include "mlx4_0:1,mlx4_1:1" -np 4 -machinefile dhosts 
./IMB-MPI1.openmpi

--
WARNING: One or more nonexistent OpenFabrics devices/ports were
specified:

  Host: r4450_3
  MCA parameter:mca_btl_if_include
  Nonexistent entities: "mlx4_0:1,mlx4_1:1"

These entities will be ignored.  You can disable this warning by
setting the btl_openib_warn_nonexistent_if MCA parameter to 0.
--

Scali 5.6 works in dual-rail mode with this as does mvapich2-1.2rc2.

What am I doing wrong, please?

DM


Re: [OMPI users] Problem launching onto Bourne shell

2008-10-16 Thread Mostyn Lewis

Jeff,

You broke my ksh (and I expect something else)
Today's SVN 1.4a1r19757
orte/mca/plm/rsh/plm_rsh_module.c
line 471:
tmp = opal_argv_split("( test ! -r ./.profile || . ./.profile;", ' ');
   ^
   ARGHH
Without the "(":
tmp = opal_argv_split(" test ! -r ./.profile || . ./.profile;", ' ');
and all is well again :)

Regards,
Mostyn

On Thu, 9 Oct 2008, Jeff Squyres wrote:

FWIW, the fix has been pushed into the trunk, 1.2.8, and 1.3 SVN branches. 
So I'll probably take down the hg tree (we use those as temporary branches).


On Oct 9, 2008, at 2:32 PM, Hahn Kim wrote:


Hi,

Thanks for providing a fix, sorry for the delay in response.  Once I found 
out about -x, I've been busy working on the rest of our code, so I haven't 
had the time to try out the fix.  I'll take a look at it soon as I can and 
will let you know how it works out.


Hahn

On Oct 7, 2008, at 5:41 PM, Jeff Squyres wrote:


On Oct 7, 2008, at 4:19 PM, Hahn Kim wrote:


you probably want to set the LD_LIBRARY_PATH (and PATH, likely, and
possibly others, such as that LICENSE key, etc.) regardless of
whether it's an interactive or non-interactive login.


Right, that's exactly what I want to do.  I was hoping that mpirun
would run .profile as the FAQ page stated, but the -x fix works for
now.


If you're using Bash, it should be running .bashrc.  But it looks like
you did identify a bug that we're *not* running .profile.  I have a
Mercurial branch up with a fix if you want to give it a spin:

  http://www.open-mpi.org/hg/hgwebdir.cgi/jsquyres/sh-profile-fixes/


I just realized that I'm using .bash_profile on the x86 and need to
move its contents into .bashrc and call .bashrc from .bash_profile,
since eventually I will also be launching MPI jobs onto other x86
processors.

Thanks to everyone for their help.

Hahn

On Oct 7, 2008, at 2:16 PM, Jeff Squyres wrote:


On Oct 7, 2008, at 12:48 PM, Hahn Kim wrote:


Regarding 1., we're actually using 1.2.5.  We started using Open MPI
last winter and just stuck with it.  For now, using the -x flag with
mpirun works.  If this really is a bug in 1.2.7, then I think we'll
stick with 1.2.5 for now, then upgrade later when it's fixed.


It looks like this behavior has been the same throughout the entire
1.2 series.


Regarding 2., are you saying I should run the commands you suggest
from the x86 node running bash, so that ssh logs into the Cell node
running Bourne?


I'm saying that if "ssh othernode env" gives different answers than
"ssh othernode"/"env", then your .bashrc or .profile or whatever is
dumping out early depending on whether you have an interactive login
or not.  This is the real cause of the error -- you probably want to
set the LD_LIBRARY_PATH (and PATH, likely, and possibly others, such
as that LICENSE key, etc.) regardless of whether it's an interactive
or non-interactive login.



When I run "ssh othernode env" from the x86 node, I get the
following vanilla environment:

USER=ha17646
HOME=/home/ha17646
LOGNAME=ha17646
SHELL=/bin/sh
PWD=/home/ha17646

When I run "ssh othernode" from the x86 node, then run "env" on the
Cell, I get the following:

USER=ha17646
LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
HOME=/home/ha17646
MCS_LICENSE_PATH=/opt/MultiCorePlus/mcf.key
LOGNAME=ha17646
TERM=xterm-color
PATH=/usr/local/bin:/usr/bin:/sbin:/bin:/tools/openmpi-1.2.5/bin:/tools/cmake-2.4.7/bin:/tools
SHELL=/bin/sh
PWD=/home/ha17646
TZ=EST5EDT

Hahn

On Oct 7, 2008, at 12:07 PM, Jeff Squyres wrote:


Ralph and I just talked about this a bit:

1. In all released versions of OMPI, we *do* source the .profile file
on the target node if it exists (because vanilla Bourne shells do not
source anything on remote nodes -- Bash does, though, per the FAQ).
However, looking in 1.2.7, it looks like it might not be executing
that code -- there *may* be a bug in this area.  We're checking
into it.

2. You might want to check your configuration to see if your .bashrc
is dumping out early because it's a non-interactive shell.  Check the
output of:

ssh othernode env
vs.
ssh othernode
env

(i.e., a non-interactive running of "env" vs. an interactive login
and running "env")



On Oct 7, 2008, at 8:53 AM, Ralph Castain wrote:


I am unaware of anything in the code that would "source .profile"
for you. I believe the FAQ page is in error here.

Ralph

On Oct 6, 2008, at 7:47 PM, Hahn Kim wrote:


Great, that worked, thanks!  However, it still concerns me that the
FAQ page says that mpirun will execute .profile, which doesn't seem
to work for me.  Are there any configuration issues that could
possibly be preventing mpirun from doing this?  It would certainly
be more convenient if I could maintain my environment in a
single .profile file instead of adding what could potentially be a
lot of -x arguments to my mpirun command.

Hahn

On Oct 6, 2008, at 5:44 PM, Aurélien Bouteiller wrote:


You can forward your local env with mpirun -x
LD_LIBRARY_PAT

Re: [OMPI users] OpenMPI portability problems: debug info isn't helpful

2008-10-16 Thread Aleksej Saushev
Jeff Squyres  writes:

> On Oct 11, 2008, at 10:20 AM, Aleksej Saushev wrote:
>
>> $ ompi_info | grep oob
>> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.7)
>
> Good!
>
>>> $ mpirun --mca rml_base_debug 100 -np 2 skosfile
>> [asau.local:09060] mca: base: components_open: Looking for rml components
>> [asau.local:09060] mca: base: components_open: distilling rml components
>> [asau.local:09060] mca: base: components_open: accepting all rml components
>> [asau.local:09060] mca: base: components_open: opening rml components
>> [asau.local:09060] mca: base: components_open: found loaded component oob
>> [asau.local:09060] mca: base: components_open: component oob open function successful
>> [asau.local:09060] orte_rml_base_select: initializing rml component oob
>> [asau.local:09060] orte_rml_base_select: init returned failure
>
> Ah ha -- this is progress.  For some reason, your "oob" RML
> plugin is declining to run.  I see that its
> query/initialization function is actually quite short:
>
> if(mca_oob_base_init() != ORTE_SUCCESS)
>     return NULL;
> *priority = 1;
> return &orte_rml_oob_module;
>
> So it must be failing the mca_oob_base_init() function -- this
> is what initializes the underlying "OOB" (out of band)
> communications subsystem.
>
> Of course, this doesn't fail often, so we don't have any
> run-time switches to enable the debugging output.  :-(  Edit
> orte/mca/oob/base/oob_base_open.c line 43 and change the value
> of mca_oob_base_output from -1 to 0.  Let's see that output --
> I'm particularly interested in the output from querying the tcp
> oob component.  I suspect that it's declining to run as well.
>
> I wonder if this is going to end up being an opal_if() issue --
> where we are traversing all the IP network interfaces from the
> kernel...  I'll bet even money that it is.

[asau.local:04648] opal_ifinit: ioctl(SIOCGIFFLAGS) failed with errno=6
[asau.local:04648] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_rml_base_select failed
  --> Returned value -13 instead of ORTE_SUCCESS

--
[asau.local:04648] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
[asau.local:04648] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
--
Open RTE was unable to initialize properly.  The error occured while
attempting to orte_init().  Returned value -13 instead of ORTE_SUCCESS.
--

Why don't you use strerror(3) to print an explanation of the errno value?

From the system errno header:
#define ENXIO   6   /* Device not configured */

It seems that I have to debug the network interface probing.
How should I use the *_output routines so that they actually print?
I tried these changes, but in vain:

--- opal/util/if.c.orig 2008-08-25 23:16:50.0 +0400
+++ opal/util/if.c  2008-10-15 23:55:07.0 +0400
@@ -242,6 +242,8 @@
 if(ifr->ifr_addr.sa_family != AF_INET)
 continue;

+   opal_output(0, "opal_ifinit: checking netif %s", ifr->ifr_name);
+   /* HERE IT FAILS!! */
 if(ioctl(sd, SIOCGIFFLAGS, ifr) < 0) {
 opal_output(0, "opal_ifinit: ioctl(SIOCGIFFLAGS) failed with errno=%d", errno);
 continue;
--- opal/util/if.c.orig 2008-08-25 23:16:50.0 +0400
+++ opal/util/if.c  2008-10-15 23:55:07.0 +0400
@@ -242,6 +242,8 @@
 if(ifr->ifr_addr.sa_family != AF_INET)
 continue;

+   fprintf(stderr, "opal_ifinit: checking netif %s\n", ifr->ifr_name);
+   /* HERE IT FAILS!! */
 if(ioctl(sd, SIOCGIFFLAGS, ifr) < 0) {
 opal_output(0, "opal_ifinit: ioctl(SIOCGIFFLAGS) failed with errno=%d", errno);
 continue;
--- opal/util/output.c.orig 2008-08-25 23:16:50.0 +0400
+++ opal/util/output.c  2008-10-16 19:58:49.0 +0400
@@ -41,7 +41,7 @@
 /*
  * Private data
  */
-static int verbose_stream = -1;
+static int verbose_stream = 0;
 static opal_output_stream_t verbose;
 static char *output_dir = NULL;
 static char *output_prefix = NULL;

It seems a bit tricky, and it is scarcely documented.
Have I overlooked something?

What makes it strange is that fprintf(stderr, ...) doesn't work either.

> Specifically: I predict that t

Re: [OMPI users] The --with-sge option

2008-10-16 Thread Mike Hanby
I did find the following in ompi_info:

MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.7)
MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.7)

However, I also see these in an ompi_info built without the --with-sge
switch.

Also, since I'm building 1.2.8, shouldn't the Component versions reflect
1.2.8?

I set PATH and LD_LIBRARY_PATH to point to the temporary location of my
new build, and it still reports 1.2.7.

Mike


From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Mike Hanby
Sent: Thursday, October 16, 2008 11:07 AM
To: us...@open-mpi.org
Subject: [OMPI users] The --with-sge option

 

Howdy,

I'm compiling 1.2.8 on a system with SGE 6.1u4 and came across the
"--with-sge" option on a Grid Engine posting.

A couple of questions:

1.  I don't see --with-sge mentioned in the "./configure --help" output,
nor can I find much reference to it on the Open MPI site.  Is this option
really implemented?  What does it do?

2.  After compiling Open MPI with the --with-sge switch, I ran the
ompi_info binary and grep'd for sge in the output.  There isn't any
reference; should there be if the option was successfully passed to
configure?

Thanks, Mike



[OMPI users] The --with-sge option

2008-10-16 Thread Mike Hanby
Howdy,

I'm compiling 1.2.8 on a system with SGE 6.1u4 and came across the
"--with-sge" option on a Grid Engine posting.

A couple of questions:

1.  I don't see --with-sge mentioned in the "./configure --help" output,
nor can I find much reference to it on the Open MPI site.  Is this option
really implemented?  What does it do?

2.  After compiling Open MPI with the --with-sge switch, I ran the
ompi_info binary and grep'd for sge in the output.  There isn't any
reference; should there be if the option was successfully passed to
configure?

Thanks, Mike



[OMPI users] Errors compiling OpenMPI 1.2.8 with SUN Studio express (2008/07/10) in 32bit modus

2008-10-16 Thread Paul Kapinos

Hi all,

We tried to install Open MPI 1.2.8 on Linux here in several variants
(Intel, PGI, Sun Studio, and GCC compilers; all in 64-bit and 32-bit).


When we used Sun Studio Express (2008/07/10) and configured it to produce
a 32-bit library, we got the following errors (see the file
my_makelog_sun32.txt for the full log):


..
gmake[2]: Entering directory 
`/rwthfs/rz/cluster/home/pk224850/OpenMPI/openmpi-1.2.8_studio32/ompi/mca/btl/openib'
source='btl_openib_component.c' object='btl_openib_component.lo' libtool=yes \
DEPDIR=.deps depmode=none /bin/sh ../../../../config/depcomp \
	/bin/sh ../../../../libtool --tag=CC   --mode=compile cc 
-DHAVE_CONFIG_H -I. -I../../../../opal/include 
-I../../../../orte/include -I../../../../ompi/include 
-DPKGDATADIR=\"/rwthfs/rz/SW/MPI/openmpi-1.2.8/linux32/studio/share/openmpi\" 
-I../../../.. -DNDEBUG -O2 -m32  -c -o btl_openib_component.lo 
btl_openib_component.c
libtool: compile:  cc -DHAVE_CONFIG_H -I. -I../../../../opal/include 
-I../../../../orte/include -I../../../../ompi/include 
-DPKGDATADIR=\"/rwthfs/rz/SW/MPI/openmpi-1.2.8/linux32/studio/share/openmpi\" 
-I../../../.. -DNDEBUG -O2 -m32 -c btl_openib_component.c  -KPIC -DPIC 
-o .libs/btl_openib_component.o
"../../../../opal/include/opal/sys/ia32/atomic.h", line 167: warning: 
impossible constraint for "%1" asm operand
"../../../../opal/include/opal/sys/ia32/atomic.h", line 167: warning: 
parameter in inline asm statement unused: %2
"../../../../opal/include/opal/sys/ia32/atomic.h", line 184: warning: 
impossible constraint for "%1" asm operand
"../../../../opal/include/opal/sys/ia32/atomic.h", line 184: warning: 
parameter in inline asm statement unused: %2
"/usr/include/infiniband/kern-abi.h", line 103: syntax error before or 
at: __u64
"/usr/include/infiniband/kern-abi.h", line 109: syntax error before or 
at: __u64
"/usr/include/infiniband/kern-abi.h", line 124: syntax error before or 
at: __u64
"/usr/include/infiniband/kern-abi.h", line 135: syntax error before or 
at: __u64

...


This looks to us like a problem in the Linux headers: kern-abi.h
includes linux/types.h, which contains this:



#if defined(__GNUC__) && !defined(__STRICT_ANSI__)
typedef __u64   uint64_t;
typedef __u64   u_int64_t;
typedef __s64   int64_t;
#endif


So it looks to us as if, when building Open MPI 1.2.8, the Sun Studio
compiler cannot compile some Linux headers because they are written in
"GNU C" rather than ANSI C.


If so, then this is a Linux issue and not Open MPI's. But if so, *why*
did you not see this problem during release preparation? Or have we made
some mistake? Maybe the devel headers and/or static libs are the problem?
(I will try disabling them, but we want to report this problem anyway.)






We use Scientific Linux 5.1, which is a Red Hat Enterprise Linux 5 derivative.

$ uname -a
Linux linuxhtc01.rz.RWTH-Aachen.DE 2.6.18-53.1.14.el5_lustre.1.6.5custom 
#1 SMP Wed Jun 25 12:17:09 CEST 2008 x86_64 x86_64 x86_64 GNU/Linux



configured with:


 ./configure --enable-static --with-devel-headers CFLAGS="-O2 -m32" 
CXXFLAGS="-O2 -m32" FFLAGS="-O2 -m32" FCFLAGS="-O2 -m32" LDFLAGS="-m32" 
--prefix=/rwthfs/rz/SW/MPI/openmpi-1.2.8/linux32/studio



Best regards,

Paul Kapinos
HPC Group
RZ RWTH Aachen











This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.2.8, which was
generated by GNU Autoconf 2.61.  Invocation command line was

  $ ./configure --enable-static --with-devel-headers CFLAGS=-O2 -m32 CXXFLAGS=-O2 -m32 FFLAGS=-O2 -m32 FCFLAGS=-O2 -m32 LDFLAGS=-m32 --prefix=/rwthfs/rz/SW/MPI/openmpi-1.2.8/linux32/studio CC=cc CXX=CC FC=f95 --enable-ltdl-convenience --no-create --no-recursion

## --------- ##
## Platform. ##
## --------- ##

hostname = linuxhtc01.rz.RWTH-Aachen.DE
uname -m = x86_64
uname -r = 2.6.18-53.1.14.el5_lustre.1.6.5custom
uname -s = Linux
uname -v = #1 SMP Wed Jun 25 12:17:09 CEST 2008

/usr/bin/uname -p = x86_64
/bin/uname -X = unknown

/bin/arch  = x86_64
/usr/bin/arch -k   = x86_64
/usr/convex/getsysinfo = unknown
/usr/bin/hostinfo  = unknown
/bin/machine   = unknown
/usr/bin/oslevel   = unknown
/bin/universe  = unknown

PATH: /rwthfs/rz/SW/UTIL/StudioExpress20080724/SUNWspro/bin
PATH: /home/pk224850/bin
PATH: /usr/local_host/sbin
PATH: /usr/local_host/bin
PATH: /usr/local_rwth/sbin
PATH: /usr/local_rwth/bin
PATH: /usr/bin
PATH: /usr/sbin
PATH: /sbin
PATH: /usr/dt/bin
PATH: /usr/bin/X11
PATH: /usr/java/bin
PATH: /usr/local/bin
PATH: /usr/local/sbin
PATH: /opt/csw/bin
PATH: .


## ----------- ##
## Core tests. ##
## ----------- ##

configure:2986: checking for a BSD-compatible install
configure:3042: result: /usr/local_rwth/bin/ginstall -c
configure:3053: checking whether build environment is sane
configure:3096: result: yes
configure:3124: checking for

Re: [OMPI users] on SEEK_*

2008-10-16 Thread Rajeev Thakur
In the upcoming 1.0.8 release of MPICH2 (next week or so) we are fixing this
in a way similar to Open MPI, so you shouldn't need to #undef anything even
with MPICH2.

Rajeev


> Date: Thu, 16 Oct 2008 12:29:01 +0200
> From: Jed Brown 
> Subject: [OMPI users] on SEEK_*
> To: us...@open-mpi.org
> 
> I've just run into this chunk of code.
> 
> /* MPICH2 will fail if SEEK_* macros are defined
>  * because they are also C++ enums. Undefine them
>  * when including mpi.h and then redefine them
>  * for sanity.
>  */
> #  ifdef SEEK_SET
> #define MB_SEEK_SET SEEK_SET
> #define MB_SEEK_CUR SEEK_CUR
> #define MB_SEEK_END SEEK_END
> #undef SEEK_SET
> #undef SEEK_CUR
> #undef SEEK_END
> #  endif
> #include "mpi.h"
> #  ifdef MB_SEEK_SET
> #define SEEK_SET MB_SEEK_SET
> #define SEEK_CUR MB_SEEK_CUR
> #define SEEK_END MB_SEEK_END
> #undef MB_SEEK_SET
> #undef MB_SEEK_CUR
> #undef MB_SEEK_END
> #  endif
> 
> 
> MPICH2 (1.1.0a1) gives these errors if SEEK_* are present:
> 
> /opt/mpich2/include/mpicxx.h:26:2: error: #error "SEEK_SET is #defined but must not be for the C++ binding of MPI"
> /opt/mpich2/include/mpicxx.h:30:2: error: #error "SEEK_CUR is #defined but must not be for the C++ binding of MPI"
> /opt/mpich2/include/mpicxx.h:35:2: error: #error "SEEK_END is #defined but must not be for the C++ binding of MPI"
> 
> but when SEEK_* is not present and iostream has been included, OMPI-dev
> gives these errors.
> 
> /home/ompi/include/openmpi/ompi/mpi/cxx/mpicxx.h:53: error: ‘SEEK_SET’ was not declared in this scope
> /home/ompi/include/openmpi/ompi/mpi/cxx/mpicxx.h:54: error: ‘SEEK_CUR’ was not declared in this scope
> /home/ompi/include/openmpi/ompi/mpi/cxx/mpicxx.h:55: error: ‘SEEK_END’ was not declared in this scope
> 
> There is a subtle difference between OMPI 1.2.7 and -dev at least with
> GCC 4.3.2.  If iostream was included before mpi.h and then SEEK_* are
> #undef'd then 1.2.7 succeeds while -dev fails with the message above.
> If stdio.h is included and SEEK_* are #undef'd then both OMPI versions
> fail.  MPICH2 requires in both cases that SEEK_* be #undef'd.
> 
> What do you recommend to remain portable?  Is this really an MPICH2
> issue?  The standard doesn't seem to address this issue.  The MPICH2
> FAQ has this
> 
> http://www.mcs.anl.gov/research/projects/mpich2/support/index.php?s=faqs#cxxseek
> 
> 
> Jed
> 
> --
> 
> Message: 4
> Date: Thu, 16 Oct 2008 07:43:54 -0400
> From: Jeff Squyres 
> Subject: Re: [OMPI users] on SEEK_*
> To: Open MPI Users 
> 
> On Oct 16, 2008, at 6:29 AM, Jed Brown wrote:
> 
> > but when SEEK_* is not present and iostream has been included, OMPI-dev
> > gives these errors.
> >
> > /home/ompi/include/openmpi/ompi/mpi/cxx/mpicxx.h:53: error:
> > ‘SEEK_SET’ was not declared in this scope
> > /home/ompi/include/openmpi/ompi/mpi/cxx/mpicxx.h:54: error:
> > ‘SEEK_CUR’ was not declared in this scope
> > /home/ompi/include/openmpi/ompi/mpi/cxx/mpicxx.h:55: error:
> > ‘SEEK_END’ was not declared in this scope
> >
> > There is a subtle difference between OMPI 1.2.7 and -dev at least with
> > GCC 4.3.2.  If iostream was included before mpi.h and then SEEK_* are
> > #undef'd then 1.2.7 succeeds while -dev fails with the message above.
> > If stdio.h is included and SEEK_* are #undef'd then both OMPI versions
> > fail.  MPICH2 requires in both cases that SEEK_* be #undef'd.
> 
> Open MPI doesn't require undef'ing of anything.  It should also not  
> require any special ordering of include files.  Specifically, the  
> following codes both compile fine for me with 1.2.8 and the OMPI SVN  
> trunk (which is what I assume you mean by "-dev"?):
> 
> #include 
> #include 
> int a = MPI::SEEK_SET;
> 
> and
> 
> #include 
> #include 
> int a = MPI::SEEK_SET;
> 
> So in short: don't #undef anything and OMPI should do the Right things.
> 
> > What do you recommend to remain portable?  Is this really an MPICH2
> > issue?  The standard doesn't seem to address this issue.  The MPICH2
> > FAQ has this
> >
> > http://www.mcs.anl.gov/research/projects/mpich2/support/index.php?s=faqs#cxxseek
> 
> 
> This is actually a problem in the MPI-2 spec; the names
> "MPI::SEEK_SET" (and friends) were unfortunately chosen poorly.
> Hopefully that'll be fixed relatively soon, in MPI-2.2.
> 
> MPICH chose to handle this situation a different way than we did, and
> apparently requires that you either #undef s

Re: [OMPI users] on SEEK_*

2008-10-16 Thread Jed Brown
On Thu 2008-10-16 08:21, Jeff Squyres wrote:
> FWIW: https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/20 is a  
> placemarker for discussion for the upcoming MPI Forum meeting (next  
> week).
>
> Also, be aware that OMPI's 1.2.7 solution isn't perfect, either.  You  
> can see from ticket 20 that it actually causes a problem if you try to  
> use SEEK_SET in a switch/case statement.  But we did this a little  
> better in the trunk/v1.3 (see 
> https://svn.open-mpi.org/trac/ompi/changeset/19494); this solution *does* 
> allow for SEEK_SET to be used in a case statement, but it does always 
> bring in  (probably not a huge deal).

I see.

> The real solution is that we're likely going to change these names to  
> something else in the MPI spec itself.  And/or drop the C++ bindings  
> altogether (see http://lists.mpi-forum.org/mpi-22/2008/10/0177.php).

Radical.  I don't use the C++ bindings anyway.  I especially like
proposal (4) Data in User-Defined Callbacks.

On a related note, it would be nice to be able to call an MPI_Op from
user code.  For instance, I have an irregular Reduce-like operation
where each proc needs to reduce data from a few other procs (much fewer
than the entire communicator).  I implement this using a few nonblocking
point-to-point calls followed by a local reduction.  I would like my
special reduction to accept an arbitrary MPI_Op, but I currently use a
function pointer.  Having a public version of ompi_op_reduce would make
this much cleaner.

> Additionally -- I should have pointed this out in my first mail -- you  
> can also just use MPI_SEEK_SET (and friends).  The spec defines that  
> these constants must have the same values as their MPI::SEEK_*  
> counterparts.

Right, MPI::SEEK_* is never used.

Thanks Jeff.

Jed




Re: [OMPI users] on SEEK_*

2008-10-16 Thread Jeff Squyres
FWIW: https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/20 is a  
placemarker for discussion for the upcoming MPI Forum meeting (next  
week).


Also, be aware that OMPI's 1.2.7 solution isn't perfect, either.  You
can see from ticket 20 that it actually causes a problem if you try to
use SEEK_SET in a switch/case statement.  But we did this a little
better in the trunk/v1.3 (see https://svn.open-mpi.org/trac/ompi/changeset/19494);
this solution *does* allow for SEEK_SET to be used in a case
statement, but it does always bring in  (probably not a huge
deal).


The real solution is that we're likely going to change these names to  
something else in the MPI spec itself.  And/or drop the C++ bindings  
altogether (see http://lists.mpi-forum.org/mpi-22/2008/10/0177.php).


Additionally -- I should have pointed this out in my first mail -- you  
can also just use MPI_SEEK_SET (and friends).  The spec defines that  
these constants must have the same values as their MPI::SEEK_*  
counterparts.



On Oct 16, 2008, at 7:57 AM, Jed Brown wrote:


On Thu 2008-10-16 07:43, Jeff Squyres wrote:

On Oct 16, 2008, at 6:29 AM, Jed Brown wrote:

Open MPI doesn't require undef'ing of anything.  It should also not
require any special ordering of include files.  Specifically, the
following codes both compile fine for me with 1.2.8 and the OMPI SVN
trunk (which is what I assume you mean by "-dev"?):


That's what I meant.  This works with 1.2.7 but not with -dev:

#include 
#undef SEEK_SET
#undef SEEK_CUR
#undef SEEK_END
#include 

If iostream is replaced by stdio, then both fail.

This is actually a problem in the MPI-2 spec; the names "MPI::SEEK_SET"
(and friends) were unfortunately chosen poorly.  Hopefully that'll be
fixed relatively soon, in MPI-2.2.


It wasn't addressed in the MPI-2.1 spec I was reading, hence my
confusion.  When namespaces and macros don't play well.


MPICH chose to handle this situation a different way than we did, and
apparently requires that you either #undef something or you #define an
MPICH-specific macro.  I guess the portable way might be to just always
define that MPICH-specific macro.  It should be harmless for OMPI.


I'll go with this, thanks.

FWIW, I was chatting with the MPICH developers at the recent MPI Forum
meeting and showed them how we did our SEEK_* solution in Open MPI.


Certainly the OMPI solution is better for users.

Jed
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] OPENMPI 1.2.7 & PGI compilers: configure option --disable-ptmalloc2-opt-sbrk

2008-10-16 Thread Francesco Iannone
Hi Jeff
I used the configure option:

--enable-ptmalloc2-opt-sbrk

to solve a segmentation fault in memory allocation with Open MPI 1.2.x and
PGI 7.1-4 and 7.2.

I have a simple test program (Callocrash.c) that demonstrates this (see below).

Could you test this code on a node with 8 GB of RAM running Red Hat Enterprise
Linux 4 with Open MPI 1.2.x and PGI 7.1-4?

I compiled it with:

 pgcc -o Callocrash Callocrash.c   (it's ok)
 gnu4 -o Callocrash Callocrash.c   (it's ok)
 mpicc -o Callocrash Callocrash.c   (segmentation fault in sysMALLOC during
 the calloc(622947588, 16) allocation)

Thanks in advance.

greetings


Callocrash.c


#include <stdio.h>
#include <stdlib.h>

int main( int argc, char *argv[])
{
    /*
     *  memory allocations simulation for ~50M nonzeros:
     *  nd=180 md=350 mdy=420
     *
     *  if this program crashes, there is a compiler problem
     */
    printf("memory allocations simulation for ~50M nonzeros:  nd=180 md=350 mdy=420\n");
    printf("if this program crashes, check your compiler/environment configuration\n");

    /* sizeof yields a size_t, so %zu is the matching printf format */
    printf("sizeof(int)    %zu\n", sizeof(int));
    printf("sizeof(int*)   %zu\n", sizeof(int*));
    printf("sizeof(size_t) %zu\n", sizeof(size_t));

    if( sizeof(size_t)<8 || sizeof(int*)<8 )
    {
        printf("please compile this program for a 64 bit environment!\n");
        return -1;
    }

    int *p;

    /* the blocks are never freed, so all four allocations are live at once */
    printf("allocation 1/4..\n");
    p = calloc(47109185,16);
    if(!p)printf("..failed.\n");
    printf("allocation 2/4..\n");
    p = calloc(47109185,4);
    if(!p)printf("..failed.\n");
    printf("allocation 3/4..\n");
    p = calloc(47109185,4);
    if(!p)printf("..failed.\n");
    printf("allocation 4/4..\n");

    p = calloc(622947588,16);
    if(!p)printf("..failed.\n");
    if(!p) return -1;

    printf("allocations test passed (no crash)\n");
    return 0;
}


On 15/10/08 19:42, "Jeff Squyres"  wrote:

> On Oct 15, 2008, at 9:35 AM, Francesco Iannone wrote:
> 
>> I have a cluster of 16 nodes DualCPU DualCore AMD  RAM 16 GB with
>> InfiniBand CISCO HCA and switch InfiniBand.
>> It uses Linux RH Enterprise 4  64 bit , OpenMPI 1.2.7, PGI 7.1-4 and
>> openib-1.2-7.
>> 
>> Hence it means that the option --disable-ptmalloc2 is catastrophic in
>> the above configuration.
> 
> Actually, I notice that in your original message, you said
> "--disable-ptmalloc2-opt-sbrk", but here you said "--disable-ptmalloc2".
> The former is:
> 
>    Only trigger callbacks when sbrk is used for small
>    allocations, rather than every call to malloc/free.
>    (default: enabled)
> 
> So it should be fine to disable; it shouldn't affect overall MPI
> performance too much.
> 
> The latter disables ptmalloc2 entirely (and you'll likely get lower
> benchmark bandwidth for large messages).
> 
> I'm unaware of either of these options leading to problems with the
> PGI compiler suite; I have tested OMPI v1.2.x with several versions of
> the PGI compiler without problem (although my latest version is PGI
> 7.1-4).

Dr. Francesco Iannone
Associazione EURATOM-ENEA sulla Fusione
C.R. ENEA Frascati
Via E. Fermi 45
00044 Frascati (Roma) Italy
phone 00-39-06-9400-5124
fax 00-39-06-9400-5524
mailto:francesco.iann...@frascati.enea.it
http://www.afs.enea.it/iannone





Re: [OMPI users] on SEEK_*

2008-10-16 Thread Jed Brown
On Thu 2008-10-16 07:43, Jeff Squyres wrote:
> On Oct 16, 2008, at 6:29 AM, Jed Brown wrote:
>
> Open MPI doesn't require undef'ing of anything.  It should also not  
> require any special ordering of include files.  Specifically, the  
> following codes both compile fine for me with 1.2.8 and the OMPI SVN  
> trunk (which is what I assume you mean by "-dev"?):

That's what I meant.  This works with 1.2.7 but not with -dev:

#include 
#undef SEEK_SET
#undef SEEK_CUR
#undef SEEK_END
#include 

If iostream is replaced by stdio, then both fail.

> This is actually a problem in the MPI-2 spec; the names "MPI::SEEK_SET" 
> (and friends) were unfortunately chosen poorly.  Hopefully that'll be 
> fixed relatively soon, in MPI-2.2.

It wasn't addressed in the MPI-2.1 spec I was reading, hence my
confusion.  When namespaces and macros don't play well.

> MPICH chose to handle this situation a different way than we did, and  
> apparently requires that you either #undef something or you #define an  
> MPICH-specific macro.  I guess the portable way might be to just always 
> define that MPICH-specific macro.  It should be harmless for OMPI.

I'll go with this, thanks.

> FWIW, I was chatting with the MPICH developers at the recent MPI Forum  
> meeting and showed them how we did our SEEK_* solution in Open MPI.

Certainly the OMPI solution is better for users.

Jed




Re: [OMPI users] on SEEK_*

2008-10-16 Thread Jeff Squyres

On Oct 16, 2008, at 6:29 AM, Jed Brown wrote:

> but when SEEK_* is not present and iostream has been included, OMPI-dev
> gives these errors.
>
> /home/ompi/include/openmpi/ompi/mpi/cxx/mpicxx.h:53: error: ‘SEEK_SET’ was not declared in this scope
> /home/ompi/include/openmpi/ompi/mpi/cxx/mpicxx.h:54: error: ‘SEEK_CUR’ was not declared in this scope
> /home/ompi/include/openmpi/ompi/mpi/cxx/mpicxx.h:55: error: ‘SEEK_END’ was not declared in this scope
>
> There is a subtle difference between OMPI 1.2.7 and -dev at least with
> GCC 4.3.2.  If iostream was included before mpi.h and then SEEK_* are
> #undef'd then 1.2.7 succeeds while -dev fails with the message above.
> If stdio.h is included and SEEK_* are #undef'd then both OMPI versions
> fail.  MPICH2 requires in both cases that SEEK_* be #undef'd.


Open MPI doesn't require undef'ing of anything.  It should also not  
require any special ordering of include files.  Specifically, the  
following codes both compile fine for me with 1.2.8 and the OMPI SVN  
trunk (which is what I assume you mean by "-dev"?):


#include 
#include 
int a = MPI::SEEK_SET;

and

#include 
#include 
int a = MPI::SEEK_SET;

So in short: don't #undef anything and OMPI should do the Right Thing.


> What do you recommend to remain portable?  Is this really an MPICH2
> issue?  The standard doesn't seem to address this issue.  The MPICH2 FAQ
> has this:
>
> http://www.mcs.anl.gov/research/projects/mpich2/support/index.php?s=faqs#cxxseek



This is actually a problem in the MPI-2 spec; the names  
"MPI::SEEK_SET" (and friends) were unfortunately chosen poorly.   
Hopefully that'll be fixed relatively soon, in MPI-2.2.


MPICH chose to handle this situation a different way than we did, and  
apparently requires that you either #undef something or you #define an  
MPICH-specific macro.  I guess the portable way might be to just  
always define that MPICH-specific macro.  It should be harmless for  
OMPI.
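For code that has to build against both MPICH2 and Open MPI, a minimal sketch of the "always define the MPICH-specific macro" route could look like the fragment below. It assumes the macro in question is `MPICH_IGNORE_CXX_SEEK`, the name given in the MPICH2 FAQ entry cited in this thread; per the reply above, Open MPI should simply ignore it. This is an illustration only and needs an MPI installation to compile.

```cpp
// Assumption: MPICH_IGNORE_CXX_SEEK is the macro described in the MPICH2 FAQ.
// It must be visible before mpi.h is read; Open MPI is expected to ignore it.
#define MPICH_IGNORE_CXX_SEEK
#include <cstdio>   // SEEK_* may stay defined; no #undef dance needed
#include <mpi.h>
```

Passing `-DMPICH_IGNORE_CXX_SEEK` on the compiler command line is equivalent and leaves the source untouched.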


FWIW, I was chatting with the MPICH developers at the recent MPI Forum  
meeting and showed them how we did our SEEK_* solution in Open MPI.


--
Jeff Squyres
Cisco Systems




[OMPI users] on SEEK_*

2008-10-16 Thread Jed Brown
I've just run into this chunk of code.

/* MPICH2 will fail if SEEK_* macros are defined
 * because they are also C++ enums. Undefine them
 * when including mpi.h and then redefine them
 * for sanity.
 */
#  ifdef SEEK_SET
#define MB_SEEK_SET SEEK_SET
#define MB_SEEK_CUR SEEK_CUR
#define MB_SEEK_END SEEK_END
#undef SEEK_SET
#undef SEEK_CUR
#undef SEEK_END
#  endif
#include "mpi.h"
#  ifdef MB_SEEK_SET
#define SEEK_SET MB_SEEK_SET
#define SEEK_CUR MB_SEEK_CUR
#define SEEK_END MB_SEEK_END
#undef MB_SEEK_SET
#undef MB_SEEK_CUR
#undef MB_SEEK_END
#  endif
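One caveat about the chunk above: `#define MB_SEEK_SET SEEK_SET` stores the token `SEEK_SET`, not the number it expands to at that moment, because the preprocessor only expands a macro body when the macro is used. After `#undef SEEK_SET`, `#define SEEK_SET MB_SEEK_SET` and `#undef MB_SEEK_SET`, any later use of `SEEK_SET` expands to the undeclared identifier `MB_SEEK_SET`, so the "redefine them for sanity" step does not restore usable definitions. The sketch below saves the values in constants instead. `FakeMPI` is a hypothetical stand-in for the C++ binding in mpi.h (no MPI library is involved), and the names `saved_seek_*`, `fake_mpi_seek_set` and `temp_file_size` are illustrative only.

```cpp
#include <cstdio>   // defines the SEEK_SET, SEEK_CUR and SEEK_END macros

// 1. Save the *values*. A macro alias such as "#define MB_SEEK_SET SEEK_SET"
//    would only save the token SEEK_SET, which is useless once it is #undef'd.
static const int saved_seek_set = SEEK_SET;
static const int saved_seek_cur = SEEK_CUR;
static const int saved_seek_end = SEEK_END;

// 2. Remove the macros so that C++ names spelled SEEK_* can be declared.
#undef SEEK_SET
#undef SEEK_CUR
#undef SEEK_END

// Hypothetical stand-in for the C++ binding in mpi.h; NOT the real header,
// and the values are arbitrary placeholders.
namespace FakeMPI {
const int SEEK_SET = 600;
const int SEEK_CUR = 602;
const int SEEK_END = 604;
}  // namespace FakeMPI

// Read the namespaced constant while SEEK_SET is not a macro.
static const int fake_mpi_seek_set = FakeMPI::SEEK_SET;

// 3. Restore the C macros in terms of the saved values.
#define SEEK_SET saved_seek_set
#define SEEK_CUR saved_seek_cur
#define SEEK_END saved_seek_end

// Ordinary stdio code using the restored macros.
// Returns the size of a small temporary file, or -1 on failure.
long temp_file_size() {
    std::FILE* f = std::tmpfile();
    if (f == nullptr) return -1;
    std::fputs("hello", f);
    std::fseek(f, 0, SEEK_END);  // expands to saved_seek_end
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);  // expands to saved_seek_set
    std::fclose(f);
    return size;
}
```

If `tmpfile()` succeeds, `temp_file_size()` should return 5. As with any scheme that restores the macros, spelling `MPI::SEEK_SET` after the restore point is again subject to macro replacement, which is why the sketch reads `FakeMPI::SEEK_SET` before step 3.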


MPICH2 (1.1.0a1) gives these errors if SEEK_* are present:

/opt/mpich2/include/mpicxx.h:26:2: error: #error "SEEK_SET is #defined but must not be for the C++ binding of MPI"
/opt/mpich2/include/mpicxx.h:30:2: error: #error "SEEK_CUR is #defined but must not be for the C++ binding of MPI"
/opt/mpich2/include/mpicxx.h:35:2: error: #error "SEEK_END is #defined but must not be for the C++ binding of MPI"

but when SEEK_* is not present and iostream has been included, OMPI-dev
gives these errors.

/home/ompi/include/openmpi/ompi/mpi/cxx/mpicxx.h:53: error: ‘SEEK_SET’ was not declared in this scope
/home/ompi/include/openmpi/ompi/mpi/cxx/mpicxx.h:54: error: ‘SEEK_CUR’ was not declared in this scope
/home/ompi/include/openmpi/ompi/mpi/cxx/mpicxx.h:55: error: ‘SEEK_END’ was not declared in this scope

There is a subtle difference between OMPI 1.2.7 and -dev at least with
GCC 4.3.2.  If iostream was included before mpi.h and then SEEK_* are
#undef'd then 1.2.7 succeeds while -dev fails with the message above.
If stdio.h is included and SEEK_* are #undef'd then both OMPI versions
fail.  MPICH2 requires in both cases that SEEK_* be #undef'd.

What do you recommend to remain portable?  Is this really an MPICH2
issue?  The standard doesn't seem to address this issue.  The MPICH2 FAQ
has this

http://www.mcs.anl.gov/research/projects/mpich2/support/index.php?s=faqs#cxxseek


Jed

