[OMPI users] Is this an OpenMPI bug?

2009-02-20 Thread -Gim
I am trying to use the mpi_bcast function in Fortran.  I am using
Open MPI 1.2.7.

Say x is a real array of size 100 and np = 100.  I try to bcast this to all
the processors.

I use call mpi_bcast(x,np,mpi_real,0,ierr)

When I do this and try to print the values from the receiving processor,
exactly half the values get broadcast.  In this case, I get 50 correct
values in the receiving processor and the rest are junk.  The same happened when I
tried with np=20: exactly 10 values get populated and the rest are junk!

PS: I am running this on a single processor (just for testing purposes).  I run
this with "mpirun -np 4  "
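
For reference, a hedged shell sketch of how such a test is typically compiled and
launched with the Open MPI wrappers (the source and executable names below are
placeholders, not taken from the original post):

  mpif90 -o bcast_test bcast_test.f90   # Open MPI's Fortran wrapper compiler
  mpirun -np 4 ./bcast_test             # oversubscribing a single processor is fine for testing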

Cheerio,
Gim


Re: [OMPI users] ptrdiff_t undefined error on intel 64bit machine with intel compilers

2009-02-20 Thread Jeff Squyres

Does applying the following patch fix the problem?

Index: ompi/datatype/dt_args.c
===
--- ompi/datatype/dt_args.c (revision 20616)
+++ ompi/datatype/dt_args.c (working copy)
@@ -18,6 +18,9 @@
  */

 #include "ompi_config.h"
+
+#include <stddef.h>
+
 #include "opal/util/arch.h"
 #include "opal/include/opal/align.h"
 #include "ompi/constants.h"



On Feb 20, 2009, at 4:33 PM, Tamara Rogers wrote:


Jeff:
See attached.  I'm using the 9.0 version of the Intel compilers.
Interestingly, I have no problems on a 32-bit Intel machine using these
same compilers.  There only seems to be a problem on the 64-bit machine.


--- On Fri, 2/20/09, Jeff Squyres  wrote:
From: Jeff Squyres 
Subject: Re: [OMPI users] ptrdiff_t undefined error on intel 64bit  
machine with intel compilers

To: "Open MPI Users" 
Date: Friday, February 20, 2009, 8:37 AM

Can you also send a copy of your mpi.h? (OMPI's mpi.h is generated by
configure; I want to see what was put into your mpi.h)

Finally, what version of icc are you using?  I test regularly with icc 9.0,
9.1, 10.0, and 10.1 with no problems.  Are you using newer or older?  (I
don't have immediate access to 11.x or 8.x)


On Feb 20, 2009, at 8:09 AM, Jeff Squyres wrote:

> Can you send your config.log as well?
>
> It looks like you forgot to specify FC=ifort on your configure line (i.e.,
> you need to specify F77=ifort for the Fortran 77 *and* FC=ifort for the Fortran
> 90 compiler -- this is an Autoconf thing; we didn't make it up).
>
> That shouldn't be the problem here, but I thought I'd mention it.
>
>
> On Feb 19, 2009, at 12:00 PM, Tamara Rogers wrote:
>
>>
>> Jeff:
>> You're correct. That was the incorrect config file. I've
attached the correct one as per the recommendations in the help page.
>>
>> Thanks for your help
>>
>> --- On Thu, 2/19/09, Jeff Squyres  wrote:
>> From: Jeff Squyres 
>> Subject: Re: [OMPI users] ptrdiff_t undefined error on intel 64bit
machine with intel compilers
>> To: talmesh...@yahoo.com, "Open MPI Users"

>> Date: Thursday, February 19, 2009, 8:32 AM
>>
>> Your config.log looks incomplete -- it failed saying that your C and C++
>> compilers were incompatible with each other.
>>
>> This does not seem related to what you described -- are you sure you're
>> sending the right config.log?
>>
>> Specifically, can you send all the information listed here:
>>
>>http://www.open-mpi.org/community/help/
>>
>>
>> On Feb 17, 2009, at 5:10 PM, Tamara Rogers wrote:
>>
>> > Hello all:
>> > I was unable to compile the latest version (1.3) on my intel 64bit system
>> > with the intel compilers (version 9.0). Configuration goes fine, but I get this
>> > error when running make:
>> >
>> > ../../ompi/include/mpi.h(203): error: identifier "ptrdiff_t" is undefined
>> >  typedef OMPI_PTRDIFF_TYPE MPI_Aint;
>> >
>> > compilation aborted for dt_args.c (cod 21)
>> >
>> > My config line was:
>> > ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=xxx
>> >
>> > I've attached my config.log file. Has anyone encountered this? I was
>> > able to build openmpi on this exact system using the gcc/g++ compilers; however,
>> > the intel compilers are substantially faster on our system.
>> >
>> > Thanks!
>> >
>> > ___
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> --Jeff Squyres
>> Cisco Systems
>>
>>
>>
>>
<openmpi-1.3_output.tar.gz>___

>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --Jeff Squyres
> Cisco Systems
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


--Jeff Squyres
Cisco Systems

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

<openmpi-1.3_64_output.tar.gz>___

users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] OpenMPI 1.3.1 rpm build error

2009-02-20 Thread Jeff Squyres

There won't be an official SRPM until 1.3.1 is released.

But to test if 1.3.1 is on-track to deliver a proper solution to you,  
can you try a nightly tarball, perhaps in conjunction with our  
"buildrpm.sh" script?


https://svn.open-mpi.org/source/xref/ompi_1.3/contrib/dist/linux/buildrpm.sh

It should build a trivial SRPM for you from the tarball.  You'll  
likely need to get the specfile, too, and put it in the same dir as  
buildrpm.sh.  The specfile is in the same SVN directory:



https://svn.open-mpi.org/source/xref/ompi_1.3/contrib/dist/linux/openmpi.spec
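
A hedged sketch of that procedure, assuming buildrpm.sh and openmpi.spec have been
saved into the current directory from the links above and that the script takes
the tarball path as its first argument (the tarball name is a placeholder for
whatever nightly snapshot you actually download):

  chmod +x buildrpm.sh
  ./buildrpm.sh openmpi-1.3.1-nightly.tar.gz   # builds the SRPM from the nightly tarball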



On Feb 20, 2009, at 3:51 PM, Jim Kusznir wrote:


As long as I can still build the rpm for it and install it via rpm.
I'm running it on a ROCKS cluster, so it needs to be an RPM to get
pushed out to the compute nodes.

--Jim

On Fri, Feb 20, 2009 at 11:30 AM, Jeff Squyres   
wrote:

On Feb 20, 2009, at 2:20 PM, Jim Kusznir wrote:


I just went to www.open-mpi.org, went to download, then source rpm.
Looks like it was actually 1.3-1.  Here's the src.rpm that I pulled
in:

http://www.open-mpi.org/software/ompi/v1.3/downloads/openmpi-1.3-1.src.rpm


Ah, gotcha.  Yes, that's 1.3.0, SRPM version 1.  We didn't make up this
nomenclature.  :-(

The reason for this upgrade is it seems a user found some bug that may
be in the OpenMPI code that results in occasionally an MPI_Send()
message getting lost.  He's managed to reproduce it multiple times,
and we can't find anything in his code that can cause it...He's got
logs of mpi_send() going out, but the matching mpi_receive() never
getting anything, thus killing his code.  We're currently running
1.2.8 with ofed support (Haven't tried turning off ofed, etc. yet).


Ok.  1.3.x is much mo' betta' than 1.2 in many ways.  We could probably help
track down the problem, but if you're willing to upgrade to 1.3.x, it'll
hopefully just make the problem go away.

Can you try a 1.3.1 nightly tarball?

--
Jeff Squyres
Cisco Systems

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] ptrdiff_t undefined error on intel 64bit machine with intel compilers

2009-02-20 Thread Tamara Rogers
Jeff:
See attached.  I'm using the 9.0 version of the Intel compilers. Interestingly, I
have no problems on a 32-bit Intel machine using these same compilers. There
only seems to be a problem on the 64-bit machine.

--- On Fri, 2/20/09, Jeff Squyres  wrote:

From: Jeff Squyres 
Subject: Re: [OMPI users] ptrdiff_t undefined error on intel 64bit machine with 
intel compilers
To: "Open MPI Users" 
List-Post: users@lists.open-mpi.org
Date: Friday, February 20, 2009, 8:37 AM

Can you also send a copy of your mpi.h? (OMPI's mpi.h is generated by
configure; I want to see what was put into your mpi.h)

Finally, what version of icc are you using?  I test regularly with icc 9.0,
9.1, 10.0, and 10.1 with no problems.  Are you using newer or older?  (I
don't have immediate access to 11.x or 8.x)


On Feb 20, 2009, at 8:09 AM, Jeff Squyres wrote:

> Can you send your config.log as well?
> 
> It looks like you forgot to specify FC=ifort on your configure line (i.e.,
you need to specify F77=ifort for the Fortran 77 *and* FC=ifort for the Fortran
90 compiler -- this is an Autoconf thing; we didn't make it up).
> 
> That shouldn't be the problem here, but I thought I'd mention it.
> 
> 
> On Feb 19, 2009, at 12:00 PM, Tamara Rogers wrote:
> 
>> 
>> Jeff:
>> You're correct. That was the incorrect config file. I've
attached the correct one as per the recommendations in the help page.
>> 
>> Thanks for your help
>> 
>> --- On Thu, 2/19/09, Jeff Squyres  wrote:
>> From: Jeff Squyres 
>> Subject: Re: [OMPI users] ptrdiff_t undefined error on intel 64bit
machine with intel compilers
>> To: talmesh...@yahoo.com, "Open MPI Users"

>> Date: Thursday, February 19, 2009, 8:32 AM
>> 
>> Your config.log looks incomplete -- it failed saying that your C and
C++
>> compilers were incompatible with each other.
>> 
>> This does not seem related to what you described -- are you sure
you're
>> sending the right config.log?
>> 
>> Specifically, can you send all the information listed here:
>> 
>>http://www.open-mpi.org/community/help/
>> 
>> 
>> On Feb 17, 2009, at 5:10 PM, Tamara Rogers wrote:
>> 
>> > Hello all:
>> > I was unable to compile the latest version (1.3) on my intel
64bit system
>> with the intel compilers (version 9.0). Configuration goes fine, but I
get this
>> error when running make:
>> >
>> > ../../ompi/include/mpi.h(203): error: identifier "ptrdiff_t" is undefined
>> >  typedef OMPI_PTRDIFF_TYPE MPI_Aint;
>> >
>> > compilation aborted for dt_args.c (cod 21)
>> >
>> > My config line was:
>> > ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=xxx
>> >
>> > I've attached my config.log file. Has anyone encountered
this? I was
>> able to build openmpi on this exact system using the gcc/g++
compilers, however
>> the intel compilers are substantially faster on our system.
>> >
>> > Thanks!
>> >
>> > ___
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> 
>> --Jeff Squyres
>> Cisco Systems
>> 
>> 
>> 
>>
___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> --Jeff Squyres
> Cisco Systems
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


--Jeff Squyres
Cisco Systems

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users





openmpi-1.3_64_output.tar.gz
Description: application/gzip-compressed


Re: [OMPI users] OpenMPI 1.3.1 rpm build error

2009-02-20 Thread Jim Kusznir
As long as I can still build the rpm for it and install it via rpm.
I'm running it on a ROCKS cluster, so it needs to be an RPM to get
pushed out to the compute nodes.

--Jim

On Fri, Feb 20, 2009 at 11:30 AM, Jeff Squyres  wrote:
> On Feb 20, 2009, at 2:20 PM, Jim Kusznir wrote:
>
>> I just went to www.open-mpi.org, went to download, then source rpm.
>> Looks like it was actually 1.3-1.  Here's the src.rpm that I pulled
>> in:
>>
>> http://www.open-mpi.org/software/ompi/v1.3/downloads/openmpi-1.3-1.src.rpm
>
> Ah, gotcha.  Yes, that's 1.3.0, SRPM version 1.  We didn't make up this
> nomenclature.  :-(
>
>> The reason for this upgrade is it seems a user found some bug that may
>> be in the OpenMPI code that results in occasionally an MPI_Send()
>> message getting lost.  He's managed to reproduce it multiple times,
>> and we can't find anything in his code that can cause it...He's got
>> logs of mpi_send() going out, but the matching mpi_receive() never
>> getting anything, thus killing his code.  We're currently running
>> 1.2.8 with ofed support (Haven't tried turning off ofed, etc. yet).
>
> Ok.  1.3.x is much mo' betta' than 1.2 in many ways.  We could probably help
> track down the problem, but if you're willing to upgrade to 1.3.x, it'll
> hopefully just make the problem go away.
>
> Can you try a 1.3.1 nightly tarball?
>
> --
> Jeff Squyres
> Cisco Systems
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] OpenMPI 1.3.1 rpm build error

2009-02-20 Thread Jeff Squyres

On Feb 20, 2009, at 2:20 PM, Jim Kusznir wrote:


I just went to www.open-mpi.org, went to download, then source rpm.
Looks like it was actually 1.3-1.  Here's the src.rpm that I pulled
in:

http://www.open-mpi.org/software/ompi/v1.3/downloads/openmpi-1.3-1.src.rpm


Ah, gotcha.  Yes, that's 1.3.0, SRPM version 1.  We didn't make up  
this nomenclature.  :-(



The reason for this upgrade is it seems a user found some bug that may
be in the OpenMPI code that results in occasionally an MPI_Send()
message getting lost.  He's managed to reproduce it multiple times,
and we can't find anything in his code that can cause it...He's got
logs of mpi_send() going out, but the matching mpi_receive() never
getting anything, thus killing his code.  We're currently running
1.2.8 with ofed support (Haven't tried turning off ofed, etc. yet).


Ok.  1.3.x is much mo' betta' than 1.2 in many ways.  We could
probably help track down the problem, but if you're willing to upgrade  
to 1.3.x, it'll hopefully just make the problem go away.


Can you try a 1.3.1 nightly tarball?

--
Jeff Squyres
Cisco Systems



Re: [OMPI users] OpenMPI 1.3.1 rpm build error

2009-02-20 Thread Jim Kusznir
I just went to www.open-mpi.org, went to download, then source rpm.
Looks like it was actually 1.3-1.  Here's the src.rpm that I pulled
in:

http://www.open-mpi.org/software/ompi/v1.3/downloads/openmpi-1.3-1.src.rpm

The reason for this upgrade is it seems a user found some bug that may
be in the OpenMPI code that results in occasionally an MPI_Send()
message getting lost.  He's managed to reproduce it multiple times,
and we can't find anything in his code that can cause it...He's got
logs of mpi_send() going out, but the matching mpi_receive() never
getting anything, thus killing his code.  We're currently running
1.2.8 with ofed support (Haven't tried turning off ofed, etc. yet).


--Jim

On Thu, Feb 19, 2009 at 6:46 PM, Jeff Squyres  wrote:
> There is no 1.3.1 RPM yet (only a 1.3 RPM) -- what file specifically are you
> trying to build?
>
> Could you try building one of the 1.3.1 nightly snapshot tarballs?  I
> *think* the problem you're seeing is a problem due to FORTIFY_SOURCE in the
> VT code in 1.3 and should be fixed by now.
>
>http://www.open-mpi.org/nightly/v1.3/
>
>
> On Feb 19, 2009, at 12:00 PM, Jim Kusznir wrote:
>
>> Hi all:
>>
>> I'm trying to build openmpi RPMs from the included spec file.  The
>> build fails with:
>>
>> gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib
>> -I../extlib/otf/otflib -I../extlib/otf/otflib -D_GNU_SOURCE
>> -DBINDIR=\"/opt/openmpi-gcc/1.3/bin\"
>> -DDATADIR=\"/opt/openmpi-gcc/1.3/share\" -DRFG -DVT_BFD -DVT_MEMHOOK
>> -DVT_IOWRAP  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
>> -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -MT
>> vt_iowrap.o -MD -MP -MF .deps/vt_iowrap.Tpo -c -o vt_iowrap.o
>> vt_iowrap.c
>> vt_iowrap.c:1242: error: expected declaration specifiers or '...'
>> before numeric constant
>> vt_iowrap.c:1243: error: conflicting types for '__fprintf_chk'
>> make[5]: *** [vt_iowrap.o] Error 1
>>
>>
>> My build command was:
>> rpmbuild -bb --define 'install_in_opt 1' --define 'install_modulefile
>> 1' --define 'modules_rpm_name environment-modules' --define
>> 'build_all_in_one_rpm 0' --define 'configure_options
>> --with-tm=/opt/torque --with-openib=/opt/mlnx-ofed/src/OFED-1.3.1'
>> --define '_name openmpi-gcc' openmpi-1.3.spec
>>
>> This build for the 1.2.8 worked fine; this is my first attempt at
>> building 1.3.1.
>> The system is Rocks 5.1 (CentOS 5.2), GCC 4.1.2-42 (CentOS 5.2 default).
>>
>> Any suggestions?
>>
>> Thanks!
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> Cisco Systems
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] lammps MD code fails with Open MPI 1.3

2009-02-20 Thread Jeff Squyres

On Feb 20, 2009, at 10:08 AM, Jeff Pummill wrote:

It's probably not the same issue as this is one of the very few  
codes that I maintain which is C++ and not fortran :-(


Ok.  Note that the error Nysal pointed out was a problem with our  
handling of stdin.  That might be an issue as well; should be fixed in  
any recent 1.3.1 nightly snapshot.


It behaved similarly on another system when I built it against a new  
version (1.0??) of MVAPICH. I had to roll back a version from that  
as well.


I may contact the lammps people and see if they know what's going on  
as well.


Gotcha.

--
Jeff Squyres
Cisco Systems



Re: [OMPI users] WRF, OpenMPI and PGI 7.2

2009-02-20 Thread Ralph Castain
Note that (beginning with 1.3) you can also use "platform files" to  
save configure and default mca params so that you build consistently.  
Check the examples in contrib/platform. Most of us developers use  
these religiously, as do our host organizations, for precisely this  
reason.
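
A hedged example of what that looks like on a configure line (the platform file
name here is illustrative; pick one that matches your site from contrib/platform):

  ./configure --with-platform=contrib/platform/optimized --prefix=/opt/openmpi-1.3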


I believe there should be something on the FAQ about platform files -  
if not, I'll try to add it in the next few days.


If you want to contribute platform files to support some community  
with similar configurations, please send them to me - we shouldn't  
need a contributors agreement for them as there is no code involved.


Ralph


On Feb 20, 2009, at 8:23 AM, Gus Correa wrote:


Hi Gerry

I usually put configure commands (and environment variables)
on little shell scripts, which I edit to fit the combination
of hardware/compiler(s), and keep them in the build directory.
Otherwise I would forget the details next time I need to build.

If Myrinet and GigE are on separate clusters,
you'll have to install OpenMPI on each one, sorry.
However, if Myrinet and GigE are available on the same cluster,
you can build a single OpenMPI,
and choose the "byte transport layer (BTL)"
to be Myrinet or GigE (or IB, for that matter),
and even the NICs/networks to use,
on your job submission script.

Check the OpenMPI FAQ:
http://www.open-mpi.org/faq/?category=myrinet#myri-btl-gm
http://www.open-mpi.org/faq/?category=tcp#tcp-btl
http://www.open-mpi.org/faq/?category=openfabrics#ib-btl
http://www.open-mpi.org/faq/?category=tcp#tcp-multi-network
http://www.open-mpi.org/faq/?category=tcp#tcp-selection


Gus Correa


PS - BTW - Our old non-Rocks cluster has Myrinet-2000 (GM).
After I get the new cluster up and running and in production,
I am thinking of revamping the old cluster, and install Rocks on it.
I would love to learn from your experience with your
Rocks+Myrinet cluster, if you have the time to post a short
"bullet list" of "do's and don'ts".
(The Rocks list may be more appropriate than the OpenMPI for this.)

Last I checked Myrinet had a roll only for Rocks 5.0,
not 5.1, right?
Did you install it on top of Rocks 5.0 or 5.1?
(For instance, my recollection of old postings on the list,
is that the Torque 5.0 roll worked with Rocks 5.1,
but it is always a risky business to mix different releases.)

Many thanks,
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-

Gerry Creager wrote:

Gus,
I'll give that a try real quick (or as quickly as the compiles can run).
I'd not thought of this solution.  I've been context-switching too  
much lately.  I've gotta look at this for a gigabit cluster as well.

Thanks!
Gus Correa wrote:

Hi Gerry

You may need to compile a hybrid OpenMPI
using gcc for C, PGI f90 for Fortran on the OpenMPI configure  
script.

This should give you the required mpicc and mpif90 to do the job.
I guess this is what Elvedin meant on his message.

I have these hybrids for OpenMPI and MPICH2 here
(not Myrinet but GigE), and they work
fine with a WRF relative (CAM3, atmospheric climate).

Two cents from
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-

Gerry Creager wrote:

Elvedin,

Yeah, I thought about that after finding a reference to this in the
archives, so I redirected the path to MPI toward the gnu-compiled
version.  It died in THIS manner:
make[3]: Entering directory `/home/gerry/WRFv3/WRFV3/external/RSL_LITE'

mpicc -cc=gcc -DFSEEKO64_OK  -w -O3 -DDM_PARALLEL   -c c_code.c
pgcc-Error-Unknown switch: -cc=gcc
make[3]: [c_code.o] Error 1 (ignored)

Methinks the wrf configuration script and make file will need some tweaks.


Interesting thing: I have another system (alas, with mpich) where  
it compiles just fine.  I'm trying to sort this out, as on 2  
systems, with openMPI, it does odd dances before dying.


I'm still trying things.  I've gotta get this up both for MY  
research and to support other users.


Thanks, Gerry

Elvedin Trnjanin wrote:
WRF almost requires that you use gcc for the C/C++ part and the  
PGI Fortran compilers, if you choose that option. I'd suggest  
compiling OpenMPI in the same way as that has resolved our  
various issues. Have you tried that with the same result?


Gerry Creager wrote:

Howdy,

I'm new to this list.  I've done a little review but likely  
missed something specific to what I'm asking.  I'll keep  
looking but need to resolve this soon.


I'm running a Rocks cluster (centos 5), with PGI 7.2-3  
compilers, Myricom MX2 hardware and drivers, and OpenMPI1.3


I installed the Myricom roll which has OpenMPI compiled with gcc.  I
recently compiled the openmpi code w/ PGI.

Re: [OMPI users] WRF, OpenMPI and PGI 7.2

2009-02-20 Thread Gus Correa

Hi Gerry

I usually put configure commands (and environment variables)
on little shell scripts, which I edit to fit the combination
of hardware/compiler(s), and keep them in the build directory.
Otherwise I would forget the details next time I need to build.

If Myrinet and GigE are on separate clusters,
you'll have to install OpenMPI on each one, sorry.
However, if Myrinet and GigE are available on the same cluster,
you can build a single OpenMPI,
and choose the "byte transport layer (BTL)"
to be Myrinet or GigE (or IB, for that matter),
and even the NICs/networks to use,
on your job submission script.

Check the OpenMPI FAQ:
http://www.open-mpi.org/faq/?category=myrinet#myri-btl-gm
http://www.open-mpi.org/faq/?category=tcp#tcp-btl
http://www.open-mpi.org/faq/?category=openfabrics#ib-btl
http://www.open-mpi.org/faq/?category=tcp#tcp-multi-network
http://www.open-mpi.org/faq/?category=tcp#tcp-selection
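
A hedged sketch of that run-time selection via MCA parameters (executable name and
process counts are placeholders; gm/mx/tcp/openib are the usual 1.3-era BTL
component names for Myrinet GM, Myrinet MX, GigE, and InfiniBand respectively):

  mpirun --mca btl gm,sm,self -np 16 ./my_app    # Myrinet GM, plus shared memory and loopback
  mpirun --mca btl tcp,sm,self -np 16 ./my_app   # GigE/TCP instead
  mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include eth0 -np 16 ./my_app   # restrict TCP to one NIC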


Gus Correa


PS - BTW - Our old non-Rocks cluster has Myrinet-2000 (GM).
After I get the new cluster up and running and in production,
I am thinking of revamping the old cluster, and install Rocks on it.
I would love to learn from your experience with your
Rocks+Myrinet cluster, if you have the time to post a short
"bullet list" of "do's and don'ts".
(The Rocks list may be more appropriate than the OpenMPI for this.)

Last I checked Myrinet had a roll only for Rocks 5.0,
not 5.1, right?
Did you install it on top of Rocks 5.0 or 5.1?
(For instance, my recollection of old postings on the list,
is that the Torque 5.0 roll worked with Rocks 5.1,
but it is always a risky business to mix different releases.)

Many thanks,
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-

Gerry Creager wrote:

Gus,

I'll give that a try real quick (or as quickly as the compiles can run).

I'd not thought of this solution.  I've been context-switching too much 
lately.  I've gotta look at this for a gigabit cluster as well.


Thanks!

Gus Correa wrote:

Hi Gerry

You may need to compile a hybrid OpenMPI
using gcc for C, PGI f90 for Fortran on the OpenMPI configure script.
This should give you the required mpicc and mpif90 to do the job.
I guess this is what Elvedin meant on his message.
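
A hedged sketch of such a hybrid build (compiler names and the install prefix are
illustrative):

  ./configure CC=gcc CXX=g++ F77=pgf90 FC=pgf90 --prefix=/opt/openmpi-1.3-gcc-pgi
  make all install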

I have these hybrids for OpenMPI and MPICH2 here
(not Myrinet but GigE), and they work
fine with a WRF relative (CAM3, atmospheric climate).

Two cents from
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-

Gerry Creager wrote:

Elvedin,

Yeah, I thought about that after finding a reference to this in the 
archives, so I redirected the path to MPI toward the gnu-compiled 
version.  It died in THIS manner:

make[3]: Entering directory `/home/gerry/WRFv3/WRFV3/external/RSL_LITE'
mpicc -cc=gcc -DFSEEKO64_OK  -w -O3 -DDM_PARALLEL   -c c_code.c
pgcc-Error-Unknown switch: -cc=gcc
make[3]: [c_code.o] Error 1 (ignored)

Methinks the wrf configuration script and make file will need some tweaks.
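
One hedged aside on the failure above: "-cc=gcc" is MPICH-style syntax that Open
MPI's wrappers do not accept; with Open MPI the underlying compiler is normally
overridden through environment variables instead, for example:

  export OMPI_CC=gcc    # have Open MPI's mpicc wrapper drive gcc
  mpicc --showme        # print the full command line the wrapper would run
  mpicc -DFSEEKO64_OK -w -O3 -DDM_PARALLEL -c c_code.c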


Interesting thing: I have another system (alas, with mpich) where it 
compiles just fine.  I'm trying to sort this out, as on 2 systems, 
with openMPI, it does odd dances before dying.


I'm still trying things.  I've gotta get this up both for MY research 
and to support other users.


Thanks, Gerry

Elvedin Trnjanin wrote:
WRF almost requires that you use gcc for the C/C++ part and the PGI 
Fortran compilers, if you choose that option. I'd suggest compiling 
OpenMPI in the same way as that has resolved our various issues. 
Have you tried that with the same result?


Gerry Creager wrote:

Howdy,

I'm new to this list.  I've done a little review but likely missed 
something specific to what I'm asking.  I'll keep looking but need 
to resolve this soon.


I'm running a Rocks cluster (centos 5), with PGI 7.2-3 compilers, 
Myricom MX2 hardware and drivers, and OpenMPI1.3


I installed the Myricom roll which has OpenMPI compiled with gcc.  
I recently compiled the openmpi code w/ PGI.


I've the MPICH_F90 pointing to the right place, and we're looking 
for the right includes and libs by means of LD_LIBRARY_PATH, etc.


When I tried to run, I got the following error:
make[3]: Entering directory 
`/home/gerry/WRFv3/WRFV3/external/RSL_LITE'

mpicc  -DFSEEKO64_OK  -w -O3 -DDM_PARALLEL   -c c_code.c
PGC/x86-64 Linux 7.2-3: compilation completed with warnings
mpicc  -DFSEEKO64_OK  -w -O3 -DDM_PARALLEL   -c buf_for_proc.c
PGC-S-0036-Syntax error: Recovery attempted by inserting identifier 
.Z before '(' (/share/apps/openmpi-1.3-pgi/include/mpi.h: 889)
PGC-S-0082-Function returning array not allowed 
(/share/apps/openmpi-1.3-pgi/include/mpi.h: 889)
PGC-S-0043-Redefi

Re: [OMPI users] round-robin scheduling question [hostfile]

2009-02-20 Thread Ralph Castain

It is a little bit of both:

* historical, because most MPI's default to mapping by slot, and

* performance, because procs that share a node can communicate via  
shared memory, which is faster than sending messages over an  
interconnect, and most apps are communication-bound


If your app is disk-intensive, then mapping it -bynode may be a better  
option for you. That's why we provide it. Note, however, that you can  
still wind up with multiple procs on a node. All "bynode" means is  
that the ranks are numbered consecutively bynode - it doesn't mean  
that there is only one proc/node.


If you truly want one proc/node, then you should use the -pernode  
option. This maps one proc on each node up to either the number of  
procs you specified or the number of available nodes. If you don't  
specify -np, we just put one proc on each node in your allocation/ 
hostfile.
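
A hedged sketch of those mapping options with 1.3-era mpirun syntax (hostfile and
executable names are placeholders):

  mpirun -np 8 -hostfile hosts ./my_app            # default "by slot": fill each node's slots first
  mpirun -np 8 -bynode -hostfile hosts ./my_app    # round-robin the ranks across nodes
  mpirun -pernode -hostfile hosts ./my_app         # exactly one process per node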


HTH
Ralph

On Feb 20, 2009, at 1:25 AM, Raymond Wan wrote:



Hi all,

According to FAQ 14 (How do I control how my processes are scheduled  
across nodes?) [http://www.open-mpi.org/faq/?category=running#mpirun-scheduling 
], it says that the default scheduling policy is by slot and not by  
node.  I'm curious why the default is "by slot" since I am thinking  
of explicitly specifying by node but I'm wondering if there is an  
issue which I haven't considered.
I would think that one reason for "by node" is to distribute HDD  
access across machines [as is the case for me since my program is  
HDD access intensive].  Or perhaps I am mistaken?  I'm now thinking  
that "by slot" is the default because processes with ranks that are  
close together might do similar tasks and you would want them on the  
same node?  Is that the reason?


Also, at the end of this FAQ, it says "NOTE:  This is the scheduling  
policy in Open MPI because of a long historical precedent..." --
does this "This" refer to "the fact that there are two scheduling  
policies" or "the fact that 'by slot' is the default"?  If the  
latter, then that explains why "by slot" is the default, I guess...


Thank you!

Ray



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Strange problem

2009-02-20 Thread Ralph Castain

Hi Gabriele

Could be we have a problem in our LSF support - none of us have a way  
of testing it, so this is somewhat of a blind programming case for us.


From the message, it looks like there is some misunderstanding about  
how many slots were allocated vs how many were mapped to a specific  
host. I don't see your cmd line here - could you pass it along too?


My initial guess is that mpirun is running on node0023, and that we  
then mapped procs local to mpirun such that we exceeded LSF's slot  
allocation on that node. We don't account for mpirun taking a process  
slot in our mapping, and LSF does - hence the error. I think...


You could test this by adding --nolocal to your cmd line. This will  
force mpirun to map all procs on other nodes. If my analysis is  
correct, the job should run.
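
A hedged sketch of that test (process count and executable are placeholders for
Gabriele's actual LSF job command):

  mpirun --nolocal -np 32 ./my_mpi_app   # keep every rank off the node where mpirun itself runs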


Ralph

On Feb 20, 2009, at 6:46 AM, Gabriele Fatigati wrote:


Dear OpenMPi developers,
I'm running my MPI code compiled with OpenMPI 1.3 over Infiniband and an
LSF scheduler, but I got the error attached.  I suppose that process
spawning doesn't work well.  The same program under OpenMPI 1.2.5 works
well. Could you help me?

Thanks in advance.

--
Ing. Gabriele Fatigati

Parallel programmer

CINECA Systems & Tecnologies Department

Supercomputing Group

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it    Tel: +39 051 6171722

g.fatigati [AT] cineca.it
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] lammps MD code fails with Open MPI 1.3

2009-02-20 Thread Jeff Pummill
It's probably not the same issue as this is one of the very few codes 
that I maintain which is C++ and not fortran :-(


It behaved similarly on another system when I built it against a new 
version (1.0??) of MVAPICH. I had to roll back a version from that as well.


I may contact the lammps people and see if they know what's going on as 
well.



Jeff F. Pummill
Senior Linux Cluster Administrator
TeraGrid Campus Champion - UofA
University of Arkansas
Fayetteville, Arkansas 72701
(479) 575 - 4590
http://hpc.uark.edu

"In theory, there is no difference between theory and
practice. But in practice, there is!" /-- anonymous/


Jeff Squyres wrote:
Actually, there was a big Fortran bug that crept in after 1.3 that was 
just fixed on the trunk last night.  If you're using Fortran 
applications with some compilers (e.g., Intel), the 1.3.1 nightly 
snapshots may have hung in some cases.  The problem should be fixed in 
tonight's 1.3.1 nightly snapshot.



On Feb 20, 2009, at 12:46 AM, Nysal Jan wrote:


It could be the same bug reported here
http://www.open-mpi.org/community/lists/users/2009/02/8010.php

Can you try a recent snapshot of 1.3.1
(http://www.open-mpi.org/nightly/v1.3/) to verify if this has been fixed

--Nysal

On Thu, 2009-02-19 at 16:09 -0600, Jeff Pummill wrote:

I built a fresh version of lammps v29Jan09 against Open MPI 1.3 which
in turn was built with Gnu compilers v4.2.4 on an Ubuntu 8.04 x86_64
box. This Open MPI build was able to generate usable binaries such as
XHPL and NPB, but the lammps binary it generated was not usable.

I tried it with a couple of different versions of the lammps source,
but to no avail. No errors during the builds and a binary was created,
but when executing the job it quickly exits with no messages other
than:

jpummil@stealth:~$ mpirun -np 4 -hostfile
hosts /home/jpummil/lmp_Stealth-OMPI < in.testbench_small
LAMMPS (22 Jan 2008)

Interestingly, I downloaded Open MPI 1.2.8, built it with the same
configure options I had used with 1.3, and it worked.

I'm getting by fine with 1.2.8. I just wanted to file a possible bug
report on 1.3 and see if others have seen this behavior.

Cheers!

--
Jeff F. Pummill
Senior Linux Cluster Administrator
TeraGrid Campus Champion - UofA
University of Arkansas


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users





[OMPI users] openmpi 1.3: undefined symbol: mca_base_param_reg_int [was: Re: OpenMPI 1.3:]

2009-02-20 Thread Olaf Lenz

Hi again!

Sorry for messing up the subject. Also, I wanted to attach the output of
 ompi_info -all.

Olaf


ompi_info.out.gz
Description: GNU Zip compressed data


[OMPI users] OpenMPI 1.3:

2009-02-20 Thread Olaf Lenz

Hello!

I have compiled OpenMPI 1.3 with

configure --prefix=$HOME/software

The compilation works fine, and I can run normal MPI programs.

However, I'm using OpenMPI to run a program that we currently develop
(http://www.espresso-pp.de). The software uses Python as a front-end
language, which loads the MPI-enabled shared library. When I start
python with a script using this parallel lib via mpiexec, I get the
following error:

> mpiexec -n 4 python examples/hello.py
python: symbol lookup error:
/people/thnfs/homes/lenzo/software.thop/lib/openmpi/mca_paffinity_linux.so:
undefined symbol: mca_base_param_reg_int
python: symbol lookup error:
/people/thnfs/homes/lenzo/software.thop/lib/openmpi/mca_paffinity_linux.so:
undefined symbol: mca_base_param_reg_int
python: symbol lookup error:
/people/thnfs/homes/lenzo/software.thop/lib/openmpi/mca_paffinity_linux.so:
undefined symbol: mca_base_param_reg_int
python: symbol lookup error:
/people/thnfs/homes/lenzo/software.thop/lib/openmpi/mca_paffinity_linux.so:
undefined symbol: mca_base_param_reg_int

When I compile OpenMPI 1.3 using

--enable-shared --enable-static

the problem disappears. Note also, that the same program works when I'm
using OpenMPI 1.2.x (tested 1.2.6 and 1.2.9). I do believe that the
problem is connected with the problem described here:

http://www.open-mpi.org/community/lists/devel/2005/09/0359.php
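
For comparison, the two configure lines involved, paraphrased from the report
above (the prefix is Olaf's):

  ./configure --prefix=$HOME/software                                   # plain build: hits the symbol lookup error
  ./configure --prefix=$HOME/software --enable-shared --enable-static   # workaround: the error disappears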

I have found a workaround, but I think the problem is worth reporting.

Let me know if I can help in debugging the problem.

Greetings from Germany

Olaf Lenz

PS: It is not obvious on the OpenMPI web site where to report bugs. When
clicking on "Bug Tracking", which seems most obvious, I'm redirected to
the Trac Timeline, and there is no place where I can report bugs or
anything.


[OMPI users] Strange problem

2009-02-20 Thread Gabriele Fatigati
Dear OpenMPi developers,
I'm running my MPI code compiled with OpenMPI 1.3 over Infiniband and an
LSF scheduler, but I got the error attached.  I suppose that process
spawning doesn't work well.  The same program under OpenMPI 1.2.5 works
well. Could you help me?

Thanks in advance.

-- 
Ing. Gabriele Fatigati

Parallel programmer

CINECA Systems & Tecnologies Department

Supercomputing Group

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it    Tel: +39 051 6171722

g.fatigati [AT] cineca.it


job.196571.err
Description: Binary data


Re: [OMPI users] ptrdiff_t undefined error on intel 64bit machine with intel compilers

2009-02-20 Thread Jeff Squyres
Can you also send a copy of your mpi.h? (OMPI's mpi.h is generated by  
configure; I want to see what was put into your mpi.h)


Finally, what version of icc are you using?  I test regularly with icc  
9.0, 9.1, 10.0, and 10.1 with no problems.  Are you using newer or  
older?  (I don't have immediate access to 11.x or 8.x)



On Feb 20, 2009, at 8:09 AM, Jeff Squyres wrote:


Can you send your config.log as well?

It looks like you forgot to specify FC=ifort on your configure line  
(i.e., you need to specify F77=ifort for the Fortran 77 *and*  
FC=ifort for the Fortran 90 compiler -- this is an Autoconf thing;  
we didn't make it up).


That shouldn't be the problem here, but I thought I'd mention it.


On Feb 19, 2009, at 12:00 PM, Tamara Rogers wrote:



Jeff:
You're correct. That was the incorrect config file. I've attached  
the correct one as per the recommendations in the help page.


Thanks for your help

--- On Thu, 2/19/09, Jeff Squyres  wrote:
From: Jeff Squyres 
Subject: Re: [OMPI users] ptrdiff_t undefined error on intel 64bit  
machine with intel compilers

To: talmesh...@yahoo.com, "Open MPI Users" 
Date: Thursday, February 19, 2009, 8:32 AM

Your config.log looks incomplete -- it failed saying that your C  
and C++

compilers were incompatible with each other.

This does not seem related to what you described -- are you sure  
you're

sending the right config.log?

Specifically, can you send all the information listed here:

   http://www.open-mpi.org/community/help/


On Feb 17, 2009, at 5:10 PM, Tamara Rogers wrote:

> Hello all:
> I was unable to compile the latest version (1.3) on my intel  
64bit system
with the intel compilers (version 9.0). Configuration goes fine,  
but I get this

error when running make:
>
> ../../ompi/include/mpi.h(203): error: identifier "ptrdiff_t" is
undefined
>  typedef OMPI_PTRDIFF_TYPE MPI_Aint;
>
> compilation aborted for dt_args.c (cod 21)
>
> My config line was:
> ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=xxx
>
> I've attached my config.log file. Has anyone encountered this? I
was
able to build openmpi on this exact system using the gcc/g++  
compilers, however

the intel compilers are substantially faster on our system.
>
> Thanks!
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


--Jeff Squyres
Cisco Systems



<openmpi-1.3_output.tar.gz>___

users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] ptrdiff_t undefined error on intel 64bit machine with intel compilers

2009-02-20 Thread Jeff Squyres

Can you send your config.log as well?

It looks like you forgot to specify FC=ifort on your configure line  
(i.e., you need to specify F77=ifort for the Fortran 77 *and* FC=ifort  
for the Fortran 90 compiler -- this is an Autoconf thing; we didn't  
make it up).


That shouldn't be the problem here, but I thought I'd mention it.


On Feb 19, 2009, at 12:00 PM, Tamara Rogers wrote:



Jeff:
You're correct. That was the incorrect config file. I've attached  
the correct one as per the recommendations in the help page.


Thanks for your help

--- On Thu, 2/19/09, Jeff Squyres  wrote:
From: Jeff Squyres 
Subject: Re: [OMPI users] ptrdiff_t undefined error on intel 64bit  
machine with intel compilers

To: talmesh...@yahoo.com, "Open MPI Users" 
Date: Thursday, February 19, 2009, 8:32 AM

Your config.log looks incomplete -- it failed saying that your C and  
C++

compilers were incompatible with each other.

This does not seem related to what you described -- are you sure  
you're

sending the right config.log?

Specifically, can you send all the information listed here:

http://www.open-mpi.org/community/help/


On Feb 17, 2009, at 5:10 PM, Tamara Rogers wrote:

> Hello all:
> I was unable to compile the latest version (1.3) on my intel 64bit  
system
with the intel compilers (version 9.0). Configuration goes fine, but  
I get this

error when running make:
>
> ../../ompi/include/mpi.h(203): error: identifier "ptrdiff_t" is
undefined
>  typedef OMPI_PTRDIFF_TYPE MPI_Aint;
>
> compilation aborted for dt_args.c (cod 21)
>
> My config line was:
> ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=xxx
>
> I've attached my config.log file. Has anyone encountered this? I was
able to build openmpi on this exact system using the gcc/g++  
compilers, however

the intel compilers are substantially faster on our system.
>
> Thanks!
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


--Jeff Squyres
Cisco Systems



<openmpi-1.3_output.tar.gz>___

users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] lammps MD code fails with Open MPI 1.3

2009-02-20 Thread Jeff Squyres
Actually, there was a big Fortran bug that crept in after 1.3 that was  
just fixed on the trunk last night.  If you're using Fortran  
applications with some compilers (e.g., Intel), the 1.3.1 nightly  
snapshots may have hung in some cases.  The problem should be fixed in  
tonight's 1.3.1 nightly snapshot.



On Feb 20, 2009, at 12:46 AM, Nysal Jan wrote:


It could be the same bug reported here
http://www.open-mpi.org/community/lists/users/2009/02/8010.php

Can you try a recent snapshot of 1.3.1
(http://www.open-mpi.org/nightly/v1.3/) to verify if this has been  
fixed


--Nysal

On Thu, 2009-02-19 at 16:09 -0600, Jeff Pummill wrote:

I built a fresh version of lammps v29Jan09 against Open MPI 1.3 which
in turn was built with Gnu compilers v4.2.4 on an Ubuntu 8.04 x86_64
box. This Open MPI build was able to generate usable binaries such as
XHPL and NPB, but the lammps binary it generated was not usable.

I tried it with a couple of different versions of the lammps source,
but to no avail. No errors during the builds and a binary was  
created,

but when executing the job it quickly exits with no messages other
than:

jpummil@stealth:~$ mpirun -np 4 -hostfile
hosts /home/jpummil/lmp_Stealth-OMPI < in.testbench_small
LAMMPS (22 Jan 2008)

Interestingly, I downloaded Open MPI 1.2.8, built it with the same
configure options I had used with 1.3, and it worked.

I'm getting by fine with 1.2.8. I just wanted to file a possible bug
report on 1.3 and see if others have seen this behavior.

Cheers!

--
Jeff F. Pummill
Senior Linux Cluster Administrator
TeraGrid Campus Champion - UofA
University of Arkansas


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



[OMPI users] round-robin scheduling question [hostfile]

2009-02-20 Thread Raymond Wan


Hi all,

According to FAQ 14 (How do I control how my processes are scheduled across nodes?) [http://www.open-mpi.org/faq/?category=running#mpirun-scheduling], it says that the default scheduling policy is by slot and not by node.  I'm curious why the default is "by slot" since I am thinking of explicitly specifying by node but I'm wondering if there is an issue which I haven't considered.  


I would think that one reason for "by node" is to distribute HDD access across machines 
[as is the case for me since my program is HDD access intensive].  Or perhaps I am mistaken?  I'm 
now thinking that "by slot" is the default because processes with ranks that are close 
together might do similar tasks and you would want them on the same node?  Is that the reason?

Also, at the end of this FAQ, it says "NOTE:  This is the scheduling policy in Open MPI because of a long historical 
precedent..." --  does this "This" refer to "the fact that there are two scheduling policies" or 
"the fact that 'by slot' is the default"?  If the latter, then that explains why "by slot" is the default, I 
guess...

Thank you!

Ray





Re: [OMPI users] lammps MD code fails with Open MPI 1.3

2009-02-20 Thread Nysal Jan
It could be the same bug reported here
http://www.open-mpi.org/community/lists/users/2009/02/8010.php

Can you try a recent snapshot of 1.3.1
(http://www.open-mpi.org/nightly/v1.3/) to verify if this has been fixed

--Nysal

On Thu, 2009-02-19 at 16:09 -0600, Jeff Pummill wrote:
> I built a fresh version of lammps v29Jan09 against Open MPI 1.3 which
> in turn was built with Gnu compilers v4.2.4 on an Ubuntu 8.04 x86_64
> box. This Open MPI build was able to generate usable binaries such as
> XHPL and NPB, but the lammps binary it generated was not usable.
> 
> I tried it with a couple of different versions of the lammps source,
> but to no avail. No errors during the builds and a binary was created,
> but when executing the job it quickly exits with no messages other
> than:
> 
> jpummil@stealth:~$ mpirun -np 4 -hostfile
> hosts /home/jpummil/lmp_Stealth-OMPI < in.testbench_small
> LAMMPS (22 Jan 2008)
> 
> Interestingly, I downloaded Open MPI 1.2.8, built it with the same
> configure options I had used with 1.3, and it worked.
> 
> I'm getting by fine with 1.2.8. I just wanted to file a possible bug
> report on 1.3 and see if others have seen this behavior.
> 
> Cheers!
> 
> -- 
> Jeff F. Pummill
> Senior Linux Cluster Administrator
> TeraGrid Campus Champion - UofA
> University of Arkansas
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users