Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-25 Thread Sangamesh B
On Fri, Oct 24, 2008 at 11:26 PM, Eugene Loh  wrote:
> Sangamesh B wrote:
>
>> I reinstalled all the software with -O3 optimization. Following are the
>> performance numbers for a 4-process job on a single node:
>>
>> MPICH2: 26 m 54 s
>> OpenMPI:   24 m 39 s
>
> I'm not sure I'm following.  OMPI is faster here, but is that a result of
> MPICH2 slowing down?  The original post at
> http://www.open-mpi.org/community/lists/users/2008/10/6891.php had:
>
> OpenMPI - 25 m 39 s.
> MPICH2 - 15 m 53 s.
>
> So, did MPICH2 slow down, or can one not compare these timings?
No. The two sets of timings are from different runs:

> OpenMPI - 25 m 39 s.
> MPICH2 - 15 m 53 s.

This job was run with 8 processes, i.e. on 2 nodes.

> MPICH2: 26 m 54 s
> OpenMPI:   24 m 39 s

This job was run with 4 processes, i.e. on 1 node.


Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-24 Thread Eugene Loh

Sangamesh B wrote:

I reinstalled all the software with -O3 optimization. Following are the 
performance numbers for a 4-process job on a single node:


MPICH2: 26 m 54 s
OpenMPI:   24 m 39 s


I'm not sure I'm following.  OMPI is faster here, but is that a result 
of MPICH2 slowing down?  The original post at 
http://www.open-mpi.org/community/lists/users/2008/10/6891.php had:


OpenMPI - 25 m 39 s.
MPICH2 - 15 m 53 s.

So, did MPICH2 slow down, or can one not compare these timings?


Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-15 Thread Rajeev Thakur
For MPICH2 1.0.7, configure with --with-device=ch3:nemesis. That will use
shared memory within a node unlike ch3:sock which uses TCP. Nemesis is the
default in 1.1a1.
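
For illustration, a rebuild along those lines might look like this (a sketch
only: the install prefix and -O3 flags are the ones used elsewhere in this
thread; everything else is an assumption):

$ cd mpich2-1.0.7
$ ./configure --prefix=/home/san/PERF_TEST/mpich2 \
    --with-device=ch3:nemesis CFLAGS=-O3 FFLAGS=-O3 | tee config_out
$ make | tee make_out
$ make install | tee install_out
# mpich2version should then report "MPICH2 Device: ch3:nemesis"
$ /home/san/PERF_TEST/mpich2/bin/mpich2version | grep Device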

Rajeev


> Date: Wed, 15 Oct 2008 18:21:17 +0530
> From: "Sangamesh B" <forum@gmail.com>
> Subject: Re: [OMPI users] Performance: MPICH2 vs OpenMPI
> To: "Open MPI Users" <us...@open-mpi.org>
> Message-ID:
>   <cb60cbc40810150551sf26acc6qb1113a289ac9d...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
> 
> On Fri, Oct 10, 2008 at 10:40 PM, Brian Dobbins 
> <bdobb...@gmail.com> wrote:
> 
> >
> > Hi guys,
> >
> > On Fri, Oct 10, 2008 at 12:57 PM, Brock Palen 
> <bro...@umich.edu> wrote:
> >
> >> Actually I had much different results,
> >>
> >> gromacs-3.3.1  one node dual core dual socket opt2218  
> openmpi-1.2.7
> >>  pgi/7.2
> >> mpich2 gcc
> >>
> >
> >For some reason, the difference in minutes didn't come 
> through, it
> > seems, but I would guess that if it's a medium-large 
> difference, then it has
> > its roots in PGI7.2 vs. GCC rather than MPICH2 vs. OpenMPI. 
>  Though, to be
> > fair, I find GCC vs. PGI (for C code) is often a toss-up - 
> one may beat the
> > other handily on one code, and then lose just as badly on another.
> >
> > I think my install of mpich2 may be bad, I have never 
> installed it before,
> >>  only mpich1, OpenMPI and LAM. So take my mpich2 numbers 
> with salt, Lots of
> >> salt.
> >
> >
> >   I think the biggest difference in performance with 
> various MPICH2 installs
> > comes from differences in the 'channel' used.  I tend to 
> make sure that I
> > use the 'nemesis' channel, which may or may not be the 
> default these days.
> > If not, though, most people would probably want it.  I 
> think it has issues
> > with threading (or did ages ago?), but I seem to recall it being
> > considerably faster than even the 'ssm' channel.
> >
> >   Sangamesh:  My advice to you would be to recompile 
> Gromacs and specify,
> > in the *Gromacs* compile / configure, to use the same 
> CFLAGS you used with
> > MPICH2.  Eg, "-O2 -m64", whatever.  If you do that, I bet 
> the times between
> > MPICH2 and OpenMPI will be pretty comparable for your 
> benchmark case -
> > especially when run on a single processor.
> >
> 
> I reinstalled all the software with -O3 optimization. Following are the
> performance numbers for a 4-process job on a single node:
> 
> MPICH2: 26 m 54 s
> OpenMPI:   24 m 39 s
> 
> More details:
> 
> $ /home/san/PERF_TEST/mpich2/bin/mpich2version
> MPICH2 Version: 1.0.7
> MPICH2 Release date:Unknown, built on Mon Oct 13 18:02:13 IST 2008
> MPICH2 Device:  ch3:sock
> MPICH2 configure:   --prefix=/home/san/PERF_TEST/mpich2
> MPICH2 CC:  /usr/bin/gcc -O3 -O2
> MPICH2 CXX: /usr/bin/g++  -O2
> MPICH2 F77: /usr/bin/gfortran -O3 -O2
> MPICH2 F90: /usr/bin/gfortran  -O2
> 
> 
> $ /home/san/PERF_TEST/openmpi/bin/ompi_info
> Open MPI: 1.2.7
>Open MPI SVN revision: r19401
> Open RTE: 1.2.7
>Open RTE SVN revision: r19401
> OPAL: 1.2.7
>OPAL SVN revision: r19401
>   Prefix: /home/san/PERF_TEST/openmpi
>  Configured architecture: x86_64-unknown-linux-gnu
>Configured by: san
>Configured on: Mon Oct 13 19:10:13 IST 2008
>   Configure host: locuzcluster.org
> Built by: san
> Built on: Mon Oct 13 19:18:25 IST 2008
>   Built host: locuzcluster.org
>   C bindings: yes
> C++ bindings: yes
>   Fortran77 bindings: yes (all)
>   Fortran90 bindings: yes
>  Fortran90 bindings size: small
>   C compiler: /usr/bin/gcc
>  C compiler absolute: /usr/bin/gcc
> C++ compiler: /usr/bin/g++
>C++ compiler absolute: /usr/bin/g++
>   Fortran77 compiler: /usr/bin/gfortran
>   Fortran77 compiler abs: /usr/bin/gfortran
>   Fortran90 compiler: /usr/bin/gfortran
>   Fortran90 compiler abs: /usr/bin/gfortran
>  C profiling: yes
>C++ profiling: yes
>  Fortran77 profiling: yes
>  Fortran90 profiling: yes
>   C++ exceptions: no
>   Thread support: posix (mpi: no, progress: no)
>   Internal debug support: no
>  MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>  libltdl support: yes
>Heterogeneous support: yes
>  mpirun default --prefix: no
> 
> Thanks,
> Sangamesh



Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-15 Thread Sangamesh B
On Fri, Oct 10, 2008 at 10:40 PM, Brian Dobbins  wrote:

>
> Hi guys,
>
> On Fri, Oct 10, 2008 at 12:57 PM, Brock Palen  wrote:
>
>> Actually I had a much differnt results,
>>
>> gromacs-3.3.1  one node dual core dual socket opt2218  openmpi-1.2.7
>>  pgi/7.2
>> mpich2 gcc
>>
>
>For some reason, the difference in minutes didn't come through, it
> seems, but I would guess that if it's a medium-large difference, then it has
> its roots in PGI7.2 vs. GCC rather than MPICH2 vs. OpenMPI.  Though, to be
> fair, I find GCC vs. PGI (for C code) is often a toss-up - one may beat the
> other handily on one code, and then lose just as badly on another.
>
> I think my install of mpich2 may be bad, I have never installed it before,
>>  only mpich1, OpenMPI and LAM. So take my mpich2 numbers with salt, Lots of
>> salt.
>
>
>   I think the biggest difference in performance with various MPICH2 installs
> comes from differences in the 'channel' used.  I tend to make sure that I
> use the 'nemesis' channel, which may or may not be the default these days.
> If not, though, most people would probably want it.  I think it has issues
> with threading (or did ages ago?), but I seem to recall it being
> considerably faster than even the 'ssm' channel.
>
>   Sangamesh:  My advice to you would be to recompile Gromacs and specify,
> in the *Gromacs* compile / configure, to use the same CFLAGS you used with
> MPICH2.  Eg, "-O2 -m64", whatever.  If you do that, I bet the times between
> MPICH2 and OpenMPI will be pretty comparable for your benchmark case -
> especially when run on a single processor.
>

I reinstalled all the software with -O3 optimization. Following are the
performance numbers for a 4-process job on a single node:

MPICH2: 26 m 54 s
OpenMPI:   24 m 39 s

More details:

$ /home/san/PERF_TEST/mpich2/bin/mpich2version
MPICH2 Version: 1.0.7
MPICH2 Release date:Unknown, built on Mon Oct 13 18:02:13 IST 2008
MPICH2 Device:  ch3:sock
MPICH2 configure:   --prefix=/home/san/PERF_TEST/mpich2
MPICH2 CC:  /usr/bin/gcc -O3 -O2
MPICH2 CXX: /usr/bin/g++  -O2
MPICH2 F77: /usr/bin/gfortran -O3 -O2
MPICH2 F90: /usr/bin/gfortran  -O2


$ /home/san/PERF_TEST/openmpi/bin/ompi_info
Open MPI: 1.2.7
   Open MPI SVN revision: r19401
Open RTE: 1.2.7
   Open RTE SVN revision: r19401
OPAL: 1.2.7
   OPAL SVN revision: r19401
  Prefix: /home/san/PERF_TEST/openmpi
 Configured architecture: x86_64-unknown-linux-gnu
   Configured by: san
   Configured on: Mon Oct 13 19:10:13 IST 2008
  Configure host: locuzcluster.org
Built by: san
Built on: Mon Oct 13 19:18:25 IST 2008
  Built host: locuzcluster.org
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: yes
 Fortran90 bindings size: small
  C compiler: /usr/bin/gcc
 C compiler absolute: /usr/bin/gcc
C++ compiler: /usr/bin/g++
   C++ compiler absolute: /usr/bin/g++
  Fortran77 compiler: /usr/bin/gfortran
  Fortran77 compiler abs: /usr/bin/gfortran
  Fortran90 compiler: /usr/bin/gfortran
  Fortran90 compiler abs: /usr/bin/gfortran
 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: yes
 Fortran90 profiling: yes
  C++ exceptions: no
  Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
 MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
 libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: no

Thanks,
Sangamesh

>
>   Cheers,
>   - Brian
>
>
MPICH2

[san@locuzcluster mpich2-1.0.7]$ ./configure --help

[san@locuzcluster mpich2-1.0.7]$ export CC=`which gcc`

[san@locuzcluster mpich2-1.0.7]$ export CXX=`which g++`

[san@locuzcluster mpich2-1.0.7]$ export F77=`which gfortran`

[san@locuzcluster mpich2-1.0.7]$ export F90=`which gfortran`

[san@locuzcluster mpich2-1.0.7]$ export CFLAGS=-O3 

[san@locuzcluster mpich2-1.0.7]$ export FFLAGS=-O3  

[san@locuzcluster mpich2-1.0.7]$ ./configure 
--prefix=/home/san/PERF_TEST/mpich2 | tee config_out

[san@locuzcluster mpich2-1.0.7]$ make | tee make_out


OPENMPI

[san@locuzcluster openmpi-1.2.7]$ export FC=`which gfortran`

[san@locuzcluster openmpi-1.2.7]$ ./configure 
--prefix=/home/san/PERF_TEST/openmpi | tee config_out

[san@locuzcluster openmpi-1.2.7]$ make | tee make_out

[san@locuzcluster openmpi-1.2.7]$ make install | tee install_out




FFTW

$ export CC=`which gcc`

$ export CXX=`which g++`

$ export F77=`which gfortran`

$ export CFLAGS=-O3

$ export FFLAGS=-O3



GROMACS

With MPICH2

$ export CC=`which gcc`

$ export CXX=`which g++`

$ export 

Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-10 Thread Brian Dobbins
Hi guys,

On Fri, Oct 10, 2008 at 12:57 PM, Brock Palen  wrote:

> Actually I had a much differnt results,
>
> gromacs-3.3.1  one node dual core dual socket opt2218  openmpi-1.2.7
>  pgi/7.2
> mpich2 gcc
>

   For some reason, the difference in minutes didn't come through, it seems,
but I would guess that if it's a medium-large difference, then it has its
roots in PGI7.2 vs. GCC rather than MPICH2 vs. OpenMPI.  Though, to be fair,
I find GCC vs. PGI (for C code) is often a toss-up - one may beat the other
handily on one code, and then lose just as badly on another.

I think my install of mpich2 may be bad, I have never installed it before,
>  only mpich1, OpenMPI and LAM. So take my mpich2 numbers with salt, Lots of
> salt.


  I think the biggest difference in performance with various MPICH2 installs
comes from differences in the 'channel' used.  I tend to make sure that I
use the 'nemesis' channel, which may or may not be the default these days.
If not, though, most people would probably want it.  I think it has issues
with threading (or did ages ago?), but I seem to recall it being
considerably faster than even the 'ssm' channel.

  Sangamesh:  My advice to you would be to recompile Gromacs and specify, in
the *Gromacs* compile / configure, to use the same CFLAGS you used with
MPICH2.  Eg, "-O2 -m64", whatever.  If you do that, I bet the times between
MPICH2 and OpenMPI will be pretty comparable for your benchmark case -
especially when run on a single processor.
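
As a hedged sketch of that advice (GROMACS 3.3 uses an autoconf build with an
--enable-mpi switch; the wrapper path, flags, and install prefix below are
illustrative, not taken from Sangamesh's actual build):

$ export CC=/home/san/PERF_TEST/openmpi/bin/mpicc   # or the MPICH2 mpicc
$ export CFLAGS="-O3"                               # same flags for both MPI builds
$ cd gromacs-3.3.3
$ ./configure --enable-mpi --prefix=/home/san/PERF_TEST/gromacs_ompi | tee config_out
$ make | tee make_out
$ make install | tee install_out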

  Cheers,
  - Brian


Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-10 Thread Brock Palen

Whoops didn't include the mpich2 numbers,

20M mpich2  same node,

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Oct 10, 2008, at 12:57 PM, Brock Palen wrote:


Actually I had much different results,

gromacs-3.3.1  one node dual core dual socket opt2218   
openmpi-1.2.7  pgi/7.2

mpich2 gcc

19M OpenMPI
M  Mpich2

So for me OpenMPI+pgi was faster, I don't know how you got such a  
low mpich2 number.


On the other hand if you do this preprocess before you run:

grompp -sort -shuffle -np 4
mdrun -v

With -sort and -shuffle  the OpenMPI run time went down,

12M OpenMPI + sort shuffle

I think my install of mpich2 may be bad, I have never installed it  
before,  only mpich1, OpenMPI and LAM. So take my mpich2 numbers  
with salt, Lots of salt.


On that point though -sort -shuffle may be useful for you, be sure  
to understand what they do before you use them.

Read:
http://cac.engin.umich.edu/resources/software/gromacs.html

Last, make sure that you're using the single-precision version of  
gromacs for both runs.  The double is about half the speed of the  
single.


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Oct 10, 2008, at 1:15 AM, Sangamesh B wrote:




On Thu, Oct 9, 2008 at 7:30 PM, Brock Palen  wrote:
Which benchmark did you use?

Out of 4 benchmarks I used d.dppc benchmark.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Oct 9, 2008, at 8:06 AM, Sangamesh B wrote:



On Thu, Oct 9, 2008 at 5:40 AM, Jeff Squyres   
wrote:

On Oct 8, 2008, at 5:25 PM, Aurélien Bouteiller wrote:

Make sure you don't use a "debug" build of Open MPI. If you use  
trunk, the build system detects it and turns on debug by default.  
It really kills performance. --disable-debug will remove all those  
nasty printfs from the critical path.


You can easily tell if you have a debug build of OMPI with the  
ompi_info command:


shell$ ompi_info | grep debug
 Internal debug support: no
Memory debugging support: no
shell$
Yes. It is "no"
$ /opt/ompi127/bin/ompi_info -all | grep debug
 Internal debug support: no
Memory debugging support: no

I've tested GROMACS for a single process (mpirun -np 1):
Here are the results:

OpenMPI : 120m 6s

MPICH2 :  67m 44s

I'm trying to build the codes with PGI, but am facing a problem with  
the compilation of GROMACS.


You want to see "no" for both of those.

--
Jeff Squyres
Cisco Systems










Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-10 Thread Brock Palen

Actually I had much different results,

gromacs-3.3.1  one node dual core dual socket opt2218  openmpi-1.2.7   
pgi/7.2

mpich2 gcc

19M OpenMPI
M  Mpich2

So for me OpenMPI+pgi was faster; I don't know how you got such a low  
mpich2 number.


On the other hand if you do this preprocess before you run:

grompp -sort -shuffle -np 4
mdrun -v

With -sort and -shuffle  the OpenMPI run time went down,

12M OpenMPI + sort shuffle

I think my install of mpich2 may be bad; I have never installed it  
before, only mpich1, OpenMPI and LAM. So take my mpich2 numbers with  
salt. Lots of salt.


On that point, though, -sort and -shuffle may be useful for you; be sure to  
understand what they do before you use them.

Read:
http://cac.engin.umich.edu/resources/software/gromacs.html

Last, make sure that you're using the single-precision version of  
gromacs for both runs.  The double is about half the speed of the  
single.


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Oct 10, 2008, at 1:15 AM, Sangamesh B wrote:




On Thu, Oct 9, 2008 at 7:30 PM, Brock Palen  wrote:
Which benchmark did you use?

Out of 4 benchmarks I used d.dppc benchmark.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Oct 9, 2008, at 8:06 AM, Sangamesh B wrote:



On Thu, Oct 9, 2008 at 5:40 AM, Jeff Squyres   
wrote:

On Oct 8, 2008, at 5:25 PM, Aurélien Bouteiller wrote:

Make sure you don't use a "debug" build of Open MPI. If you use  
trunk, the build system detects it and turns on debug by default.  
It really kills performance. --disable-debug will remove all those  
nasty printfs from the critical path.


You can easily tell if you have a debug build of OMPI with the  
ompi_info command:


shell$ ompi_info | grep debug
 Internal debug support: no
Memory debugging support: no
shell$
Yes. It is "no"
$ /opt/ompi127/bin/ompi_info -all | grep debug
 Internal debug support: no
Memory debugging support: no

I've tested GROMACS for a single process (mpirun -np 1):
Here are the results:

OpenMPI : 120m 6s

MPICH2 :  67m 44s

I'm trying to build the codes with PGI, but am facing a problem with  
the compilation of GROMACS.


You want to see "no" for both of those.

--
Jeff Squyres
Cisco Systems








Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-09 Thread Anthony Chan

- "Brian Dobbins"  wrote:

> OpenMPI : 120m 6s
> MPICH2 : 67m 44s
> 
> That seems to indicate that something else is going on -- with -np 1,
> there should be no MPI communication, right? I wonder if the memory
> allocator performance is coming into play here.

If the app sends messages to its own rank, they could still go through the MPI stack
even with -np 1; i.e., it involves at least one memcpy() for point-to-point calls.

> I'd be more inclined to double-check how the Gromacs app is being
> compiled in the first place - I wouldn't think the OpenMPI memory
> allocator would make anywhere near that much difference. Sangamesh, do
> you know what command line was used to compile both of these? Someone
> correct me if I'm wrong, but if MPICH2 embeds optimization flags in
> the 'mpicc' command and OpenMPI does not, then if he's not specifying
> any optimization flags in the compilation of Gromacs, MPICH2 will pass
> its embedded ones on to the Gromacs compile and be faster. I'm rusty
> on my GCC, too, though - does it default to an O2 level, or does it
> default to no optimizations?

MPICH2 does pass the CFLAGS specified at configure time on to mpicc and friends.
If users don't want CFLAGS to be passed to mpicc, they should set
MPICH2LIB_CFLAGS instead. The reason for passing CFLAGS to mpicc
is that CFLAGS may contain flags like -m64 or -m32, which mpicc needs
to make sure object files are compatible with the MPICH2 libraries.

I assume a default installation here means no CFLAGS was specified; in that
case MPICH2's mpicc will not contain any optimization flags. (This is true
in 1.0.7 or later; earlier versions of MPICH2 had various inconsistent
ways of handling compiler flags between compiling the libraries and the flags
used in the compiler wrappers.)  If Gromacs is compiled with mpicc,
"mpicc -show -c" will show whether any optimization flag is used.  Without "-c",
-show alone displays the link command.  To check what flags the mpich2 libraries
themselves were compiled with, use $bindir/mpich2version.
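
A quick sketch of both checks, reusing the install prefix from earlier in this
thread (the MPICH2LIB_CFLAGS line is only an illustration of the alternative
described above):

# compile and link commands the wrapper would issue
$ /home/san/PERF_TEST/mpich2/bin/mpicc -show -c
$ /home/san/PERF_TEST/mpich2/bin/mpicc -show

# flags the MPICH2 libraries themselves were built with
$ /home/san/PERF_TEST/mpich2/bin/mpich2version

# optimize the libraries without baking flags into mpicc
$ ./configure --prefix=/home/san/PERF_TEST/mpich2 MPICH2LIB_CFLAGS=-O3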

If I recall correctly, gcc defaults to "-g -O2". Not sure if the newer version
of gcc changes that.

A.Chan

> 
> Since the benchmark is readily available, I'll try running it later
> today.. didn't get a chance last night.
> 
> Cheers,
> - Brian
> 
> 
> 


Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-09 Thread Terry Frankcombe
>I'm rusty on my GCC, too, though - does it default to an O2
> level, or does it default to no optimizations?

Default gcc is indeed no optimisation.  gcc seems to like making users
type really long complicated command lines even more than OpenMPI does.

(Yes yes, I know!  Don't tell me!)




Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-09 Thread Eugene Loh




Brian Dobbins wrote:

> On Thu, Oct 9, 2008 at 10:13 AM, Jeff Squyres wrote:
>
>> On Oct 9, 2008, at 8:06 AM, Sangamesh B wrote:
>>
>>> OpenMPI : 120m 6s
>>> MPICH2 :  67m 44s
>>
>> That seems to indicate that something else is going on -- with -np 1,
>> there should be no MPI communication, right?

Wow.  Yes.  Ditto.

> I'd be more inclined to double-check how the Gromacs app is being
> compiled in the first place

E.g.,

mpicc -show

> Someone correct me if I'm wrong, but if MPICH2 embeds optimization
> flags in the 'mpicc' command and OpenMPI does not, then if he's not
> specifying any optimization flags in the compilation of Gromacs,
> MPICH2 will pass its embedded ones on to the Gromacs compile and be
> faster.

Yes, I have one established example of this.  I built MPICH2 with
CFLAGS=-O2.  I compiled a non-MPI code with "mpicc" (no flags) and got
optimized performance with MPICH2 but non-optimized performance with
OMPI.  About 3x difference in performance for my particular test case.
Not a single bit of MPI in the test code.

> I'm rusty on my GCC, too, though - does it default to an O2 level, or
> does it default to no optimizations?

When I tried it, default gcc seemed to be no optimization.  In my
MPICH2 "mpicc" (with optimization built in) I had to specify "mpicc
-O0" explicitly to turn optimization back off again.




Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-09 Thread Brian Dobbins
On Thu, Oct 9, 2008 at 10:13 AM, Jeff Squyres  wrote:

> On Oct 9, 2008, at 8:06 AM, Sangamesh B wrote:
>
>> OpenMPI : 120m 6s
>> MPICH2 :  67m 44s
>>
>
> That seems to indicate that something else is going on -- with -np 1, there
> should be no MPI communication, right?  I wonder if the memory allocator
> performance is coming into play here.


  I'd be more inclined to double-check how the Gromacs app is being compiled
in the first place - I wouldn't think the OpenMPI memory allocator would
make anywhere near that much difference.  Sangamesh, do you know what
command line was used to compile both of these?  Someone correct me if I'm
wrong, but *if* MPICH2 embeds optimization flags in the 'mpicc' command and
OpenMPI does not, then if he's not specifying any optimization flags in the
compilation of Gromacs, MPICH2 will pass its embedded ones on to the Gromacs
compile and be faster.  I'm rusty on my GCC, too, though - does it default
to an O2 level, or does it default to no optimizations?

  Since the benchmark is readily available, I'll try running it later
today.. didn't get a chance last night.

  Cheers,
  - Brian


Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-09 Thread Jeff Squyres

On Oct 9, 2008, at 8:06 AM, Sangamesh B wrote:


I've tested GROMACS for a single process (mpirun -np 1):
Here are the results:

OpenMPI : 120m 6s
MPICH2 :  67m 44s



That seems to indicate that something else is going on -- with -np 1,  
there should be no MPI communication, right?  I wonder if the memory  
allocator performance is coming into play here.


Try re-configuring/re-building Open MPI with --without-memory-manager.
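
A sketch of such a rebuild, reusing the prefix from earlier in the thread (the
final grep is only an assumption about how one might confirm the ptmalloc2
memory component is gone):

$ cd openmpi-1.2.7
$ ./configure --prefix=/home/san/PERF_TEST/openmpi --without-memory-manager | tee config_out
$ make | tee make_out
$ make install | tee install_out
$ /home/san/PERF_TEST/openmpi/bin/ompi_info | grep -i ptmalloc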

--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-09 Thread Brock Palen

Which benchmark did you use?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Oct 9, 2008, at 8:06 AM, Sangamesh B wrote:




On Thu, Oct 9, 2008 at 5:40 AM, Jeff Squyres   
wrote:

On Oct 8, 2008, at 5:25 PM, Aurélien Bouteiller wrote:

Make sure you don't use a "debug" build of Open MPI. If you use  
trunk, the build system detects it and turns on debug by default.  
It really kills performance. --disable-debug will remove all those  
nasty printfs from the critical path.


You can easily tell if you have a debug build of OMPI with the  
ompi_info command:


shell$ ompi_info | grep debug
 Internal debug support: no
Memory debugging support: no
shell$
Yes. It is "no"
$ /opt/ompi127/bin/ompi_info -all | grep debug
  Internal debug support: no
Memory debugging support: no

I've tested GROMACS for a single process (mpirun -np 1):
Here are the results:

OpenMPI : 120m 6s

MPICH2 :  67m 44s

I'm trying to build the codes with PGI, but am facing a problem with  
the compilation of GROMACS.


You want to see "no" for both of those.

--
Jeff Squyres
Cisco Systems








Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-09 Thread Sangamesh B
On Thu, Oct 9, 2008 at 5:40 AM, Jeff Squyres  wrote:

> On Oct 8, 2008, at 5:25 PM, Aurélien Bouteiller wrote:
>
>  Make sure you don't use a "debug" build of Open MPI. If you use trunk, the
>> build system detects it and turns on debug by default. It really kills
>> performance. --disable-debug will remove all those nasty printfs from the
>> critical path.
>>
>
> You can easily tell if you have a debug build of OMPI with the ompi_info
> command:
>
> shell$ ompi_info | grep debug
>  Internal debug support: no
> Memory debugging support: no
> shell$
>
Yes. It is "no"
$ /opt/ompi127/bin/ompi_info -all | grep debug
  Internal debug support: no
Memory debugging support: no

I've tested GROMACS for a single process (mpirun -np 1):
Here are the results:

OpenMPI : 120m 6s

MPICH2 :  67m 44s

I'm trying to build the codes with PGI, but am facing a problem with the
compilation of GROMACS.

>
> You want to see "no" for both of those.
>
> --
> Jeff Squyres
> Cisco Systems
>
>
>


Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-09 Thread Sangamesh B
On Thu, Oct 9, 2008 at 2:39 AM, Brian Dobbins  wrote:

>
> Hi guys,
>
> [From Eugene Loh:]
>
>> OpenMPI - 25 m 39 s.
>>> MPICH2  -  15 m 53 s.
>>>
>> With regards to your issue, do you have any indication when you get that
>> 25m39s timing if there is a grotesque amount of time being spent in MPI
>> calls?  Or, is the slowdown due to non-MPI portions?
>
>
>   Just to add my two cents: if this job *can* be run on less than 8
> processors (ideally, even on just 1), then I'd recommend doing so.  That is,
> run it with OpenMPI and with MPICH2 on 1, 2 and 4 processors as well.  If
> the single-processor jobs still give vastly different timings, then perhaps
> Eugene is on the right track and it comes down to various computational
> optimizations, rather than the message-passing, that make the difference.
> Timings from 2 and 4 process runs might be interesting as well to see how
> this difference changes with process counts.
>
>   I've seen differences between various MPI libraries before, but nothing
> quite this severe either.  If I get the time, maybe I'll try to set up
> Gromacs tonight -- I've got both MPICH2 and OpenMPI installed here and can
> try to duplicate the runs.   Sangamesh, is this a standard benchmark case
> that anyone can download and run?
>
Yes.
ftp://ftp.gromacs.org/pub/benchmarks/gmxbench-3.0.tar.gz


>
>
>   Cheers,
>   - Brian
>
>
> Brian Dobbins
> Yale Engineering HPC
>
>


Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-08 Thread Jeff Squyres

On Oct 8, 2008, at 5:25 PM, Aurélien Bouteiller wrote:

Make sure you don't use a "debug" build of Open MPI. If you use  
trunk, the build system detects it and turns on debug by default. It  
really kills performance. --disable-debug will remove all those  
nasty printfs from the critical path.


You can easily tell if you have a debug build of OMPI with the  
ompi_info command:


shell$ ompi_info | grep debug
  Internal debug support: no
Memory debugging support: no
shell$

You want to see "no" for both of those.

--
Jeff Squyres
Cisco Systems




Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-08 Thread Eugene Loh

Eugene Loh wrote:


Sangamesh B wrote:


The job is run on 2 nodes - 8 cores.

OpenMPI - 25 m 39 s.
MPICH2  -  15 m 53 s.


I don't understand MPICH very well, but it seemed as though some of 
the flags used in building MPICH are supposed to be added in 
automatically to the mpicc/etc compiler wrappers.


Again, this may not apply to your case, but I found out some more 
details on my theory.


If you build MPICH2 like this:

   % configure CFLAGS=-O2
   % make

then when you use "mpicc" to build your application, you automatically 
get that optimization flag built in.


What had confused me was that I tried confirming the theory by building 
MPICH2 like this:


   % configure --enable-fast
   % make

That does *NOT* up the mpicc optimization level (despite their 
documentation).


Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-08 Thread George Bosilca
One thing to look for is the process distribution. Based on the  
application communication pattern, the process distribution can have a  
tremendous impact on the execution time. Imagine that the application  
splits the processes into two equal groups based on rank and only  
communicates within each group. If such a group ends up on the same node,  
it will use sm for communications. Conversely, if the groups end up  
spread across the nodes, they will use TCP (which obviously has higher  
latency and lower bandwidth) and the overall performance  
will be greatly impacted.


By default, Open MPI uses the following strategy to distribute  
processes: if a node has several processors, then consecutive ranks  
will be started on the same node. As an example, in your case (2 nodes  
with 4 processors each), ranks 0-3 will be started on the first  
host, while ranks 4-7 go on the second one. I don't know what the  
default distribution for MPICH2 is ...


Anyway, there is an easy way to check if the process distribution is  
the root of your problem. Please execute your application twice, once  
giving mpirun the --bynode argument, and once with --byslot.
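
For instance, with the 8-process run described later in this thread (the
machinefile and binary path are taken from Sangamesh's posts):

# consecutive ranks fill one node before moving to the next
$ mpirun --byslot -machinefile ./mach -np 8 /opt/apps/gromacs333_ompi/bin/mdrun_mpi

# ranks are dealt out round-robin, one per node at a time
$ mpirun --bynode -machinefile ./mach -np 8 /opt/apps/gromacs333_ompi/bin/mdrun_mpi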


  george.

On Oct 8, 2008, at 9:10 AM, Sangamesh B wrote:


Hi All,

   I wanted to switch from mpich2/mvapich2 to OpenMPI, as  
OpenMPI supports both ethernet and infiniband. Before doing that I  
tested an application 'GROMACS' to compare the performance of MPICH2  
& OpenMPI. Both have been compiled with GNU compilers.


After this benchmark, I came to know that OpenMPI is slower than  
MPICH2.


This benchmark is run on a AMD dual core, dual opteron processor.  
Both have compiled with default configurations.


The job is run on 2 nodes - 8 cores.

OpenMPI - 25 m 39 s.
MPICH2  -  15 m 53 s.

Any comments ..?

Thanks,
Sangamesh




Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-08 Thread Brian Dobbins
Hi guys,

[From Eugene Loh:]

> OpenMPI - 25 m 39 s.
>> MPICH2  -  15 m 53 s.
>>
> With regards to your issue, do you have any indication when you get that
> 25m39s timing if there is a grotesque amount of time being spent in MPI
> calls?  Or, is the slowdown due to non-MPI portions?


  Just to add my two cents: if this job *can* be run on less than 8
processors (ideally, even on just 1), then I'd recommend doing so.  That is,
run it with OpenMPI and with MPICH2 on 1, 2 and 4 processors as well.  If
the single-processor jobs still give vastly different timings, then perhaps
Eugene is on the right track and it comes down to various computational
optimizations, rather than the message-passing, that make the difference.
Timings from 2 and 4 process runs might be interesting as well to see how
this difference changes with process counts.

  I've seen differences between various MPI libraries before, but nothing
quite this severe either.  If I get the time, maybe I'll try to set up
Gromacs tonight -- I've got both MPICH2 and OpenMPI installed here and can
try to duplicate the runs.   Sangamesh, is this a standard benchmark case
that anyone can download and run?

  Cheers,
  - Brian


Brian Dobbins
Yale Engineering HPC


Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-08 Thread Eugene Loh

Sangamesh B wrote:

I wanted to switch from mpich2/mvapich2 to OpenMPI, as OpenMPI 
supports both ethernet and infiniband. Before doing that I tested an 
application 'GROMACS' to compare the performance of MPICH2 & OpenMPI. 
Both have been compiled with GNU compilers.


After this benchmark, I came to know that OpenMPI is slower than MPICH2.

This benchmark is run on a AMD dual core, dual opteron processor. Both 
have compiled with default configurations.


The job is run on 2 nodes - 8 cores.

OpenMPI - 25 m 39 s.
MPICH2  -  15 m 53 s.


I agree with Samuel that this difference is strikingly large.

I had a thought that might not apply to your case, but I figured I'd 
share it anyhow.


I don't understand MPICH very well, but it seemed as though some of the 
flags used in building MPICH are supposed to be added in automatically 
to the mpicc/etc compiler wrappers.  That is, if you specified CFLAGS=-O 
to build MPICH, then if you compiled an application with mpicc you would 
automatically get -O.  At least that was my impression.  Maybe I 
misunderstood the documentation.  (If you want to use some flags just 
for building MPICH but you don't want users to get those flags 
automatically when they use mpicc, you're supposed to use flags like 
MPICH2LIB_CFLAGS instead of just CFLAGS when you run "configure".)


Not only may this theory not apply to your case, but I'm not even sure 
it holds water.  I just tried building MPICH2 with --enable-fast turned 
on.  The "configure" output indicates I'm getting CFLAGS=-O2, but when I 
run "mpicc -show" it seems to invoke gcc without any optimization flags 
by default.


So, I guess I'm sending this mail less to help you and more as a request 
that someone might improve my understanding.


With regards to your issue, do you have any indication when you get that 
25m39s timing if there is a grotesque amount of time being spent in MPI 
calls?  Or, is the slowdown due to non-MPI portions?


Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-08 Thread Jeff Squyres

On Oct 8, 2008, at 10:58 AM, Ashley Pittman wrote:


You probably already know this but the obvious candidate here is the
memcpy() function; icc sticks in its own, which in some cases is much
better than the libc one.  It's unusual for compilers to have *huge*
differences from code optimisations alone.



Yep -- memcpy is one of the things that we're looking at.  Haven't  
heard back on the results from the next round of testing yet (one of  
the initial suggestions we had was to separate openib vs. sm  
performance and see if one of them yielded an obvious difference).


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-08 Thread Brock Palen


Jeff,

You probably already know this but the obvious candidate here is the
memcpy() function; icc sticks in its own, which in some cases is much
better than the libc one.  It's unusual for compilers to have *huge*
differences from code optimisations alone.


I know this is off topic, but I was interested in this performance, so  
I compared dcopy() from BLAS, memcpy(), and plain C code with the optimizer  
turned up in PGI 7.2.


Results are here:

http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,49/






Ashley,







Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-08 Thread Ashley Pittman
On Wed, 2008-10-08 at 09:46 -0400, Jeff Squyres wrote:
> - Have you tried compiling Open MPI with something other than GCC?   
> Just this week, we've gotten some reports from an OMPI member that  
> they are sometimes seeing *huge* performance differences with OMPI  
> compiled with GCC vs. any other compiler (Intel, PGI, Pathscale).
> We  
> are working to figure out why; no root cause has been identified yet.

Jeff,

You probably already know this but the obvious candidate here is the
memcpy() function; icc sticks in its own, which in some cases is much
better than the libc one.  It's unusual for compilers to have *huge*
differences from code optimisations alone.

Ashley,



Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-08 Thread Jeff Squyres

On Oct 8, 2008, at 10:26 AM, Sangamesh B wrote:

- What version of Open MPI are you using?  Please send the  
information listed here:

1.2.7

   http://www.open-mpi.org/community/help/

- Did you specify to use mpi_leave_pinned?
No
Use "--mca mpi_leave_pinned 1" on your mpirun command line (I don't  
know if leave pinned behavior benefits Gromacs or not, but it likely  
won't hurt)


I see from your other mail that you are not using IB.  If you're only  
using TCP, then mpi_leave_pinned will have little/no effect.



- Did you enable processor affinity?
No
 Use "--mca mpi_paffinity_alone 1" on your mpirun command line.
Will use these options in the next benchmark

- Are you sure that Open MPI didn't fall back to ethernet (and not  
use IB)?  Use "--mca btl openib,self" on your mpirun command line.
I'm using TCP. There is no InfiniBand support. But can the results  
still be compared?


Yes, they should be comparable.  We've always known that our TCP  
support is "ok" but not "great" (truthfully: we've not tuned it nearly  
as extensively as we've tuned our other transports).  But such a huge  
performance difference is surprising.


Is this on 1 or more nodes?  It might be useful to delineate between  
the TCP and shared-memory performance differences.  I believe that MPICH2's  
shmem performance is likely to be better than OMPI v1.2's, but like  
TCP, the gap shouldn't be *that* huge.
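
One hedged way to separate the two with Open MPI 1.2, mirroring the run shown
later in the thread and changing only the BTL selection:

# single node, 4 processes, shared memory only
$ mpirun --mca btl sm,self -machinefile ./mach -np 4 /opt/apps/gromacs333_ompi/bin/mdrun_mpi

# two nodes, 8 processes, TCP only
$ mpirun --mca btl tcp,self -machinefile ./mach -np 8 /opt/apps/gromacs333_ompi/bin/mdrun_mpi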



- Have you tried compiling Open MPI with something other than GCC?
No.
 Just this week, we've gotten some reports from an OMPI member that  
they are sometimes seeing *huge* performance differences with OMPI  
compiled with GCC vs. any other compiler (Intel, PGI, Pathscale).   
We are working to figure out why; no root cause has been identified  
yet.

I'll try for other than gcc and comeback to you


That would be most useful; thanks.

--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-08 Thread Sangamesh B
FYI, attached are the OpenMPI install details.

On Wed, Oct 8, 2008 at 7:56 PM, Sangamesh B  wrote:

>
>
> On Wed, Oct 8, 2008 at 7:16 PM, Jeff Squyres  wrote:
>
>> On Oct 8, 2008, at 9:10 AM, Sangamesh B wrote:
>>
>>I wanted to switch from mpich2/mvapich2 to OpenMPI, as OpenMPI
>>> supports both ethernet and infiniband. Before doing that I tested an
>>> application 'GROMACS' to compare the performance of MPICH2 & OpenMPI. Both
>>> have been compiled with GNU compilers.
>>>
>>> After this benchmark, I came to know that OpenMPI is slower than MPICH2.
>>>
>>> This benchmark is run on a AMD dual core, dual opteron processor. Both
>>> have compiled with default configurations.
>>>
>>> The job is run on 2 nodes - 8 cores.
>>>
>>> OpenMPI - 25 m 39 s.
>>> MPICH2  -  15 m 53 s.
>>>
>>
>>
>> A few things:
>>
>> - What version of Open MPI are you using?  Please send the information
>> listed here:
>>
> 1.2.7
>
>>
>>http://www.open-mpi.org/community/help/
>>
>> - Did you specify to use mpi_leave_pinned?
>
> No
>
>> Use "--mca mpi_leave_pinned 1" on your mpirun command line (I don't know
>> if leave pinned behavior benefits Gromacs or not, but it likely won't hurt)
>>
>
>> - Did you enable processor affinity?
>
> No
>
>>  Use "--mca mpi_paffinity_alone 1" on your mpirun command line.
>>
> Will use these options in the next benchmark
>
>>
>> - Are you sure that Open MPI didn't fall back to ethernet (and not use
>> IB)?  Use "--mca btl openib,self" on your mpirun command line.
>>
> I'm using TCP. There is no InfiniBand support. But can the results still
> be compared?
>
>>
>> - Have you tried compiling Open MPI with something other than GCC?
>
> No.
>
>>  Just this week, we've gotten some reports from an OMPI member that they
>> are sometimes seeing *huge* performance differences with OMPI compiled with
>> GCC vs. any other compiler (Intel, PGI, Pathscale).  We are working to
>> figure out why; no root cause has been identified yet.
>>
> I'll try for other than gcc and comeback to you
>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>>
>
>




Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-08 Thread Sangamesh B
On Wed, Oct 8, 2008 at 7:16 PM, Jeff Squyres  wrote:

> On Oct 8, 2008, at 9:10 AM, Sangamesh B wrote:
>
>I wanted to switch from mpich2/mvapich2 to OpenMPI, as OpenMPI
>> supports both ethernet and infiniband. Before doing that I tested an
>> application 'GROMACS' to compare the performance of MPICH2 & OpenMPI. Both
>> have been compiled with GNU compilers.
>>
>> After this benchmark, I came to know that OpenMPI is slower than MPICH2.
>>
>> This benchmark is run on a AMD dual core, dual opteron processor. Both
>> have compiled with default configurations.
>>
>> The job is run on 2 nodes - 8 cores.
>>
>> OpenMPI - 25 m 39 s.
>> MPICH2  -  15 m 53 s.
>>
>
>
> A few things:
>
> - What version of Open MPI are you using?  Please send the information
> listed here:
>
1.2.7

>
>http://www.open-mpi.org/community/help/
>
> - Did you specify to use mpi_leave_pinned?

No

> Use "--mca mpi_leave_pinned 1" on your mpirun command line (I don't know if
> leave pinned behavior benefits Gromacs or not, but it likely won't hurt)
>

> - Did you enable processor affinity?

No

>  Use "--mca mpi_paffinity_alone 1" on your mpirun command line.
>
Will use these options in the next benchmark

>
> - Are you sure that Open MPI didn't fall back to ethernet (and not use IB)?
>  Use "--mca btl openib,self" on your mpirun command line.
>
I'm using TCP. There is no InfiniBand support. But can the results still
be compared?

>
> - Have you tried compiling Open MPI with something other than GCC?

No.

>  Just this week, we've gotten some reports from an OMPI member that they
> are sometimes seeing *huge* performance differences with OMPI compiled with
> GCC vs. any other compiler (Intel, PGI, Pathscale).  We are working to
> figure out why; no root cause has been identified yet.
>
I'll try for other than gcc and comeback to you

>
> --
> Jeff Squyres
> Cisco Systems
>
>


Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-08 Thread Sangamesh B
On Wed, Oct 8, 2008 at 7:09 PM, Brock Palen  wrote:

> You're doing this on just one node?  That would be using the OpenMPI SM
> transport.  Last I knew it wasn't that optimized, though it should still be much
> faster than TCP.
>

It's on 2 nodes. I'm using TCP only. There is no InfiniBand hardware.

>
> I am surprised at your result, though; I do not have MPICH2 on the cluster
> right now and don't have time to compare.
>
> How did you run the job?


MPICH2:

time /opt/mpich2/gnu/bin/mpirun -machinefile ./mach -np 8
/opt/apps/gromacs333/bin/mdrun_mpi | tee gro_bench_8p

OpenMPI:

$ time /opt/ompi127/bin/mpirun -machinefile ./mach -np 8
/opt/apps/gromacs333_ompi/bin/mdrun_mpi | tee gromacs_openmpi_8process


>
>
> Brock Palen
> www.umich.edu/~brockp 
> Center for Advanced Computing
> bro...@umich.edu
> (734)936-1985
>
>
>
>
> On Oct 8, 2008, at 9:10 AM, Sangamesh B wrote:
>
>  Hi All,
>>
>>   I wanted to switch from mpich2/mvapich2 to OpenMPI, as OpenMPI
>> supports both ethernet and infiniband. Before doing that I tested an
>> application 'GROMACS' to compare the performance of MPICH2 & OpenMPI. Both
>> have been compiled with GNU compilers.
>>
>> After this benchmark, I came to know that OpenMPI is slower than MPICH2.
>>
>> This benchmark is run on a AMD dual core, dual opteron processor. Both
>> have compiled with default configurations.
>>
>> The job is run on 2 nodes - 8 cores.
>>
>> OpenMPI - 25 m 39 s.
>> MPICH2  -  15 m 53 s.
>>
>> Any comments ..?
>>
>> Thanks,
>> Sangamesh


Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-08 Thread Samuel Sarholz

Hi,

my experience is that OpenMPI has slightly less latency and less 
bandwidth than Intel MPI (which is based on mpich2) using InfiniBand.

I don't remember the numbers using shared memory.

As you are seeing a huge difference, I would suspect that either 
something about your compilation is strange or, more probably, that you are hitting 
the cc-NUMA effect of the Opteron.
You might want to bind the MPI processes (and even clean the filesystem 
caches) to avoid that effect.
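
A minimal sketch of both suggestions, using the processor-affinity MCA
parameter mentioned elsewhere in this thread (dropping the page cache requires
root and is shown only as one possible way to do it on Linux):

$ mpirun --mca mpi_paffinity_alone 1 -machinefile ./mach -np 8 \
    /opt/apps/gromacs333_ompi/bin/mdrun_mpi

# between runs, as root:
$ sync; echo 3 > /proc/sys/vm/drop_caches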


best regards,
Samuel

Sangamesh B wrote:

Hi All,

   I wanted to switch from mpich2/mvapich2 to OpenMPI, as OpenMPI 
supports both ethernet and infiniband. Before doing that I tested an 
application 'GROMACS' to compare the performance of MPICH2 & OpenMPI. 
Both have been compiled with GNU compilers.


After this benchmark, I came to know that OpenMPI is slower than MPICH2.

This benchmark is run on a AMD dual core, dual opteron processor. Both 
have compiled with default configurations.


The job is run on 2 nodes - 8 cores.

OpenMPI - 25 m 39 s.
MPICH2  -  15 m 53 s.

Any comments ..?

Thanks,
Sangamesh




Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-08 Thread Jeff Squyres

On Oct 8, 2008, at 9:10 AM, Sangamesh B wrote:

   I wanted to switch from mpich2/mvapich2 to OpenMPI, as  
OpenMPI supports both ethernet and infiniband. Before doing that I  
tested an application 'GROMACS' to compare the performance of MPICH2  
& OpenMPI. Both have been compiled with GNU compilers.


After this benchmark, I came to know that OpenMPI is slower than  
MPICH2.


This benchmark is run on a AMD dual core, dual opteron processor.  
Both have compiled with default configurations.


The job is run on 2 nodes - 8 cores.

OpenMPI - 25 m 39 s.
MPICH2  -  15 m 53 s.



A few things:

- What version of Open MPI are you using?  Please send the information  
listed here:


http://www.open-mpi.org/community/help/

- Did you specify to use mpi_leave_pinned?  Use "--mca  
mpi_leave_pinned 1" on your mpirun command line (I don't know if leave  
pinned behavior benefits Gromacs or not, but it likely won't hurt)


- Did you enable processor affinity?  Use "--mca mpi_paffinity_alone  
1" on your mpirun command line.


- Are you sure that Open MPI didn't fall back to ethernet (and not use  
IB)?  Use "--mca btl openib,self" on your mpirun command line.


- Have you tried compiling Open MPI with something other than GCC?   
Just this week, we've gotten some reports from an OMPI member that  
they are sometimes seeing *huge* performance differences with OMPI  
compiled with GCC vs. any other compiler (Intel, PGI, Pathscale).  We  
are working to figure out why; no root cause has been identified yet.
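
Putting the mpirun-related suggestions above together, an illustrative command
line (the machinefile and binary path are from Sangamesh's later posts; on a
TCP-only cluster "tcp,self" would replace "openib,self"):

$ mpirun --mca btl openib,self --mca mpi_leave_pinned 1 \
    --mca mpi_paffinity_alone 1 -machinefile ./mach -np 8 \
    /opt/apps/gromacs333_ompi/bin/mdrun_mpi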


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-08 Thread Brock Palen
You're doing this on just one node?  That would be using the OpenMPI SM  
transport.  Last I knew it wasn't that optimized, though it should still  
be much faster than TCP.


I am surprised at your result, though; I do not have MPICH2 on the  
cluster right now and don't have time to compare.


How did you run the job?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Oct 8, 2008, at 9:10 AM, Sangamesh B wrote:


Hi All,

   I wanted to switch from mpich2/mvapich2 to OpenMPI, as  
OpenMPI supports both ethernet and infiniband. Before doing that I  
tested an application 'GROMACS' to compare the performance of  
MPICH2 & OpenMPI. Both have been compiled with GNU compilers.


After this benchmark, I came to know that OpenMPI is slower than  
MPICH2.


This benchmark is run on a AMD dual core, dual opteron processor.  
Both have compiled with default configurations.


The job is run on 2 nodes - 8 cores.

OpenMPI - 25 m 39 s.
MPICH2  -  15 m 53 s.

Any comments ..?

Thanks,
Sangamesh




Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-08 Thread Ray Muno
I would be interested in what others have to say about this as well.

We have been doing a bit of performance testing since we are deploying a
new cluster, and it is our first InfiniBand-based setup.

In our experience, so far, OpenMPI is coming out faster than MVAPICH.
Comparisons were made with different compilers, PGI and Pathscale. We do
not have a running implementation of OpenMPI with SunStudio compilers.

Our tests were with actual user codes running on up to 600 processors so
far.


Sangamesh B wrote:
> Hi All,
> 
>I wanted to switch from mpich2/mvapich2 to OpenMPI, as OpenMPI
> supports both ethernet and infiniband. Before doing that I tested an
> application 'GROMACS' to compare the performance of MPICH2 & OpenMPI. Both
> have been compiled with GNU compilers.
> 
> After this benchmark, I came to know that OpenMPI is slower than MPICH2.
> 
> This benchmark is run on a AMD dual core, dual opteron processor. Both have
> compiled with default configurations.
> 
> The job is run on 2 nodes - 8 cores.
> 
> OpenMPI - 25 m 39 s.
> MPICH2  -  15 m 53 s.
> 
> Any comments ..?
> 
> Thanks,
> Sangamesh
> 

-Ray Muno
 Aerospace Engineering.


[OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-08 Thread Sangamesh B
Hi All,

   I wanted to switch from mpich2/mvapich2 to OpenMPI, as OpenMPI
supports both Ethernet and InfiniBand. Before doing that, I tested an
application, GROMACS, to compare the performance of MPICH2 and OpenMPI. Both
have been compiled with GNU compilers.

From this benchmark, I found that OpenMPI is slower than MPICH2.

The benchmark was run on AMD dual-core, dual-socket Opteron nodes. Both MPI
libraries were compiled with their default configurations.

The job is run on 2 nodes - 8 cores.

OpenMPI - 25 m 39 s.
MPICH2  -  15 m 53 s.

Any comments ..?

Thanks,
Sangamesh