Re: [OMPI users] Performance degradation of OpenMPI 1.10.2 when oversubscribed?

2017-03-24 Thread Tim Prince via users

On 3/24/2017 6:10 PM, Reuti wrote:

Hi,

Am 24.03.2017 um 20:39 schrieb Jeff Squyres (jsquyres):


Limiting MPI processes to hyperthreads *helps*, but current generation Intel 
hyperthreads are not as powerful as cores (they have roughly half the resources 
of a core), so -- depending on your application and your exact system setup -- 
you will almost certainly see performance degradation of running N MPI 
processes across N cores vs. across N hyper threads.  You can try it yourself 
by running the same size application over N cores on a single machine, and then 
run the same application over N hyper threads (i.e., N/2 cores) on the same 
machine.

[…]

- Disabling HT in the BIOS means that the one hardware thread left in each core 
will get all the core's resources (buffers, queues, processor units, etc.).
- Enabling HT in the BIOS means that each of the 2 hardware threads will 
statically be allocated roughly half the core's resources (buffers, queues, 
processor units, etc.).


Do you have a reference for the two topics above (sure, I will try it myself next 
week)? My understanding was that there is no dedicated HT core: using all hardware 
threads will not give the real cores N x 100% plus the HT ones N x 50% (or the like). 
Rather, the scheduler inside the CPU balances the resources between the two faces of 
a single core, and both are equal.



[…]
Spoiler alert: many people have looked at this.  In *most* (but not all) cases, 
using HT is not a performance win for MPI/HPC codes that are designed to run 
processors at 100%.


I think it was also on this mailing list that someone mentioned that the 
pipelines in the CPU are reorganized when you switch HT off: only half of 
them are needed, and those resources are then bound to the real cores as well, 
extending their performance. Similar to, but not exactly, what Jeff mentions above.

Another aspect is that even if the two hardware threads don't really double the 
performance, one might get 150%. And if you pay per CPU hour, it can be worth 
having it switched on.

My personal experience is that it depends not only on the application, but also on 
the way you oversubscribe. Using all cores for a single MPI application leads to 
all processes doing the same stuff at the same time (at least often) and fighting 
for the same part of the CPU, which essentially becomes a bottleneck. But giving 
each half of a CPU to two (or even more) applications allows better interleaving 
of their demands for resources. To allow this in the best way: no taskset or 
binding to cores, let the Linux kernel and CPU do their best - YMMV.

-- Reuti

HT implementations vary in some of the details to which you refer.
The most severe limitation of disabling HT on Intel CPUs of the last 5 
years has been that half of the hardware ITLB entries remain 
inaccessible.  This was not supposed to be a serious limitation for many 
HPC applications.
Applications where each thread needs all of L1 or fill (cache lines 
pending update) buffers aren't so suitable for HT.  Intel compilers have 
some ability at -O3 to adjust automatic loop fission and fusion for 
applications with high fill buffer demand, requiring that there be just 
1 thread using those buffers.
In practice, HT actually reduces the rate at which FPU instructions can be 
issued on Intel "big core" CPUs.
HT together with MPI usually requires effective HT-aware pinning.  It 
seems unusual for MPI ranks to share cores effectively simply under the 
control of kernel scheduling (although Linux is more capable than 
Windows).  I agree that explicit use of taskset under MPI should by now be 
superseded by the binding options implemented by several MPI libraries, 
including Open MPI.
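To illustrate the pinning point, a minimal sketch using option names from the 
1.8/1.10-era mpirun (process counts and ./a.out are placeholders; check your own 
version's man page):

  # one rank per physical core, ignoring the second hardware thread of each core
  mpirun --map-by core --bind-to core --report-bindings -np 16 ./a.out
  # one rank per hardware thread, if you want to try filling both threads per core
  mpirun --use-hwthread-cpus --map-by hwthread --bind-to hwthread -np 32 ./a.out

--report-bindings shows what you actually got.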


--
Tim Prince


Re: [OMPI users] Rounding errors and MPI

2017-01-16 Thread Tim Prince via users
You might try inserting parentheses so as to specify your preferred order of 
evaluation. If using ifort, you would need -assume protect_parens.
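For illustration only (variable and file names are invented, and mpif90 is assumed to 
wrap ifort): a sum written as

  s = (a + b) + (c + d)

keeps that grouping only if compiled with something like

  mpif90 -assume protect_parens -O2 -c mysums.f90

(or with -fp-model precise); at the default fast floating-point model the compiler is 
free to reassociate it, which is one common source of rank-count-dependent rounding.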

Sent via the ASUS PadFone X mini, an AT&T 4G LTE smartphone

 Original Message 
From:Oscar Mojica 
Sent:Mon, 16 Jan 2017 08:28:05 -0500
To:Open MPI User's List 
Subject:[OMPI users] Rounding errors and MPI


Re: [OMPI users] openmpi-2.0.1

2016-11-17 Thread Tim Prince via users


On 11/17/2016 8:45 AM, Professor W P Jones wrote:
> Hi
>
> I am trying to install openmpi-2.0.1 together with the version 14.0.2
> intel compilers and I am having problems.  The configure script with
> CC=icc CXX=icpc and FC=ifort runs successfully but when I issue make
> all install this fails with the output:
>
>
> Making all in tools/ompi_info
> make[2]: Entering directory
> `/usr/local/src/openmpi-2.0.1/ompi/tools/ompi_info'
>   CCLD ompi_info
> ld: warning: libimf.so, needed by ../../../ompi/.libs/libmpi.so, not
> found (try using -rpath or -rpath-link)
> ld: warning: libsvml.so, needed by ../../../ompi/.libs/libmpi.so, not
> found (try using -rpath or -rpath-link)
> ld: warning: libirng.so, needed by ../../../ompi/.libs/libmpi.so, not
> found (try using -rpath or -rpath-link)
> ld: warning: libintlc.so.5, needed by ../../../ompi/.libs/libmpi.so,
> not found (try using -rpath or -rpath-link)
> ld: .libs/ompi_info: hidden symbol `__intel_cpu_features_init_x' in
> /opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/intel64//libirc.a(cpu_feature_disp.o)
> is referenced by DSO
> ld: final link failed: Bad value
> make[2]: *** [ompi_info] Error 1
> make[2]: Leaving directory
> `/usr/local/src/openmpi-2.0.1/ompi/tools/ompi_info'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/usr/local/src/openmpi-2.0.1/ompi'
> make: *** [all-recursive] Error 1
>
Do you have the Intel compilervars.[c]sh sourced (and associated library
files visible) on each node where you expect to install?
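For what it's worth, a sketch of the sequence that usually avoids the missing 
libimf.so/libsvml.so (the compilervars path is taken from the link error above; the 
install prefix is only an example):

  source /opt/intel/composer_xe_2013_sp1.2.144/bin/compilervars.sh intel64
  ./configure CC=icc CXX=icpc FC=ifort --prefix=/usr/local/openmpi-2.0.1-intel
  make all install

The same compilervars.sh (or an equivalent LD_LIBRARY_PATH setting) then needs to be 
in effect wherever ompi_info and your MPI programs run.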

-- 
Tim Prince



Re: [OMPI users] Problems in compiling a code with dynamic linking

2016-03-24 Thread Tim Prince


On 3/24/2016 12:01 AM, Gilles Gouaillardet wrote:
> Elio,
>
> usually, /opt is a local filesystem, so it is possible /opt/intel is
> only available on your login nodes.
>
> your best option is to ask your sysadmin where the mkl libs are on the
> compute nodes, and/or how to use mkl in your jobs.
>
> feel free to submit a dumb pbs script
> ls -l /opt
> ls -l /opt/intel
> ls -l /opt/intel/mkl
> so you can hopefully find that by yourself.
>
> an other option is to use the static mkl libs if they are available
> for example, your LIB line could be
>
> LIB = -static -L/opt/intel/composer_xe_2013_sp1/mkl/lib/intel64
> -lmkl_blas95_lp64 -lmkl_lapack95_lp64  -lmkl_intel_lp64 -lmkl_core 
> -lmkl_sequential -dynamic
>
No, refer to the on-line advisor at
https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor
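For reference, the kind of line the advisor generates for a static, sequential, LP64 
link with the Intel compilers looks roughly like the following (a sketch; verify 
against the advisor for your MKL version, with MKLROOT standing for your 
/opt/intel/composer_xe_2013_sp1/mkl):

  -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm

which avoids depending on shared MKL libraries that may not exist on the compute nodes.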

-- 
Tim Prince



Re: [OMPI users] How to run OpenMPI C code under Windows 7

2015-11-22 Thread Tim Prince


On 11/22/2015 5:04 PM, Philip Bitar wrote:
> *How to run OpenMPI C code under Windows 7*
>
> I'm trying to get OpenMPI C code to run under Windows 7 any way that I
> can. Evidently there is no current support for running OpenMPI
> directly under Windows 7, so I installed Cygwin. Is there a better way
> to run OpenMPI C code under Windows 7?
>
> Under Cygwin, I installed a GCC C compiler, which works.
>
> I also installed an OpenMPI package. Here is a link to a list of the
> files in the Cygwin OpenMPI package:
>
> https://cygwin.com/cgi-bin2/package-cat.cgi?file=x86%2Flibopenmpi%2Flibopenmpi-1.8.6-1=openmpi
>
> My PATH variable is as follows:
>
> /usr/local/bin:/usr/bin
>
> mpicc will compile, but it won't link. It can't find the following:
>
> -lmpi
> -lopen-rte
> -lopen-pal
>
> The test program includes stdio.h and is nothing more than printf
> hello world. I can compile and run it using the GCC C compiler.
>
> Presumably I need to update the PATH variable so that the link step
> will find the missing components. Are those components file names or
> info contained in some other files? Can I verify that the needed files
> have been installed?
>
> I would also be pleased to obtain a link to material that explains the
> OpenMPI system, in general, and the OpenMPI C functions, in
> particular, so that I can write C programs to use the OpenMPI system.
>
> I looked for this kind of info on the web, but I haven't found it yet.
> Maybe it's on the OpenMPI site, and I missed it.
>
>
You probably want the libopenmpi-devel package from cygwin setup.exe as
well.  If you have windows 7 X64, the x86_64 cygwin is probably
preferable to 32-bit (can't see which you started with).
An alternative, with a build of mingw x86-64, is Walt Brainerd's CAF
build.  If this wasn't discussed in the OpenMPI archives, but has not
been withdrawn, you might ask the author, e.g.
https://groups.google.com/forum/#!searchin/comp.lang.fortran/coarray$20fortran/comp.lang.fortran/P5si9Fj1yIY/ptjM8DMUUzUJ
It's a little difficult to use if you have another MPI installed, as 
Windows MPIs (like the MPIs which come with Linux distros) don't observe 
normal methods for keeping distinct paths.
I doubt there is a separate version of OpenMPI docs specific to Windows.
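A couple of quick checks may help (a sketch, assuming the Cygwin package names above):

  cygcheck -c libopenmpi libopenmpi-devel   # confirm both packages are installed
  mpicc -showme                             # print the link line the wrapper would use

If the -L directories named in that output don't actually contain the MPI libraries, 
installing libopenmpi-devel from setup.exe is the likely fix.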

-- 
Tim Prince



Re: [OMPI users] Binding to hardware thread

2015-09-27 Thread Tim Prince


On 9/27/2015 6:02 PM, Saliya Ekanayake wrote:
>
> I couldn't find any option in OpenMPI to bind a process to a hardware
> thread. I am assuming this is not yet supported through binding
> options. Could specifying a rank file be used as a workaround for this?
>
>
Why not start with the FAQ?
https://www.open-mpi.org/faq/?category=openfabrics
Don't go by what the advertisements of other MPI implementations said
based on past defaults.
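That said, with Open MPI 1.8 or later the launcher can bind to hardware threads 
directly, which may remove the need for a rankfile (a sketch; -np and the executable 
are placeholders):

  mpirun --use-hwthread-cpus --bind-to hwthread --report-bindings -np 4 ./a.out

--report-bindings will show whether each rank really landed on a single hardware thread.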

-- 
Tim Prince



Re: [OMPI users] Anyone successfully running Abaqus with OpenMPI?

2015-06-22 Thread Tim Prince


On 6/22/2015 6:06 PM, Belgin, Mehmet wrote:
>
>
> Abaqus documentation suggests that it may be possible to run it using
> an external MPI stack, and I am hoping to make it work with our stock
> openmpi/1.8.4 that knows how to talk with the scheduler's hwloc.
> Unfortunately, however, all of my attempts failed miserably so far (no
> specific instructions for openmpi).
>
> I was wondering if anyone had success with getting Abaqus running with
> openmpi. Even the information of whether it is possible or not will
> help us a great deal.
>
>
Data type encodings are incompatible between openmpi and mpich 
derivatives, and, I think, with the HP or Platform-MPI normally used by 
past Abaqus releases.  You should be looking at the Abaqus release notes for 
your version.
Comparing include files between the various MPI families should give you
a clue about type encoding compatibility.  Lack of instructions for
openmpi probably means something.

-- 
Tim Prince



Re: [OMPI users] mpirun

2015-05-29 Thread Tim Prince
I don't recall Walt's cases taking all of 5 seconds to start. More annoying is 
the hang after completion.

Sent via the ASUS PadFone X mini, an AT&T 4G LTE smartphone

 Original Message 
From:Ralph Castain 
Sent:Fri, 29 May 2015 15:35:15 -0400
To:Open MPI Users 
Subject:Re: [OMPI users] mpirun

>I assume you mean on cygwin? Or is this an older version that supported native 
>Windows?
>
>> On May 29, 2015, at 12:34 PM, Walt Brainerd  wrote:
>> 
>> On Windows, mpirun appears to take about 5 seconds
>> to start. I can't try it on Linux. Intel takes no time to
>> start executing its version.
>> 
>> Is this expected?
>> 
>> -- 
>> Walt Brainerd


Re: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does nothing silently

2014-11-17 Thread Tim Prince
Check with ldd in case you didn't update the .so path.
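For example (a sketch; the executable name and install prefix are placeholders):

  ldd ./my_fortran_app | grep -i -E 'mpi|intel|ifcore'
  export LD_LIBRARY_PATH=/opt/openmpi-1.8.3-intel15/lib:$LD_LIBRARY_PATH

Any line reporting "not found", or an old Open MPI picked up from a different prefix, 
would explain a silent failure.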

Sent via the ASUS PadFone X mini, an AT&T 4G LTE smartphone

 Original Message 
From:John Bray 
Sent:Mon, 17 Nov 2014 11:41:32 -0500
To:us...@open-mpi.org
Subject:[OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does  
nothing silently


Re: [OMPI users] Multiple threads for an mpi process

2014-09-12 Thread Tim Prince


On 9/12/2014 9:22 AM, JR Cary wrote:



On 9/12/14, 7:27 AM, Tim Prince wrote:


On 9/12/2014 6:14 AM, JR Cary wrote:

This must be a very old topic.

I would like to run mpi with one process per node, e.g., using
-cpus-per-rank=1.  Then I want to use openmp inside of that.
But other times I will run with a rank on each physical core.

Inside my code I would like to detect which situation I am in.
Is there an openmpi api call to determine that?

omp_get_num_threads() should work.  Unless you want to choose a 
different non-parallel algorithm for this case, a single thread omp 
parallel region works fine.
You should soon encounter cases where you want intermediate choices, 
such as 1 rank per CPU package and 1 thread per core, even if you 
stay away from platforms with more than 12 cores per CPU.


I may not understand, so I will try to ask in more detail.

Suppose I am running on a four-core processor (and my code likes one 
thread per core).


In case 1 I do

  mpiexec -np 2 myexec

and I want to know that each mpi process should use 2 threads.

If instead I did

  mpiexec -np 4 myexec

I want to know that each mpi process should use one thread.

Will omp_get_num_threads() return a different value for those two cases?


Perhaps I am not invoking mpiexec correctly.
I use MPI_Init_thread(, , MPI_THREAD_FUNNELED, ), and regardless of how I invoke 
mpiexec (-n 1, -n 2, -n 4), I see 2 openmp processes and 1 openmp thread (I have 
not called omp_set_num_threads).
When I run serial, I see 8 openmp processes and 1 openmp thread.
So I must be missing an arg to mpiexec?

This is a 4-core haswell with hyperthreading to get 8.


Sorry, I assumed you were setting OMP_NUM_THREADS for your runs.  If you 
don't do that, each instance of OpenMP will try to run 8 threads, where 
you probably want just 1 thread per core.  I turn off hyperthreading in 
BIOS on my machines, as I never run anything which would benefit from it.
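For example, one way to get the two situations described above without touching the 
code (myexec as in the commands above; mpirun/mpiexec -x exports the variable to every 
rank):

  mpiexec -x OMP_NUM_THREADS=2 -np 2 ./myexec    # 2 ranks, 2 OpenMP threads each on the 4-core node
  mpiexec -x OMP_NUM_THREADS=1 -np 4 ./myexec    # 4 ranks, 1 thread each

Inside the code, omp_get_max_threads() (or omp_get_num_threads() inside a parallel 
region) then reports 2 or 1 accordingly.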


Re: [OMPI users] Multiple threads for an mpi process

2014-09-12 Thread Tim Prince


On 9/12/2014 6:14 AM, JR Cary wrote:

This must be a very old topic.

I would like to run mpi with one process per node, e.g., using
-cpus-per-rank=1.  Then I want to use openmp inside of that.
But other times I will run with a rank on each physical core.

Inside my code I would like to detect which situation I am in.
Is there an openmpi api call to determine that?

omp_get_num_threads() should work.  Unless you want to choose a 
different non-parallel algorithm for this case, a single thread omp 
parallel region works fine.
You should soon encounter cases where you want intermediate choices, 
such as 1 rank per CPU package and 1 thread per core, even if you stay 
away from platforms with more than 12 cores per CPU.


Re: [OMPI users] openMP and mpi problem

2014-07-05 Thread Tim Prince


On 7/4/2014 11:22 AM, Timur Ismagilov wrote:



1. Intel MPI is located here: /opt/intel/impi/4.1.0/intel64/lib. I 
have added the OMPI path at the start and got the same output.



If you can't read your own thread due to your scrambled order of posts, 
I'll simply reiterate what was mentioned before:
ifort has its own mpiexec in the compiler install path to support 
co-arrays (not true MPI), so your MPI path entries must precede the ifort 
ones.  Thus, it remains important to try checks such as 'which mpiexec' 
and assure that you are running the intended components. ifort co-arrays 
will not cooperate with the presence of OpenMPI.
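A quick sanity check along those lines (a sketch; the Open MPI prefix here is only an 
assumption):

  export PATH=/opt/openmpi/bin:$PATH
  which mpiexec     # should resolve inside the Open MPI prefix, not .../bin/intel64 of the compiler
  mpiexec -V        # should print an Open MPI version string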


--
Tim Prince



Re: [OMPI users] openmpi linking problem

2014-06-09 Thread Tim Prince


On 6/9/2014 1:14 PM, Sergii Veremieiev wrote:

Dear Sir/Madam,

I'm trying to link a C/FORTRAN code on Cygwin with Open MPI 1.7.5 and 
GCC 4.8.2:


mpicxx ./lib/Multigrid.o ./lib/GridFE.o ./lib/Data.o ./lib/GridFD.o 
./lib/Parameters.o ./lib/MtInt.o ./lib/MtPol.o ./lib/MtDob.o -o 
Test_cygwin_openmpi_gcc  -L./external/MUMPS/lib 
-ldmumps_cygwin_openmpi_gcc -lmumps_common_cygwin_openmpi_gcc 
-lpord_cygwin_openmpi_gcc -L./external/ParMETIS 
-lparmetis_cygwin_openmpi_gcc -lmetis_cygwin_openmpi_gcc 
-L./external/SCALAPACK -lscalapack_cygwin_openmpi_gcc 
-L./external/BLACS/LIB -lblacs-0_cygwin_openmpi_gcc 
-lblacsF77init-0_cygwin_openmpi_gcc -lblacsCinit-0_cygwin_openmpi_gcc 
-lblacs-0_cygwin_openmpi_gcc -L./external/BLAS 
-lblas_cygwin_openmpi_gcc -lmpi -lgfortran


The following error messages are returned:

./external/MUMPS/lib/libdmumps_cygwin_openmpi_gcc.a(dmumps_part3.o): 
In function `dmumps_127_':
/cygdrive/d/Sergey/Research/Codes/Thinfilmsolver/external/MUMPS/src/dmumps_part3.F:6068: 
undefined reference to `mpi_send_'


You appear to need the MPI Fortran libraries (built with your version of 
gfortran) corresponding to mpif.h or USE mpi.
If you can use mpifort to link, you would use -lstdc++ in place of -lmpi 
-lgfortran.
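In other words, a hedged sketch of that link step (the object list is abbreviated with 
a wildcard; -showme just displays what the wrapper adds):

  mpifort -showme
  mpifort ./lib/*.o -o Test_cygwin_openmpi_gcc \
      -L./external/MUMPS/lib -ldmumps_cygwin_openmpi_gcc -lmumps_common_cygwin_openmpi_gcc -lpord_cygwin_openmpi_gcc \
      -L./external/ParMETIS -lparmetis_cygwin_openmpi_gcc -lmetis_cygwin_openmpi_gcc \
      -L./external/SCALAPACK -lscalapack_cygwin_openmpi_gcc \
      -L./external/BLACS/LIB -lblacs-0_cygwin_openmpi_gcc -lblacsF77init-0_cygwin_openmpi_gcc -lblacsCinit-0_cygwin_openmpi_gcc -lblacs-0_cygwin_openmpi_gcc \
      -L./external/BLAS -lblas_cygwin_openmpi_gcc -lstdc++

mpifort supplies the MPI Fortran libraries and the gfortran runtime itself, so -lmpi 
and -lgfortran are dropped and -lstdc++ covers the C++ objects.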


--
Tim Prince



Re: [OMPI users] intel compiler and openmpi 1.8.1

2014-05-29 Thread Tim Prince


On 05/29/2014 07:11 AM, Lorenzo Donà wrote:

I compiled openmpi 1.8.1 with the Intel compiler with this configuration:
./configure FC=ifort CC=icc CXX=icpc --prefix=/Users/lorenzodona/Documents/openmpi-1.8.1/

but when I run mpif90 -v I find:
Using built-in specs.
COLLECT_GCC=/opt/local/bin/gfortran-mp-4.8
COLLECT_LTO_WRAPPER=/opt/local/libexec/gcc/x86_64-apple-darwin13/4.8.2/lto-wrapper
Target: x86_64-apple-darwin13
Configured with: 
/opt/local/var/macports/build/_opt_mports_dports_lang_gcc48/gcc48/work/gcc-4.8.2/configure 
--prefix=/opt/local --build=x86_64-apple-darwin13 
--enable-languages=c,c++,objc,obj-c++,lto,fortran,java 
--libdir=/opt/local/lib/gcc48 --includedir=/opt/local/include/gcc48 
--infodir=/opt/local/share/info --mandir=/opt/local/share/man 
--datarootdir=/opt/local/share/gcc-4.8 --with-local-prefix=/opt/local 
--with-system-zlib --disable-nls --program-suffix=-mp-4.8 
--with-gxx-include-dir=/opt/local/include/gcc48/c++/ 
--with-gmp=/opt/local --with-mpfr=/opt/local --with-mpc=/opt/local 
--with-cloog=/opt/local --enable-cloog-backend=isl 
--disable-cloog-version-check --enable-stage1-checking 
--disable-multilib --enable-lto --enable-libstdcxx-time 
--with-as=/opt/local/bin/as --with-ld=/opt/local/bin/ld 
--with-ar=/opt/local/bin/ar 
--with-bugurl=https://trac.macports.org/newticket 
--with-pkgversion='MacPorts gcc48 4.8.2_0'

Thread model: posix
gcc version 4.8.2 (MacPorts gcc48 4.8.2_0)

and the version I found:
GNU Fortran (MacPorts gcc48 4.8.2_0) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.

GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING

So I think it was not compiled with the Intel compiler. Please can you help me?

Thanks a lot for your patience and for helping me.

Perhaps you forgot to make the Intel compilers active in your configure 
session.  Normally this would be done by a command such as

source /opt/intel/composer_xe_2013/bin/compilervars.sh intel64

In such a case, if you would examine the configure log, you would expect 
to see a failed attempt to reach ifort, falling back to your gfortran.


On the C and C++ side, the MPI libraries should be compatible between 
gnu and Intel compilers, but the MPI Fortran library would not be 
compatible between gfortran and ifort.
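A sketch of the whole sequence (the prefix is only an example):

  source /opt/intel/composer_xe_2013/bin/compilervars.sh intel64
  ./configure FC=ifort CC=icc CXX=icpc --prefix=$HOME/openmpi-1.8.1-intel
  make all install
  $HOME/openmpi-1.8.1-intel/bin/ompi_info | grep -i 'fort'

If configure really picked up ifort, ompi_info will report it as the Fortran compiler; 
if it still shows gfortran, the environment was not set when configure ran.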


Re: [OMPI users] openMPI in 64 bit

2014-05-15 Thread Tim Prince


On 5/15/2014 3:13 PM, Ajay Nair wrote:


I have been using openMPI for my application with Intel Visual Fortran. The 
version that I am currently using is openMPI-1.6.2. It works fine with Fortran 
code compiled in 32-bit and run with the openMPI 32-bit files. However, recently 
I moved to a 64-bit machine, and even though I could compile the code successfully 
with Intel Fortran 64-bit and also pointed openMPI to the corresponding 64-bit 
files, the exe would not start and threw the error:

*the application was unable to start correctly (0x7b)*
*
*
This is because the msvcr100d.dll file (this is required by openMPI 
even when I run in 32bit mode) is a 32 bit dll file and it probably 
requires 64 bit equivalent. I could not find any 64 bit equivalent for 
this dll.
My question is why is openMPI looking for this dll file (even in case 
of 32bit compilation). Can i do away with this dependency or is there 
any way I can run it in 64 bit?



64-bit Windows of course includes full 32-bit support, so you might 
still run your 32-bit MPI application.
You would need a full 64-bit build of the MPI libraries for 
compatibility with your 64-bit application.  I haven't seen any 
indication that anyone is supporting openmpi for ifort Windows 64-bit.  
The closest openmpi thing seems to be the cygwin (gcc/gfortran) build.  
Windows seems to be too crowded for so many MPI versions to succeed.


--
Tim Prince



Re: [OMPI users] busy waiting and oversubscriptions

2014-03-26 Thread Tim Prince


On 3/26/2014 6:45 AM, Andreas Schäfer wrote:

On 10:27 Wed 26 Mar , Jeff Squyres (jsquyres) wrote:

Be aware of a few facts, though:

1. There is a fundamental difference between disabling
hyperthreading in the BIOS at power-on time and simply running one
MPI process per core.  Disabling HT at power-on allocates more
hardware resources to the remaining HT that is left is each core
(e.g., deeper queues).

Oh, I didn't know that. That's interesting! Do you have any links with
in-depth info on that?


On certain Intel CPUs, the full size instruction TLB was available to a 
process when HyperThreading was disabled on the BIOS setup menu, and 
that was the only way to make all the Write Combine buffers available to 
a single process.  Those CPUs are no longer in widespread use.


At one time, at Intel, we did a study to evaluate the net effect (on a 
later CPU where this did not recover ITLB size).   The result was buried 
afterwards; possibly it didn't meet an unspecified marketing goal. 
Typical applications ran 1% faster with HyperThreading disabled by BIOS 
menu even with affinities carefully set to use just one process per 
core.  Not all applications showed a loss on all data sets when leaving 
HT enabled.
There are a few MPI applications with specialized threading which could 
gain 10% or more by use of HT.


In my personal opinion, SMT becomes less interesting as the number of 
independent cores increases.
Intel(r) Xeon Phi(tm) is an exception, as the vector processing unit 
issues instructions from a single thread only on alternate cycles. This 
capability is used more effectively by running OpenMP threads under MPI, 
e.g. 6 ranks per coprocessor of 30 threads each, spread across 10 cores 
per rank (exact optimum depending on the application; MKL libraries use 
all available hardware threads for sufficiently large data sets).


--
Tim Prince



Re: [OMPI users] linking with openmpi version 1.6.1

2014-02-24 Thread Tim Prince


On 2/24/2014 4:45 PM, Jeff Squyres (jsquyres) wrote:

This is not an issue with Open MPI; it's an issue with how the Fortran compiler works on 
your Linux system.  It's choosing to suffix Fortran symbols with "_" (and possibly, 
in some cases [with long past compilers], "__"), whereas the C compiler is not.  
FWIW, this is a fairly common Fortran Linux compiler convention.


Or you can use the new Fortran'08 C interop stuff (BIND(C)), in which you can 
specify the C symbol name in the Fortran code.  Be aware that while this is 
supported in some Fortran compilers, it is not yet necessarily supported in the 
version of gfortran that you may be using.
iso_c_binding was introduced in Fortran 2003, and has been supported in gfortran at 
least since version 4.4, which is about as old a version as you have any 
business trying (no older ones have adequate documentation remaining online).
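For what it's worth, a minimal sketch of the BIND(C) usage referred to above (the 
routine name and arguments are invented for illustration):

  ! Fortran side: declare a C routine whose unmangled name is fixed by BIND(C)
  interface
     subroutine c_accumulate(n, x) bind(C, name="c_accumulate")
       use, intrinsic :: iso_c_binding, only: c_int, c_double
       integer(c_int), value :: n
       real(c_double) :: x(*)
     end subroutine c_accumulate
  end interface

With bind(C, name=...), the linker sees exactly the name given, so the underscore 
suffix question disappears for that call.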


Also, FWIW, OMPI 1.6.1 is ancient.  Can you upgrade to the latest 1.6.x version 
of Open MPI: 1.6.5?





--
Tim Prince



Re: [OMPI users] Use of __float128 with openmpi

2014-02-01 Thread Tim Prince


On 02/01/2014 12:42 PM, Patrick Boehl wrote:

Hi all,

I have a question on datatypes in openmpi:

Is there an (easy?) way to use __float128 variables with openmpi?

Specifically, functions like

MPI_Allreduce

seem to give weird results with __float128.

Essentially all I found was

http://beige.ucs.indiana.edu/I590/node100.html

where they state

MPI_LONG_DOUBLE
   This is a quadruple precision, 128-bit long floating point number.


But as far as I have seen, MPI_LONG_DOUBLE is only used for long doubles.

The Open MPI Version is 1.6.3 and gcc is 4.7.3 on a x86_64 machine.

It seems unlikely that 10 year old course notes on an unspecified MPI 
implementation (hinted to be IBM power3) would deal with specific 
details of openmpi on a different architecture.
Where openmpi refers to "portable C types" I would take long double to 
be the 80-bit hardware format you would have in a standard build of gcc 
for x86_64.  You should be able to gain some insight by examining your 
openmpi build logs to see if it builds for both __float80 and __float128 
(or neither).  gfortran has a 128-bit data type (software floating point 
real(16), corresponding to __float128); you should be able to see in the 
build logs whether that data type was used.






Re: [OMPI users] Running on two nodes slower than running on one node

2014-01-30 Thread Tim Prince


On 1/29/2014 11:30 PM, Ralph Castain wrote:


On Jan 29, 2014, at 7:56 PM, Victor <victor.ma...@gmail.com> wrote:


Thanks for the insights Tim. I was aware that the CPUs will choke 
beyond a certain point. From memory on my machine this happens with 5 
concurrent MPI jobs with that benchmark that I am using.


My primary question was about scaling between the nodes. I was not 
getting close to double the performance when running MPI jobs acros 
two 4 core nodes. It may be better now since I have Open-MX in place, 
but I have not repeated the benchmarks yet since I need to get one 
simulation job done asap.


Some of that may be due to expected loss of performance when you 
switch from shared memory to inter-node transports. While it is true 
about saturation of the memory path, what you reported could be more 
consistent with that transition - i.e., it isn't unusual to see 
applications perform better when run on a single node, depending upon 
how they are written, up to a certain size of problem (which your code 
may not be hitting).




Regarding your mention of setting affinities and MPI ranks do you 
have a specific (as in syntactically specific since I am a novice and 
easily confused...) examples how I may want to set affinities to get 
the Westmere node performing better?


mpirun --bind-to-core -cpus-per-rank 2 ...

will bind each MPI rank to 2 cores. Note that this will definitely 
*not* be a good idea if you are running more than two threads in your 
process - if you are, then set --cpus-per-rank to the number of 
threads, keeping in mind that you want things to break evenly across 
the sockets. In other words, if you have two 6 core/socket Westmere's 
on the node, then you either want to run 6 process at cpus-per-rank=2 
if each process runs 2 threads, or 4 processes with cpus-per-rank=3 if 
each process runs 3 threads, or 2 processes with no cpus-per-rank but 
--bind-to-socket instead of --bind-to-core for any other thread number 
> 3.


You would not want to run any other number of processes on the node or 
else the binding pattern will cause a single process to split its 
threads across the sockets - which will definitely hurt performance.



-cpus-per-rank 2 is an effective choice for this platform.  As Ralph 
said, it should work automatically for 2 threads per rank.
Ralph's point about not splitting a process across sockets is an 
important one.  Even splitting a process across internal busses, which 
would happen with 3 threads per process, seems problematical.


--
Tim Prince



Re: [OMPI users] Running on two nodes slower than running on one node

2014-01-30 Thread Tim Prince


On 1/29/2014 10:56 PM, Victor wrote:
Thanks for the insights Tim. I was aware that the CPUs will choke 
beyond a certain point. From memory on my machine this happens with 5 
concurrent MPI jobs with that benchmark that I am using.


Regarding your mention of setting affinities and MPI ranks do you have 
a specific (as in syntactically specific since I am a novice and 
easily ...) examples how I may want to set affinities to get the 
Westmere node performing better?


ompi_info returns this: MCA paffinity: hwloc (MCA v2.0, API v2.0, 
Component v1.6.5)


I haven't worked with current OpenMPI on Intel Westmere, although I do 
have a Westmere as my only dual-CPU platform.  Ideally, the current 
scheme OpenMPI uses for MPI/OpenMP hybrid affinity will make it easy to 
allocate adjacent pairs of cores to ranks: [0,1], [2,3], [4,5], ...
hwloc will not be able to see whether cores [0,1] and [2,3] are actually 
the pairs sharing an internal cache bus, and Intel never guaranteed it, 
but that is the only way I've seen it done (presumably controlled by the BIOS).
If you had a requirement to run 1 rank per CPU, with 4 threads per CPU, 
you would pin a thread to each of the core pairs [0,1] and [2,3] 
(and [6,7], [8,9]).  If required to run 8 threads per CPU, using 
HyperThreading, you would pin 1 thread to each of the first 4 cores on 
each CPU and 2 threads each to the remaining cores (the ones which don't 
share cache paths).
Likewise, when you are testing pure MPI scaling, you would take care not 
to place a 2nd rank on a core pair which shares an internal bus until 
you are using all 4 internal bus resources, and you would load up the 2 
CPUs symmetrically.  You might find that 8 ranks with optimized 
placement gave nearly the performance of 12 ranks, and that you need an 
effective hybrid MPI/OpenMP to get perhaps 25% additional performance by 
using the remaining cores.  I've never seen an automated scheme to deal 
with this.  If you ignored the placement requirements, you would find 
that 8 ranks on the 12-core platform didn't perform as well as on the 
similar 8-core platform.
Needless to say, these special requirements of this CPU model have 
eluded even experts, and led to it not being used to full 
effectiveness.  The reason we got into this is your remark that it 
seemed strange to you that you didn't gain performance when you added a 
rank, presumably a 2nd rank on a core pair sharing an internal bus.
You seem to have the impression that MPI performance scaling could be 
linear with the number of cores in use.  Such an expectation is 
unrealistic given that the point of multi-core platforms is to share 
memory and other resources and support more ranks without a linear 
increase in cost.
In your efforts to make an effective cluster out of nodes of dissimilar 
performance levels, you may need to explore means of evening up the 
performance per rank, such as more OpenMP threads per rank on the lower 
performance CPUs.  It really doesn't look like a beginner's project.


--
Tim Prince



Re: [OMPI users] Running on two nodes slower than running on one node

2014-01-29 Thread Tim Prince


On 1/29/2014 8:02 AM, Reuti wrote:

Quoting Victor <victor.ma...@gmail.com>:


Thanks for the reply Reuti,

There are two machines: Node1 with 12 physical cores (dual 6 core 
Xeon) and


Do you have this CPU?

http://ark.intel.com/de/products/37109/Intel-Xeon-Processor-X5560-8M-Cache-2_80-GHz-6_40-GTs-Intel-QPI 



-- Reuti

It's expected on the Xeon Westmere 6-core CPUs to see MPI performance 
saturating when all 4 of the internal bus paths are in use.  For this 
reason, hybrid MPI/OpenMP with 2 cores per MPI rank, with affinity set 
so that each MPI rank has its own internal CPU bus, could out-perform 
plain MPI on those CPUs.
That scheme of pairing cores on selected internal bus paths hasn't been 
repeated.  Some influential customers learned to prefer the 4-core 
version of that CPU, given a reluctance to adopt MPI/OpenMP hybrid with 
affinity.
If you want to talk about "downright strange," start thinking about the 
schemes to optimize performance of 8 threads with 2 threads assigned to 
each internal CPU bus on that CPU model.  Or your scheme of trying to 
balance MPI performance between very different CPU models.

Tim



Node2 with 4 physical cores (i5-2400).

Regarding scaling on the single 12 core node, not it is also not 
linear. In
fact it is downright strange. I do not remember the numbers right now 
but
10 jobs are faster than 11 and 12 are the fastest with peak 
performance of

approximately 66 Msu/s which is also far from triple the 4 core
performance. This odd non-linear behaviour also happens at the lower job
counts on that 12 core node. I understand the decrease in scaling with
increase in core count on the single node as the memory bandwidth is an
issue.

On the 4 core machine the scaling is progressive, ie. every 
additional job
brings an increase in performance. Single core delivers 8.1 Msu/s 
while 4

cores deliver 30.8 Msu/s. This is almost linear.

Since my original email I have also installed Open-MX and recompiled
OpenMPI to use it. This has resulted in approximately 10% better
performance using the existing GbE hardware.


On 29 January 2014 19:40, Reuti <re...@staff.uni-marburg.de> wrote:


Am 29.01.2014 um 03:00 schrieb Victor:

> I am running a CFD simulation benchmark cavity3d available within
http://www.palabos.org/images/palabos_releases/palabos-v1.4r1.tgz
>
> It is a parallel-friendly Lattice Boltzmann solver library.
>
> Palabos provides benchmark results for the cavity3d on several 
different

platforms and variables here:
http://wiki.palabos.org/plb_wiki:benchmark:cavity_n400
>
> The problem that I have is that the benchmark performance on my 
cluster

does not scale even close to a linear scale.
>
> My cluster configuration:
>
> Node1: Dual Xeon 5560 48 Gb RAM
> Node2: i5-2400 24 Gb RAM
>
> Gigabit ethernet connection on eth0
>
> OpenMPI 1.6.5 on Ubuntu 12.04.3
>
>
> Hostfile:
>
> Node1 -slots=4 -max-slots=4
> Node2 -slots=4 -max-slots=4
>
> MPI command: mpirun --mca btl_tcp_if_include eth0 --hostfile
/home/mpiuser/.mpi_hostfile -np 8 ./cavity3d 400
>
> Problem:
>
> cavity3d 400
>
> When I run mpirun -np 4 on Node1 I get 35.7615 Mega site updates per
second
> When I run mpirun -np 4 on Node2 I get 30.7972 Mega site updates per
second
> When I run mpirun --mca btl_tcp_if_include eth0 --hostfile
/home/mpiuser/.mpi_hostfile -np 8 ./cavity3d 400 I get 47.3538 Mega 
site

updates per second
>
> I understand that there are latencies with GbE and that there is MPI
overhead, but this performance scaling still seems very poor. Are my
expectations of scaling naive, or is there actually something wrong and
fixable that will improve the scaling? Optimistically I would like each
node to add to the cluster performance, not slow it down.
>
> Things get even worse if I run asymmetric number of mpi jobs in each
node. For instance running -np 12 on Node1

Isn't this overloading the machine with only 8 real cores in total?


> is significantly faster than running -np 16 across Node1 and 
Node2, thus

adding Node2 actually slows down the performance.

The i5-2400 has only 4 cores and no threads.

It depends on the algorithm how much data has to be exchanged 
between the

processes, and this can indeed be worse when used across a network.

Also: is the algorithm scaling linear when used on node1 only with 8
cores? When it's "35.7615 " with 4 cores, what result do you get with 8
cores on this machine.

-- Reuti







--
Tim Prince



Re: [OMPI users] compilation aborted for Handler.cpp (code 2)

2014-01-28 Thread Tim Prince


On 1/28/2014 10:44 AM, Abdul Rahman Riza wrote:


-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Syed Ahsan Ali
Sent: Sunday, September 22, 2013 9:41 PM
To: Open MPI Users
Subject: Re: [OMPI users] compilation aborted for Handler.cpp (code 2)

Its ok Jeff.
I am not sure about other C++ codes and STL with icpc because it never
happened and I don't know anything about STL.(pardon my less knowledge).
What do you suggest in this case? installation of different version of
openmpi or intel compilers? or any other solution.

On Fri, Sep 20, 2013 at 8:35 PM, Jeff Squyres (jsquyres)
<jsquy...@cisco.com> wrote:

Sorry for the delay replying -- I actually replied on the original
thread yesterday, but it got hung up in my outbox and I didn't notice
that it didn't actually go out until a few moments ago.  :-(

I'm *guessing* that this is a problem with your local icpc installation.

Can you compile / run other C++ codes that use the STL with icpc?


On Sep 20, 2013, at 6:59 AM, Syed Ahsan Ali <ahsansha...@gmail.com> wrote:


Output of make V=1 is attached. Again same error. If intel compiler
is using C++ headers from gfortran then how can we avoid this.

On Fri, Sep 20, 2013 at 11:07 AM, Bert Wesarg
<bert.wes...@googlemail.com> wrote:

Hi,

On Fri, Sep 20, 2013 at 4:49 AM, Syed Ahsan Ali <ahsansha...@gmail.com>

wrote:

I am trying to compile openmpi-1.6.5 on fc16.x86_64 with icc and
ifort but getting the subject error. config.out and make.out is

attached.

Following command was used for configure

./configure CC=icc CXX=icpc FC=ifort F77=ifort F90=ifort
--prefix=/home/openmpi_gfortran -enable-mpi-f90 --enable-mpi-f77 |&
tee config.out

could you also run make with 'make V=1' and send the output. Anyway
it looks like the intel compiler uses the C++ headers from GCC 4.6.3
and I don't know if this is supported.

Bert

icpc expects to pick up headers and libraries, including libstdc++, from 
a simultaneously active g++ installation (normally the g++ which is on 
PATH and LD_LIBRARY_PATH).  g++ 4.7 or 4.8 (with not all the latest 
features supported by icpc) are probably better with the recent icpc 
13.1 and 14.0, but I hope the Open MPI build doesn't depend on C++11.  If 
you do use C++11, you need versions of icpc and g++ both supporting it 
via -std=c++11 (where g++ 4.6 may need -std=c++0x).
 You could run into cluster configuration issues if you don't have 
consistent g++ as well as icpc run-times on LD_LIBRARY_PATH everywhere.
You can't mix support for gfortran with support for ifort; for C and C++ 
you should be able to use gcc/g++ and icc/icpc interchangeably, so you 
could configure for gcc and g++ along with ifort and still use icc and 
icpc as you choose.


--
Tim Prince



Re: [OMPI users] [EXTERNAL] MPI_THREAD_SINGLE vs. MPI_THREAD_FUNNELED

2013-10-23 Thread Tim Prince

On 10/23/2013 01:02 PM, Barrett, Brian W wrote:
On 10/22/13 10:23 AM, "Jai Dayal" > wrote:


I, for the life of me, can't understand the difference between
these two init_thread modes.

MPI_THREAD_SINGLE states that "only one thread will execute", but
MPI_THREAD_FUNNELED states "The process may be multi-threaded, but
only the main thread will make MPI calls (all MPI calls are
funneled to the main thread)."

If I use MPI_THREAD_SINGLE, and just create a bunch of pthreads
that dumbly loop in the background, the MPI library will have no
way of detecting this, nor should this have any effect on the
machine.

This is exactly the same as MPI_THREAD_FUNNELED. What exactly does
it mean with "only one thread will execute?" The openmpi library
has absolutely zero way of knowng I've spawned other pthreads, and
since these pthreads aren't actually doing MPI communication, I
fail to see how this would interfere.


Technically, if you call MPI_INIT_THREAD with MPI_THREAD_SINGLE, you 
have made a promise that you will not create any other threads in your 
application.  There was a time where OSes shipped threaded and 
non-threaded malloc, for example, so knowing that might be important 
for that last bit of performance.  There are also some obscure corner 
cases of the memory model of some architectures where you might get 
unexpected results if you made an MPI Receive call in one thread and 
accessed that buffer later from another thread, which may require 
memory barriers inside the implementation, so there could be some 
differences between SINGLE and FUNNELED due to those barriers.


In Open MPI, we'll handle those corner cases whether you init for 
SINGLE or FUNNELED, so there's really no practical difference for Open 
MPI, but you're then slightly less portable.


I'm asking because I'm using an open_mpi build ontop of
infiniband, and the maximum thread mode is MPI_THREAD_SINGLE.


That doesn't seem right; which version of Open MPI are you using?

Brian



As Brian said, you aren't likely to be running on a system like Windows 
98 where non-thread-safe libraries were preferred.  My colleagues at 
NASA insist that any properly built MPI will support MPI_THREAD_FUNNELED 
by default, even when the documentation says explicit setting in 
MPI_Init_thread() is mandatory.  The statement which I see in OpenMPI 
doc says all MPI calls must be made by the thread which calls 
MPI_Init_thread.  Apparently it will work if plain MPI_Init is used 
instead.  This theory appears to hold up for all the MPI implementations 
of interest.  The additional threads referred to are "inside the MPI 
rank," although I suppose additional application threads not involved 
with MPI are possible.
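For completeness, a minimal Fortran sketch of asking for FUNNELED and checking what 
the library actually granted, which settles questions like the above at run time:

  program init_funneled
    use mpi
    implicit none
    integer :: provided, ierr
    ! request FUNNELED; the library reports the level it really supports
    call MPI_Init_thread(MPI_THREAD_FUNNELED, provided, ierr)
    if (provided < MPI_THREAD_FUNNELED) then
       print *, 'library only provides thread level ', provided
    end if
    call MPI_Finalize(ierr)
  end program init_funneled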




Re: [OMPI users] EXTERNAL: Re: basic questions about compiling OpenMPI

2013-05-25 Thread Tim Prince

On 5/25/2013 8:26 AM, Jeff Squyres (jsquyres) wrote:

On May 23, 2013, at 9:50 AM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:


Excellent.  Now I've read the FAQ and noticed that it doesn't mention the issue 
with the Fortran 90 .mod signatures.  Our applications are Fortran.  So your 
replies are very helpful -- now I know it really isn't practical for us to use 
the default OpenMPI shipped with RHEL6 since we use both Intel and PGI 
compilers and have several applications to accommodate.  Presumably if all the 
applications did INCLUDE 'mpif.h'  instead of 'USE MPI' then we could get 
things working, but it's not a great workaround.

No, not even if they use mpif.h.  Here's a chunk of text from the v1.6 README:

- While it is possible -- on some platforms -- to configure and build
   Open MPI with one Fortran compiler and then build MPI applications
   with a different Fortran compiler, this is not recommended.  Subtle
   problems can arise at run time, even if the MPI application
   compiled and linked successfully.

   Specifically, the following two cases may not be portable between
   different Fortran compilers:

   1. The C constants MPI_F_STATUS_IGNORE and MPI_F_STATUSES_IGNORE
  will only compare properly to Fortran applications that were
  created with Fortran compilers that that use the same
  name-mangling scheme as the Fortran compiler with which Open MPI
  was configured.

   2. Fortran compilers may have different values for the logical
  .TRUE. constant.  As such, any MPI function that uses the Fortran
  LOGICAL type may only get .TRUE. values back that correspond to
  the the .TRUE. value of the Fortran compiler with which Open MPI
  was configured.  Note that some Fortran compilers allow forcing
  .TRUE. to be 1 and .FALSE. to be 0.  For example, the Portland
  Group compilers provide the "-Munixlogical" option, and Intel
  compilers (version >= 8.) provide the "-fpscomp logicals" option.

   You can use the ompi_info command to see the Fortran compiler with
   which Open MPI was configured.


Even when the name mangling obstacle doesn't arise (it shouldn't for the 
cited case of gfortran vs. ifort), run-time library function usage is 
likely to conflict between the compiler used to build the MPI Fortran 
library and the compiler used to build the application. So there really 
isn't a good incentive to retrogress away from the USE files simply to 
avoid one aspect of mixing incompatible compilers.


--
Tim Prince



Re: [OMPI users] basic questions about compiling OpenMPI

2013-05-22 Thread Tim Prince

On 5/22/2013 11:34 AM, Paul Kapinos wrote:

On 05/22/13 17:08, Blosch, Edwin L wrote:

Apologies for not exploring the FAQ first.


No comments =)



If I want to use Intel or PGI compilers but link against the OpenMPI 
that ships with RedHat Enterprise Linux 6 (compiled with g++ I 
presume), are there any issues to watch out for, during linking?


At least, the Fortran-90 bindings ("use mpi") won't work at all 
(they're compiler-dependent).


So, our way is to compile a version of Open MPI with each compiler. I 
think this is recommended.


Note also that the version of Open MPI shipped with Linux is usually a 
bit dusty.



The gfortran build of Fortran library, as well as the .mod USE files, 
won't work with ifort or PGI compilers.  g++ built libraries ought to 
work with sufficiently recent versions of icpc.
As noted above, it's worth while to rebuild yourself, even if you use a 
(preferably more up to date version of) gcc, which you can use along 
with one of the commercial Fortran compilers for linux.
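Concretely, that usually means one build tree and prefix per compiler, for example 
(paths are only illustrative):

  ./configure CC=gcc CXX=g++ FC=gfortran --prefix=/opt/openmpi/gcc   && make install
  make distclean
  ./configure CC=icc CXX=icpc FC=ifort   --prefix=/opt/openmpi/intel && make install

and then putting the matching <prefix>/bin first on PATH (or using mpirun --prefix) 
for each application.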


--
Tim Prince



Re: [OMPI users] Configuration with Intel C++ Composer 12.0.2 on OSX 10.7.5

2013-05-17 Thread Tim Prince

On 05/16/2013 10:13 PM, Tim Prince wrote:

On 5/16/2013 2:16 PM, Geraldine Hochman-Klarenberg wrote:
Maybe I should add that my Intel C++ and Fortran compilers are 
different versions. C++ is 12.0.2 and Fortran is 13.0.2. Could that 
be an issue? Also, when I check for the location of ifort, it seems 
to be in usr/bin - which is different than the C compiler (even 
though I have folders /opt/intel/composer_xe_2013 and 
/opt/intel/composer_xe_2013.3.171 etc.). And I have tried 
source /opt/intel/bin/ifortvars.sh intel64 too.


Geraldine


On May 16, 2013, at 11:57 AM, Geraldine Hochman-Klarenberg wrote:



I am having trouble configuring OpenMPI-1.6.4 with the Intel C/C++ 
composer (12.0.2). My OS is OSX 10.7.5.


I am not a computer whizz so I hope I can explain what I did properly:

1) In bash, I did source /opt/intel/bin/compilervars.sh intel64
and then echo PATH showed:
/opt/intel/composerxe-2011.2.142/bin/intel64:/opt/intel/composerxe-2011.2.142/mpirt/bin/intel64:/opt/intel/composerxe-2011.2.142/bin:/Library/Frameworks/EPD64.framework/Versions/Current/bin:/Library/Frameworks/Python.framework/Versions/Current/bin:.:/Library/Frameworks/EPD64.framework/Versions/Current/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin

2) which icc and which icpc showed:
/opt/intel/composerxe-2011.2.142/bin/intel64/icc
and
/opt/intel/composerxe-2011.2.142/bin/intel64/icpc

So that all seems okay to me. Still when I do
./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=/opt/openmpi-1.6.4
from the folder in which the extracted OpenMPI files sit, I get

== Configuring Open MPI

*** Startup tests
checking build system type... x86_64-apple-darwin11.4.2
checking host system type... x86_64-apple-darwin11.4.2
checking target system type... x86_64-apple-darwin11.4.2
checking for gcc... icc
checking whether the C compiler works... no
configure: error: in `/Users/geraldinehochman-klarenberg/Projects/openmpi-1.6.4':
configure: error: C compiler cannot create executables
See `config.log' for more details


You do need to examine config.log and show it to us if you don't 
understand it.
Attempting to use the older C compiler and libraries to link  .o files 
made by the newer Fortran is likely to fail.
If you wish to attempt this, assuming the Intel compilers are 
installed in default directories, I would suggest you source the 
environment setting for the older compiler, then the newer one, so 
that the newer libraries will be found first and the older ones used 
only when they aren't duplicated by the newer ones.

You also need the 64-bit g++ active.

It's probably unnecessary to use icpc at all when building OpenMPI; icpc 
is compatible with gcc/g++ built objects.


--
Tim Prince



Re: [OMPI users] Configuration with Intel C++ Composer 12.0.2 on OSX 10.7.5

2013-05-16 Thread Tim Prince

On 5/16/2013 2:16 PM, Geraldine Hochman-Klarenberg wrote:
Maybe I should add that my Intel C++ and Fortran compilers are 
different versions. C++ is 12.0.2 and Fortran is 13.0.2. Could that be 
an issue? Also, when I check for the location of ifort, it seems to be 
in usr/bin - which is different than the C compiler (even though I 
have folders /opt/intel/composer_xe_2013 and 
/opt/intel/composer_xe_2013.3.171 etc.). And I have tried 
source /opt/intel/bin/ifortvars.sh intel64 too.


Geraldine


On May 16, 2013, at 11:57 AM, Geraldine Hochman-Klarenberg wrote:



I am having trouble configuring OpenMPI-1.6.4 with the Intel C/C++ 
composer (12.0.2). My OS is OSX 10.7.5.


I am not a computer whizz so I hope I can explain what I did properly:

1) In bash, I did source /opt/intel/bin/compilervars.sh intel64
and then echo PATH showed:
/opt/intel/composerxe-2011.2.142/bin/intel64:/opt/intel/composerxe-2011.2.142/mpirt/bin/intel64:/opt/intel/composerxe-2011.2.142/bin:/Library/Frameworks/EPD64.framework/Versions/Current/bin:/Library/Frameworks/Python.framework/Versions/Current/bin:.:/Library/Frameworks/EPD64.framework/Versions/Current/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin

2) which icc and which icpc showed:
/opt/intel/composerxe-2011.2.142/bin/intel64/icc
and
/opt/intel/composerxe-2011.2.142/bin/intel64/icpc

So that all seems okay to me. Still when I do
./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=/opt/openmpi-1.6.4
from the folder in which the extracted OpenMPI files sit, I get

== Configuring Open MPI

*** Startup tests
checking build system type... x86_64-apple-darwin11.4.2
checking host system type... x86_64-apple-darwin11.4.2
checking target system type... x86_64-apple-darwin11.4.2
checking for gcc... icc
checking whether the C compiler works... no
configure: error: in `/Users/geraldinehochman-klarenberg/Projects/openmpi-1.6.4':
configure: error: C compiler cannot create executables
See `config.log' for more details


You do need to examine config.log and show it to us if you don't 
understand it.
Attempting to use the older C compiler and libraries to link  .o files 
made by the newer Fortran is likely to fail.
If you wish to attempt this, assuming the Intel compilers are installed 
in default directories, I would suggest you source the environment 
setting for the older compiler, then the newer one, so that the newer 
libraries will be found first and the older ones used only when they 
aren't duplicated by the newer ones.

You also need the 64-bit g++ active.

--
Tim Prince



Re: [OMPI users] memory per core/process

2013-03-30 Thread Tim Prince

On 03/30/2013 06:36 AM, Duke Nguyen wrote:

On 3/30/13 5:22 PM, Duke Nguyen wrote:

On 3/30/13 3:13 PM, Patrick Bégou wrote:

I do not know about your code but:

1) did you check stack limitations ? Typically intel fortran codes 
needs large amount of stack when the problem size increase.

Check ulimit -a


First time I heard of stack limitations. Anyway, ulimit -a gives

$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 127368
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 1024
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

So stack size is 10MB??? Does this create a problem? How do I 
change this?


I did $ ulimit -s unlimited to have stack size to be unlimited, and 
the job ran fine!!! So it looks like stack limit is the problem. 
Questions are:


 * how do I set this automatically (and permanently)?
 * should I set all other ulimits to be unlimited?

In our environment, the only solution we found is to have mpirun run a 
script on each node which sets ulimit (as well as environment variables 
which are more convenient to set there than in the mpirun), before 
starting the executable.  We had expert recommendations against this but 
no other working solution.  It seems unlikely that you would want to 
remove any limits which work at default.
Stack size unlimited in reality is not unlimited; it may be limited by a 
system limit or implementation.  As we run up to 120 threads per rank 
and many applications have threadprivate data regions, ability to run 
without considering stack limit is the exception rather than the rule.
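The wrapper we use is nothing more than a few lines of shell, launched by mpirun in 
place of the real binary (a sketch; the script name and the limit are site-specific):

  #!/bin/sh
  # run_with_limits.sh: raise the stack limit and set any per-node environment,
  # then replace this shell with the real program
  ulimit -s unlimited
  exec "$@"

invoked as, e.g., mpirun -np 16 ./run_with_limits.sh ./a.out, so the limit is raised 
on every node before the executable starts.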


--
Tim Prince



Re: [OMPI users] mpivars.sh - Intel Fortran 13.1 conflict with OpenMPI 1.6.3

2013-01-24 Thread Tim Prince

On 01/24/2013 12:40 PM, Michael Kluskens wrote:

This is for reference and suggestions as this took me several hours to track down and the 
previous discussion on "mpivars.sh" failed to cover this point (nothing in the 
FAQ):

I successfully build and installed OpenMPI 1.6.3 using the following on Debian 
Linux:

./configure --prefix=/opt/openmpi/intel131 --disable-ipv6 
--with-mpi-f90-size=medium --with-f90-max-array-dim=4 --disable-vt 
F77=/opt/intel/composer_xe_2013.1.117/bin/intel64/ifort FC=/opt/
intel/composer_xe_2013.1.117/bin/intel64/ifort CXXFLAGS=-m64 CFLAGS=-m64 CC=gcc 
CXX=g++

(disable-vt was required because of an error finding -lz which I gave up on).

My .tcshrc file HAD the following:

set path = (/opt/openmpi/intel131/bin $path)
setenv LD_LIBRARY_PATH /opt/openmpi/intel131/lib:$LD_LIBRARY_PATH
setenv MANPATH /opt/openmpi/intel131/share/man:$MANPATH
alias mpirun "mpirun --prefix /opt/openmpi/intel131 "
source /opt/intel/composer_xe_2013.1.117/bin/compilervars.csh intel64

For years I have used these procedures on Debian Linux and OS X with earlier 
versions of OpenMPI and Intel Fortran.

However, at some point Intel Fortran started including "mpirt", including: 
/opt/intel/composer_xe_2013.1.117/mpirt/bin/intel64/mpirun

So even through I have the alias set for mpirun, I got the following error:


mpirun -V

.: 131: Can't open 
/opt/intel/composer_xe_2013.1.117/mpirt/bin/intel64/mpivars.sh

Part of the confusion is that OpenMPI source does include a reference to "mpivars" in 
"contrib/dist/linux/openmpi.spec"

The solution only occurred as I was writing this up, source intel setup first:

source /opt/intel/composer_xe_2013.1.117/bin/compilervars.csh intel64
set path = (/opt/openmpi/intel131/bin $path)
setenv LD_LIBRARY_PATH /opt/openmpi/intel131/lib:$LD_LIBRARY_PATH
setenv MANPATH /opt/openmpi/intel131/share/man:$MANPATH
alias mpirun "mpirun --prefix /opt/openmpi/intel131 "

Now I finally get:


mpirun -V

mpirun (Open MPI) 1.6.3

The MPI runtime should be in the redistributable for their MPI compiler, not in 
the base compiler.  The question is how much of 
/opt/intel/composer_xe_2013.1.117/mpirt I can eliminate safely, and should I 
(multi-user machine where each user has their own Intel license, so I don't wish 
to troubleshoot this in the future)?



ifort mpirt is a run-time to support co-arrays, but not full MPI. This 
version of the compiler checks in its path setting scripts whether Intel 
MPI is already on LD_LIBRARY_PATH, and so there is a conditional setting 
of the internal mpivars.  I assume the co-array feature would be 
incompatible with OpenMPI and you would want to find a way to avoid any 
reference to that library, possibly by avoiding sourcing that part of 
ifort's compilervars.
If you want a response on this subject from the Intel support team, 
their HPC forum might be a place to bring it up: 
http://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology
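
A quick sanity check after reordering the sourcing (a sketch for an 
sh-compatible shell; the csh session above would use the corresponding csh commands):

which mpirun                              # should resolve to /opt/openmpi/intel131/bin/mpirun
mpirun -V                                 # should now report "mpirun (Open MPI) 1.6.3"
echo $PATH | tr ':' '\n' | grep mpirt     # any output means the Intel mpirt bin dir still precedes Open MPI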


--
Tim Prince



Re: [OMPI users] Compiling 1.6.1 with cygwin 1.7 and gcc

2012-09-24 Thread Tim Prince

On 9/24/2012 1:02 AM, Roy Hogan wrote:


I’m trying to build version 1.6.1 on Cygwin (1.7), using the gcc 4.5.3 
compilers. I need to use the Cygwin linux environment specifically so 
I’m not interested in the cmake option on the windows side. I’ve 
searched the archives, but don’t find much on the Cygwin build option 
over the last couple of years.


I’ve attached the logs for my “configure” and “make all” steps. Our 
email filter will not allow me to send zipped files, so I’ve attached 
the two log files. I’d appreciate any advice.




Perhaps you mean cygwin posix environment.
Evidently, your Microsoft-specific macros required in windows.c aren't 
handled by configury under cygwin, at least not if you don't specify 
that you want them. As you hinted, cygwin supports a more linux-like 
environment, although many of those macros should be handled by #include 
"windows.h".
Do you have a reason for withholding information such as which Windows 
version you want to support, and your configure commands?



--
Tim Prince



Re: [OMPI users] Fwd: lwkmpi

2012-08-28 Thread Tim Prince

On 8/28/2012 5:11 AM, 清风 wrote:




-- Original Message --
*From:* "295187383"<295187...@qq.com>;
*Sent:* Tuesday, 28 August 2012, 4:13 PM
*To:* "users"<us...@open-mpi.org>;
*Subject:* lwkmpi

Hi everybody,
I'm trying to compile openmpi with Intel compiler 11.1.07 on Ubuntu.
I compiled openmpi many times and I could always find the problem. 
But the error that I'm getting now gives me no clues where to even 
search for the problem.
It seems I have succeeded in configuring. When I try "make all", it 
always shows the problems below:




make[7]: Entering directory 
`/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools/opari/tool'
/opt/intel/Compiler/11.1/072/bin/ia32/icpc -DHAVE_CONFIG_H -I. 
-I../../..   -DINSIDE_OPENMPI 
-I/home/lwk/桌面/mnt/Software/openmpi-1.6.1/opal/mca/hwloc/hwloc132/hwloc/include 
-I/usr/include/infiniband -I/usr/include/infiniband  -DOPARI_VT -O3 
-DNDEBUG -finline-functions -pthread -MT opari-ompragma_c.o -MD -MP 
-MF .deps/opari-ompragma_c.Tpo -c -o opari-ompragma_c.o `test -f 
'ompragma_c.cc' || echo './'`ompragma_c.cc

/usr/include/c++/4.5/iomanip(64): error: expected an expression
{ return { __mask }; }
 ^



Looks like your icpc is too old to work with your g++.  If you want to 
build with C++ support, you'll need better matching versions of icpc and 
g++.  icpc support for g++4.7 is expected to release within the next 
month; icpc 12.1 should be fine with g++ 4.5 and 4.6.


--
Tim Prince



Re: [OMPI users] mpi.h incorrect format error?

2012-08-06 Thread Tim Prince

 On 08/06/2012 07:35 AM, PattiMichelle wrote:
mpicc  -DFSEEKO64_OK  -w -O3 -c -DLANDREAD_STUB -DDM_PARALLEL 
-DMAX_HISTORY=25  -c buf_for_proc.c

You might need to examine the pre-processed source
 (mpicc -E buf_for_proc.c > buf_for_proc.i)
to see what went wrong in pre-processing at the point where the compiler 
(gcc?) complains.
I suppose you must have built mpicc yourself; you would need to ensure 
that the mpicc on PATH is the one built with the C compiler on PATH.
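
A sketch of those checks, assuming an Open MPI mpicc wrapper (--showme is 
the Open MPI wrapper option) and reusing the flags from the failing command:

which mpicc gcc                 # confirm both come from the toolchain you intend
mpicc --showme                  # print the underlying compiler command the wrapper would run
mpicc -E -DFSEEKO64_OK -DLANDREAD_STUB -DDM_PARALLEL -DMAX_HISTORY=25 \
      buf_for_proc.c > buf_for_proc.i     # then inspect buf_for_proc.i near the reported line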


--
Tim Prince



Re: [OMPI users] compilation on windows 7 64-bit

2012-07-27 Thread Tim Prince

On 07/27/2012 12:23 PM, Sayre, Alan N wrote:


During compilation I get warning messages such as :

c:\program files 
(x86)\openmpi_v1.6-x64\include\openmpi/ompi/mpi/cxx/op_inln.h(148): 
warning C4800: 'int' : forcing value to bool 'true' or 'false' 
(performance warning)


  cmsolver.cpp

Which indicates that the openmpi version "openmpi_v1.6-x64" is 64 bit. 
And I'm sure that I installed the 64 bit version. I am compiling on a 
64 bit version of Windows 7.




Are you setting the x64 compiler project options?

--
Tim Prince



Re: [OMPI users] undefined reference to `netcdf_mp_nf90_open_'

2012-06-26 Thread Tim Prince

On 6/26/2012 9:20 AM, Jeff Squyres wrote:

Sorry, this looks like an application issue -- i.e., the linker error you're 
getting doesn't look like it's coming from Open MPI.  Perhaps it's a missing 
application/middleware library.

More specifically, you can take the mpif90 command that is being used to generate these 
errors and add "--showme" to the end of it, and you'll see what underlying 
compiler command is being executed under the covers.  That might help you understand 
exactly what is going on.



On Jun 26, 2012, at 7:13 AM, Syed Ahsan Ali wrote:


Dear All

I am getting the following error while compiling an application. It seems like 
something related to netcdf and mpif90. Although I have compiled netcdf with 
the mpif90 option, I don't know why this error is happening. Any hint would be highly 
appreciated.


/home/pmdtest/cosmo/source/cosmo_110525_4.18/obj/src_obs_proc_cdf.o: In 
function `src_obs_proc_cdf_mp_obs_cdf_read_org_':

/home/pmdtest/cosmo/source/cosmo_110525_4.18/src/src_obs_proc_cdf.f90:(.text+0x17aa):
 undefined reference to `netcdf_mp_nf90_open_'

If your mpif90 is properly built and set up with the same Fortran 
compiler you are using, it appears that either you didn't build the 
netcdf Fortran 90 modules with that compiler, or you didn't set the 
include path for the netcdf modules.  This would work the same with 
mpif90 as with the underlying Fortran compiler.
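
A sketch of how to check those two points; the netcdf prefix and library 
names are placeholders (older netcdf releases put the Fortran API inside 
libnetcdf itself):

mpif90 --showme                                       # confirm which Fortran compiler the wrapper drives
nm /opt/netcdf/lib/libnetcdff.a | grep -i nf90_open   # was the Fortran 90 API built, with matching name mangling?
mpif90 -I/opt/netcdf/include -c src_obs_proc_cdf.f90
mpif90 -o cosmo ... -L/opt/netcdf/lib -lnetcdff -lnetcdf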



--
Tim Prince


Re: [OMPI users] Cannot compile code with gfortran + OpenMPI when OpenMPI was built with latest intl compilers

2012-05-19 Thread Tim Prince

On 5/19/2012 2:20 AM, Sergiy Bubin wrote:
I built OpenMPI with that set of intel compilers. Everything seems to 
be fine and I can compile my fortran+MPI code with no problem when I 
invoke ifort. I should say that I do not actually invoke the "wrapper" 
mpi compiler. I normally just add flags as MPICOMPFLAGS=$(shell mpif90 
--showme:compile) and MPILINKFLAGS=$(shell mpif90 --showme:link) in my 
makefile. I know it is not the recommended way of doing things but the 
reason I do that is that I absolutely need to be able to use different 
fortran compilers to build my fortran code.
Avoiding the use of mpif90 accomplishes nothing for changing between 
incompatible Fortran compilers.   Run-time libraries are incompatible 
among ifort, gfortran, and Oracle Fortran, so you can't link a mixture 
of objects compiled by incompatible Fortran compilers except in limited 
circumstances.  This includes the MPI Fortran library.
I don't see how it is too great an inconvenience for your Makefile to 
set PATH and LD_LIBRARY_PATH to include the mpif90 corresponding to the 
chosen Fortran compiler.  You may need to build your own mpif90 for 
gfortran as well as the other compilers, so as to configure it to keep 
it off the default PATHs (e.g. --prefix=/opt/ompi1.4gf/), if you can't 
move the Ubuntu ompi.

Surely most of this is implied in the OpenMPI instructions.
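
A sketch of the layout being suggested, with hypothetical prefixes (each 
configure would normally run in its own build directory):

./configure --prefix=/opt/ompi1.4gf    FC=gfortran F77=gfortran   # gfortran stack
./configure --prefix=/opt/ompi1.4intel FC=ifort    F77=ifort      # ifort stack

# select the matching stack before building the application
export PATH=/opt/ompi1.4intel/bin:$PATH
export LD_LIBRARY_PATH=/opt/ompi1.4intel/lib:$LD_LIBRARY_PATH
mpif90 --showme:compile     # verify the wrapper now drives ifort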

--
Tim Prince



Re: [OMPI users] redirecting output

2012-03-30 Thread Tim Prince

 On 03/30/2012 10:41 AM, tyler.bal...@huskers.unl.edu wrote:



I am using the command mpirun -np nprocs -machinefile machines.arch 
Pcrystal, and my output scrolls across my terminal. I would like to send 
this output to a file and I cannot figure out how to do so. I have 
tried the general > FILENAME and > log &; these generate files, 
however they are empty. Any help would be appreciated.


If you run under screen your terminal output should be collected in 
screenlog.  Beats me why some sysadmins don't see fit to install screen.
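
A couple of sketches for capturing the output, assuming an sh-compatible 
shell (Pcrystal and machines.arch are the poster's names):

mpirun -np 4 -machinefile machines.arch Pcrystal > run.log 2>&1      # capture stdout and stderr
mpirun -np 4 -machinefile machines.arch Pcrystal 2>&1 | tee run.log  # or watch it live while logging
screen -L    # or, as suggested above, run inside screen with logging enabled (writes screenlog.0)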


--
Tim Prince



Re: [OMPI users] [EXTERNAL] Possible to build ompi-1.4.3 or 1.4.5 without a C++ compiler?

2012-03-20 Thread Tim Prince

 On 03/20/2012 08:35 AM, Gunter, David O wrote:

I wish it were that easy.  When I go that route, I get error messages like the 
following when trying to compile the parallel code with Intel:

libmpi.so:  undefined reference to `__intel_sse2_strcpy'

and other messages for every single Intel-implemented standard C-function.

-david
--

There was a suggestion in the snipped portion to use gcc/g++ together 
with ifort; that doesn't appear to be what you mean by "that route" 
(unless you forgot to recompile your .c files with gcc).  You 
have built some objects with an Intel compiler (either ifort or 
icc/icpc) which is referring to this Intel library function, but you 
apparently didn't link against the library which provides it.  If you 
use one of those Intel compilers to drive the link, and your environment 
paths are set accordingly, the Intel libraries would be linked 
automatically.
There was a single release of the compiler several years ago (well out 
of support now) where that sse2 library was omitted, although the sse3 
version was present.
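
A sketch of the two routes described above, with hypothetical paths:

# route 1: build Open MPI's C/C++ parts with gcc/g++ and only the Fortran layer with ifort,
# so libmpi carries no references to Intel C runtime symbols
./configure --prefix=/opt/ompi-gcc-ifort CC=gcc CXX=g++ F77=ifort FC=ifort

# route 2: keep the Intel-built libraries, but let the Intel compiler drive the final link
# (with its environment sourced) so the Intel support libraries are pulled in automatically
mpif90 -o app main.o solver.o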


--
Tim Prince



Re: [OMPI users] parallelising ADI

2012-03-06 Thread Tim Prince

 On 03/06/2012 03:59 PM, Kharche, Sanjay wrote:

Hi

I am working on a 3D ADI solver for the heat equation. I have implemented it as 
serial. Would anybody be able to indicate the best and more straightforward way 
to parallelise it. Apologies if this is going to the wrong forum.


If it's to be implemented in parallelizable fashion (not SSOR style 
where each line uses updates from the previous line), it should be 
feasible to divide the outer loop into an appropriate number of blocks, 
or decompose the physical domain and perform ADI on individual blocks, 
then update and repeat.


--
Tim Prince



Re: [OMPI users] openmpi - gfortran and ifort conflict

2011-12-14 Thread Tim Prince

On 12/14/2011 12:52 PM, Micah Sklut wrote:

Hi Gustavo,

Here is the output of :
barells@ip-10-17-153-123:~> /opt/openmpi/intel/bin/mpif90 -showme
gfortran -I/usr/lib64/mpi/gcc/openmpi/include -pthread
-I/usr/lib64/mpi/gcc/openmpi/lib64 -L/usr/lib64/mpi/gcc/openmpi/lib64
-lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl
-Wl,--export-dynamic -lnsl -lutil -lm -ldl

This points to gfortran.

I do see what you are saying about the 1.4.2 and 1.4.4 components.
I'm not sure why that is, but there seems to be some conflict between the
existing openmpi and the recently installed 1.4.4 that I am trying to build
with ifort.

This is one of the reasons for recommending complete removal (rpm -e if 
need be) of any MPI which is on a default path (and setting a clean 
path) before building a new one, as well as choosing a unique install 
path for the new one.
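
A sketch of that cleanup (package and prefix names are hypothetical):

which -a mpif90                  # list every mpif90 visible on the current PATH
rpm -qa | grep -i openmpi        # find a distribution-installed Open MPI; remove it with rpm -e if possible
# then keep only the new, uniquely-prefixed install on the paths
export PATH=/opt/openmpi-1.4.4-intel/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi-1.4.4-intel/lib:$LD_LIBRARY_PATH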


--
Tim Prince


Re: [OMPI users] openmpi - gfortran and ifort conflict

2011-12-14 Thread Tim Prince

On 12/14/2011 1:20 PM, Fernanda Oliveira wrote:

Hi Micah,

I do not know if it is exactly what you need but I know that there are
environment variables to use with intel mpi. They are: I_MPI_CC,
I_MPI_CXX, I_MPI_F77, I_MPI_F90. So, you can set this using 'export'
for bash, for instance or directly when you run.

I use in my bashrc:

export I_MPI_CC=icc
export I_MPI_CXX=icpc
export I_MPI_F77=ifort
export I_MPI_F90=ifort


Let me know if it helps.
Fernanda Oliveira




I didn't see any indication that Intel MPI was in play here.  Of course, 
that's one of the first thoughts, as under Intel MPI,

mpif90 uses gfortran
mpiifort uses ifort
mpicc uses gcc
mpiCC uses g++
mpiicc uses icc
mpiicpc uses icpc
and all the Intel compilers use g++ to find headers and libraries.
The advice to try 'which mpif90' would show whether you fell into this 
bunker.
If you use Intel cluster checker, you will see noncompliance if anyone's 
MPI is on the default paths.  You must set paths explicitly according to 
the MPI you want.  Admittedly, that tool didn't gain a high level of 
adoption.


--
Tim Prince


Re: [OMPI users] openmpi - gfortran and ifort conflict

2011-12-14 Thread Tim Prince

On 12/14/2011 9:49 AM, Micah Sklut wrote:


I have installed openmpi for gfortran, but am now attempting to install
openmpi as ifort.

I have run the following configuration:
./configure --prefix=/opt/openmpi/intel CC=gcc CXX=g++ F77=ifort FC=ifort

The install works successfully, but when I run
/opt/openmpi/intel/bin/mpif90, it runs as gfortran.
Oddly, when I am user: root, the same mpif90 runs as ifort.

Can someone please alleviate my confusion as to why I mpif90 is not
running as ifort?



You might check your configure logs to be certain that ifort was found 
before gfortran at all stages (did you set paths according to sourcing 
the ifortvars or compilervars scripts which come with ifort?).
'which mpif90' should tell you whether you are executing the one from 
your installation.  You may have another mpif90 coming first on your 
PATH.  You won't be able to override your PATH and LD_LIBRARY_PATH 
correctly simply by specifying absolute path to mpif90.
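
A sketch of those checks; the compilervars path is a placeholder for 
whatever your ifort installation provides:

source /opt/intel/bin/compilervars.sh intel64     # put ifort on PATH *before* configure runs
./configure --prefix=/opt/openmpi/intel CC=gcc CXX=g++ F77=ifort FC=ifort
make all install
which mpif90                                      # must resolve under /opt/openmpi/intel/bin
/opt/openmpi/intel/bin/mpif90 -showme             # the first token should be ifort, not gfortran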



--
Tim Prince


Re: [OMPI users] How to justify the use MPI codes on multicore systems/PCs?

2011-12-11 Thread Tim Prince

On 12/11/2011 12:16 PM, Andreas Schäfer wrote:

Hey,

on an SMP box threaded codes CAN always be faster than their MPI
equivalents. One reason why MPI sometimes turns out to be faster is
that with MPI every process actually initializes its own
data. Therefore it'll end up in the NUMA domain to which the core
running that process belongs. A lot of threaded codes are not NUMA
aware. So, for instance the initialization is done sequentially
(because it may not take a lot of time), and Linux' first touch policy
makes all memory pages belong to a single domain. In essence, those
codes will use just a single memory controller (and its bandwidth).



Many applications require significant additional RAM and message passing 
communication per MPI rank. Where those are not adverse issues, MPI is 
likely to out-perform pure OpenMP (Andreas just quoted some of the 
reasons), and OpenMP is likely to be favored only where it is an easier 
development model. The OpenMP library also should implement a 
first-touch policy, but it's very difficult to carry out fully in legacy 
applications.
OpenMPI has had effective shared memory message passing from the 
beginning, as did its predecessor (LAM) and all current commercial MPI 
implementations I have seen, so you shouldn't have to beat on an issue 
which was dealt with 10 years ago.  If you haven't been watching this 
mail list, you've missed some impressive reporting of new support 
features for effective pinning by CPU, cache, etc.
When you get to hundreds of nodes, depending on your application and 
interconnect performance, you may need to consider "hybrid" (OpenMP as 
the threading model for MPI_THREAD_FUNNELED mode), if you are running a 
single application across the entire cluster.
The biggest cluster in my neighborhood, which ranked #54 on the recent 
Top500, gave best performance in pure MPI mode for that ranking.  It 
uses FDR infiniband, and ran 16 ranks per node, for 646 nodes, with 
DGEMM running in 4-wide vector parallel.  Hybrid was tested as well, 
with each multiple-thread rank pinned to a single L3 cache.
All 3 MPI implementations which were tested have full shared memory 
message passing and pinning to local cache within each node (OpenMPI and 
2 commercial MPIs).
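
For the hybrid case, a sketch of the kind of launch line involved; the 
option spellings are those of the Open MPI 1.4/1.5 era and the counts are 
only illustrative:

export OMP_NUM_THREADS=8
# 2 ranks per node, each bound to a socket (so each rank's threads share one L3 cache)
mpirun -npernode 2 --bind-to-socket --report-bindings -x OMP_NUM_THREADS ./app
# pure-MPI baseline for comparison: one bound rank per core
mpirun -npernode 16 --bind-to-core ./app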



--
Tim Prince


Re: [OMPI users] EXTERNAL: Re: Question about compilng with fPIC

2011-09-21 Thread Tim Prince

On 9/21/2011 12:22 PM, Blosch, Edwin L wrote:

Thanks Tim.

I'm compiling source units and linking them into an executable.  Or perhaps you 
are talking about how OpenMPI itself is built?  Excuse my ignorance...

The source code units are compiled like this:
/usr/mpi/intel/openmpi-1.4.3/bin/mpif90 -D_GNU_SOURCE -traceback -align -pad 
-xHost -falign-functions -fpconstant -O2 -I. 
-I/usr/mpi/intel/openmpi-1.4.3/include -c ../code/src/main/main.f90

The link step is like this:
/usr/mpi/intel/openmpi-1.4.3/bin/mpif90 -D_GNU_SOURCE -traceback -align -pad -xHost 
-falign-functions -fpconstant -static-intel -o ../bin/
-lstdc++

OpenMPI itself was configured like this:
./configure --prefix=/release/cfd/openmpi-intel --without-tm --without-sge 
--without-lsf --without-psm --without-portals --without-gm --without-elan 
--without-mx --without-slurm --without-loadleveler 
--enable-mpirun-prefix-by-default --enable-contrib-no-build=vt 
--enable-mca-no-build=maffinity --disable-per-user-config-files 
--disable-io-romio --with-mpi-f90-size=small --enable-static --disable-shared 
CXX=/appserv/intel/Compiler/11.1/072/bin/intel64/icpc 
CC=/appserv/intel/Compiler/11.1/072/bin/intel64/icc 'CFLAGS=  -O2' 'CXXFLAGS=  
-O2' F77=/appserv/intel/Compiler/11.1/072/bin/intel64/ifort 
'FFLAGS=-D_GNU_SOURCE -traceback  -O2' 
FC=/appserv/intel/Compiler/11.1/072/bin/intel64/ifort 'FCFLAGS=-D_GNU_SOURCE 
-traceback  -O2' 'LDFLAGS= -static-intel'

ldd output on the final executable gives:
 linux-vdso.so.1 =>   (0x7fffb77e7000)
 libstdc++.so.6 =>  /usr/lib64/libstdc++.so.6 (0x2b2e2b652000)
 libibverbs.so.1 =>  /usr/lib64/libibverbs.so.1 (0x2b2e2b95e000)
 libdl.so.2 =>  /lib64/libdl.so.2 (0x2b2e2bb6d000)
 libnsl.so.1 =>  /lib64/libnsl.so.1 (0x2b2e2bd72000)
 libutil.so.1 =>  /lib64/libutil.so.1 (0x2b2e2bf8a000)
 libm.so.6 =>  /lib64/libm.so.6 (0x2b2e2c18d000)
 libpthread.so.0 =>  /lib64/libpthread.so.0 (0x2b2e2c3e4000)
 libc.so.6 =>  /lib64/libc.so.6 (0x2b2e2c60)
 libgcc_s.so.1 =>  /lib64/libgcc_s.so.1 (0x2b2e2c959000)
 /lib64/ld-linux-x86-64.so.2 (0x2b2e2b433000)

Do you see anything that suggests I should have been compiling the application 
and/or OpenMPI with -fPIC?

If you were building any OpenMPI shared libraries, those should use 
-fPIC. configure may have made the necessary additions. If your 
application had shared libraries, you would require -fPIC, but 
apparently you had none.  The shared libraries you show presumably 
weren't involved in your MPI or application build, and you must have 
linked in static versions of your MPI libraries, where -fPIC wouldn't be 
required.



--
Tim Prince


Re: [OMPI users] Question about compilng with fPIC

2011-09-21 Thread Tim Prince

On 9/21/2011 11:44 AM, Blosch, Edwin L wrote:

Follow-up to a mislabeled thread:  "How could OpenMPI (or MVAPICH) affect 
floating-point results?"

I have found a solution to my problem, but I would like to understand the 
underlying issue better.

To rehash: An Intel-compiled executable linked with MVAPICH runs fine; linked 
with OpenMPI fails.  The earliest symptom I could see was some strange 
difference in numerical values of quantities that should be unaffected by MPI 
calls.  Tim's advice guided me to assume memory corruption. Eugene's advice 
guided me to explore the detailed differences in compilation.

I observed that the MVAPICH mpif90 wrapper adds -fPIC.

I tried adding -fPIC and -mcmodel=medium to the compilation of the 
OpenMPI-linked executable.  Now it works fine. I haven't tried without 
-mcmodel=medium, but my guess is -fPIC did the trick.

Does anyone know why compiling with -fPIC has helped?  Does it suggest an 
application problem or an OpenMPI problem?

To note: This is an Infiniband-based cluster.  The application does pretty 
basic MPI-1 operations: send, recv, bcast, reduce, allreduce, gather, gather, 
isend, irecv, waitall.  There is one task that uses iprobe with MPI_ANY_TAG, 
but this task is only involved in certain cases (including this one). 
Conversely, cases that do not call iprobe have not yet been observed to crash.  
I am deducing that this function is the problem.



If you are making a .so, the included .o files should be built with 
-fPIC or similar. Ideally, the configure and build tools would enforce this.
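
A minimal sketch of that rule (file names hypothetical):

mpif90 -fPIC -c solver.f90              # every object going into a shared library needs -fPIC
mpif90 -shared -o libsolver.so solver.o
mpif90 -o app main.f90 -L. -lsolver     # the final executable itself does not need -fPIC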


--
Tim Prince


Re: [OMPI users] Building with thread support on Windows?

2011-09-21 Thread Tim Prince

On 9/21/2011 11:18 AM, Björn Regnström wrote:

Hi,

I am trying to build Open MPI 1.4.3 with thread support on Windows. A
trivial test program
runs if it calls MPI_Init or MPI_Init_thread(int *argc, char ***argv, int
required, int *provided) with
required=0 but hangs if required>0. ompi_info for my build reports that
there is no thread
support but MPI_Init_thread returns provided==required.

The only change in the CMake configuration was to check
OMPI_ENABLE_MPI_THREADS.
Is there anything else that needs to be done with the configuration?

I have built 1.4.3 with thread support on several linuxes and mac and it
works fine there.

Not all Windows compilers work well enough with all threading models 
that you could expect satisfactory results; in particular, the compilers 
and thread libraries you use on linux may not be adequate for Windows 
thread support.



--
Tim Prince


Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-20 Thread Tim Prince

On 9/20/2011 10:50 AM, Blosch, Edwin L wrote:


It appears to be a side effect of linkage that is able to change a compute-only 
routine's answers.

I have assumed that max/sqrt/tiny/abs might be replaced, but some other kind of 
corruption may be going on.



Those intrinsics have direct instruction set translations which 
shouldn't vary from -O1 on up nor with linkage options nor be affected 
by MPI or insertion of WRITEs.


--
Tim Prince


Re: [OMPI users] How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-20 Thread Tim Prince

On 9/20/2011 7:25 AM, Reuti wrote:

Hi,

Am 20.09.2011 um 00:41 schrieb Blosch, Edwin L:


I am observing differences in floating-point results from an application 
program that appear to be related to whether I link with OpenMPI 1.4.3 or 
MVAPICH 1.2.0.  Both packages were built with the same installation of Intel 
11.1, as well as the application program; identical flags passed to the 
compiler in each case.

I’ve tracked down some differences in a compute-only routine where I’ve printed 
out the inputs to the routine (to 18 digits) ; the inputs are identical.  The 
output numbers are different in the 16th place (perhaps a few in the 15th 
place).  These differences only show up for optimized code, not for –O0.

My assumption is that some optimized math intrinsic is being replaced 
dynamically, but I do not know how to confirm this.  Anyone have guidance to 
offer? Or similar experience?


yes, I face it often but always at a magnitude where it's not of any concern 
(and not related to any MPI). Due to the limited precision in computers, a 
simple reordering of operations (although equivalent in a mathematical 
sense) can lead to different results. Removing the anomalies with -O0 could 
prove that.

The other point I have heard, especially for the x86 instruction set, is that the 
internal FPU still has 80 bits, while the representation in memory is only 64 
bits. Hence when everything can be done in the registers, the result can differ 
from the case where some interim results need to be stored to RAM. For 
the Portland compiler there is a switch -Kieee -pc64 to force it to stay always 
in 64 bits, and similar ones for Intel are -mp (now -fltconsistency) and -mp1.

Diagnostics below indicate that ifort 11.1 64-bit is in use.  The 
options aren't the same as Reuti's "now" version (a 32-bit compiler 
which hasn't been supported for 3 years or more?).

With ifort 10.1 and more recent, you would set at least
-assume protect_parens -prec-div -prec-sqrt
if you are interested in numerical consistency.  If you don't want 
auto-vectorization of sum reductions, you would use instead

-fp-model source -ftz
(ftz sets underflow mode back to abrupt, while "source" sets gradual).
It may be possible to expose 80-bit x87 by setting the ancient -mp 
option, but such a course can't be recommended without additional cautions.
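
A sketch of how those options might appear on a compile line (the file 
name is hypothetical):

mpif90 -assume protect_parens -prec-div -prec-sqrt -O2 -c compute_kernel.f90
# or, to also suppress auto-vectorized sum reductions while keeping abrupt underflow:
mpif90 -fp-model source -ftz -O2 -c compute_kernel.f90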


Quoted comments from the OP seem to show a somewhat different question: Does 
OpenMPI implement any operations in a different way from MVAPICH?  I 
would think it probable that the answer could be affirmative for 
operations such as allreduce, but this leads well outside my expertise 
with respect to specific MPI implementations.  It isn't out of the 
question to suspect that such differences might be aggravated when using 
excessively aggressive ifort options such as -fast.




 libifport.so.5 =>  
/opt/intel/Compiler/11.1/072/lib/intel64/libifport.so.5 (0x2b6e7e081000)
 libifcoremt.so.5 =>  
/opt/intel/Compiler/11.1/072/lib/intel64/libifcoremt.so.5 (0x2b6e7e1ba000)
 libimf.so =>  /opt/intel/Compiler/11.1/072/lib/intel64/libimf.so 
(0x2b6e7e45f000)
 libsvml.so =>  /opt/intel/Compiler/11.1/072/lib/intel64/libsvml.so 
(0x2b6e7e7f4000)
 libintlc.so.5 =>  
/opt/intel/Compiler/11.1/072/lib/intel64/libintlc.so.5 (0x2b6e7ea0a000)



--
Tim Prince


Re: [OMPI users] OpenMPI vs Intel Efficiency question

2011-07-13 Thread Tim Prince

On 7/12/2011 11:06 PM, Mohan, Ashwin wrote:

Tim,

Thanks for your message. I was however not clear about your suggestions. Would 
appreciate if you could clarify.

You say," So, if you want a sane comparison but aren't willing to study the compiler 
manuals, you might use (if your source code doesn't violate the aliasing rules) mpiicpc 
-prec-div -prec-sqrt -ansi-alias  and at least (if your linux compiler is g++) mpiCC -O2 
possibly with some of the other options I mentioned earlier."
###From your response above, I understand to use, for Intel, this syntax: "mpiicpc -prec-div 
-prec-sqrt -ansi-alias" and for OPENMPI use "mpiCC -O2". I am not certain about the 
other options you mention.

###Also, I presently use a hostfile while submitting my mpirun. Each node has four slots and my 
hostfile was "nodename slots=4". My compile code is mpiCC -o xxx.xpp.

If you have as ancient a g++ as your indication of FC3 implies, it really isn't 
fair to compare it with a currently supported compiler.
###Do you suggest upgrading the current installation of g++? Would that help?
How much it would help would depend greatly on your source code.  It 
won't help much anyway if you don't choose appropriate options.  Current 
g++ is nearly as good at auto-vectorization as icpc, unless you dive 
into the pragmas and cilk stuff provided with icpc.
You really need to look at the gcc manual to understand those options; 
going into it in any more depth here would try the patience of the list.


###How do I ensure that all 4 slots are active when I submit a mpirun -np 4 command? 
When I do "top", I notice that all 4 slots are active. I noticed this when I did 
"top" with the Intel machine too, that is, it showed four slots active.

Thank you..ashwin.
I was having trouble inferring what platform you are running on, I 
guessed a single core HyperThread, which doesn't seem to agree with your 
"4 slots" terminology.  If you have 2 single core hyperthread CPUs, it 
would be a very unusual application to find a gain for running 2 MPI 
processes per core, but if the sight of 4 processes running on your 
graph was your goal, I won't argue against it.  You must be aware that 
most clusters running CPUs of the past have HT disabled in BIOS setup.


--
Tim Prince


Re: [OMPI users] OpenMPI vs Intel Efficiency question

2011-07-12 Thread Tim Prince

On 7/12/2011 7:45 PM, Mohan, Ashwin wrote:

Hi,

I noticed that the exact same code took 50% more time to run on OpenMPI
than Intel. I use the following syntax to compile and run:
Intel MPI Compiler: (Redhat Fedora Core release 3 (Heidelberg), Kernel
version: Linux 2.6.9-1.667smp x86_64**

mpiicpc -o .cpp  -lmpi

OpenMPI 1.4.3: (Centos 5.5 w/ python 2.4.3, Kernel version: Linux
2.6.18-194.el5 x86_64)**

mpiCC .cpp -o


**Other hardware specs**

 processor   : 0
 vendor_id   : GenuineIntel
 cpu family  : 15
 model   : 3
 model name  : Intel(R) Xeon(TM) CPU 3.60GHz
 stepping: 4
 cpu MHz : 3591.062
 cache size  : 1024 KB
 physical id : 0
 siblings: 2
 core id : 0
 cpu cores   : 1
 apicid  : 0
 fpu : yes
 fpu_exception   : yes
 cpuid level : 5
 wp  : yes
 flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36
 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lmconstant_tsc
pni monitor ds_cpl est tm2
  cid xtpr
  bogomips: 7182.12
 clflush size: 64
 cache_alignment : 128
 address sizes   : 36 bits physical, 48 bits virtual
 power management:

Can the issue of efficiency be deciphered from the above info?

Does the compiler flags have an effect on the efficiency of the
simulation. If so, what flags maybe useful to check to be included for
Open MPI.
The default options for icpc are roughly equivalent to the quite 
aggressive choice
g++ -fno-strict-aliasing -ffast-math -fno-cx-limited-range -O3 
-funroll-loops --param max-unroll-times=2
while you apparently used default -O0 for your mpiCC (if it is g++), 
neither of which is a very good initial choice for performance analysis. 
So, if you want a sane comparison but aren't willing to study the 
compiler manuals, you might use (if your source code doesn't violate the 
aliasing rules)

mpiicpc -prec-div -prec-sqrt -ansi-alias
and at least
(if your linux compiler is g++)
mpiCC -O2
possibly with some of the other options I mentioned earlier.
If you have as ancient a g++ as your indication of FC3 implies, it 
really isn't fair to compare it with a currently supported compiler.


Then, Intel MPI, by default, would avoid using HyperThreading, even 
though you have it enabled on your CPU, so, I suppose, if you are 
running on a single core, it will be rotating among your 4 MPI processes 
1 at a time.  The early Intel HyperThread CPUs typically took 15% longer 
to run MPI jobs when running 2 processes per core.


Will including MPICH2 increase efficiency in running simulations using
OpenMPI?

You have to choose a single MPI.  Having MPICH2 installed shouldn't 
affect performance of OpenMPI or Intel MPI, except to break your 
installation if you don't keep things sorted out.
OpenMPI and Intel MPI normally perform very close, if using equivalent 
settings, when working within the environments for which both are suited.

--
Tim Prince


Re: [OMPI users] MPI_COMM_DUP freeze with OpenMPI 1.4.1

2011-05-10 Thread Tim Prince

On 5/10/2011 6:43 AM, francoise.r...@obs.ujf-grenoble.fr wrote:


Hi,

I compile a parallel program with OpenMPI 1.4.1 (compiled with intel
compilers 12 from the composerxe package). This program is linked to the MUMPS
library 4.9.2, compiled with the same compilers and linked with Intel MKL.
The OS is linux debian.
There is no error in compiling or running the job, but the program freezes inside
a call to the "zmumps" routine, when the slave processes call the MPI_COMM_DUP
routine.

The program is executed on 2 nodes of 12 cores each (westmere
processors) with the following command :

mpirun -np 24 --machinefile $OAR_NODE_FILE -mca plm_rsh_agent "oarsh"
--mca btl self,openib -x LD_LIBRARY_PATH ./prog

We have 12 process running on each node. We submit the job with OAR
batch scheduler (the $OAR_NODE_FILE variable and "oarsh" command are
specific to this scheduler and are usually working well with openmpi )

via gdb, on the slaves, we can see that they are blocked in MPI_COMM_DUP :

(gdb) where
#0 0x2b32c1533113 in poll () from /lib/libc.so.6
#1 0x00adf52c in poll_dispatch ()
#2 0x00adcea3 in opal_event_loop ()
#3 0x00ad69f9 in opal_progress ()
#4 0x00a34b4e in mca_pml_ob1_recv ()
#5 0x009b0768 in
ompi_coll_tuned_allreduce_intra_recursivedoubling ()
#6 0x009ac829 in ompi_coll_tuned_allreduce_intra_dec_fixed ()
#7 0x0097e271 in ompi_comm_allreduce_intra ()
#8 0x0097dd06 in ompi_comm_nextcid ()
#9 0x0097be01 in ompi_comm_dup ()
#10 0x009a0785 in PMPI_Comm_dup ()
#11 0x0097931d in pmpi_comm_dup__ ()
#12 0x00644251 in zmumps (id=...) at zmumps_part1.F:144
#13 0x004c0d03 in sub_pbdirect_init (id=..., matrix_build=...)
at sub_pbdirect_init.f90:44
#14 0x00628706 in fwt2d_elas_v2 () at fwt2d_elas.f90:1048


the master waits further on:

(gdb) where
#0 0x2b9dc9f3e113 in poll () from /lib/libc.so.6
#1 0x00adf52c in poll_dispatch ()
#2 0x00adcea3 in opal_event_loop ()
#3 0x00ad69f9 in opal_progress ()
#4 0x0098f294 in ompi_request_default_wait_all ()
#5 0x00a06e56 in ompi_coll_tuned_sendrecv_actual ()
#6 0x009ab8e3 in ompi_coll_tuned_barrier_intra_bruck ()
#7 0x009ac926 in ompi_coll_tuned_barrier_intra_dec_fixed ()
#8 0x009a0b20 in PMPI_Barrier ()
#9 0x00978c93 in pmpi_barrier__ ()
#10 0x004c0dc4 in sub_pbdirect_init (id=..., matrix_build=...)
at sub_pbdirect_init.f90:62
#11 0x00628706 in fwt2d_elas_v2 () at fwt2d_elas.f90:1048


Remark :
The same code compiled and run well with intel MPI library, from the
same intel package, on the same nodes.

Did you try compiling with equivalent options in each compiler?  For 
example, (supposing you had gcc 4.6)

gcc -O3 -funroll-loops --param max-unroll-times=2 -march=corei7
would be equivalent (as closely as I know) to
icc -fp-model source -msse4.2 -ansi-alias

As you should be aware, default settings in icc are more closely 
equivalent to
gcc -O3 -ffast-math -fno-cx-limited-range -funroll-loops --param 
max-unroll-times=2 -fno-strict-aliasing


The options I suggest as an upper limit are probably more aggressive 
than most people have used successfully with OpenMPI.


As to run-time MPI options, Intel MPI has affinity with Westmere 
awareness turned on by default.  I suppose testing without affinity 
settings, particularly when banging against all hyperthreads, is a more 
severe test of your application.   Don't you get better results at 1 
rank per core?
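
A sketch of adding explicit binding to the quoted launch line; the binding 
options are as spelled in the Open MPI 1.4 series, so check them against 
your ompi_info:

mpirun -np 24 --machinefile $OAR_NODE_FILE -mca plm_rsh_agent "oarsh" \
       --mca btl self,openib -x LD_LIBRARY_PATH \
       --bind-to-core --report-bindings ./prog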

--
Tim Prince


Re: [OMPI users] USE mpi

2011-05-07 Thread Tim Prince

On 5/7/2011 2:35 PM, Dmitry N. Mikushin wrote:

didn't find the icc compiler


Jeff, on 1.4.3 I saw the same issue, even more generally: "make
install" cannot find the compiler, if it is an alien compiler (i.e.
not the default gcc) - same situation for intel or llvm, for example.
The workaround is to specify full paths to compilers with CC=...
FC=... in ./configure params. Could it be "make install" breaks some
env paths?



Most likely reason for not finding an installed icc is that the icc 
environment (source the compilervars script if you have a current 
version) wasn't set prior to running configure.  Setting up the compiler 
in question in accordance with its own instructions is a more likely 
solution than the absolute path choice.
OpenMPI configure, for good reason, doesn't search your system to see 
where a compiler might be installed.  What if you had 2 versions of the 
same named compiler?

--
Tim Prince


Re: [OMPI users] Mixing the FORTRAN and C APIs.

2011-05-06 Thread Tim Prince

On 5/6/2011 10:22 AM, Tim Hutt wrote:

On 6 May 2011 16:45, Tim Hutt<tdh...@gmail.com>  wrote:

On 6 May 2011 16:27, Tim Prince<tcpri...@live.com>  wrote:

If you want to use the MPI Fortran library, don't convert your Fortran to C.
  It's difficult to understand why you would consider f2c a "simplest way,"
but at least it should allow you to use ordinary C MPI function calls.


Sorry, maybe I wasn't clear. Just to clarify, all of *my* code is
written in C++ (because I don't actually know Fortran), but I want to
use some function from PARPACK which is written in Fortran.


Hmm I converted my C++ code to use the C OpenMPI interface instead,
and now I get link errors (undefined references). I remembered I've
been linking with -lmpi -lmpi_f77, so maybe I need to also link with
-lmpi_cxx or -lmpi++  ... what exactly do each of these libraries
contain?

Also I have run into the problem that the communicators are of type
"MPI_Comm" in C, and "integer" in Fortran... I am using MPI_COMM_WORLD
in each case so I assume that will end up referring to the same
thing... but maybe you really can't mix Fortran and C. Expert opinion
would be very very welcome!

If you use your OpenMPI mpicc wrapper to compile and link, the MPI 
libraries should be taken care of.
Style usage in an f2c translation is debatable, but you have an #include 
"f2c.h" or "g2c.h" which translates the Fortran data types to legacy C 
equivalent.  By legacy I mean that in the f2c era, the inclusion of C 
data types in Fortran via USE iso_c_binding had not been envisioned.
One would think that you would use the MPI header data types on both the 
Fortran and the C side, even though you are using legacy interfaces.
Slip-ups in MPI data types often lead to run-time errors.  If you have 
an error-checking MPI library such as the Intel MPI one, you get a 
little better explanation at the failure point.

--
Tim Prince


Re: [OMPI users] Mixing the FORTRAN and C APIs.

2011-05-06 Thread Tim Prince

On 5/6/2011 7:58 AM, Tim Hutt wrote:

Hi,

I'm trying to use PARPACK in a C++ app I have written. This is a
Fortran MPI routine used to calculate SVDs. The simplest way I found
to do this is to use f2c to convert it to C, and then call the
resulting functions from my C++ code.

However PARPACK requires that I write some user-defined operations to
be parallel using MPI. So far I have just been calling the FORTRAN
versions of the MPI functions from C, because I wasn't sure whether
you can mix the APIs. I.e. I've been doing this:

-8<-
extern "C"
{
int mpi_init__(integer *);
int mpi_comm_rank__(integer *, integer *, integer *);
int mpi_comm_size__(integer *, integer *, integer *);
int mpi_finalize__(integer *);
int mpi_allgatherv__(doublereal *, integer *, integer *, doublereal
*, integer *, integer *, integer *, integer *);

// OpenMPI version.
const integer MPI_DOUBLE_PRECISION = 17;
}

bool MPI__Init()
{
integer ierr = 0;
mpi_init__(&ierr);   // the Fortran wrapper takes the error code by reference
return ierr == 0;
}
8<

It works so far, but is getting quite tedious and seems like the wrong
way to do it. Also I don't know if it's related but when I use
allgatherv it gives me a segfault:

[panic:20659] *** Process received signal ***
[panic:20659] Signal: Segmentation fault (11)
[panic:20659] Signal code: Address not mapped (1)
[panic:20659] Failing at address: 0x7f4effe8
[panic:20659] [ 0] /lib/libc.so.6(+0x33af0) [0x7f4f8fd62af0]
[panic:20659] [ 1] /usr/lib/libstdc++.so.6(_ZNSolsEi+0x3) [0x7f4f905ec0c3]
[panic:20659] [ 2] ./TDLSM() [0x510322]
[panic:20659] [ 3] ./TDLSM() [0x50ec8d]
[panic:20659] [ 4] ./TDLSM() [0x404ee7]
[panic:20659] [ 5] /lib/libc.so.6(__libc_start_main+0xfd) [0x7f4f8fd4dc4d]
[panic:20659] [ 6] ./TDLSM() [0x404c19]
[panic:20659] *** End of error message ***

So my question is: Can I intermix the C and FORTRAN APIs within one
program? Oh and also I think the cluster I will eventually run this on
(cx1.hpc.ic.ac.uk, if anyone is from Imperial) doesn't use OpenMP, so
what about other MPI implementations?

If you want to use the MPI Fortran library, don't convert your Fortran 
to C.  It's difficult to understand why you would consider f2c a 
"simplest way," but at least it should allow you to use ordinary C MPI 
function calls.
The MPI Fortran library must be built against the same Fortran run-time 
libraries which you use for your own Fortran code.  The header files for 
the Fortran MPI calls probably don't work in C.  It would be a big 
struggle to get them to work with f2c, since f2c doesn't have much 
ability to deal with headers other than its own.
There's no reason you can't make both C and Fortran MPI calls in the 
same application.  If you mean mixing a send from one language with a 
receive in another, I think most would avoid that.
Whether someone uses OpenMP has little to do with choice of MPI 
implementation.  Some of us still may be cursing the choice of OpenMPI 
for the name of an MPI implementation.

--
Tim Prince


Re: [OMPI users] Problem compiling OpenMPI on Ubuntu 11.04

2011-04-19 Thread Tim Prince

 On 04/19/2011 01:24 PM, Sergiy Bubin wrote:

/usr/include/c++/4.5/iomanip(64): error: expected an expression
 { return { __mask }; }
  ^
/usr/include/c++/4.5/iomanip(94): error: expected an expression
 { return { __mask }; }
  ^
/usr/include/c++/4.5/iomanip(125): error: expected an expression
 { return { __base }; }
  ^
/usr/include/c++/4.5/iomanip(193): error: expected an expression
 { return { __n }; }
  ^
/usr/include/c++/4.5/iomanip(223): error: expected an expression
 { return { __n }; }
  ^
/usr/include/c++/4.5/iomanip(163): error: expected an expression
   { return { __c }; }
^

If you're using icpc, this seeming incompatibility between icpc and g++ 
4.5 has been discussed on the icpc forum
http://software.intel.com/en-us/forums/showthread.php?t=78677=%28iomanip%29
where you should see that you must take care to set the option -std=c++0x 
when using the current <iomanip> header under icpc, as it is treated as a c++0x 
feature.  You might try adding the option to the CXXFLAGS or whatever 
they are called in the openmpi build (or to the icpc.cfg in your icpc 
installation).
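
A sketch of where that option could go in a rebuild (the prefix is hypothetical):

./configure --prefix=/opt/openmpi-1.6.1-intel CC=icc CXX=icpc F77=ifort FC=ifort \
            CXXFLAGS="-std=c++0x"
# or add -std=c++0x to the icpc.cfg shipped with the compiler to make it the default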


--
Tim Prince



Re: [OMPI users] Shared Memory Performance Problem.

2011-03-30 Thread Tim Prince

On 3/30/2011 10:08 AM, Eugene Loh wrote:

Michele Marena wrote:

I've launched my app with mpiP both when two processes are on
different node and when two processes are on the same node.

The process 0 is the manager (gathers the results only), processes 1
and 2 are workers (compute).

This is the case where processes 1 and 2 are on different nodes (runs in 162s).
@--- MPI Time (seconds)
---
Task AppTime MPITime MPI%
0 162 162 99.99
1 162 30.2 18.66
2 162 14.7 9.04
* 486 207 42.56

The case when processes 1 and 2 are on the same node (runs in 260s).
@--- MPI Time (seconds)
---
Task AppTime MPITime MPI%
0 260 260 99.99
1 260 39.7 15.29
2 260 26.4 10.17
* 779 326 41.82

I think there's a contention problem on the memory bus.

Right. Process 0 spends all its time in MPI, presumably waiting on
workers. The workers spend about the same amount of time on MPI
regardless of whether they're placed together or not. The big difference
is that the workers are much slower in non-MPI tasks when they're
located on the same node. The issue has little to do with MPI. The
workers are hogging local resources and work faster when placed on
different nodes.

However, the message size is 4096 * sizeof(double). Maybe I are wrong
in this point. Is the message size too huge for shared memory?

No. That's not very large at all.


Not even large enough to expect the non-temporal storage issue about 
cache eviction to arise.



--
Tim Prince


Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Tim Prince

On 3/28/2011 3:29 AM, Michele Marena wrote:

Each node have two processors (no dual-core).

which seems to imply that the 2 processors share memory space and a 
single memory bus, and the question is not about what I originally guessed.


--
Tim Prince


Re: [OMPI users] Shared Memory Performance Problem.

2011-03-27 Thread Tim Prince

On 3/27/2011 2:26 AM, Michele Marena wrote:

Hi,
My application performs well without shared memory utilization, but with
shared memory I get worse performance than without it.
Am I making a mistake? Am I overlooking something?
I know OpenMPI uses the /tmp directory to allocate shared memory and it is
on the local filesystem.



I guess you mean shared memory message passing.   Among relevant 
parameters may be the message size where your implementation switches 
from cached copy to non-temporal (if you are on a platform where that 
terminology is used).  If built with Intel compilers, for example, the 
copy may be performed by intel_fast_memcpy, with a default setting which 
uses non-temporal when the message exceeds about some preset size, e.g. 
50% of smallest L2 cache for that architecture.
A quick search for past posts seems to indicate that OpenMPI doesn't 
itself invoke non-temporal, but there appear to be several useful 
articles not connected with OpenMPI.
In case guesses aren't sufficient, it's often necessary to profile 
(gprof, oprofile, VTune, ...) to pin this down.
If shared-memory message passing slows your application down, the question is whether 
this is due to excessive eviction of data from cache; not a simple 
question, as most recent CPUs have 3 levels of cache, and your 
application may require more or less of the data which was in use prior to the 
message receipt, and may immediately use only a small piece of a large 
message.


--
Tim Prince


Re: [OMPI users] intel compiler linking issue and issue of environment variable on remote node, with open mpi 1.4.3

2011-03-21 Thread Tim Prince

On 3/21/2011 5:21 AM, ya...@adina.com wrote:


I am trying to compile our codes with open mpi 1.4.3, by intel
compilers 8.1.

(1) For open mpi 1.4.3 installation on linux beowulf cluster, I use:

./configure --prefix=/home/yiguang/dmp-setup/openmpi-1.4.3
CC=icc
CXX=icpc F77=ifort FC=ifort --enable-static LDFLAGS="-i-static -
static-libcxa" --with-wrapper-ldflags="-i-static -static-libcxa" 2>&1 |
tee config.log

and

make all install 2>&1 | tee install.log

The issue is that I am trying to build open mpi 1.4.3 with intel
compiler libraries statically linked to it, so that when we run
mpirun/orterun, it does not need to dynamically load any intel
libraries. But what I got is mpirun always asks for some intel
library(e.g. libsvml.so) if I do not put intel library path on library
search path($LD_LIBRARY_PATH). I checked the open mpi user
archive, it seems only some kind user mentioned to use
"-i-static"(in my case) or "-static-intel" in ldflags, this is what I did,
but it seems not working, and I did not get any confirmation whether
or not this works for anyone else from the user archive. could
anyone help me on this? thanks!



If you are to use such an ancient compiler (apparently a 32-bit one), 
you must read the docs which come with it, rather than relying on 
comments about a more recent version.  libsvml isn't included 
automatically at link time by that 32-bit compiler, unless you specify 
an SSE option, such as -xW.
It's likely that no one has verified OpenMPI with a compiler of that 
vintage.  We never used the 32-bit compiler for MPI, and we encountered 
run-time library bugs for the ifort x86_64 which weren't fixed until 
later versions.
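
One way to verify whether the goal has been reached, using the poster's 
install prefix (a sketch; adjust the library name if only static libraries 
were built):

ldd /home/yiguang/dmp-setup/openmpi-1.4.3/bin/mpirun | grep -iE 'intel|svml|ifcore'
ldd /home/yiguang/dmp-setup/openmpi-1.4.3/lib/libmpi.so | grep -iE 'intel|svml|ifcore'
# no output means no Intel shared libraries are needed at run time;
# a libsvml.so line means the Intel libraries were not statically linked in after all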



--
Tim Prince


Re: [OMPI users] Open MPI access the same file in parallel ?

2011-03-10 Thread Tim Prince

On 3/9/2011 11:05 PM, Jack Bryan wrote:

thanks

I am using GNU mpic++ compiler.

Can it automatically support accessing a file from many parallel
processes?



It should follow the glibc manual, e.g.
http://www.gnu.org/s/libc/manual/html_node/Opening-Streams.html
I think you want *opentype to evaluate to 'r' (readonly).
--
Tim Prince


Re: [OMPI users] What's wrong with this code?

2011-02-23 Thread Tim Prince

On 2/23/2011 8:27 AM, Prentice Bisbal wrote:

Jeff Squyres wrote:

On Feb 23, 2011, at 9:48 AM, Tim Prince wrote:


I agree with your logic, but the problem is where the code containing
the error is coming from - it's coming from a header file that's
part of Open MPI, which makes me think this is a compiler error, since
I'm sure there are plenty of people using the same header file in their
code.


Are you certain that they all find it necessary to re-define identifiers from 
that header file, rather than picking parameter names which don't conflict?


Without seeing the code, it sounds like Tim might be right: someone is trying 
to re-define the MPI_STATUS_SIZE parameter that is being defined by OMPI's 
mpif-config.h header file.  Regardless of include file/initialization ordering 
(i.e., regardless of whether mpif-config.h is the first or Nth entity to try to 
set this parameter), user code should never set this parameter value.

Or any symbol that begins with MPI_, for that matter.  The entire "MPI_" 
namespace is reserved for MPI.



I understand that, and I checked the code to make sure the programmer
didn't do anything stupid like that.

The entire code is only a few hundred lines in two different files. In
the entire program, there is only 1 include statement:

include 'mpif.h'

and MPI_STATUS_SIZE appears only once:

integer ierr,istatus(MPI_STATUS_SIZE)

I have limited knowledge of Fortran programming, but based on this, I
don't see how MPI_STATUS_SIZE could be getting overwritten.


Earlier, you showed a preceding PARAMETER declaration setting a new 
value for that name, which would be required to make use of it in this 
context.  Apparently, you intend to support only compilers which violate 
the Fortran standard by supporting a separate name space for PARAMETER 
identifiers, so that you can violate the MPI standard by using MPI_ 
identifiers in a manner which I believe is called shadowing in C.


--
Tim Prince


Re: [OMPI users] What's wrong with this code?

2011-02-23 Thread Tim Prince

On 2/23/2011 6:41 AM, Prentice Bisbal wrote:



Tim Prince wrote:

On 2/22/2011 1:41 PM, Prentice Bisbal wrote:

One of the researchers I support is writing some Fortran code that uses
Open MPI. The code is being compiled with the Intel Fortran compiler.
This one line of code:

integer ierr,istatus(MPI_STATUS_SIZE)

leads to these errors:

$ mpif90 -o simplex simplexmain579m.for simplexsubs579
/usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-config.h(88):
error #6406: Conflicting attributes or multiple declaration of name.
[MPI_STATUS_SIZE]
parameter (MPI_STATUS_SIZE=5)
-^
simplexmain579m.for(147): error #6591: An automatic object is invalid in
a main program.   [ISTATUS]
  integer ierr,istatus(MPI_STATUS_SIZE)
-^
simplexmain579m.for(147): error #6219: A specification expression object
must be a dummy argument, a COMMON block object, or an object accessible
through host or use association   [MPI_STATUS_SIZE]
  integer ierr,istatus(MPI_STATUS_SIZE)
-^
/usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-common.h(211):
error #6756: A COMMON block data object must not be an automatic object.
[MPI_STATUS_IGNORE]
integer MPI_STATUS_IGNORE(MPI_STATUS_SIZE)
--^
/usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-common.h(211):
error #6591: An automatic object is invalid in a main program.
[MPI_STATUS_IGNORE]
integer MPI_STATUS_IGNORE(MPI_STATUS_SIZE)


Any idea how to fix this? Is this a bug in the Intel compiler, or the
code?



I can't see the code from here.  The first failure to recognize the
PARAMETER definition apparently gives rise to the others.  According to
the message, you already used the name MPI_STATUS_SIZE in mpif-config.h
and now you are trying to give it another usage (not case sensitive) in
the same scope.  If so, it seems good that the compiler catches it.


I agree with your logic, but the problem is where the code containing
the error is coming from - it's coming from a header file that's
part of Open MPI, which makes me think this is a compiler error, since
I'm sure there are plenty of people using the same header file in their
code.


Are you certain that they all find it necessary to re-define identifiers 
from that header file, rather than picking parameter names which don't 
conflict?


--
Tim Prince


Re: [OMPI users] What's wrong with this code?

2011-02-22 Thread Tim Prince

On 2/22/2011 1:41 PM, Prentice Bisbal wrote:

One of the researchers I support is writing some Fortran code that uses
Open MPI. The code is being compiled with the Intel Fortran compiler.
This one line of code:

integer ierr,istatus(MPI_STATUS_SIZE)

leads to these errors:

$ mpif90 -o simplex simplexmain579m.for simplexsubs579
/usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-config.h(88):
error #6406: Conflicting attributes or multiple declaration of name.
[MPI_STATUS_SIZE]
   parameter (MPI_STATUS_SIZE=5)
-^
simplexmain579m.for(147): error #6591: An automatic object is invalid in
a main program.   [ISTATUS]
 integer ierr,istatus(MPI_STATUS_SIZE)
-^
simplexmain579m.for(147): error #6219: A specification expression object
must be a dummy argument, a COMMON block object, or an object accessible
through host or use association   [MPI_STATUS_SIZE]
 integer ierr,istatus(MPI_STATUS_SIZE)
-^
/usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-common.h(211):
error #6756: A COMMON block data object must not be an automatic object.
   [MPI_STATUS_IGNORE]
   integer MPI_STATUS_IGNORE(MPI_STATUS_SIZE)
--^
/usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-common.h(211):
error #6591: An automatic object is invalid in a main program.
[MPI_STATUS_IGNORE]
   integer MPI_STATUS_IGNORE(MPI_STATUS_SIZE)


Any idea how to fix this? Is this a bug in the Intel compiler, or the code?



I can't see the code from here.  The first failure to recognize the 
PARAMETER definition apparently gives rise to the others.  According to 
the message, you already used the name MPI_STATUS_SIZE in mpif-config.h 
and now you are trying to give it another usage (not case sensitive) in 
the same scope.  If so, it seems good that the compiler catches it.

--
Tim Prince


Re: [OMPI users] Running OpenMPI on SGI Altix with 4096 cores : very poor performance

2011-01-07 Thread Tim Prince

On 1/7/2011 6:49 AM, Jeff Squyres wrote:


My understanding is that hyperthreading can only be activated/deactivated at 
boot time -- once the core resources are allocated to hyperthreads, they can't 
be changed while running.

Whether disabling the hyperthreads or simply telling Linux not to schedule on 
them makes a difference performance-wise remains to be seen.  I've never had 
the time to do a little benchmarking to quantify the difference.  If someone 
could rustle up a few cycles (get it?) to test out what the real-world 
performance difference is between disabling hyperthreading in the BIOS vs. 
telling Linux to ignore the hyperthreads, that would be awesome.  I'd love to 
see such results.

My personal guess is that the difference is in the noise.  But that's a guess.

Applications which depend on availability of full size instruction 
lookaside buffer would be candidates for better performance with 
hyperthreads completely disabled.  Many HPC applications don't stress 
ITLB, but some do.
Most of the important resources are allocated dynamically between 
threads, but the ITLB is an exception.
We reported results of an investigation on Intel Nehalem 4-core 
hyperthreading where geometric mean performance of standard benchmarks 
for certain commercial applications was 2% better with hyperthreading 
disabled at boot time, compared with best 1 rank per core scheduling 
with hyperthreading enabled.  Needless to say, the report wasn't popular 
with marketing.  I haven't seen an equivalent investigation for the 
6-core CPUs, where various strange performance effects have been noted, 
so, as Jeff said, the hyperthreading effect could be "in the noise."



--
Tim Prince



Re: [OMPI users] Call to MPI_Test has large time-jitter

2010-12-18 Thread Tim Prince

On 12/17/2010 6:43 PM, Sashi Balasingam wrote:

Hi,
I recently started on an MPI-based, 'real-time', pipelined-processing 
application, and the application fails due to large time jitter in 
sending and receiving messages. Here is the related info -

1) Platform:
a) Intel Box: Two Hex-core, Intel Xeon, 2.668 GHz (...total of 12 cores),
b) OS: SUSE Linux Enterprise Server 11 (x86_64) - Kernel \r (\l)
c) MPI Rev: (OpenRTE) 1.4, (...Installed OFED package)
d) HCA: InfiniBand: Mellanox Technologies MT26428 [ConnectX IB QDR, 
PCIe 2.0 5GT/s] (rev a0)

2) Application detail
a) Launching 7 processes, for pipelined processing, where each process 
waits for a message (sizes vary between 1 KBytes to 26 KBytes),
then process the data, and outputs a message (sizes vary between 1 
KBytes to 26 KBytes), to next process.

b) MPI transport functions used : "MPI_Isend", MPI_Irecv, MPI_Test.
   i) For Receiving messages, I first make an MPI_Irecv call, followed 
by a busy-loop on MPI_Test, waiting for message
   ii) For Sending message, there is a busy-loop on MPI_Test to ensure 
prior buffer was sent, then use MPI_Isend.
c) When the job starts, all these 7 process are put in High priority 
mode ( SCHED_FIFO policy, with priority setting of 99).
The Job entails an input data packet stream (and a series of MPI 
messages), continually at 40 micro-sec rate, for a few minutes.


3) The Problem:
Most calls to MPI_Test (...which is non-blocking) take a few 
microseconds, but for around 10% of the job there is large jitter, varying 
from 1 to 100-odd milliseconds. This causes some of the application 
input queues to fill up and causes a failure.
Any suggestions to look at on the MPI settings or OS config/issues 
will be much appreciated.


I didn't see anything there about your -mca affinity settings.  Even if 
the defaults don't choose an optimum mapping, binding is far better than 
allowing the processes to float, as they would with multiple independent 
jobs running.
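
For reference, a minimal sketch (in C, with illustrative buffer size, tag 
and peer rank; not the poster's actual code) of the Irecv/Test polling 
pattern described above.  Pinning the 7 ranks, e.g. with the 
mpi_paffinity_alone MCA parameter or an external taskset, keeps each 
polling loop on its own core.

  /* Sketch only: post a nonblocking receive, then poll MPI_Test until it
     completes.  Message size, tag and source rank are assumptions. */
  #include <mpi.h>

  #define MAX_MSG 26624                     /* up to ~26 KB per message */

  void recv_one(void *buf, int src, int tag, MPI_Comm comm)
  {
      MPI_Request req;
      int done = 0;
      MPI_Irecv(buf, MAX_MSG, MPI_BYTE, src, tag, comm, &req);
      while (!done)
          MPI_Test(&req, &done, MPI_STATUS_IGNORE);   /* busy-wait */
  }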


--
Tim Prince



Re: [OMPI users] Mac Ifort and gfortran together

2010-12-15 Thread Tim Prince

On 12/15/2010 8:22 PM, Jeff Squyres wrote:

Sorry for the ginormous delay in replying here; I blame SC'10, Thanksgiving, 
and the MPI Forum meeting last week...


On Nov 29, 2010, at 2:12 PM, David Robertson wrote:


I'm noticing a strange problem with Open MPI 1.4.2 on Mac OS X 10.6. We use 
both Intel Ifort 11.1 and gfortran 4.3 on the same machine and switch between 
them to test and debug code.

I had runtime problems when I compiled openmpi in my usual way of no shared 
libraries so I switched to shared and it runs now.

What problems did you have?  OMPI should work fine when compiled statically.


However, in order for it to work with ifort I ended up needing to add the 
location of my Intel-compiled Open MPI libraries (/opt/intelsoft/openmpi/lib) 
to my DYLD_LIBRARY_PATH environment variable to get codes to compile and/or 
run with ifort.

Is this what Intel recommends for anything compiled with ifort on OS X, or is 
this unique to OMPI-compiled MPI applications?


The problem is that adding /opt/intelsoft/openmpi/lib to DYLD_LIBRARY_PATH 
broke my Open MPI for gfortran. Now when I try to compile with mpif90 for 
gfortran it thinks it's actually trying to compile with ifort still. As soon as 
I take the above path out of DYLD_LIBRARY_PATH everything works fine.

Also, when I run ompi_info everything looks right except prefix. It says 
/opt/intelsoft/openmpi rather than /opt/gfortransoft/openmpi like it should. It 
should be noted that having /opt/intelsoft/openmpi in LD_LIBRARY_PATH does not 
produce the same effect.

I'm not quite clear on your setup, but it *sounds* like you're somehow mixing 
up 2 different installations of OMPI -- one in /opt/intelsoft and the other in 
/opt/gfortransoft.

Can you verify that you're using the "right" mpif77 (and friends) when you 
intend to, and so on?

Well, yes, he has to use the MPI Fortran libraries compiled by ifort 
with his ifort application build, and the ones compiled by gfortran with 
a gfortran application build.  There's nothing "strange" about it; the 
PATH for mpif90 and DYLD_LIBRARY_PATH for the Fortran library have to be 
set correctly for each case.  If linking statically with the MPI Fortran 
library, you still must choose the one built with the compatible 
Fortran.  gfortran and ifort can share C run-time libraries but not the 
Fortran ones.  It's the same as on linux (and, likely, Windows).


--
Tim Prince



Re: [OMPI users] meaning of MPI_THREAD_*

2010-12-06 Thread Tim Prince

On 12/6/2010 3:16 AM, Hicham Mouline wrote:

Hello,

1. MPI_THREAD_SINGLE: Only one thread will execute.
Does this really mean the process cannot have any other threads at all, even if 
they don't deal with MPI at all?
I'm curious as to how this case affects the openmpi implementation?
Essentially, what is the difference between MPI_THREAD_SINGLE and 
MPI_THREAD_FUNNELED?

2. In my case, I'm interested in MPI_THREAD_SERIALIZED. However if it's 
available, I can use MPI_THREAD_FUNNELED.
What cmake flags do I need to enable to allow this mode?

3. Assume I assign only 1 thread in my program to deal with MPI. What is the 
difference between
int MPI::Init_thread(MPI_THREAD_SINGLE)
int MPI::Init_thread(MPI_THREAD_FUNNELED)
int MPI::Init()

Your question is too broad; perhaps you didn't intend it that way.  
Are you trying to do something which may work only with a specific 
version of openmpi, or are you willing to adhere to portable practice?

I tend to believe what it says at
http://www.mcs.anl.gov/research/projects/mpi/mpi-standard/mpi-report-2.0/node165.htm 


including:
A call to MPI_INIT has the same effect as a call to MPI_INIT_THREAD with 
a required = MPI_THREAD_SINGLE


You would likely use one of those if all your MPI calls are from a 
single thread, and you don't perform any threading inside MPI.  MPI 
implementations vary on the extent to which a higher level of threading 
than what was declared can be used successfully (exceeding what was set 
by MPI_INIT isn't guaranteed to produce bad results, but it isn't 
guaranteed to work either).   There shouldn't be any bad effect from 
setting a higher level of thread support which you never use.
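
As a concrete illustration of the difference (a minimal C sketch, 
independent of any particular MPI implementation): request the level you 
intend to use and check what the library actually granted.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int provided;
      /* Ask for SERIALIZED: several threads may make MPI calls, but never
         concurrently.  FUNNELED or SINGLE would be requested the same way;
         MPI_Init() is equivalent to requesting MPI_THREAD_SINGLE. */
      MPI_Init_thread(&argc, &argv, MPI_THREAD_SERIALIZED, &provided);
      if (provided < MPI_THREAD_SERIALIZED)
          fprintf(stderr, "only thread level %d granted\n", provided);
      /* ... application ... */
      MPI_Finalize();
      return 0;
  }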


I would think your question about cmake flags would apply only once you 
chose a compiler.  I have never seen anyone try mixing 
auto-parallelization with MPI; that would require MPI_THREAD_MULTIPLE 
but still appears unpredictable.  MPI_THREAD_FUNNELED is used often with 
OpenMP parallelization inside MPI.


--
Tim Prince



Re: [OMPI users] Help!!!!!!!!!!!!Openmpi instal for ubuntu 64 bits

2010-11-30 Thread Tim Prince

On 11/29/2010 3:03 PM, Gus Correa wrote:

Jeff Squyres wrote:






1- ./configure FC=ifort F77=ifort CC=icc CXX=icpc

2-make all

3 sudo make install all

Steps 1 and 2 run normally, but when I use the make install command, 
an error appears that I can't solve.


You say only step 3 above fails.
You could try "sudo -E make install".

I take it that sudo -E should copy over the environment variable 
settings.  I haven't been able to find any documentation of this option, 
and I don't currently have an Ubuntu installation to check it.

Not being aware of such an option, I used to do:
sudo
source .. compilervars.sh
make install

--
Tim Prince



Re: [OMPI users] Help!!!!!!!!!!!!Openmpi instal for ubuntu 64 bits

2010-11-29 Thread Tim Prince

On 11/29/2010 11:31 AM, Gus Correa wrote:

Hi Mauricio

Check if you have icc (in the Intel compiler bin 
directory/subdirectories).


Check also if it is in your PATH environment variable.
"which icc" will tell.
If not, add it to PATH.

Actually, the right way to do it
is to run the Intel scripts to set the whole compiler environment,
not only PATH.
The scripts should be called something like iccvars.csh  iccvars.sh 
for C/C++ and  ifortvars.csh  ifortvars.sh for Fortran, and are also 
in the Intel bin directory.


You can source these scripts in your .cshrc/.bashrc file,
using the correct shell (.sh if you use [ba]sh, .csh if you use [t]csh).
This is in the Intel compiler documentation, take a look.
For the icc version mentioned, there is a compilervars.[c]sh which takes 
care of both C++ and Fortran (if present), as do either of the iccvars 
or ifortvars, when the compilers are installed in the same directory.


Also, you can compile OpenMPI with gcc,g++ and gfortran, if you want.
If they are not yet installed in your Ubuntu, you can get them with 
apt-get, or whatever Ubuntu uses to get packages.


icc ought to work interchangeably with gcc, provided the same g++ 
version is always on PATH. icc doesn't work without the g++.  Thus, it 
is entirely reasonable to build openmpi with gcc and use either gcc or 
icc to build the application.  gfortran and ifort, however, involve 
incompatible run-time libraries, and the openmpi fortran libraries won't 
be interchangeable.
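
For example (a sketch only, not verified on that system), a configure 
line of the same form as the one in the quoted steps above would express 
that split, with the C/C++ side built by gcc/g++ and only the Fortran 
bindings built by ifort:

  ./configure CC=gcc CXX=g++ F77=ifort FC=ifort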


You must take care not to mix 32- and 64-bit compilers/libraries.  
Normally you would build everything 64-bit, both openmpi and the 
application.  Ubuntu doesn't follow the standard scheme for location of 
32-bit vs. 64-bit compilers and libraries, but the Intel compiler 
version you mentioned should resolve this automatically.


--
Tim Prince



Re: [OMPI users] link problem on 64bit platform

2010-11-01 Thread Tim Prince

On 11/1/2010 5:24 AM, Jeff Squyres wrote:

On Nov 1, 2010, at 5:20 AM, jody wrote:


jody@aim-squid_0 ~/progs $ mpiCC -g -o HelloMPI HelloMPI.cpp
/usr/lib/gcc/x86_64-pc-linux-gnu/4.4.4/../../../../x86_64-pc-linux-gnu/bin/ld:
skipping incompatible /opt/openmpi-1.4.2/lib/libmpi_cxx.so when
searching for -lmpi_cxx

This is the key message -- it found libmpi_cxx.so, but the linker deemed it 
incompatible, so it skipped it.
Typically, it means that the cited library is a 32-bit one, to which the 
64-bit ld will react in this way.  You could have verified this by

file /opt/openmpi-1.4.2/lib/*
By normal Linux conventions a directory named /lib/ as opposed to 
/lib64/ would contain only 32-bit libraries.  If Gentoo doesn't conform 
with those conventions, maybe you should do your learning on a distro 
which does.


--
Tim Prince



Re: [OMPI users] hdf5 build error using openmpi and Intel Fortran

2010-10-06 Thread Tim Prince

 On 10/6/2010 12:09 AM, Götz Waschk wrote:

libtool: link: mpif90 -shared  .libs/H5f90global.o
.libs/H5fortran_types.o .libs/H5_ff.o .libs/H5Aff.o .libs/H5Dff.o
.libs/H5Eff.o .libs/H5Fff.o .libs/H5Gff.o .libs/H5Iff.o .libs/H5Lff.o
.libs/H5Off.o .libs/H5Pff.o .libs/H5Rff.o .libs/H5Sff.o .libs/H5Tff.o
.libs/H5Zff.o .libs/H5_DBLE_InterfaceInclude.o .libs/H5f90kit.o
.libs/H5_f.o .libs/H5Af.o .libs/H5Df.o .libs/H5Ef.o .libs/H5Ff.o
.libs/H5Gf.o .libs/H5If.o .libs/H5Lf.o .libs/H5Of.o .libs/H5Pf.o
.libs/H5Rf.o .libs/H5Sf.o .libs/H5Tf.o .libs/H5Zf.o .libs/H5FDmpiof.o
.libs/HDF5mpio.o .libs/H5FDmpioff.o -lmpi -lsz -lz -lm  -m64
-mtune=generic -rpath=/usr/lib64/openmpi/1.4-icc/lib   -soname
libhdf5_fortran.so.6 -o .libs/libhdf5_fortran.so.6.0.4
ifort: command line warning #10156: ignoring option '-r'; no argument required
ifort: command line warning #10156: ignoring option '-s'; no argument required
ld: libhdf5_fortran.so.6: No such file: No such file or directory

Do -Wl,-rpath and -Wl,-soname= work any better?
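
If I read the log right, those libtool options are being handed straight to 
ifort, which misparses -rpath as -r and -soname as -s (hence the two #10156 
warnings), leaving the soname argument to be treated as an input file.  
Wrapping them for the linker, something like 
-Wl,-rpath,/usr/lib64/openmpi/1.4-icc/lib -Wl,-soname,libhdf5_fortran.so.6 
(untested here), should hand them to ld unchanged.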

--
Tim Prince



Re: [OMPI users] Memory affinity

2010-09-27 Thread Tim Prince

 On 9/27/2010 2:50 PM, David Singleton wrote:

On 09/28/2010 06:52 AM, Tim Prince wrote:

On 9/27/2010 12:21 PM, Gabriele Fatigati wrote:

HI Tim,

I have read that link, but I haven't understood whether enabling processor
affinity also enables memory affinity, because it is written that:

"Note that memory affinity support is enabled only when processor
affinity is enabled"

Can I set processor affinity without memory affinity? This is my
question.


2010/9/27 Tim Prince<n...@aol.com>

On 9/27/2010 9:01 AM, Gabriele Fatigati wrote:

If OpenMPI is NUMA-compiled, is memory affinity enabled by default?
I ask because I didn't find a standalone memory affinity (or similar)
parameter to set to 1.



The FAQ http://www.open-mpi.org/faq/?category=tuning#using-paffinity
has a useful introduction to affinity. It's available in a default
build, but not enabled by default.


Memory affinity is implied by processor affinity. Your system libraries
are set up so as to cause any memory allocated to be made local to the
processor, if possible. That's one of the primary benefits of processor
affinity. Not being an expert in openmpi, I assume, in the absence of
further easily accessible documentation, there's no useful explicit way
to disable maffinity while using paffinity on platforms other than the
specified legacy platforms.



Memory allocation policy really needs to be independent of processor
binding policy.  The default memory policy (memory affinity) of "attempt
to allocate to the NUMA node of the cpu that made the allocation request
but fallback as needed" is flawed in a number of situations.  This is true
even when MPI jobs are given dedicated access to processors.  A common one
is where the local NUMA node is full of pagecache pages (from the checkpoint
of the last job to complete).  For those sites that support suspend/resume
based scheduling, NUMA nodes will generally contain pages from suspended
jobs.  Ideally, the new (suspending) job should suffer a little bit of paging
overhead (pushing out the suspended job) to get ideal memory placement for
the next 6 or whatever hours of execution.

An mbind (MPOL_BIND) policy of binding to the one local NUMA node will not
work in the case of one process requiring more memory than that local NUMA
node.  One scenario is a master-slave where you might want:
  master (rank 0) bound to processor 0 but not memory bound
  slave (rank i) bound to processor i and memory bound to the local memory
  of processor i.

They really are independent requirements.

Cheers,
David

___
interesting; I agree with those of your points on which I have enough 
experience to have an opinion.
However, the original question was not whether it would be desirable to 
have independent memory affinity, but whether it is possible currently 
within openmpi to avoid memory placements being influenced by processor 
affinity.
I have seen the case you mention, where performance of a long job 
suffers because the state of memory from a previous job results in an 
abnormal number of allocations falling over to other NUMA nodes, but I 
don't know the practical solution.
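
For anyone who wants to experiment with this outside of MPI, a minimal 
libnuma sketch (assumes a system with libnuma installed and linking with 
-lnuma; the node numbers are illustrative) showing that CPU placement and 
memory placement can be requested independently at the OS level:

  /* Illustration only: run on NUMA node 0 but bind one allocation to
     node 1, i.e. CPU and memory policy set independently. */
  #include <numa.h>
  #include <stdio.h>

  int main(void)
  {
      if (numa_available() < 0) {
          fprintf(stderr, "no NUMA support on this system\n");
          return 1;
      }
      numa_run_on_node(0);                          /* CPU affinity: node 0 */
      double *buf = numa_alloc_onnode(1 << 20, 1);  /* memory: node 1 */
      if (buf == NULL)
          return 1;
      buf[0] = 42.0;                                /* touch the page */
      printf("touched memory bound to node 1: %g\n", buf[0]);
      numa_free(buf, 1 << 20);
      return 0;
  }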


--
Tim Prince



Re: [OMPI users] Memory affinity

2010-09-27 Thread Tim Prince

 On 9/27/2010 12:21 PM, Gabriele Fatigati wrote:

HI Tim,

I have read that link, but I haven't understood whether enabling processor
affinity also enables memory affinity, because it is written that:

"Note that memory affinity support is enabled only when processor
affinity is enabled"

Can I set processor affinity without memory affinity? This is my question.


2010/9/27 Tim Prince<n...@aol.com>

  On 9/27/2010 9:01 AM, Gabriele Fatigati wrote:

If OpenMPI is NUMA-compiled, is memory affinity enabled by default? I ask 
because I didn't find a standalone memory affinity (or similar) parameter to set to 1.



  The FAQ http://www.open-mpi.org/faq/?category=tuning#using-paffinity has a 
useful introduction to affinity.  It's available in a default build, but not 
enabled by default.

Memory affinity is implied by processor affinity.  Your system libraries 
are set up so as to cause any memory allocated to be made local to the 
processor, if possible.  That's one of the primary benefits of processor 
affinity.  Not being an expert in openmpi, I assume, in the absence of 
further easily accessible documentation, there's no useful explicit way 
to disable maffinity while using paffinity on platforms other than the 
specified legacy platforms.


--
Tim Prince



Re: [OMPI users] Memory affinity

2010-09-27 Thread Tim Prince

 On 9/27/2010 9:01 AM, Gabriele Fatigati wrote:


If OpenMPI is NUMA-compiled, is memory affinity enabled by default? 
I ask because I didn't find a standalone memory affinity (or similar) 
parameter to set to 1.



 The FAQ http://www.open-mpi.org/faq/?category=tuning#using-paffinity 
has a useful introduction to affinity.  It's available in a default 
build, but not enabled by default.


If you mean something other than this, explanation is needed as part of 
your question.
The taskset or numactl utilities might be relevant, if you require more 
detailed control.


--
Tim Prince



Re: [OMPI users] send and receive buffer the same on root

2010-09-16 Thread Tim Prince

 On 9/16/2010 9:58 AM, David Zhang wrote:
It's compiler specific, I think.  I've done this with OpenMPI with no 
problem; however, on another cluster with ifort I've gotten error 
messages about not using MPI_IN_PLACE.  So I think if it compiles, it 
should work fine.


On Thu, Sep 16, 2010 at 10:01 AM, Tom Rosmond <rosm...@reachone.com 
<mailto:rosm...@reachone.com>> wrote:


I am working with a Fortran 90 code with many MPI calls like this:

call mpi_gatherv(x,nsize(rank+1),
mpi_real,x,nsize,nstep,mpi_real,root,mpi_comm_world,mstat)

The compiler can't affect what happens here (unless maybe you use x again 
somewhere).  Maybe you mean the MPI library?  Intel MPI probably checks this 
at run time and issues an error.
I've dealt with run-time errors (which surfaced along with an ifort 
upgrade) which caused silent failure (incorrect numerics) on openmpi but 
a fatal diagnostic from the Intel MPI run-time, due to multiple uses of the 
same buffer.  Moral: even if it works for you now with openmpi, you 
could be setting up for unexpected failure in the future.
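
For completeness, the standard's answer for gathering into the same array 
on the root is MPI_IN_PLACE; a minimal C sketch of the pattern follows 
(the Fortran binding takes MPI_IN_PLACE in the same argument position; the 
counts and data layout here are assumptions for illustration only):

  /* Minimal complete sketch of an in-place gatherv. */
  #include <mpi.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
      int rank, nproc, root = 0;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nproc);

      int *nsize = malloc(nproc * sizeof(int));   /* counts per rank */
      int *nstep = malloc(nproc * sizeof(int));   /* displacements   */
      for (int i = 0; i < nproc; ++i) { nsize[i] = 4; nstep[i] = 4 * i; }

      float *x = malloc(4 * nproc * sizeof(float));
      for (int i = 0; i < 4; ++i) x[nstep[rank] + i] = (float)rank;

      if (rank == root)      /* root's own slice is already in place */
          MPI_Gatherv(MPI_IN_PLACE, 0, MPI_FLOAT,
                      x, nsize, nstep, MPI_FLOAT, root, MPI_COMM_WORLD);
      else                   /* recv arguments are ignored on non-roots */
          MPI_Gatherv(&x[nstep[rank]], nsize[rank], MPI_FLOAT,
                      NULL, NULL, NULL, MPI_FLOAT, root, MPI_COMM_WORLD);

      free(x); free(nsize); free(nstep);
      MPI_Finalize();
      return 0;
  }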


--
Tim Prince



Re: [OMPI users] OpenMPI Run-Time "Freedom" Question

2010-08-12 Thread Tim Prince

On 8/12/2010 6:04 PM, Michael E. Thomadakis wrote:

On 08/12/10 18:59, Tim Prince wrote:

On 8/12/2010 3:27 PM, Ralph Castain wrote:
Ick - talk about confusing! I suppose there must be -some- rational 
reason why someone would want to do this, but I can't imagine what 
it would be


I'm no expert on compiler vs lib confusion, but some of my own 
experience would say that this is a bad idea regardless of whether 
or not OMPI is involved. Compiler version interoperability is 
usually questionable, depending upon how far apart the rev levels are.


Only answer I can offer is that you would have to try it. It will 
undoubtedly be a case-by-case basis: some combinations might work, 
others might fail.



On Aug 12, 2010, at 3:53 PM, Michael E. Thomadakis wrote:


Hello OpenMPI,

we have deployed OpenMPI 1.4.1 and 1.4.2 on our Intel Nehalem 
cluster using Intel compilers V 11.1.059 and 11.1.072 respectively, 
and one user has the following request:


Can we build OpenMPI version say O.1 against Intel compilers 
version say I.1, but then build an application with OpenMPI O.1, BUT 
then use a DIFFERENT Intel compiler version say I.2 to build and 
run this MPI application?


I suggested to him to 1) simply try to build and run the 
application with O.1 but use Intel compilers version I.X, whatever 
this X is, and see if it has any issues.


OR 2) If the above does not work, I would build OpenMPI O.1 against 
Intel version I.X so he can use THIS combination for his 
hypothetical application.


He insists that I build OpenMPI O.1 with some version of Intel 
compilers I.Y but then at run time he would like to use *different* 
Intel run time libs at will I.Z <> I.X.


Can you provide me with a suggestion for a sane solution to this ? :-)

Best regards

Michael
Guessing at what is meant here, if you build MPI with a given version 
of Intel compilers, it ought to work when the application is built 
with a similar or more recent Intel compiler, or when the run-time 
LD_LIBRARY_PATH refers to a similar or newer library (within reason). 
There are similar constraints on glibc version.  "Within reason" 
works over a more restricted range when C++ is involved.  Note that 
the Intel linux compilers link to the gcc and glibc libraries as well 
as those which come with the compiler, and the MPI could be built 
with a combination of gcc and ifort to work with icc or gcc and 
ifort.  gfortran and ifort libraries, however, are incompatible, 
except that libgomp calls can be supported by libiomp5.
The "rational" use I can see is that an application programmer would 
likely wish to test a range of compilers without rebuilding MPI.  
Intel documentation says there is forward compatibility testing of 
libraries, at least to the extent that a build made with 10.1 would 
work with 11.1 libraries.
The most recent Intel library compatibility break was between MKL 9 
and 10.




Dear Tim, I offered to provide myself the combination of OMPI+ Intel 
compilers so that application can use it in stable fashion. When I 
inquired about this application so I can look into this I was told 
that "there is NO application yet (!) that fails but just in case it 
fails ..." I was asked to hack into the OMPI  building process to let 
OMPI use one run-time but then the MPI application using this OMPI ... 
use another!



Thanks for the information on this. We indeed use Intel Compiler set 
11.1.XXX + OMPI 1.4.1 and 1.4.2.


The basic motive in this hypothetical situation is to build the MPI 
application ONCE and then swap run-time libs as newer compilers come 
out.  I am certain that even if one can get away with it with nearby 
run-time versions, there is no guarantee of stability ad infinitum. 
I end up having to spend more time on technically "awkward" requests 
than on the reasonable ones.  It reminds me of when I was a teacher and 
had to spend more time with all the people trying to avoid doing the 
work than with the good students... hmmm :-)


According to my understanding, your application (or MPI) built with an 
Intel 11.1 compiler should continue working with future Intel 11.1 and 
12.x libraries.  I don't expect Intel to test or support this 
compatibility beyond that.
You will likely want to upgrade your OpenMPI earlier than the time when 
Intel compiler changes require a new MPI build.
If the interest is in getting performance benefits of future hardware 
simply by installing new dynamic libraries without rebuilding an 
application, Intel MKL is the most likely favorable scenario.  The MKL 
with optimizations for AVX is already in  beta test, and should work as 
a direct replacement for the MKL in current releases.


--
Tim Prince



Re: [OMPI users] OpenMPI Run-Time "Freedom" Question

2010-08-12 Thread Tim Prince

On 8/12/2010 3:27 PM, Ralph Castain wrote:
Ick - talk about confusing! I suppose there must be -some- rational 
reason why someone would want to do this, but I can't imagine what it 
would be


I'm no expert on compiler vs lib confusion, but some of my own 
experience would say that this is a bad idea regardless of whether or 
not OMPI is involved. Compiler version interoperability is usually 
questionable, depending upon how far apart the rev levels are.


Only answer I can offer is that you would have to try it. It will 
undoubtedly be a case-by-case basis: some combinations might work, 
others might fail.



On Aug 12, 2010, at 3:53 PM, Michael E. Thomadakis wrote:


Hello OpenMPI,

we have deployed OpenMPI 1.4.1 and 1.4.2 on our Intel Nehalem cluster 
using Intel compilers V 11.1.059 and 11.1.072 respectively, and one 
user has the following request:


Can we build OpenMPI version say O.1 against Intel compilers version 
say I.1, but then build an application with OpenMPI O.1, BUT then use 
a DIFFERENT Intel compiler version say I.2 to build and run this MPI 
application?


I suggested to him to 1) simply try to build and run the application 
with O.1 but use Intel compilers version I.X, whatever this X is, and 
see if it has any issues.


OR 2) If the above does not work, I would build OpenMPI O.1 against 
Intel version I.X so he can use THIS combination for his hypothetical 
application.


He insists that I build OpenMPI O.1 with some version of Intel 
compilers I.Y but then at run time he would like to use *different* 
Intel run time libs at will I.Z <> I.X.


Can you provide me with a suggestion for a sane solution to this ? :-)

Best regards

Michael
Guessing at what is meant here, if you build MPI with a given version of 
Intel compilers, it ought to work when the application is built with a 
similar or more recent Intel compiler, or when the run-time 
LD_LIBRARY_PATH refers to a similar or newer library (within reason). 
There are similar constraints on glibc version.  "Within reason" works 
over a more restricted range when C++ is involved.  Note that the Intel 
linux compilers link to the gcc and glibc libraries as well as those 
which come with the compiler, and the MPI could be built with a 
combination of gcc and ifort to work with icc or gcc and ifort.  
gfortran and ifort libraries, however, are incompatible, except that 
libgomp calls can be supported by libiomp5.
The "rational" use I can see is that an application programmer would 
likely wish to test a range of compilers without rebuilding MPI.  Intel 
documentation says there is forward compatibility testing of libraries, 
at least to the extent that a build made with 10.1 would work with 11.1 
libraries.

The most recent Intel library compatibility break was between MKL 9 and 10.



--
Tim Prince



Re: [OMPI users] Help on the big picture..

2010-07-23 Thread Tim Prince

On 7/22/2010 4:11 PM, Gus Correa wrote:

Hi Cristobal

Cristobal Navarro wrote:

Yes,
I was aware of the big difference hehe.

Now that OpenMP and OpenMPI are being discussed, I've always wondered whether
it's a good idea to model a solution in the following way, using both OpenMP
and OpenMPI.
Suppose you have n nodes, and each node has a quad-core CPU (so you have n*4 
processors).

Launch n processes according to the n nodes available.
Set a resource manager like SGE to fill the n*4 slots using round robin.
On each process, make use of the other cores available on the node
with OpenMP.

If this is possible, then each one could make use of the shared
memory model locally at each node, avoiding unnecessary I/O through the
network. What do you think?

Before asking what we think about this, please check the many references 
posted on this subject over the last decade.  Then refine your question 
to what you are interested in hearing about; evidently you have no 
interest in much of this topic.


Yes, it is possible, and many of the atmosphere/oceans/climate codes
that we run are written with this capability. In other areas of
science and engineering this is probably the case too.

However, this is not necessarily better/faster/simpler than dedicating 
all the cores to MPI processes.


In my view, this is due to:

1) OpenMP has a different scope than MPI,
and to some extent is limited by more stringent requirements than MPI;

2) Most modern MPI implementations (and OpenMPI is an example) use 
shared memory mechanisms to communicate between processes that reside

in a single physical node/computer;
The shared memory communication of several MPI implementations does 
greatly improve efficiency of message passing among ranks assigned to 
the same node.  However, these ranks also communicate with ranks on 
other nodes, so there is a large potential advantage for hybrid 
MPI/OpenMP as the number of cores in use increases.  If you aren't 
interested in running on more than 8 nodes or so, perhaps you won't care 
about this.


3) Writing hybrid code with MPI and OpenMP requires more effort,
and much care so as not to let the two forms of parallelism step on
each other's toes.
The MPI standard specifies the use of MPI_init_thread to indicate which 
combination of MPI and threading you intend to use, and to inquire 
whether that model is supported by the active MPI.
In the case where there is only 1 MPI process per node (possibly using 
several cores via OpenMP threading) there is no requirement for special 
affinity support.
If there is more than 1 FUNNELED rank per multiple CPU node, it becomes 
important to maintain cache locality for each rank.
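
To make the FUNNELED case concrete, a skeletal hybrid example in C 
(compiled with something like mpicc -fopenmp; the loop body is a 
placeholder): all threads share the compute loop, while MPI calls are 
made only by the master thread.

  /* Hybrid sketch: each rank does an OpenMP-parallel loop, then the
     master thread alone participates in an MPI reduction (FUNNELED). */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int provided, rank;
      MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      double local = 0.0;
      #pragma omp parallel for reduction(+:local)  /* threads inside the rank */
      for (int i = 0; i < 1000000; ++i)
          local += 1.0 / (1.0 + i);

      double global;
      /* Only the master thread makes MPI calls, which is all that
         MPI_THREAD_FUNNELED promises. */
      MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
      if (rank == 0) printf("sum = %f\n", global);
      MPI_Finalize();
      return 0;
  }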


OpenMP operates mostly through compiler directives/pragmas interspersed
on the code.  For instance, you can parallelize inner loops in no time,
granted that there are no data dependencies across the commands within 
the loop.  All it takes is to write one or two directive/pragma lines.

More than loop parallelization can be done with OpenMP, of course,
although not as much as can be done with MPI.
Still, with OpenMP, you are restricted to work in a shared memory 
environment.


By contrast, MPI requires more effort to program, but it takes advantage
of shared memory and networked environments
(and perhaps extended grids too).



snipped tons of stuff rather than attempt to reconcile top postings

--
Tim Prince



Re: [OMPI users] is loop unrolling safe for MPI logic?

2010-07-19 Thread Tim Prince

On 7/18/2010 9:09 AM, Anton Shterenlikht wrote:

On Sat, Jul 17, 2010 at 09:14:11AM -0700, Eugene Loh wrote:
   

Jeff Squyres wrote:

 

On Jul 17, 2010, at 4:22 AM, Anton Shterenlikht wrote:


   

Is loop vectorisation/unrolling safe for MPI logic?
I presume it is, but are there situations where
loop vectorisation could e.g. violate the order
of execution of MPI calls?


 

I *assume* that the intel compiler will not unroll loops that contain MPI 
function calls.  That's obviously an assumption, but I would think that unless 
you put some pragmas in there that tell the compiler that it's safe to unroll, 
the compiler will be somewhat conservative about what it automatically unrolls.


   

More generally, a Fortran compiler that optimizes aggressively could
"break" MPI code.

http://www.mpi-forum.org/docs/mpi-20-html/node236.htm#Node241

That said, you may not need to worry about this in your particular case.
 

This is a very important point, many thanks Eugene.
Fortran MPI programmer definitely needs to pay attention to this.

MPI-2.2 provides a slightly updated version of this guide:

http://www.mpi-forum.org/docs/mpi22-report/node343.htm#Node348

many thanks
anton

   
From the point of view of the compiler developers, auto-vectorization 
and unrolling are distinct questions.  An MPI or other non-inlined 
function call would not be subject to vectorization.  While 
auto-vectorization or unrolling may expose latent bugs, MPI is not 
particularly likely to make them worse.  You have made some misleading 
statements about vectorization along the way, but these aren't likely to 
relate to MPI problems.
Upon my return, I will be working on a case which was developed and 
tested successfully under ifort 10.1 and other compilers, which is 
failing under current ifort versions.  Current Intel MPI throws a run 
time error indicating that the receive buffer has been lost; the openmpi 
failure is more obscure.  I will have to change the code to use distinct 
tags for each MPI send/receive pair in order to track it down.  I'm not 
counting on that magically making the bug go away.  ifort is not 
particularly aggressive about unrolling loops which contain MPI calls, 
but I agree that must be considered.


--
Tim Prince



Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-10 Thread Tim Prince

On 5/9/2010 8:45 PM, Terry Frankcombe wrote:
   

I don't know what Jeff meant by that, but we haven't seen a feasible way
of disabling HT without rebooting and using the BIOS options.
 

According to this page:
http://dag.wieers.com/blog/is-hyper-threading-enabled-on-a-linux-system
in RHEL5/CentOS-5 it's easy to switch it on and off on the fly.
___
   
That's the same as Jeff explained.  It requires root privilege, and 
affects all users.
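If I recall correctly, the mechanism that page describes is the Linux CPU 
hotplug interface (writing 0 or 1 to /sys/devices/system/cpu/cpuN/online 
for the hyperthread siblings), which takes those logical CPUs offline for 
everything on the machine -- hence the root requirement and the effect on 
all users.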


--
Tim  Prince



Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-07 Thread Tim Prince

On 5/6/2010 10:30 PM, John Hearns wrote:

On 7 May 2010 03:17, Jeff Squyres<jsquy...@cisco.com>  wrote:

   

Indeed.  I have seen some people have HT enabled in the bios just so that they 
can have the software option of turning them off via linux -- then you can run 
with HT and without it and see what it does to your specific codes.
 

I may have missed this on the thread, but how do you do that?
The Nehalem systems I have came delivered with HT enabled in the BIOS
- I know it is not a real pain to reboot and configure, but it would
be a lot easier to leave it on and switch it off in software - also if you
wanted to do back-to-back testing of performance with/without HT.

___
   
I don't know what Jeff meant by that, but we haven't seen a feasible way 
of disabling HT without rebooting and using the BIOS options.  It is 
feasible to place 1 MPI process or thread per core.  With careful 
affinity, performance when using 1 logical per core normally is 
practically the same as with HT disabled.



--
Tim Prince



Re: [OMPI users] Fortran support on Windows Open-MPI

2010-05-07 Thread Tim Prince

On 5/6/2010 9:07 PM, Trent Creekmore wrote:

Compaq Visual Fortran for Windows was out, but HP acquired Compaq. HP, later
deciding they did not want it, along with the Alpha processor technology,
sold them to Intel. So now it's the Intel Visual Fortran Compiler for Windows.
In addition, if you don't want that package, they instead sell a plug-in
for Microsoft Visual Studio. There is also an HPC/Parallel environment for
Visual Studio, but none of these are cheap.

I don't see why you can't include Open MPI libraries in that environment.

Trent


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Damien
Sent: Thursday, May 06, 2010 10:53 PM
To: us...@open-mpi.org
Subject: [OMPI users] Fortran support on Windows Open-MPI

Hi all,

Can anyone tell me what the plans are for Fortran 90 support on Windows,
with say the Intel compilers?  I need to get MUMPS built and running
using Open-MPI, with Visual Studio and Intel 11.1.  I know Fortran isn't
part of the regular CMake build for Windows.  If someone's working on
this I'm happy to test or help out.

Damien
___
   
I'm not certain whether the top-post is intended as a reply to the 
original post, but I feel I must protest efforts to add confusion.  
Looking at the instructions for building on Windows, it appears that 
several routes have been taken with reported success, not including 
commercial Fortran.  It seems it should not be a major task to include 
gfortran in the cygwin build.
HP never transferred ownership of Compaq Fortran, not that it's relevant 
to the discussion.
The most popular open source MPI for commercial Windows Fortran has been 
Argonne MPICH2, which offers a pre-built version compatible with Intel 
Fortran.   Intel also offers MPI, derived originally from Argonne 
MPICH2, for both Windows and linux.
I can't imagine OpenMPI libraries being added to the Microsoft HPC 
environment; maybe that's not exactly what the top poster meant.


--
Tim Prince



Re: [OMPI users] open-mpi behaviour on Fedora, Ubuntu, Debian and CentOS

2010-04-26 Thread Tim Prince

On 4/26/2010 2:31 AM, Asad Ali wrote:



On Mon, Apr 26, 2010 at 8:01 PM, Ashley Pittman <ash...@pittman.co.uk 
<mailto:ash...@pittman.co.uk>> wrote:



On 25 Apr 2010, at 22:27, Asad Ali wrote:

> Yes I use different machines such as
>
> machine 1 uses AMD Opterons. (Fedora)
>
> machine 2 and 3 use Intel Xeons. (CentOS)
>
> machine 4 uses slightly older Intel Xeons. (Debian)
>
> Only machine 1 gives correct results.  While CentOS and Debian
results are same but are wrong and different from those of machine 1.

Have you verified they are actually wrong, or are they just
different?  It's actually perfectly possible for the same program
to get different results from run to run even on the same hardware
and the same OS.  All floating point operations by the MPI library
are expected to be deterministic, but changing the process layout
or MPI settings can affect this, and of course anything the
application does can introduce differences as well.

Ashley.


The code is the same with the same input/output and the same constants 
etc. From run to run the results can only be different if you either 
use different input/output or use different random number seeds. Here 
in my case the random number seeds are the same as well. This means 
that this code must give (and it does) the same results no matter how 
many times you run it. I didn't tamper with mpi-settings for any run. 
I have verified that results of only Fedora are correct because I know 
what is in my data and how should my model behave and I get a nearly 
perfect convergence on Fedora OS. Even my dual core laptop with Ubuntu 
9.10 also gives correct results. The other OSs give the same results 
as Fedora for a few hundred iterations, but then an unusual thing 
happens and the results start to go wrong.
If you're really interested in solving your "problem,"  you'll have to 
consider important details such as which compiler was used, which 
options (e.g. 387 vs. sse), run-time setting of x87 or SSE control 
registers, 32- vs. 64-bit compilation.  SSE2 is the default for 64-bit 
compilation, but compilers vary on defaults for 32-bit.  If your program 
depends on x87 extra precision of doubles, or efficient mixing of double 
and long double, 387 code may be a better choice, but limits your 
efficiency.
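
One quick check of which evaluation mode a given build uses is the C99 
FLT_EVAL_METHOD macro; a small sketch (compile it with the same compiler 
and options, 32- vs. 64-bit, as the application being compared):

  /* Prints the compiler's floating-point evaluation method:
     0 = intermediates kept at declared precision (typical SSE2 code),
     2 = intermediates widened to long double (typical x87/387 code). */
  #include <stdio.h>
  #include <float.h>

  int main(void)
  {
      printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
      printf("sizeof(long double) = %zu\n", sizeof(long double));
      return 0;
  }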


--
Tim Prince



Re: [OMPI users] OpenMPI multithreaded performance

2010-04-07 Thread Tim Prince

On 4/7/2010 1:20 AM, Piero Lanucara wrote:


Dear OpenMPI team
How much performance should we expect when using the MPI multithread 
capability (MPI_Init_thread in MPI_THREAD_MULTIPLE mode)?
It seems that no performance gain appears with some simple tests, such as 
multiple MPI channels activated, overlapping communication and computation, 
and so on.


Maybe I don't understand your question.  Are you saying that none of the 
references found by search terms such as "hybrid mpi openmp" are useful 
for you?  They cover so many topics, you would have to be much more 
specific about which topics you want in more detail.


--
Tim Prince



Re: [OMPI users] OpenMPI/NAG Fortran: Missing libf52.so.1

2010-03-17 Thread Tim Prince

On 3/16/2010 11:22 PM, Vedran Coralic wrote:


Now, I think I know what the problem is. Basically, the NAG Fortran 
compiler and its libraries are only available on the master node so 
that the remaining nodes cannot access/find the required files. From 
my understanding, the only way to fix this would be to copy the 
NAG Fortran compiler to all of the nodes in the cluster.

Don't NAG provide static copies of their libraries?
Yes, if you link the dynamic libraries, you must make them visible on 
each node, with the path set in LD_LIBRARY_PATH.  On such a small 
cluster, (or with a fast shared file system), a usual way is to put them 
in a directory mounted across all nodes.
Since you talk about a "work-around," you can copy the library folder to 
your own file system for each node, to check that you've got the hang of it.
The LD_LIBRARY_PATH setting can be done in your user settings so it 
doesn't affect anyone else.


--
Tim Prince



Re: [OMPI users] mpirun only works when -np <4

2009-12-08 Thread Tim Prince

Gus Correa wrote:

Hi Matthew




5) Are you setting processor affinity on mpiexec?

mpiexec -mca mpi_paffinity_alone 1 -np  ... bla, bla ...

Good point.  This option optimizes processor affinity on the assumption 
that no other jobs are running.  If you ran 2 MPI jobs with this option, 
they would attempt to use the same logical processors, rather than 
spreading the work effectively.
I have doubts whether mpi_paffinity_alone could be relied upon with 
HyperThreading enabled; it would work OK if it understood how to avoid 
placing multiple processes on the same core.
If you don't find an option inside openmpi to specify which logicals 
your jobs should use, you could do it by mpiexec -np 4 taskset...
taking care to use a different core for each process (also different 
between jobs running together).  You would have to check on your machine 
whether the taskset options would be such as -c 0,2,4,6  for separate 
cores on one package and -c 8,10,12,14 for the other, or some other 
scheme.  /proc/cpuinfo would give valuable clues, even more 
/usr/sbin/irqbalance -debug (or wherever it lives on your system).
Without affinity setting, you could also run into problems when running 
out of individual cores and forcing some pairs of processes to run 
(quite slowly) on single cores, while others run full speed on other cores.


Re: [OMPI users] MPI Processes and Auto Vectorization

2009-12-01 Thread Tim Prince

amjad ali wrote:

Hi,
thanks T.Prince,

Your saying:
"I'll just mention that we are well into the era of 3 levels of 
programming parallelization:  vectorization, threaded parallel (e.g. 
OpenMP), and process parallel (e.g. MPI)." is a really great piece of new 
learning for me. Now I can understand things better.



Can you please explain a bit about:

" This application gains significant benefit from cache blocking, so 
vectorization has more opportunity to gain than for applications which 
have less memory locality."


So should I now conclude from your reply that even if we have a single-core 
processor in a PC, we can get the benefit of auto-vectorization? 
And we do not need free cores to get the benefit of auto-vectorization?


Thank you very much.
Yes, we were using auto-vectorization from before the beginnings of MPI 
back in the days of single core CPUs; in fact, it would often show a 
greater gain than it did on later multi-core CPUs.
The reason for the greater effectiveness of auto-vectorization with cache 
blocking, and possibly with single-core CPUs, would be less saturation of 
the memory bus.


Re: [OMPI users] MPI Processes and Auto Vectorization

2009-12-01 Thread Tim Prince

amjad ali wrote:

Hi,
Suppose we run a parallel MPI code with 64 processes on a cluster, say 
of 16 nodes. The cluster nodes have multicore CPUs, say 4 cores on each node.


Now all 64 cores on the cluster are running a process. The program is SPMD, 
meaning all processes have the same workload.


Now, if we had done auto-vectorization while compiling the code (for 
example with Intel compilers), will there be any benefit 
(efficiency/scalability improvement) from having the code 
auto-vectorized? Or will we get the same performance as without 
auto-vectorization in this example case?
MEANING THAT if we do not have free CPU cores in a PC or cluster (all 
cores are running MPI processes), is auto-vectorization still 
beneficial? Or is it beneficial only if we have some free CPU cores 
locally?



How can we really get benefit in performance improvement with 
Auto-Vectorization?


Auto-vectorization should give similar performance benefit under MPI as 
it does in a single process.  That's about all that can be said when you 
say nothing about the nature of your application.  This assumes that 
your MPI domain decomposition, which may not be highly vectorizable, 
doesn't take up too large a fraction of elapsed time.  By the same 
token, auto-vectorization techniques aren't specific to MPI 
applications, so an in-depth treatment isn't topical here.
I'll just mention that we are well into the era of 3 levels of 
programming parallelization:  vectorization, threaded parallel (e.g. 
OpenMP), and process parallel (e.g. MPI).
For an application which I work on, 8 nodes with auto-vectorization give 
about the performance of 12 nodes without, so compilers without 
auto-vectorization capability for such applications fell by the wayside 
a decade ago.  This application gains significant benefit from cache 
blocking, so vectorization has more opportunity to gain than for 
applications which have less memory locality.
I have not seen an application which was effectively vectorized which 
also gained from HyperThreading, but the gain for vectorization should 
be significantly greater than could be gained from HyperThreading. It's 
also common that vectorization gains more on lower clock speed/cheaper 
CPU models (of the same architecture), enabling lower cost of purchase 
or power consumption, but that's true of all forms of parallelization.
Some applications can be vectorized effectively by any of the popular 
auto-vectorizing compilers, including recent gnu compilers, while others 
show much more gain with certain compilers, such as Intel, PGI, or Open64.
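
To make the cache-blocking remark above concrete, here is a generic sketch 
(not taken from the application discussed) of a blocked loop nest whose 
unit-stride inner loop is the kind an auto-vectorizer targets; the tile 
size is an assumption to be tuned for the cache in question:

  /* Generic illustration: block over j and k so a tile of B stays in
     cache while the unit-stride inner loop over j is vectorized.
     Assumes C has been zero-initialized by the caller. */
  void dgemm_blocked(int n, const double *A, const double *B, double *C)
  {
      const int BS = 64;                     /* tile size: an assumption */
      for (int jj = 0; jj < n; jj += BS)
          for (int kk = 0; kk < n; kk += BS)
              for (int i = 0; i < n; ++i)
                  for (int k = kk; k < kk + BS && k < n; ++k) {
                      const double aik = A[i * n + k];
                      for (int j = jj; j < jj + BS && j < n; ++j)
                          C[i * n + j] += aik * B[k * n + j];
                  }
  }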