[OMPI users] OpenMPI Run-Time "Freedom" Question

2010-08-12 Thread Michael E. Thomadakis

 Hello OpenMPI,

we have deployed OpenMPI 1.4.1 and 1.4.2 on our Intel Nehalem cluster 
using Intel compilers V 11.1.059 and 11.1.072 respectively, and one user 
has the following request:


Can we build OpenMPI version, say, O.1 against Intel compilers version, say, 
I.1, then build an application with OpenMPI O.1, but use a DIFFERENT Intel 
compiler version, say, I.2, to build and run this MPI application?


I suggested that he 1) simply try to build and run the application 
with O.1 using Intel compilers version I.X, whatever this X is, and see 
if it has any issues.


OR 2) If the above does not work, I would build OpenMPI O.1 against 
Intel version I.X so he can use THIS combination for his hypothetical 
application.


He insists that I build OpenMPI O.1 with some version of the Intel compilers, 
I.Y, but then at run time he would like to use *different* Intel run-time 
libs at will, I.Z <> I.Y.


Can you provide me with a suggestion for a sane solution to this ? :-)

Best regards

Michael


Re: [OMPI users] OpenMPI Run-Time "Freedom" Question

2010-08-12 Thread Michael E. Thomadakis

 On 08/12/10 17:27, Ralph Castain wrote:
Ick - talk about confusing! I suppose there must be -some- rational 
reason why someone would want to do this, but I can't imagine what it 
would be


I'm no expert on compiler vs lib confusion, but some of my own 
experience would say that this is a bad idea regardless of whether or 
not OMPI is involved. Compiler version interoperability is usually 
questionable, depending upon how far apart the rev levels are.


Only answer I can offer is that you would have to try it. It will 
undoubtedly be a case-by-case basis: some combinations might work, 
others might fail.



On Aug 12, 2010, at 3:53 PM, Michael E. Thomadakis wrote:



Hi Ralph, I believe the clean and rational solution when an MPI 
application needs a specific combination of OMPI and Intel compilers is 
to just build this OMPI against that compiler version, statically or 
dynamically, so the application can just use it. I feel that the OMPI 
libs + run-time are an intimate part of the run-time of the application. 
What people think they can do is build only ONCE against the same 
OMPI but freely swap in and out any Intel run-time library without worries 
and without REBUILDING the application. Nothing in life is free, though.

Thanks for the reply 


Michael



[ ... ]




Re: [OMPI users] OpenMPI Run-Time "Freedom" Question

2010-08-12 Thread Michael E. Thomadakis

 On 08/12/10 18:59, Tim Prince wrote:

[ ... ]
Guessing at what is meant here, if you build MPI with a given version 
of Intel compilers, it ought to work when the application is built 
with a similar or more recent Intel compiler, or when the run-time 
LD_LIBRARY_PATH refers to a similar or newer library (within reason). 
There are similar constraints on glibc version.  "Within reason" works 
over a more restricted range when C++ is involved.

Note that the Intel Linux compilers link to the gcc and glibc libraries 
as well as those which come with the compiler, and the MPI could be built 
with a combination of gcc and ifort to work with icc or gcc and ifort.  
gfortran and ifort libraries, however, are incompatible, except that 
libgomp calls can be supported by libiomp5.

The "rational" use I can see is that an application programmer would 
likely wish to test a range of compilers without rebuilding MPI.  
Intel documentation says there is forward-compatibility testing of 
libraries, at least to the extent that a build made with 10.1 would 
work with 11.1 libraries.  The most recent Intel library compatibility 
break was between MKL 9 and 10.
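
For what it's worth, a quick way to see which run-time libraries the dynamic 
linker actually resolves -- and how that changes with LD_LIBRARY_PATH -- is ldd. 
The application name below is made up and the library list is only illustrative:

  # which Intel/gcc run-time libs do the OMPI libs and the app pick up?
  ldd /g/software/openmpi-1.4.2/intel/lib/libmpi.so | egrep 'libimf|libintlc|libgcc_s'
  ldd ./my_mpi_app | egrep 'libimf|libintlc|libgcc_s'

  # point LD_LIBRARY_PATH at a newer compiler run-time and compare
  export LD_LIBRARY_PATH=/opt/intel/newer-version/lib/intel64:$LD_LIBRARY_PATH
  ldd ./my_mpi_app | egrep 'libimf|libintlc|libgcc_s'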




Dear Tim, I offered to provide the combination of OMPI + Intel 
compilers myself so that the application can use it in a stable fashion. When I 
inquired about this application so I could look into it, I was told that 
"there is NO application yet (!) that fails, but just in case it fails 
..." I was asked to hack into the OMPI build process to let OMPI use 
one run-time but have the MPI application using this OMPI ... use another!



Thanks for the information on this. We indeed use Intel Compiler set 
11.1.XXX + OMPI 1.4.1 and 1.4.2.


The basic motive in this hypothetical situation is to build the MPI 
application ONCE and then swap run-time libs as newer compilers come 
out. I am certain that even if one can get away with it with nearby 
run-time versions, there is no guarantee of stability ad infinitum. I 
end up having to spend more time on technically "awkward" requests than 
on the reasonable ones. It reminds me of when I was a teacher: I had to spend 
more time with all the people trying to avoid doing the work than with the 
good students... hmmm :-)



take care ...
Mike



--
Tim Prince






Re: [OMPI users] OpenMPI Run-Time "Freedom" Question

2010-08-13 Thread Michael E. Thomadakis

 On 08/12/10 21:53, Jed Brown wrote:


Or OMPI_CC=icc-xx.y mpicc ...
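
For context, OMPI_CC is the environment variable the Open MPI wrapper compilers 
honor for overriding the underlying compiler (OMPI_CXX and OMPI_FC are the C++ 
and Fortran counterparts). A hedged sketch of that one-liner, with purely 
illustrative paths and file names:

  # compile against the same OMPI install, but with a different underlying Intel compiler
  OMPI_CC=/opt/intel/Compiler/11.1/072/bin/intel64/icc mpicc -O2 -o my_mpi_app my_mpi_app.c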




If we enable a different set of run-time library paths for the Intel 
compilers than those used to build OMPI when we compile and execute the 
MPI app, these new run-time libs will be accessible to the OMPI libs to run 
against instead of those used when OMPI was being built, right? I would 
think that this may cause some problems if for some reason something in 
the newer run-time libs differs from the ones used when OMPI was built?


A user is hoping to avoid rebuilding his OMPI app and, I guess, just 
change LD_LIBRARY_PATH to the latest Intel compiler run-time libs and 
launch it with the latest and greatest Intel libs. I mentioned 
to him that the right way is to build the combination of OMPI + Intel 
run-time that the application is known to work with (since some may 
fail), but he wants me to insert a fixed run-time lib path for the OMPI libs 
while using a different, variable one for the run-time libs of the OMPI 
application! It is frustrating with people who get "great ideas" but 
then press someone else to make them work instead of doing it 
themselves...


anyway thanks

Michael


Jed

On Aug 12, 2010 5:18 PM, "Ralph Castain" <r...@open-mpi.org> wrote:



On Aug 12, 2010, at 7:04 PM, Michael E. Thomadakis wrote:

> On 08/12/10 18:59, Tim Prince wrote:
>>...

The "easy" way to accomplish this would be to:

(a) build OMPI with whatever compiler you decide to use as a "baseline"

(b) do -not- use the wrapper compiler to build the application. 
Instead, do "mpicc --showme" (or whatever language equivalent you 
want) to get the compile line, substitute your "new" compiler library 
for the "old" one, and then execute the resulting command manually.


If you then set your LD_LIBRARY_PATH to the "new" libs, it might work 
- but no guarantees. Still, you could try it - and if it worked, you 
could always just explain that this is a case-by-case situation, and 
so it -could- break with other compiler combinations.
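
A minimal sketch of steps (a)/(b) above. The compiler path, flags, and 
application name here are invented for illustration; the flags you actually 
get from --showme will differ per installation:

  # (b) see what the wrapper compiler would have done, without running it
  mpicc --showme:compile
  mpicc --showme:link

  # re-issue that line manually, substituting the "new" compiler
  /opt/intel/new-version/bin/icc my_app.c -o my_app \
      $(mpicc --showme:compile) $(mpicc --showme:link)

  # then point the run time at the "new" libs and try it
  export LD_LIBRARY_PATH=/opt/intel/new-version/lib/intel64:$LD_LIBRARY_PATH
  mpirun -np 4 ./my_app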


Critical note: the app developers would have to validate the code 
with every combination! Otherwise, correct execution will be a 
complete crap-shoot - just because the app doesn't abnormally 
terminate does -not- mean it generated a correct result!





> Thanks for the information on this. We indeed use Intel Compiler 
set 11.1.XXX + OMPI 1.4.1 and ...










[OMPI users] Open-MPI 1.4.2 : mpirun core-dumps when "-npernode N" is used at command line

2010-08-23 Thread Michael E. Thomadakis

 Hello OMPI:

We have installed OMPI V1.4.2 on a Nehalem cluster running CentOS5.4. 
OMPI was built using Intel compilers 11.1.072. I am attaching the 
configuration log and output from ompi_info -a.


The problem we are encountering is that whenever we use the option 
'-npernode N' on the mpirun command line, we get a segmentation fault as 
shown below:



miket@login002[pts/7]PS $ mpirun -npernode 1  --display-devel-map  
--tag-output -np 6 -cpus-per-proc 2 -H 'login001,login002,login003' hostname


 Map generated by mapping policy: 0402
Npernode: 1 Oversubscribe allowed: TRUE CPU Lists: FALSE
Num new daemons: 2  New daemon starting vpid 1
Num nodes: 3

 Data for node: Name: login001  Launch id: -1   Arch: 0 State: 2
Num boards: 1   Num sockets/board: 2Num cores/socket: 4
Daemon: [[44812,0],1]   Daemon launched: False
Num slots: 1Slots in use: 2
Num slots allocated: 1  Max slots: 0
Username on node: NULL
Num procs: 1Next node_rank: 1
Data for proc: [[44812,1],0]
Pid: 0  Local rank: 0   Node rank: 0
State: 0App_context: 0  Slot list: NULL

 Data for node: Name: login002  Launch id: -1   Arch: ffc91200  
State: 2

Num boards: 1   Num sockets/board: 2Num cores/socket: 4
Daemon: [[44812,0],0]   Daemon launched: True
Num slots: 1Slots in use: 2
Num slots allocated: 1  Max slots: 0
Username on node: NULL
Num procs: 1Next node_rank: 1
Data for proc: [[44812,1],0]
Pid: 0  Local rank: 0   Node rank: 0
State: 0App_context: 0  Slot list: NULL

 Data for node: Name: login003  Launch id: -1   Arch: 0 State: 2
Num boards: 1   Num sockets/board: 2Num cores/socket: 4
Daemon: [[44812,0],2]   Daemon launched: False
Num slots: 1Slots in use: 2
Num slots allocated: 1  Max slots: 0
Username on node: NULL
Num procs: 1Next node_rank: 1
Data for proc: [[44812,1],0]
Pid: 0  Local rank: 0   Node rank: 0
State: 0App_context: 0  Slot list: NULL
[login002:02079] *** Process received signal ***
[login002:02079] Signal: Segmentation fault (11)
[login002:02079] Signal code: Address not mapped (1)
[login002:02079] Failing at address: 0x50
[login002:02079] [ 0] /lib64/libpthread.so.0 [0x3569a0e7c0]
[login002:02079] [ 1] 
/g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_util_encode_pidmap+0xa7) 
[0x2afa70d25de7]
[login002:02079] [ 2] 
/g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x3b8) 
[0x2afa70d36088]
[login002:02079] [ 3] 
/g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0xd7) 
[0x2afa70d37fc7]
[login002:02079] [ 4] 
/g/software/openmpi-1.4.2/intel/lib/openmpi/mca_plm_rsh.so [0x2afa721085a1]

[login002:02079] [ 5] mpirun [0x404c27]
[login002:02079] [ 6] mpirun [0x403e38]
[login002:02079] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) 
[0x3568e1d994]

[login002:02079] [ 8] mpirun [0x403d69]
[login002:02079] *** End of error message ***
Segmentation fault

We tried version 1.4.1 and this problem did not emerge.

This option is necessary for when our users launch hybrid MPI-OMP code, 
where they can request M nodes and n ppn in a *PBS/Torque* setup so that they 
get only the right number of MPI tasks. Unfortunately, as soon as we 
use the '-npernode N' option, mpirun crashes.
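
For reference, this is roughly the kind of hybrid job in question -- a hedged 
PBS/Torque sketch with made-up node and thread counts, not our actual scripts:

  #PBS -l nodes=4:ppn=8
  # one MPI task per node, 8 OpenMP threads per task
  export OMP_NUM_THREADS=8
  mpirun -npernode 1 -np 4 ./hybrid_mpi_omp_app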


Is this a known issue? I found a related problem (from around May 2010) 
when people were using the same option, but in a SLURM environment.


regards

Michael



config.log.gz
Description: GNU Zip compressed data


ompi_info-a.out.gz
Description: GNU Zip compressed data


Re: [OMPI users] Open-MPI 1.4.2 : mpirun core-dumps when "-npernode N" is used at command line

2010-08-23 Thread Michael E. Thomadakis

 Hi Jeff,
thanks for the quick reply.

Would using '--cpus-per-proc N' in place of '-npernode N' or just 
'-bynode' do the trick?


It seems that using '--loadbalance' also crashes mpirun.

best ...

Michael


On 08/23/10 19:30, Jeff Squyres wrote:

Yes, the -npernode segv is a known issue.

We have it fixed in the 1.4.x nightly tarballs; can you give it a whirl and see 
if that fixes your problem?

 http://www.open-mpi.org/nightly/v1.4/



On Aug 23, 2010, at 8:20 PM, Michael E. Thomadakis wrote:


[ ... ]






Re: [OMPI users] Open-MPI 1.4.2 : mpirun core-dumps when "-npernode N" is used at command line

2010-08-24 Thread Michael E. Thomadakis

 Hi Ralph,

I tried to build 1.4.3a1r23542 (08/02/2010) with

./configure --prefix="/g/software/openmpi-1.4.3a1r23542/gcc-4.1.2 2" 
--enable-cxx-exceptions  CFLAGS="-O2" CXXFLAGS="-O2"  FFLAGS="-O2" 
FCFLAGS="-O2"

with the GCC 4.1.2

miket@login002[pts/26]openmpi-1.4.3a1r23542 $ gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
--infodir=/usr/share/info --enable-shared --enable-threads=posix 
--enable-checking=release --with-system-zlib --enable-__cxa_atexit 
--disable-libunwind-exceptions --enable-libgcj-multifile 
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada 
--enable-java-awt=gtk --disable-dssi --enable-plugin 
--with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre 
--with-cpu=generic --host=x86_64-redhat-linux

Thread model: posix
gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)


but it failed. I am attaching the configure and make logs.

regards

Michael


On 08/23/10 20:53, Ralph Castain wrote:
Nope - none of them will work with 1.4.2. Sorry - bug not discovered 
until after release


On Aug 23, 2010, at 7:45 PM, Michael E. Thomadakis wrote:


Hi Jeff,
thanks for the quick reply.

Would using '--cpus-per-proc /N/' in place of '-npernode /N/' or just 
'-bynode' do the trick?


It seems that using '--loadbalance' also crashes mpirun.

best ...

Michael


On 08/23/10 19:30, Jeff Squyres wrote:

Yes, the -npernode segv is a known issue.

We have it fixed in the 1.4.x nightly tarballs; can you give it a whirl and see 
if that fixes your problem?

 http://www.open-mpi.org/nightly/v1.4/



On Aug 23, 2010, at 8:20 PM, Michael E. Thomadakis wrote:


Hello OMPI:

We have installed OMPI V1.4.2 on a Nehalem cluster running CentOS5.4. OMPI was 
built uisng Intel compilers 11.1.072. I am attaching the configuration log and 
output from ompi_info -a.

The problem we are encountering is that whenever we use option '-npernode N' in 
the mpirun command line we get a segmentation fault as in below:


miket@login002[pts/7]PS $ mpirun -npernode 1  --display-devel-map  --tag-output 
-np 6 -cpus-per-proc 2 -H 'login001,login002,login003' hostname

  Map generated by mapping policy: 0402
 Npernode: 1 Oversubscribe allowed: TRUE CPU Lists: FALSE
 Num new daemons: 2  New daemon starting vpid 1
 Num nodes: 3

  Data for node: Name: login001  Launch id: -1   Arch: 0 State: 2
 Num boards: 1   Num sockets/board: 2Num cores/socket: 4
 Daemon: [[44812,0],1]   Daemon launched: False
 Num slots: 1Slots in use: 2
 Num slots allocated: 1  Max slots: 0
 Username on node: NULL
 Num procs: 1Next node_rank: 1
 Data for proc: [[44812,1],0]
 Pid: 0  Local rank: 0   Node rank: 0
 State: 0App_context: 0  Slot list: NULL

  Data for node: Name: login002  Launch id: -1   Arch: ffc91200  State: 
2
 Num boards: 1   Num sockets/board: 2Num cores/socket: 4
 Daemon: [[44812,0],0]   Daemon launched: True
 Num slots: 1Slots in use: 2
 Num slots allocated: 1  Max slots: 0
 Username on node: NULL
 Num procs: 1Next node_rank: 1
 Data for proc: [[44812,1],0]
 Pid: 0  Local rank: 0   Node rank: 0
 State: 0App_context: 0  Slot list: NULL

  Data for node: Name: login003  Launch id: -1   Arch: 0 State: 2
 Num boards: 1   Num sockets/board: 2Num cores/socket: 4
 Daemon: [[44812,0],2]   Daemon launched: False
 Num slots: 1Slots in use: 2
 Num slots allocated: 1  Max slots: 0
 Username on node: NULL
 Num procs: 1Next node_rank: 1
 Data for proc: [[44812,1],0]
 Pid: 0  Local rank: 0   Node rank: 0
 State: 0App_context: 0  Slot list: NULL
[login002:02079] *** Process received signal ***
[login002:02079] Signal: Segmentation fault (11)
[login002:02079] Signal code: Address not mapped (1)
[login002:02079] Failing at address: 0x50
[login002:02079] [ 0] /lib64/libpthread.so.0 [0x3569a0e7c0]
[login002:02079] [ 1] 
/g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_util_encode_pidmap+0xa7)
 [0x2afa70d25de7]
[login002:02079] [ 2] 
/g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x3b8)
 [0x2afa70d36088]
[login002:02079] [ 3] 
/g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0xd7)
 [0x2afa70d37fc7]
[login002:02079] [ 4] 
/g/software/openmpi-1.4.2/intel/lib/openmpi/mca_plm_rsh.so [0x2afa721085a1]
[login002:02079] [ 5] mpirun [0x404c27]
[login002:02079] [ 6] mpirun [0x403e38]
[login002:02079] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3568e1d994]
[login002:02079] [ 8] mpirun [0x403d69]
[login002:02079] 

Re: [OMPI users] Open-MPI 1.4.2 : mpirun core-dumps when "-npernode N" is used at command line

2010-08-24 Thread Michael E. Thomadakis

 On 08/24/10 14:22, Michael E. Thomadakis wrote:

Hi,

I used a 'tee' command to capture the output but I forgot to also redirect
stderr to the file.

This is what a fresh make gave (gcc 4.1.2 again) :

--
ompi_debuggers.c:81: error: missing terminating " character
ompi_debuggers.c:81: error: expected expression before ';' token
ompi_debuggers.c: In function 'ompi_wait_for_debugger':
ompi_debuggers.c:212: error: 'mpidbg_dll_locations' undeclared
(first use in this function)
ompi_debuggers.c:212: error: (Each undeclared identifier is reported only once
ompi_debuggers.c:212: error: for each function it appears in.)
ompi_debuggers.c:212: warning: passing argument 3 of 'check' from
incompatible pointer type
make[2]: *** [libdebuggers_la-ompi_debuggers.lo] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1

--

Is this critical to run OMPI code?

Thanks for the quick reply Ralph,

Michael

On Tue, 24 Aug 2010, Ralph Castain wrote:

| Date: Tue, 24 Aug 2010 13:16:10 -0600
| From: Ralph Castain
| To: Michael E.Thomadakis
| Cc: Open MPI Users, mi...@sc.tamu.edu
| Subject: Re: [OMPI users] Open-MPI 1.4.2 : mpirun core-dumps when
| "-npernode N" is used at command line
|
| Ummm... the configure log terminates normally, indicating it configured fine.
| The make log ends, but with no error shown - everything was building just fine.
|
| Did you maybe stop it before it was complete? Run out of disk quota? Or...?
|
|
| On Aug 24, 2010, at 1:06 PM, Michael E. Thomadakis wrote:
|
|> [ ... ]

Re: [OMPI users] Open-MPI 1.4.2 : mpirun core-dumps when "-npernode N" is used at command line

2010-08-24 Thread Michael E. Thomadakis

 Hi Jeff

On 08/24/10 15:24, Jeff Squyres wrote:

I'm a little confused by your configure line:

./configure --prefix=/g/software/openmpi-1.4.3a1r23542/gcc-4.1.2 2 
--enable-cxx-exceptions CFLAGS=-O2 CXXFLAGS=-O2 FFLAGS=-O2 FCFLAGS=-O2



"oppss" that '2' was some leftover character after I edited the command 
line to configure wrt to GCC (from an original command line configuring 
with Intel compilers) *thanks for noticing this.*


I reran configure with

./configure --prefix="/g/software/openmpi-1.4.3a1r23542/gcc-4.1.2" 
--enable-cxx-exceptions  CFLAGS="-O2" CXXFLAGS="-O2"  FFLAGS="-O2" 
FCFLAGS="-O2"


and ran make, and this time I did NOT notice any error messages.

*Thanks* for the help with this. I will now run mpirun with various 
options in a PBS/Torque environment and see if hybrid MPI+OMP jobs are 
placed on the nodes in a sane fashion.


Thanks

Michael




What's the lone "2" in the middle (after the prefix)?

With that extra "2", I'm not able to get configure to complete successfully (because it interprets 
that "2" as a platform name that does not exist).  If I remove that "2", configure 
completes properly and the build completes properly.

I'm afraid I no longer have any RH hosts to test on.  Can you do the following:

cd top_of_build_dir
cd ompi/debuggers
rm ompi_debuggers.lo
make

Then copy-n-paste the gcc command used to compile the ompi_debuggers.o file, remove "-o 
.libs/libdebuggers_la-ompi_debuggers.o", and add "-E", and redirect the output to a 
file.  Then send me that file -- it should give more of a clue as to exactly what the problem is 
that you're seeing.
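
Roughly what that procedure looks like -- the gcc flags below are invented 
placeholders; the real command has to be copied verbatim from the make output:

  cd top_of_build_dir/ompi/debuggers
  rm ompi_debuggers.lo
  make
  # copy the gcc line that make prints, drop the "-o ..." part, add -E,
  # and capture the preprocessed source in a file:
  gcc -DHAVE_CONFIG_H -I. -I../.. -I../../opal/include -I../../ompi/include \
      -O2 -E ompi_debuggers.c > ompi_debuggers.i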




On Aug 24, 2010, at 3:25 PM, Michael E. Thomadakis wrote:


On 08/24/10 14:22, Michael E. Thomadakis wrote:

Hi,

I used a 'tee' command to capture the output but I forgot to also redirect
stderr to the file.

This is what a fresh make gave (gcc 4.1.2 again) :

--
ompi_debuggers.c:81: error: missing terminating " character
ompi_debuggers.c:81: error: expected expression before \u2018;\u2019 token
ompi_debuggers.c: In function \u2018ompi_wait_for_debugger\u2019:
ompi_debuggers.c:212: error: \u2018mpidbg_dll_locations\u2019 undeclared
(first use in this function)
ompi_debuggers.c:212: error: (Each undeclared identifier is reported only once
ompi_debuggers.c:212: error: for each function it appears in.)
ompi_debuggers.c:212: warning: passing argument 3 of \u2018check\u2019 from
incompatible pointer type
make[2]: *** [libdebuggers_la-ompi_debuggers.lo] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1

--

Is this critical to run OMPI code?

Thanks for the quick reply Ralph,

Michael

On Tue, 24 Aug 2010, Ralph Castain wrote:

| Date: Tue, 24 Aug 2010 13:16:10 -0600
| From: Ralph Castain
| To: Michael E.Thomadakis
| Cc: Open MPI Users, mi...@sc.tamu.edu
| Subject: Re: [OMPI users] Open-MPI 1.4.2 : mpirun core-dumps when
| "-npernode N" is used at command line
|
| Ummmthe configure log terminates normally, indicating it configured fine. 
The make log ends, but with no error shown - everything was building just fine.
|
| Did you maybe stop it before it was complete? Run out of disk quota? Or...?
|
|
| On Aug 24, 2010, at 1:06 PM, Michael E. Thomadakis wrote:
|
|>   Hi Ralph,
|>
|>   I tried to build 1.4.3.a1r23542 (08/02/2010) with
|>
|>   ./configure --prefix="/g/software/openmpi-1.4.3a1r23542/gcc-4.1.2 2" --enable-cxx-exceptions  
CFLAGS="-O2" CXXFLAGS="-O2"  FFLAGS="-O2" FCFLAGS="-O2"
|>   with the GCC 4.1.2
|>
|>   miket@login002[pts/26]openmpi-1.4.3a1r23542 $ gcc -v
|>   Using built-in specs.
|>   Target: x86_64-redhat-linux
|>   Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
--infodir=/usr/share/info --enable-shared --enable-threads=posix 
--enable-checking=release --with-system-zlib --enable-__cxa_atexit 
--disable-libunwind-exceptions --enable-libgcj-multifile 
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk 
--disable-dssi --enable-plugin 
--with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic 
--host=x86_64-redhat-linux
|>   Thread model: posix
|>   gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
|>
|>
|>   but it failed. I am attaching the configure and make logs.
|>
|>   regards
|>
|>   Michael
|>
|>
|>   On 08/23/10 20:53, Ralph Castain wrote:
|>>
|>>   Nope - none of them will work with 1.4.2. Sorry - bug not discovered 
until after release
|>>
|>>   On Aug 23, 2010, at 7:45 PM, Michael E. Thomadakis wrote:
|>>
|>>>   Hi Jeff,
|>>> 

Re: [OMPI users] How to time data transfers?

2010-10-13 Thread Michael E. Thomadakis

 On 10/13/10 13:23, Eugene Loh wrote:

Ed Peddycoart wrote:


I need to do some performance tests on my mpi app.  I simply want to 
determine how long it takes for my sends from one process to be 
received by another process.


That should work once the code is corrected.  Can you use 
MPI_Wtime()?  (Not necessarily a big deal, but should be a portable 
way of getting high-quality timings in MPI programs.)  In what sense 
does it not capture the complete time?
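
A minimal sketch of that kind of measurement with MPI_Wtime -- the message 
size is arbitrary and a real benchmark would repeat and average many iterations:

  /* ping-pong timing sketch: rank 0 sends, rank 1 echoes the message back */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank;
      double buf[1024];          /* contents don't matter for timing */
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (rank == 0) {
          double t0 = MPI_Wtime();
          MPI_Send(buf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
          MPI_Recv(buf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          double t1 = MPI_Wtime();
          printf("round trip %g s, one-way estimate %g s\n", t1 - t0, (t1 - t0) / 2.0);
      } else if (rank == 1) {
          MPI_Recv(buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          MPI_Send(buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
      }
      MPI_Finalize();
      return 0;
  }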



[ .. ]

Does MPI_Wtime in OMPI 1.4.3/1.5.0 rely on high-resolution clocks (for 
Linux), or does it still rely on gettimeofday()? How would one request at 
OMPI build time that it use high-resolution clocks?



thanks ...

Michael








Re: [OMPI users] How to time data transfers?

2010-10-14 Thread Michael E. Thomadakis

 On 10/14/10 07:37, Jeff Squyres wrote:

On Oct 13, 2010, at 4:52 PM, Michael E. Thomadakis wrote:


Does MPI_Wtime of OMPI 1.4.3/1.5.0 rely on high resolution clocks  (for Linux) 
or does still rely on gettimeofday() ? How would one request at OMPI built time 
to let it use high resolution clocks?

Check the man page for MPI_Wtime(3):

On  POSIX  platforms, this function may utilize a timer that is cheaper
to invoke than the gettimeofday() system call, but will  fall  back  to
gettimeofday()  if a cheap high-resolution timer is not available.  The
ompi_info command can be consulted to see if Open MPI supports a native
high-resolution  timer  on  your platform; see the value for "MPI_WTIME
support" (or "options:mpi-wtime" when viewing the parsable output).  If
this value is "native", a method that is likely to be cheaper than get-
timeofday() will be used to obtain the time when MPI_Wtime is  invoked.

IIRC, the problem on Linux is that the native x86 timers are a per-chip value 
(e.g., on a multi-socket system, the value is different on each socket).  
Hence, if you MPI_Wtime() while your MPI process is on socket A, but then the 
OS moves it to socket B and you call MPI_Wtime() again, the two values are 
completely unrelated.

That being said, I see clock_gettime() has CLOCK_REALTIME, which is supposed to 
be system-wide.  Is it cheaper and/or more accurate than gettimeofday?  I see 
that it has nanosecond precision in the man page, but what is it (typically) 
actually implemented as?

In the POSIX-compliant UNIX world, clock_gettime() always had a 2-3 
orders of magnitude better sampling resolution than gettimeofday(). I know 
that OMPI 1.4.2 is relying on gettimeofday() and I was wondering if this 
changed with 1.4.3 and 1.5.0.
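
The quick check the man page excerpt quoted above describes is something like 
this; what it reports of course depends on how the particular installation was built:

  ompi_info | grep "MPI_WTIME support"
  # per the man page excerpt above, a value of "native" means a timer
  # cheaper than gettimeofday() is used for MPI_Wtime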


I just read in some Internet postings that, starting with Linux kernel 
2.6.18, gettimeofday() uses

clock_gettime(CLOCK_REALTIME)

as its source, so things may not be as "ugly" any more when one samples 
gettimeofday().



Additionally, I suppose that if a process is bound to a single native clock scope (I 
don't know offhand if it's per socket or per core -- I said "socket" up above, 
but that was a guess), we could make the *assumption* that the process will never move 
and might be able to use the native x86 timer (there's some complications here, but we 
might be able to figure it out).  Bad Things could happen if the process ever moved, 
though (e.g., if the application ever manually changed the binding).

These APIs were put together by the Real-Time working group of the POSIX 
community to address exactly these issues of non-global clocks or of 
clocks which step back in time when, for instance, NTP adjusts the clock 
(hence CLOCK_MONOTONIC). So the question now is whether OMPI relies on 
gettimeofday(), whether Linux 2.6.18 indeed uses 
clock_gettime(CLOCK_REALTIME) as its time source, and whether Linux complies 
with the global per-system CLOCK_REALTIME requirements. Another problem is when power 
management adjusts the frequency of the CPU clock...
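
For what it's worth, a small sketch of the kind of timer I have in mind -- 
CLOCK_MONOTONIC, so that NTP step adjustments can't move the clock backwards; 
on CentOS 5 this needs linking with -lrt:

  #include <time.h>

  /* monotonic wall-clock seconds; unlike gettimeofday()/CLOCK_REALTIME,
     this clock is never stepped backwards by NTP adjustments */
  static double wtime_monotonic(void)
  {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return (double)ts.tv_sec + 1.0e-9 * (double)ts.tv_nsec;
  }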


anyways thanks for the answer

Michael








Re: [OMPI users] openmpi tar.gz for 1.6.1 or 1.6.2

2012-07-16 Thread Michael E. Thomadakis
When is the official 1.6.1 (or 1.6.2?) expected to be 
available?


mike

On 07/16/2012 01:44 PM, Ralph Castain wrote:

You can get it here:

http://www.open-mpi.org/nightly/v1.6/

On Jul 16, 2012, at 10:22 AM, Anne M. Hammond wrote:


Hi,

For benchmarking, we would like to use openmpi with
--num-sockets 1

This fails in 1.6, but Bug Report #3119 indicates it is changed in
1.6.1.

Is 1.6.1 or 1.6.2 available in tar.gz form?

Thanks!
Anne








[OMPI users] Some Questions on Building OMPI on Linux Em64t

2010-05-19 Thread Michael E. Thomadakis

Hello,

I would like to build OMPI V1.4.2 and make it available to our users at the
Supercomputing Center at TAMU. Our system is a 2-socket, 4-core Nehalem
@2.8GHz, 24GiB DRAM / node, 324 nodes connected to 4xQDR Voltaire fabric,
CentOS/RHEL 5.4.



I have been trying to find the following information :

1) high-resolution timers: how do I specify the HRT linux timers in the
--with-timer=TYPE
 line of ./configure ?

2) I have installed blcr V0.8.2 but when I try to build OMPI and I point to the
full installation it complains it cannot find it. Note that I build BLCR with
GCC but I am building OMPI with Intel compilers (V11.1)


3) Does OMPI by default use SHM for intra-node message IPC but revert to IB for
inter-node ?

4) How could I select the high-speed transport, say DAPL or OFED IB verbs ? Is
there any preference as to the specific high-speed transport over QDR IB?

5) When we launch MPI jobs via PBS/TORQUE do we have control on the task and
thread placement on nodes/cores ?

6) Can we suspend/restart cleanly OMPI jobs with the above scheduler ? Any
caveats on suspension / resumption of OMPI jobs ?

7) Do you have any performance data comparing OMPI vs., say, MVAPICHv2 and
IntelMPI? This is not a political issue since I am going to be providing all
these MPI stacks to our users.




Thank you so much for the great s/w ...

best
Michael



%  \
% Michael E. Thomadakis, Ph.D.  Senior Lead Supercomputer Engineer/Res \
% E-mail: miket AT tamu DOT edu   Texas A&M University \
% web:http://alphamike.tamu.edu  Supercomputing Center \
% Voice:  979-862-3931Teague Research Center, 104B \
% FAX:979-847-8643  College Station, TX 77843, USA \
%  \



Re: [OMPI users] Some Questions on Building OMPI on Linux Em64t

2010-05-21 Thread Michael E. Thomadakis

Hello,

I am resending this because I am not sure if it was sent out to the OMPI 
list.


Any help would be greatly appreciated.

best 

Michael

On 05/19/10 13:19, Michael E. Thomadakis wrote:

Hello,
I would like to build OMPI V1.4.2 and make it available to our users at the
Supercomputing Center at Texas A&M Univ. Our system is a 2-socket, 4-core 
Nehalem
@2.8GHz, 24GiB DRAM / node, 324 nodes connected to 4xQDR Voltaire fabric,
CentOS/RHEL 5.4.



I have been trying to find the following information :

1) high-resolution timers: how do I specify the HRT linux timers in the
--with-timer=TYPE
  line of ./configure ?

2) I have installed blcr V0.8.2 but when I try to build OMPI and I point to the
full installation it complains it cannot find it. Note that I build BLCR with
GCC but I am building OMPI with Intel compilers (V11.1)


3) Does OMPI by default use SHM for intra-node message IPC but reverts to IB for
inter-node ?

4) How could I select the high-speed transport, say DAPL or OFED IB verbs ? Is
there any preference as to the specific high-speed transport over 
Mellanox/Voltaire QDR IB?

5) When we launch MPI jobs via PBS/TORQUE do we have control on the task and
thread placement on nodes/cores ?

6) Can we suspend/restart cleanly OMPI jobs with the above scheduler ? Any
caveats on suspension / resumption of OMPI jobs ?

7) Do you have any performance data comparing OMPI vs say MVAPICHv2 and
IntelMPI? This is not a political issue since I am going to be providing all
these MPI stacks to our users (IntelMPI V4.0 already installed).




Thank you so much for the great s/w ...

best
Michael



%  \
% Michael E. Thomadakis, Ph.D.  Senior Lead Supercomputer Engineer/Res \
% E-mail: miket AT tamu DOT edu   Texas A&M University \
% web:http://alphamike.tamu.edu   Supercomputing Center \
% Voice:  979-862-3931Teague Research Center, 104B \
% FAX:979-847-8643  College Station, TX 77843, USA \
%  \
   


Re: [OMPI users] Some Questions on Building OMPI on Linux Em64t

2010-05-26 Thread Michael E. Thomadakis

Hi Josh

thanks for the reply. pls see below ...


On 05/26/10 09:24, Josh Hursey wrote:

(Sorry for the delay, I missed the C/R question in the mail)

On May 25, 2010, at 9:35 AM, Jeff Squyres wrote:


On May 24, 2010, at 2:02 PM, Michael E. Thomadakis wrote:

| > 2) I have installed blcr V0.8.2 but when I try to built OMPI and 
I point to the
| > full installation it complains it cannot find it. Note that I 
build BLCR with

| > GCC but I am building OMPI with Intel compilers (V11.1)
|
| Can you be more specific here?

I pointed to the installation path for BLCR but configure complained 
that it couldn't find it. If BLCR is only needed for checkpoint/restart, 
then we can live without it. Is BLCR needed for suspend/resume of MPI jobs?


You mean suspend with ctrl-Z?  If so, correct -- BLCR is *only* used 
for checkpoint/restart.  Ctrl-Z just uses the SIGTSTP functionality.


So BLCR is used for the checkpoint/restart functionality in Open MPI. 
We have a webpage with some more details and examples at the link below:

  http://osl.iu.edu/research/ft/ompi-cr/

You should be able to suspend/resume an Open MPI job using 
SIGSTOP/SIGCONT without the C/R functionality. We have FAQ item that 
talks about how to enable this functionality:

  http://www.open-mpi.org/faq/?category=running#suspend-resume
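
The FAQ entry above boils down to something like the sketch below -- the exact 
MCA parameter name and semantics should be double-checked against that page:

  # ask mpirun to forward job-control signals to all MPI processes
  mpirun -mca orte_forward_job_control 1 -np 16 ./my_mpi_app &
  kill -TSTP %1    # suspend the whole job
  kill -CONT %1    # resume it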

You can combine the C/R and the SIGSTOP/SIGCONT functionality so that 
when you 'suspend' a job a checkpoint is taken and the process is 
stopped. You can continue the job by sending SIGCONT as normal. 
Additionally, this way if the job needs to be terminated for some 
reason (e.g., memory footprint, maintenance), it can be safely 
terminated and restarted from the checkpoint. I have a example of how 
this works at the link below:

  http://osl.iu.edu/research/ft/ompi-cr/examples.php#uc-ckpt-stop

As far as C/R integration with schedulers/resource managers, I know 
that the BLCR folks have been working with Torque to better integrate 
Open MPI+BLCR+Torque. If this is of interest, you might want to check 
with them on the progress of that project.


So suspend/resume of OpenMPI jobs does not require BLCR. OK so I will 
proceed w/o it.


best regards,

Michael



-- Josh




--
% -------- \
% Michael E. Thomadakis, Ph.D.  Senior Lead Supercomputer Engineer/Res \
% E-mail: miket AT tamu DOT edu   Texas A&M University \
% web:http://alphamike.tamu.edu  Supercomputing Center \
% Voice:  979-862-3931Teague Research Center, 104B \
% FAX:979-847-8643  College Station, TX 77843, USA \
%  \



Re: [OMPI users] Some Questions on Building OMPI on Linux Em64t

2010-05-26 Thread Michael E. Thomadakis

Hi Jeff,

thanks for the reply. Pls see below...

And a new question:

How do you handle thread/task and memory affinity? Do you pass the 
requested affinity desires to the batch scheduler and then let it issue 
the specific placements for threads on the nodes?


This is something we are concerned about, as we are running multiple jobs on 
the same node and we don't want to oversubscribe cores by binding their 
threads inadvertently.


Looking at ompi_info
 $ ompi_info | grep -i aff
   MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4.2)
   MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4.2)

does this mean we have the full affinity support included or do I need 
to involve HWLOC in any way ?
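
On the binding side, a hedged sketch of what one would try with 1.4.x -- these 
are the option names as I understand them, worth double-checking against the 
mpirun(1) man page for the installed version:

  # bind one process per core, mapping ranks by core
  mpirun -np 8 --bycore --bind-to-core ./my_mpi_app
  # older knob: bind each process to a single processor
  mpirun -np 8 -mca mpi_paffinity_alone 1 ./my_mpi_app
  # --report-bindings shows where each rank actually landed
  mpirun -np 8 --bycore --bind-to-core --report-bindings ./my_mpi_app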




On 05/25/10 08:35, Jeff Squyres wrote:

On May 24, 2010, at 2:02 PM, Michael E. Thomadakis wrote:

   

|>  1) high-resolution timers: how do I specify the HRT linux timers in the
|>  --with-timer=TYPE
|>   line of ./configure ?
|
| You shouldn't need to do anything; the "linux" timer component of Open MPI
| should get automatically selected.  You should be able to see this in the
| stdout of Open MPI's "configure", and/or if you run ompi_info | grep timer
| -- there should only be one entry: linux.

If nothing is mentioned, will it by default select 'linux' timers?
 

Yes.

   

Or do I have to specify in the configure

 --with-timer=linux ?
 

Nope.  The philosophy of Open MPI is that whenever possible, we try to choose a 
sensible default.  It never hurts to double check, but we try to do the Right 
Thing whenever it's possible to automatically choose it (within reason, of 
course).

You can also check the output of ompi_info -- ompi_info tells you lots of 
things about your Open MPI installation.

   

I actually spent some time looking around in the source trying to see which
actual timer is the base. Is this a high-resolution timer, such as the POSIX
timers (timer_gettime or clock_nanosleep, etc.), or the Intel processor's TSC?

I am just trying to stay away from gettimeofday()
 

Understood.

Ugh; I just poked into the code -- it's complicated how we resolve the timer 
functions.  It looks like we put in the infrastructure into getting high 
resolution timers, but at least for Linux, we don't use it (the code falls back 
to gettimeofday).  It looks like we're only using the high-resolution timers on 
AIX (!) and Solaris.

Patches would be greatly appreciated; I'd be happy to walk someone through what 
to do.

   


Which HRtimer is recommended for a Linux environment ? timer_gettime 
usually gives decent resolution and it is portable. I don't want to 
promise anything as I am already bogged down with several ongoing 
projects. You can give me *brief*  instructions to see if this can be 
squeezed in.

...


Just as feedback from one of the many HPC centers, for us it is most
important to have

a) a light-weight efficient MPI stack which makes the underlying IB h/w
capabilities available and

b) it can smoothly cooperate with a batch scheduler / resource manager so
that a mixture of jobs gets a decent allocation of the cluster resources.
 

Cools; good to know.  We try to make these things very workable in Open MPI -- 
it's been a goal from day 1 to integrate with job schedulers, etc.  And without 
high performance, we wouldn't have much to talk about.

Please be sure to let us know of questions / problems / etc.  I admit that 
we're sometimes a little slow to answer on the users list, but we do the best 
we can.  So don't hesitate to bump us if we don't reply.

Thanks!

   


Thanks again...
michael


--
%  \
% Michael E. Thomadakis, Ph.D.  Senior Lead Supercomputer Engineer/Res \
% E-mail: miket AT tamu DOT edu   Texas A&M University \
% web:http://alphamike.tamu.edu  Supercomputing Center \
% Voice:  979-862-3931Teague Research Center, 104B \
% FAX:979-847-8643  College Station, TX 77843, USA \
%  \