[OMPI users] Question on using Github to see bugs fixed in past versions

2016-10-04 Thread Blosch, Edwin L
Apologies for the dumb question... There used to be a way to dive in to see 
exactly what bugs and features came into 1.10.4, 1.10.3, and on back to 1.8.8.  
Is there a way to do that on github?

Ed


[OMPI users] Question on OpenMPI backwards compatibility

2016-02-26 Thread Blosch, Edwin L
I am confused about backwards-compatibility.

FAQ #111 says:
Open MPI reserves the right to break ABI compatibility at new feature release 
series. ... MPI applications compiled/linked against Open MPI 1.6.x will not 
be ABI compatible with Open MPI 1.7.x.

But the versioning documentation says:
  * Minor: The minor number is the second integer in the version string.    
Backwards compatibility will still be preserved with prior releases that have 
the same major version number (e.g., v2.5.3 is backwards compatible with 
v2.3.1). 

These two examples and statements appear inconsistent to me:

Can I use OpenMPI 1.7.x run-time and options to execute codes built with 
OpenMPI 1.6.x?   No (FAQ #111)

Can I use OpenMPI 2.5.x run-time and options to execute codes built with 
OpenMPI 2.3.x?   Yes (s/w versioning documentation)

Can I use OpenMPI 1.8.x run-time and options to execute codes built with 
OpenMPI 1.6.x?   Who knows?!  I tested this once, and it failed.  I made the 
assumption that 1.8.x wouldn't run a 1.6.x code, and I moved on.  But I realize 
now that I could have made a mistake.  The test I performed could have failed 
for some other reason. 

Can anyone shed some light?






Re: [OMPI users] EXTERNAL: Re: Question on run-time error "ORTE was unable to reliably start"

2016-08-11 Thread Blosch, Edwin L
I had another observation of the problem, with a little more insight.  I can 
confirm that the job has been running several hours before dying with the 'ORTE 
was unable to reliably start' message.  Somehow it is possible.   I had used 
the following options to try and get some more diagnostics:   --output-filename 
mpirun-stdio -mca btl ^tcp --mca plm_base_verbose 10 --mca btl_base_verbose 30

In the stack traces of each process, I saw roughly half of them reported dying 
at an MPI_BARRIER() call.  The rest had progressed further, and they were at an 
MPI_WAITALL command.  It is implemented like this:  Every process posts 
non-blocking receives (IRECV), hits an MPI_BARRIER, then everybody posts 
non-blocking sends (ISEND), then MPI_WAITALL.  This entire exchange process 
happens twice in a row, sending different sets of variables.  The application 
type is unstructured CFD, so any given process is talking to 10 to 15 other 
processes exchanging data across domain boundaries.  There are a range of 
message sizes flying around, some as small as 500 bytes, others as large as 1 
MB.  I'm using 480 processes.  
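
(For reference, a minimal Fortran sketch of the exchange pattern described above;
the routine name, variable names, tag, and buffer layout are illustrative, not
taken from the actual code:)

  subroutine exchange(nneigh, neigh, ncells, maxlen, sbuf, rbuf)
    ! Sketch only: post all IRECVs, hit a barrier, post all ISENDs, then WAITALL.
    use mpi
    implicit none
    integer, intent(in) :: nneigh, maxlen
    integer, intent(in) :: neigh(nneigh), ncells(nneigh)
    double precision, intent(in)  :: sbuf(maxlen, nneigh)
    double precision, intent(out) :: rbuf(maxlen, nneigh)
    integer :: i, ierr, req(2*nneigh)

    do i = 1, nneigh                         ! non-blocking receives first
       call MPI_IRECV(rbuf(1,i), ncells(i), MPI_DOUBLE_PRECISION, neigh(i), &
                      100, MPI_COMM_WORLD, req(i), ierr)
    end do

    call MPI_BARRIER(MPI_COMM_WORLD, ierr)   ! the barrier added after the MxM errors

    do i = 1, nneigh                         ! then non-blocking sends
       call MPI_ISEND(sbuf(1,i), ncells(i), MPI_DOUBLE_PRECISION, neigh(i), &
                      100, MPI_COMM_WORLD, req(nneigh+i), ierr)
    end do

    call MPI_WAITALL(2*nneigh, req, MPI_STATUSES_IGNORE, ierr)
  end subroutine exchange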

I'm wondering if I'm kicking off too many of these non-blocking messages and 
some network resource is getting exhausted, and perhaps orted is doing some 
kind of 'ping' to make sure everyone is still alive, and it can't reach some 
process, and so the error suggests a startup problem.  Wild guesses, no idea 
really.

For what it's worth, the barrier wasn't in an earlier implementation of this 
routine.  I was seeing some jobs dying suddenly with MxM library errors, and I 
put this barrier in place, and those problems seemed to go away.  So it just 
got committed and forgotten a couple years ago.  I thought (still think) the 
code is correct without the barrier.  

Also, I am running under MVAPICH at the moment and not having the same problems 
yet.

Finally, using the same exact model and application, I had a failure that left 
a different message:
--
ORTE has lost communication with its daemon located on node:

  hostname:  k2n01

This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.

--



-Original Message-
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Ralph Castain
Sent: Friday, July 29, 2016 7:38 PM
To: Open MPI Users 
Subject: Re: [OMPI users] EXTERNAL: Re: Question on run-time error "ORTE was 
unable to reliably start"

Really scratching my head over this one. The app won’t start running until 
after all the daemons have been launched, so this doesn’t seem possible at 
first glance. I’m wondering if something else is going on that might lead to a 
similar error? Does the application call comm_spawn, for example? Or is it a 
script that eventually attempts to launch another job?


> On Jul 28, 2016, at 6:24 PM, Blosch, Edwin L  wrote:
> 
> Cray CS400, RedHat 6.5, PBS Pro (but OpenMPI is built --without-tm), 
> OpenMPI 1.8.8, ssh
> 
> -Original Message-
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of 
> Ralph Castain
> Sent: Thursday, July 28, 2016 4:07 PM
> To: Open MPI Users 
> Subject: EXTERNAL: Re: [OMPI users] Question on run-time error "ORTE was 
> unable to reliably start"
> 
> What kind of system was this on? ssh, slurm, ...?
> 
> 
>> On Jul 28, 2016, at 1:55 PM, Blosch, Edwin L  wrote:
>> 
>> I am running cases that are starting just fine and running for a few hours, 
>> then they die with a message that seems like a startup type of failure.  
>> Message shown below.  The message appears in standard output from rank 0 
>> process.  I'm assuming there is a failing card or port or something.
>> 
>> What diagnostic flags can I add to mpirun to help shed light on the problem?
>> 
>> What kinds of problems could cause this kind of message, which looks 
>> start-up related, after the job has already been running many hours?
>> 
>> Ed
>> 
>> -
>> -
>>  ORTE was unable to reliably start one or more daemons.
>> This usually is caused by:
>> 
>> * not finding the required libraries and/or binaries on  one or more 
>> nodes. Please check your PATH and LD_LIBRARY_PATH  settings, or 
>> configure OMPI with --enable-orterun-prefix-by-default
>> 
>> * lack of authority to execute on one or more specified nodes.
>> Please verify your allocation and authorities.
>> 
>> * the inability to write star

[OMPI users] How to diagnose bus error with 1.6.4

2013-06-05 Thread Blosch, Edwin L
I am running into a bus error that does not happen with MVAPICH, and I am 
guessing it has something to do with shared-memory communication.  Has anyone 
had a similar experience or have any insights on what this could be?

Thanks

[k1n08:12688] mca: base: components_open: Looking for shmem components
[k1n08:12688] mca: base: components_open: opening shmem components
[k1n08:12688] mca: base: components_open: found loaded component mmap
[k1n08:12688] mca: base: components_open: component mmap register function 
successful
[k1n08:12688] mca: base: components_open: component mmap open function 
successful
[k1n08:12688] mca: base: components_open: found loaded component posix
[k1n08:12688] mca: base: components_open: component posix has no register 
function
[k1n08:12688] mca: base: components_open: component posix open function 
successful
[k1n08:12688] mca: base: components_open: found loaded component sysv
[k1n08:12688] mca: base: components_open: component sysv has no register 
function
[k1n08:12688] mca: base: components_open: component sysv open function 
successful
[k1n08:12688] shmem: base: runtime_query: Auto-selecting shmem components
[k1n08:12688] shmem: base: runtime_query: (shmem) Querying component (run-time) 
[mmap]
[k1n08:12688] shmem: base: runtime_query: (shmem) Query of component [mmap] set 
priority to 50
[k1n08:12688] shmem: base: runtime_query: (shmem) Querying component (run-time) 
[posix]
[k1n08:12688] shmem: base: runtime_query: (shmem) Skipping component [posix]. 
Run-time Query failed to return a module
[k1n08:12688] shmem: base: runtime_query: (shmem) Querying component (run-time) 
[sysv]
[k1n08:12688] shmem: base: runtime_query: (shmem) Skipping component [sysv]. 
Run-time Query failed to return a module
[k1n08:12688] shmem: base: runtime_query: (shmem) Selected component [mmap]
[k1n08:12688] mca: base: close: unloading component posix
[k1n08:12688] mca: base: close: unloading component sysv
[k1n08:12688] *** Process received signal ***
[k1n08:12688] Signal: Bus error (7)
[k1n08:12688] Signal code: Non-existant physical address (2)
[k1n08:12688] Failing at address: 0x2ac1e088e030
[k1n08:12688] [ 0] /lib64/libpthread.so.0(+0xf500) [0x2ac1de7c0500]
[k1n08:12688] [ 1] 
/applocal/cfd/test/bin/test_openmpi(__intel_ssse3_rep_memcpy+0xcdb) [0x1495cab]
[k1n08:12688] [ 2] 
/applocal/cfd/test/bin/test_openmpi(opal_convertor_pack+0x101) [0x125c111]
[k1n08:12688] [ 3] 
/applocal/cfd/test/bin/test_openmpi(mca_btl_sm_prepare_src+0xc5) [0x13aab25]
[k1n08:12688] [ 4] 
/applocal/cfd/test/bin/test_openmpi(mca_pml_ob1_send_request_start_rndv+0x67) 
[0x12fa9a7]
[k1n08:12688] [ 5] /applocal/cfd/test/bin/test_openmpi(mca_pml_ob1_isend+0x3ab) 
[0x12ef02b]
[k1n08:12688] [ 6] 
/applocal/cfd/test/bin/test_openmpi(ompi_coll_tuned_sendrecv_actual+0x94) 
[0x12d67f4]
[k1n08:12688] [ 7] 
/applocal/cfd/test/bin/test_openmpi(ompi_coll_tuned_bcast_intra_split_bintree+0x94d)
 [0x12d45fd]
[k1n08:12688] [ 8] 
/applocal/cfd/test/bin/test_openmpi(ompi_coll_tuned_bcast_intra_dec_fixed+0x143)
 [0x12d5dd3]
[k1n08:12688] [ 9] 
/applocal/cfd/test/bin/test_openmpi(mca_coll_sync_bcast+0x66) [0x12d6aa6]
[k1n08:12688] [10] /applocal/cfd/test/bin/test_openmpi(MPI_Bcast+0x5a) 
[0x11f95da]
[k1n08:12688] [11] /applocal/cfd/test/bin/test_openmpi(mpi_bcast_f+0x6e) 
[0x11dca5e]
[k1n08:12688] [12] 
/applocal/cfd/test/bin/test_openmpi(wpf_calc_mod_mp_wpf_calc_+0x10f0) [0x541be0]
[k1n08:12688] [13] 
/applocal/cfd/test/bin/test_openmpi(special_init_mod_mp_special_init_geom_+0x3f4)
 [0x683254]
[k1n08:12688] [14] 
/applocal/cfd/test/bin/test_openmpi(setup_mod_mp_setup_domains_+0x56b) 
[0x53effb]
[k1n08:12688] [15] /applocal/cfd/test/bin/test_openmpi(MAIN__+0x1ab7) [0x5e8be7]
[k1n08:12688] [16] /applocal/cfd/test/bin/test_openmpi(main+0x3c) [0x4ff82c]
[k1n08:12688] [17] /lib64/libc.so.6(__libc_start_main+0xfd) [0x2ac1de9eccdd]
[k1n08:12688] [18] /applocal/cfd/test/bin/test_openmpi() [0x4ff729]
[k1n08:12688] *** End of error message ***


Re: [OMPI users] How to diagnose bus error with 1.6.4

2013-06-05 Thread Blosch, Edwin L
I've dug a little deeper and think the problem has something to do with the 
10 MB /tmp filesystem.

[bloscel@k1n11 ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
compute_x86_64   32G  1.1G   31G   4% /
tmpfs            32G     0   32G   0% /dev/shm
tmpfs            10M   80K   10M   1% /tmp
tmpfs            10M     0   10M   0% /var/tmp
/dev/lb          53T  109G   53T   1% /gpfs/lb
/dev/sb         3.3T   38G  3.3T   2% /gpfs/sb

[bloscel@k1n11 ~]$ mktemp
/tmp/tmp.L8owhNH1AN

[bloscel@k1n11 ~]$ ompi_info -a | grep /dev/shm
   MCA shmem: parameter "shmem_mmap_backing_file_base_dir" (current 
value: , data source: default value)

[bloscel@k1n11 ~]$ ompi_info -a | grep orte_tmpdir_base
MCA orte: parameter "orte_tmpdir_base" (current value: , 
data source: default value)
[bloscel@k1n11 ~]$
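
(One thing that could be tried, if the 10 MB /tmp is indeed the culprit: the 
orte_tmpdir_base parameter shown above can point the session directory, and with 
it the shared-memory backing file, at a larger filesystem, e.g. by adding 
"--mca orte_tmpdir_base /dev/shm" to the mpirun command. This is only a 
suggestion; it was not tested in this thread.)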

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Blosch, Edwin L
Sent: Wednesday, June 05, 2013 11:14 AM
To: Open MPI Users (us...@open-mpi.org)
Subject: EXTERNAL: [OMPI users] How to diagnose bus error with 1.6.4

I am running into a bus error that does not happen with MVAPICH, and I am 
guessing it has something to do with shared-memory communication.  Has anyone 
had a similar experience or have any insights on what this could be?

Thanks


[OMPI users] Sandy Bridge performance question

2013-06-06 Thread Blosch, Edwin L
I am running single-node Sandy Bridge cases with OpenMPI and looking at scaling.

I'm using -bind-to-core without any other options (default is -bycore I 
believe).

These numbers indicate number of cores first, then the second digit is the run 
number (except for n=1, all runs repeated 3 times).  Any thought why n15 should 
be so much slower than n16?   I also measure the RSS of the running processes, 
and the rank 0 process for n=15 cases uses about 2x more memory than all the 
other ranks, whereas all the ranks use the same amount of memory for the n=16 
cases.

Thanks for insights,

Ed

n1.1:6.9530
n2.1:7.0185
n2.2:7.0313
n3.1:8.2069
n3.2:8.1628
n3.3:8.1311
n4.1:7.5307
n4.2:7.5323
n4.3:7.5858
n5.1:9.5693
n5.2:9.5104
n5.3:9.4821
n6.1:8.9821
n6.2:8.9720
n6.3:8.9541
n7.1:10.640
n7.2:10.650
n7.3:10.638
n8.1:8.6822
n8.2:8.6630
n8.3:8.6903
n9.1:9.5058
n9.2:9.5255
n9.3:9.4809
n10.1:10.484
n10.2:10.452
n10.3:10.516
n11.1:11.327
n11.2:11.316
n11.3:11.318
n12.1:12.285
n12.2:12.303
n12.3:12.272
n13.1:13.127
n13.2:13.113
n13.3:13.113
n14.1:14.035
n14.2:13.989
n14.3:14.021
n15.1:14.533
n15.2:14.529
n15.3:14.586
n16.1:8.6542
n16.2:8.6731
n16.3:8.6586


Re: [OMPI users] Sandy Bridge performance question

2013-06-07 Thread Blosch, Edwin L
My bad. Just a dumb mistake. Load-balance, as Ralph suggested. I had decomposed 
into 16 equally sized parts which didn't map well to 15 cores.
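
(The arithmetic bears that out: with 16 equal parts on 15 cores, one rank carries 
two parts, so the critical path does roughly twice the per-part work, which is why 
the n15 timings (~14.5) sit much closer to twice the n16 time (~8.65) than to the 
n16 time itself.)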

Regarding VTune, we have a code that doesn't scale well so that's a good tip.  
I have access to VTune, I've used it.  But I only remember looking at OpenMP, I 
didn't know it could handle MPI runs. That would be great.  Is VampirTrace 
another option for identifying communication bottlenecks, serial content, 
etc.?

Thanks


From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Jeff 
Squyres (jsquyres) [jsquy...@cisco.com]
Sent: Friday, June 07, 2013 6:00 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Sandy Bridge performance question

+1

Depending on how much you care, you might also want to look at some performance 
analysis tools to look and see what is happening under the covers.  The Intel 
VTune suite is the gold standard -- it shows all the counters and statistics 
from the CPUs themselves (be aware that there's a bit of a learning curve) -- 
to include things like cache statistics, instructions per clock, ...etc.  Lots 
and lots and lots of info.

Other tools are good, too -- google around (e.g., the cachegrind tool in 
valgrind, etc.).



On Jun 6, 2013, at 4:42 PM, Ralph Castain  wrote:

> It depends on the application you are using. Some are "balanced" - i.e., they 
> run faster if the number of processes is a power of two. You'll see that n8 
> is faster than n7, so this is likely the situation.
>
>
> On Jun 6, 2013, at 4:10 PM, "Blosch, Edwin L"  wrote:
>
>> I am running single-node Sandy Bridge cases with OpenMPI and looking at 
>> scaling.
>>
>> I’m using -bind-to-core without any other options (default is -bycore I 
>> believe).
>>
>> These numbers indicate number of cores first, then the second digit is the 
>> run number (except for n=1, all runs repeated 3 times).  Any thought why n15 
>> should be so much slower than n16?   I also measure the RSS of the running 
>> processes, and the rank 0 process for n=15 cases uses about 2x more memory 
>> than all the other ranks, whereas all the ranks use the same amount of 
>> memory for the n=16 cases.
>>
>> Thanks for insights,
>>
>> Ed
>>
>> n1.1:6.9530
>> n2.1:7.0185
>> n2.2:7.0313
>> n3.1:8.2069
>> n3.2:8.1628
>> n3.3:8.1311
>> n4.1:7.5307
>> n4.2:7.5323
>> n4.3:7.5858
>> n5.1:9.5693
>> n5.2:9.5104
>> n5.3:9.4821
>> n6.1:8.9821
>> n6.2:8.9720
>> n6.3:8.9541
>> n7.1:10.640
>> n7.2:10.650
>> n7.3:10.638
>> n8.1:8.6822
>> n8.2:8.6630
>> n8.3:8.6903
>> n9.1:9.5058
>> n9.2:9.5255
>> n9.3:9.4809
>> n10.1:10.484
>> n10.2:10.452
>> n10.3:10.516
>> n11.1:11.327
>> n11.2:11.316
>> n11.3:11.318
>> n12.1:12.285
>> n12.2:12.303
>> n12.3:12.272
>> n13.1:13.127
>> n13.2:13.113
>> n13.3:13.113
>> n14.1:14.035
>> n14.2:13.989
>> n14.3:14.021
>> n15.1:14.533
>> n15.2:14.529
>> n15.3:14.586
>> n16.1:8.6542
>> n16.2:8.6731
>> n16.3:8.6586
>> ~


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI users] Need advice on performance problem

2013-06-09 Thread Blosch, Edwin L
I'm having some trouble getting good scaling with OpenMPI 1.6.4 and I don't 
know where to start looking. This is an Infiniband FDR network with Sandy 
Bridge nodes.  I am using affinity (--bind-to-core) but no other options. As 
the number of cores goes up, the message sizes are typically going down. There 
seem to be lots of options in the FAQ, and I would welcome any advice on where 
to start.  All these timings are on a completely empty system except for me.

Thanks


MPI     | # cores | Ave. Rate | Std. Dev. % | # timings | Speedup | Efficiency
MVAPICH |    16   |   8.6783  |    0.995 %  |     2     |  16.000 |   1.0000
MVAPICH |    48   |   8.7665  |    1.937 %  |     3     |  47.517 |   0.9899
MVAPICH |    80   |   8.8900  |    2.291 %  |     3     |  78.095 |   0.9762
MVAPICH |   160   |   8.9897  |    2.409 %  |     3     | 154.457 |   0.9654
MVAPICH |   320   |   8.9780  |    2.801 %  |     3     | 309.317 |   0.9666
MVAPICH |   480   |   8.9704  |    2.316 %  |     3     | 464.366 |   0.9674
MVAPICH |   640   |   9.0792  |    1.138 %  |     3     | 611.739 |   0.9558
MVAPICH |   720   |   9.1328  |    1.052 %  |     3     | 684.162 |   0.9502
MVAPICH |   800   |   9.1945  |    0.773 %  |     3     | 755.079 |   0.9438
OpenMPI |    16   |   8.6743  |    2.335 %  |     2     |  16.000 |   1.0000
OpenMPI |    48   |   8.7826  |    1.605 %  |     2     |  47.408 |   0.9877
OpenMPI |    80   |   8.8861  |    0.120 %  |     2     |  78.093 |   0.9762
OpenMPI |   160   |   8.9774  |    0.785 %  |     2     | 154.598 |   0.9662
OpenMPI |   320   |  12.0585  |   16.950 %  |     2     | 230.191 |   0.7193
OpenMPI |   480   |  14.8330  |    1.300 %  |     2     | 280.701 |   0.5848
OpenMPI |   640   |  17.1723  |    2.577 %  |     3     | 323.283 |   0.5051
OpenMPI |   720   |  18.2153  |    2.798 %  |     3     | 342.868 |   0.4762
OpenMPI |   800   |  19.3603  |    2.254 %  |     3     | 358.434 |   0.4480


Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem

2013-06-09 Thread Blosch, Edwin L
16.  dual-socket Xeon, E5-2670.

I am trying a larger model to see if the performance drop-off happens at a 
different number of cores.
Also I'm running some intermediate core-count sizes to refine the curve a bit.
I also added mpi_show_mca_params all, and at the same time, 
btl_openib_use_eager_rdma 1, just to see if that does anything.

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Ralph Castain
Sent: Sunday, June 09, 2013 5:04 PM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Need advice on performance problem

Looks to me like things are okay thru 160, and then things fall apart after 
that point. How many cores are on a node?


On Jun 9, 2013, at 1:59 PM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:


I'm having some trouble getting good scaling with OpenMPI 1.6.4 and I don't 
know where to start looking. This is an Infiniband FDR network with Sandy 
Bridge nodes.  I am using affinity (--bind-to-core) but no other options. As 
the number of cores goes up, the message sizes are typically going down. There 
seem to be lots of options in the FAQ, and I would welcome any advice on where 
to start.  All these timings are on a completely empty system except for me.

Thanks


MPI     | # cores | Ave. Rate | Std. Dev. % | # timings | Speedup | Efficiency
MVAPICH |    16   |   8.6783  |    0.995 %  |     2     |  16.000 |   1.0000
MVAPICH |    48   |   8.7665  |    1.937 %  |     3     |  47.517 |   0.9899
MVAPICH |    80   |   8.8900  |    2.291 %  |     3     |  78.095 |   0.9762
MVAPICH |   160   |   8.9897  |    2.409 %  |     3     | 154.457 |   0.9654
MVAPICH |   320   |   8.9780  |    2.801 %  |     3     | 309.317 |   0.9666
MVAPICH |   480   |   8.9704  |    2.316 %  |     3     | 464.366 |   0.9674
MVAPICH |   640   |   9.0792  |    1.138 %  |     3     | 611.739 |   0.9558
MVAPICH |   720   |   9.1328  |    1.052 %  |     3     | 684.162 |   0.9502
MVAPICH |   800   |   9.1945  |    0.773 %  |     3     | 755.079 |   0.9438
OpenMPI |    16   |   8.6743  |    2.335 %  |     2     |  16.000 |   1.0000
OpenMPI |    48   |   8.7826  |    1.605 %  |     2     |  47.408 |   0.9877
OpenMPI |    80   |   8.8861  |    0.120 %  |     2     |  78.093 |   0.9762
OpenMPI |   160   |   8.9774  |    0.785 %  |     2     | 154.598 |   0.9662
OpenMPI |   320   |  12.0585  |   16.950 %  |     2     | 230.191 |   0.7193
OpenMPI |   480   |  14.8330  |    1.300 %  |     2     | 280.701 |   0.5848
OpenMPI |   640   |  17.1723  |    2.577 %  |     3     | 323.283 |   0.5051
OpenMPI |   720   |  18.2153  |    2.798 %  |     3     | 342.868 |   0.4762
OpenMPI |   800   |  19.3603  |    2.254 %  |     3     | 358.434 |   0.4480



Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem

2013-06-09 Thread Blosch, Edwin L
Correct.  20 nodes, dual-socket with 8 cores per socket (16 cores per node) = 320.

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Ralph Castain
Sent: Sunday, June 09, 2013 6:18 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem

So, just to be sure - when you run 320 "cores", you are running across 20 nodes?

Just want to ensure we are using "core" the same way - some people confuse 
cores with hyperthreads.

On Jun 9, 2013, at 3:50 PM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:


16.  dual-socket Xeon, E5-2670.

I am trying a larger model to see if the performance drop-off happens at a 
different number of cores.
Also I'm running some intermediate core-count sizes to refine the curve a bit.
I also added mpi_show_mca_params all, and at the same time, 
btl_openib_use_eager_rdma 1, just to see if that does anything.

From: users-boun...@open-mpi.org<mailto:users-boun...@open-mpi.org> 
[mailto:users-boun...@open-mpi.org<mailto:boun...@open-mpi.org>] On Behalf Of 
Ralph Castain
Sent: Sunday, June 09, 2013 5:04 PM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Need advice on performance problem

Looks to me like things are okay thru 160, and then things fall apart after 
that point. How many cores are on a node?


On Jun 9, 2013, at 1:59 PM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:



I'm having some trouble getting good scaling with OpenMPI 1.6.4 and I don't 
know where to start looking. This is an Infiniband FDR network with Sandy 
Bridge nodes.  I am using affinity (--bind-to-core) but no other options. As 
the number of cores goes up, the message sizes are typically going down. There 
seem to be lots of options in the FAQ, and I would welcome any advice on where 
to start.  All these timings are on a completely empty system except for me.

Thanks


MPI     | # cores | Ave. Rate | Std. Dev. % | # timings | Speedup | Efficiency
MVAPICH |    16   |   8.6783  |    0.995 %  |     2     |  16.000 |   1.0000
MVAPICH |    48   |   8.7665  |    1.937 %  |     3     |  47.517 |   0.9899
MVAPICH |    80   |   8.8900  |    2.291 %  |     3     |  78.095 |   0.9762
MVAPICH |   160   |   8.9897  |    2.409 %  |     3     | 154.457 |   0.9654
MVAPICH |   320   |   8.9780  |    2.801 %  |     3     | 309.317 |   0.9666
MVAPICH |   480   |   8.9704  |    2.316 %  |     3     | 464.366 |   0.9674
MVAPICH |   640   |   9.0792  |    1.138 %  |     3     | 611.739 |   0.9558
MVAPICH |   720   |   9.1328  |    1.052 %  |     3     | 684.162 |   0.9502
MVAPICH |   800   |   9.1945  |    0.773 %  |     3     | 755.079 |   0.9438
OpenMPI |    16   |   8.6743  |    2.335 %  |     2     |  16.000 |   1.0000
OpenMPI |    48   |   8.7826  |    1.605 %  |     2     |  47.408 |   0.9877
OpenMPI |    80   |   8.8861  |    0.120 %  |     2     |  78.093 |   0.9762
OpenMPI |   160   |   8.9774  |    0.785 %  |     2     | 154.598 |   0.9662
OpenMPI |   320   |  12.0585  |   16.950 %  |     2     | 230.191 |   0.7193
OpenMPI |   480   |  14.8330  |    1.300 %  |     2     | 280.701 |   0.5848
OpenMPI |   640   |  17.1723  |    2.577 %  |     3     | 323.283 |   0.5051
OpenMPI |   720   |  18.2153  |    2.798 %  |     3     | 342.868 |   0.4762
OpenMPI |   800   |  19.3603  |    2.254 %  |     3     | 358.434 |   0.4480



Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem

2013-06-11 Thread Blosch, Edwin L
I tried adding "-mca btl openib,sm,self"  but it did not make any difference.

Jesus' e-mail this morning has got me thinking.  In our system, each cabinet 
has 224 cores, and we are reaching a different level of the system architecture 
when we go beyond 224.  I got an additional data point at 256 and found that 
performance is already falling off. Perhaps I did not build OpenMPI properly to 
support the Mellanox adapters that are used in the backplane, or I need some 
configuration setting similar to FAQ #19 in the Tuning/Openfabrics section.

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Ralph Castain
Sent: Sunday, June 09, 2013 6:48 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem

Strange - it looks like a classic oversubscription behavior. Another 
possibility is that it isn't using IB for some reason when extended to the 
other nodes. What does your cmd line look like? Have you tried adding "-mca btl 
openib,sm,self" just to ensure it doesn't use TCP for some reason?


On Jun 9, 2013, at 4:31 PM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:


Correct.  20 nodes, dual-socket with 8 cores per socket (16 cores per node) = 320.

From: users-boun...@open-mpi.org<mailto:users-boun...@open-mpi.org> 
[mailto:users-boun...@open-mpi.org<mailto:boun...@open-mpi.org>] On Behalf Of 
Ralph Castain
Sent: Sunday, June 09, 2013 6:18 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem

So, just to be sure - when you run 320 "cores", you are running across 20 nodes?

Just want to ensure we are using "core" the same way - some people confuse 
cores with hyperthreads.

On Jun 9, 2013, at 3:50 PM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:



16.  dual-socket Xeon, E5-2670.

I am trying a larger model to see if the performance drop-off happens at a 
different number of cores.
Also I'm running some intermediate core-count sizes to refine the curve a bit.
I also added mpi_show_mca_params all, and at the same time, 
btl_openib_use_eager_rdma 1, just to see if that does anything.

From: users-boun...@open-mpi.org<mailto:users-boun...@open-mpi.org> 
[mailto:users-boun...@open-mpi.org<mailto:boun...@open-mpi.org>] On Behalf Of 
Ralph Castain
Sent: Sunday, June 09, 2013 5:04 PM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Need advice on performance problem

Looks to me like things are okay thru 160, and then things fall apart after 
that point. How many cores are on a node?


On Jun 9, 2013, at 1:59 PM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:




I'm having some trouble getting good scaling with OpenMPI 1.6.4 and I don't 
know where to start looking. This is an Infiniband FDR network with Sandy 
Bridge nodes.  I am using affinity (--bind-to-core) but no other options. As 
the number of cores goes up, the message sizes are typically going down. There 
seem to be lots of options in the FAQ, and I would welcome any advice on where 
to start.  All these timings are on a completely empty system except for me.

Thanks


MPI     | # cores | Ave. Rate | Std. Dev. % | # timings | Speedup | Efficiency
MVAPICH |    16   |   8.6783  |    0.995 %  |     2     |  16.000 |   1.0000
MVAPICH |    48   |   8.7665  |    1.937 %  |     3     |  47.517 |   0.9899
MVAPICH |    80   |   8.8900  |    2.291 %  |     3     |  78.095 |   0.9762
MVAPICH |   160   |   8.9897  |    2.409 %  |     3     | 154.457 |   0.9654
MVAPICH |   320   |   8.9780  |    2.801 %  |     3     | 309.317 |   0.9666
MVAPICH |   480   |   8.9704  |    2.316 %  |     3     | 464.366 |   0.9674
MVAPICH |   640   |   9.0792  |    1.138 %  |     3     | 611.739 |   0.9558
MVAPICH |   720   |   9.1328  |    1.052 %  |     3     | 684.162 |   0.9502
MVAPICH |   800   |   9.1945  |    0.773 %  |     3     | 755.079 |   0.9438
OpenMPI |    16   |   8.6743  |    2.335 %  |     2     |  16.000 |   1.0000
OpenMPI |    48   |   8.7826  |    1.605 %  |     2     |  47.408 |   0.9877
OpenMPI |    80   |   8.8861  |    0.120 %  |     2     |  78.093 |   0.9762
OpenMPI |   160   |   8.9774  |    0.785 %  |     2     | 154.598 |   0.9662
OpenMPI |   320   |  12.0585  |   16.950 %  |     2     | 230.191 |   0.7193
OpenMPI |   480   |  14.8330  |    1.300 %  |     2     | 280.701 |   0.5848
OpenMPI |   640   |  17.1723  |    2.577 %  |     3     | 323.283 |   0.5051
OpenMPI |   720   |  18.2153  |    2.798 %  |     3     | 342.868 |   0.4762
OpenMPI |   800   |  19.3603  |    2.254 %  |     3     | 358.434 |   0.4480

Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem

2013-06-12 Thread Blosch, Edwin L
The version of mxm reports as:  Version : 1.5.dc8c171
The version of OFED reports as:  MLNX_OFED_LINUX-2.0-2.0.5

Here are some revised scaling numbers after configuring OpenMPI to use MXM.  
I'm not sure if I posted medium or small case last time, but this is the 
"small" case.  By the time you get out to 800 cores, each process talks to 
between 10 to 16 other processes (this is a physical domain decomposition), and 
the message sizes can be described by saying there is a distribution from 1K 
bytes up to 10K bytes (25%), 3 times larger (50%), and 3 times smaller (25%). 
On the "medium" case, the difference between OpenMPI and MVAPICH is smaller, 
but OpenMPI is still doing better.


Scalability - 1 domain per process
MPI     | # cores | Ave. Rate | Std. Dev. % | # timings | Speedup | Efficiency
MVAPICH |    16   |   7.5822  |    0.171 %  |     3     |  16.000 |   1.0000
MVAPICH |    48   |   7.7416  |    0.804 %  |     3     |  47.011 |   0.9794
MVAPICH |    80   |   7.6365  |    0.252 %  |     3     |  79.431 |   0.9929
MVAPICH |   160   |   7.4802  |    0.887 %  |     3     | 162.182 |   1.0136
MVAPICH |   256   |   7.7930  |    1.554 %  |     3     | 249.073 |   0.9729
MVAPICH |   320   |   7.7346  |    0.423 %  |     3     | 313.695 |   0.9803
MVAPICH |   480   |   7.9225  |    2.594 %  |     3     | 459.378 |   0.9570
MVAPICH |   640   |   8.3111  |    2.416 %  |     3     | 583.866 |   0.9123
MVAPICH |   800   |   8.9315  |    5.059 %  |     3     | 679.137 |   0.8489
OpenMPI |    16   |   7.5919  |    0.879 %  |     3     |  16.000 |   1.0000
OpenMPI |    48   |   7.7469  |    0.478 %  |     3     |  47.040 |   0.9800
OpenMPI |    80   |   7.6654  |    0.544 %  |     3     |  79.233 |   0.9904
OpenMPI |   160   |   7.7252  |    2.202 %  |     3     | 157.239 |   0.9827
OpenMPI |   256   |   7.7043  |    0.563 %  |     3     | 252.265 |   0.9854
OpenMPI |   320   |   7.6727  |    6.086 %  |     3     | 316.629 |   0.9895
OpenMPI |   480   |   7.7016  |    0.450 %  |     3     | 473.163 |   0.9858
OpenMPI |   640   |   8.0357  |    0.572 %  |     3     | 604.651 |   0.9448
OpenMPI |   800   |   8.4328  |    3.198 %  |     3     | 720.223 |   0.9003

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Mike Dubman
Sent: Wednesday, June 12, 2013 7:01 AM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem

Also, what ofed version (ofed_info -s) and mxm version (rpm -qi mxm) do you use?

On Wed, Jun 12, 2013 at 3:30 AM, Ralph Castain <r...@open-mpi.org> wrote:
Great! Would you mind showing the revised table? I'm curious as to the relative 
performance.


On Jun 11, 2013, at 4:53 PM, eblo...@1scom.net<mailto:eblo...@1scom.net> wrote:

> Problem solved. I did not configure with --with-mxm=/opt/mellanox/mcm and
> this location was not auto-detected.  Once I rebuilt with this option,
> everything worked fine. Scaled better than MVAPICH out to 800. MVAPICH
> configure log showed that it had found this component of the OFED stack.
>
> Ed
>
>
>> If you run at 224 and things look okay, then I would suspect something in
>> the upper level switch that spans cabinets. At that point, I'd have to
>> leave it to Mellanox to advise.
>>
>>
>> On Jun 11, 2013, at 6:55 AM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:
>>
>>> I tried adding "-mca btl openib,sm,self"  but it did not make any
>>> difference.
>>>
>>> Jesus' e-mail this morning has got me thinking.  In our system, each
>>> cabinet has 224 cores, and we are reaching a different level of the
>>> system architecture when we go beyond 224.  I got an additional data
>>> point at 256 and found that performance is already falling off. Perhaps
>>> I did not build OpenMPI properly to support the Mellanox adapters that
>>> are used in the backplane, or I need some configuration setting similar
>>> to FAQ #19 in the Tuning/Openfabrics section.
>>>
>>> From: users-boun...@open-mpi.org<mailto:users-boun...@open-mpi.org> 
>>> [mailto:users-boun...@open-mpi.org<mailto:users-boun...@open-mpi.org>] On
>>> Behalf Of Ralph Castain
>>> Sent: Sunday, June 09, 2013 6:48 PM
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] EXTERNAL: Re: Need advice on performance
>>> problem
>>>
>>> Strange - it looks like a classic oversubscription behavior. Ano

[OMPI users] Application hangs on mpi_waitall

2013-06-18 Thread Blosch, Edwin L
I'm running OpenMPI 1.6.4 and seeing a problem where mpi_waitall never returns. 
 The case runs fine with MVAPICH.  The logic associated with the communications 
has been extensively debugged in the past; we don't think it has errors.   Each 
process posts non-blocking receives, non-blocking sends, and then does waitall 
on all the outstanding requests.

The work is broken down into 960 chunks. If I run with 960 processes (60 nodes 
of 16 cores each), things seem to work.  If I use 160 processes (each process 
handling 6 chunks of work), then each process is handling 6 times as much 
communication, and that is the case that hangs with OpenMPI 1.6.4; again, seems 
to work with MVAPICH.  Is there an obvious place to start, diagnostically?  
We're using the openib btl.

Thanks,

Ed
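
(One diagnostic that comes up later in this thread is to replace the waitall with 
a test-and-timeout loop so that the stuck requests can be identified. A minimal 
Fortran sketch of that idea follows; the routine name and the 30-second limit are 
illustrative, not taken from the actual code:)

  subroutine waitall_with_timeout(nreq, req)
    ! Sketch only: poll the requests with MPI_TEST; after ~30 seconds, report
    ! which ones are still outstanding instead of blocking forever in WAITALL.
    use mpi
    implicit none
    integer, intent(in)    :: nreq
    integer, intent(inout) :: req(nreq)
    integer :: i, ierr, ndone, stat(MPI_STATUS_SIZE)
    logical :: flag
    double precision :: t0

    t0 = MPI_WTIME()
    ndone = 0
    do while (ndone < nreq)
       ndone = 0
       do i = 1, nreq
          if (req(i) == MPI_REQUEST_NULL) then
             ndone = ndone + 1
          else
             call MPI_TEST(req(i), flag, stat, ierr)
             if (flag) ndone = ndone + 1
          end if
       end do
       if (MPI_WTIME() - t0 > 30.0d0) then
          do i = 1, nreq
             if (req(i) /= MPI_REQUEST_NULL) write(*,*) 'request', i, 'still outstanding'
          end do
          call MPI_ABORT(MPI_COMM_WORLD, 1, ierr)
       end if
    end do
  end subroutine waitall_with_timeout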


Re: [OMPI users] Application hangs on mpi_waitall

2013-06-27 Thread Blosch, Edwin L
Attached is the message list for rank 0 for the communication step that is 
failing.  There are about 160 isends and irecvs.  The ‘message size’ is 
actually a number of cells.  On some steps only one 8-byte word per cell is 
communicated, at another step we exchange 7 words, and another step we exchange 
21 words.  You can see the smallest is 10 cells, the largest around 1000 cells.

Thus for the 7-word communication step, the smallest messages are 560 bytes, 
the largest are 56000 bytes, and there is a distribution of sizes.  For the 
single-word communication step, the size distribution would be from 80 bytes to 
8000 and for the 21-word step it would be from 1680 to 168000 bytes.

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Rolf vandeVaart
Sent: Thursday, June 27, 2013 9:02 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Application hangs on mpi_waitall

Ed, how large are the messages that you are sending and receiving?
Rolf

From: users-boun...@open-mpi.org 
[mailto:users-boun...@open-mpi.org] On Behalf Of Ed Blosch
Sent: Thursday, June 27, 2013 9:01 AM
To: us...@open-mpi.org
Subject: Re: [OMPI users] Application hangs on mpi_waitall

It ran a bit longer but still deadlocked.  All matching sends are posted 
1:1 with posted recvs so it is a delivery issue of some kind.  I'm running a 
debug compiled version tonight to see what that might turn up.  I may try to 
rewrite with blocking sends and see if that works.  I can also try adding a 
barrier (irecvs, barrier, isends, waitall) to make sure sends are not buffering 
waiting for recvs to be posted.


Sent via the Samsung Galaxy S™ III, an AT&T 4G LTE smartphone



 Original message 
From: George Bosilca <bosi...@icl.utk.edu>
Date:
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] Application hangs on mpi_waitall


Ed,

I'm not sure, but there might be a case where the BTL is getting overwhelmed by 
the non-blocking operations while trying to set up the connection. There is a 
simple test for this. Add an MPI_Alltoall with a reasonable size (100k) before 
you start posting the non-blocking receives, and let's see if this solves your 
issue.

  George.


On Jun 26, 2013, at 04:02 , eblo...@1scom.net wrote:

> An update: I recoded the mpi_waitall as a loop over the requests with
> mpi_test and a 30 second timeout.  The timeout happens unpredictably,
> sometimes after 10 minutes of run time, other times after 15 minutes, for
> the exact same case.
>
> After 30 seconds, I print out the status of all outstanding receive
> requests.  The message tags that are outstanding have definitely been
> sent, so I am wondering why they are not getting received?
>
> As I said before, everybody posts non-blocking standard receives, then
> non-blocking standard sends, then calls mpi_waitall. Each process is
> typically waiting on 200 to 300 requests. Is deadlock possible via this
> implementation approach under some kind of unusual conditions?
>
> Thanks again,
>
> Ed
>
>> I'm running OpenMPI 1.6.4 and seeing a problem where mpi_waitall never
>> returns.  The case runs fine with MVAPICH.  The logic associated with the
>> communications has been extensively debugged in the past; we don't think
>> it has errors.   Each process posts non-blocking receives, non-blocking
>> sends, and then does waitall on all the outstanding requests.
>>
>> The work is broken down into 960 chunks. If I run with 960 processes (60
>> nodes of 16 cores each), things seem to work.  If I use 160 processes
>> (each process handling 6 chunks of work), then each process is handling 6
>> times as much communication, and that is the case that hangs with OpenMPI
>> 1.6.4; again, seems to work with MVAPICH.  Is there an obvious place to
>> start, diagnostically?  We're using the openib btl.
>>
>> Thanks,
>>
>> Ed






[Attachment: send_recv.dat]


Re: [OMPI users] EXTERNAL: Re: Application hangs on mpi_waitall

2013-06-27 Thread Blosch, Edwin L
The debug version also hung, with roughly the same amount of progress in the 
computations (although of course it took much longer to make that progress than 
the optimized version did).

On the bright side, the idea of putting an mpi_barrier after the irecvs and 
before the isends appears to have helped.  I was able to run 5 times farther 
without any trouble.  So now I’m trying to run 50 times farther and, if no 
hang, I will declare workaround-victory.

What could this mean?

I am guessing that one or more processes may run ahead of the others, just 
because of the different amounts of work that precedes the communication step.  
If a process manages to post all its irecvs and post all its isends well before 
another process has managed to post any matching irecvs, perhaps there is some 
buffering resource on the sender side that is getting exhausted?   This is pure 
guessing on my part.

Thanks

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Ed Blosch
Sent: Thursday, June 27, 2013 8:01 AM
To: us...@open-mpi.org
Subject: EXTERNAL: Re: [OMPI users] Application hangs on mpi_waitall

It ran a bit longer but still deadlocked.  All matching sends are posted 
1:1 with posted recvs so it is a delivery issue of some kind.  I'm running a 
debug compiled version tonight to see what that might turn up.  I may try to 
rewrite with blocking sends and see if that works.  I can also try adding a 
barrier (irecvs, barrier, isends, waitall) to make sure sends are not buffering 
waiting for recvs to be posted.


Sent via the Samsung Galaxy S™ III, an AT&T 4G LTE smartphone



 Original message 
From: George Bosilca <bosi...@icl.utk.edu>
Date:
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] Application hangs on mpi_waitall


Ed,

I'm not sure, but there might be a case where the BTL is getting overwhelmed by 
the non-blocking operations while trying to set up the connection. There is a 
simple test for this. Add an MPI_Alltoall with a reasonable size (100k) before 
you start posting the non-blocking receives, and let's see if this solves your 
issue.

  George.


On Jun 26, 2013, at 04:02 , eblo...@1scom.net wrote:

> An update: I recoded the mpi_waitall as a loop over the requests with
> mpi_test and a 30 second timeout.  The timeout happens unpredictably,
> sometimes after 10 minutes of run time, other times after 15 minutes, for
> the exact same case.
>
> After 30 seconds, I print out the status of all outstanding receive
> requests.  The message tags that are outstanding have definitely been
> sent, so I am wondering why they are not getting received?
>
> As I said before, everybody posts non-blocking standard receives, then
> non-blocking standard sends, then calls mpi_waitall. Each process is
> typically waiting on 200 to 300 requests. Is deadlock possible via this
> implementation approach under some kind of unusual conditions?
>
> Thanks again,
>
> Ed
>
>> I'm running OpenMPI 1.6.4 and seeing a problem where mpi_waitall never
>> returns.  The case runs fine with MVAPICH.  The logic associated with the
>> communications has been extensively debugged in the past; we don't think
>> it has errors.   Each process posts non-blocking receives, non-blocking
>> sends, and then does waitall on all the outstanding requests.
>>
>> The work is broken down into 960 chunks. If I run with 960 processes (60
>> nodes of 16 cores each), things seem to work.  If I use 160 processes
>> (each process handling 6 chunks of work), then each process is handling 6
>> times as much communication, and that is the case that hangs with OpenMPI
>> 1.6.4; again, seems to work with MVAPICH.  Is there an obvious place to
>> start, diagnostically?  We're using the openib btl.
>>
>> Thanks,
>>
>> Ed




Re: [OMPI users] Application hangs on mpi_waitall

2013-06-27 Thread Blosch, Edwin L
Also, just to be clear, that attached listing is ordered by data in the first 
column and doesn’t reflect the call sequence.  In actual implementation, all 
the messages labeled “mpi-recv” are mpi_irecv and are all posted before any of 
the mpi_isends are posted.

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Blosch, Edwin L
Sent: Thursday, June 27, 2013 12:48 PM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Application hangs on mpi_waitall

Attached is the message list for rank 0 for the communication step that is 
failing.  There are about 160 isends and irecvs.  The ‘message size’ is 
actually a number of cells.  On some steps only one 8-byte word per cell is 
communicated, at another step we exchange 7 words, and another step we exchange 
21 words.  You can see the smallest is 10 cells, the largest around 1000 cells.

Thus for the 7-word communication step, the smallest messages are 560 bytes, 
the largest are 56000 bytes, and there is a distribution of sizes.  For the 
single-word communication step, the size distribution would be from 80 bytes to 
8000 and for the 21-word step it would be from 1680 to 168000 bytes.

From: users-boun...@open-mpi.org<mailto:users-boun...@open-mpi.org> 
[mailto:users-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
Sent: Thursday, June 27, 2013 9:02 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Application hangs on mpi_waitall

Ed, how large are the messages that you are sending and receiving?
Rolf

From: users-boun...@open-mpi.org<mailto:users-boun...@open-mpi.org> 
[mailto:users-boun...@open-mpi.org] On Behalf Of Ed Blosch
Sent: Thursday, June 27, 2013 9:01 AM
To: us...@open-mpi.org<mailto:us...@open-mpi.org>
Subject: Re: [OMPI users] Application hangs on mpi_waitall

It ran a bit longer but still deadlocked.  All matching sends are posted 
1:1 with posted recvs so it is a delivery issue of some kind.  I'm running a 
debug compiled version tonight to see what that might turn up.  I may try to 
rewrite with blocking sends and see if that works.  I can also try adding a 
barrier (irecvs, barrier, isends, waitall) to make sure sends are not buffering 
waiting for recvs to be posted.


Sent via the Samsung Galaxy S™ III, an AT&T 4G LTE smartphone



 Original message 
From: George Bosilca <bosi...@icl.utk.edu>
Date:
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] Application hangs on mpi_waitall


Ed,

I'm not sure, but there might be a case where the BTL is getting overwhelmed by 
the non-blocking operations while trying to set up the connection. There is a 
simple test for this. Add an MPI_Alltoall with a reasonable size (100k) before 
you start posting the non-blocking receives, and let's see if this solves your 
issue.

  George.


On Jun 26, 2013, at 04:02 , eblo...@1scom.net<mailto:eblo...@1scom.net> wrote:

> An update: I recoded the mpi_waitall as a loop over the requests with
> mpi_test and a 30 second timeout.  The timeout happens unpredictably,
> sometimes after 10 minutes of run time, other times after 15 minutes, for
> the exact same case.
>
> After 30 seconds, I print out the status of all outstanding receive
> requests.  The message tags that are outstanding have definitely been
> sent, so I am wondering why they are not getting received?
>
> As I said before, everybody posts non-blocking standard receives, then
> non-blocking standard sends, then calls mpi_waitall. Each process is
> typically waiting on 200 to 300 requests. Is deadlock possible via this
> implementation approach under some kind of unusual conditions?
>
> Thanks again,
>
> Ed
>
>> I'm running OpenMPI 1.6.4 and seeing a problem where mpi_waitall never
>> returns.  The case runs fine with MVAPICH.  The logic associated with the
>> communications has been extensively debugged in the past; we don't think
>> it has errors.   Each process posts non-blocking receives, non-blocking
>> sends, and then does waitall on all the outstanding requests.
>>
>> The work is broken down into 960 chunks. If I run with 960 processes (60
>> nodes of 16 cores each), things seem to work.  If I use 160 processes
>> (each process handling 6 chunks of work), then each process is handling 6
>> times as much communication, and that is the case that hangs with OpenMPI
>> 1.6.4; again, seems to work with MVAPICH.  Is there an obvious place to
>> start, diagnostically?  We're using the openib btl.
>>
>> Thanks,
>>
>> Ed

Re: [OMPI users] EXTERNAL: Re: Application hangs on mpi_waitall

2013-06-27 Thread Blosch, Edwin L
I tried excluding openib but it did not succeed.  It actually made about the 
same progress as previously using the openib interface before hanging (I mean, 
my 30 second timeout period expired).

I'm more than happy to try out any other suggestions...

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of George Bosilca
Sent: Thursday, June 27, 2013 2:57 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Application hangs on mpi_waitall

This seems to highlight a possible bug in the MPI implementation. As I 
suggested earlier, the credit management of the OpenIB might be unsafe.

To confirm this, there is one last test to run. Let's prevent the OpenIB support 
from being used during the run (thus Open MPI will fall back to TCP). I suppose 
you have Ethernet cards in your cluster, or you have IPoIB. Add "--mca btl 
^openib" to your mpirun command. If this allows your application to run to 
completion, then we know exactly where to start looking.

  George.

On Jun 27, 2013, at 19:59, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:


The debug version also hung, roughly the same amount of progress in the 
computations (although of course it took much longer to make that progress in 
comparison to the optimized version).

On the bright side, the idea of putting an mpi_barrier after the irecvs and 
before the isends appears to have helped.  I was able to run 5 times farther 
without any trouble.  So now I'm trying to run 50 times farther and, if no 
hang, I will declare workaround-victory.

What could this mean?

I am guessing that one or more processes may run ahead of the others, just 
because of the different amounts of work that precedes the communication step.  
If a process manages to post all its irecvs and post all its isends well before 
another process has managed to post any matching irecvs, perhaps there is some 
buffering resource on the sender side that is getting exhausted?   This is pure 
guessing on my part.

Thanks

From: users-boun...@open-mpi.org<mailto:users-boun...@open-mpi.org> 
[mailto:users-boun...@open-mpi.org<mailto:boun...@open-mpi.org>] On Behalf Of 
Ed Blosch
Sent: Thursday, June 27, 2013 8:01 AM
To: us...@open-mpi.org<mailto:us...@open-mpi.org>
Subject: EXTERNAL: Re: [OMPI users] Application hangs on mpi_waitall

It ran a bit longer but still deadlocked.  All matching sends are posted 
1:1 with posted recvs so it is a delivery issue of some kind.  I'm running a 
debug compiled version tonight to see what that might turn up.  I may try to 
rewrite with blocking sends and see if that works.  I can also try adding a 
barrier (irecvs, barrier, isends, waitall) to make sure sends are not buffering 
waiting for recvs to be posted.


Sent via the Samsung Galaxy S(tm) III, an AT&T 4G LTE smartphone



 Original message 
From: George Bosilca <bosi...@icl.utk.edu>
List-Post: users@lists.open-mpi.org
Date:
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] Application hangs on mpi_waitall


Ed,

I'm not sure, but there might be a case where the BTL is getting overwhelmed by 
the non-blocking operations while trying to set up the connection. There is a 
simple test for this. Add an MPI_Alltoall with a reasonable size (100k) before 
you start posting the non-blocking receives, and let's see if this solves your 
issue.

  George.


On Jun 26, 2013, at 04:02 , eblo...@1scom.net<mailto:eblo...@1scom.net> wrote:

> An update: I recoded the mpi_waitall as a loop over the requests with
> mpi_test and a 30 second timeout.  The timeout happens unpredictably,
> sometimes after 10 minutes of run time, other times after 15 minutes, for
> the exact same case.
>
> After 30 seconds, I print out the status of all outstanding receive
> requests.  The message tags that are outstanding have definitely been
> sent, so I am wondering why they are not getting received?
>
> As I said before, everybody posts non-blocking standard receives, then
> non-blocking standard sends, then calls mpi_waitall. Each process is
> typically waiting on 200 to 300 requests. Is deadlock possible via this
> implementation approach under some kind of unusual conditions?
>
> Thanks again,
>
> Ed
>
>> I'm running OpenMPI 1.6.4 and seeing a problem where mpi_waitall never
>> returns.  The case runs fine with MVAPICH.  The logic associated with the
>> communications has been extensively debugged in the past; we don't think
>> it has errors.   Each process posts non-blocking receives, non-blocking
>> sends, and then does waitall on all the outstanding requests.
>>
>> The work is broken down into 960 chunks. If I run with 960 processes (60
>> nodes of 16 cores each), things seem to work.  If I use 160 processes
>> (each process handling 6 

[OMPI users] What's the status of OpenMPI and thread safety?

2013-12-18 Thread Blosch, Edwin L
I was wondering if the FAQ entry below is considered current opinion or perhaps 
a little stale.  Is multi-threading still considered to be 'lightly tested'?  
Are there known open bugs?

Thank you,

Ed


7. Is Open MPI thread safe?

Support for MPI_THREAD_MULTIPLE (i.e., multiple threads executing within the 
MPI library) and asynchronous message passing progress (i.e., continuing 
message passing operations even while no user threads are in the MPI library) 
has been designed into Open MPI from its first planning meetings.

Support for MPI_THREAD_MULTIPLE is included in the first version of Open MPI, 
but it is only lightly tested and likely still has some bugs. Support for 
asynchronous progress is included in the TCP point-to-point device, but it, 
too, has only had light testing and likely still has bugs.

Completing the testing for full support of MPI_THREAD_MULTIPLE and asynchronous 
progress is planned in the near future.
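
For reference, the thread level a build actually grants can be checked at run time
with MPI_INIT_THREAD; a minimal sketch (not from any particular application):

   program thread_level
     use mpi
     implicit none
     integer :: provided, ierr
     call MPI_Init_thread(MPI_THREAD_MULTIPLE, provided, ierr)
     if (provided < MPI_THREAD_MULTIPLE) then
        print *, 'MPI_THREAD_MULTIPLE not granted; provided level = ', provided
     end if
     call MPI_Finalize(ierr)
   end program thread_level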



Re: [OMPI users] EXTERNAL: Re: What's the status of OpenMPI and thread safety?

2013-12-19 Thread Blosch, Edwin L
Thanks Ralph,

We are attempting to use 1.6.4 with an application that requires 
multi-threading, and it is hanging most of the time; it is using openib.  They 
steered us to try Intel MPI for now.  If you lack drivers/testers for improved 
thread safety on openib, let me know and I'll encourage the developers of the 
application to support you.

Ed

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Wednesday, December 18, 2013 6:50 PM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] What's the status of OpenMPI and thread 
safety?

This was, in fact, a primary point of discussion at last week's OMPI 
developer's conference. Bottom line is that we are only a little further along 
than we used to be, but are focusing on improving it. You'll find good thread 
support for some transports (some of the MTLs and at least the TCP BTL), not so 
good for others (e.g., openib is flat-out not thread safe).


On Dec 18, 2013, at 3:57 PM, Blosch, Edwin L <edwin.l.blo...@lmco.com> wrote:


I was wondering if the FAQ entry below is considered current opinion or perhaps 
a little stale.  Is multi-threading still considered to be 'lightly tested'?  
Are there known open bugs?

Thank you,

Ed


7. Is Open MPI thread safe?

Support for MPI_THREAD_MULTIPLE (i.e., multiple threads executing within the 
MPI library) and asynchronous message passing progress (i.e., continuing 
message passing operations even while no user threads are in the MPI library) 
has been designed into Open MPI from its first planning meetings.

Support for MPI_THREAD_MULTIPLE is included in the first version of Open MPI, 
but it is only lightly tested and likely still has some bugs. Support for 
asynchronous progress is included in the TCP point-to-point device, but it, 
too, has only had light testing and likely still has bugs.

Completing the testing for full support of MPI_THREAD_MULTIPLE and asynchronous 
progress is planned in the near future.

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



[OMPI users] Questions on MPI I/O and ompi_info

2014-02-13 Thread Blosch, Edwin L
Why does ompi_info -c say "MPI I/O Support: yes" even though I configured using 
--disable-io-romio?  If ompi_info is going to tell me MPI I/O is supported, then 
shouldn't I expect my test program (attached) to work correctly?  (It doesn't.)

I didn't disable "built-in" mpi-io, only io-romio.

  --disable-mpi-ioDisable built-in support for MPI-2 I/O, likely
  because an externally-provided MPI I/O package will
  be used. Default is to use the internal component
  system and its specially modified version of ROMIO
  --disable-io-romio  Disable the ROMIO MPI-IO component
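
The attached test program is not reproduced here, but a minimal stand-alone MPI-IO
check along the same lines would look roughly like this (my sketch, not the
attachment):

   program miniio
     use mpi
     implicit none
     integer :: ierr, rank, fh
     integer :: buf(1)
     integer(kind=MPI_OFFSET_KIND) :: disp
     call MPI_Init(ierr)
     call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
     buf(1) = rank
     disp = int(4*rank, kind=MPI_OFFSET_KIND)   ! one 4-byte integer per rank
     call MPI_File_open(MPI_COMM_WORLD, 'io_test.dat', &
                        MPI_MODE_CREATE + MPI_MODE_WRONLY, MPI_INFO_NULL, fh, ierr)
     call MPI_File_write_at(fh, disp, buf, 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
     call MPI_File_close(fh, ierr)
     call MPI_Finalize(ierr)
   end program miniio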

Thanks,

Ed

configure options used:
+ ./configure --prefix=/applocal/tools/mpi/intel/openmpi-1.6.4 --without-tm 
--without-sge --without-lsf --without-psm --without-portals --without-elan 
--without-slurm --without-loadleveler --with-mxm=/opt/mellanox/mxm 
--with-mxm-lib=/opt/mellanox/mxm/lib --enable-mpirun-prefix-by-default 
--enable-contrib-no-build=vt --disable-per-user-config-files --disable-io-romio 
--enable-static CXX=/applocal/intel/composer_xe_2013/bin/icpc 
CC=/applocal/intel/composer_xe_2013/bin/icc 'CFLAGS=  -O2' 'CXXFLAGS=  -O2' 
F77=/applocal/intel/composer_xe_2013/bin/ifort 'FFLAGS=-D_GNU_SOURCE -traceback 
 -O2' FC=/applocal/intel/composer_xe_2013/bin/ifort 'FCFLAGS=-D_GNU_SOURCE 
-traceback  -O2' 'LDFLAGS= -static-intel'

ompi_info -c output:
   Configured by: bloscel
   Configured on: Tue Jun 11 16:20:00 CDT 2013
  Configure host: mgmt1
Built by: bloscel
Built on: Tue Jun 11 16:35:12 CDT 2013
  Built host: mgmt1
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: yes
Fortran90 bindings size: small
  C compiler: /applocal/intel/composer_xe_2013/bin/icc
 C compiler absolute:
  C compiler family name: INTEL
  C compiler version: 1310.20130514
 C char size: 1
 C bool size: 1
C short size: 2
  C int size: 4
 C long size: 8
C float size: 4
   C double size: 8
  C pointer size: 8
C char align: 1
C bool align: 1
 C int align: 4
   C float align: 4
  C double align: 8
C++ compiler: /applocal/intel/composer_xe_2013/bin/icpc
   C++ compiler absolute: none
  Fortran77 compiler: /applocal/intel/composer_xe_2013/bin/ifort
  Fortran77 compiler abs:
  Fortran90 compiler: /applocal/intel/composer_xe_2013/bin/ifort
  Fortran90 compiler abs:
   Fort integer size: 4
   Fort logical size: 4
Fort logical value true: -1
  Fort have integer1: yes
  Fort have integer2: yes
  Fort have integer4: yes
  Fort have integer8: yes
 Fort have integer16: no
 Fort have real4: yes
 Fort have real8: yes
Fort have real16: no
  Fort have complex8: yes
 Fort have complex16: yes
 Fort have complex32: no
  Fort integer1 size: 1
  Fort integer2 size: 2
  Fort integer4 size: 4
  Fort integer8 size: 8
 Fort integer16 size: -1
  Fort real size: 4
 Fort real4 size: 4
 Fort real8 size: 8
Fort real16 size: 16
  Fort dbl prec size: 8
  Fort cplx size: 8
  Fort dbl cplx size: 16
 Fort cplx8 size: 8
Fort cplx16 size: 16
Fort cplx32 size: 32
  Fort integer align: 1
 Fort integer1 align: 1
 Fort integer2 align: 1
 Fort integer4 align: 1
 Fort integer8 align: 1
Fort integer16 align: -1
 Fort real align: 1
Fort real4 align: 1
Fort real8 align: 1
   Fort real16 align: 1
 Fort dbl prec align: 1
 Fort cplx align: 1
 Fort dbl cplx align: 1
Fort cplx8 align: 1
   Fort cplx16 align: 1
   Fort cplx32 align: 1
 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: yes
 Fortran90 profiling: yes
  C++ exceptions: no
  Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)
   Sparse Groups: no
Build CFLAGS: -DNDEBUG -O2 -finline-functions -fno-strict-aliasing 
-restrict -pthread
  Build CXXFLAGS: -DNDEBUG -O2 -finline-functions -pthread
Build FFLAGS: -D_GNU_SOURCE -traceback  -O2
   Build FCFLAGS: -D_GNU_SOURCE -traceback  -O2
   Build LDFLAGS: -export-dynamic  -static-intel
  Build LIBS: -lrt -lnsl  -lutil
Wrapper extra CFLAGS: -pthread
  Wrapper extra CXXFLAGS: -pthread
Wrapper extra FFLAGS:
   Wrapper extra FCFLAGS:
   Wrapper extra LDFLAGS:
  Wrapper extra LIBS: -ldl  -lm -lnuma  -Wl,--export-dynamic -lrt -lnsl 
-lutil
  Internal debug support: no
  MPI interface warnings: no
 MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
 libltdl support: yes
   Heterogeneous suppo

[OMPI users] Problem building OpenMPI 1.8 on RHEL6

2014-04-01 Thread Blosch, Edwin L
I am getting some errors building 1.8 on RHEL6.  I tried autoreconf as 
suggested, but it failed for the same reason.  Is there a minimum version of m4 
required that is newer than that provided by RHEL6?

Thanks

aclocal.m4:16: warning: this file was generated for autoconf 2.69.
You have another version of autoconf.  It may work, but is not guaranteed to.
If you have problems, you may need to regenerate the build system entirely.
To do so, use the procedure documented by the package, typically 'autoreconf'.
configure.ac:40: error: m4_defn: undefined macro: _m4_divert_diversion
configure.ac:40: the top level
autom4te: /usr/bin/m4 failed with exit status: 1

[bloscel@head openmpi]$ autoreconf
configure.ac:40: error: m4_defn: undefined macro: _m4_divert_diversion
configure.ac:40: the top level
autom4te: /usr/bin/m4 failed with exit status: 1
aclocal: autom4te failed with exit status: 1
autoreconf: aclocal failed with exit status: 1


[OMPI users] Problem with shell when launching jobs with OpenMPI 1.6.5 rsh

2014-04-07 Thread Blosch, Edwin L
I am submitting a job for execution under SGE.  My default shell is /bin/csh.  
The script that is submitted has #!/bin/bash at the top.  The script runs on 
the 1st node allocated to the job.  The script runs a Python wrapper that 
ultimately issues the following mpirun command:

/apps/local/test/openmpi/bin/mpirun --machinefile mpihosts.914 -np 48 -x 
LD_LIBRARY_PATH -x MPI_ENVIRONMENT=1 --mca btl ^tcp --mca 
shmem_mmap_relocate_backing_file -1 --bind-to-core --bycore --mca 
orte_rsh_agent /usr/bin/rsh --mca plm_rsh_disable_qrsh 1 
/apps/local/test/solver/bin/solver_openmpi -cycles 50 -ri restart.0 -i flow.inp 
>& output

Just so there's no confusion, OpenMPI is built without support for SGE.  It 
should be using rsh to launch.

There are 4 nodes involved (each 12 cores, 48 processes total).  In the output 
file, I see 3 sets of messages as shown below.  I assume I am seeing 1 set of 
messages for each of the 3 remote nodes where processes need to be launched:

/bin/.: Permission denied.
OPAL_PREFIX=/apps/local/falcon2014/openmpi: Command not found.
export: Command not found.
PATH=/apps/local/test/openmpi/bin:/bin:/usr/bin:/usr/ccs/bin:/usr/local/bin:/usr/openwin/bin:/usr/local/etc:/home/bloscel/bin:/usr/ucb:/usr/bsd:
 Command not found.
export: Command not found.
LD_LIBRARY_PATH: Undefined variable.

These look like errors you get when csh is trying to parse commands intended 
for bash.

Does anyone know what may be going on here?

Thanks,

Ed



Re: [OMPI users] EXTERNAL: Re: Problem with shell when launching jobs with OpenMPI 1.6.5 rsh

2014-04-07 Thread Blosch, Edwin L
If I create a program called hello which just contains the line "echo hello", 
then I do

"/bin/. hello"  then I get permission denied.

Is that what you mean?

I might be lost in esoteric corners of Linux.  What is "." under /bin ?  There 
is no program there by that name.  I've heard of "." as a shell built-in, but I 
haven't seen it prefixed by /bin before.

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Monday, April 07, 2014 3:10 PM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Problem with shell when launching jobs with 
OpenMPI 1.6.5 rsh

Looks to me like the problem is here:

/bin/.: Permission denied.

Appears you don't have permission to exec bash??


On Apr 7, 2014, at 1:04 PM, Blosch, Edwin L <edwin.l.blo...@lmco.com> wrote:


I am submitting a job for execution under SGE.  My default shell is /bin/csh.  
The script that is submitted has #!/bin/bash at the top.  The script runs on 
the 1st node allocated to the job.  The script runs a Python wrapper that 
ultimately issues the following mpirun command:

/apps/local/test/openmpi/bin/mpirun --machinefile mpihosts.914 -np 48 -x 
LD_LIBRARY_PATH -x MPI_ENVIRONMENT=1 --mca btl ^tcp --mca 
shmem_mmap_relocate_backing_file -1 --bind-to-core --bycore --mca 
orte_rsh_agent /usr/bin/rsh --mca plm_rsh_disable_qrsh 1 
/apps/local/test/solver/bin/solver_openmpi -cycles 50 -ri restart.0 -i flow.inp 
>& output

Just so there's no confusion, OpenMPI is built without support for SGE.  It 
should be using rsh to launch.

There are 4 nodes involved (each 12 cores, 48 processes total).  In the output 
file, I see 3 sets of messages as shown below.  I assume I am seeing 1 set of 
messages for each of the 3 remote nodes where processes need to be launched:

/bin/.: Permission denied.
OPAL_PREFIX=/apps/local/falcon2014/openmpi: Command not found.
export: Command not found.
PATH=/apps/local/test/openmpi/bin:/bin:/usr/bin:/usr/ccs/bin:/usr/local/bin:/usr/openwin/bin:/usr/local/etc:/home/bloscel/bin:/usr/ucb:/usr/bsd:
 Command not found.
export: Command not found.
LD_LIBRARY_PATH: Undefined variable.

These look like errors you get when csh is trying to parse commands intended 
for bash.

Does anyone know what may be going on here?

Thanks,

Ed

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] EXTERNAL: Re: Problem with shell when launching jobs with OpenMPI 1.6.5 rsh

2014-04-07 Thread Blosch, Edwin L
I guess this is not OpenMPI related anymore.  I can repeat the essential 
problem interactively:

% echo $SHELL
/bin/csh

% echo $SHLVL
1

% cat hello
echo Hello

% /bin/bash hello
Hello

% /bin/csh hello
Hello

%  . hello
/bin/.: Permission denied

I think I need to hope the administrator can fix it.  Sorry for the bother...


-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Reuti
Sent: Monday, April 07, 2014 3:27 PM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Problem with shell when launching jobs with 
OpenMPI 1.6.5 rsh

On 07.04.2014 at 22:04, Blosch, Edwin L wrote:

> I am submitting a job for execution under SGE.  My default shell is /bin/csh.

Where is it the default - in SGE, or on the interactive command line you get when you log in?


>  The script that is submitted has #!/bin/bash at the top.  The script runs on 
> the 1st node allocated to the job.  The script runs a Python wrapper that 
> ultimately issues the following mpirun command:
>  
> /apps/local/test/openmpi/bin/mpirun --machinefile mpihosts.914 -np 48 -x 
> LD_LIBRARY_PATH -x MPI_ENVIRONMENT=1 --mca btl ^tcp --mca 
> shmem_mmap_relocate_backing_file -1 --bind-to-core --bycore --mca 
> orte_rsh_agent /usr/bin/rsh --mca plm_rsh_disable_qrsh 1 
> /apps/local/test/solver/bin/solver_openmpi -cycles 50 -ri restart.0 -i 
> flow.inp >& output
>  
> Just so there's no confusion, OpenMPI is built without support for SGE.  It 
> should be using rsh to launch.
>  
> There are 4 nodes involved (each 12 cores, 48 processes total).  In the 
> output file, I see 3 sets of messages as shown below.  I assume I am seeing 1 
> set of messages for each of the 3 remote nodes where processes need to be 
> launched:
>  
> /bin/.: Permission denied.
> OPAL_PREFIX=/apps/local/falcon2014/openmpi: Command not found.
> export: Command not found.
> PATH=/apps/local/test/openmpi/bin:/bin:/usr/bin:/usr/ccs/bin:/usr/local/bin:/usr/openwin/bin:/usr/local/etc:/home/bloscel/bin:/usr/ucb:/usr/bsd:
>  Command not found.
> export: Command not found.
> LD_LIBRARY_PATH: Undefined variable.

This looks really like csh is trying to interpret bash commands. In case SGE's 
queue is set up to have "shell_start_mode posix_compliant" set, the first line 
of the script is not treated in a special way. You can change the shell only by 
"-S /bin/bash" then (or redefine the queue to have "shell_start_mode 
unix_behavior" set and get the expected behavior when starting a script [side 
effect: the shell is not started as login shell any longer. See also `man 
sge_conf` => "login_shells" for details]).

BTW: are you avoiding a tight integration by intention?

-- Reuti


>  These look like errors you get when csh is trying to parse commands intended 
> for bash. 
>  
> Does anyone know what may be going on here?
>  
> Thanks,
>  
> Ed
>  
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] EXTERNAL: Re: Problem with shell when launching jobs with OpenMPI 1.6.5 rsh

2014-04-07 Thread Blosch, Edwin L
Thanks Noam, that makes sense.

Yes, I did mean to do ". hello" (with space in between).  That was an attempt 
to replicate whatever OpenMPI is doing.  

In the first post I mentioned that my mpirun command actually gets executed 
from within a Python script using the subprocess module.  I don't know the 
details of the rsh launcher, but there are 3 remote hosts in the hosts file, 
and 3 sets of the error messages below.  May be the rsh launcher is getting 
confused, doing something that is only valid under bash even though my default 
login environment is /bin/csh.  

mpirun --machinefile mpihosts.914 -np 48 -x LD_LIBRARY_PATH --mca 
orte_rsh_agent /usr/bin/rsh  solver_openmpi  -i flow.inp >& output

% cat output

/bin/.: Permission denied.
OPAL_PREFIX=/apps/local/test/openmpi: Command not found.
export: Command not found.
PATH=/apps/local/test/openmpi/bin:/bin:/usr/bin:/usr/ccs/bin:/usr/local/bin:/usr/openwin/bin:/usr/local/etc:/home/bloscel/bin:/usr/ucb:/usr/bsd:
 Command not found.
export: Command not found.
LD_LIBRARY_PATH: Undefined variable.
/bin/.: Permission denied.
OPAL_PREFIX=/apps/local/test/openmpi: Command not found.
export: Command not found.
PATH=/apps/local/test/openmpi/bin:/bin:/usr/bin:/usr/ccs/bin:/usr/local/bin:/usr/openwin/bin:/usr/local/etc:/home/bloscel/bin:/usr/ucb:/usr/bsd:
 Command not found.
export: Command not found.
LD_LIBRARY_PATH: Undefined variable.
/bin/.: Permission denied.
OPAL_PREFIX=/apps/local/test/openmpi: Command not found.
export: Command not found.
PATH=/apps/local/test/openmpi/bin:/bin:/usr/bin:/usr/ccs/bin:/usr/local/bin:/usr/openwin/bin:/usr/local/etc:/home/bloscel/bin:/usr/ucb:/usr/bsd:
 Command not found.
export: Command not found.
LD_LIBRARY_PATH: Undefined variable.

-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Noam Bernstein
Sent: Monday, April 07, 2014 3:41 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Problem with shell when launching jobs 
with OpenMPI 1.6.5 rsh


On Apr 7, 2014, at 4:36 PM, Blosch, Edwin L  wrote:

> I guess this is not OpenMPI related anymore.  I can repeat the essential 
> problem interactively:
> 
> % echo $SHELL
> /bin/csh
> 
> % echo $SHLVL
> 1
> 
> % cat hello
> echo Hello
> 
> % /bin/bash hello
> Hello
> 
> % /bin/csh hello
> Hello
> 
> %  . hello
> /bin/.: Permission denied

. is a bash internal which evaluates the contents of the file in the current 
shell.  Since you're running csh, it's just looking for an executable named ., 
which does not exist (the csh analog of bash's . is source). /bin/. _is_ in 
your path, but it's a directory (namely /bin itself), which cannot be executed, 
hence the error. Perhaps you meant to do
   ./hello
which means (both in bash and csh) run the script hello in the current working 
directory (.), rather than looking for it in the list of directories in $PATH


Noam
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] EXTERNAL: Re: Problem with shell when launching jobs with OpenMPI 1.6.5 rsh

2014-04-07 Thread Blosch, Edwin L
That worked!

But still a mystery.

I tried printing the environment immediately before mpirun.  Inside the Python 
wrapper, I do os.system('env') immediately before the subprocess.Popen(['mpirun', 
...], shell=False) command.  This returns SHELL=/bin/csh, and I 
can confirm that getpwuid, if it works, would also have returned /bin/csh, as 
that is my default shell.

It is also interesting that it does not matter if the job-submission script is 
#!/bin/bash or #!/bin/tcsh (properly re-written, of course) -- I get the same 
errors either way. 

So why did the launcher use a bash syntax on the remote host?  It does not seem 
to be behaving exactly as you described.

But telling it to check the remote shell did the trick.

Thanks


-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Monday, April 07, 2014 4:12 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Problem with shell when launching jobs 
with OpenMPI 1.6.5 rsh

I doubt that the rsh launcher is getting confused by the cmd you show below. 
However, if that command is embedded in a script that changes the shell away 
from your default shell, then yes - it might get confused. When the rsh 
launcher spawns your remote orted, it attempts to set some envars to ensure 
things are correctly setup (e.g., that the path is right). Thus, it needs to 
know what the remote shell is going to be.

If given no other direction, it assumes that both the remote shell and your 
current shell are your default shell as reported by getpwuid (if available - 
otherwise, it falls back to the SHELL envar). If the remote shell can be 
something different, then you need to set the "plm_rsh_assume_same_shell=0" MCA 
param so it will check the remote shell.
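
For example (illustrative, reusing the machinefile from the earlier post):

   mpirun --mca plm_rsh_assume_same_shell 0 --machinefile mpihosts.914 -np 48 ...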


On Apr 7, 2014, at 1:53 PM, Blosch, Edwin L  wrote:

> Thanks Noam, that makes sense.
> 
> Yes, I did mean to do ". hello" (with space in between).  That was an attempt 
> to replicate whatever OpenMPI is doing.  
> 
> In the first post I mentioned that my mpirun command actually gets executed 
> from within a Python script using the subprocess module.  I don't know the 
> details of the rsh launcher, but there are 3 remote hosts in the hosts file, 
> and 3 sets of the error messages below.  May be the rsh launcher is getting 
> confused, doing something that is only valid under bash even though my 
> default login environment is /bin/csh.  
> 
> mpirun --machinefile mpihosts.914 -np 48 -x LD_LIBRARY_PATH --mca 
> orte_rsh_agent /usr/bin/rsh  solver_openmpi  -i flow.inp >& output
> 
> % cat output
> 
> /bin/.: Permission denied.
> OPAL_PREFIX=/apps/local/test/openmpi: Command not found.
> export: Command not found.
> PATH=/apps/local/test/openmpi/bin:/bin:/usr/bin:/usr/ccs/bin:/usr/local/bin:/usr/openwin/bin:/usr/local/etc:/home/bloscel/bin:/usr/ucb:/usr/bsd:
>  Command not found.
> export: Command not found.
> LD_LIBRARY_PATH: Undefined variable.
> /bin/.: Permission denied.
> OPAL_PREFIX=/apps/local/test/openmpi: Command not found.
> export: Command not found.
> PATH=/apps/local/test/openmpi/bin:/bin:/usr/bin:/usr/ccs/bin:/usr/local/bin:/usr/openwin/bin:/usr/local/etc:/home/bloscel/bin:/usr/ucb:/usr/bsd:
>  Command not found.
> export: Command not found.
> LD_LIBRARY_PATH: Undefined variable.
> /bin/.: Permission denied.
> OPAL_PREFIX=/apps/local/test/openmpi: Command not found.
> export: Command not found.
> PATH=/apps/local/test/openmpi/bin:/bin:/usr/bin:/usr/ccs/bin:/usr/local/bin:/usr/openwin/bin:/usr/local/etc:/home/bloscel/bin:/usr/ucb:/usr/bsd:
>  Command not found.
> export: Command not found.
> LD_LIBRARY_PATH: Undefined variable.
> 
> -Original Message-
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Noam Bernstein
> Sent: Monday, April 07, 2014 3:41 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] EXTERNAL: Re: Problem with shell when launching 
> jobs with OpenMPI 1.6.5 rsh
> 
> 
> On Apr 7, 2014, at 4:36 PM, Blosch, Edwin L  wrote:
> 
>> I guess this is not OpenMPI related anymore.  I can repeat the essential 
>> problem interactively:
>> 
>> % echo $SHELL
>> /bin/csh
>> 
>> % echo $SHLVL
>> 1
>> 
>> % cat hello
>> echo Hello
>> 
>> % /bin/bash hello
>> Hello
>> 
>> % /bin/csh hello
>> Hello
>> 
>> %  . hello
>> /bin/.: Permission denied
> 
> . is a bash internal which evaluates the contents of the file in the current 
> shell.  Since you're running csh, it's just looking for an executable named 
> ., which does not exist (the csh analog of bash's . is source). /bin/. _is_ 
> in your path, but it's a directory (namely /bin 

Re: [OMPI users] Problem building OpenMPI 1.8 on RHEL6

2014-04-07 Thread Blosch, Edwin L
Sorry for the confusion.  I am not building OpenMPI from the SVN source.  I 
downloaded the 1.8 tarball and did configure, and that is what failed.  I was 
surprised that it didn't work on a vanilla Redhat Enterprise Linux 6, out of 
the box operating system installation.   

The error message suggested that I try autoreconf, so I tried it.  

I can try the autogen.sh script and see if that fixes it, but I'm noticing 
another thread right now where Jeff is saying that shouldn't be necessary.

-Original Message-
From: Dave Goodell (dgoodell) [mailto:dgood...@cisco.com] 
Sent: Tuesday, April 01, 2014 11:20 AM
To: Open MPI Users
Subject: Re: [OMPI users] Problem building OpenMPI 1.8 on RHEL6

On Apr 1, 2014, at 10:26 AM, "Blosch, Edwin L"  wrote:

> I am getting some errors building 1.8 on RHEL6.  I tried autoreconf as 
> suggested, but it failed for the same reason.  Is there a minimum version of 
> m4 required that is newer than that provided by RHEL6?

Don't run "autoreconf" by hand, make sure to run the "./autogen.sh" script that 
is packaged with OMPI.  It will also check your versions and warn you if they 
are out of date.

Do you need to build OMPI from the SVN source?  Or would a (pre-autogen'ed) 
release tarball work for you?

-Dave





[OMPI users] Question on process and memory affinity with 1.8.1

2014-07-21 Thread Blosch, Edwin L
In making the leap from 1.6 to 1.8, how can I check whether or not 
process/memory affinity is supported?

I've built OpenMPI on a system where the numactl-devel package was not 
installed, and another where it was, but I can't see anything in the output of 
ompi_info that suggests any difference between the two builds.  Both builds are on 
Linux RHEL6 systems, just different hosts; both are Sandy Bridge.

I guess I can try -bind-to-core on both systems and -report-bindings, then see 
what output I get.
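
Something like this, I suppose, guessing at the 1.8-style spellings of those options:

   mpirun --report-bindings --bind-to core -np 4 hostname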

I was just wondering if there's a quick way to tell by using ompi_info.

Thanks

Ed


[OMPI users] Application hangs in 1.8.1 related to collective operations

2014-09-25 Thread Blosch, Edwin L
I had an application suddenly stop making progress.  By killing the last 
process out of 208 processes, then looking at the stack trace, I found 3 of 208 
processes were in an MPI_REDUCE call.  The other 205 had progressed in their 
execution to another routine, where they were waiting in an unrelated 
MPI_ALLREDUCE call.

The code structure is such that each processes calls MPI_REDUCE 5 times for 
different variables, then some work is done, then the MPI_ALLREDUCE call 
happens early in the next iteration of the solution procedure.  I thought it 
was also noteworthy that the 3 processes stuck at MPI_REDUCE, were actually 
stuck on the 4th of 5 MPI_REDUCE calls, not the 5th call.

No issues with MVAPICH.  Problem easily solved by adding MPI_BARRIER after the 
section of MPI_REDUCE calls.

It seems like MPI_REDUCE has some kind of non-blocking implementation, and it 
was not safe to enter the MPI_ALLREDUCE while those MPI_REDUCE calls had not 
yet completed for other processes.

This was in OpenMPI 1.8.1.  Same problem seen on 3 slightly different systems, 
all QDR Infiniband, Mellanox HCAs, using a Mellanox OFED stack (slightly 
different versions on each cluster).  Intel compilers, again slightly different 
versions on each of the 3 systems.

Has anyone encountered anything similar?  While I have a workaround, I want to 
make sure the root cause of the deadlock gets fixed.  Please let me know what I 
can do to help.

Thanks,

Ed


Re: [OMPI users] EXTERNAL: Re: Application hangs in 1.8.1 related to collective operations

2014-09-28 Thread Blosch, Edwin L
Thanks Howard,

I’ve attached ompi_info –c output.

Below are the code snippets that all processes execute.  The first one has some 
number of MPI_REDUCE calls. It’s not an IREDUCE, is that what you mean by 
‘variation’?  The second one calls the MPI_ALLREDUCE.

All the processes execute both of these regions of the code. When the job 
hangs, I notice after 15 or 20 minutes of no progress.  Then I kill one of the 
processes, and the stack trace indicates that most of the processes are still 
in the next-to-last MPI_REDUCE (the 3rd of the 4 that you see), but 3 of them 
are in the MPI_ALLREDUCE.  I miscounted earlier when I said the majority of 
processes were in the 4th MPI_REDUCE out of 5.  It was the 3rd out of 4.

You also asked about size.  The first two MPI_REDUCE calls in the loop below 
involve 1 element; the second two calls involve num_quans elements, which is 22 
in the case that hangs.

I will post some output from the coll_base_verbose output in the next e-mail.

Thanks again

Ed

Snippet #1

do k = 1, num_integrations
  if (integration(k)%skip) cycle

  atots_tot = 0.0_fdf
  atots = integration(k)%atots  ! locally accumulated
  call mpi_reduce(atots,atots_tot,1,my_mpi_real,MPI_SUM,0,exec_comm,ierr)
  if (ierr /= MPI_SUCCESS) call handle_mpi_error('inforce atots 
mpi_reduce',ierr)
  integration(k)%atots = atots_tot

  rats_tot = 0.0_fdf
  rats = integration(k)%rats  ! locally accumulated
  call mpi_reduce(rats,rats_tot,1,my_mpi_real,MPI_SUM,0,exec_comm,ierr)
  if (ierr /= MPI_SUCCESS) call handle_mpi_error('inforce rats 
mpi_reduce',ierr)
  integration(k)%rats = rats_tot

 int_data_tot = 0.0_fdf
  call mpi_reduce(integration(k)%int_data,int_data_tot, &
  integration(k)%num_quans,my_mpi_real,MPI_SUM, &
  0,exec_comm,ierr)
  if (ierr /= MPI_SUCCESS) call handle_mpi_error('inforce int_data 
mpi_reduce',ierr)
  integration(k)%int_data = int_data_tot

  quan_num_max = 0
  call mpi_reduce(integration(k)%quan_num,quan_num_max, &
  integration(k)%num_quans,MPI_INTEGER,MPI_MAX, &
  0,exec_comm,ierr)
  if (ierr /= MPI_SUCCESS) call handle_mpi_error('inforce quan_num 
mpi_reduce',ierr)
  integration(k)%quan_num = quan_num_max

enddo
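
(For reference, the MPI_BARRIER workaround I mentioned goes immediately after this
loop; roughly:)

   call mpi_barrier(exec_comm,ierr)
   if (ierr /= MPI_SUCCESS) call handle_mpi_error('inforce mpi_barrier',ierr)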


Snippet #2:
! Everybody gets the information about whether any cells have failed.
  itmp(1) = wallfn_runinfo%nwallfn_cells
  itmp(2) = wallfn_runinfo%ncells_failed
  itmp(3) = wallfn_runinfo%ncells_printed
  itmpg = 0
  call mpi_allreduce(itmp,itmpg,3,MPI_INTEGER,MPI_SUM,exec_comm,ierr)
  if (ierr /= MPI_SUCCESS) call 
handle_mpi_error('wallfn_runinfo_dump_errors mpi_allreduce',ierr)
  g_nwallfn_cells = itmpg(1)
  g_ncells_failed = itmpg(2)
  g_ncells_printed = itmpg(3)


From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Howard Pritchard
Sent: Friday, September 26, 2014 4:10 PM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Application hangs in 1.8.1 related to 
collective operations

Hello Ed,

Could you post the output of ompi_info?  It would also help to know which 
variant of the collective ops
your doing.  If you could post the output when you run with

mpirun --mca coll_base_verbose 10 "other mpirun args you've been using"

that would be great

Also, if you know the sizes (number of elements) involved in the reduce and 
allreduce operations it
would be helpful to know this as well.

Thanks,

Howard


2014-09-25 3:34 GMT-06:00 Blosch, Edwin L <edwin.l.blo...@lmco.com>:
I had an application suddenly stop making progress.  By killing the last 
process out of 208 processes, then looking at the stack trace, I found 3 of 208 
processes were in an MPI_REDUCE call.  The other 205 had progressed in their 
execution to another routine, where they were waiting in an unrelated 
MPI_ALLREDUCE call.

The code structure is such that each processes calls MPI_REDUCE 5 times for 
different variables, then some work is done, then the MPI_ALLREDUCE call 
happens early in the next iteration of the solution procedure.  I thought it 
was also noteworthy that the 3 processes stuck at MPI_REDUCE, were actually 
stuck on the 4th of 5 MPI_REDUCE calls, not the 5th call.

No issues with MVAPICH.  Problem easily solved by adding MPI_BARRIER after the 
section of MPI_REDUCE calls.

It seems like MPI_REDUCE has some kind of non-blocking implementation, and it 
was not safe to enter the MPI_ALLREDUCE while those MPI_REDUCE calls had not 
yet completed for other processes.

This was in OpenMPI 1.8.1.  Same problem seen on 3 slightly different systems, 
all QDR Infiniband, Mellanox HCAs, using a Mellanox OFED stack (slightly 
different versions on each cluster).  Intel compilers, again slightly different 
versions on each of the 3 systems.

Has anyone encountered anything similar?  Wh

[OMPI users] Question on mapping processes to hosts file

2014-11-07 Thread Blosch, Edwin L
Here's my command:

/bin/mpirun  --machinefile 
hosts.dat -np 4 

Here's my hosts.dat file:

% cat hosts.dat
node01
node02
node03
node04

All 4 ranks are launched on node01.  I don't believe I've ever seen this 
before.  I had to do a sanity check, so I tried MVAPICH2-2.1a and got what I 
expected: 1 process runs on each of the 4 nodes.  The mpirun man page says 
'round-robin', which I take to mean that one process would be launched per line 
in the hosts file, so this really seems like incorrect behavior.

What could be the possibilities here?

Thanks for the help!





Re: [OMPI users] EXTERNAL: Re: Question on mapping processes to hosts file

2014-11-11 Thread Blosch, Edwin L
OK, that’s what I was suspecting.  It’s a bug, right?  I asked for 4 processes 
and I supplied a host file with 4 lines in it, and mpirun didn’t launch the 
processes where I told it to launch them.

Do you know when or if this changed?  I can’t recall seeing this behavior 
in 1.6.5 or 1.4 or 1.2, and I know I’ve run cases across workstation clusters, 
so I think I would have noticed this behavior.

Can I throw another one at you, most likely related?  On a system where node01, 
node02, node03, and node04 already had a full load of work (i.e. other 
applications were running a number of processes equal to the number of cores on 
each node), I had a hosts file like this:  node01, node01, node02, node02.   I 
asked for 4 processes.  mpirun launched them as I would think: rank 0 and rank 
1 on node01, and rank 2 and 3 on node02.  Then I tried node01, node01, node02, 
node03.  In this case, all 4 processes were launched on node01.  Is there a 
logical explanation for this behavior as well?

Thanks again,

Ed


From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Friday, November 07, 2014 11:51 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Question on mapping processes to hosts file

Ah, yes - so here is what is happening. When no slot info is provided, we use 
the number of detected cores on each node as the #slots. So if you want to 
load-balance across the nodes, you need to set --map-by node

Or add slots=1 to each line of your host file to override the default behavior
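
e.g. a hostfile like:

node01 slots=1
node02 slots=1
node03 slots=1
node04 slots=1

or, equivalently for this case, something like "mpirun --map-by node --machinefile
hosts.dat -np 4 ..."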

On Nov 7, 2014, at 8:52 AM, Blosch, Edwin L <edwin.l.blo...@lmco.com> wrote:

Here’s my command:

/bin/mpirun  --machinefile 
hosts.dat -np 4 

Here’s my hosts.dat file:

% cat hosts.dat
node01
node02
node03
node04

All 4 ranks are launched on node01.  I don’t believe I’ve ever seen this 
before.  I had to do a sanity check, so I tried MVAPICH2-2.1a and got what I 
expected: 1 process runs on each of the 4 nodes.  The mpirun man page says 
‘round-robin’, which I take to mean that one process would be launched per line 
in the hosts file, so this really seems like incorrect behavior.

What could be the possibilities here?

Thanks for the help!



___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/11/25707.php



Re: [OMPI users] EXTERNAL: Re: Question on mapping processes to hosts file

2014-11-11 Thread Blosch, Edwin L
Thanks Ralph.  I’ll experiment with these options.  Much appreciated.

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Tuesday, November 11, 2014 10:00 AM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Question on mapping processes to hosts 
file


On Nov 11, 2014, at 6:11 AM, Blosch, Edwin L <edwin.l.blo...@lmco.com> wrote:

OK, that’s what I was suspecting.  It’s a bug, right?  I asked for 4 processes 
and I supplied a host file with 4 lines in it, and mpirun didn’t launch the 
processes where I told it to launch them.

Actually, no - it’s an intended “feature”. When the dinosaurs still roamed the 
earth and OMPI was an infant, we had no way of detecting the number of 
processors on a node in advance of the map/launch phase. During that time, 
users were required to tell us that info in the hostfile, which was a source of 
constant complaint.

Since that time, we have changed the launch procedure so we do have access to 
that info when we need it. Accordingly, we now check to see if you told us the 
number of slots on each node in the hostfile - if not, then we autodetect it 
for you.

Quite honestly, it sounds to me like you might be happier using the 
“sequential” mapper for this use case. It will place one proc on each of the 
indicated nodes, with the rank set by the order in the hostfile. So a hostfile 
like this:

node1
node2
node1
node3

will result in
rank 0 -> node1
rank 1 -> node2
rank 2 -> node1
rank 3 -> node3

etc. To use it, just add "-mca rmaps seq" to your cmd line. Alternatively, you 
could add "--map-by node" to your cmd line and we will round-robin by node.



Do you know when or if this changed?  I can’t recall seeing this behavior 
in 1.6.5 or 1.4 or 1.2, and I know I’ve run cases across workstation clusters, 
so I think I would have noticed this behavior.

It changed early in the 1.7 series, and has remained consistent since then.



Can I throw another one at you, most likely related?  On a system where node01, 
node02, node03, and node04 already had a full load of work (i.e. other 
applications were running a number of processes equal to the number of cores on 
each node), I had a hosts file like this:  node01, node01, node02, node02.   I 
asked for 4 processes.  mpirun launched them as I would think: rank 0 and rank 
1 on node01, and rank 2 and 3 on node02.  Then I tried node01, node01, node02, 
node03.  In this case, all 4 processes were launched on node01.  Is there a 
logical explanation for this behavior as well?

Now that one is indeed a bug! I’ll dig it up and fix it.




Thanks again,

Ed


From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Friday, November 07, 2014 11:51 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Question on mapping processes to hosts file

Ah, yes - so here is what is happening. When no slot info is provided, we use 
the number of detected cores on each node as the #slots. So if you want to 
load-balance across the nodes, you need to set --map-by node

Or add slots=1 to each line of your host file to override the default behavior

On Nov 7, 2014, at 8:52 AM, Blosch, Edwin L <edwin.l.blo...@lmco.com> wrote:

Here’s my command:

/bin/mpirun  --machinefile 
hosts.dat -np 4 

Here’s my hosts.dat file:

% cat hosts.dat
node01
node02
node03
node04

All 4 ranks are launched on node01.  I don’t believe I’ve ever seen this 
before.  I had to do a sanity check, so I tried MVAPICH2-2.1a and got what I 
expected: 1 process runs on each of the 4 nodes.  The mpirun man page says 
‘round-robin’, which I take to mean that one process would be launched per line 
in the hosts file, so this really seems like incorrect behavior.

What could be the possibilities here?

Thanks for the help!



___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/11/25707.php

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/11/25742.php



[OMPI users] How can I discover valid values for MCA parameters?

2015-05-29 Thread Blosch, Edwin L
Sometimes I want to use one of the option flags, for example today it is 
mtl_mxm_verbose.  How do I discover the valid possible values of various MCA 
parameters?

I've tried ompi_info --all but it does not show the possible values, only the 
current value

I've tried ompi_info --param <framework> all, but no matter what string I 
give for framework, I get no output at all.
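
Concretely, the kind of invocations I have been trying (I also see a --level option
for newer releases, though I may be misreading how it is meant to be used):

   ompi_info --param mtl mxm
   ompi_info --param mtl mxm --level 9
   ompi_info --all --level 9 | grep mtl_mxm_verbose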

Thanks




Re: [OMPI users] How can I discover valid values for MCA parameters?

2015-05-29 Thread Blosch, Edwin L
Follow-up to the 2nd question which I now realize is something else.

I can see output when I do: ompi_info --param coll fca  with a version 
of OpenMPI that was built with --prefix set to its installed location.

I cannot get the output when I use a relocated version, i.e. built in one place 
and installed in another, even after I set OPAL_PREFIX to reflect the installed 
location.

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Blosch, Edwin L
Sent: Friday, May 29, 2015 11:06 AM
To: Open MPI Users (us...@open-mpi.org)
Subject: EXTERNAL: [OMPI users] How can I discover valid values for MCA 
parameters?

Sometimes I want to use one of the option flags, for example today it is 
mtl_mxm_verbose.  How do I discover the valid possible values of various MCA 
parameters?

I've tried ompi_info --all but it does not show the possible values, only the 
current value

I've tried ompi_info --param <framework> all, but no matter what string I 
give for framework, I get no output at all.

Thanks




[OMPI users] Basic question on portability

2011-03-01 Thread Blosch, Edwin L
If I compile against OpenMPI 1.2.8, shared linkage, on one system, then move 
the executable to another system with OpenMPI 1.4.x or 1.5.x, will I have any 
problems running the executable?

Thanks


[OMPI users] Can you set the gid of the processes created by mpirun?

2011-09-07 Thread Blosch, Edwin L
The mpirun command is invoked when the user's group is 'set group' to group 
650.  When the rank 0 process creates files, they have group ownership 650.  
But the user's login group is group 1040. The child processes that get started 
on other nodes run with group 1040, and the files they create have group 
ownership 1040.

Is there a way to tell mpirun to start the child processes with the same uid 
and gid as the rank 0 process?
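
In case it helps to reproduce this, here is a small diagnostic each rank can run,
sketched with a C binding for getgid (not part of our application):

   program check_gid
     use mpi
     use iso_c_binding, only: c_int
     implicit none
     interface
        ! getgid(2) from libc; gid_t mapped to c_int for this sketch
        function getgid() bind(c, name='getgid')
          import :: c_int
          integer(c_int) :: getgid
        end function getgid
     end interface
     integer :: rank, ierr
     call MPI_Init(ierr)
     call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
     print '(a,i0,a,i0)', 'rank ', rank, ' gid ', getgid()
     call MPI_Finalize(ierr)
   end program check_gid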

Thanks


Re: [OMPI users] Can you set the gid of the processes created by mpirun?

2011-09-07 Thread Blosch, Edwin L
Ralph,

Thanks for the reply.   I'm using 1.4.2.

We have a job queueing system with a prioritization scheme where the priorities 
of jobs are in part a function of the group id.  This is why, for us, it is 
common that the initial mpirun command executes with a group other than the 
user's default group.   We also have some applications where each process 
writes data to disk, and the resulting collection of output files has mixed 
group permissions.  This creates problems --- mostly just inconvenience --- but 
I could imagine some security-conscious folks might be more concerned about it. 
  Also, if it's relevant, the OpenMPI we are using is built without support for 
the job-queueing system (our preference for various reasons).

Ed

From: Ralph Castain [mailto:r...@open-mpi.org]
Sent: Wednesday, September 07, 2011 8:53 AM
To: Open MPI Users
Subject: Re: [OMPI users] Can you set the gid of the processes created by 
mpirun?

On Sep 7, 2011, at 7:38 AM, Blosch, Edwin L wrote:


The mpirun command is invoked when the user's group is 'set group' to group 
650.  When the rank 0 process creates files, they have group ownership 650.  
But the user's login group is group 1040. The child processes that get started 
on other nodes run with group 1040, and the files they create have group 
ownership 1040.

Is there a way to tell mpirun to start the child processes with the same uid 
and gid as the rank 0 process?

I'm afraid not - never came up before. Could be done, but probably not right 
away. What version are you using?



Thanks
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-08 Thread Blosch, Edwin L
Yes, we build OpenMPI --without-torque. 

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Reuti
Sent: Thursday, September 08, 2011 4:33 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Can you set the gid of the processes 
created by mpirun?

On 08.09.2011 at 04:04, Ed Blosch wrote:

> Typically it is something like 'qsub -W group_list=groupB 
> myjob.sh'. Ultimately myjob.sh runs with gid groupB on some host in the
> cluster.  When that script reaches the mpirun command, then mpirun and the
> processes started on the same host all run with gid groupB, but any of the
> spawned processes that start on other hosts run with the user's default
> group, say groupA.
> 
> It did occur to me that the launching technique might have some ability to
> influence this behavior as you indicated. I don't know what launcher is
> being used in our cases, I guess it's rsh/ssh.

I can only make a statement for SGE that it would honor the group id for the 
"built-in method" or "rsh startup method" with SGE's patched rshd, not for ssh.

But you are using Torque I assume, as there is no -W switch in SGE. How did you 
build Open MPI then? I thought the support for Torque is available by default 
without any special switch to configure in Open MPI.

So, if the slave tasks are started by the pbs_mom, then it should also get the 
set group id. As I don't use Torque I can't make any definite statement for it.

Are you resetting inside the job script some variables to let it run outside 
Torque, i.e. without tight integration?

-- Reuti


> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Reuti
> Sent: Wednesday, September 07, 2011 12:24 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Can you set the gid of the processes created by
> mpirun?
> 
> Hi,
> 
> you mean you change the group id of the user before you submit the job? In
> GridEngine you can specify whether the actual group id should be used for
> the job, or the default login id.
> 
> Having a tight integration, also the slave processes will run with the same
> group id.
> 
> -- Reuti
> 
> 
>> Ed
>> 
>> From: Ralph Castain [mailto:r...@open-mpi.org] 
>> Sent: Wednesday, September 07, 2011 8:53 AM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] Can you set the gid of the processes created by
> mpirun?
>> 
>> On Sep 7, 2011, at 7:38 AM, Blosch, Edwin L wrote:
>> 
>> 
>> The mpirun command is invoked when the user's group is 'set group' to
> group 650.  When the rank 0 process creates files, they have group ownership
> 650.  But the user's login group is group 1040. The child processes that get
> started on other nodes run with group 1040, and the files they create have
> group ownership 1040.
>> 
>> Is there a way to tell mpirun to start the child processes with the same
> uid and gid as the rank 0 process?
>> 
>> I'm afraid not - never came up before. Could be done, but probably not
> right away. What version are you using?
>> 
>> 
>> 
>> Thanks
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


[OMPI users] qp memory allocation problem

2011-09-12 Thread Blosch, Edwin L
I am getting this error message below and I don't know what it means or how to 
fix it.   It only happens when I run on a large number of processes, e.g. 960.  
Things work fine on 480, and I don't think the application has a bug.  Any help 
is appreciated...

[c1n01][[30697,1],3][connect/btl_openib_connect_oob.c:464:qp_create_one] error 
creating qp errno says Cannot allocate memory
[c1n01][[30697,1],4][connect/btl_openib_connect_oob.c:464:qp_create_one] error 
creating qp errno says Cannot allocate memory

Here's the mpirun command I used:
mpirun --prefix /usr/mpi/intel/openmpi-1.4.3 --machinefile  -np 960 
--mca btl ^tcp --mca mpool_base_use_mem_hooks 1 --mca mpi_leave_pinned 1 -x 
LD_LIBRARY_PATH -x MPI_ENVIRONMENT=1 

Here's the applicable hardware from the 
/usr/mpi/intel/openmpi-1.4.3/share/openmpi/mca-btl-openib-device-params.ini 
file:
# A.k.a. ConnectX
[Mellanox Hermon]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3,0x119f
vendor_part_id = 
25408,25418,25428,26418,26428,25448,26438,26448,26468,26478,26488
use_eager_rdma = 1
mtu = 2048
max_inline_data = 128

And this is the output of ompi_info -param btl openib:
 MCA btl: parameter "btl_base_verbose" (current value: "0", 
data source: default value)
  Verbosity level of the BTL framework
 MCA btl: parameter "btl" (current value: , data source: 
default value)
  Default selection set of components for the btl 
framework ( means use all components that can be found)
 MCA btl: parameter "btl_openib_verbose" (current value: "0", 
data source: default value)
  Output some verbose OpenIB BTL information (0 = no 
output, nonzero = output)
 MCA btl: parameter "btl_openib_warn_no_device_params_found" 
(current value: "1", data source: default value, synonyms:
  btl_openib_warn_no_hca_params_found)
  Warn when no device-specific parameters are found in 
the INI file specified by the btl_openib_device_param_files MCA
  parameter (0 = do not warn; any other value = warn)
 MCA btl: parameter "btl_openib_warn_no_hca_params_found" 
(current value: "1", data source: default value, deprecated, synonym
  of: btl_openib_warn_no_device_params_found)
  Warn when no device-specific parameters are found in 
the INI file specified by the btl_openib_device_param_files MCA
  parameter (0 = do not warn; any other value = warn)
 MCA btl: parameter "btl_openib_warn_default_gid_prefix" 
(current value: "1", data source: default value)
  Warn when there is more than one active ports and at 
least one of them connected to the network with only default GID
  prefix configured (0 = do not warn; any other value = 
warn)
 MCA btl: parameter "btl_openib_warn_nonexistent_if" (current 
value: "1", data source: default value)
  Warn if non-existent devices and/or ports are 
specified in the btl_openib_if_[in|ex]clude MCA parameters (0 = do not
  warn; any other value = warn)
 MCA btl: parameter "btl_openib_want_fork_support" (current 
value: "-1", data source: default value)
  Whether fork support is desired or not (negative = 
try to enable fork support, but continue even if it is not
  available, 0 = do not enable fork support, positive = 
try to enable fork support and fail if it is not available)
 MCA btl: parameter "btl_openib_device_param_files" (current 
value:
  
"/usr/mpi/intel/openmpi-1.4.3/share/openmpi/mca-btl-openib-device-params.ini", 
data source: default value, synonyms:
  btl_openib_hca_param_files)
  Colon-delimited list of INI-style files that contain 
device vendor/part-specific parameters
 MCA btl: parameter "btl_openib_hca_param_files" (current value:
  
"/usr/mpi/intel/openmpi-1.4.3/share/openmpi/mca-btl-openib-device-params.ini", 
data source: default value, deprecated,
  synonym of: btl_openib_device_param_files)
  Colon-delimited list of INI-style files that contain 
device vendor/part-specific parameters
 MCA btl: parameter "btl_openib_device_type" (current value: 
"all", data source: default value)
  Specify to only use IB or iWARP network adapters 
(infiniband = only use InfiniBand HCAs; iwarp = only use iWARP NICs;
  all = use any available adapters)
 MCA btl: parameter "btl_openib_max_btls" (current value: "-1", 
data source: default value)
  Maximum number of device ports to use (-1 = use all

Re: [OMPI users] EXTERNAL: Re: qp memory allocation problem

2011-09-12 Thread Blosch, Edwin L
Samuel,

This worked.  Did this magic line disable the use of per-peer queue pairs?  I 
have seen a previous post by Jeff that explains what this line does generally, 
but I didn't study the post in detail, so if you could provide a little 
explanation I would appreciate it.
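
My rough reading of the value so far, which may well be wrong: each colon-separated
entry defines one receive queue, with S meaning a shared receive queue (versus P for
per-peer), the second field the buffer size in bytes, and the third the number of
buffers.  In other words:

   S,4096,128   shared receive queue, 4096-byte buffers
   S,12288,128  shared receive queue, 12288-byte buffers
   S,65536,12   shared receive queue, 65536-byte buffers
   (no P,... entries, hence, presumably, no per-peer queue pairs)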

Ed

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Samuel K. Gutierrez
Sent: Monday, September 12, 2011 10:49 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] qp memory allocation problem

Hi,

This problem can be  caused by a variety of things, but I suspect our default 
queue pair parameters (QP) aren't helping the situation :-).

What happens when you add the following to your mpirun command?

-mca btl_openib_receive_queues S,4096,128:S,12288,128:S,65536,12

OMPI Developers:

Maybe we should consider disabling the use of per-peer queue pairs by default.  
Do they buy us anything?  For what it is worth, we have stopped using them on 
all of our large systems here at LANL.

Thanks,

Samuel K. Gutierrez
Los Alamos National Laboratory

On Sep 12, 2011, at 9:23 AM, Blosch, Edwin L wrote:


I am getting this error message below and I don't know what it means or how to 
fix it.   It only happens when I run on a large number of processes, e.g. 960.  
Things work fine on 480, and I don't think the application has a bug.  Any help 
is appreciated...

[c1n01][[30697,1],3][connect/btl_openib_connect_oob.c:464:qp_create_one] error 
creating qp errno says Cannot allocate memory
[c1n01][[30697,1],4][connect/btl_openib_connect_oob.c:464:qp_create_one] error 
creating qp errno says Cannot allocate memory

Here's the mpirun command I used:
mpirun --prefix /usr/mpi/intel/openmpi-1.4.3 --machinefile  -np 960 
--mca btl ^tcp --mca mpool_base_use_mem_hooks 1 --mca mpi_leave_pinned 1 -x 
LD_LIBRARY_PATH -x MPI_ENVIRONMENT=1 

Here's the applicable hardware from the 
/usr/mpi/intel/openmpi-1.4.3/share/openmpi/mca-btl-openib-device-params.ini 
file:
# A.k.a. ConnectX
[Mellanox Hermon]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3,0x119f
vendor_part_id = 
25408,25418,25428,26418,26428,25448,26438,26448,26468,26478,26488
use_eager_rdma = 1
mtu = 2048
max_inline_data = 128

And this is the output of ompi_info -param btl openib:
 MCA btl: parameter "btl_base_verbose" (current value: "0", 
data source: default value)
  Verbosity level of the BTL framework
 MCA btl: parameter "btl" (current value: , data source: 
default value)
  Default selection set of components for the btl 
framework ( means use all components that can be found)
 MCA btl: parameter "btl_openib_verbose" (current value: "0", 
data source: default value)
  Output some verbose OpenIB BTL information (0 = no 
output, nonzero = output)
 MCA btl: parameter "btl_openib_warn_no_device_params_found" 
(current value: "1", data source: default value, synonyms:
  btl_openib_warn_no_hca_params_found)
  Warn when no device-specific parameters are found in 
the INI file specified by the btl_openib_device_param_files MCA
  parameter (0 = do not warn; any other value = warn)
 MCA btl: parameter "btl_openib_warn_no_hca_params_found" 
(current value: "1", data source: default value, deprecated, synonym
  of: btl_openib_warn_no_device_params_found)
  Warn when no device-specific parameters are found in 
the INI file specified by the btl_openib_device_param_files MCA
  parameter (0 = do not warn; any other value = warn)
 MCA btl: parameter "btl_openib_warn_default_gid_prefix" 
(current value: "1", data source: default value)
  Warn when there is more than one active ports and at 
least one of them connected to the network with only default GID
  prefix configured (0 = do not warn; any other value = 
warn)
 MCA btl: parameter "btl_openib_warn_nonexistent_if" (current 
value: "1", data source: default value)
  Warn if non-existent devices and/or ports are 
specified in the btl_openib_if_[in|ex]clude MCA parameters (0 = do not
  warn; any other value = warn)
 MCA btl: parameter "btl_openib_want_fork_support" (current 
value: "-1", data source: default value)
  Whether fork support is desired or not (negative = 
try to enable fork support, but continue even if it is not
  available, 0 = do not enable fork support, positive = 
try to enable fork support and fail if it is not available)

Re: [OMPI users] EXTERNAL: Re: qp memory allocation problem

2011-09-12 Thread Blosch, Edwin L
Nathan,   I found this parameter under /sys/module/mlx4_core/parameters.   How 
do I incorporate a changed value?  What do I need to restart/rebuild?
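
My guess at the mechanics, in case it is useful to anyone else (not verified): set
the value through a modprobe options file and then reload the driver stack, e.g.

   # /etc/modprobe.d/mlx4_core.conf   (name/location may vary by distro)
   # 3 => ~16 GB per your table; use the exact parameter name that shows up
   # under /sys/module/mlx4_core/parameters
   options mlx4_core log_mtts_per_set=3

then "/etc/init.d/openibd restart" (or a reboot of the node).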

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Nathan Hjelm
Sent: Monday, September 12, 2011 11:00 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] qp memory allocation problem

I also recommend checking the log_mtts_per_set parameter to the mlx4 module. 
This parameter controls how much memory can be registered for use by the mlx4 
driver and it should be in the range 1-5 (or 0-7 depending on the version of 
the mlx4 driver). I recommend the parameter be set such that you can register 
all the memory on the node. Assuming you are not using huge pages here is a 
list of possible values (and how much memory can be registered).

0 -   2 GB (new mlx4 default-- bad setting)
1 -   4 GB
2 -   8 GB
3 -  16 GB (old mlx4 default)
4 -  32 GB
5 -  64 GB
6 - 128 GB
7 - 256 GB

-Nathan Hjelm
Los Alamos National Laboratory

On Mon, 12 Sep 2011, Samuel K. Gutierrez wrote:

> Hi,
> This problem can be  caused by a variety of things, but I suspect our
> default queue pair parameters (QP) aren't helping the situation :-).
>
> What happens when you add the following to your mpirun command?
>
> -mca btl_openib_receive_queues S,4096,128:S,12288,128:S,65536,12
>
> OMPI Developers:
>
> Maybe we should consider disabling the use of per-peer queue pairs by
> default.  Do they buy us anything?  For what it is worth, we have stopped 
> using them on all of our large systems here at LANL.
>
> Thanks,
>
> Samuel K. Gutierrez
> Los Alamos National Laboratory
>
> On Sep 12, 2011, at 9:23 AM, Blosch, Edwin L wrote:
>
>   I am getting this error message below and I don't know what it means or 
> how to fix it.   It only happens when I
>   run on a large number of processes, e.g. 960.  Things work fine on 480, 
> and I don't think the application has a
>   bug.  Any help is appreciated...
>
> [c1n01][[30697,1],3][connect/btl_openib_connect_oob.c:464:qp_create_one] error creating qp errno says Cannot allocate memory
> [c1n01][[30697,1],4][connect/btl_openib_connect_oob.c:464:qp_create_one] error creating qp errno says Cannot allocate memory
>
> Here's the mpirun command I used:
> mpirun --prefix /usr/mpi/intel/openmpi-1.4.3 --machinefile 
> -np 960 --mca btl ^tcp --mca mpool_base_use_mem_hooks 1 --mca
> mpi_leave_pinned 1 -x LD_LIBRARY_PATH -x MPI_ENVIRONMENT=1
>  arguments>
>
> Here's the applicable hardware from the
> /usr/mpi/intel/openmpi-1.4.3/share/openmpi/mca-btl-openib-device-params.ini
> file:
> # A.k.a. ConnectX
> [Mellanox Hermon]
> vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3,0x119f
> vendor_part_id =
> 25408,25418,25428,26418,26428,25448,26438,26448,26468,26478,26488
> use_eager_rdma = 1
> mtu = 2048
> max_inline_data = 128
>
> And this is the output of ompi_info -param btl openib:
>  MCA btl: parameter "btl_base_verbose" (current value:
> "0", data source: default value)
>   Verbosity level of the BTL framework
>  MCA btl: parameter "btl" (current value: , data
> source: default value)
>   Default selection set of components for the
> btl framework ( means use all components that can be found)
>  MCA btl: parameter "btl_openib_verbose" (current
> value: "0", data source: default value)
>   Output some verbose OpenIB BTL information
> (0 = no output, nonzero = output)
>  MCA btl: parameter
> "btl_openib_warn_no_device_params_found" (current value: "1", data source: 
> default value, synonyms:
>   btl_openib_warn_no_hca_params_found)
>   Warn when no device-specific parameters are
> found in the INI file specified by the btl_openib_device_param_files
> MCA
>   parameter (0 = do not warn; any other value
> = warn)
>  MCA btl: parameter
> "btl_openib_warn_no_hca_params_found" (current value: "1", data
> source: default value, deprecated, synonym
>   of: btl_openib_warn_no_device_params_found)
>   Warn when no device-specific parameters are
> found in the INI file specified by the btl_openib_device_param_files
> MCA
>   parameter (0 = do not warn; any other value
> = warn)
>  MCA btl: parameter
> "btl_openib_warn_default_gid_prefix" (current value: "1", data source:
> default
> value)
>

Re: [OMPI users] qp memory allocation problem

2011-09-12 Thread Blosch, Edwin L
Actually, we were already aware of this FAQ and already have the limits set to 
hard and soft unlimited in the PAM limits.conf as well as in the pbs_mom 
resource manager startup script.  We encountered those issues a few years ago 
and are well aware of the problems caused by setting process limits too low.  I 
don't think we are suffering from that particular problem.


Samuel's suggestion worked, and we're trying Nathan's suggestion now.
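
For reference, a sketch of how Samuel's flag slots into the mpirun invocation 
quoted below; the machinefile and executable placeholders stand in for the 
values elided in this thread:

mpirun --prefix /usr/mpi/intel/openmpi-1.4.3 --machinefile <machinefile> -np 960 \
    --mca btl ^tcp \
    --mca btl_openib_receive_queues S,4096,128:S,12288,128:S,65536,12 \
    --mca mpool_base_use_mem_hooks 1 --mca mpi_leave_pinned 1 \
    -x LD_LIBRARY_PATH -x MPI_ENVIRONMENT=1 <executable> <arguments>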

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Shamis, Pavel
Sent: Monday, September 12, 2011 11:39 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] qp memory allocation problem

An alternative solution for the problem is updating your memory limits.
Please see below: 
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

Apparently your memory limit is low and the driver fails to create QPs.


What happens when you add the following to your mpirun command?

-mca btl_openib_receive_queues S,4096,128:S,12288,128:S,65536,12

And if you have a ConnectX-2 or ConnectX-3 device, I would suggest replacing "S" with "X" - X 
enables the much more efficient XRC QP instead of SRQ.


OMPI Developers:

Maybe we should consider disabling the use of per-peer queue pairs by default.  
Do they buy us anything?  For what it is worth, we have stopped using them on 
all of our large systems here at LANL.

There are pros and cons to using per-peer queues. It really depends on application 
behavior.
I would suggest some dynamic-adjustment solution: if NP > some threshold -> 
switch to SRQ or XRC.

Also, I would recommend printing out a more informative error message for the "qp 
errno says Cannot allocate memory" error.

Regards,
Pasha.


Thanks,

Samuel K. Gutierrez
Los Alamos National Laboratory

On Sep 12, 2011, at 9:23 AM, Blosch, Edwin L wrote:

I am getting this error message below and I don't know what it means or how to 
fix it.   It only happens when I run on a large number of processes, e.g. 960.  
Things work fine on 480, and I don't think the application has a bug.  Any help 
is appreciated...

[c1n01][[30697,1],3][connect/btl_openib_connect_oob.c:464:qp_create_one] error 
creating qp errno says Cannot allocate memory
[c1n01][[30697,1],4][connect/btl_openib_connect_oob.c:464:qp_create_one] error 
creating qp errno says Cannot allocate memory

Here's the mpirun command I used:
mpirun --prefix /usr/mpi/intel/openmpi-1.4.3 --machinefile  -np 960 
--mca btl ^tcp --mca mpool_base_use_mem_hooks 1 --mca mpi_leave_pinned 1 -x 
LD_LIBRARY_PATH -x MPI_ENVIRONMENT=1 

Here's the applicable hardware from the 
/usr/mpi/intel/openmpi-1.4.3/share/openmpi/mca-btl-openib-device-params.ini 
file:
# A.k.a. ConnectX
[Mellanox Hermon]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3,0x119f
vendor_part_id = 
25408,25418,25428,26418,26428,25448,26438,26448,26468,26478,26488
use_eager_rdma = 1
mtu = 2048
max_inline_data = 128

And this is the output of ompi_info -param btl openib:
MCA btl: parameter "btl_base_verbose" (current value: "0", data 
source: default value)
 Verbosity level of the BTL framework
MCA btl: parameter "btl" (current value: , data source: 
default value)
 Default selection set of components for the btl 
framework ( means use all components that can be found)
MCA btl: parameter "btl_openib_verbose" (current value: "0", 
data source: default value)
 Output some verbose OpenIB BTL information (0 = no 
output, nonzero = output)
MCA btl: parameter "btl_openib_warn_no_device_params_found" 
(current value: "1", data source: default value, synonyms:
 btl_openib_warn_no_hca_params_found)
 Warn when no device-specific parameters are found in 
the INI file specified by the btl_openib_device_param_files MCA
 parameter (0 = do not warn; any other value = warn)
MCA btl: parameter "btl_openib_warn_no_hca_params_found" 
(current value: "1", data source: default value, deprecated, synonym
 of: btl_openib_warn_no_device_params_found)
 Warn when no device-specific parameters are found in 
the INI file specified by the btl_openib_device_param_files MCA
 parameter (0 = do not warn; any other value = warn)
MCA btl: parameter "btl_openib_warn_default_gid_prefix" 
(current value: "1", data source: default value)
 Warn when there is more than one active ports and at 
least one of them connected to the network with only default GID
 prefix configured (0 = do not warn; any other value = 
warn)
MCA 

[OMPI users] Question on using rsh

2011-09-12 Thread Blosch, Edwin L
I have a hello world program that runs without prompting for password with 
plm_rsh_agent but not with orte_rsh_agent, I mean it runs but only after 
prompting for a password:

/bin/mpirun --machinefile mpihosts.dat -np 16 -mca plm_rsh_agent 
/usr/bin/rsh ./test_setup

Hello from process    2
Hello from process    5
Hello from process   12
Hello from process    6
Hello from process    0
Hello from process    4
Hello from process    3
Hello from process    7
Hello from process   14
Hello from process    8
Hello from process    1
Hello from process    9
Hello from process   10
Hello from process   11
Hello from process   13
Hello from process   15

bin/mpirun --machinefile mpihosts.dat -np 16 -mca orte_rsh_agent 
/usr/bin/rsh ./test_setup
bloscel@f8312's password:

I didn't notice anything about this in the FAQ except that orte_rsh_agent is 
newer than plm_rsh_agent.  Did I miss some critical piece of information?  Why 
do these options behave differently?

Thanks


Re: [OMPI users] EXTERNAL: Re: qp memory allocation problem

2011-09-12 Thread Blosch, Edwin L
It was set to 0 previously.  We've set it to 4 and restarted some service and 
now it works.  So both your and Samuel's suggestions worked. 

On another, slightly older system, it defaulted to 3 instead of 0, which 
apparently explains why the job always ran there but did not run on this newer 
system.

I'm wondering if there was any way for us to know that this change had happened.

At any rate, thanks for the support.


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Nathan Hjelm
Sent: Monday, September 12, 2011 12:05 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: qp memory allocation problem


On Mon, 12 Sep 2011, Blosch, Edwin L wrote:

> Nathan,   I found this parameter under /sys/module/mlx4_core/parameters.   
> How do I incorporate a changed value?  What needs to be restarted or rebuilt?

Add the following line to /etc/modprobe (replace X with the appropriate value 
for log_mtts_per_seg):
options mlx4_core log_mtts_per_seg=X

BTW, what was log_mtts_per_seg set to?

-Nathan Hjelm
Los Alamos National Laboratory
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
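
As an illustration of the fix discussed in this thread, a minimal sketch of 
checking and persisting log_mtts_per_seg; the modprobe.d file name is an 
assumption and the exact location varies by distribution:

# Check the value the mlx4 driver is currently using
cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
# Persist a larger value (4 here, matching the value that worked above), then
# reload the mlx4 driver stack or reboot for it to take effect
echo "options mlx4_core log_mtts_per_seg=4" >> /etc/modprobe.d/mlx4_core.conf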


Re: [OMPI users] EXTERNAL: Re: Question on using rsh

2011-09-13 Thread Blosch, Edwin L
Ralph, Reuti,

There are no typos, except in my post itself where I clipped out a few 
arguments.

I just repeated the exercise this morning, exactly like this:

/bin/mpirun --machinefile mpihosts.dat -np 16 -mca orte_rsh_agent 
/usr/bin/rsh -x MPI_ENVIRONMENT=1 ./test_setup

It prompts for a password.

Then I hit the "up" arrow key to bring the command back, and I type over "orte" 
and replace with "plm":

/bin/mpirun --machinefile mpihosts.dat -np 16 -mca plm_rsh_agent 
/usr/bin/rsh -x MPI_ENVIRONMENT=1 ./test_setup

This time it does not prompt for a password.

I can reverse the order, it doesn't change the behavior:  the "orte" one 
prompts for a password but the "plm" one doesn't.

They must not be wholly identical, somehow.  This is OpenMPI 1.4.3.

Ed


From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Ralph Castain
Sent: Monday, September 12, 2011 7:43 PM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Question on using rsh

The two are synonyms for each other - they resolve to the identical variable, 
so there isn't anything different about them.

Not sure what the issue might be, but I would check for a typo - we don't check 
that mca params are spelled correctly, nor do we check for params that don't 
exist (e.g., because you spelled it wrong).


On Sep 12, 2011, at 3:03 PM, Blosch, Edwin L wrote:


I have a hello world program that runs without prompting for password with 
plm_rsh_agent but not with orte_rsh_agent, I mean it runs but only after 
prompting for a password:

/bin/mpirun --machinefile mpihosts.dat -np 16 -mca plm_rsh_agent 
/usr/bin/rsh ./test_setup

Hello from process    2
Hello from process    5
Hello from process   12
Hello from process    6
Hello from process    0
Hello from process    4
Hello from process    3
Hello from process    7
Hello from process   14
Hello from process    8
Hello from process    1
Hello from process    9
Hello from process   10
Hello from process   11
Hello from process   13
Hello from process   15

bin/mpirun --machinefile mpihosts.dat -np 16 -mca orte_rsh_agent 
/usr/bin/rsh ./test_setup
bloscel@f8312's password:

I didn't notice anything about this in the FAQ except that orte_rsh_agent is 
newer than plm_rsh_agent.  Did I miss some critical piece of information?  Why 
do these options behave differently?


Thanks
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



[OMPI users] Problem running under SGE

2011-09-13 Thread Blosch, Edwin L
I'm able to run this command below from an interactive shell window:

/bin/mpirun --machinefile mpihosts.dat -np 16 -mca plm_rsh_agent 
/usr/bin/rsh -x MPI_ENVIRONMENT=1 ./test_setup

but it does not work if I put it into a shell script and 'qsub' that script to 
SGE.  I get the message shown at the bottom of this post.

I've tried everything I can think of.  I would welcome any hints on how to 
proceed.

For what it's worth, this OpenMPI is 1.4.3 and I built it on another system.  I 
am setting and exporting OPAL_PREFIX, and as I said, all works fine 
interactively, just not in batch.  It was built with --disable-shared and I don't 
see any shared libs under openmpi/lib, and I've run 'ldd' from within the 
script on both the application executable and the orterun command; there are no 
unresolved shared libraries.  So I don't think the error message hinting at 
LD_LIBRARY_PATH issues is pointing me in the right direction.

Thanks for any guidance,

Ed


error: executing task of job 139362 failed: execution daemon on host "f8312" 
didn't accept task
--
A daemon (pid 2818) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--
--
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--
mpirun: clean termination accomplished



Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Blosch, Edwin L
The version of OpenMPI I am running was built without any SGE-related guidance 
in the configure command, and it was built on a system that did not have 
SGE, so I would presume SGE support is absent.

My hope is that OpenMPI will not attempt to use SGE in any way, but perhaps it 
is trying to. 

Yes, I did supply a machinefile on my own.  It is formed on the fly within the 
submitted script by parsing the PE_HOSTFILE, and I leave the resulting file 
lying around, and the result appears to be correct, i.e. it includes those 
nodes (and only those nodes) allocated to the job.
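
For illustration, a minimal sketch of that machinefile step, assuming the usual 
SGE PE_HOSTFILE line format of "hostname slots queue processor-range"; the file 
name mpihosts.dat is the one used elsewhere in these posts:

# Build the machinefile from the hosts SGE granted to this job
: > mpihosts.dat
while read host nslots queue procs; do
    echo "${host} slots=${nslots}" >> mpihosts.dat
done < "$PE_HOSTFILE"
# Total slot count, usable as the -np argument
NP=$(awk '{n += $2} END {print n}' "$PE_HOSTFILE")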



-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Reuti
Sent: Tuesday, September 13, 2011 4:27 PM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Problem running under SGE

Am 13.09.2011 um 23:18 schrieb Blosch, Edwin L:

> I'm able to run this command below from an interactive shell window:
>  
> /bin/mpirun --machinefile mpihosts.dat -np 16 -mca plm_rsh_agent 
> /usr/bin/rsh -x MPI_ENVIRONMENT=1 ./test_setup
>  
> but it does not work if I put it into a shell script and 'qsub' that script 
> to SGE.  I get the message shown at the bottom of this post. 
>  
> I've tried everything I can think of.  I would welcome any hints on how to 
> proceed. 
>  
> For what it's worth, this OpenMPI is 1.4.3 and I built it on another system.  
> I am setting and exporting OPAL_PREFIX and as I said, all works fine 
> interactively just not in batch.  It was built with -disable-shared and I 
> don't see any shared libs under openmpi/lib, and I've done 'ldd' from within 
> the script, on both the application executable and on the orterun command; no 
> unresolved shared libraries.  So I don't think the error message hinting at 
> LD_LIBRARY_PATH issues is pointing me in the right direction.
>  
> Thanks for any guidance,
>  
> Ed
>  

Oh, I missed this:


> error: executing task of job 139362 failed: execution daemon on host "f8312" 
> didn't accept task

did you supply a machinefile on your own? In a proper SGE integration it's 
running in a parallel environment. You defined and requested one? The error 
looks like it was started in a PE, but tried to access a node not granted for 
the actual job

-- Reuti


> --
> A daemon (pid 2818) died unexpectedly with status 1 while attempting
> to launch so we are aborting.
>  
> There may be more information reported by the environment (see above).
>  
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --
> --
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --
> mpirun: clean termination accomplished
>  
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Blosch, Edwin L
Your comment guided me in the right direction, Reuti, and it overlapped with your 
guidance, Ralph.

It works: if I add this flag, then it runs:
--mca plm_rsh_disable_qrsh
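
For reference, a sketch of the full working invocation (the interactive command 
from the original post, with the extra flag added; the explicit value of 1 is 
assumed here):

/bin/mpirun --machinefile mpihosts.dat -np 16 \
    -mca plm_rsh_agent /usr/bin/rsh -mca plm_rsh_disable_qrsh 1 \
    -x MPI_ENVIRONMENT=1 ./test_setup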

Thank you both for the explanations.  

I had built OpenMPI on another system which, as I said, did not have SGE, and thus 
I did not give --without-sge (nor did I give --with-sge).  In the future, when 
building 1.4.3, I will just add --without-sge and presumably won't run into 
the qrsh issue.

Thanks again




-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Reuti
Sent: Tuesday, September 13, 2011 4:27 PM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Problem running under SGE

Am 13.09.2011 um 23:18 schrieb Blosch, Edwin L:

> I'm able to run this command below from an interactive shell window:
>  
> /bin/mpirun --machinefile mpihosts.dat -np 16 -mca plm_rsh_agent 
> /usr/bin/rsh -x MPI_ENVIRONMENT=1 ./test_setup
>  
> but it does not work if I put it into a shell script and 'qsub' that script 
> to SGE.  I get the message shown at the bottom of this post. 
>  
> I've tried everything I can think of.  I would welcome any hints on how to 
> proceed. 
>  
> For what it's worth, this OpenMPI is 1.4.3 and I built it on another system.  
> I am setting and exporting OPAL_PREFIX and as I said, all works fine 
> interactively just not in batch.  It was built with -disable-shared and I 
> don't see any shared libs under openmpi/lib, and I've done 'ldd' from within 
> the script, on both the application executable and on the orterun command; no 
> unresolved shared libraries.  So I don't think the error message hinting at 
> LD_LIBRARY_PATH issues is pointing me in the right direction.
>  
> Thanks for any guidance,
>  
> Ed
>  

Oh, I missed this:


> error: executing task of job 139362 failed: execution daemon on host "f8312" 
> didn't accept task

did you supply a machinefile on your own? In a proper SGE integration it's 
running in a parallel environment. You defined and requested one? The error 
looks like it was started in a PE, but tried to access a node not granted for 
the actual job

-- Reuti


> --
> A daemon (pid 2818) died unexpectedly with status 1 while attempting
> to launch so we are aborting.
>  
> There may be more information reported by the environment (see above).
>  
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --
> --
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --
> mpirun: clean termination accomplished
>  
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Blosch, Edwin L
We don't budget computer hours, so I don't think we would use accounting, 
although I'm not sure I know what this capability is all about.  Also, I don't 
care about launch speed; a few minutes means nothing when the job will take 
days to run.  Finally, I have a highly portable strategy of wrapping the mpirun 
command with a shell script that figures out how many processes are allocated 
to the job and explicitly tells OpenMPI how many hosts to use and which ones.  
I can adapt that script in very minor ways to support any job-queueing system, 
past, present, or future, and my invocation of the mpirun command remains the 
same and should always work.

For these reasons I have preferred the rsh/ssh launcher, the less intelligent 
the better.  I'm sure there are benefits to tight integration; as you said, 
perhaps it can keep users from accidentally or intentionally using nodes 
outside their allocation.  It's just not an issue for us.

I will check the FAQ to see if I can learn more about the benefits of tight 
integration with a job-queueing system.


Thank you again for the help


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Reuti
Sent: Tuesday, September 13, 2011 5:36 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

Am 14.09.2011 um 00:25 schrieb Blosch, Edwin L:

> Your comment guided me in the right direction, Reuti. And overlapped with 
> your guidance, Ralph.
> 
> It works: if I add this flag then it runs
> --mca plm_rsh_disable_qrsh
> 
> Thank you both for the explanations.  
> 
> I had built OpenMPI on another system, as I said, it did not have SGE and 
> thus I did not give --without-sge (nor did I give --with-sge).  In the future 
> for building 1.4.3 I will just add --without-sge and presumably I won't run 
> into the qrsh issue.

Can I understand this in a way, that you don't want a tight integration with 
correct accounting, but prefer to run slave tasks by rsh/ssh on your own? This 
can lead to oversubscribed machines in case some users' scripts are not 
honoring the machinefile in the correct way.

Having a tight integration (with disabled ssh/rsh inside the cluster) is the 
setup I usually prefer.

-- Reuti


> Thanks again
> 
> 
> 
> 
> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
> Behalf Of Reuti
> Sent: Tuesday, September 13, 2011 4:27 PM
> To: Open MPI Users
> Subject: EXTERNAL: Re: [OMPI users] Problem running under SGE
> 
> Am 13.09.2011 um 23:18 schrieb Blosch, Edwin L:
> 
>> I'm able to run this command below from an interactive shell window:
>> 
>> /bin/mpirun --machinefile mpihosts.dat -np 16 -mca plm_rsh_agent 
>> /usr/bin/rsh -x MPI_ENVIRONMENT=1 ./test_setup
>> 
>> but it does not work if I put it into a shell script and 'qsub' that script 
>> to SGE.  I get the message shown at the bottom of this post. 
>> 
>> I've tried everything I can think of.  I would welcome any hints on how to 
>> proceed. 
>> 
>> For what it's worth, this OpenMPI is 1.4.3 and I built it on another system. 
>>  I am setting and exporting OPAL_PREFIX and as I said, all works fine 
>> interactively just not in batch.  It was built with -disable-shared and I 
>> don't see any shared libs under openmpi/lib, and I've done 'ldd' from within 
>> the script, on both the application executable and on the orterun command; 
>> no unresolved shared libraries.  So I don't think the error message hinting 
>> at LD_LIBRARY_PATH issues is pointing me in the right direction.
>> 
>> Thanks for any guidance,
>> 
>> Ed
>> 
> 
> Oh, I missed this:
> 
> 
>> error: executing task of job 139362 failed: execution daemon on host "f8312" 
>> didn't accept task
> 
> did you supply a machinefile on your own? In a proper SGE integration it's 
> running in a parallel environment. You defined and requested one? The error 
> looks like it was started in a PE, but tried to access a node not granted for 
> the actual job
> 
> -- Reuti
> 
> 
>> --
>> A daemon (pid 2818) died unexpectedly with status 1 while attempting
>> to launch so we are aborting.
>> 
>> There may be more information reported by the environment (see above).
>> 
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this 

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-14 Thread Blosch, Edwin L
Thanks, Ralph,

I get the failure messages, unfortunately:

setgid FAILED
setgid FAILED
setgid FAILED

I actually had attempted to call setgid from within the application previously, 
which looks similar to what you've done, but it failed. That was when I 
initiated the post to the mailing list. My conclusion, a guess really, was that 
Linux would not let me setgid from within my program because I was not root.


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Ralph Castain
Sent: Wednesday, September 14, 2011 8:15 AM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes 
created by mpirun?

The attached should set the gid of the remote daemons (and their children) to 
the gid of mpirun. No cmd line option or anything is required - it will just 
always do it.

Would you mind giving it a try?

Please let me know if/how it works.



Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-14 Thread Blosch, Edwin L
Thanks for trying. 

Do you feel that this is an impossible request without the assistance of some 
process running as root, for example, as Reuti mentioned, the daemons of a job 
scheduler?  Or are you saying it will just not be as straightforward as calling 
setgid as you had hoped?

Also, do you think there is a way I could make use of the sg command below?  
Perhaps there is a way to have the rsh/ssh launcher start the application 
processes with a command like 'sg  '?

Ed


NAME
   sg - execute command as different group ID

SYNOPSIS
   sg [-] [group [-c ] command]

DESCRIPTION
   The sg command works similar to newgrp but accepts a command. The
   command will be executed with the /bin/sh shell. With most shells you
   may run sg from, you need to enclose multi-word commands in quotes.
   Another difference between newgrp and sg is that some shells treat
   newgrp specially, replacing themselves with a new instance of a shell
   that newgrp creates. This doesn't happen with sg, so upon exit from a
   sg command you are returned to your previous group ID.




-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Ralph Castain
Sent: Wednesday, September 14, 2011 11:33 AM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes 
created by mpirun?


On Sep 14, 2011, at 9:39 AM, Blosch, Edwin L wrote:

> Thanks, Ralph,
> 
> I get the failure messages, unfortunately:
> 
> setgid FAILED
> setgid FAILED
> setgid FAILED
> 
> I actually had attempted to call setgid from within the application 
> previously, which looks similar to what you've done, but it failed. That was 
> when I initiated the post to the mailing list. My conclusion, a guess really, 
> was that Linux would not let me setgid from within my program because I was 
> not root.

I was afraid of that - the documentation seemed to indicate that would be the 
case, but I figured it was worth a quick try. Sorry I can't be of help.


> 
> 
> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
> Behalf Of Ralph Castain
> Sent: Wednesday, September 14, 2011 8:15 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes 
> created by mpirun?
> 
> The attached should set the gid of the remote daemons (and their children) to 
> the gid of mpirun. No cmd line option or anything is required - it will just 
> always do it.
> 
> Would you mind giving it a try?
> 
> Please let me know if/how it works.
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-14 Thread Blosch, Edwin L

> Try -mca orte_launch_agent "sg N orted", where N is the desired group ID.

There is a catch in the way the orted process is started. 

I get messages like this:

bash: sg /home/install/openmpi/bin/(null) --daemonize -mca ess env -mca 
orte_ess_jobid 3913285632 -mca orte_ess_vpid 9 -mca orte_ess_num_procs 10 
--hnp-uri 3913285632.0: No such file or directory


I think it is taking "sg" as the orted command and ignoring the group ID and the 
orted arguments passed to sg.  It is probably some detail of how the orted 
process is being created.


> 
> Ed
> 
> 
> NAME
>   sg - execute command as different group ID
> 
> SYNOPSIS
>   sg [-] [group [-c ] command]
> 
> DESCRIPTION
>   The sg command works similar to newgrp but accepts a command. The
>   command will be executed with the /bin/sh shell. With most shells you
>   may run sg from, you need to enclose multi-word commands in quotes.
>   Another difference between newgrp and sg is that some shells treat
>   newgrp specially, replacing themselves with a new instance of a shell
>   that newgrp creates. This doesn't happen with sg, so upon exit from a
>   sg command you are returned to your previous group ID.
> 
> 
> 
> 
> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
> Behalf Of Ralph Castain
> Sent: Wednesday, September 14, 2011 11:33 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes 
> created by mpirun?
> 
> 
> On Sep 14, 2011, at 9:39 AM, Blosch, Edwin L wrote:
> 
>> Thanks, Ralph,
>> 
>> I get the failure messages, unfortunately:
>> 
>> setgid FAILED
>> setgid FAILED
>> setgid FAILED
>> 
>> I actually had attempted to call setgid from within the application 
>> previously, which looks similar to what you've done, but it failed. That was 
>> when I initiated the post to the mailing list. My conclusion, a guess 
>> really, was that Linux would not let me setgid from within my program 
>> because I was not root.
> 
> I was afraid of that - the documentation seemed to indicate that would be the 
> case, but I figured it was worth a quick try. Sorry I can't be of help.
> 
> 
>> 
>> 
>> -Original Message-
>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
>> Behalf Of Ralph Castain
>> Sent: Wednesday, September 14, 2011 8:15 AM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes 
>> created by mpirun?
>> 
>> The attached should set the gid of the remote daemons (and their children) 
>> to the gid of mpirun. No cmd line option or anything is required - it will 
>> just always do it.
>> 
>> Would you mind giving it a try?
>> 
>> Please let me know if/how it works.
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-14 Thread Blosch, Edwin L
Thank you -  I did pursue this kind of workaround, and it worked, but you'll be 
happy to know that nothing had to be owned by root.

ASIDE 
Just to remind:  The job script is a shell script that invokes mpirun; the job 
script itself is run as the correct user, but the group id may be changed 
to whatever the user requests of the job-scheduling system.  I think it may not 
be uncommon to have jobs that request a specific Unix group, for many reasons, 
but in our case the group is an input for the scheduler's prioritization policy.

Outcome: rank 0 runs as user:group2 as the user requested, but the launched child 
processes run as user:group1, where group1 is the user's primary group.  The 
peculiarity of this application is that each of the processes writes a file to 
disk, so the resulting group ownership of rank 0 files is group2, but the group 
ownership of all other ranks' files is group1.  That was the original problem 
I'm trying to work around.
--- END ASIDE

Fortunately for me, there is another peculiarity of this application -- the 
executable gets copied out to /tmp (local space) on each of the hosts to be 
used in the job.  We found this helped prevent some crashes during test phases 
where an executable gets overwritten while in use.  Definitely a special 
behavior.  But as a result of this peculiarity, the mpirun command ends up 
launching the copied executable, and I took advantage of that.

I had the job script do chown user:group2 on the copied executables and then 
chmod 6711, and then I observed that the child processes ran as user:group2, 
same as the rank 0 process, so the files they created had the desired group 
ownership.
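
A minimal sketch of that job-script fragment; the /tmp path is hypothetical and 
the user/group names are the ones used above:

# After copying the executable to /tmp on each host, hand the copy to the
# requested group and set the setuid/setgid bits (mode 6711) so the child
# ranks also run as user:group2
for host in $(awk '{print $1}' mpihosts.dat); do
    rsh "$host" "chown user:group2 /tmp/solver && chmod 6711 /tmp/solver"
done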

I will explore Reuti's guidance as well.

Thank you


From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Randall Svancara
Sent: Wednesday, September 14, 2011 3:07 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes 
created by mpirun?

You could set the setuid bit on the application and chown it to root??  It is 
about as secure as anything else that has been described thus far.  As a system 
admin, I cringe at the thought of anything that would allow something to run as 
someone else,  so there would have to be a pretty good justification for such 
unique use case as yours.

Randall
On Wed, Sep 14, 2011 at 12:56 PM, Reuti <re...@staff.uni-marburg.de> wrote:
Am 14.09.2011 um 19:02 schrieb Blosch, Edwin L:

> Thanks for trying.
>
> Do you feel that this is an impossible request without the assistance of some 
> process running as root, for example, as Reuti mentioned, the daemons of a 
> job scheduler?  Or are you saying it will just not be as straightforward as 
> calling setgid as you had hoped?
>
> Also, do you think there is a way I could make use of the sg command below?  
> Perhaps there is a way to have the rsh/ssh launcher start the application 
> processes with a command like 'sg  '?
What about a half-tight integration (or call it: classic tight integration), 
i.e. no recompilation necessary?

- setup your mpiexec call in the jobscript to use a plain rsh for the remote 
startup (no path given): -mca plm_rsh_agent rsh

- the PE of SGE needs the argument -catch_rsh in start_proc_args and the 
supplied script in $SGE_ROOT/mpi/startmpi.sh

 (SGE will create a symbolic link in $TMPDIR therein [which will be called 
first this way] to the rsh-wrapper in $SGE_ROOT/mpi [pitfall: some applications 
need a -V to be added in the lines with "qrsh", i.e. "qrsh -inherit -V ..." to 
send all environment variables to the slaves])

- what is your setting of qrsh_daemon/qrsh_command in `qconf -sconf`? This will 
then be used finally to reach the node and should be builtin or point to the 
SGE supplied rsh/rshd (no rshd necessary to install, no rshd is running all the 
time, no rshd will be started by xinet.d or alike)

- like you do already: switch off the built-in SGE starter in your mpiexec 
call: -mca plm_rsh_disable_qrsh 1

-- Reuti

PS: To avoid misunderstandings: you could also set "-mca plm_rsh_agent foobar" 
and in $SGE_ROOT/mpi/startmpi.sh you change it to create a symbolic link called 
"foobar " in $TMPDIR. It's just a name at this stage of startup.


> Ed
>
>
> NAME
>  sg - execute command as different group ID
>
> SYNOPSIS
>  sg [-] [group [-c ] command]
>
> DESCRIPTION
>  The sg command works similar to newgrp but accepts a command. The
>  command will be executed with the /bin/sh shell. With most shells you
>  may run sg from, you need to enclose multi-word commands in quotes.
>  Another difference between newgrp and sg is that some shells treat
>  newgrp specially, replacing themselves with a new instance of a shell
>  that newgrp creates. This doesn't happen with s

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-14 Thread Blosch, Edwin L
I would appreciate an attempt to fix the multi-word argument to orte_launch_agent, 
unless you've already fixed it in a later series, in which case I could update.  
My workaround of applying setuid to a copied executable doesn't work for 
other applications that run on our clusters (I mean their job scripts would 
also have to be modified similarly).

I think OpenMPI should be able to launch all processes with the same uid and 
gid even when the user has launched the job while using one of his secondary 
groups.  Whether it should be a change request or bug (unexpected, 
counter-intuitive), I'll leave that to your judgment.

Thank you again for the support



From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Ralph Castain
Sent: Wednesday, September 14, 2011 5:10 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes 
created by mpirun?

Hi Ed

I finally had some time to dig around a little. I believe we could make this 
work if I fix the multi-word cmd line param issue. Either "newgrp" or "sg" 
should resolve the problem - both are user-level cmds.

Up to you - let me know if you want me to pursue this.
Ralph


On Sep 14, 2011, at 3:31 PM, Blosch, Edwin L wrote:


Thank you -  I did pursue this kind of workaround, and it worked, but you'll be 
happy to know that nothing had to be owned by root.

ASIDE 
Just to remind:  The job script is a shell script that invokes mpirun; the job 
script itself is run with as the correct user, but the group id may be changed 
to whatever the user requests of the job-scheduling system.  I think it may not 
be uncommon to have jobs that request a specific Unix group, for many reasons, 
but in our case the group is an input for the scheduler's prioritization policy.

Outcome: rank 0 runs user:group2 as the user requested, but the launched child 
processes run  user:group1  where group 1 is the user's primary group.  The 
peculiarity of this application is that each of the processes writes a file to 
disk, so the resulting group ownership of rank 0 files is group2, but the group 
ownership of all other ranks' files is group1.  That was the original problem 
I'm trying to work around.
--- END ASIDE

Fortunately for me, there is another peculiarity of this application -- the 
executable gets copied out to /tmp (local space) on each of the hosts to be 
used in the job.  We found this helped prevent some crashes during test phases 
where an executable gets overwritten while in use.  Definitely a special 
behavior.  But as a result of this peculiarity, the mpirun command ends up 
launching the copied executable, and I took advantage of that.

I had the job script do chown user:group2 on the copied executables and then 
chmod 6711, and then I observed that the child processes ran as user:group2, 
same as the rank 0 process, so the files they created had the desired group 
ownership.

I will explore Reuti's guidance as well.

Thank you


From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Randall Svancara
Sent: Wednesday, September 14, 2011 3:07 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes 
created by mpirun?

You could set the setuid bit on the application and chown it to root??  It is 
about as secure as anything else that has been described thus far.  As a system 
admin, I cringe at the thought of anything that would allow something to run as 
someone else,  so there would have to be a pretty good justification for such 
unique use case as yours.

Randall
On Wed, Sep 14, 2011 at 12:56 PM, Reuti <re...@staff.uni-marburg.de> wrote:
Am 14.09.2011 um 19:02 schrieb Blosch, Edwin L:

> Thanks for trying.
>
> Do you feel that this is an impossible request without the assistance of some 
> process running as root, for example, as Reuti mentioned, the daemons of a 
> job scheduler?  Or are you saying it will just not be as straightforward as 
> calling setgid as you had hoped?
>
> Also, do you think there is a way I could make use of the sg command below?  
> Perhaps there is a way to have the rsh/ssh launcher start the application 
> processes with a command like 'sg  '?
What about a half-tight integration (or call it: classic tight integration), 
i.e. no recompilation necessary?

- setup your mpiexec call in the jobscript to use a plain rsh for the remote 
startup (no path given): -mca plm_rsh_agent rsh

- the PE of SGE needs the argument -catch_rsh in start_proc_args and the 
supplied script in $SGE_ROOT/mpi/startmpi.sh

 (SGE will create a symbolic link in $TMPDIR therein [which will be called 
first this way] to the rsh-wrapper in $SGE_ROOT/mpi [pitfall: some applications 
need a -V to be added in the lines with "qrsh", i.e. "qrsh -inherit -V ...&

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-15 Thread Blosch, Edwin L
I think one has to quote everything after the gid; sg 25000 "ls -l" should work.

I tried something similar to the orted wrapper but also couldn't get it to 
work, and I haven't had time yet to dig into it.  It might be worth trying it with 
both quotes:
#!/bin/sh
exec sg 25000 "orted ${@}"
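
An untested variant, sketched only for illustration, is to hand sg a single 
string through its -c option so that sg cannot swallow orted's own switches:

#!/bin/sh
# Hypothetical orted wrapper; 25000 is the gid used in this thread.
# $* joins the forwarded orted arguments into the one string that sg -c expects.
exec sg 25000 -c "orted $*"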

Also, Ralph, 

I think I skipped one step of logic in asking you yesterday to consider adding 
the capability to pass a multi-word argument to orte_launch_agent.  This itself 
seems like it could have uses, so I like the idea.  But what I'd like to have, 
in the end, is the possibility to launch all processes with the same group or 
gid (for the situation where all hosts share the same passwd/group database), 
without relying on any launcher other than rsh/ssh.  I think it should be 
possible.  The best way to do this might not be the idea of using sg, so 
perhaps this should be discussed first before you spend time adding multi-word 
support for the orte_launch_agent MCA parameter.  But if you do it I will 
certainly test it.

Thanks again


Ed

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Reuti
Sent: Thursday, September 15, 2011 4:37 AM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes 
created by mpirun?

Am 15.09.2011 um 01:15 schrieb Blosch, Edwin L:

> I would appreciate trying to fix the multi-word argument to 
> orte_launch_agent, unless you've already fixed it in a later series in which 
> case I could update.  My workaround with the setuid applied to a copied 
> executable doesn't work for other applications that run on our clusters ( I 
> mean their job scripts would also have to be modified similarly).

I played around with `sg` and it looks like it has some pitfalls:

$ sg 25000 ls -l

The -l will go to `sg`; even double quotes (" and ') don't help, and there is 
no --:

$ sg 2600 ls "'-l'"

gives an error too.

(I was about to suggest an orted wrapper doing: 

#!/bin/sh
exec sg 25000 orted ${@}"

but it failed.)


> I think OpenMPI should be able to launch all processes with the same uid and 
> gid even when the user has launched the job while using one of his secondary 
> groups.  Whether it should be a change request or bug (unexpected, 
> counter-intuitive), I'll leave that to your judgment.

Yes, but for now the "username" is used, not the "uid" - right? So it could 
have a different uid/gid on the machines, but with the new feature they must be 
the same. Okay, in a cluster they are most likely unique across all machines 
anyway. But just to note as a side effect.

-- Reuti


> Thank you again for the support
>  
>  
>  
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
> Behalf Of Ralph Castain
> Sent: Wednesday, September 14, 2011 5:10 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes 
> created by mpirun?
>  
> Hi Ed
>  
> I finally had some time to dig around a little. I believe we could make this 
> work if I fix the multi-word cmd line param issue. Either "newgrp" or "sg" 
> should resolve the problem - both are user-level cmds.
>  
> Up to you - let me know if you want me to pursue this.
> Ralph
>  
>  
> On Sep 14, 2011, at 3:31 PM, Blosch, Edwin L wrote:
> 
> 
> Thank you -  I did pursue this kind of workaround, and it worked, but you'll 
> be happy to know that nothing had to be owned by root.
>  
> ASIDE 
> Just to remind:  The job script is a shell script that invokes mpirun; the 
> job script itself is run with as the correct user, but the group id may be 
> changed to whatever the user requests of the job-scheduling system.  I think 
> it may not be uncommon to have jobs that request a specific Unix group, for 
> many reasons, but in our case the group is an input for the scheduler's 
> prioritization policy.
>  
> Outcome: rank 0 runs user:group2 as the user requested, but the launched 
> child processes run  user:group1  where group 1 is the user's primary group.  
> The peculiarity of this application is that each of the processes writes a 
> file to disk, so the resulting group ownership of rank 0 files is group2, but 
> the group ownership of all other ranks' files is group1.  That was the 
> original problem I'm trying to work around.
> --- END ASIDE
>  
> Fortunately for me, there is another peculiarity of this application -- the 
> executable gets copied out to /tmp (local space) on each of the hosts to be 
> used in the job.  We found this helped prevent some crashes during test 
> phases where an executable gets overwritten while in use.  Definitely a 
> special behavio

[OMPI users] How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-19 Thread Blosch, Edwin L
I am observing differences in floating-point results from an application 
program that appear to be related to whether I link with OpenMPI 1.4.3 or 
MVAPICH 1.2.0.  Both packages were built with the same installation of Intel 
11.1, as well as the application program; identical flags passed to the 
compiler in each case.

I've tracked down some differences in a compute-only routine where I've printed 
out the inputs to the routine (to 18 digits); the inputs are identical.  The 
output numbers are different in the 16th place (perhaps a few in the 15th 
place).  These differences only show up for optimized code, not for -O0.

My assumption is that some optimized math intrinsic is being replaced 
dynamically, but I do not know how to confirm this.  Anyone have guidance to 
offer? Or similar experience?

Thanks very much

Ed

Just for what it's worth, here's the output of ldd:

% ldd application_mvapich
linux-vdso.so.1 =>  (0x7fffe3746000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x2b5b45fc1000)
libmpich.so.1.0 => 
/usr/mpi/intel/mvapich-1.2.0/lib/shared/libmpich.so.1.0 (0x2b5b462cd000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x2b5b465ed000)
libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x2b5b467fc000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x2b5b46a04000)
librt.so.1 => /lib64/librt.so.1 (0x2b5b46c21000)
libm.so.6 => /lib64/libm.so.6 (0x2b5b46e2a000)
libdl.so.2 => /lib64/libdl.so.2 (0x2b5b47081000)
libc.so.6 => /lib64/libc.so.6 (0x2b5b47285000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2b5b475e3000)
/lib64/ld-linux-x86-64.so.2 (0x2b5b45da)
libimf.so => /opt/intel/Compiler/11.1/072/lib/intel64/libimf.so 
(0x2b5b477fb000)
libsvml.so => /opt/intel/Compiler/11.1/072/lib/intel64/libsvml.so 
(0x2b5b47b8f000)
libintlc.so.5 => /opt/intel/Compiler/11.1/072/lib/intel64/libintlc.so.5 
(0x2b5b47da5000)

% ldd application_openmpi
   linux-vdso.so.1 =>  (0x7fff6ebff000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x2b6e7c17d000)
libmpi_f90.so.0 => /usr/mpi/intel/openmpi-1.4.3/lib64/libmpi_f90.so.0 
(0x2b6e7c489000)
libmpi_f77.so.0 => /usr/mpi/intel/openmpi-1.4.3/lib64/libmpi_f77.so.0 
(0x2b6e7c68d000)
libmpi.so.0 => /usr/mpi/intel/openmpi-1.4.3/lib64/libmpi.so.0 
(0x2b6e7c8ca000)
libopen-rte.so.0 => /usr/mpi/intel/openmpi-1.4.3/lib64/libopen-rte.so.0 
(0x2b6e7cb9c000)
libopen-pal.so.0 => /usr/mpi/intel/openmpi-1.4.3/lib64/libopen-pal.so.0 
(0x2b6e7ce01000)
libdl.so.2 => /lib64/libdl.so.2 (0x2b6e7d077000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x2b6e7d27c000)
libutil.so.1 => /lib64/libutil.so.1 (0x2b6e7d494000)
libm.so.6 => /lib64/libm.so.6 (0x2b6e7d697000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x2b6e7d8ee000)
libc.so.6 => /lib64/libc.so.6 (0x2b6e7db0b000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2b6e7de69000)
/lib64/ld-linux-x86-64.so.2 (0x2b6e7bf5c000)
libifport.so.5 => 
/opt/intel/Compiler/11.1/072/lib/intel64/libifport.so.5 (0x2b6e7e081000)
libifcoremt.so.5 => 
/opt/intel/Compiler/11.1/072/lib/intel64/libifcoremt.so.5 (0x2b6e7e1ba000)
libimf.so => /opt/intel/Compiler/11.1/072/lib/intel64/libimf.so 
(0x2b6e7e45f000)
libsvml.so => /opt/intel/Compiler/11.1/072/lib/intel64/libsvml.so 
(0x2b6e7e7f4000)
libintlc.so.5 => /opt/intel/Compiler/11.1/072/lib/intel64/libintlc.so.5 
(0x2b6e7ea0a000)
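
For illustration (not something tried in this thread), one way to see which 
optimized math library a symbol is actually bound to at run time is glibc's 
LD_DEBUG facility; the binary name is the one from the ldd output above:

# Print run-time symbol bindings and filter for the Intel math libraries
LD_DEBUG=bindings ./application_openmpi 2>&1 | grep -E 'libimf|libsvml'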



Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-20 Thread Blosch, Edwin L
Thank you all for the replies.

Certainly optimization flags can be useful to address differences between 
compilers, etc., and I appreciate that differences in MPI_ALLREDUCE are possible.  
But I don't think either is quite relevant because:

- It was the exact same compiler, with identical compilation flags.  So whatever 
optimizations are applied, we should have the same instructions; 
- I'm looking at inputs and outputs of a compute-only routine - there are no 
MPI calls within the routine.

Again, most numbers going into the routine were checked, and there were no 
differences in the numbers out to 18 digits (i.e. beyond the precision of the 
FP representation).  Yet, coming out of the routine, the results differ.  I am 
quite sure that no MPI routines were actually involved in the calculations, and 
that the compiler options given were also the same.

It appears to be a side effect of linkage that is able to change a compute-only 
routine's answers.

I have assumed that max/sqrt/tiny/abs might be replaced, but some other kind of 
corruption may be going on.

I also could be mistaken about the inputs to the routine, i.e. they are not 
truly identical as I have presumed and (partially) checked.

It is interesting that the whole of the calculation runs fine with MVAPICH and 
blows up with OpenMPI.

Another diagnostic step I am taking: see if observation can be repeated with a 
newer version of OpenMPI (currently using 1.4.3)

Ed


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Reuti
Sent: Tuesday, September 20, 2011 7:25 AM
To: tpri...@computer.org; Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] How could OpenMPI (or MVAPICH) affect 
floating-point results?

Am 20.09.2011 um 13:52 schrieb Tim Prince:

> On 9/20/2011 7:25 AM, Reuti wrote:
>> Hi,
>> 
>> Am 20.09.2011 um 00:41 schrieb Blosch, Edwin L:
>> 
>>> I am observing differences in floating-point results from an application 
>>> program that appear to be related to whether I link with OpenMPI 1.4.3 or 
>>> MVAPICH 1.2.0.  Both packages were built with the same installation of 
>>> Intel 11.1, as well as the application program; identical flags passed to 
>>> the compiler in each case.
>>> 
>>> I've tracked down some differences in a compute-only routine where I've 
>>> printed out the inputs to the routine (to 18 digits) ; the inputs are 
>>> identical.  The output numbers are different in the 16th place (perhaps a 
>>> few in the 15th place).  These differences only show up for optimized code, 
>>> not for -O0.
>>> 
>>> My assumption is that some optimized math intrinsic is being replaced 
>>> dynamically, but I do not know how to confirm this.  Anyone have guidance 
>>> to offer? Or similar experience?
>> 
>> yes, I face it often, but always at a magnitude where it's not of any concern 
>> (and not related to any MPI). Due to the limited precision in computers, a 
>> simple reordering of operations (although equivalent in a mathematical 
>> sense) can lead to different results. Removing the anomalies with -O0 could 
>> prove that.
>> 
>> The other point I have heard, especially for the x86 instruction set, is that the 
>> internal FPU still has 80 bits, while the representation in memory is only 64 
>> bits. Hence when everything can be done in the registers, the result can be 
>> different compared to the case when some interim results need to be stored 
>> to RAM. For the Portland compiler there is a switch -Kieee -pc64 to force it 
>> to always stay in 64 bits, and a similar one for Intel is -mp (now 
>> -fltconsistency) and -mp1.
>> 
> Diagnostics below indicate that ifort 11.1 64-bit is in use.  The options 
> aren't the same as Reuti's "now" version (a 32-bit compiler which hasn't been 
> supported for 3 years or more?).

In the 11.1 documentation they are also still listed:

http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/fortran/lin/compiler_f/index.htm

I read it in the way, that -mp is deprecated syntax (therefore listed under 
"Alternate Options"), but -fltconsistency is still a valid and supported option.

-- Reuti


> With ifort 10.1 and more recent, you would set at least
> -assume protect_parens -prec-div -prec-sqrt
> if you are interested in numerical consistency.  If you don't want 
> auto-vectorization of sum reductions, you would use instead
> -fp-model source -ftz
> (ftz sets underflow mode back to abrupt, while "source" sets gradual).
> It may be possible to expose 80-bit x87 by setting the ancient -mp option, 
> but such a course can't be recommended without additional cautions.
> 
> Quot
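
As an aside, a hedged example of passing the consistency flags quoted above 
through the Open MPI Fortran wrapper when recompiling a suspect routine; the 
wrapper name mpif90 and the source file name are assumptions:

# Rebuild one compute routine with ifort's consistency options
mpif90 -O2 -fp-model source -ftz -assume protect_parens -prec-div -prec-sqrt \
    -c flux_routine.f90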

Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-20 Thread Blosch, Edwin L
Thank you for this explanation.  I will assume that my problem here is some 
kind of memory corruption.


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Tim Prince
Sent: Tuesday, September 20, 2011 10:36 AM
To: us...@open-mpi.org
Subject: Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect 
floating-point results?

On 9/20/2011 10:50 AM, Blosch, Edwin L wrote:

> It appears to be a side effect of linkage that is able to change a 
> compute-only routine's answers.
>
> I have assumed that max/sqrt/tiny/abs might be replaced, but some other kind 
> of corruption may be going on.
>

Those intrinsics have direct instruction set translations which 
shouldn't vary from -O1 on up nor with linkage options nor be affected 
by MPI or insertion of WRITEs.

-- 
Tim Prince
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-20 Thread Blosch, Edwin L

I've not been following closely.  How do you know you're using the
identical compilation flags?  Are you saying you specify the same flags
to "mpicc" (or whatever) or are you confirming that the back-end
compiler is seeing the same flags?  The MPI compiler wrapper (mpicc, et
al.) can add flags.  E.g., as I remember it, "mpicc" with no flags means
no optimization with OMPI but with optimization for MVAPICH.

On 9/20/2011 7:50 AM, Blosch, Edwin L wrote:
> - It was exact same compiler, with identical compilation flags.
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


[OMPI users] Trouble compiling 1.4.3 with PGI 10.9 compilers

2011-09-20 Thread Blosch, Edwin L
I'm having trouble building 1.4.3 using PGI 10.9.  I searched the list archives 
briefly but I didn't stumble across anything that looked like the same problem, 
so I thought I'd ask if an expert might recognize the nature of the problem 
here.

The configure command:

./configure --prefix=/release/openmpi-pgi --without-tm --without-sge 
--enable-mpirun-prefix-by-default --enable-contrib-no-build=vt 
--enable-mca-no-build=maffinity --disable-per-user-config-files 
--disable-io-romio --with-mpi-f90-size=small --enable-static --disable-shared 
--with-wrapper-cflags=-Msignextend --with-wrapper-cxxflags=-Msignextend 
CXX=/appserv/pgi/linux86-64/10.9/bin/pgCC 
CC=/appserv/pgi/linux86-64/10.9/bin/pgcc 'CFLAGS=  -O2 -Mcache_align -Minfo 
-Msignextend -Msignextend' 'CXXFLAGS=  -O2 -Mcache_align -Minfo -Msignextend 
-Msignextend' F77=/appserv/pgi/linux86-64/10.9/bin/pgf95 'FFLAGS=-D_GNU_SOURCE  
-O2 -Mcache_align -Minfo -Munixlogical' 
FC=/appserv/pgi/linux86-64/10.9/bin/pgf95 'FCFLAGS=-D_GNU_SOURCE  -O2 
-Mcache_align -Minfo -Munixlogical' 'LDFLAGS= -Bstatic_pgi'

The place where the build eventually dies:

/bin/sh ../../../libtool --tag=CXX   --mode=link 
/appserv/pgi/linux86-64/10.9/bin/pgCC  -DNDEBUG   -O2 -Mcache_align -Minfo 
-Msignextend -Msignextend  -version-info 0:1:0 -export-dynamic  -Bstatic_pgi  
-o libmpi_cxx.la -rpath /release/cfd/openmpi-pgi/lib mpicxx.lo intercepts.lo 
comm.lo datatype.lo win.lo file.lo ../../../ompi/libmpi.la -lnsl -lutil  
-lpthread
libtool: link: tpldir=Template.dir
libtool: link:  rm -rf Template.dir
libtool: link:  /appserv/pgi/linux86-64/10.9/bin/pgCC --prelink_objects 
--instantiation_dir Template.dir   mpicxx.o intercepts.o comm.o datatype.o 
win.o file.o
pgCC-Warning-prelink_objects switch is deprecated
pgCC-Warning-instantiation_dir switch is deprecated
/usr/lib64/crt1.o: In function `_start':
/usr/src/packages/BUILD/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:109: 
undefined reference to `main'
mpicxx.o: In function `__sti___9_mpicxx_cc_a6befbec':
(.text+0x49): undefined reference to `ompi_mpi_errors_are_fatal'
mpicxx.o: In function `__sti___9_mpicxx_cc_a6befbec':
(.text+0x62): undefined reference to `ompi_mpi_errors_return'
mpicxx.o: In function `__sti___9_mpicxx_cc_a6befbec':
(.text+0x7b): undefined reference to `ompi_mpi_errors_throw_exceptions'


Re: [OMPI users] Trouble compiling 1.4.3 with PGI 10.9 compilers

2011-09-20 Thread Blosch, Edwin L
Follow-up #1:  I tried using the autogen.sh script referenced here
 https://svn.open-mpi.org/trac/ompi/changeset/22274
but that did not resolve the build problem.

Follow-up #2:  configuring with --disable-mpi-cxx does allow the compilation to 
succeed.  Perhaps that's obvious, but I had to check.



-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Blosch, Edwin L
Sent: Tuesday, September 20, 2011 12:17 PM
To: Open MPI Users
Subject: EXTERNAL: [OMPI users] Trouble compiling 1.4.3 with PGI 10.9 compilers

I'm having trouble building 1.4.3 using PGI 10.9.  I searched the list archives 
briefly but I didn't stumble across anything that looked like the same problem, 
so I thought I'd ask if an expert might recognize the nature of the problem 
here.

The configure command:

./configure --prefix=/release/openmpi-pgi --without-tm --without-sge 
--enable-mpirun-prefix-by-default --enable-contrib-no-build=vt 
--enable-mca-no-build=maffinity --disable-per-user-config-files 
--disable-io-romio --with-mpi-f90-size=small --enable-static --disable-shared 
--with-wrapper-cflags=-Msignextend --with-wrapper-cxxflags=-Msignextend 
CXX=/appserv/pgi/linux86-64/10.9/bin/pgCC 
CC=/appserv/pgi/linux86-64/10.9/bin/pgcc 'CFLAGS=  -O2 -Mcache_align -Minfo 
-Msignextend -Msignextend' 'CXXFLAGS=  -O2 -Mcache_align -Minfo -Msignextend 
-Msignextend' F77=/appserv/pgi/linux86-64/10.9/bin/pgf95 'FFLAGS=-D_GNU_SOURCE  
-O2 -Mcache_align -Minfo -Munixlogical' 
FC=/appserv/pgi/linux86-64/10.9/bin/pgf95 'FCFLAGS=-D_GNU_SOURCE  -O2 
-Mcache_align -Minfo -Munixlogical' 'LDFLAGS= -Bstatic_pgi'

The place where the build eventually dies:

/bin/sh ../../../libtool --tag=CXX   --mode=link 
/appserv/pgi/linux86-64/10.9/bin/pgCC  -DNDEBUG   -O2 -Mcache_align -Minfo 
-Msignextend -Msignextend  -version-info 0:1:0 -export-dynamic  -Bstatic_pgi  
-o libmpi_cxx.la -rpath /release/cfd/openmpi-pgi/lib mpicxx.lo intercepts.lo 
comm.lo datatype.lo win.lo file.lo ../../../ompi/libmpi.la -lnsl -lutil  
-lpthread
libtool: link: tpldir=Template.dir
libtool: link:  rm -rf Template.dir
libtool: link:  /appserv/pgi/linux86-64/10.9/bin/pgCC --prelink_objects 
--instantiation_dir Template.dir   mpicxx.o intercepts.o comm.o datatype.o 
win.o file.o
pgCC-Warning-prelink_objects switch is deprecated
pgCC-Warning-instantiation_dir switch is deprecated
/usr/lib64/crt1.o: In function `_start':
/usr/src/packages/BUILD/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:109: 
undefined reference to `main'
mpicxx.o: In function `__sti___9_mpicxx_cc_a6befbec':
(.text+0x49): undefined reference to `ompi_mpi_errors_are_fatal'
mpicxx.o: In function `__sti___9_mpicxx_cc_a6befbec':
(.text+0x62): undefined reference to `ompi_mpi_errors_return'
mpicxx.o: In function `__sti___9_mpicxx_cc_a6befbec':
(.text+0x7b): undefined reference to `ompi_mpi_errors_throw_exceptions'
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


[OMPI users] Question about compilng with fPIC

2011-09-21 Thread Blosch, Edwin L
Follow-up to a mislabeled thread:  "How could OpenMPI (or MVAPICH) affect 
floating-point results?"

I have found a solution to my problem, but I would like to understand the 
underlying issue better.

To rehash: An Intel-compiled executable linked with MVAPICH runs fine; linked 
with OpenMPI fails.  The earliest symptom I could see was some strange 
difference in numerical values of quantities that should be unaffected by MPI 
calls.  Tim's advice guided me to assume memory corruption. Eugene's advice 
guided me to explore the detailed differences in compilation.  

I observed that the MVAPICH mpif90 wrapper adds -fPIC.

I tried adding -fPIC and -mcmodel=medium to the compilation of the 
OpenMPI-linked executable.  Now it works fine. I haven't tried without 
-mcmodel=medium, but my guess is -fPIC did the trick.

Does anyone know why compiling with -fPIC has helped?  Does it suggest an 
application problem or an OpenMPI problem?

To note: This is an Infiniband-based cluster.  The application does pretty 
basic MPI-1 operations: send, recv, bcast, reduce, allreduce, gather, gather, 
isend, irecv, waitall.  There is one task that uses iprobe with MPI_ANY_TAG, 
but this task is only involved in certain cases (including this one). 
Conversely, cases that do not call iprobe have not yet been observed to crash.  
I am deducing that this function is the problem.
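
(For reference, a minimal C sketch of an MPI_Iprobe / MPI_ANY_TAG polling loop of 
the kind described above. This is not the application's code; the payload, tags, 
and rank roles are invented for illustration.)

/* iprobe_poll.c - sketch of a rank polling for messages of any tag/source.
 * Build: mpicc iprobe_poll.c -o iprobe_poll
 * Run:   mpirun -np 2 ./iprobe_poll
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* "server" task: poll for messages carrying any tag from any source */
        int remaining = size - 1;
        while (remaining > 0) {
            int flag = 0;
            MPI_Status st;
            MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &st);
            if (flag) {
                int payload;
                MPI_Recv(&payload, 1, MPI_INT, st.MPI_SOURCE, st.MPI_TAG,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                printf("rank 0 got %d from rank %d (tag %d)\n",
                       payload, st.MPI_SOURCE, st.MPI_TAG);
                remaining--;
            }
            /* a real code would do useful work here between polls */
        }
    } else {
        int payload = 100 * rank;
        MPI_Send(&payload, 1, MPI_INT, 0, /* tag = */ rank, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}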

Thanks,

Ed

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Blosch, Edwin L
Sent: Tuesday, September 20, 2011 11:46 AM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect 
floating-point results?

Thank you for this explanation.  I will assume that my problem here is some 
kind of memory corruption.


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Tim Prince
Sent: Tuesday, September 20, 2011 10:36 AM
To: us...@open-mpi.org
Subject: Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect 
floating-point results?

On 9/20/2011 10:50 AM, Blosch, Edwin L wrote:

> It appears to be a side effect of linkage that is able to change a 
> compute-only routine's answers.
>
> I have assumed that max/sqrt/tiny/abs might be replaced, but some other kind 
> of corruption may be going on.
>

Those intrinsics have direct instruction set translations which 
shouldn't vary from -O1 on up nor with linkage options nor be affected 
by MPI or insertion of WRITEs.

-- 
Tim Prince
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] EXTERNAL: Re: Question about compilng with fPIC

2011-09-21 Thread Blosch, Edwin L
Thanks Tim.

I'm compiling source units and linking them into an executable.  Or perhaps you 
are talking about how OpenMPI itself is built?  Excuse my ignorance...

The source code units are compiled like this:
/usr/mpi/intel/openmpi-1.4.3/bin/mpif90 -D_GNU_SOURCE -traceback -align -pad 
-xHost -falign-functions -fpconstant -O2 -I. 
-I/usr/mpi/intel/openmpi-1.4.3/include -c ../code/src/main/main.f90

The link step is like this:
/usr/mpi/intel/openmpi-1.4.3/bin/mpif90 -D_GNU_SOURCE -traceback -align -pad 
-xHost -falign-functions -fpconstant -static-intel -o ../bin/  -lstdc++

OpenMPI itself was configured like this:
./configure --prefix=/release/cfd/openmpi-intel --without-tm --without-sge 
--without-lsf --without-psm --without-portals --without-gm --without-elan 
--without-mx --without-slurm --without-loadleveler 
--enable-mpirun-prefix-by-default --enable-contrib-no-build=vt 
--enable-mca-no-build=maffinity --disable-per-user-config-files 
--disable-io-romio --with-mpi-f90-size=small --enable-static --disable-shared 
CXX=/appserv/intel/Compiler/11.1/072/bin/intel64/icpc 
CC=/appserv/intel/Compiler/11.1/072/bin/intel64/icc 'CFLAGS=  -O2' 'CXXFLAGS=  
-O2' F77=/appserv/intel/Compiler/11.1/072/bin/intel64/ifort 
'FFLAGS=-D_GNU_SOURCE -traceback  -O2' 
FC=/appserv/intel/Compiler/11.1/072/bin/intel64/ifort 'FCFLAGS=-D_GNU_SOURCE 
-traceback  -O2' 'LDFLAGS= -static-intel'

ldd output on the final executable gives: 
linux-vdso.so.1 =>  (0x7fffb77e7000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x2b2e2b652000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x2b2e2b95e000)
libdl.so.2 => /lib64/libdl.so.2 (0x2b2e2bb6d000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x2b2e2bd72000)
libutil.so.1 => /lib64/libutil.so.1 (0x2b2e2bf8a000)
libm.so.6 => /lib64/libm.so.6 (0x2b2e2c18d000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x2b2e2c3e4000)
libc.so.6 => /lib64/libc.so.6 (0x2b2e2c60)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2b2e2c959000)
/lib64/ld-linux-x86-64.so.2 (0x2b2e2b433000)

Do you see anything that suggests I should have been compiling the application 
and/or OpenMPI with -fPIC?

Thanks

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Tim Prince
Sent: Wednesday, September 21, 2011 10:53 AM
To: us...@open-mpi.org
Subject: EXTERNAL: Re: [OMPI users] Question about compilng with fPIC

On 9/21/2011 11:44 AM, Blosch, Edwin L wrote:
> Follow-up to a mislabeled thread:  "How could OpenMPI (or MVAPICH) affect 
> floating-point results?"
>
> I have found a solution to my problem, but I would like to understand the 
> underlying issue better.
>
> To rehash: An Intel-compiled executable linked with MVAPICH runs fine; linked 
> with OpenMPI fails.  The earliest symptom I could see was some strange 
> difference in numerical values of quantities that should be unaffected by MPI 
> calls.  Tim's advice guided me to assume memory corruption. Eugene's advice 
> guided me to explore the detailed differences in compilation.
>
> I observed that the MVAPICH mpif90 wrapper adds -fPIC.
>
> I tried adding -fPIC and -mcmodel=medium to the compilation of the 
> OpenMPI-linked executable.  Now it works fine. I haven't tried without 
> -mcmodel=medium, but my guess is -fPIC did the trick.
>
> Does anyone know why compiling with -fPIC has helped?  Does it suggest an 
> application problem or an OpenMPI problem?
>
> To note: This is an Infiniband-based cluster.  The application does pretty 
> basic MPI-1 operations: send, recv, bcast, reduce, allreduce, gather, gather, 
> isend, irecv, waitall.  There is one task that uses iprobe with MPI_ANY_TAG, 
> but this task is only involved in certain cases (including this one). 
> Conversely, cases that do not call iprobe have not yet been observed to 
> crash.  I am deducing that this function is the problem.
>

If you are making a .so, the included .o files should be built with 
-fPIC or similar. Ideally, the configure and build tools would enforce this.

-- 
Tim Prince
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] Question about compilng with fPIC

2011-09-21 Thread Blosch, Edwin L
Follow-up:  I misread the coding, so now I think mpi_iprobe is probably not 
being used for this case.  I'll have to pin the blame somewhere else.  -fPIC 
definitely fixes the problem, as I tried removing -mcmodel=medium and it still 
worked.   Our usual communication pattern is mpi_irecv, mpi_isend, mpi_waitall; 
perhaps there is something unhealthy in the semantics there.
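
(For context, a minimal C sketch of the mpi_irecv / mpi_isend / mpi_waitall exchange 
pattern referred to above. The real application is Fortran; the ring neighbors, 
buffer sizes, and tag here are made up for illustration.)

/* exchange.c - nonblocking neighbor exchange: post all receives, then all
 * sends, then complete everything with MPI_Waitall.
 * Build: mpicc exchange.c -o exchange
 * Run:   mpirun -np 2 ./exchange
 */
#include <mpi.h>
#include <stdio.h>

#define COUNT 1024

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* each rank exchanges with its left and right neighbors in a ring */
    int nbr[2] = { (rank - 1 + size) % size, (rank + 1) % size };
    double sendbuf[2][COUNT], recvbuf[2][COUNT];
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < COUNT; j++)
            sendbuf[i][j] = (double)rank;

    MPI_Request req[4];
    int nreq = 0;

    /* post all receives first so incoming sends have a place to land */
    for (int i = 0; i < 2; i++)
        MPI_Irecv(recvbuf[i], COUNT, MPI_DOUBLE, nbr[i], 0,
                  MPI_COMM_WORLD, &req[nreq++]);

    /* then post the matching sends */
    for (int i = 0; i < 2; i++)
        MPI_Isend(sendbuf[i], COUNT, MPI_DOUBLE, nbr[i], 0,
                  MPI_COMM_WORLD, &req[nreq++]);

    /* no buffer may be reused until all requests complete */
    MPI_Waitall(nreq, req, MPI_STATUSES_IGNORE);

    printf("rank %d received from %d and %d\n", rank, nbr[0], nbr[1]);
    MPI_Finalize();
    return 0;
}

Posting every receive before the matching sends generally lets large messages land 
directly in user buffers rather than in unexpected-message storage; whether that is 
related to the crashes described here is a separate question.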

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Blosch, Edwin L
Sent: Wednesday, September 21, 2011 10:44 AM
To: Open MPI Users
Subject: EXTERNAL: [OMPI users] Question about compilng with fPIC

Follow-up to a mislabeled thread:  "How could OpenMPI (or MVAPICH) affect 
floating-point results?"

I have found a solution to my problem, but I would like to understand the 
underlying issue better.

To rehash: An Intel-compiled executable linked with MVAPICH runs fine; linked 
with OpenMPI fails.  The earliest symptom I could see was some strange 
difference in numerical values of quantities that should be unaffected by MPI 
calls.  Tim's advice guided me to assume memory corruption. Eugene's advice 
guided me to explore the detailed differences in compilation.  

I observed that the MVAPICH mpif90 wrapper adds -fPIC.

I tried adding -fPIC and -mcmodel=medium to the compilation of the 
OpenMPI-linked executable.  Now it works fine. I haven't tried without 
-mcmodel=medium, but my guess is -fPIC did the trick.

Does anyone know why compiling with -fPIC has helped?  Does it suggest an 
application problem or an OpenMPI problem?

To note: This is an Infiniband-based cluster.  The application does pretty 
basic MPI-1 operations: send, recv, bcast, reduce, allreduce, gather, gather, 
isend, irecv, waitall.  There is one task that uses iprobe with MPI_ANY_TAG, 
but this task is only involved in certain cases (including this one). 
Conversely, cases that do not call iprobe have not yet been observed to crash.  
I am deducing that this function is the problem.

Thanks,

Ed

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Blosch, Edwin L
Sent: Tuesday, September 20, 2011 11:46 AM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect 
floating-point results?

Thank you for this explanation.  I will assume that my problem here is some 
kind of memory corruption.


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Tim Prince
Sent: Tuesday, September 20, 2011 10:36 AM
To: us...@open-mpi.org
Subject: Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect 
floating-point results?

On 9/20/2011 10:50 AM, Blosch, Edwin L wrote:

> It appears to be a side effect of linkage that is able to change a 
> compute-only routine's answers.
>
> I have assumed that max/sqrt/tiny/abs might be replaced, but some other kind 
> of corruption may be going on.
>

Those intrinsics have direct instruction set translations which 
shouldn't vary from -O1 on up nor with linkage options nor be affected 
by MPI or insertion of WRITEs.

-- 
Tim Prince
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] EXTERNAL: Re: Trouble compiling 1.4.3 with PGI 10.9 compilers

2011-09-26 Thread Blosch, Edwin L
Actually I can download OpenMPI 1.5.4, 1.4.4rc3 or 1.4.3 - and ALL of them 
build just fine.

Apparently what isn't working is the version of 1.4.3 that I have downloaded 
and copied from place to place, i.e. timestamps on files may have changed 
(otherwise the files are the same).

It seems to be a configure behavior, but I don't understand enough to figure it 
out.  Perhaps you can advise me.

Key differences that I noted were:

(1) in config.log, the configure command that gets rerun appears to add some 
arguments in the bad build:

Bad build includes the underlined 3 options at the end below:

  $ ./configure --prefix=/release/cfd/openmpi-pgi --without-tm --without-sge 
--without-lsf --without-psm --without-portals --without-gm --without-elan 
--without-slurm --without-loadleveler --enable-mpirun-prefix-by-default 
--enable-contrib-no-build=vt --enable-mca-no-build=maffinity 
--disable-per-user-config-files --disable-io-romio --enable-static 
--disable-shared --with-wrapper-cflags=-Msignextend 
--with-wrapper-cxxflags=-Msignextend CXX=/appserv/pgi/linux86-64/10.9/bin/pgCC 
CC=/appserv/pgi/linux86-64/10.9/bin/pgcc CFLAGS=  -O2 -Mcache_align -Minfo 
-Msignextend -Msignextend CXXFLAGS=  -O2 -Mcache_align -Minfo -Msignextend 
-Msignextend F77=/appserv/pgi/linux86-64/10.9/bin/pgf95 FFLAGS=-D_GNU_SOURCE  
-O2 -Mcache_align -Minfo -Munixlogical 
FC=/appserv/pgi/linux86-64/10.9/bin/pgf95 FCFLAGS=-D_GNU_SOURCE  -O2 
-Mcache_align -Minfo -Munixlogical LDFLAGS= -Bstatic_pgi 
--enable-ltdl-convenience --no-create --no-recursion



Good build:

  $ ./configure --prefix=/release/cfd/openmpi-pgi --without-tm --without-sge 
--without-lsf --without-psm --without-portals --without-gm --without-elan 
--without-slurm --without-loadleveler --enable-mpirun-prefix-by-default 
--enable-contrib-no-build=vt --enable-mca-no-build=maffinity 
--disable-per-user-config-files --disable-io-romio --enable-static 
--disable-shared --with-wrapper-cflags=-Msignextend 
--with-wrapper-cxxflags=-Msignextend CXX=/appserv/pgi/linux86-64/10.9/bin/pgCC 
CC=/appserv/pgi/linux86-64/10.9/bin/pgcc CFLAGS=  -O2 -Mcache_align -Minfo 
-Msignextend -Msignextend CXXFLAGS=  -O2 -Mcache_align -Minfo -Msignextend 
-Msignextend F77=/appserv/pgi/linux86-64/10.9/bin/pgf95 FFLAGS=-D_GNU_SOURCE  
-O2 -Mcache_align -Minfo -Munixlogical 
FC=/appserv/pgi/linux86-64/10.9/bin/pgf95 FCFLAGS=-D_GNU_SOURCE  -O2 
-Mcache_align -Minfo -Munixlogical LDFLAGS= -Bstatic_pgi



(2) in configure itself, the version number is missing in the bad build:

Bad build:
#! /bin/sh
# Guess values for system-dependent variables and create Makefiles.
# Generated by GNU Autoconf 2.63 for Open MPI .

Good build:
#! /bin/sh
# Guess values for system-dependent variables and create Makefiles.
# Generated by GNU Autoconf 2.63 for Open MPI 1.4.3.
#

(3) also in configure, the good build has picked up availability of pgfortran 
but the bad one does not:

- Bad build:  for ac_prog in g77 xlf f77 frt pgf77 cf77 fort77 fl32 af77 xlf90 
f90 pgf90 pghpf epcf90 gfortran g95 xlf95 f95 fort ifort ifc efc pgf95 lf95 ftn

- Good build: for ac_prog in g77 xlf f77 frt pgf77 cf77 fort77 fl32 af77 xlf90 
f90 pgf90 pghpf epcf90 gfortran g95 xlf95 f95 fort ifort ifc efc pgfortran 
pgf95 lf95 ftn

If you have any idea what could cause these differences, I'm all ears...

Thanks

Ed

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Jeff Squyres
Sent: Saturday, September 24, 2011 8:23 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Trouble compiling 1.4.3 with PGI 10.9 
compilers



Just out of curiosity, does Open MPI 1.5.4 build properly?



We've seen problems with the PGI compiler suite before -- it *does* look like a 
problem with libtool-building issues; e.g., a switch is too old or is missing 
or something.  Meaning: it looks like PGI thinks it's trying to build an 
application, not a library.  This is usually bit rot in libtool (i.e., PGI may 
have changed their options, but we're using an older Libtool in the 1.4.x 
series that doesn't know about this option).



I do note that we fixed some libtool issues in the 1.4.4 tarball; could you try 
the 1.4.4rc and see if that fixes the issue?  If not, we might have missed some 
patches to bring over to the v1.4 branch.



http://www.open-mpi.org/software/ompi/v1.4/







On Sep 20, 2011, at 1:16 PM, Blosch, Edwin L wrote:



> I'm having trouble building 1.4.3 using PGI 10.9.  I searched the list 
> archives briefly but I didn't stumble across anything that looked like the 
> same problem, so I thought I'd ask if an expert might recognize the nature of 
> the problem here.

>

> The configure command:

>

> ./configure --prefix=/release/openmpi-pgi --without-tm --without-sge 
> --enable-mpirun-prefix-by-default --enable-contrib-no-build=vt 
> --enable-mca-no-build=maf

Re: [OMPI users] EXTERNAL: Re: Trouble compiling 1.4.3 with PGI 10.9 compilers

2011-09-27 Thread Blosch, Edwin L
Yes, I've been copying around the source tree.  That was the problem. If I am 
careful to preserve the original timestamps, there are no problems.

Thanks

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Jeff Squyres
Sent: Monday, September 26, 2011 6:16 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Trouble compiling 1.4.3 with PGI 10.9 
compilers

On Sep 26, 2011, at 6:53 PM, Blosch, Edwin L wrote:

> Actually I can download OpenMPI 1.5.4, 1.4.4rc3 or 1.4.3 - and ALL of them 
> build just fine.
>  
> Apparently what isn't working is the version of 1.4.3 that I have downloaded 
> and copied from place to place, i.e. timestamps on files may have changed 
> (otherwise the files are the same).

Are you copying the source tree around, like "cp -r my_orig_ompi_tree 
my_new_ompi_tree"?

If so, you might be running into timestamp issues.  Automake is actually fairly 
sensitive to timestamps; it makes tarballs in fairly specific ordering so that 
the timestamps will be correct when you un-tar them.

You might be able to get away with something like "cp -rp my_orig my_new".  But 
I find it usually just easier to just un-tar the original tarball.

>  
> It seems to be a configure behavior, but I don't understand enough to figure 
> it out.  Perhaps you can advise me.
>  
> Key differences that I noted were:
>  
> (1) in config.log, the configure command that gets rerun appears to add some 
> arguments in the bad build:
>  
> Bad build includes the underlined 3 options at the end below:
>   $ ./configure --prefix=/release/cfd/openmpi-pgi --without-tm --without-sge 
> --without-lsf --without-psm --without-portals --without-gm --without-elan 
> --without-slurm --without-loadleveler --enable-mpirun-prefix-by-default 
> --enable-contrib-no-build=vt --enable-mca-no-build=maffinity 
> --disable-per-user-config-files --disable-io-romio --enable-static 
> --disable-shared --with-wrapper-cflags=-Msignextend 
> --with-wrapper-cxxflags=-Msignextend 
> CXX=/appserv/pgi/linux86-64/10.9/bin/pgCC 
> CC=/appserv/pgi/linux86-64/10.9/bin/pgcc CFLAGS=  -O2 -Mcache_align -Minfo 
> -Msignextend -Msignextend CXXFLAGS=  -O2 -Mcache_align -Minfo -Msignextend 
> -Msignextend F77=/appserv/pgi/linux86-64/10.9/bin/pgf95 FFLAGS=-D_GNU_SOURCE  
> -O2 -Mcache_align -Minfo -Munixlogical 
> FC=/appserv/pgi/linux86-64/10.9/bin/pgf95 FCFLAGS=-D_GNU_SOURCE  -O2 
> -Mcache_align -Minfo -Munixlogical LDFLAGS= -Bstatic_pgi 
> --enable-ltdl-convenience --no-create --no-recursion
>  
> Good build:
>   $ ./configure --prefix=/release/cfd/openmpi-pgi --without-tm --without-sge 
> --without-lsf --without-psm --without-portals --without-gm --without-elan 
> --without-slurm --without-loadleveler --enable-mpirun-prefix-by-default 
> --enable-contrib-no-build=vt --enable-mca-no-build=maffinity 
> --disable-per-user-config-files --disable-io-romio --enable-static 
> --disable-shared --with-wrapper-cflags=-Msignextend 
> --with-wrapper-cxxflags=-Msignextend 
> CXX=/appserv/pgi/linux86-64/10.9/bin/pgCC 
> CC=/appserv/pgi/linux86-64/10.9/bin/pgcc CFLAGS=  -O2 -Mcache_align -Minfo 
> -Msignextend -Msignextend CXXFLAGS=  -O2 -Mcache_align -Minfo -Msignextend 
> -Msignextend F77=/appserv/pgi/linux86-64/10.9/bin/pgf95 FFLAGS=-D_GNU_SOURCE  
> -O2 -Mcache_align -Minfo -Munixlogical 
> FC=/appserv/pgi/linux86-64/10.9/bin/pgf95 FCFLAGS=-D_GNU_SOURCE  -O2 
> -Mcache_align -Minfo -Munixlogical LDFLAGS= -Bstatic_pgi
>  
> (2) in configure itself, the version number is missing in the bad build:
>  
> Bad build:
> #! /bin/sh
> # Guess values for system-dependent variables and create Makefiles.
> # Generated by GNU Autoconf 2.63 for Open MPI .
>  
> Good build:
> #! /bin/sh
> # Guess values for system-dependent variables and create Makefiles.
> # Generated by GNU Autoconf 2.63 for Open MPI 1.4.3.
> #
>  
> (3) also in configure, the good build has picked up availability of pgfortran 
> but the bad one does not:
>  
> - Bad build:  for ac_prog in g77 xlf f77 frt pgf77 cf77 fort77 fl32 af77 
> xlf90 f90 pgf90 pghpf epcf90 gfortran g95 xlf95 f95 fort ifort ifc efc pgf95 
> lf95 ftn
>  
> - Good build: for ac_prog in g77 xlf f77 frt pgf77 cf77 fort77 fl32 af77 
> xlf90 f90 pgf90 pghpf epcf90 gfortran g95 xlf95 f95 fort ifort ifc 
> efc pgfortran pgf95 lf95 ftn
>  
>  
> If you have any idea what could cause these differences, I'm all ears...
>  
> Thanks
>  
> Ed
>  
>  
> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
> Behalf Of Jeff Squyres
> Sent: Saturday, September 24, 2011 8:23 AM
> To: Open MPI Users
> Subject: EXTERNAL: Re: [OMPI users] Trouble compil

[OMPI users] Unresolved reference 'mbind' and 'get_mempolicy'

2011-09-28 Thread Blosch, Edwin L
I am getting some undefined references in building OpenMPI 1.5.4 and I would 
like to know how to work around it.

The errors look like this:

/scratch1/bloscel/builds/release/openmpi-intel/lib/libmpi.a(topology-linux.o): 
In function `hwloc_linux_alloc_membind':
topology-linux.c:(.text+0x1da): undefined reference to `mbind'
topology-linux.c:(.text+0x213): undefined reference to `mbind'
/scratch1/bloscel/builds/release/openmpi-intel/lib/libmpi.a(topology-linux.o): 
In function `hwloc_linux_set_area_membind':
topology-linux.c:(.text+0x414): undefined reference to `mbind'
topology-linux.c:(.text+0x46c): undefined reference to `mbind'
/scratch1/bloscel/builds/release/openmpi-intel/lib/libmpi.a(topology-linux.o): 
In function `hwloc_linux_get_thisthread_membind':
topology-linux.c:(.text+0x4ff): undefined reference to `get_mempolicy'
topology-linux.c:(.text+0x5ff): undefined reference to `get_mempolicy'
/scratch1/bloscel/builds/release/openmpi-intel/lib/libmpi.a(topology-linux.o): 
In function `hwloc_linux_set_thisthread_membind':
topology-linux.c:(.text+0x7b5): undefined reference to `migrate_pages'
topology-linux.c:(.text+0x7e9): undefined reference to `set_mempolicy'
topology-linux.c:(.text+0x831): undefined reference to `set_mempolicy'
make: *** [main] Error 1

Some  configure output that is probably relevant:

checking numaif.h usability... yes
checking numaif.h presence... yes
checking for numaif.h... yes
checking for set_mempolicy in -lnuma... yes
checking for mbind in -lnuma... yes
checking for migrate_pages in -lnuma... yes

The FAQ says that I should have to give -with-libnuma explicitly, but I did not 
do that.   Is there a problem with configure? Or the FAQ?  Or perhaps the 
system has a configuration peculiarity?

On another system, the configure output is different, and there are no 
unresolved references:

checking numaif.h usability... no
checking numaif.h presence... no
checking for numaif.h... no

What is the configure option that will make the unresolved references go away?

Thanks,

Ed


Re: [OMPI users] EXTERNAL: Re: Unresolved reference 'mbind' and 'get_mempolicy'

2011-09-28 Thread Blosch, Edwin L
Jeff,  

I've tried it now adding --without-libnuma.  Actually that did NOT fix the 
problem, so I can send you the full output from configure if you want, to 
understand why this "hwloc" function is trying to use a function which appears 
to be unavailable.  The answers to some of your questions:

The configure command was this:

./configure --prefix=/release/cfd/openmpi-intel --without-tm --without-sge 
--without-lsf --without-psm --without-portals --without-elan --without-slurm 
--without-loadleveler --without-libnuma --enable-mpirun-prefix-by-default 
--enable-contrib-no-build=vt --enable-mca-no-build=maffinity 
--disable-per-user-config-files --disable-io-romio --enable-static 
--disable-shared --without-openib CXX=/appserv/intel/cce/10.1.021/bin/icpc 
CC=/appserv/intel/cce/10.1.021/bin/icc 'CFLAGS=  -O2' 'CXXFLAGS=  -O2' 
F77=/appserv/intel/fce/10.1.021/bin/ifort 'FFLAGS=-D_GNU_SOURCE -traceback  
-O2' FC=/appserv/intel/fce/10.1.021/bin/ifort 'FCFLAGS=-D_GNU_SOURCE -traceback 
 -O2' 'LDFLAGS= -static-intel'

The error messages upon linking the application are unchanged:  
> /scratch1/bloscel/builds/release/openmpi-intel/lib/libmpi.a(topology-linux.o):
>  In function `hwloc_linux_alloc_membind':
> topology-linux.c:(.text+0x1da): undefined reference to `mbind'

Re: NUMA:  It appears there is a /usr/lib64/libnuma.so but no static version. 
There is /usr/include/numa.h and /usr/include/numaif.h.

I don't understand about make V=1. What tree? Somewhere in the OpenMPI build, 
or in the application compilation itself? Is "V=1" something in the OpenMPI 
makefile structure?

Thanks,

Ed

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Jeff Squyres
Sent: Wednesday, September 28, 2011 11:05 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Unresolved reference 'mbind' and 
'get_mempolicy'

Yowza; that sounds like a configury bug.  :-(

What line were you using to configure Open MPI?  Do you have libnuma installed? 
 If so, do you have the .h and .so files?  Do you have the .a file?

Can you send the last few lines of output from a failed "make V=1" in that 
tree?  (it'll show us the exact commands used to compile/link, etc.)


On Sep 28, 2011, at 11:55 AM, Blosch, Edwin L wrote:

> I am getting some undefined references in building OpenMPI 1.5.4 and I would 
> like to know how to work around it.
>  
> The errors look like this:
>  
> /scratch1/bloscel/builds/release/openmpi-intel/lib/libmpi.a(topology-linux.o):
>  In function `hwloc_linux_alloc_membind':
> topology-linux.c:(.text+0x1da): undefined reference to `mbind'
> topology-linux.c:(.text+0x213): undefined reference to `mbind'
> /scratch1/bloscel/builds/release/openmpi-intel/lib/libmpi.a(topology-linux.o):
>  In function `hwloc_linux_set_area_membind':
> topology-linux.c:(.text+0x414): undefined reference to `mbind'
> topology-linux.c:(.text+0x46c): undefined reference to `mbind'
> /scratch1/bloscel/builds/release/openmpi-intel/lib/libmpi.a(topology-linux.o):
>  In function `hwloc_linux_get_thisthread_membind':
> topology-linux.c:(.text+0x4ff): undefined reference to `get_mempolicy'
> topology-linux.c:(.text+0x5ff): undefined reference to `get_mempolicy'
> /scratch1/bloscel/builds/release/openmpi-intel/lib/libmpi.a(topology-linux.o):
>  In function `hwloc_linux_set_thisthread_membind':
> topology-linux.c:(.text+0x7b5): undefined reference to `migrate_pages'
> topology-linux.c:(.text+0x7e9): undefined reference to `set_mempolicy'
> topology-linux.c:(.text+0x831): undefined reference to `set_mempolicy'
> make: *** [main] Error 1
>  
> Some configure output that is probably relevant:
>  
> checking numaif.h usability... yes
> checking numaif.h presence... yes
> checking for numaif.h... yes
> checking for set_mempolicy in -lnuma... yes
> checking for mbind in -lnuma... yes
> checking for migrate_pages in -lnuma... yes
>  
> The FAQ says that I should have to give -with-libnuma explicitly, but I did 
> not do that.   Is there a problem with configure? Or the FAQ?  Or perhaps the 
> system has a configuration peculiarity?
>  
> On another system, the configure output is different, and there are no 
> unresolved references:
>  
> checking numaif.h usability... no
> checking numaif.h presence... no
> checking for numaif.h... no
>  
> What is the configure option that will make the unresolved references go away?
>  
> Thanks,
>  
> Ed
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] EXTERNAL: Re: Unresolved reference 'mbind' and 'get_mempolicy'

2011-09-29 Thread Blosch, Edwin L
Jeff, Bruce, Reuti, all:

If I add --without-hwloc in addition to --without-libnuma, then it builds.  Is 
that a reasonable thing to do?  Is there a better workaround?  This 'hwloc' 
module looks like it might be important.

For what it's worth, if there's something wrong with my configure line, let me 
know what to improve. Otherwise, as weird as "--enable-mca-no-build=maffinity 
--disable-io-romio --enable-static --disable-shared" may look, I am not trying 
to build fully static binaries. I have unavoidable need to build OpenMPI on 
certain machines and then transfer the executables to other machines that are 
compatible but not identical, and over the years this is the minimal set of 
configure flags necessary to make that possible. I may revisit these choices at 
some point, but if they are supposed to work, then I'd rather just keep using 
them.

Thanks again,

Ed


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Blosch, Edwin L
Sent: Wednesday, September 28, 2011 4:02 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Unresolved reference 'mbind' and 
'get_mempolicy'

Jeff,  

I've tried it now adding --without-libnuma.  Actually that did NOT fix the 
problem, so I can send you the full output from configure if you want, to 
understand why this "hwloc" function is trying to use a function which appears 
to be unavailable.  The answers to some of your questions:

The configure command was this:

./configure --prefix=/release/cfd/openmpi-intel --without-tm --without-sge 
--without-lsf --without-psm --without-portals --without-elan --without-slurm 
--without-loadleveler --without-libnuma --enable-mpirun-prefix-by-default 
--enable-contrib-no-build=vt --enable-mca-no-build=maffinity 
--disable-per-user-config-files --disable-io-romio --enable-static 
--disable-shared --without-openib CXX=/appserv/intel/cce/10.1.021/bin/icpc 
CC=/appserv/intel/cce/10.1.021/bin/icc 'CFLAGS=  -O2' 'CXXFLAGS=  -O2' 
F77=/appserv/intel/fce/10.1.021/bin/ifort 'FFLAGS=-D_GNU_SOURCE -traceback  
-O2' FC=/appserv/intel/fce/10.1.021/bin/ifort 'FCFLAGS=-D_GNU_SOURCE -traceback 
 -O2' 'LDFLAGS= -static-intel'

The error messages upon linking the application are unchanged:  
> /scratch1/bloscel/builds/release/openmpi-intel/lib/libmpi.a(topology-linux.o):
>  In function `hwloc_linux_alloc_membind':
> topology-linux.c:(.text+0x1da): undefined reference to `mbind'

Re: NUMA:  It appears there is a /usr/lib64/libnuma.so but no static version. 
There is /usr/include/numa.h and /usr/include/numaif.h.

I don't understand about make V=1. What tree? Somewhere in the OpenMPI build, 
or in the application compilation itself? Is "V=1" something in the OpenMPI 
makefile structure?

Thanks,

Ed

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Jeff Squyres
Sent: Wednesday, September 28, 2011 11:05 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Unresolved reference 'mbind' and 
'get_mempolicy'

Yowza; that sounds like a configury bug.  :-(

What line were you using to configure Open MPI?  Do you have libnuma installed? 
 If so, do you have the .h and .so files?  Do you have the .a file?

Can you send the last few lines of output from a failed "make V=1" in that 
tree?  (it'll show us the exact commands used to compile/link, etc.)


On Sep 28, 2011, at 11:55 AM, Blosch, Edwin L wrote:

> I am getting some undefined references in building OpenMPI 1.5.4 and I would 
> like to know how to work around it.
>  
> The errors look like this:
>  
> /scratch1/bloscel/builds/release/openmpi-intel/lib/libmpi.a(topology-linux.o):
>  In function `hwloc_linux_alloc_membind':
> topology-linux.c:(.text+0x1da): undefined reference to `mbind'
> topology-linux.c:(.text+0x213): undefined reference to `mbind'
> /scratch1/bloscel/builds/release/openmpi-intel/lib/libmpi.a(topology-linux.o):
>  In function `hwloc_linux_set_area_membind':
> topology-linux.c:(.text+0x414): undefined reference to `mbind'
> topology-linux.c:(.text+0x46c): undefined reference to `mbind'
> /scratch1/bloscel/builds/release/openmpi-intel/lib/libmpi.a(topology-linux.o):
>  In function `hwloc_linux_get_thisthread_membind':
> topology-linux.c:(.text+0x4ff): undefined reference to `get_mempolicy'
> topology-linux.c:(.text+0x5ff): undefined reference to `get_mempolicy'
> /scratch1/bloscel/builds/release/openmpi-intel/lib/libmpi.a(topology-linux.o):
>  In function `hwloc_linux_set_thisthread_membind':
> topology-linux.c:(.text+0x7b5): undefined reference to `migrate_pages'
> topology-linux.c:(.text+0x7e9): undefined reference to `set_mempolicy'
>

[OMPI users] Performance slowdown for large cases

2011-10-07 Thread Blosch, Edwin L
All,

I'm using OpenMPI 1.4.3 and have been running a particular case on 120, 240, 
480 and 960 processes.  My time-per-work metric reports 60, 30, 15, 15.  If I 
do the same run with MVAPICH 1.2, I get 60, 30, 15, 8. There is something 
running very slowly with OpenMPI 1.4.3 as the process count goes from 480 up to 
960.

Also this case has been really troublesome at 960, reliability-wise. Initially, 
the OpenMPI cases would reach a certain point in the application with some 
weird communication patterns, and they would die with the following messages:
c4n01][[14679,1],5][connect/btl_openib_connect_oob.c:464:qp_create_one] error 
creating qp errno says Cannot allocate memory

I then added this parameter:
'--mca btl_openib_receive_queues 
X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32'

and it runs...  but as I said above, it runs 2x slower than MVAPICH.  All of it 
is very repeatable.

How can I determine the source of the problem here?

Thanks for any advice,

Ed







[OMPI users] How to set up state-less node /tmp for OpenMPI usage

2011-11-01 Thread Blosch, Edwin L
I'm getting this message below which is observing correctly that /tmp is 
NFS-mounted.   But there is no other directory which has user or group write 
permissions.  So I think I'm kind of stuck, and it sounds like a serious issue.

Before I ask the administrators to change their image, i.e. mount this 
partition under /work instead of /tmp, I'd like to ask if anyone is using 
OpenMPI on a state-less cluster, and are there any gotchas with regards to 
performance of OpenMPI, i.e. like handling of /tmp, that one would need to know?

Thank you,

Ed

--
WARNING: Open MPI will create a shared memory backing file in a
directory that appears to be mounted on a network filesystem.
Creating the shared memory backup file on a network file system, such
as NFS or Lustre is not recommended -- it may cause excessive network
traffic to your file servers and/or cause shared memory traffic in
Open MPI to be much slower than expected.

You may want to check what the typical temporary directory is on your
compute nodes.  Possible sources of the location of this temporary
directory include the $TEMPDIR, $TEMP, and $TMP environment variables.

Note, too, that system administrators can set a list of filesystems
where Open MPI is disallowed from creating temporary files by settings
the MCA parameter "orte_no_session_dir".

Local host: e8332
File Name:  
/tmp/159313.1.e8300/openmpi-sessions-bloscel@e8332_0/53301/1/shared_mem_pool.e8332
--
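
(For readers unfamiliar with the mechanism behind this warning: the sm BTL 
communicates through a file that every local rank maps into its address space, so 
placing that file on a network filesystem can be slow. Below is a generic C sketch 
of file-backed shared memory, not Open MPI's actual implementation; the path and 
message are invented.)

/* shm_file.c - generic sketch of a file-backed shared mapping, the mechanism
 * the warning above refers to (NOT Open MPI's code).  A parent creates and
 * maps a file; a forked child sees the parent's write through the same
 * mapping.  If the backing file lives on NFS, accesses may touch the network.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/tmp/shm_backing_demo";   /* placement is the whole point */
    const size_t len = 4096;

    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, (off_t)len) != 0) { perror("open/ftruncate"); return 1; }

    char *seg = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (seg == MAP_FAILED) { perror("mmap"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {                 /* child: wait briefly, then read the segment */
        sleep(1);
        printf("child sees: %s\n", seg);
        _exit(0);
    }
    strcpy(seg, "hello through the backing file");   /* parent writes */
    waitpid(pid, NULL, 0);

    munmap(seg, len);
    close(fd);
    unlink(path);
    return 0;
}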



[OMPI users] Shared-memory problems

2011-11-03 Thread Blosch, Edwin L
Can anyone guess what the problem is here?  I was under the impression that 
OpenMPI (1.4.4) would look for /tmp and would create its shared-memory backing 
file there, i.e. if you don't set orte_tmpdir_base to anything.

Well, there IS a /tmp and yet it appears that OpenMPI has chosen to use 
/dev/shm.  Why?

And, next question, why doesn't it work?  Here are the oddities of this cluster:

-the cluster is 'diskless'

-/tmp is an NFS mount

-/dev/shm is 12 GB and has 755 permissions

FilesystemSize  Used Avail Use% Mounted on
tmpfs  12G  164K   12G   1% /dev/shm

% ls -l output:
drwxr-xr-x  2 root root 40 Oct 28 09:14 shm


The error message below suggests that OpenMPI (1.4.4) has somehow 
auto-magically decided to use /dev/shm and is failing to be able to use it, for 
some reason.

Thanks for whatever help you can offer,

Ed


e8315:02942] opal_os_dirpath_create: Error: Unable to create the sub-directory 
(/dev/shm/openmpi-sessions-estenfte@e8315_0) of 
(/dev/shm/openmpi-sessions-estenfte@e8315_0/8474/0/1), mkdir failed [1]
[e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file util/session_dir.c at 
line 106
[e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file util/session_dir.c at 
line 399
[e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file 
base/ess_base_std_orted.c at line 206
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_session_dir failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--
[e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file ess_env_module.c at 
line 136
[e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at 
line 132
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--
[e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file orted/orted_main.c at 
line 325





Re: [OMPI users] EXTERNAL: Re: Shared-memory problems

2011-11-03 Thread Blosch, Edwin L
In /tmp.

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Durga Choudhury
Sent: Thursday, November 03, 2011 11:04 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Shared-memory problems

Since /tmp is mounted across a network and /dev/shm is (always) local,
/dev/shm seems to be the right place for shared memory transactions.
If you create temporary files using mktemp is it being created in
/dev/shm or /tmp?


On Thu, Nov 3, 2011 at 11:50 AM, Bogdan Costescu  wrote:
> On Thu, Nov 3, 2011 at 15:54, Blosch, Edwin L  wrote:
>> -    /dev/shm is 12 GB and has 755 permissions
>> ...
>> % ls -l output:
>>
>> drwxr-xr-x  2 root root 40 Oct 28 09:14 shm
>
> This is your problem: it should be something like drwxrwxrwt. It might
> depend on the distribution, f.e. the following show this to be a bug:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=533897
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=317329
>
> and surely you can find some more on the subject with your favorite
> search engine. Another source could be a paranoid sysadmin who has
> changed the default (most likely correct) setting the distribution
> came with - not only OpenMPI but any application using shmem would be
> affected..
>
> Cheers,
> Bogdan
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-03 Thread Blosch, Edwin L
Cross-thread response here, as this is related to the shared-memory thread:

Yes it sucks, so that's what led me to post my original question: If /dev/shm 
isn't the right place to put the session file, and /tmp is NFS-mounted, then 
what IS the "right" way to set up a diskless cluster?  I don't think the idea 
of tempfs sounds very appealing, after reading the discussion in FAQ #8 about 
shared-memory usage. We definitely have a job-queueing system and jobs are very 
often killed using qdel, and writing a post-script handler is way beyond the 
level of involvement or expertise we can expect from our sys admins.

Surely there's some reasonable guidance that can be offered to work around an 
issue that is so disabling.

A related question would be: How is it that HP-MPI works just fine on this 
cluster as it is configured now?  Are they doing something different for shared 
memory communications?


Thanks


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Jeff Squyres
Sent: Thursday, November 03, 2011 11:35 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] How to set up state-less node /tmp for 
OpenMPI usage

On Nov 1, 2011, at 7:31 PM, Blosch, Edwin L wrote:

> I'm getting this message below which is observing correctly that /tmp is 
> NFS-mounted.   But there is no other directory which has user or group write 
> permissions.  So I think I'm kind of stuck, and it sounds like a serious 
> issue.

That does kinda suck.  :-\

> Before I ask the administrators to change their image, i.e. mount this 
> partition under /work instead of /tmp, I'd like to ask if anyone is using 
> OpenMPI on a state-less cluster, and are there any gotchas with regards to 
> performance of OpenMPI, i.e. like handling of /tmp, that one would need to 
> know?

I don't have much empirical information here -- I know that some people have 
done this (make /tmp be NFS-mounted).  I think there are at least some issues 
with this, though -- many applications believe that a sufficient condition for 
uniqueness in /tmp is to simply append your PID to a filename.  But this may no 
longer be true if /tmp is shared across multiple OS instances.

I don't have a specific case where this is problematic, but it's not a large 
stretch to imagine that this could happen in practice with random applications 
that make temp files in /tmp.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] EXTERNAL: Re: Shared-memory problems

2011-11-03 Thread Blosch, Edwin L
You are right, Ralph.  There is no surprise behavior.  I had forgotten that I 
had been testing --mca orte_tmpdir_base /dev/shm to see if it worked (and 
obviously it doesn't).  Before that, without any MCA options, OpenMPI had tried 
/tmp, and gave me the warning about /tmp being NFS mounted, and so I had been 
exploring options.

I accept your point - I need "a good local directory - anything you have 
permission to write in will work fine".  How would one do this on a stateless 
node?  And can I beat the vendor over the head for not knowing how to set up 
the node image so that OpenMPI could function properly?

Thanks


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Ralph Castain
Sent: Thursday, November 03, 2011 11:33 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Shared-memory problems

I'm afraid this isn't correct. You definitely don't want the session directory 
in /dev/shm as this will almost always cause problems.

We look thru a progression of envars to find where to put the session directory:

1. the MCA param orte_tmpdir_base

2. the envar OMPI_PREFIX_ENV

3. the envar TMPDIR

4. the envar TEMP

5. the envar TMP

Check all those to see if one is set to /dev/shm. If so, you have a problem to 
resolve. For performance reasons, you probably don't want the session directory 
sitting on a network mounted location. What you need is a good local directory 
- anything you have permission to write in will work fine. Just set one of the 
above to point to it.
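
(A minimal sketch of the precedence described above, not Open MPI's source. The MCA 
parameter in step 1 is resolved through Open MPI's own parameter system, so it is 
represented here only by a placeholder argument, and the final "/tmp" fallback is an 
assumption for the demo.)

/* tmpdir_lookup.c - mirrors the documented lookup order for the session
 * directory base; illustrative only.
 */
#include <stdio.h>
#include <stdlib.h>

static const char *pick_tmpdir_base(const char *mca_orte_tmpdir_base)
{
    if (mca_orte_tmpdir_base && *mca_orte_tmpdir_base)    /* 1. MCA param */
        return mca_orte_tmpdir_base;

    const char *candidates[] = { "OMPI_PREFIX_ENV",       /* 2. */
                                 "TMPDIR",                /* 3. */
                                 "TEMP",                  /* 4. */
                                 "TMP" };                 /* 5. */
    for (size_t i = 0; i < sizeof candidates / sizeof *candidates; i++) {
        const char *v = getenv(candidates[i]);
        if (v && *v)
            return v;
    }
    return "/tmp";   /* assumed fallback if nothing above is set */
}

int main(void)
{
    printf("session directory base would be: %s\n", pick_tmpdir_base(NULL));
    return 0;
}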


On Nov 3, 2011, at 10:04 AM, Durga Choudhury wrote:

> Since /tmp is mounted across a network and /dev/shm is (always) local,
> /dev/shm seems to be the right place for shared memory transactions.
> If you create temporary files using mktemp is it being created in
> /dev/shm or /tmp?
> 
> 
> On Thu, Nov 3, 2011 at 11:50 AM, Bogdan Costescu  wrote:
>> On Thu, Nov 3, 2011 at 15:54, Blosch, Edwin L  
>> wrote:
>>> -/dev/shm is 12 GB and has 755 permissions
>>> ...
>>> % ls -l output:
>>> 
>>> drwxr-xr-x  2 root root 40 Oct 28 09:14 shm
>> 
>> This is your problem: it should be something like drwxrwxrwt. It might
>> depend on the distribution, f.e. the following show this to be a bug:
>> 
>> https://bugzilla.redhat.com/show_bug.cgi?id=533897
>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=317329
>> 
>> and surely you can find some more on the subject with your favorite
>> search engine. Another source could be a paranoid sysadmin who has
>> changed the default (most likely correct) setting the distribution
>> came with - not only OpenMPI but any application using shmem would be
>> affected..
>> 
>> Cheers,
>> Bogdan
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-03 Thread Blosch, Edwin L
I don't tell OpenMPI what BTLs to use. The default uses sm and puts a session 
file on /tmp, which is NFS-mounted and thus not a good choice.

Are you suggesting something like --mca ^sm?


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Eugene Loh
Sent: Thursday, November 03, 2011 12:54 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for 
OpenMPI usage

I've not been following closely.  Why must one use shared-memory 
communications?  How about using other BTLs in a "loopback" fashion?
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-03 Thread Blosch, Edwin L
I might be missing something here. Is there a side-effect or performance loss 
if you don't use the sm btl?  Why would it exist if there is a wholly 
equivalent alternative?  What happens to traffic that is intended for another 
process on the same node?

Thanks


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Eugene Loh
Sent: Thursday, November 03, 2011 1:23 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for 
OpenMPI usage

Right.  Actually "--mca btl ^sm".  (Was missing "btl".)

On 11/3/2011 11:19 AM, Blosch, Edwin L wrote:
> I don't tell OpenMPI what BTLs to use. The default uses sm and puts a session 
> file on /tmp, which is NFS-mounted and thus not a good choice.
>
> Are you suggesting something like --mca ^sm?
>
>
> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
> Behalf Of Eugene Loh
> Sent: Thursday, November 03, 2011 12:54 PM
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp 
> for OpenMPI usage
>
> I've not been following closely.  Why must one use shared-memory
> communications?  How about using other BTLs in a "loopback" fashion?
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-03 Thread Blosch, Edwin L
Thanks for the help.  A couple follow-up-questions, maybe this starts to go 
outside OpenMPI:

What's wrong with using /dev/shm?  I think you said earlier in this thread that 
this was not a safe place.

If the NFS-mount point is moved from /tmp to /work, would a /tmp magically 
appear in the filesystem for a stateless node?  How big would it be, given that 
there is no local disk, right?  That may be something I have to ask the vendor, 
which I've tried, but they don't quite seem to get the question.

Thanks




-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Ralph Castain
Sent: Thursday, November 03, 2011 5:22 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for 
OpenMPI usage


On Nov 3, 2011, at 2:55 PM, Blosch, Edwin L wrote:

> I might be missing something here. Is there a side-effect or performance loss 
> if you don't use the sm btl?  Why would it exist if there is a wholly 
> equivalent alternative?  What happens to traffic that is intended for another 
> process on the same node?

There is a definite performance impact, and we wouldn't recommend doing what 
Eugene suggested if you care about performance.

The correct solution here is get your sys admin to make /tmp local. Making /tmp 
NFS mounted across multiple nodes is a major "faux pas" in the Linux world - it 
should never be done, for the reasons stated by Jeff.


> 
> Thanks
> 
> 
> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
> Behalf Of Eugene Loh
> Sent: Thursday, November 03, 2011 1:23 PM
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp 
> for OpenMPI usage
> 
> Right.  Actually "--mca btl ^sm".  (Was missing "btl".)
> 
> On 11/3/2011 11:19 AM, Blosch, Edwin L wrote:
>> I don't tell OpenMPI what BTLs to use. The default uses sm and puts a 
>> session file on /tmp, which is NFS-mounted and thus not a good choice.
>> 
>> Are you suggesting something like --mca ^sm?
>> 
>> 
>> -Original Message-
>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
>> Behalf Of Eugene Loh
>> Sent: Thursday, November 03, 2011 12:54 PM
>> To: us...@open-mpi.org
>> Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp 
>> for OpenMPI usage
>> 
>> I've not been following closely.  Why must one use shared-memory
>> communications?  How about using other BTLs in a "loopback" fashion?
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-04 Thread Blosch, Edwin L
OK, I wouldn't have guessed that the space for /tmp isn't actually in RAM until 
it's needed.  That's the key piece of knowledge I was missing; I really 
appreciate it.  So you can allow /tmp to be reasonably sized, but if you aren't 
actually using it, then it doesn't take up 11 GB of RAM.  And you prevent users 
from crashing the node by setting mem limit to 4 GB less than the available 
memory. Got it.

I agree with your earlier comment:  these are fairly common systems now.  We 
have program- and owner-specific disks where I work, and after the program 
ends, the disks are archived or destroyed.  Before the stateless configuration 
option, the entire computer, nodes and switches as well as disks, were archived 
or destroyed after each program.  Not too cost-effective.

Is this a reasonable final summary? :  OpenMPI uses temporary files in such a 
way that it is performance-critical that these so-called session files, used 
for shared-memory communications, must be "local".  For state-less clusters, 
this means the node image must include a /tmp or /wrk partition, intelligently 
sized so as not to enable an application to exhaust the physical memory of the 
node, and care must be taken not to mask this in-memory /tmp with an NFS 
mounted filesystem.  It is not uncommon for cluster enablers to exclude /tmp 
from a typical base Linux filesystem image or mount it over NFS, as a means of 
providing users with a larger-sized /tmp that is not limited to a fraction of 
the node's physical memory, or to avoid garbage accumulation in /tmp taking up 
the physical RAM.  But not having /tmp or mounting it over NFS is not a viable 
stateless-node configuration option if you intend to run OpenMPI. Instead you 
could have a /bigtmp which is NFS-mounted and a /tmp which is local, for 
example. Starting in OpenMPI 1.7.x, shared-memory communication will no longer 
go through memory-mapped files, and vendors/users will no longer need to be 
vigilant concerning this OpenMPI performance requirement on stateless node 
configuration. 


Is that a reasonable summary?

If so, would it be helpful to include this as an FAQ entry under General 
category?  Or the "shared memory" category?  Or the "troubleshooting" category?


Thanks



-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of David Turner
Sent: Friday, November 04, 2011 1:38 AM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for 
OpenMPI usage

% df /tmp
Filesystem   1K-blocks  Used Available Use% Mounted on
-             12330084    822848  11507236   7% /
% df /
Filesystem   1K-blocks  Used Available Use% Mounted on
-             12330084    822848  11507236   7% /

That works out to 11GB.  But...

The compute nodes have 24GB.  Freshly booted, about 3.2GB is
consumed by the kernel, various services, and the root file system.
At this time, usage of /tmp is essentially nil.

We set user memory limits to 20GB.

I would imagine that the size of the session directories depends on a
number of factors; perhaps the developers can comment on that.  I have
only seen total sizes in the 10s of MBs on our 8-node, 24GB nodes.

As long as they're removed after each job, they don't really compete
with the application for available memory.

On 11/3/11 8:40 PM, Ed Blosch wrote:
> Thanks very much, exactly what I wanted to hear. How big is /tmp?
>
> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of David Turner
> Sent: Thursday, November 03, 2011 6:36 PM
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp
> for OpenMPI usage
>
> I'm not a systems guy, but I'll pitch in anyway.  On our cluster,
> all the compute nodes are completely diskless.  The root file system,
> including /tmp, resides in memory (ramdisk).  OpenMPI puts these
> session directories therein.  All our jobs run through a batch
> system (torque).  At the conclusion of each batch job, an epilogue
> process runs that removes all files belonging to the owner of the
> current batch job from /tmp (and also looks for and kills orphan
> processes belonging to the user).  This epilogue had to written
> by our systems staff.
>
> I believe this is a fairly common configuration for diskless
> clusters.
>
> On 11/3/11 4:09 PM, Blosch, Edwin L wrote:
>> Thanks for the help.  A couple follow-up-questions, maybe this starts to
> go outside OpenMPI:
>>
>> What's wrong with using /dev/shm?  I think you said earlier in this thread
> that this was not a safe place.
>>
>> If the NFS-mount point is moved from /tmp to /work, would a /tmp magically
> appear in the filesystem for a stateless

Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-04 Thread Blosch, Edwin L
Thanks, Ralph, 

> Having a local /tmp is typically required by Linux for proper operation as 
> the OS itself needs to ensure its usage is protected, as was previously 
> stated and is reiterated in numerous books on managing Linux systems. 

There is a /tmp, but it's not local.  I don't know if that passes muster as a 
proper setup or not.  I'll gift a Linux book for Christmas to the two reputable 
vendors who have configured diskless clusters for us where /tmp was not local, 
and both /usr/tmp and /var/tmp were linked to /tmp. :)

> IMO, discussions of how to handle /tmp on diskless systems goes beyond the 
> bounds of OMPI - it is a Linux system management issue that is covered in 
> depth by material on that subject. Explaining how the session directory is 
> used, and why we now include a test and warning if the session directory is 
> going to land on a networked file system (pretty sure this is now in the 1.5 
> series, but certainly is in the trunk for future releases), would be 
> reasonable.

I know where you're coming from, and I probably didn't title the post correctly 
because I wasn't sure what to ask.  But I definitely saw it, and still see it, 
as an OpenMPI issue.  Having /tmp mounted over NFS on a stateless cluster is 
not a broken configuration, broadly speaking. The vendors made those decisions 
and presumably that's how they do it for other customers as well. There are two 
other (Platform/HP) MPI applications that apparently work normally. But OpenMPI 
doesn't work normally. So it's deficient.

I'll ask the vendor to rebuild the stateless image with a /usr/tmp partition so 
that the end-user application in question can then set orte_tmpdir_base to 
/usr/tmp and all will then work beautifully...
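Concretely, I expect that to look like one of the following (the per-user MCA
parameter file is my assumption about how we would deploy it):

    mpirun --mca orte_tmpdir_base /usr/tmp ...

or, set once per user:

    echo "orte_tmpdir_base = /usr/tmp" >> ~/.openmpi/mca-params.conf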

Thanks again,

Ed



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-07 Thread Blosch, Edwin L
Thanks for the valuable input. I'll change to a wait-and-watch approach.

The FAQ on tuning sm says "If the session directory is located on a network 
filesystem, the shared memory BTL latency will be extremely high."  And the 
title is 'Why am I seeing incredibly poor performance...'.  So I made the leap 
that this configuration must be avoided at all costs...

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of David Singleton
Sent: Sunday, November 06, 2011 4:15 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for 
OpenMPI usage


On 11/05/2011 09:11 AM, Blosch, Edwin L wrote:
..
>
> I know where you're coming from, and I probably didn't title the post 
> correctly because I wasn't sure what to ask.  But I definitely saw it, and 
> still see it, as an OpenMPI issue.  Having /tmp mounted over NFS on a 
> stateless cluster is not a broken configuration, broadly speaking. The 
> vendors made those decisions and presumably that's how they do it for other 
> customers as well. There are two other (Platform/HP) MPI applications that 
> apparently work normally. But OpenMPI doesn't work normally. So it's 
> deficient.
>

I'm also concerned that there is a bit of an over-reaction to network
filesystems.  Stores to mmap'd files do not instantly turn into filesystem
writes - there are dirty_writeback parameters to control how often
writes occur, and it's typically 5-20 seconds.  Ideally, memory or a local
disk is used for session directories but, in many cases, you just won't
notice a performance hit from network filesystems - we didn't when we
tested session directories on Lustre.  If your app is one of the handful
that is slowed by OS jitter at megascale, then you may well notice.
Obviously, it's something to test.

For our 1.5 install, I removed Lustre from the list of filesystem types
that generate the warning message about network filesystems.  It would be
nice if it was a site choice whether or not to produce that message and
when.

David

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


[OMPI users] Trouble with PSM "Could not detect network connectivity"

2012-11-02 Thread Blosch, Edwin L
I am getting a problem where something called "PSM" is failing to start and 
that in turn is preventing my job from running.  Command and output are below.  
I would like to understand what's going on.  Apparently this version of OpenMPI 
decided to build itself with support for PSM, but if it's not available, why 
fail if other transports are available?  Also, in my command I think I've told 
OpenMPI not to use anything but self and sm, so why would it try to use PSM? 

Thanks in advance for any help...

user@machinename:~> /usr/mpi/intel/openmpi-1.4.3/bin/ompi_info -all | grep psm
 MCA mtl: psm (MCA v2.0, API v2.0, Component v1.4.3)
 MCA mtl: parameter "mtl_psm_connect_timeout" (current value: 
"180", data source: default value)
 MCA mtl: parameter "mtl_psm_debug" (current value: "1", data 
source: default value)
 MCA mtl: parameter "mtl_psm_ib_unit" (current value: "-1", 
data source: default value)
 MCA mtl: parameter "mtl_psm_ib_port" (current value: "0", data 
source: default value)
 MCA mtl: parameter "mtl_psm_ib_service_level" (current value: 
"0", data source: default value)
 MCA mtl: parameter "mtl_psm_ib_pkey" (current value: "32767", 
data source: default value)
 MCA mtl: parameter "mtl_psm_priority" (current value: "0", 
data source: default value)

Here is my command:

/usr/mpi/intel/openmpi-1.4.3/bin/mpirun -n 1 --mca btl_base_verbose 30 --mca 
btl self,sm /release/cfd/simgrid/P_OPT.LINUX64

and here is the output:

[machinename:01124] mca: base: components_open: Looking for btl components
[machinename:01124] mca: base: components_open: opening btl components
[machinename:01124] mca: base: components_open: found loaded component self
[machinename:01124] mca: base: components_open: component self has no register 
function
[machinename:01124] mca: base: components_open: component self open function 
successful
[machinename:01124] mca: base: components_open: found loaded component sm
[machinename:01124] mca: base: components_open: component sm has no register 
function
[machinename:01124] mca: base: components_open: component sm open function 
successful
machinename.1124 ipath_userinit: assign_context command failed: Network is down
machinename.1124 can't open /dev/ipath, network down (err=26)
--
PSM was unable to open an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.

  Error: Could not detect network connectivity
--
[machinename:01124] mca: base: close: component self closed
[machinename:01124] mca: base: close: unloading component self
[machinename:01124] mca: base: close: component sm closed
[machinename:01124] mca: base: close: unloading component sm
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--
*** The MPI_Init() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
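For what it's worth, my next experiment will be to force the ob1 PML
explicitly -- my working assumption being that PSM comes in through the
MTL/cm path rather than as a BTL, so restricting the BTL list by itself does
not rule it out:

/usr/mpi/intel/openmpi-1.4.3/bin/mpirun -n 1 --mca pml ob1 --mca btl self,sm /release/cfd/simgrid/P_OPT.LINUX64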



[OMPI users] Question on shmem MCA parameter

2012-11-07 Thread Blosch, Edwin L
I am using this parameter "shmem_mmap_relocate_backing_file" and noticed that 
the relocation variable is identified as 
"shmem_mmap_opal_shmem_mmap_backing_file_base_dir" in its documentation, but 
then the next parameter that appears from ompi_info is spelled differently, 
namely "shmem_mmap_backing_file_base_dir". 

Is the first name just a typo?


> MCA shmem: parameter "shmem_mmap_relocate_backing_file" 
> (current value: 
> <0>, data source: default value) 
> Whether to change the default placement of 
> backing files or not 
> (Negative = try to relocate backing files to 
> an area rooted at 
> the path specified by 
> 
> shmem_mmap_opal_shmem_mmap_backing_file_base_dir, but continue 
> with the default path if the relocation 
> fails, 0 = do not 
> relocate, Positive = same as the negative 
> option, but will fail 
> if the relocation fails. 
> MCA shmem: parameter "shmem_mmap_backing_file_base_dir" 
> (current value: 
> , data source: default value) 
> Specifies where backing files will be created when 
> shmem_mmap_relocate_backing_file is in use.
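For reference, this is how I am setting the pair on the mpirun command line
(the base directory below is just a placeholder, not our real path):

    mpirun --mca shmem_mmap_relocate_backing_file 1 \
           --mca shmem_mmap_backing_file_base_dir /local/scratch ...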



[OMPI users] Best way to map MPI processes to sockets?

2012-11-07 Thread Blosch, Edwin L
I am trying to map MPI processes to sockets in a somewhat compacted pattern and 
I am wondering the best way to do it.

Say there are 2 sockets (0 and 1) and each processor has 4 cores (0,1,2,3) and 
I have 4 MPI processes, each of which will use 2 OpenMP processes.

I've re-ordered my parallel work such that pairs of ranks (0,1 and 2,3) 
communicate more with each other than with other ranks.  Thus I think the best 
mapping would be:

RANK   SOCKET   CORE
0  0  0
1  0  2
2  1  0
3  1  2

My understanding is that --bysocket --bind-to-socket will give me ranks 0 and 2 
on socket 0 and ranks 1 and 3 on socket 1, not what I want.

It looks like --cpus-per-proc might be what I want, i.e. seems like I might 
give the value 2.  But it was unclear to me whether I would also need to give 
--bysocket and the FAQ suggests this combination is untested.

Maybe a rankfile is what I need?
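Something like the following is what I imagine, used with "mpirun -np 4 -rf
myrankfile ..." (the hostname is a placeholder, and I am assuming the
slot=socket:core rankfile syntax applies to the version I am running):

    rank 0=nodeA slot=0:0
    rank 1=nodeA slot=0:2
    rank 2=nodeA slot=1:0
    rank 3=nodeA slot=1:2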

I would appreciate some advice on the easiest way to get this mapping.

Thanks


[OMPI users] How is hwloc used by OpenMPI

2012-11-07 Thread Blosch, Edwin L
I see hwloc is a subproject hosted under OpenMPI but, in reading the 
documentation, I was unable to figure out if hwloc is a module within OpenMPI, 
or if some of the code base is borrowed into OpenMPI, or something else.  Is 
hwloc used by OpenMPI internally?  Is it a layer above libnuma?  Or is it just 
a project that is useful to OpenMPI in support of targeting various new 
platforms?

Thanks


Re: [OMPI users] Best way to map MPI processes to sockets?

2012-11-07 Thread Blosch, Edwin L
>>> In your desired ordering you have rank 0 on (socket,core) (0,0) and 
>>> rank 1 on (0,2). Is there an architectural reason for that? Meaning 
>>> are cores 0 and 1 hardware threads in the same core, or is there a 
>>> cache level (say L2 or L3) connecting cores 0 and 1 separate from 
>>> cores 2 and 3? 

My thinking was that each MPI rank will be running 2 OpenMP threads and that 
there might be some benefit to having those threads execute on cores 0 and 1 
because those cores might share some level of the memory hierarchy.  No 
hardware threading is being used.

>>> hwloc's lstopo should give you that information if you don't have that 
>>> information handy. 

Here you go, first likwid output then hwloc, just for the first socket.

likwid output (Graphical):

Socket 0:
  Cores: 0  1  2  3  4  5  6  7  8  9   (ten core boxes)
  L1:    five 32kB boxes, each drawn spanning a pair of cores
  L2:    five 256kB boxes, each drawn spanning a pair of cores
  L3:    one 30MB box spanning the whole socket

hwloc output:

Machine (512GB)
  NUMANode L#0 (P#0 64GB) + Socket L#0 + L3 L#0 (30MB)
L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)

Thanks again



Re: [OMPI users] EXTERNAL: Re: Best way to map MPI processes to sockets?

2012-11-08 Thread Blosch, Edwin L
Yes it is a Westmere system. 

Socket L#0 (P#0 CPUModel="Intel(R) Xeon(R) CPU E7- 8870  @ 2.40GHz" 
CPUType=x86_64)
  L3Cache L#0 (size=30720KB linesize=64 ways=24)
L2Cache L#0 (size=256KB linesize=64 ways=8)
  L1dCache L#0 (size=32KB linesize=64 ways=8)
L1iCache L#0 (size=32KB linesize=64 ways=4)
  Core L#0 (P#0)
PU L#0 (P#0)
L2Cache L#1 (size=256KB linesize=64 ways=8)
  L1dCache L#1 (size=32KB linesize=64 ways=8)
L1iCache L#1 (size=32KB linesize=64 ways=4)
  Core L#1 (P#1)
PU L#1 (P#1)

So I guess each core has its own L1 and L2 caches.  Maybe I shouldn't care 
where or if the MPI processes are bound within a socket; if I can test it, that 
will be good enough for me.

So my initial question is now changed to:

What is the best/easiest way to get this mapping?  Rankfile?, --cpus-per-proc 2 
--bind-to-socket, or something else? 

RANK  SOCKET  CORE
0   0   unspecified
1   0   unspecified
2   1   unspecified
3   1   unspecified


Thanks

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Brice Goglin
Sent: Wednesday, November 07, 2012 6:17 PM
To: us...@open-mpi.org
Subject: EXTERNAL: Re: [OMPI users] Best way to map MPI processes to sockets?

What processor and kernel is this? (see /proc/cpuinfo, or run "lstopo -v" and 
look for attributes on the Socket line.) Your hwloc output looks like an Intel 
Xeon Westmere-EX (E7-48xx or E7-88xx).
The likwid output is likely wrong (maybe confused by the fact that hardware 
threads are disabled).

Brice







Re: [OMPI users] EXTERNAL: Re: How is hwloc used by OpenMPI

2012-11-08 Thread Blosch, Edwin L
Thanks, I definitely appreciate the new hotness of hwloc.  I just couldn't 
tell from the documentation or the web page how or whether it was being used by 
OpenMPI.

I still work with OpenMPI 1.4.x and now that I've looked into the builds, I 
think I understand that PLPA is used in 1.4 and hwloc is brought in as an MCA 
module in 1.6.x.

Re: layering, I believe you are saying that the relationship to libnuma is not 
one where hwloc is adding higher-level functionalities to libnuma, but rather 
hwloc is a much improved alternative except for a few system calls it makes via 
libnuma out of necessity or convenience.

Thanks





Re: [OMPI users] EXTERNAL: Re: Best way to map MPI processes to sockets?

2012-11-08 Thread Blosch, Edwin L
Thanks, that's what I'm looking for.

My first look for documentation is always the FAQ, not the man pages.  I found 
no mention of -npersocket in the FAQ but there it is very clear in the man 
page.  Boy do I feel dumb.

Anyway, thanks a lot.
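One small follow-up for the archives: I plan to add --report-bindings
(assuming that option is available in the build I'm using; the executable
name below is a placeholder) to confirm the placement comes out as intended:

    mpirun -npersocket 2 -bind-to-socket --report-bindings -np 4 ./mycode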

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Ralph Castain
Sent: Thursday, November 08, 2012 10:08 AM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Best way to map MPI processes to 
sockets?

I gather from your other emails you are using 1.4.3, yes? I believe that has 
npersocket as an option. If so, you could do:

mpirun -npersocket 2 -bind-to-socket ...

That would put two processes in each socket, bind them to that socket, and rank 
them in series. So ranks 0-1 would be bound to the first socket, ranks 2-3 to 
the second.

Ralph

On Thu, Nov 8, 2012 at 6:52 AM, Blosch, Edwin L 
mailto:edwin.l.blo...@lmco.com>> wrote:
Yes it is a Westmere system.

Socket L#0 (P#0 CPUModel="Intel(R) Xeon(R) CPU E7- 8870  @ 2.40GHz" 
CPUType=x86_64)
  L3Cache L#0 (size=30720KB linesize=64 ways=24)
L2Cache L#0 (size=256KB linesize=64 ways=8)
  L1dCache L#0 (size=32KB linesize=64 ways=8)
L1iCache L#0 (size=32KB linesize=64 ways=4)
  Core L#0 (P#0)
PU L#0 (P#0)
L2Cache L#1 (size=256KB linesize=64 ways=8)
  L1dCache L#1 (size=32KB linesize=64 ways=8)
L1iCache L#1 (size=32KB linesize=64 ways=4)
  Core L#1 (P#1)
PU L#1 (P#1)

So I guess each core has its own L1 and L2 caches.  Maybe I shouldn't care 
where or if the MPI processes are bound within a socket; if I can test it, that 
will be good enough for me.

So my initial question is now changed to:

What is the best/easiest way to get this mapping?  Rankfile?, --cpus-per-proc 2 
--bind-to-socket, or something else?

RANK  SOCKET  CORE
0   0   unspecified
1   0   unspecified
2   1   unspecified
3   1   unspecified


Thanks

-Original Message-
From: users-boun...@open-mpi.org<mailto:users-boun...@open-mpi.org> 
[mailto:users-boun...@open-mpi.org<mailto:users-boun...@open-mpi.org>] On 
Behalf Of Brice Goglin
Sent: Wednesday, November 07, 2012 6:17 PM
To: us...@open-mpi.org<mailto:us...@open-mpi.org>
Subject: EXTERNAL: Re: [OMPI users] Best way to map MPI processes to sockets?

What processor and kernel is this? (see /proc/cpuinfo, or run "lstopo -v" and 
look for attributes on the Socket line.) Your hwloc output looks like an Intel 
Xeon Westmere-EX (E7-48xx or E7-88xx).
The likwid output is likely wrong (maybe confused by the fact that hardware 
threads are disabled).

Brice





___
users mailing list
us...@open-mpi.org<mailto:us...@open-mpi.org>
http://www.open-mpi.org/mailman/listinfo.cgi/users



[OMPI users] Problems with shared libraries while launching jobs

2012-12-14 Thread Blosch, Edwin L
I am having a weird problem launching cases with OpenMPI 1.4.3.  It is most 
likely a problem with a particular node of our cluster, as the jobs will run 
fine on some submissions, but not other submissions.  It seems to depend on the 
node list.  I just am having trouble diagnosing which node, and what is the 
nature of the problem it has.

One or perhaps more of the orted are indicating they cannot find an Intel Math 
library.  The error is:
/release/cfd/openmpi-intel/bin/orted: error while loading shared libraries: 
libimf.so: cannot open shared object file: No such file or directory

I've checked the environment just before launching mpirun, and LD_LIBRARY_PATH 
includes the necessary component to point to where the Intel shared libraries 
are located.  Furthermore, my mpirun command line says to export the 
LD_LIBRARY_PATH variable:
Executing ['/release/cfd/openmpi-intel/bin/mpirun', '--machinefile 
/var/spool/PBS/aux/20761.maruhpc4-mgt', '-np 160', '-x LD_LIBRARY_PATH', '-x 
MPI_ENVIRONMENT=1', '/tmp/fv420761.maruhpc4-mgt/falconv4_openmpi_jsgl', '-v', 
'-cycles', '1', '-ri', 'restart.1', '-ro', 
'/tmp/fv420761.maruhpc4-mgt/restart.1']

My shell-initialization script (.bashrc) does not overwrite LD_LIBRARY_PATH.  
OpenMPI is built explicitly --without-torque and should be using ssh to launch 
the orted.
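In the meantime, one check I can run by hand from the launch node is to see
what the non-interactive environment on a compute node resolves (the node
name below is just a placeholder):

    ssh somenode 'ldd /release/cfd/openmpi-intel/bin/orted | grep "not found"'

If libimf.so shows up as "not found" there, the problem is the environment
the ssh-launched orted sees, rather than anything in the mpirun command
itself.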

What options can I add to get more debugging of problems launching orted?

Thanks,

Ed


Re: [OMPI users] EXTERNAL: Re: Problems with shared libraries while launching jobs

2012-12-17 Thread Blosch, Edwin L
-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Ralph Castain
Sent: Friday, December 14, 2012 2:25 PM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Problems with shared libraries while 
launching jobs

Add -mca plm_base_verbose 5 --leave-session-attached to the cmd line - that 
will show the ssh command being used to start each orted.

On Dec 14, 2012, at 12:17 PM, "Blosch, Edwin L" 
mailto:edwin.l.blo...@lmco.com>> wrote:


I am having a weird problem launching cases with OpenMPI 1.4.3.  It is most 
likely a problem with a particular node of our cluster, as the jobs will run 
fine on some submissions, but not other submissions.  It seems to depend on the 
node list.  I just am having trouble diagnosing which node, and what is the 
nature of the problem it has.

One or perhaps more of the orted are indicating they cannot find an Intel Math 
library.  The error is:
/release/cfd/openmpi-intel/bin/orted: error while loading shared libraries: 
libimf.so: cannot open shared object file: No such file or directory

I've checked the environment just before launching mpirun, and LD_LIBRARY_PATH 
includes the necessary component to point to where the Intel shared libraries 
are located.  Furthermore, my mpirun command line says to export the 
LD_LIBRARY_PATH variable:
Executing ['/release/cfd/openmpi-intel/bin/mpirun', '--machinefile 
/var/spool/PBS/aux/20761.maruhpc4-mgt', '-np 160', '-x LD_LIBRARY_PATH', '-x 
MPI_ENVIRONMENT=1', '/tmp/fv420761.maruhpc4-mgt/falconv4_openmpi_jsgl', '-v', 
'-cycles', '1', '-ri', 'restart.1', '-ro', 
'/tmp/fv420761.maruhpc4-mgt/restart.1']

My shell-initialization script (.bashrc) does not overwrite LD_LIBRARY_PATH.  
OpenMPI is built explicitly --without-torque and should be using ssh to launch 
the orted.

What options can I add to get more debugging of problems launching orted?

Thanks,

Ed
___
users mailing list
us...@open-mpi.org<mailto:us...@open-mpi.org>
http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] EXTERNAL: Re: Problems with shared libraries while launching jobs

2012-12-18 Thread Blosch, Edwin L
libimf.so is present on all nodes, by design. However, sometimes the 
simulation runs and other times it does not. I have a suspicion that the filesystem 
(GPFS) where the Intel library is located, may become temporarily unavailable 
in the failure cases.  I do not suspect any problem with OpenMPI, but I am 
hopeful that it can produce diagnostics that indicate the root cause of the 
problem.

I have followed Ralph's advice to build with --enable-debug and am now waiting 
for the problem to happen again so I can see the ssh command used to launch the 
orted.

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Reuti
Sent: Tuesday, December 18, 2012 4:14 AM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Problems with shared libraries while 
launching jobs

Am 17.12.2012 um 16:42 schrieb Blosch, Edwin L:

> Ralph,
>  
> Unfortunately I didn't see the ssh output.  The output I got was pretty much 
> as before.
>  
> You know, the fact that the error message is not prefixed with a host name 
> makes me think it could be happening on the host where the job is placed by 
> PBS. If there is something wrong in the user environment prior to mpirun, 
> that is not an OpenMPI problem. And yet, in one of the jobs that failed, I 
> have also printed out the results of 'ldd' on the mpirun executable just prior 
> to executing the command, and all the shared libraries were resolved:

You checked the mpirun, but not the orted which misses a "libimf.so" from 
Intel. The Intel libimf.so from the redistributable archive is present on all 
nodes?

-- Reuti


>  
> ldd /release/cfd/openmpi-intel/bin/mpirun
> linux-vdso.so.1 =>  (0x7fffbbb39000)
> libopen-rte.so.0 => /release/cfd/openmpi-intel/lib/libopen-rte.so.0 
> (0x2abdf75d2000)
> libopen-pal.so.0 => /release/cfd/openmpi-intel/lib/libopen-pal.so.0 
> (0x2abdf7887000)
> libdl.so.2 => /lib64/libdl.so.2 (0x2abdf7b39000)
> libnsl.so.1 => /lib64/libnsl.so.1 (0x2abdf7d3d000)
> libutil.so.1 => /lib64/libutil.so.1 (0x2abdf7f56000)
> libm.so.6 => /lib64/libm.so.6 (0x2abdf8159000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2abdf83af000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x2abdf85c7000)
> libc.so.6 => /lib64/libc.so.6 (0x2abdf87e4000)
> libimf.so => /appserv/intel/Compiler/11.1/072/lib/intel64/libimf.so 
> (0x2abdf8b42000)
> libsvml.so => /appserv/intel/Compiler/11.1/072/lib/intel64/libsvml.so 
> (0x2abdf8ed7000)
> libintlc.so.5 => 
> /appserv/intel/Compiler/11.1/072/lib/intel64/libintlc.so.5 
> (0x2abdf90ed000)
> /lib64/ld-linux-x86-64.so.2 (0x2abdf73b1000)
>  
> Hence my initial assumption that the shared-library problem was happening 
> with one of the child processes on a remote node.
>  
> So at this point I have more questions than answers.  I still don't know if 
> this message comes from the main mpirun process or one of the child 
> processes, although it seems that it should not be the main process because 
> of the output of ldd above.
>  
> Any more suggestions are welcomed of course.
>  
> Thanks
>  
>  
> /release/cfd/openmpi-intel/bin/mpirun --machinefile 
> /var/spool/PBS/aux/20804.maruhpc4-mgt -np 160 -x LD_LIBRARY_PATH -x 
> MPI_ENVIRONMENT=1 --mca plm_base_verbose 5 --leave-session-attached 
> /tmp/fv420804.maruhpc4-mgt/test_jsgl -v -cycles 1 -ri restart.5000 
> -ro /tmp/fv420804.maruhpc4-mgt/restart.5000
>  
> [c6n38:16219] mca:base:select:(  plm) Querying component [rsh]
> [c6n38:16219] mca:base:select:(  plm) Query of component [rsh] set priority to 10
> [c6n38:16219] mca:base:select:(  plm) Selected component [rsh]
> Warning: Permanently added 'c6n39' (RSA) to the list of known hosts.^M
> Warning: Permanently added 'c6n40' (RSA) to the list of known hosts.^M
> Warning: Permanently added 'c6n41' (RSA) to the list of known hosts.^M
> Warning: Permanently added 'c6n42' (RSA) to the list of known hosts.^M
> Warning: Permanently added 'c5n26' (RSA) to the list of known hosts.^M
> Warning: Permanently added 'c3n20' (RSA) to the list of known hosts.^M
> Warning: Permanently added 'c4n10' (RSA) to the list of known hosts.^M
> Warning: Permanently added 'c4n40' (RSA) to the list of known hosts.^M
> /release/cfd/openmpi-intel/bin/orted: error while loading shared 
> libraries: libimf.so: cannot open shared object file: No such file or 
> directory
> --
>  A daemon (pid 16227) died unexpectedly wi

[OMPI users] basic questions about compiling OpenMPI

2013-05-22 Thread Blosch, Edwin L
Apologies for not exploring the FAQ first.



If I want to use Intel or PGI compilers but link against the OpenMPI that ships 
with RedHat Enterprise Linux 6 (compiled with g++ I presume), are there any 
issues to watch out for, during linking?



Thanks,



Ed



Re: [OMPI users] EXTERNAL: Re: basic questions about compiling OpenMPI

2013-05-23 Thread Blosch, Edwin L
Excellent.  Now I've read the FAQ and noticed that it doesn't mention the issue 
with the Fortran 90 .mod signatures.  Our applications are Fortran.  So your 
replies are very helpful -- now I know it really isn't practical for us to use 
the default OpenMPI shipped with RHEL6 since we use both Intel and PGI 
compilers and have several applications to accommodate.  Presumably if all the 
applications did INCLUDE 'mpif.h'  instead of 'USE MPI' then we could get 
things working, but it's not a great workaround.
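For our own builds, then, I expect we will end up doing something along these
lines for each compiler (the install prefixes are placeholders, and the PGI
driver names are from memory, not yet tested):

    ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=/release/cfd/openmpi-intel
    ./configure CC=pgcc CXX=pgCC F77=pgf77 FC=pgf90 --prefix=/release/cfd/openmpi-pgi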

Thank you very much


From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Tim 
Prince [n...@aol.com]
Sent: Wednesday, May 22, 2013 10:24 AM
To: us...@open-mpi.org
Subject: EXTERNAL: Re: [OMPI users] basic questions about compiling OpenMPI

On 5/22/2013 11:34 AM, Paul Kapinos wrote:
> On 05/22/13 17:08, Blosch, Edwin L wrote:
>> Apologies for not exploring the FAQ first.
>
> No comments =)
>
>
>
>> If I want to use Intel or PGI compilers but link against the OpenMPI
>> that ships with RedHat Enterprise Linux 6 (compiled with g++ I
>> presume), are there any issues to watch out for, during linking?
>
> At least, the Fortran-90 bindings ("use mpi") won't work at all
> (they're compiler-dependent).
>
> So, our way is to compile a version of Open MPI with each compiler. I
> think this is recommended.
>
> Note also that the version of Open MPI shipped with Linux is usually a
> bit dusty.
>
>
The gfortran build of Fortran library, as well as the .mod USE files,
won't work with ifort or PGI compilers.  g++ built libraries ought to
work with sufficiently recent versions of icpc.
As noted above, it's worthwhile to rebuild yourself, even if you use a
(preferably more up to date version of) gcc, which you can use along
with one of the commercial Fortran compilers for linux.

--
Tim Prince

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



[OMPI users] Question on building OpenMPI with support for memory affinity

2013-05-29 Thread Blosch, Edwin L
The FAQ talks about building support for memory affinity by adding 
-with-libnuma=

However, I did not do that, and yet when I check ompi_info, it looks like there 
is support from the hwloc module.

Can I assume the FAQ is a little stale and that -with-libnuma is not really 
necessary anymore?

[bloscel@mgmt1 bin]$ ./ompi_info | grep affi
  MPI extensions: affinity example
   MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.4)
   MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.6.4)
   MCA maffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.4)


[OMPI users] Problem building OpenMPI 1.6.4 with PGI 13.4

2013-05-29 Thread Blosch, Edwin L
I'm having trouble building OpenMPI 1.6.4 with PGI 13.4. Suggestions?

checking alignment of double... 8
checking alignment of long double... 8
checking alignment of float _Complex... 4
checking alignment of double _Complex... 8
checking alignment of long double _Complex... 8
checking alignment of void *... 8
checking for C bool type... no
checking size of _Bool... 1
checking for inline... inline
checking for C/C++ restrict keyword... __restrict
checking for weak symbol support... yes
checking for functional offsetof macro... no
configure: WARNING: Your compiler does not support offsetof macro
configure: error: Configure: Cannot continue
+ '[' 1 = 0 ']'




Re: [OMPI users] EXTERNAL: Re: Problem building OpenMPI 1.6.4 with PGI 13.4

2013-05-29 Thread Blosch, Edwin L
The PGI user forum has a recent post regarding PGI 13.2 and OpenMPI 1.6.4.

The user had effectively a bad install of the compiler.  Some file "stddef.h" 
provided within the PGI installation was missing, and when it was individually 
supplied, compilation still failed.  Basically he thought other files might 
also be missing.  I did not see a final resolution.  It does not appear to be 
an OpenMPI issue, though.  Seems to be a PGI issue with the install.

Thanks

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Nathan Hjelm
Sent: Wednesday, May 29, 2013 4:59 PM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Problem building OpenMPI 1.6.4 with PGI 13.4

It works with PGI 12.x and it better work with newer versions since offsetof is 
ISOC89/ANSIC.

-Nathan

On Wed, May 29, 2013 at 09:31:58PM +, Jeff Squyres (jsquyres) wrote:
> Edwin --
> 
> Can you ask PGI support about this?  I swear that the PGI compiler suite has 
> supported offsetof before.
> 
> 
> On May 29, 2013, at 5:26 PM, "Blosch, Edwin L"  
> wrote:
> 
> > I'm having trouble building OpenMPI 1.6.4 with PGI 13.4. Suggestions?
> >  
> > checking alignment of double... 8
> > checking alignment of long double... 8
> > checking alignment of float _Complex... 4
> > checking alignment of double _Complex... 8
> > checking alignment of long double _Complex... 8
> > checking alignment of void *... 8
> > checking for C bool type... no
> > checking size of _Bool... 1
> > checking for inline... inline
> > checking for C/C++ restrict keyword... __restrict
> > checking for weak symbol support... yes
> > checking for functional offsetof macro... no
> > configure: WARNING: Your compiler does not support offsetof macro
> > configure: error: Configure: Cannot continue
> > + '[' 1 = 0 ']'
> >  
> >  
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Problem building OpenMPI 1.6.4 with PGI 13.4

2013-05-29 Thread Blosch, Edwin L
You steered me right.  The PGI support representative said, regarding 13.4:

This is a known issue where there's a compatibility issue with the "stddef.h" 
header file we ship and GCC 4.6/4.7. We were able to fix the problem in the 
13.5 compilers (TPR#19320) 

If you can't download and install 13.5, please send a note to PGI Customer 
Service (t...@pgroup.com) and they should be able to get you the updated file. 




From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Jeff 
Squyres (jsquyres) [jsquy...@cisco.com]
Sent: Wednesday, May 29, 2013 3:31 PM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Problem building OpenMPI 1.6.4 with PGI 13.4

Edwin --

Can you ask PGI support about this?  I swear that the PGI compiler suite has 
supported offsetof before.


On May 29, 2013, at 5:26 PM, "Blosch, Edwin L"  wrote:

> I’m having trouble building OpenMPI 1.6.4 with PGI 13.4. Suggestions?
>
> checking alignment of double... 8
> checking alignment of long double... 8
> checking alignment of float _Complex... 4
> checking alignment of double _Complex... 8
> checking alignment of long double _Complex... 8
> checking alignment of void *... 8
> checking for C bool type... no
> checking size of _Bool... 1
> checking for inline... inline
> checking for C/C++ restrict keyword... __restrict
> checking for weak symbol support... yes
> checking for functional offsetof macro... no
> configure: WARNING: Your compiler does not support offsetof macro
> configure: error: Configure: Cannot continue
> + '[' 1 = 0 ']'
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



[OMPI users] "Failed to find the following executable" problem under Torque

2009-09-25 Thread Blosch, Edwin L
I'm having a problem running OpenMPI under Torque.  It complains as if there 
were a command syntax problem, but the three variations below are all correct, 
as best I can tell from mpirun -help.  The environment in which the command executes, 
i.e. PATH and LD_LIBRARY_PATH, is correct.  Torque is 2.3.x.  OpenMPI is 1.2.8. 
 OFED is 1.4.

Somewhere in the FAQ I had read that you must not give -machinefile under 
Torque with OpenMPI 1.2.8 and you did not need to give -np.  That's why I tried 
variation 3 below without either of these options, but it still fails.

Thanks for any help



/usr/mpi/intel/openmpi-1.2.8/bin/mpirun -np 28 
/tmp/43.fwnaeglingio/falconv4_ibm_openmpi -cycles 100 -ri restart.0 -ro 
/tmp/43.fwnaeglingio/restart.0
--
Failed to find the following executable:

Host:   n8n26
Executable: -p

Cannot continue.


mpirun --prefix /usr/mpi/intel/openmpi-1.2.8 --machinefile 
/var/spool/torque/aux/45.fwnaeglingio -np 28 --mca btl ^tcp  --mca 
mpi_leave_pinned 1 --mca mpool_base_use_mem_hooks 1 -x LD_LIBRARY_PATH -x 
MPI_ENVIRONMENT /tmp/45.fwnaeglingio/falconv4_ibm_openmpi -cycles 100 -ri 
restart.0 -ro /tmp/45.fwnaeglingio/restart.0
--
Failed to find or execute the following executable:

Host:   n8n27
Executable: --prefix /usr/mpi/intel/openmpi-1.2.8

Cannot continue.


/usr/mpi/intel/openmpi-1.2.8/bin/mpirun -x LD_LIBRARY_PATH -x MPI_ENVIRONMENT=1 
/tmp/47.fwnaeglingio/falconv4_ibm_openmpi -cycles 100 -ri restart.0 -ro 
/tmp/47.fwnaeglingio/restart.0
--
Failed to find the following executable:

Host:   n8n27
Executable: -

Cannot continue.
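For what it's worth, the simplest thing I can think to try next, inside the
same Torque job, is a bare launch with no application arguments at all, just
as a sanity check on the option parsing:

/usr/mpi/intel/openmpi-1.2.8/bin/mpirun -np 2 hostname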


