[OMPI users] OpenMPI job launch failures

2013-02-14 Thread Bharath Ramesh
On our cluster we are noticing intermittent job launch failures when
using OpenMPI. We are currently using OpenMPI-1.6.1 on our cluster, and
it is integrated with Torque-4.1.3. It fails even for a simple MPI
hello world application. The issue is that orted gets launched on all
the nodes, but there are a bunch of nodes that don't launch the actual
MPI application. There are no errors reported when the job gets killed
because the walltime expires. Enabling --debug-daemons doesn't show any
errors either. The only difference is that successful runs have an
MPI_proctable listed, while for failures it is absent. Any help in
debugging this issue is greatly appreciated.


--
Bharath






Re: [OMPI users] OpenMPI job launch failures

2013-02-14 Thread Ralph Castain
Sounds like the orteds aren't reporting back to mpirun after launch. The 
MPI_proctable observation just means that the procs didn't launch in those 
cases where it is absent, which is something you already observed.

Set "-mca plm_base_verbose 5" on your cmd line. You should see each orted 
report back to mpirun after it launches. If not, then it is likely that 
something is blocking it.

You could also try updating to 1.6.3/4 in case there is some race condition in 
1.6.1, though we haven't heard of one to date.
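
(For illustration, the extra verbosity would be added to an otherwise normal
launch, e.g.

  mpirun -mca plm_base_verbose 5 -np 16 ./hello_world

where the executable name and process count here are just placeholders.)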






Re: [OMPI users] OpenMPI job launch failures

2013-02-14 Thread Bharath Ramesh
Is there any way to prevent the output of more than one node from
being written to the same line? I tried setting --output-filename,
which didn't help. For some reason only stdout was written to the
files, which makes it a little hard to read an output file that is
close to 6 MB.

-- 
Bharath





[OMPI users] Very high latency with openib btl

2013-02-14 Thread Maxime Boissonneault

Hi,
I have a strange case here. The application is "plink" 
(http://pngu.mgh.harvard.edu/~purcell/plink/download.shtml). The 
computation/communication pattern of the application is the following:


1- MPI_Init
2- Some single rank computation
3- MPI_Bcast
4- Some single rank computation
5- MPI_Barrier
6- rank 0 sends data to each other rank with MPI_Ssend, one rank at a time.
6- other ranks use MPI_Recv
7- Some single rank computation
8- other ranks send result to rank 0 with MPI_Ssend
8- rank 0 receives data with MPI_Recv
9- rank 0 analyses result
10- MPI_Finalize

The amount of data being sent is on the order of kilobytes, and we 
have InfiniBand.
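
(For reference, here is a minimal sketch of the step 5/6 exchange described
above. It is not the plink code itself; the payload size, tag, and timing
printout are illustrative only.)

#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define PAYLOAD 4096   /* "a few kilobytes", arbitrary */

int main(int argc, char *argv[])
{
    char buf[PAYLOAD];
    int rank, size, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    memset(buf, 0, PAYLOAD);

    MPI_Barrier(MPI_COMM_WORLD);                  /* step 5 */

    if (rank == 0) {
        for (i = 1; i < size; i++) {              /* step 6: one rank at a time */
            double t0 = MPI_Wtime();
            MPI_Ssend(buf, PAYLOAD, MPI_BYTE, i, 0, MPI_COMM_WORLD);
            printf("Ssend to rank %d took %.3f s\n", i, MPI_Wtime() - t0);
        }
    } else {                                      /* step 6 on the other ranks */
        MPI_Recv(buf, PAYLOAD, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}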


The problem we observe is in step 6. I've printed timestamps before and 
after each MPI operation. With the openib btl, the behavior I observe is 
that:

- rank 0 starts sending
- rank n receives almost instantly, and MPI_Recv returns.
- rank 0's MPI_Ssend often only returns _minutes_ later.

It looks like the acknowledgement from rank n takes minutes to reach 
rank 0.


Now, the tricky part is that if I disable the openib btl and use tcp 
over IB instead, there is no such latency and the acknowledgement comes 
back within a fraction of a second. Also, if rank 0 and rank n are on 
the same node, the acknowledgement is quasi-instantaneous (I guess it 
goes through the SM btl instead of openib).
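
(For reference, one common way to run that test is to exclude the openib btl
so that the run falls back to the tcp btl, presumably over IPoIB, along these
lines; the rest of the command line is whatever is normally used:

  mpirun --mca btl ^openib <usual arguments>
)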


I tried to reproduce this in a simple case, but I observed no such 
latency. The duration I got for the whole communication was on the 
order of milliseconds.


Does anyone have an idea of what could cause such very high latencies 
when using the OpenIB BTL?


Also, I tried replacing step 6 with an explicit confirmation:
- rank 0 does MPI_Isend to rank n followed by MPI_Recv from rank n
- rank n does MPI_Recv from rank 0 followed by MPI_Isend to rank 0

In this case also, rank n's MPI_Isend executes quasi-instantaneously, 
and rank 0's MPI_Recv only returns a few minutes later.


Thanks,

Maxime Boissonneault


[OMPI users] process binding to NUMA node on Opteron 6xxx series CPUs?

2013-02-14 Thread Oliver Weihe

Hi,

is it possible to bind MPI processes to a NUMA node somehow on Opteron 
6xxx series CPUs (e.g. --bind-to-NUMAnode) *without* using a rankfile?
Opteron 6xxx CPUs have two NUMA nodes per CPU (socket), so 
--bind-to-socket doesn't do what I want.


This is a 4-socket Opteron 6344 system (12 cores per CPU socket):

root@node01:~> numactl --hardware | grep cpus
node 0 cpus: 0 1 2 3 4 5
node 1 cpus: 6 7 8 9 10 11
node 2 cpus: 12 13 14 15 16 17
node 3 cpus: 18 19 20 21 22 23
node 4 cpus: 24 25 26 27 28 29
node 5 cpus: 30 31 32 33 34 35
node 6 cpus: 36 37 38 39 40 41
node 7 cpus: 42 43 44 45 46 47

root@node01:~> /opt/openmpi/1.6.3/gcc/bin/mpirun --report-bindings -np 8 --bind-to-socket --bysocket sleep 1s
[node01.cluster:21446] MCW rank 1 bound to socket 1[core 0-11]: [. . . . . . . . . . . .][B B B B B B B B B B B B][. . . . . . . . . . . .][. . . . . . . . . . . .]
[node01.cluster:21446] MCW rank 2 bound to socket 2[core 0-11]: [. . . . . . . . . . . .][. . . . . . . . . . . .][B B B B B B B B B B B B][. . . . . . . . . . . .]
[node01.cluster:21446] MCW rank 3 bound to socket 3[core 0-11]: [. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .][B B B B B B B B B B B B]
[node01.cluster:21446] MCW rank 4 bound to socket 0[core 0-11]: [B B B B B B B B B B B B][. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .]
[node01.cluster:21446] MCW rank 5 bound to socket 1[core 0-11]: [. . . . . . . . . . . .][B B B B B B B B B B B B][. . . . . . . . . . . .][. . . . . . . . . . . .]
[node01.cluster:21446] MCW rank 6 bound to socket 2[core 0-11]: [. . . . . . . . . . . .][. . . . . . . . . . . .][B B B B B B B B B B B B][. . . . . . . . . . . .]
[node01.cluster:21446] MCW rank 7 bound to socket 3[core 0-11]: [. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .][B B B B B B B B B B B B]
[node01.cluster:21446] MCW rank 0 bound to socket 0[core 0-11]: [B B B B B B B B B B B B][. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .]


So each process is bound to *two* NUMA nodes, but I want to bind to 
*one* NUMA node.


What I want is more like this:
root@node01:~> cat rankfile
rank 0=localhost slot=0-5
rank 1=localhost slot=6-11
rank 2=localhost slot=12-17
rank 3=localhost slot=18-23
rank 4=localhost slot=24-29
rank 5=localhost slot=30-35
rank 6=localhost slot=36-41
rank 7=localhost slot=42-47
root@node01:~> /opt/openmpi/1.6.3/gcc/bin/mpirun --report-bindings -np 8 --rankfile rankfile sleep 1s
[node01.cluster:21505] MCW rank 1 bound to socket 0[core 6-11]: [. . . . . . B B B B B B][. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .] (slot list 6-11)
[node01.cluster:21505] MCW rank 2 bound to socket 1[core 0-5]: [. . . . . . . . . . . .][B B B B B B . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .] (slot list 12-17)
[node01.cluster:21505] MCW rank 3 bound to socket 1[core 6-11]: [. . . . . . . . . . . .][. . . . . . B B B B B B][. . . . . . . . . . . .][. . . . . . . . . . . .] (slot list 18-23)
[node01.cluster:21505] MCW rank 4 bound to socket 2[core 0-5]: [. . . . . . . . . . . .][. . . . . . . . . . . .][B B B B B B . . . . . .][. . . . . . . . . . . .] (slot list 24-29)
[node01.cluster:21505] MCW rank 5 bound to socket 2[core 6-11]: [. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . B B B B B B][. . . . . . . . . . . .] (slot list 30-35)
[node01.cluster:21505] MCW rank 6 bound to socket 3[core 0-5]: [. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .][B B B B B B . . . . . .] (slot list 36-41)
[node01.cluster:21505] MCW rank 7 bound to socket 3[core 6-11]: [. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . B B B B B B] (slot list 42-47)
[node01.cluster:21505] MCW rank 0 bound to socket 0[core 0-5]: [B B B B B B . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .] (slot list 0-5)



Actually I'm dreaming of
mpirun --bind-to-NUMAnode --bycore ...
or
mpirun --bind-to-NUMAnode --byNUMAnode ...

Is there any workaround except rankfiles for this?

Regards,
 Oliver Weihe


Re: [OMPI users] process binding to NUMA node on Opteron 6xxx series CPUs?

2013-02-14 Thread Ralph Castain
Sure - use the 1.7 branch or the developer's trunk. We have the --bind-to numa 
option there.
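
(For illustration, with 1.7 the earlier test should look something like

  mpirun --report-bindings --bind-to numa -np 8 sleep 1s

though the exact option spelling may vary between 1.7 snapshots.)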





Re: [OMPI users] OpenMPI job launch failures

2013-02-14 Thread Ralph Castain
I don't think this is documented anywhere, but it is an available trick (not 
sure if it is in 1.6.1, but might be): if you set OPAL_OUTPUT_STDERR_FD=N in 
your environment, we will direct all our error outputs to that file descriptor. 
If it is "0", then it goes to stdout.

Might be worth a try?
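
(For illustration, in a job script that could look something like

  export OPAL_OUTPUT_STDERR_FD=0
  mpirun -np 16 ./hello_world

with the executable name and rank count as placeholders.)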






Re: [OMPI users] OpenMPI job launch failures

2013-02-14 Thread Bharath Ramesh
After manually fixing some of the issues, I see that the failed
nodes never receive the command to launch the local processes. I am
going to ask the admins to look through the logs for any dropped
connections.


-- 
Bharath




Re: [OMPI users] OpenMPI job launch failures

2013-02-14 Thread Bharath Ramesh
When I set OPAL_OUTPUT_STDERR_FD=0 I receive a whole bunch of
"mca_oob_tcp_message_recv_complete: invalid message type" errors,
and the job just hangs even when all the nodes have fired off the
MPI application.


-- 
Bharath





Re: [OMPI users] OpenMPI job launch failures

2013-02-14 Thread Ralph Castain
Rats - sorry.

I seem to recall fixing something in 1.6 that might relate to this - a race 
condition in the startup. You might try updating to the 1.6.4 release candidate.






[OMPI users] qsub error

2013-02-14 Thread Erik Nelson
I'm encountering an error using qsub that none of us can figure out.
MPI C++ programs seem to run fine when executed from the command line,
but for some reason when I submit them through the queue I get a
strange error message:

[compute-3-12.local][[58672,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 2002:8170:6c2f:b:21d:9ff:fefd:7d94 failed: Permission denied (13)


The compute node 3-12 doesn't matter (the error can come from any of
the nodes, and I'm guessing that 3-12 is the parent node here).

To check if there was some problem with my own code, I created a simple
'hello world' program
(see attached files).

Again, the program runs fine from the command line but fails in qsub with
the same sort of error
message.

I have included (i) the code, (ii) the job script for qsub, and (iii)
the ".o" file from qsub for the "hello world" program.

These don't look like MPI errors, but rather some conflict with,
maybe, secure communication across nodes.

Is there something simple I can do to fix this?

Thanks,

Erik Nelson

Howard Hughes Medical Institute
6001 Forest Park Blvd., Room ND10.124
Dallas, Texas 75235-9050

p : 214 645 5981
f : 214 645 5948
#include <stdio.h>
#include "/opt/openmpi/include/mpi.h"

#define bufdim 128

int main(int argc, char *argv[])
{
    char buffer[bufdim];
    char id_str[32];

    //  mpi :
    MPI::Init(argc,argv);
    MPI::Status status;

    int size;
    int rank;
    int tag;

    size=MPI::COMM_WORLD.Get_size();
    rank=MPI::COMM_WORLD.Get_rank();
    tag=0;

    if (rank==0) {
        printf("%d: we have %d processors\n",rank,size);
        int i;
        i=1;
        // The rest of the file was truncated in the list archive; what
        // follows is a plausible reconstruction of the usual pattern,
        // not necessarily the original code.
        for ( ;i<size;++i) {
            sprintf(buffer,"hello rank %d",i);
            MPI::COMM_WORLD.Send(buffer,bufdim,MPI::CHAR,i,tag);
        }
        for (i=1;i<size;++i) {
            MPI::COMM_WORLD.Recv(buffer,bufdim,MPI::CHAR,i,tag,status);
            printf("%d: received \"%s\"\n",rank,buffer);
        }
    }
    else {
        MPI::COMM_WORLD.Recv(buffer,bufdim,MPI::CHAR,0,tag,status);
        sprintf(id_str,"rank %d checking in",rank);
        MPI::COMM_WORLD.Send(id_str,32,MPI::CHAR,0,tag);
    }

    MPI::Finalize();
    return 0;
}
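
(For reference, a program like this would typically be built and run along
these lines; the source file name, compiler wrapper, and rank count here are
assumptions, not taken from the thread:

  mpicxx hello.cpp -o hello
  mpirun -np 8 ./hello
)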

hello.job
Description: Binary data


hello.job.o5822590
Description: Binary data


Re: [OMPI users] OpenMPI job launch failures

2013-02-14 Thread Bharath Ramesh
I ran 15 test jobs using 1.6.4rc3, all of them successful, unlike
1.6.1, where around 40% of my jobs would fail. Thanks for the help; I
really appreciate it.

-- 
Bharath



