[OMPI users] openmpi-v3.0.0: error for --map-by

2017-12-19 Thread Siegmar Gross

Hi,

I've installed openmpi-v3.0.0 on my "SUSE Linux Enterprise Server 12.3 (x86_64)" 
with gcc-6.4.0. Today I discovered that I get an error for --map-by that I don't
get with older versions.


loki fd1026 115 which mpiexec
/usr/local/openmpi-2.0.3_64_gcc/bin/mpiexec
loki fd1026 116 mpiexec --host pc02:2,pc03:2 --map-by ppr:1:socket:pe=1 date
Wed Dec 20 07:41:00 CET 2017
...

loki fd1026 107 which mpiexec
/usr/local/openmpi-2.1.2_64_gcc/bin/mpiexec
loki fd1026 108 mpiexec --host pc02:2,pc03:2 --map-by ppr:1:socket:pe=1 date
Wed Dec 20 07:41:27 CET 2017
...

loki fd1026 107 which mpiexec
/usr/local/openmpi-3.0.0_64_gcc/bin/mpiexec
loki fd1026 108 mpiexec --host pc02:2,pc03:2 --map-by ppr:1:socket:pe=1 date
[loki:32662] SETTING BINDING TO CORE
[pc02:04420] SETTING BINDING TO CORE
[pc03:04788] SETTING BINDING TO CORE
--
The request to bind processes could not be completed due to
an internal error - the locale of the following process was
not set by the mapper code:

  Process:  [[57386,1],3]

Please contact the OMPI developers for assistance. Meantime,
you will still be able to run your application without binding
by specifying "--bind-to none" on your command line.
--
--
ORTE has lost communication with a remote daemon.

  HNP daemon   : [[57386,0],0] on node loki
  Remote daemon: [[57386,0],2] on node pc03

This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.
--
[loki:32662] 1 more process has sent help message help-orte-rmaps-base.txt / rmaps:no-locale
[loki:32662] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

loki fd1026 109
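The error output above names two things to try: "--bind-to none" as a workaround, and the MCA parameter "orte_base_help_aggregate" set to 0 to see every help message. Below is a minimal sketch, assuming a POSIX shell, of how the failing 3.0.0 command could be rerun each way. The MPIEXEC variable and the function names rerun_verbose and run_unbound are hypothetical helpers, not part of Open MPI. The thread does not show whether "--bind-to none" can be combined with pe=1, and the "SETTING BINDING TO CORE" lines suggest pe=1 implies core binding, so this sketch drops pe=1 in the unbound variant. This is an untested assumption.

```shell
#!/bin/sh
# Hypothetical helpers for rerunning the failing command from this report.
# MPIEXEC defaults to mpiexec; set MPIEXEC=echo to print the command instead.
MPIEXEC=${MPIEXEC:-mpiexec}

# Rerun with help-message aggregation disabled, as the log suggests,
# so the no-locale message from every affected rank is printed.
rerun_verbose() {
  "$MPIEXEC" --mca orte_base_help_aggregate 0 \
    --host pc02:2,pc03:2 --map-by ppr:1:socket:pe=1 date
}

# Workaround named in the error text: run without binding.
# Assumption: pe=1 requests core binding, so it is dropped here.
run_unbound() {
  "$MPIEXEC" --host pc02:2,pc03:2 --map-by ppr:1:socket --bind-to none date
}
```

After sourcing the file, calling rerun_verbose or run_unbound runs the corresponding mpiexec command. Running with MPIEXEC=echo first shows exactly what would be launched.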



I would be grateful if somebody could fix the problem. Do you need anything
else? Thank you very much in advance for any help.


Kind regards

Siegmar
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] openmpi-v3.0.0: error for --map-by

2017-12-20 Thread r...@open-mpi.org
I just checked the head of both the master and 3.0.x branches, and they both work fine:

$ mpirun  --map-by ppr:1:socket:pe=1 date
[rhc001:139231] SETTING BINDING TO CORE
[rhc002.cluster:203672] SETTING BINDING TO CORE
Wed Dec 20 00:20:55 PST 2017
Wed Dec 20 00:20:55 PST 2017
Tue Dec 19 18:37:03 PST 2017
Tue Dec 19 18:37:03 PST 2017
$

I’ll remove the debug output, but it looks like this was already fixed.


Re: [OMPI users] openmpi-v3.0.0: error for --map-by

2017-12-20 Thread Siegmar Gross

Hi Ralph,

The problem occurs when --host names a different (remote) machine. Without
--host, or with --host naming the local machine (pc03:2 below), everything works well.

pc03 fd1026 111 which mpiexec
/usr/local/openmpi-3.0.0_64_gcc/bin/mpiexec

pc03 fd1026 112 mpiexec -np 2 --map-by ppr:1:socket:pe=1 date
[pc03:09373] SETTING BINDING TO CORE
Wed Dec 20 10:44:21 CET 2017
Wed Dec 20 10:44:21 CET 2017

pc03 fd1026 113 mpiexec -np 2 --host pc03:2 --map-by ppr:1:socket:pe=1 date
[pc03:09385] SETTING BINDING TO CORE
Wed Dec 20 10:44:43 CET 2017
Wed Dec 20 10:44:43 CET 2017

pc03 fd1026 114 mpiexec -np 2 --host pc02:2 --map-by ppr:1:socket:pe=1 date
[pc03:09395] SETTING BINDING TO CORE
[pc02:08340] SETTING BINDING TO CORE
--
The request to bind processes could not be completed due to
an internal error - the locale of the following process was
not set by the mapper code:
...
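The three runs above differ only in their --host option. They could be repeated in one pass with a small helper like the one below. This is a sketch assuming a POSIX shell, using the host names from this thread. The MPIEXEC variable and the try_host_variant function are hypothetical, added only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run the ppr/pe mapping once for a given --host value
# and report whether mpiexec succeeded.
# MPIEXEC defaults to mpiexec; set MPIEXEC=echo for a dry run.
MPIEXEC=${MPIEXEC:-mpiexec}

try_host_variant() {
  # $1: host list for --host, or an empty string to omit --host entirely.
  hosts=$1
  if [ -n "$hosts" ]; then
    "$MPIEXEC" -np 2 --host "$hosts" --map-by ppr:1:socket:pe=1 date
  else
    "$MPIEXEC" -np 2 --map-by ppr:1:socket:pe=1 date
  fi
  status=$?   # exit status of whichever mpiexec branch ran
  if [ "$status" -eq 0 ]; then
    echo "variant '${hosts:-<none>}': OK"
  else
    echo "variant '${hosts:-<none>}': FAILED (exit $status)"
  fi
  return "$status"
}
```

For example, `for h in "" pc03:2 pc02:2; do try_host_variant "$h"; done` run on pc03 would repeat the no-host, local-host and remote-host cases in order.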


Kind regards

Siegmar


Re: [OMPI users] openmpi-v3.0.0: error for --map-by

2017-12-20 Thread r...@open-mpi.org
Nope - works that way too (running from rhc001):

$ mpirun -H rhc002:24 --map-by ppr:1:socket:pe=1 date
Wed Dec 20 02:08:18 PST 2017
Wed Dec 20 02:08:18 PST 2017
$

