Re: [OMPI users] 3.1.1 Bindings Change

2018-07-04 Thread Jeff Squyres (jsquyres) via users
Greetings Matt.  

https://github.com/open-mpi/ompi/commit/4d126c16fa82c64a9a4184bc77e967a502684f02 is the specific commit where the fixes came in.

Here's a little creative grepping that shows the APIs affected (some callback function signatures were also fixed, but they're not as important):

$ git show 4d126c16fa82c64a9a4184bc77e967a502684f02 | fgrep -i '+subroutine MPI_' | sort | uniq
+subroutine MPI_Type_commit(datatype, ierror)
+subroutine MPI_Type_delete_attr(datatype, type_keyval, ierror)
+subroutine MPI_Type_delete_attr_f08(datatype,type_keyval,ierror)
+subroutine MPI_Type_dup(datatype, newtype, ierror)
+subroutine MPI_Type_dup(oldtype, newtype, ierror)
+subroutine MPI_Type_dup_f08(datatype,newtype,ierror)
+subroutine MPI_Type_extent(datatype, extent, ierror)
+subroutine MPI_Type_free(datatype, ierror)
+subroutine MPI_Type_get_attr(datatype, type_keyval, attribute_val, flag, ierror)
+subroutine MPI_Type_get_attr_f08(datatype,type_keyval,attribute_val,flag,ierror)
+subroutine MPI_Type_get_contents(datatype, max_integers, max_addresses, max_datatypes, array_of_integers, &
+subroutine MPI_Type_get_envelope(datatype, num_integers, num_addresses, num_datatypes, combiner&
+subroutine MPI_Type_get_extent(datatype, lb, extent, ierror)
+subroutine MPI_Type_get_extent_x(datatype, lb, extent, ierror)
+subroutine MPI_Type_get_name(datatype, type_name, resultlen, ierror)
+subroutine MPI_Type_get_name_f08(datatype,type_name,resultlen,ierror)
+subroutine MPI_Type_lb(datatype, lb, ierror)
+subroutine MPI_Type_match_size(typeclass, size, datatype, ierror)
+subroutine MPI_Type_match_size_f08(typeclass,size,datatype,ierror)
+subroutine MPI_Type_set_attr(datatype, type_keyval, attr_val, ierror)
+subroutine MPI_Type_set_attr_f08(datatype,type_keyval,attribute_val,ierror)
+subroutine MPI_Type_set_name(datatype, type_name, ierror)
+subroutine MPI_Type_set_name_f08(datatype,type_name,ierror)
+subroutine MPI_Type_size(datatype, size, ierror)
+subroutine MPI_Type_size_x(datatype, size, ierror)
+subroutine MPI_Type_ub(datatype, ub, ierror)
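
If your code only ever calls these routines positionally, nothing changes; the rename only matters for keyword (name-based) argument calls. Here's a minimal sketch of the kind of call that is affected (a hypothetical program, not taken from the commit):

    ! With the fix, keyword arguments must use the MPI-standard dummy
    ! names (e.g., "datatype="); a keyword call written against our old
    ! internal names would no longer compile.
    program kwarg_example
      use mpi_f08
      implicit none
      integer :: sz, ierr
      call MPI_Init(ierr)
      call MPI_Type_size(datatype=MPI_INTEGER, size=sz, ierror=ierr)
      print *, 'MPI_INTEGER is', sz, 'bytes'
      call MPI_Finalize(ierr)
    end program kwarg_example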


> On Jul 3, 2018, at 9:30 AM, Matt Thompson  wrote:
> 
> Dear Open MPI Gurus,
> 
> In the latest 3.1.1 announcement, I saw:
> 
> - Fix dummy variable names for the mpi and mpi_f08 Fortran bindings to
>   match the MPI standard.  This may break applications which use
>   name-based parameters in Fortran which used our internal names
>   rather than those documented in the MPI standard.
> 
> Is there an example of this change somewhere (in the Git issues or another 
> place)? I don't think we have anything in our software that would be hit by 
> this (since we test/run our code with Intel MPI and MPT as well as Open MPI), 
> but I want to be sure we don't have some hidden #ifdef OPENMPI somewhere.
> 
> Matt
> 
> -- 
> Matt Thompson
> “The fact is, this is about us identifying what we do best and finding more ways of doing less of it better” -- Director of Better Anna Rampton


-- 
Jeff Squyres
jsquy...@cisco.com


[OMPI users] Verbose output for MPI

2018-07-04 Thread Maksym Planeta

Hello,

I'm having trouble figuring out how to configure verbose output properly. There is a call to pmix_output_verbose in opal/mca/pmix/pmix3x/pmix/src/mca/ptl/tcp/ptl_tcp.c, in the function try_connect:


pmix_output_verbose(2, pmix_ptl_base_framework.framework_output,
"pmix:tcp try connect to %s",
mca_ptl_tcp_component.super.uri);

I'm confident that the control flow goes through this function call, 
because I see a log message from line 692:


PMIX ERROR: ERROR STRING NOT FOUND in file ptl_tcp.c at line 692

But my attempts to set the relevant MCA parameters have failed.

Could you help me with the exact parameters needed to make this pmix_output_verbose call active?


--
Regards,
Maksym Planeta


Re: [OMPI users] Verbose output for MPI

2018-07-04 Thread Nathan Hjelm via users
--mca pmix_base_verbose 100
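
For example (with ./a.out standing in for your application):

    mpirun --mca pmix_base_verbose 100 -n 2 ./a.out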

> On Jul 4, 2018, at 9:15 AM, Maksym Planeta  
> wrote:
> 
> Hello,
> 
> I'm having trouble figuring out how to configure verbose output properly. There is a call to pmix_output_verbose in opal/mca/pmix/pmix3x/pmix/src/mca/ptl/tcp/ptl_tcp.c, in the function try_connect:
> 
>pmix_output_verbose(2, pmix_ptl_base_framework.framework_output,
>"pmix:tcp try connect to %s",
>mca_ptl_tcp_component.super.uri);
> 
> I'm confident that the control flow goes through this function call, because 
> I see a log message from line 692:
> 
> PMIX ERROR: ERROR STRING NOT FOUND in file ptl_tcp.c at line 692
> 
> But my attempts to set the relevant MCA parameters have failed.
> 
> Could you help me with the exact parameters needed to make this pmix_output_verbose call active?
> 
> -- 
> Regards,
> Maksym Planeta


Re: [OMPI users] Verbose output for MPI

2018-07-04 Thread Maksym Planeta

Thanks for the quick response,

I tried this out and I do get more output: 
https://pastebin.com/JkXAYdM4. But the line I need does not appear in 
the output.


On 04/07/18 17:38, Nathan Hjelm via users wrote:
> --mca pmix_base_verbose 100


--
Regards,
Maksym Planeta


Re: [OMPI users] Verbose output for MPI

2018-07-04 Thread r...@open-mpi.org
Try adding PMIX_MCA_ptl_base_verbose=10 to your environment
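
The ptl framework you are looking at lives inside the embedded PMIx library, which reads its own PMIX_MCA_* environment variables rather than Open MPI's --mca command line options, which is why pmix_base_verbose alone doesn't reach it. For example (with ./a.out standing in for your application):

    PMIX_MCA_ptl_base_verbose=10 mpirun -n 2 ./a.out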

> On Jul 4, 2018, at 8:51 AM, Maksym Planeta  
> wrote:
> 
> Thanks for the quick response,
> 
> I tried this out and I do get more output: https://pastebin.com/JkXAYdM4. But 
> the line I need does not appear in the output.
> 
> On 04/07/18 17:38, Nathan Hjelm via users wrote:
>> --mca pmix_base_verbose 100
> -- 
> Regards,
> Maksym Planeta


Re: [OMPI users] Verbose output for MPI

2018-07-04 Thread Maksym Planeta

That worked out. Thank you!

On 04/07/18 19:26, r...@open-mpi.org wrote:
> Try adding PMIX_MCA_ptl_base_verbose=10 to your environment
>
>> On Jul 4, 2018, at 8:51 AM, Maksym Planeta  wrote:
>>
>> Thanks for the quick response,
>>
>> I tried this out and I do get more output: https://pastebin.com/JkXAYdM4. But the line I need does not appear in the output.
>>
>> On 04/07/18 17:38, Nathan Hjelm via users wrote:
>>> --mca pmix_base_verbose 100



--
Regards,
Maksym Planeta


Re: [OMPI users] Disable network interface selection

2018-07-04 Thread carlos aguni
Hi Gilles.

Thank you for your reply! :)
I'm now using a compiled version of Open MPI 3.0.2, and everything seems to work fine now.
Running `mpirun -n 3 -host c01,c02,c03 hostname` I get:
c01
c02
c03

`mpirun -n 2 -host c01,c02 hostname`:
c02
c01

`mpirun -n 2 -host c01,c03 hostname`:
c01
c03

Which is expected.

Now, when I run an MPI_Spawn, it prints a warning message which suggests it is picking the wrong IP.
Here is the command; I'll highlight some of the verbose output below.
`mpirun -n 1 --machinefile con_c03_hostfile --mca oob_base_verbose 10 con_c03`:
Hello world from processor c01, rank 0 out of 2 processors
Im the spawned rank 0
Hello world from processor c03, rank 1 out of 2 processors
[[35996,2],0][btl_tcp_endpoint.c:755:mca_btl_tcp_endpoint_start_connect] from c03 to: c01 Unable to connect to the peer 10.0.0.1 on port 1024: Network is unreachable

[c03:06355] pml_ob1_sendreq.c:235 FATAL

Verbose below:
[c01:05462] [[36010,0],0] oob:tcp:init adding 10.0.0.1 to our list of V4 connections
[c01:05462] [[36010,0],0] oob:tcp:init adding 172.16.0.1 to our list of V4 connections
[c01:05462] [[36010,0],0] oob:tcp:init adding 172.21.1.136 to our list of V4 connections
[c03:06225] [[36010,0],1] oob:tcp:init adding 192.168.0.1 to our list of V4 connections
[c03:06225] [[36010,0],1] oob:tcp:init adding 172.16.0.2 to our list of V4 connections

Is there a way to suppress it?

My env is as described below:
*c01*
ens8 10.0.0.1/24
ens9 172.16.0.1/24
eth0 172.21.1.136/24

*c02*
eth0 10.0.0.2/24

*c03*
ens8 192.168.0.1/24
eth1 172.16.0.2/24

*c04*
eth0 192.168.0.2/24
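
(For what it's worth: c01 and c03 only share the 172.16.0.0/24 network, so my guess is that pinning the TCP BTL to that subnet for this particular run would avoid the unreachable 10.0.0.1 attempt, along the lines of:

    mpirun -n 1 --machinefile con_c03_hostfile --mca btl_tcp_if_include 172.16.0.0/24 con_c03

but I haven't verified that, and it wouldn't generalize to runs that also involve c02 or c04.)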

Regards,
Carlos.

On Sun, Jul 1, 2018 at 9:01 PM, Gilles Gouaillardet  wrote:

> Carlos,
>
> Open MPI 3.0.2 has been released, and it contains several bug fixes, so I do encourage you to upgrade and try again.
>
> If it still does not work, can you please run
>
> mpirun --mca oob_base_verbose 10 ...
>
> and then compress and post the output?
>
> Out of curiosity, would
>
> mpirun --mca routed_radix 1 ...
>
> work in your environment?
>
> Once we can analyze the logs, we should be able to figure out what is going wrong.
>
> Cheers,
>
> Gilles
>
> On 6/29/2018 4:10 AM, carlos aguni wrote:
>
>> Just realized my email wasn't sent to the archive.
>>
>> On Sat, Jun 23, 2018 at 5:34 PM, carlos aguni > > wrote:
>>
>> Hi!
>>
>> Thank you all for your replies, Jeff, Gilles, and rhc.
>>
>> Thank you, Jeff and rhc, for clarifying some of Open MPI's internals.
>>
>> >> FWIW: we never send interface names to other hosts - just dot addresses
>> > Should have clarified - when you specify an interface name for the MCA param, then it is the interface name that is transferred as that is the value of the MCA param. However, once we determine our address, we only transfer dot addresses between ourselves
>>
>> If only dot addresses are sent to the hosts, then why doesn't Open MPI use the default route, as `ip route get ` would, instead of choosing a random one? Is this the expected behaviour? Can it be changed?
>>
>> Sorry, as Gilles pointed out, I forgot to mention which Open MPI version I was using: Open MPI 3.0.0 with GCC 7.3.0 from OpenHPC, on CentOS 7.5.
>>
>> > mpirun --mca oob_tcp_if_exclude 192.168.100.0/24 ...
>>
>> I cannot just exclude that interface, because after that I want to add another computer that's on a different network. And this is where things get messy :( I cannot simply include and exclude networks, because I have different machines on different networks. This is what I want to achieve:
>>
>>          compute01             compute02           compute03
>> ens3     192.168.100.104/24    10.0.0.227/24       192.168.100.105/24
>> ens8     10.0.0.228/24         172.21.1.128/24     ---
>> ens9     172.21.1.155/24       ---                 ---
>>
>> So I'm on compute01, spawning another process on compute02 and compute03, with both MPI_Spawn and `mpirun -n 3 -host compute01,compute02,compute03 hostname`.
>>
>> Then when I include the MCA parameters I get this:
>> `mpirun --oversubscribe --allow-run-as-root -n 3 --mca oob_tcp_if_include 10.0.0.0/24,192.168.100.0/24 -host compute01,compute02,compute03 hostname`
>> WARNING: An invalid value was given for oob_tcp_if_include. This value will be ignored.
>> ...
>> Message: Did not find interface matching this subnet
>>
>> This would all work if Open MPI used the system's routing table, the way `ip route` does.
>>
>> Best regards,
>> Carlos.

[OMPI users] COLL-ML ATTENTION

2018-07-04 Thread larkym via users
Good evening,

Can someone help me understand the following error I am getting?

[coll_ml_mca.c:471:hmca_coll_ml_register_params] COLL-ML ATTENTION: Available IPoIB interface was not found MCAST capability will be disabled.

I am currently using the Open MPI 2.0 that comes with Mellanox. I am running CentOS 7.x, and the system is multihomed.

Is MPI possibly using one of my NICs that does not support RoCE?
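
For reference, this is how I have been checking which of my devices are RDMA-capable (ibv_devinfo comes with the standard RDMA tools; the grep is just my own filter):

    $ ibv_devinfo | grep -E 'hca_id|link_layer'

My understanding is that a port reporting "link_layer: Ethernet" is a RoCE port, while "link_layer: InfiniBand" together with a configured IPoIB interface (ib0, ...) is what this COLL-ML check appears to be looking for.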
