Re: [OMPI users] disabling libraries?

2018-04-10 Thread Jeff Squyres (jsquyres)
On Apr 10, 2018, at 9:03 AM, Michael Di Domenico  wrote:
> 
>> We've actually been arguing about exactly how to do this for quite a while.  
>> It's complicated (I can explain further, if you care).  :-\
> 
> i have no doubt it's complicated.  i'm not overly interested in the
> detail, but others i'm sure might be.  

The crux of the issue is that there can be/are multiple ways to reach the same 
underlying transport via Open MPI.  E.g., you can use InfiniBand via the openib 
BTL, the UCX PML, the Yalla PML, the hcoll collectives, etc.  This results 
in a challenging UI issue: how do you have a simple yet flexible UI that allows 
users to choose *which* way they want to reach a given underlying transport, 
and for which cases?

We have the "mca" run-time parameters, and they're quite flexible.  But they're 
not simple for end users, especially if you don't know Open MPI's myriad 
underlying framework and component names.

> but my users will complain and ask questions.

Understood / agreed.

> achieving a single build binary where i can disable the
> interconnects/libraries at runtime would be HIGHLY beneficial to me
> (perhaps others as well).  it cuts my build version combinations from
> like 12 to 4 (or less), that's a huge reduction in labour/maintenance.
> which also means i can upgrade openmpi quicker and stay more up to
> date.

Understood.

>> That being said, I think we *do* have a workaround that might be good enough 
>> for you: disable those warnings about plugins not being able to be opened:
>> mpirun --mca mca_component_show_load_errors 0 ...
> 
> disabled this: mca_base_component_repository_open: unable to open
> mca_oob_ud: libibverbs.so.1
> but not this: pmix_mca_base_component_repository_open: unable to open
> mca_pnet_opa: libpsm2.so.2

Ahh, missed that one.

Short version: set the environment variable 
PMIX_MCA_mca_component_show_load_errors=0, too.

More detail:

We actually embed the PMIx library in Open MPI.  PMIx was derived from some of 
the core guts of Open MPI (including the MCA parameter/run-time variable 
system), so it has its own MCA params.  They propagate *slightly* differently 
than OMPI's (for reasons that aren't interesting here), so you can't use "--mca 
foo bar" to set them on the mpirun command line.  But you can set an env 
variable, like I showed above, and it should do the trick for you.
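As a minimal sketch of the environment-variable approach described above: the script below sets the PMIx variable named in this message, plus the OMPI_MCA_-prefixed form of the same Open MPI parameter. The OMPI_MCA_ form is an assumption on top of this thread (it is Open MPI's usual environment spelling of "--mca name value"), and ./my_mpi_app is a hypothetical application name.

```shell
#!/bin/sh
# Silence plugin-load warnings from Open MPI and from its embedded PMIx.

# Assumed environment equivalent of:
#   mpirun --mca mca_component_show_load_errors 0 ...
export OMPI_MCA_mca_component_show_load_errors=0

# PMIx reads its own MCA params; this is the variable named in the message.
export PMIX_MCA_mca_component_show_load_errors=0

# ./my_mpi_app is a hypothetical application; run it only if everything exists.
if command -v mpirun >/dev/null 2>&1 && [ -x ./my_mpi_app ]; then
    mpirun -np 2 ./my_mpi_app
else
    echo "mpirun or ./my_mpi_app not found; variables are set for later use"
fi
```

Setting both in a site-wide shell profile or module file would presumably cover every mpirun invocation without changing users' command lines.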

But this raises an interesting point: we should be automatically setting this 
PMIX variable for you based on Open MPI's mca_component_show_load_errors' 
value.  I filed https://github.com/open-mpi/ompi/pull/5049 to get this into a 
future release.

-- 
Jeff Squyres
jsquy...@cisco.com

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] disabling libraries?

2018-04-10 Thread Michael Di Domenico
On Sat, Apr 7, 2018 at 3:50 PM, Jeff Squyres (jsquyres)
 wrote:
> On Apr 6, 2018, at 8:12 AM, Michael Di Domenico  
> wrote:
>> it would be nice if openmpi had (or may already have) a simple switch
>> that lets me disable entire portions of the library chain, ie this
>> host doesn't have a particular interconnect, so don't load any of the
>> libraries.  this might run counter to how openmpi discovers and loads
>> libs though.
>
> We've actually been arguing about exactly how to do this for quite a while.  
> It's complicated (I can explain further, if you care).  :-\

i have no doubt it's complicated.  i'm not overly interested in the
detail, but others i'm sure might be.  in reality you're correct, i
don't care that openmpi failed to load the libs given the fact that
the job continues to run without issue.  and in fact i don't even care
about the warnings, but my users will complain and ask questions.

achieving a single build binary where i can disable the
interconnects/libraries at runtime would be HIGHLY beneficial to me
(perhaps others as well).  it cuts my build version combinations from
like 12 to 4 (or less), that's a huge reduction in labour/maintenance.
which also means i can upgrade openmpi quicker and stay more up to
date.

i would gather this is probably not a high priority for the team
working on openmpi, but if there's something my organization or I can
do to push this higher, let me know.

> That being said, I think we *do* have a workaround that might be good enough 
> for you: disable those warnings about plugins not being able to be opened:
> mpirun --mca mca_component_show_load_errors 0 ...

disabled this: mca_base_component_repository_open: unable to open
mca_oob_ud: libibverbs.so.1
but not this: pmix_mca_base_component_repository_open: unable to open
mca_pnet_opa: libpsm2.so.2


Re: [OMPI users] disabling libraries?

2018-04-07 Thread Jeff Squyres (jsquyres)
On Apr 6, 2018, at 8:12 AM, Michael Di Domenico  wrote:
> 
> so the resulting warnings i get
> 
> mca_btl_openib: librdmacm.so.1
> mca_btl_usnic: libfabric.so.1
> mca_oob_ud: libibverbs.so.1
> mca_mtl_mxm: libmxm.so.2
> mca_mtl_ofi: libfabric.so.1
> mca_mtl_psm: libpsm_infinipath.so.1
> mca_mtl_psm2: libpsm2.so.2
> mca_pml_yalla: libmxm.so.2
> 
> you referenced them as "errors" above, but mpi actually runs just fine
> for me even with these msgs, so i would consider them more warnings.

Yeah, they're warnings.  They're basically telling you that the relevant 
plugins can't be opened (because the libraries they depend on aren't there).

In your case, that's actually exactly what you want, because even if plugin X 
can't be found, you really wanted plugin Y on that platform, anyway.

> it would be nice if openmpi had (or may already have) a simple switch
> that lets me disable entire portions of the library chain, ie this
> host doesn't have a particular interconnect, so don't load any of the
> libraries.  this might run counter to how openmpi discovers and loads
> libs though.

We've actually been arguing about exactly how to do this for quite a while.  
It's complicated (I can explain further, if you care).  :-\

That being said, I think we *do* have a workaround that might be good enough 
for you: disable those warnings about plugins not being able to be opened:

mpirun --mca mca_component_show_load_errors 0 ...

(or put "mca_component_show_load_errors=0" in the system-wide 
openmpi-mca-params.conf file)

I believe that will disable the messages for you.
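A rough sketch of the config-file variant: the helper below appends a "name = value" line to an MCA params file only if that exact line is not already present. The function name add_mca_param is hypothetical, and the demonstration writes to a scratch file rather than a real openmpi-mca-params.conf.

```shell
#!/bin/sh
# Append an MCA parameter line to a params file unless it is already there.
# add_mca_param is a hypothetical helper name, not part of Open MPI.
add_mca_param() {
    file=$1   # path to an MCA params file
    line=$2   # a "name = value" line
    mkdir -p "$(dirname "$file")"
    touch "$file"
    # -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrate on a scratch file so that no real configuration is modified.
demo_conf=$(mktemp)
add_mca_param "$demo_conf" "mca_component_show_load_errors = 0"
add_mca_param "$demo_conf" "mca_component_show_load_errors = 0"   # no-op
cat "$demo_conf"
```

For a real installation the first argument would be the openmpi-mca-params.conf under the install prefix's etc directory; the exact prefix depends on how Open MPI was configured at your site.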

-- 
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] disabling libraries?

2018-04-06 Thread Ankita m
Thank you so much, sir. I will discuss this with my supervisor and will
proceed accordingly.




Re: [OMPI users] disabling libraries?

2018-04-06 Thread Michael Di Domenico
On Thu, Apr 5, 2018 at 7:59 PM, Gilles Gouaillardet
 wrote:
> That being said, the error suggests that mca_oob_ud.so is a module from a
> previous install, that Open MPI was not built on the system it is running
> on, or that libibverbs.so.1 has been removed after Open MPI was built.

yes, understood, i compiled openmpi on a node that has all the
libraries installed for our various interconnects, opa/psm/mxm/ib, but
i ran mpirun on a node that has none of them

so the resulting warnings i get

mca_btl_openib: librdmacm.so.1
mca_btl_usnic: libfabric.so.1
mca_oob_ud: libibverbs.so.1
mca_mtl_mxm: libmxm.so.2
mca_mtl_ofi: libfabric.so.1
mca_mtl_psm: libpsm_infinipath.so.1
mca_mtl_psm2: libpsm2.so.2
mca_pml_yalla: libmxm.so.2

you referenced them as "errors" above, but mpi actually runs just fine
for me even with these msgs, so i would consider them more warnings.

> So I do encourage you to take a step back, and think if you can find a
> better solution for your site.

there are two alternatives

1 i can compile a specific version of openmpi for each of our clusters
with each specific interconnect libraries

2 i can install all the libraries on all the machines regardless of
whether the interconnect is present

both are certainly plausible, but my effort here is to see if i can
reduce the size of our software stack and/or reduce the number of
compiled versions of openmpi

it would be nice if openmpi had (or may already have) a simple switch
that lets me disable entire portions of the library chain, ie this
host doesn't have a particular interconnect, so don't load any of the
libraries.  this might run counter to how openmpi discovers and loads
libs though.


Re: [OMPI users] disabling libraries?

2018-04-05 Thread Gilles Gouaillardet
Michael,

in this case, you can
mpirun --mca oob ^ud ...
in order to blacklist the oob/ud component.

an alternative is to add
oob = ^ud
in /.../etc/openmpi-mca-params.conf

If Open MPI is installed on a local filesystem, then this setting can
be node specific.
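To illustrate how a warning maps back to a parameter: a plugin reported as mca_<framework>_<component> corresponds to "--mca <framework> ^<component>", as in the oob/ud case above. The sketch below groups a list of such plugin names by framework and prints the matching exclusion flags. The function name exclude_flags is hypothetical, and the input is the "plugin: library" list reported later in this thread. Whether excluding a component also silences its load warning on a given Open MPI version is an assumption worth checking locally.

```shell
#!/bin/sh
# Build "--mca <framework> ^<comp1>,<comp2>" flags from plugin names of the
# form "mca_<framework>_<component>: <missing library>".
# exclude_flags is a hypothetical helper name, not an Open MPI tool.
exclude_flags() {
    # sed: keep "<framework> <component>"; awk: group components by framework
    sed -n 's/^mca_\([a-z0-9]*\)_\([a-z0-9_]*\):.*/\1 \2/p' |
    awk '
        !($1 in list) { order[++n] = $1; list[$1] = $2; next }
        { list[$1] = list[$1] "," $2 }
        END {
            for (i = 1; i <= n; i++)
                printf "%s--mca %s ^%s", (i > 1 ? " " : ""), order[i], list[order[i]]
            print ""
        }'
}

warnings='mca_btl_openib: librdmacm.so.1
mca_btl_usnic: libfabric.so.1
mca_oob_ud: libibverbs.so.1
mca_mtl_mxm: libmxm.so.2
mca_mtl_ofi: libfabric.so.1
mca_mtl_psm: libpsm_infinipath.so.1
mca_mtl_psm2: libpsm2.so.2
mca_pml_yalla: libmxm.so.2'

printf '%s\n' "$warnings" | exclude_flags
# prints: --mca btl ^openib,usnic --mca oob ^ud --mca mtl ^mxm,ofi,psm,psm2 --mca pml ^yalla
```

The same "framework = ^component,..." pairs could go into a node-local openmpi-mca-params.conf, one framework per line, following the "oob = ^ud" form above.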


That being said, the error suggests that mca_oob_ud.so is a module from a
previous install, that Open MPI was not built on the system it is running
on, or that libibverbs.so.1 has been removed after Open MPI was built.
So I do encourage you to take a step back, and think if you can find a
better solution for your site.


Cheers,

Gilles

On Fri, Apr 6, 2018 at 3:37 AM, Michael Di Domenico
 wrote:
> i'm trying to compile openmpi to support all of our interconnects,
> psm/openib/mxm/etc
>
> this works fine, openmpi finds all the libs, compiles and runs on each
> of the respective machines
>
> however, we don't install the libraries for everything everywhere
>
> so when i run things like ompi_info and mpirun i get
>
> mca_base_component_reposity_open: unable to open mca_oob_ud:
> libibverbs.so.1: cannot open shared object file: no such file or
> directory (ignored)
>
> and so on, for a bunch of other libs.
>
> i understand how the lib linking works so this isn't unexpected and
> doesn't stop the mpi programs from running.
>
> here's the part i don't understand, how can i trace the above warning
> and others like it back the required --mca parameters i need to add
> into the configuration to make the warnings go away?
>
> as an aside, i believe i can set most of them via environment
> variables as well as the command, but what i really like to do is set
> them from a file.  i know i can create a default param file, but is
> there a way to feed a param file at invocation depending where mpirun
> is being run?


[OMPI users] disabling libraries?

2018-04-05 Thread Michael Di Domenico
i'm trying to compile openmpi to support all of our interconnects,
psm/openib/mxm/etc

this works fine, openmpi finds all the libs, compiles and runs on each
of the respective machines

however, we don't install the libraries for everything everywhere

so when i run things like ompi_info and mpirun i get

mca_base_component_repository_open: unable to open mca_oob_ud:
libibverbs.so.1: cannot open shared object file: no such file or
directory (ignored)

and so on, for a bunch of other libs.

i understand how the lib linking works so this isn't unexpected and
doesn't stop the mpi programs from running.

here's the part i don't understand, how can i trace the above warning
and others like it back to the required --mca parameters i need to add
into the configuration to make the warnings go away?

as an aside, i believe i can set most of them via environment
variables as well as the command, but what i'd really like to do is set
them from a file.  i know i can create a default param file, but is
there a way to feed a param file at invocation depending where mpirun
is being run?