Re: [OMPI users] disabling libraries?
On Apr 10, 2018, at 9:03 AM, Michael Di Domenico wrote:
>
>> We've actually been arguing about exactly how to do this for quite a while.
>> It's complicated (I can explain further, if you care). :-\
>
> i have no doubt it's complicated. i'm not overly interested in the
> detail, but others i'm sure might be.

The crux of the issue is that there can be/are multiple ways to reach the same underlying transport via Open MPI. E.g., you can use InfiniBand via the openib BTL, the UCX PML, the Yalla PML, the hcoll collectives, ...etc.

This results in a challenging UI issue: how do you have a simple yet flexible UI that allows users to choose *which* way they want to reach a given underlying transport, and for which cases?

We have the "mca" run-time parameters, and they're quite flexible. But they're not simple for end users, especially if you don't know Open MPI's myriad of underlying framework and component names.

> but my users will complain and ask questions.

Understood / agreed.

> achieving a single build binary where i can disable the
> interconnects/libraries at runtime would be HIGHLY beneficial to me
> (perhaps others as well). it cuts my build version combinations from
> like 12 to 4 (or less), that's a huge reduction in labour/maintenance.
> which also means i can upgrade openmpi quicker and stay more up to
> date.

Understood.

>> That being said, I think we *do* have a workaround that might be good enough
>> for you: disable those warnings about plugins not being able to be opened:
>> mpirun --mca mca_component_show_load_errors 0 ...
>
> disabled this: mca_base_component_repository_open: unable to open
> mca_oob_ud: libibverbs.so.1
> but not this: pmix_mca_base_component_repository_open: unable to open
> mca_pnet_opa: libpsm2.so.2

Ahh, missed that one.

Short version: set the environment variable PMIX_MCA_mca_component_show_load_errors=0, too.

More detail: We actually embed the PMIx library in Open MPI. PMIx was derived from some of the core guts of Open MPI (including the MCA parameter/run-time variable system), so it has its own MCA params. They propagate *slightly* differently than OMPI's (for reasons that aren't interesting here), so you can't use "--mca foo bar" to set them on the mpirun command line. But you can set an env variable, like I showed above, and it should do the trick for you.

But this raises an interesting point: we should be automatically setting this PMIX variable for you based on the value of Open MPI's mca_component_show_load_errors. I filed https://github.com/open-mpi/ompi/pull/5049 to get this into a future release.

-- 
Jeff Squyres
jsquy...@cisco.com
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
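The two settings described in this message can be combined in a small launch script. The following is only a sketch of that workaround: `./my_mpi_app` and the `-np 2` process count are hypothetical placeholders, and the `mpirun` call is guarded so the snippet does nothing harmful on a machine without Open MPI.

```shell
#!/bin/sh
# Silence "unable to open" plugin warnings from the embedded PMIx library.
# PMIx does not see "--mca" options, so its copy of the parameter is
# passed through the environment, as described in the message above.
export PMIX_MCA_mca_component_show_load_errors=0

# ./my_mpi_app is a hypothetical application name used for illustration.
app=./my_mpi_app

# Silence the same warnings from Open MPI itself via the mpirun command line.
if command -v mpirun >/dev/null 2>&1 && [ -x "$app" ]; then
    mpirun --mca mca_component_show_load_errors 0 -np 2 "$app"
fi
```

Both settings only hide the messages; the plugins whose libraries are missing still fail to load, which is the desired outcome on hosts without that interconnect.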
Re: [OMPI users] disabling libraries?
On Sat, Apr 7, 2018 at 3:50 PM, Jeff Squyres (jsquyres) wrote:
> On Apr 6, 2018, at 8:12 AM, Michael Di Domenico wrote:
>> it would be nice if openmpi had (or may already have) a simple switch
>> that lets me disable entire portions of the library chain, ie this
>> host doesn't have a particular interconnect, so don't load any of the
>> libraries. this might run counter to how openmpi discovers and loads
>> libs though.
>
> We've actually been arguing about exactly how to do this for quite a while.
> It's complicated (I can explain further, if you care). :-\

i have no doubt it's complicated. i'm not overly interested in the detail, but others i'm sure might be.

in reality you're correct, i don't care that openmpi failed to load the libs given the fact that the job continues to run without issue. and in fact i don't even care about the warnings, but my users will complain and ask questions.

achieving a single build binary where i can disable the interconnects/libraries at runtime would be HIGHLY beneficial to me (perhaps others as well). it cuts my build version combinations from like 12 to 4 (or less), that's a huge reduction in labour/maintenance. which also means i can upgrade openmpi quicker and stay more up to date.

i would gather this is probably not a high priority for the team working on openmpi, but if there's something my organization or I can do to push this higher, let me know.

> That being said, I think we *do* have a workaround that might be good enough
> for you: disable those warnings about plugins not being able to be opened:
> mpirun --mca mca_component_show_load_errors 0 ...

disabled this:
  mca_base_component_repository_open: unable to open mca_oob_ud: libibverbs.so.1
but not this:
  pmix_mca_base_component_repository_open: unable to open mca_pnet_opa: libpsm2.so.2
Re: [OMPI users] disabling libraries?
On Apr 6, 2018, at 8:12 AM, Michael Di Domenico wrote:
>
> so the resulting warnings i get
>
> mca_btl_openib: librdmacm.so.1
> mca_btl_usnic: libfabric.so.1
> mca_oob_ud: libibverbs.so.1
> mca_mtl_mxm: libmxm.so.2
> mca_mtl_ofi: libfabric.so.1
> mca_mtl_psm: libpsm_infinipath.so.1
> mca_mtl_psm2: libpsm2.so.2
> mca_pml_yalla: libmxm.so.2
>
> you referenced them as "errors" above, but mpi actually runs just fine
> for me even with these msgs, so i would consider them more warnings.

Yeah, they're warnings. They're basically telling you that the relevant plugins can't be opened (because the libraries they depend on aren't there). In your case, that's actually exactly what you want, because even if plugin X can't be opened, you really wanted plugin Y on that platform, anyway.

> it would be nice if openmpi had (or may already have) a simple switch
> that lets me disable entire portions of the library chain, ie this
> host doesn't have a particular interconnect, so don't load any of the
> libraries. this might run counter to how openmpi discovers and loads
> libs though.

We've actually been arguing about exactly how to do this for quite a while. It's complicated (I can explain further, if you care). :-\

That being said, I think we *do* have a workaround that might be good enough for you: disable those warnings about plugins not being able to be opened:

    mpirun --mca mca_component_show_load_errors 0 ...

(or put "mca_component_show_load_errors=0" in the system-wide openmpi-mca-params.conf file)

I believe that will disable the messages for you.

-- 
Jeff Squyres
jsquy...@cisco.com
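For the file-based variant of this workaround, a small helper can keep a parameter file idempotent, so that re-running a site setup script does not append duplicate lines. This is a sketch, not an Open MPI tool: `set_mca_param` is a hypothetical helper name, and it assumes the one-`name = value`-per-line layout used elsewhere in this thread.

```shell
#!/bin/sh
# set_mca_param FILE NAME VALUE
# Hypothetical helper: write "NAME = VALUE" into an MCA parameter file,
# replacing any earlier assignment of the same parameter.
set_mca_param() {
    file=$1
    name=$2
    value=$3
    mkdir -p "$(dirname "$file")"
    touch "$file"
    tmp=$(mktemp)
    # Keep every line except previous assignments of this parameter.
    grep -v "^[[:space:]]*${name}[[:space:]]*=" "$file" > "$tmp" || true
    printf '%s = %s\n' "$name" "$value" >> "$tmp"
    # Copy back instead of moving so the target keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

A call such as `set_mca_param "$HOME/.openmpi/mca-params.conf" mca_component_show_load_errors 0` would target the per-user parameter file; pointing it at the system-wide openmpi-mca-params.conf under the installation's etc directory applies the setting to everyone using that install.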
Re: [OMPI users] disabling libraries?
Thank you so much, sir. I will discuss this with my supervisor and will proceed accordingly.

On Fri, Apr 6, 2018 at 5:42 PM, Michael Di Domenico wrote:
> On Thu, Apr 5, 2018 at 7:59 PM, Gilles Gouaillardet wrote:
> > That being said, the error suggest mca_oob_ud.so is a module from a
> > previous install,
> > Open MPI was not built on the system it is running, or libibverbs.so.1
> > has been removed after
> > Open MPI was built.
>
> yes, understood, i compiled openmpi on a node that has all the
> libraries installed for our various interconnects, opa/psm/mxm/ib, but
> i ran mpirun on a node that has none of them
>
> so the resulting warnings i get
>
> mca_btl_openib: librdmacm.so.1
> mca_btl_usnic: libfabric.so.1
> mca_oob_ud: libibverbs.so.1
> mca_mtl_mxm: libmxm.so.2
> mca_mtl_ofi: libfabric.so.1
> mca_mtl_psm: libpsm_infinipath.so.1
> mca_mtl_psm2: libpsm2.so.2
> mca_pml_yalla: libmxm.so.2
>
> you referenced them as "errors" above, but mpi actually runs just fine
> for me even with these msgs, so i would consider them more warnings.
>
> > So I do encourage you to take a step back, and think if you can find a
> > better solution for your site.
>
> there are two alternatives
>
> 1 i can compile a specific version of openmpi for each of our clusters
> with each specific interconnect libraries
>
> 2 i can install all the libraries on all the machines regardless of
> whether the interconnect is present
>
> both are certainly plausible, but my effort here is to see if i can
> reduce the size of our software stack and/or reduce the number of
> compiled versions of openmpi
>
> it would be nice if openmpi had (or may already have) a simple switch
> that lets me disable entire portions of the library chain, ie this
> host doesn't have a particular interconnect, so don't load any of the
> libraries. this might run counter to how openmpi discovers and loads
> libs though.
Re: [OMPI users] disabling libraries?
On Thu, Apr 5, 2018 at 7:59 PM, Gilles Gouaillardet wrote:
> That being said, the error suggest mca_oob_ud.so is a module from a
> previous install,
> Open MPI was not built on the system it is running, or libibverbs.so.1
> has been removed after
> Open MPI was built.

yes, understood, i compiled openmpi on a node that has all the libraries installed for our various interconnects, opa/psm/mxm/ib, but i ran mpirun on a node that has none of them

so the resulting warnings i get

mca_btl_openib: librdmacm.so.1
mca_btl_usnic: libfabric.so.1
mca_oob_ud: libibverbs.so.1
mca_mtl_mxm: libmxm.so.2
mca_mtl_ofi: libfabric.so.1
mca_mtl_psm: libpsm_infinipath.so.1
mca_mtl_psm2: libpsm2.so.2
mca_pml_yalla: libmxm.so.2

you referenced them as "errors" above, but mpi actually runs just fine for me even with these msgs, so i would consider them more warnings.

> So I do encourage you to take a step back, and think if you can find a
> better solution for your site.

there are two alternatives

1 i can compile a specific version of openmpi for each of our clusters with each specific interconnect's libraries

2 i can install all the libraries on all the machines regardless of whether the interconnect is present

both are certainly plausible, but my effort here is to see if i can reduce the size of our software stack and/or reduce the number of compiled versions of openmpi

it would be nice if openmpi had (or may already have) a simple switch that lets me disable entire portions of the library chain, ie this host doesn't have a particular interconnect, so don't load any of the libraries. this might run counter to how openmpi discovers and loads libs though.
Re: [OMPI users] disabling libraries?
Michael,

in this case, you can

    mpirun --mca oob ^ud ...

in order to blacklist the oob/ud component.

An alternative is to add

    oob = ^ud

in /.../etc/openmpi-mca-params.conf

If Open MPI is installed on a local filesystem, then this setting can be node specific.

That being said, the error suggests that mca_oob_ud.so is a module from a previous install, that Open MPI was not built on the system it is running on, or that libibverbs.so.1 has been removed after Open MPI was built.

So I do encourage you to take a step back, and think if you can find a better solution for your site.

Cheers,

Gilles

On Fri, Apr 6, 2018 at 3:37 AM, Michael Di Domenico wrote:
> i'm trying to compile openmpi to support all of our interconnects,
> psm/openib/mxm/etc
>
> this works fine, openmpi finds all the libs, compiles and runs on each
> of the respective machines
>
> however, we don't install the libraries for everything everywhere
>
> so when i run things like ompi_info and mpirun i get
>
> mca_base_component_repository_open: unable to open mca_oob_ud:
> libibverbs.so.1: cannot open shared object file: no such file or
> directory (ignored)
>
> and so on, for a bunch of other libs.
>
> i understand how the lib linking works so this isn't unexpected and
> doesn't stop the mpi programs from running.
>
> here's the part i don't understand, how can i trace the above warning
> and others like it back to the required --mca parameters i need to add
> into the configuration to make the warnings go away?
>
> as an aside, i believe i can set most of them via environment
> variables as well as the command, but what i'd really like to do is set
> them from a file. i know i can create a default param file, but is
> there a way to feed a param file at invocation depending where mpirun
> is being run?
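The `oob`/`ud` pair above comes straight from the plugin name in the warning: `mca_oob_ud` is the `ud` component of the `oob` framework. A sketch of that mapping for a whole list of warnings follows. It assumes plugin names always have the form `mca_<framework>_<component>` with no underscore in the framework name; `mca_exclusions` is a hypothetical helper, and lines reported by the embedded PMIx (prefixed `pmix_`) are skipped because they are not set with `--mca`.

```shell
#!/bin/sh
# mca_exclusions: read plugin-load warnings on stdin and print one
# "--mca <framework> ^<comp1>,<comp2>" exclusion per framework.
mca_exclusions() {
    awk '
        # Warnings from the embedded PMIx are not handled here.
        /^pmix_/ { next }
        {
            # Drop the reporting function name when the full warning is given.
            sub(/.*unable to open /, "")
            if (!match($0, /mca_[a-z0-9]+_[a-z0-9_]+/)) next
            plugin = substr($0, RSTART + 4, RLENGTH - 4)  # strip "mca_"
            cut = index(plugin, "_")
            fw = substr(plugin, 1, cut - 1)     # e.g. "btl"
            comp = substr(plugin, cut + 1)      # e.g. "openib"
            if ((fw, comp) in seen) next
            seen[fw, comp] = 1
            if (fw in list) {
                list[fw] = list[fw] "," comp
            } else {
                order[++n] = fw
                list[fw] = comp
            }
        }
        END {
            for (i = 1; i <= n; i++)
                printf "--mca %s ^%s\n", order[i], list[order[i]]
        }
    '
}

# Example input taken from the warnings quoted in this thread.
mca_exclusions <<'EOF'
mca_btl_openib: librdmacm.so.1
mca_oob_ud: libibverbs.so.1
EOF
```

Components of the same framework are grouped into one comma-separated exclusion list, since the framework parameter can only be given once per run.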
[OMPI users] disabling libraries?
i'm trying to compile openmpi to support all of our interconnects, psm/openib/mxm/etc

this works fine, openmpi finds all the libs, compiles and runs on each of the respective machines

however, we don't install the libraries for everything everywhere

so when i run things like ompi_info and mpirun i get

mca_base_component_repository_open: unable to open mca_oob_ud: libibverbs.so.1: cannot open shared object file: no such file or directory (ignored)

and so on, for a bunch of other libs.

i understand how the lib linking works so this isn't unexpected and doesn't stop the mpi programs from running.

here's the part i don't understand, how can i trace the above warning and others like it back to the required --mca parameters i need to add into the configuration to make the warnings go away?

as an aside, i believe i can set most of them via environment variables as well as the command, but what i'd really like to do is set them from a file. i know i can create a default param file, but is there a way to feed a param file at invocation depending where mpirun is being run?
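One way to approximate a per-host parameter file, relying only on the environment-variable route mentioned in the question, is a wrapper that reads a host-specific file and exports each entry with the `OMPI_MCA_` prefix before calling mpirun. This is a sketch under stated assumptions: `load_mca_params` is a hypothetical helper, the file uses one `name = value` entry per line, and Open MPI may offer a more direct mechanism that is not shown here.

```shell
#!/bin/sh
# load_mca_params FILE
# Hypothetical helper: export every "name = value" line of FILE as the
# environment variable OMPI_MCA_<name>. Blank lines, comment lines and
# lines without "=" are ignored. A missing file is not an error.
load_mca_params() {
    [ -r "$1" ] || return 0
    while IFS= read -r line || [ -n "$line" ]; do
        # Trim leading and trailing whitespace.
        line=$(printf '%s' "$line" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        case $line in
            ''|'#'*) continue ;;   # blank line or comment
            *=*) ;;                # looks like an assignment
            *) continue ;;
        esac
        name=$(printf '%s' "${line%%=*}" | tr -d '[:space:]')
        value=$(printf '%s' "${line#*=}" | sed 's/^[[:space:]]*//')
        export "OMPI_MCA_${name}=${value}"
    done < "$1"
}
```

A wrapper could then call something like `load_mca_params "/path/to/params/$(hostname -s).conf"` (a hypothetical directory layout) followed by the usual mpirun command, so each host picks up only the exclusions that apply to it.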