William,

What is the desired behavior if Open MPI built with CUDA is used on a
system where CUDA is not available or cannot be used because of ABI
compatibility issues?
 - issue a warning (could not open the DSO because of unsatisfied
dependencies)?
 - silently ignore the CUDA related components?

I guess this should be configurable by yet another MCA parameter, but that
raises the question of what the default value for this parameter should be.
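To make the two options concrete, here is a minimal sketch. It is not Open
MPI's actual component loader, and `show_load_errors` is a hypothetical flag
standing in for such an MCA parameter. The sketch dlopen()s a component DSO
and, when the open fails (for example because libcuda cannot be found),
either prints dlerror() as a warning or skips the component silently.

```c
#include <dlfcn.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an MCA parameter that controls whether
 * DSO load failures are reported (true = warn, false = stay silent). */
static bool show_load_errors = true;

/* Try to open a component DSO. Returns the handle on success, or NULL
 * if the DSO or one of its dependencies (e.g. libcuda) is unavailable;
 * the caller then simply skips the component either way. */
static void *try_open_component(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        /* Calling dlerror() also clears the pending error state. */
        const char *err = dlerror();
        if (show_load_errors) {
            fprintf(stderr, "warning: skipping component %s: %s\n",
                    path, err != NULL ? err : "unknown error");
        }
    }
    return handle;
}
```

Either way the component is skipped; the flag only decides whether the user
is told why. On older glibc versions the sketch must be linked with -ldl.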


Cheers,

Gilles


On Sat, Sep 10, 2022 at 6:25 AM Zhang, William via devel <
devel@lists.open-mpi.org> wrote:

> Hello interested parties,
>
>
>
> As part of the work for the accelerator framework, the non-standard
> behavior of the existing CUDA code in Open MPI is being reworked. One of
> the proposed changes affects how CUDA components are compiled and
> linked.
>
>
>
> Currently, CUDA functions are loaded dynamically using dlopen and stored
> in a function pointer table, with some code that searches typical paths
> to locate libcuda. This means that we can compile Open MPI
> --with-cuda=/path/to/cuda and the resulting build should work in both CUDA
> and non-CUDA environments.
>
>
>
> The change we are making removes the function pointer table and
> instead has the relevant components depend directly on libcuda.
> This is in line with the rest of Open MPI's MCA system, where you can build
> components as DSOs.
>
>
>
> The differences are: Open MPI will call libcuda functions directly, and
> components that have a CUDA dependency will be built as DSOs (i.e.
> --with-cuda=/path/to/cuda/ --enable-mca-dso=accelerator-cuda). When these
> DSOs are dynamically linked at run time, they may fail to load (for
> example, in a non-CUDA environment), but this won't prevent Open MPI from
> functioning. Related work,
> https://github.com/open-mpi/ompi/pull/10763, which adds an option to
> silence the warnings emitted on this expected path, is also in progress.
>
>
> From a user's perspective, nothing changes. For compilation, dependent
> components will need to be built as DSOs. In the code, we can remove the
> dlopen dependency for CUDA builds, standardize the CUDA code with the rest
> of Open MPI, and remove the code that stores function pointers and detects
> the location of libcuda.
>
>
>
> Please provide feedback if you have any suggestions or are against these
> changes.
>
>
>
> Thanks,
>
> William Zhang
>
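For reference, the current approach described above (dlopen plus a function
pointer table) looks roughly like the following. This is a minimal sketch,
not Open MPI's actual code: the struct `cuda_fn_table`, the function
`load_cuda_table`, and resolving only `cuInit` are illustrative assumptions.
`libcuda.so.1` and `cuInit(unsigned int)` come from the CUDA driver API, but
the return type is simplified to `int` (the real one is `CUresult`) so the
sketch builds without cuda.h.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Simplified signature of the CUDA driver's cuInit(); the real
 * declaration in cuda.h returns CUresult. */
typedef int (*cuInit_fn)(unsigned int flags);

/* Hypothetical function pointer table, filled in at run time. */
struct cuda_fn_table {
    void *handle;
    cuInit_fn cuInit;
};

/* Returns 1 if libname was opened and every symbol resolved, 0 otherwise
 * (e.g. on a host without the CUDA driver, where CUDA is just disabled). */
static int load_cuda_table(struct cuda_fn_table *t, const char *libname)
{
    t->cuInit = NULL;
    t->handle = dlopen(libname, RTLD_NOW | RTLD_LOCAL);
    if (t->handle == NULL) {
        return 0;
    }
    /* POSIX-recommended idiom for storing a dlsym() result in a
     * function pointer. */
    *(void **)(&t->cuInit) = dlsym(t->handle, "cuInit");
    if (t->cuInit == NULL) {
        dlclose(t->handle);
        t->handle = NULL;
        return 0;
    }
    return 1;
}
```

A caller would try `load_cuda_table(&table, "libcuda.so.1")` and then call
through `table.cuInit`. With the proposed change, a component would instead
call cuInit() directly and be linked against libcuda, so the "is libcuda
present?" check moves to the dynamic loader when the component DSO itself is
opened.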
