On 2021-01-15 23:14, Alastair McKinstry wrote:
Ugh. Thanks Drew.
What are the contents of /etc/openmpi/openmpi-mca-params.conf on the node?
Does a simple hello world (see debian/tests/hello*) work without
errors in the environment?


Hi Alastair, sorry for the delay replying to these questions.

I'm attaching /etc/openmpi/openmpi-mca-params.conf

In summary, the lines at the end that I think deal with libfabric are:

# Silence this warning on Debian, as many systems don't have openfabric
# but the warning breaks higher layers or application
btl_base_warn_component_unused=0
# Avoid openib an in case applications use fork: see https://github.com/ofiwg/libfabric/issues/6332
# If you wish to use openib and know your application is safe, remove the following:
# Similarly for UCX: https://github.com/open-mpi/ompi/issues/8367
btl = ^uct,openib
pml = ^ucx
osc = ^ucx


(the last line with osc doesn't have a newline at the end, but I guess that's not important for runtime)
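
To make the effect of those lines concrete, here is a minimal sketch in Python of how the "name = value" format and the "^" exclusion prefix can be read. It is not Open MPI's actual parser; parse_mca_params, excluded_components and EXCERPT are names made up for this illustration. It also shows that a missing final newline makes no difference to a line-based reader.

```python
def parse_mca_params(text):
    """Parse 'name = value' lines, skipping blank lines and '#' comments.

    Illustration only: this mimics the documented file format, it is not
    the parser Open MPI itself uses.
    """
    params = {}
    # splitlines() yields the last line whether or not it ends in a newline
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a 'name = value' line
        params[name.strip()] = value.strip()
    return params


def excluded_components(value):
    """Return the components excluded by a '^a,b' selection, or [] if none."""
    if not value.startswith("^"):
        return []
    return [c.strip() for c in value[1:].split(",") if c.strip()]


# The active (non-comment) lines quoted above, deliberately without a
# trailing newline after the osc line.
EXCERPT = (
    "btl_base_warn_component_unused=0\n"
    "btl = ^uct,openib\n"
    "pml = ^ucx\n"
    "osc = ^ucx"
)

params = parse_mca_params(EXCERPT)
for framework in ("btl", "pml", "osc"):
    print(framework, "excludes", excluded_components(params[framework]))
```

Read this way, the file excludes the uct and openib BTLs and the ucx PML and OSC components, and the osc line is picked up despite the missing newline.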


Hello world doesn't generate any errors. The error seems to be specific to Python execution, perhaps when an MPI process is forked from Python? That said, petsc4py does not generate the error. mpi4py is probably the most direct way of probing the problem.
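
As a rough sketch of what such an mpi4py probe could look like, assuming mpi4py is installed and mpirun is on the PATH: the helper names probe_command and run_probe are hypothetical, and this is not the test from the package. Importing MPI from mpi4py initialises MPI, so any warning emitted at startup should show up in the captured stderr.

```python
import subprocess
import sys

# Smallest useful mpi4py program: importing MPI initialises the library,
# then each rank reports itself.
PROBE = (
    "from mpi4py import MPI\n"
    "comm = MPI.COMM_WORLD\n"
    "print('rank', comm.Get_rank(), 'of', comm.Get_size(),\n"
    "      'on', MPI.Get_processor_name())\n"
)


def probe_command(nprocs=1, launcher="mpirun"):
    """Build the command line that runs PROBE under the MPI launcher."""
    return [launcher, "-n", str(nprocs), sys.executable, "-c", PROBE]


def run_probe(nprocs=1):
    """Run the probe and return (exit code, stdout, stderr).

    Assumes the launcher and mpi4py are available on this machine.
    """
    result = subprocess.run(
        probe_command(nprocs), capture_output=True, text=True
    )
    return result.returncode, result.stdout, result.stderr
```

Calling run_probe() and printing the third element of the result would show whatever the MPI runtime writes to stderr during startup, separately from the rank output on stdout.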

The test log for hello world is:

$ autopkgtest -B -- null 2>&1 | tee ../openmpi-test.log
autopkgtest [14:37:31]: starting date: 2021-01-19
autopkgtest [14:37:31]: version 5.15
autopkgtest [14:37:31]: host sandy; command line: /usr/bin/autopkgtest -B -- null
autopkgtest [14:37:31]: testbed dpkg architecture: amd64
autopkgtest [14:37:31]: testbed running kernel: Linux 5.10.0-1-amd64 #1 SMP Debian 5.10.5-1 (2021-01-09)
autopkgtest [14:37:31]: @@@@@@@@@@@@@@@@@@@@ unbuilt-tree .
autopkgtest [14:37:32]: testing package openmpi version 4.1.0-6
autopkgtest [14:37:32]: build not needed
autopkgtest [14:37:32]: test hello1: preparing testbed
autopkgtest [14:37:32]: test hello1: [-----------------------
Hello world from processor sandy, rank 0 out of 1 processors
autopkgtest [14:37:37]: test hello1: -----------------------]
autopkgtest [14:37:37]: test hello1: - - - - - - - - - - results - - - - - - - - - -
hello1               PASS
autopkgtest [14:37:37]: test hello2: preparing testbed
autopkgtest [14:37:37]: test hello2: [-----------------------
 node           0 : Hello world
autopkgtest [14:37:39]: test hello2: -----------------------]
autopkgtest [14:37:39]: test hello2: - - - - - - - - - - results - - - - - - - - - -
hello2               PASS
autopkgtest [14:37:39]: test hello4: preparing testbed
autopkgtest [14:37:39]: test hello4: [-----------------------
 node           0 : Hello world
autopkgtest [14:37:41]: test hello4: -----------------------]
autopkgtest [14:37:41]: test hello4: - - - - - - - - - - results - - - - - - - - - -
hello4               PASS
autopkgtest [14:37:41]: @@@@@@@@@@@@@@@@@@@@ summary
hello1               PASS
hello2               PASS
hello4               PASS
$ echo $?
0
#
# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
#                         University Research and Technology
#                         Corporation.  All rights reserved.
# Copyright (c) 2004-2005 The University of Tennessee and The University
#                         of Tennessee Research Foundation.  All rights
#                         reserved.
# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
#                         University of Stuttgart.  All rights reserved.
# Copyright (c) 2004-2005 The Regents of the University of California.
#                         All rights reserved.
# Copyright (c) 2006-2017 Cisco Systems, Inc.  All rights reserved
# Copyright (c) 2018      Intel, Inc. All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#

# This is the default system-wide MCA parameters defaults file.
# Specifically, the MCA parameter "mca_param_files" defaults to a
# value of
# "$HOME/.openmpi/mca-params.conf:$sysconf/openmpi-mca-params.conf"
# (this file is the latter of the two).  So if the default value of
# mca_param_files is not changed, this file is used to set system-wide
# MCA parameters.  This file can therefore be used to set system-wide
# default MCA parameters for all users.  Of course, users can override
# these values if they want, but this file is an excellent location
# for setting system-specific MCA parameters for those users who don't
# know / care enough to investigate the proper values for them.

# Note that this file is only applicable where it is visible (in a
# filesystem sense).  Specifically, MPI processes each read this file
# during their startup to determine what default values for MCA
# parameters should be used.  mpirun does not bundle up the values in
# this file from the node where it was run and send them to all nodes;
# the default value decisions are effectively distributed.  Hence,
# these values are only applicable on nodes that "see" this file.  If
# $sysconf is a directory on a local disk, it is likely that changes
# to this file will need to be propagated to other nodes.  If $sysconf
# is a directory that is shared via a networked filesystem, changes to
# this file will be visible to all nodes that share this $sysconf.

# The format is straightforward: one per line, mca_param_name =
# rvalue.  Quoting is ignored (so if you use quotes or escape
# characters, they'll be included as part of the value).  For example:

# Disable run-time MPI parameter checking
#   mpi_param_check = 0

# Note that the value "~/" will be expanded to the current user's home
# directory.  For example:

# Change component loading path
#   mca_base_component_path = /usr/local/lib/openmpi:~/my_openmpi_components

# See "ompi_info --param all all --level 9" for a full listing of Open
# MPI MCA parameters available and their default values.

# Silence this warning on Debian, as many systems don't have openfabric
# but the warning breaks higher layers or application
btl_base_warn_component_unused=0
# Avoid openib an in case applications use fork: see https://github.com/ofiwg/libfabric/issues/6332
# If you wish to use openib and know your application is safe, remove the following:
# Similarly for UCX: https://github.com/open-mpi/ompi/issues/8367
btl = ^uct,openib
pml = ^ucx
osc = ^ucx
