On 13/01/2021 09:16, Drew Parsons wrote:
Package: libucx0
Version: 1.10.0~rc1-2
Severity: serious
Justification: debci
Our next round of whack-a-mole comes from the new UCX.
pmix 4.0.0-3 seems to have fixed the pmix error from bug#979744.
debci tests next report a problem with UCX, with
openmpi 4.1.0-5
pmix 4.0.0-3
ucx 1.10.0~rc1-2
Thanks. These appear to be unwanted warnings from UCX that RDMA is not
present.
I'm looking at silencing this via openmpi conf params.
Alastair
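As a rough illustration of what silencing this through Open MPI parameters might look like, here is a minimal sketch. It assumes that excluding the UCX PML is acceptable on hosts without RDMA hardware and that the UCX log threshold can be raised. Neither setting has been verified against this bug, and the fix eventually chosen may differ. Open MPI picks up MCA parameters from OMPI_MCA_<name> environment variables, and UCX reads UCX_LOG_LEVEL. The test binary name in the comment is a placeholder.

```shell
# Hypothetical workaround sketch; not verified against this bug.

# Open MPI reads MCA parameters from OMPI_MCA_<name> environment variables.
# "^ucx" excludes the UCX PML, so another PML is selected instead.
export OMPI_MCA_pml='^ucx'

# UCX reads its log threshold from UCX_LOG_LEVEL. "fatal" hides ERROR-level
# lines such as the rdmacm ones, but not the message printed by pml_ucx.c.
export UCX_LOG_LEVEL=fatal

# Then rerun a failing test program, e.g. (binary name is a placeholder):
#   mpirun -n 1 ./hello
```

The same MCA setting can be made persistent as a line `pml = ^ucx` in the system-wide openmpi-mca-params.conf file, which is probably closer to the "conf params" route mentioned above.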
The openmpi debci test at
https://ci.debian.net/data/autopkgtest/testing/arm64/o/openmpi/9650495/log.gz
reports:
autopkgtest [15:16:16]: test hello4: [-----------------------
[1610522176.588740] [ci-013-36a60f22:1417 :0] rdmacm_cm.c:638 UCX ERROR rdma_create_event_channel failed: No such device
[1610522176.588779] [ci-013-36a60f22:1417 :0] ucp_worker.c:1432 UCX ERROR failed to open CM on component rdmacm with status Input/output error
[ci-013-36a60f22:01417] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273 Error: Failed to create UCP worker
node 0 : Hello world
autopkgtest [15:16:17]: test hello4: -----------------------]
autopkgtest [15:16:18]: test hello4: - - - - - - - - - - results - - - - - - - - - -
hello4 FAIL stderr: [ci-013-36a60f22:01417] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273 Error: Failed to create UCP worker
autopkgtest [15:16:18]: test hello4: - - - - - - - - - - stderr - - - - - - - - - -
[ci-013-36a60f22:01417] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273 Error: Failed to create UCP worker
autopkgtest [15:16:18]: @@@@@@@@@@@@@@@@@@@@ summary
hello1 FAIL stderr: [ci-013-36a60f22:01292] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273 Error: Failed to create UCP worker
hello2 FAIL stderr: [ci-013-36a60f22:01218] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273 Error: Failed to create UCP worker
hello4 FAIL stderr: [ci-013-36a60f22:01417] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273 Error: Failed to create UCP worker
Other client applications fail with the same error.
-- System Information:
Debian Release: bullseye/sid
APT prefers unstable
APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 5.10.0-1-amd64 (SMP w/8 CPU threads)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8),
LANGUAGE=en_AU:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages libucx0 depends on:
ii ibverbs-providers 33.0-1
ii libbinutils 2.35.1-7
ii libc6 2.31-9
ii libibverbs1 33.0-1
ii libnuma1 2.0.12-1+b1
ii librdmacm1 33.0-1
libucx0 recommends no packages.
libucx0 suggests no packages.
-- no debconf information
--
Alastair McKinstry, email: alast...@sceal.ie, matrix: @alastair:sceal.ie,
phone: 087-6847928
Green Party Councillor, Galway County Council
--
debian-science-maintainers mailing list
debian-science-maintainers@alioth-lists.debian.net
https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/debian-science-maintainers