Bug#1070300: pmix_psquash_base_select failed during MPI_INIT on 32bit architectures
Samuel Thibault, on Sat. 04 May 2024 11:49:40 +0200, wrote:
> Samuel Thibault, on Fri. 03 May 2024 19:00:22 +0200, wrote:
> > This has been posing migration issues for quite some time, I have
> > uploaded the attached fix to delayed/0.
>
> Some of the components depend on libmca_common_libdstore which also
> needs to be installed, otherwise openmpi emits some text on stderr,
> which some autopkgtest don't like, I have uploaded the attached changes
> to delayed/0

Sorry, it seems my tests had gone bogus: I do remember testing the
result, but I obviously failed to do it properly. I have double-checked
my changes this time, as attached and uploaded to delayed/0 (now that
openmpi got a bit force-migrated to testing).

Samuel

diff -Nru openmpi-4.1.6/debian/changelog openmpi-4.1.6/debian/changelog
--- openmpi-4.1.6/debian/changelog 2024-05-04 11:32:26.0 +0200
+++ openmpi-4.1.6/debian/changelog 2024-05-05 20:38:36.0 +0200
@@ -1,3 +1,10 @@
+openmpi (4.1.6-13.3) unstable; urgency=medium
+
+  * Non-maintainer Upload
+  * Really install libmca_common_dstore.
+
+ -- Samuel Thibault  Sun, 05 May 2024 20:38:36 +0200
+
 openmpi (4.1.6-13.2) unstable; urgency=medium
 
   * Non-maintainer Upload
diff -Nru openmpi-4.1.6/debian/rules openmpi-4.1.6/debian/rules
--- openmpi-4.1.6/debian/rules 2024-05-04 11:32:26.0 +0200
+++ openmpi-4.1.6/debian/rules 2024-05-05 20:38:36.0 +0200
@@ -289,10 +289,11 @@
     dh_install -p libopenmpi3t64 $(LIBDIR)/openmpi/lib/libpmix.so.2.2.35 $(LIBDIR) ; \
     dh_install -p libopenmpi3t64 /usr/share/pmix ; \
     dh_install -p libopenmpi3t64 "/usr/lib/$(DEB_HOST_MULTIARCH)/openmpi/lib/pmix/*.so" ; \
-    if test -f $(DESTDIR)/$(LIBDIR)/openmpi/lib/libmca_common_libdstore.so.1.0.2 ; then \
-      dh_install -p libopenmpi3t64 $(LIBDIR)/libmca_common_libdstore.so.1.0.2 ; \
-      dh_link -p libopenmpi3t64 $(LIBDIR)/libmca_common_libdstore.so.1.0.2 $(LIBDIR)/libmca_common_libdstore.so.1 ; \
-      dh_link -p libopenmpi-dev $(LIBDIR)/libmca_common_libdstore.so.1 $(LIBDIR)/libmca_common_libdstore.so ; \
+    if test -f $(DESTDIR)/$(LIBDIR)/openmpi/lib/libmca_common_dstore.so.1.0.2 ; then \
+      dh_install -p libopenmpi3t64 $(LIBDIR)/openmpi/lib/libmca_common_dstore.so.1.0.2 $(LIBDIR) ; \
+      dh_link -p libopenmpi3t64 $(LIBDIR)/libmca_common_dstore.so.1.0.2 $(LIBDIR)/libmca_common_dstore.so.1 ; \
+      dh_link -p libopenmpi-dev $(LIBDIR)/libmca_common_dstore.so.1 $(LIBDIR)/openmpi/lib/libmca_common_dstore.so ; \
+      dh_link -p libopenmpi-dev $(LIBDIR)/libmca_common_dstore.so.1 $(LIBDIR)/libmca_common_dstore.so ; \
     fi ; \
     dh_link -p libopenmpi3t64 $(LIBDIR)/libpmix.so.2.2.35 $(LIBDIR)/libpmix.so.2 ; \
     dh_link -p libopenmpi-dev $(LIBDIR)/libpmix.so.2 $(LIBDIR)/openmpi/lib/libpmix.so ; \
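The install-and-link sequence in the debian/rules hunk above can be tried out in a scratch directory. What follows is only a sketch under stated assumptions: plain cp and ln -s stand in for dh_install and dh_link, a throwaway mktemp tree stands in for $(DESTDIR) and the package directory, "usr/lib/demo" is a made-up stand-in for $(LIBDIR), and the empty file it creates is not a real library.

```shell
#!/bin/sh
set -eu

# Scratch trees: "tmp" plays the build output, "pkg" the package contents.
# "usr/lib/demo" is a hypothetical stand-in for $(LIBDIR).
root=$(mktemp -d)
src="$root/tmp/usr/lib/demo"
pkg="$root/pkg/usr/lib/demo"
mkdir -p "$src/openmpi/lib" "$pkg/openmpi/lib"

# Placeholder for the library that the build installs in its private directory.
: > "$src/openmpi/lib/libmca_common_dstore.so.1.0.2"

# Same guard as in debian/rules: only act when the build produced the file.
if test -f "$src/openmpi/lib/libmca_common_dstore.so.1.0.2"; then
    # Stand-in for dh_install: copy from the private directory into LIBDIR.
    cp "$src/openmpi/lib/libmca_common_dstore.so.1.0.2" "$pkg/"
    # Stand-ins for dh_link: runtime link first, then the two development links.
    ln -s libmca_common_dstore.so.1.0.2 "$pkg/libmca_common_dstore.so.1"
    ln -s ../../libmca_common_dstore.so.1 "$pkg/openmpi/lib/libmca_common_dstore.so"
    ln -s libmca_common_dstore.so.1 "$pkg/libmca_common_dstore.so"
fi

# Show the resulting files and symlinks.
ls -l "$pkg" "$pkg/openmpi/lib"
```

Every link in the sketch resolves to the copied .so.1.0.2 file, which is the property the corrected hunk restores: the earlier version tested for and installed a libmca_common_libdstore name, so its guard never matched and nothing was installed.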
Bug#1070300: pmix_psquash_base_select failed during MPI_INIT on 32bit architectures
Samuel Thibault, on Fri. 03 May 2024 19:00:22 +0200, wrote:
> This has been posing migration issues for quite some time, I have
> uploaded the attached fix to delayed/0.

Some of the components depend on libmca_common_libdstore which also
needs to be installed, otherwise openmpi emits some text on stderr,
which some autopkgtest don't like, I have uploaded the attached changes
to delayed/0

Samuel

diff -Nru openmpi-4.1.6/debian/changelog openmpi-4.1.6/debian/changelog
--- openmpi-4.1.6/debian/changelog 2024-05-03 18:53:52.0 +0200
+++ openmpi-4.1.6/debian/changelog 2024-05-04 11:32:26.0 +0200
@@ -1,3 +1,11 @@
+openmpi (4.1.6-13.2) unstable; urgency=medium
+
+  * Non-maintainer Upload
+  * Also install libmca_common_dstore.
+  * Do not install .la pmix files.
+
+ -- Samuel Thibault  Sat, 04 May 2024 11:32:26 +0200
+
 openmpi (4.1.6-13.1) unstable; urgency=medium
 
   * Non-maintainer Upload
diff -Nru openmpi-4.1.6/debian/rules openmpi-4.1.6/debian/rules
--- openmpi-4.1.6/debian/rules 2024-05-03 18:49:28.0 +0200
+++ openmpi-4.1.6/debian/rules 2024-05-04 11:32:26.0 +0200
@@ -288,7 +288,12 @@
     echo "PMIX: install " ; \
     dh_install -p libopenmpi3t64 $(LIBDIR)/openmpi/lib/libpmix.so.2.2.35 $(LIBDIR) ; \
     dh_install -p libopenmpi3t64 /usr/share/pmix ; \
-    dh_install -p libopenmpi3t64 /usr/lib/$(DEB_HOST_MULTIARCH)/openmpi/lib/pmix ; \
+    dh_install -p libopenmpi3t64 "/usr/lib/$(DEB_HOST_MULTIARCH)/openmpi/lib/pmix/*.so" ; \
+    if test -f $(DESTDIR)/$(LIBDIR)/openmpi/lib/libmca_common_libdstore.so.1.0.2 ; then \
+      dh_install -p libopenmpi3t64 $(LIBDIR)/libmca_common_libdstore.so.1.0.2 ; \
+      dh_link -p libopenmpi3t64 $(LIBDIR)/libmca_common_libdstore.so.1.0.2 $(LIBDIR)/libmca_common_libdstore.so.1 ; \
+      dh_link -p libopenmpi-dev $(LIBDIR)/libmca_common_libdstore.so.1 $(LIBDIR)/libmca_common_libdstore.so ; \
+    fi ; \
     dh_link -p libopenmpi3t64 $(LIBDIR)/libpmix.so.2.2.35 $(LIBDIR)/libpmix.so.2 ; \
     dh_link -p libopenmpi-dev $(LIBDIR)/libpmix.so.2 $(LIBDIR)/openmpi/lib/libpmix.so ; \
     dh_link -p libopenmpi-dev $(LIBDIR)/libpmix.so.2 $(LIBDIR)/libpmix.so ; \
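The "Do not install .la pmix files" entry corresponds to replacing the whole pmix directory with the quoted pattern ".../pmix/*.so" in the hunk above. As a rough illustration of what such a pattern selects, here is a sketch that matches the same "*.so" glob in a scratch directory. The component file names are invented for the demonstration, and the shell does the matching here, whereas in debian/rules the quoted pattern is passed through to dh_install unexpanded.

```shell
#!/bin/sh
set -eu

# Scratch directory standing in for the pmix component directory.
dir=$(mktemp -d)

# Hypothetical component names; each plugin gets a libtool .la file next to it.
for name in mca_example_one mca_example_two; do
    : > "$dir/$name.so"
    : > "$dir/$name.la"
done

# Collect what the "*.so" pattern matches; the .la files are not matched.
selected=""
for f in "$dir"/*.so; do
    selected="$selected $(basename "$f")"
done
echo "selected:$selected"

# Remove the scratch directory again.
rm -rf "$dir"
```

Only the two .so names end up in the list, so the .la files would stay out of the package.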
Bug#1070300: pmix_psquash_base_select failed during MPI_INIT on 32bit architectures
Hello,

This has been posing migration issues for quite some time, I have
uploaded the attached fix to delayed/0.

Samuel

diff -Nru openmpi-4.1.6/debian/changelog openmpi-4.1.6/debian/changelog
--- openmpi-4.1.6/debian/changelog 2024-04-27 18:37:26.0 +0200
+++ openmpi-4.1.6/debian/changelog 2024-05-03 18:53:52.0 +0200
@@ -1,3 +1,10 @@
+openmpi (4.1.6-13.1) unstable; urgency=medium
+
+  * Non-maintainer Upload
+  * Also install pmix components on 32-bit systems. Closes: #1070300
+
+ -- Samuel Thibault  Fri, 03 May 2024 18:53:52 +0200
+
 openmpi (4.1.6-13) unstable; urgency=medium
 
   * Move pmix help files to libopenmpi3t64, not openmpi3-common
diff -Nru openmpi-4.1.6/debian/rules openmpi-4.1.6/debian/rules
--- openmpi-4.1.6/debian/rules 2024-04-27 18:37:26.0 +0200
+++ openmpi-4.1.6/debian/rules 2024-05-03 18:49:28.0 +0200
@@ -287,7 +287,8 @@
   if $(DO_OWN_PMIX); then \
     echo "PMIX: install " ; \
     dh_install -p libopenmpi3t64 $(LIBDIR)/openmpi/lib/libpmix.so.2.2.35 $(LIBDIR) ; \
-    dh_install -p libopenmpi3t64 /usr/share/pmix/* ; \
+    dh_install -p libopenmpi3t64 /usr/share/pmix ; \
+    dh_install -p libopenmpi3t64 /usr/lib/$(DEB_HOST_MULTIARCH)/openmpi/lib/pmix ; \
     dh_link -p libopenmpi3t64 $(LIBDIR)/libpmix.so.2.2.35 $(LIBDIR)/libpmix.so.2 ; \
     dh_link -p libopenmpi-dev $(LIBDIR)/libpmix.so.2 $(LIBDIR)/openmpi/lib/libpmix.so ; \
     dh_link -p libopenmpi-dev $(LIBDIR)/libpmix.so.2 $(LIBDIR)/libpmix.so ; \
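Since the fix above adds the pmix component directory to the package, one way to check an installed system is to look for psquash plugins in that directory. The helper below is a sketch, not part of the upload: the function name has_psquash is made up, the mca_psquash_*.so pattern assumes the usual mca_<framework>_<name>.so plugin naming, and the demo runs on scratch directories with an invented component name rather than on a real installation.

```shell
#!/bin/sh
set -eu

# has_psquash DIR: succeed if DIR holds at least one psquash plugin.
# The pattern assumes the mca_<framework>_<name>.so naming convention.
has_psquash() {
    for f in "$1"/mca_psquash_*.so; do
        # An unmatched glob stays literal, so check that the file exists.
        if [ -e "$f" ]; then
            return 0
        fi
    done
    return 1
}

# Demo on scratch directories. On a real system DIR would be the
# /usr/lib/<multiarch>/openmpi/lib/pmix directory named in the patch.
empty=$(mktemp -d)
full=$(mktemp -d)
: > "$full/mca_psquash_demo.so"   # hypothetical component name

if has_psquash "$empty"; then echo "empty: found"; else echo "empty: missing"; fi
if has_psquash "$full"; then echo "full: found"; else echo "full: missing"; fi
```

A "missing" result for the real directory would be consistent with the pmix_psquash_base_select NOT-FOUND error reported in this bug.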
Bug#1070300: pmix_psquash_base_select failed during MPI_INIT on 32bit architectures
Hi,

the problem already appears in OpenMPI's own autopkgtests, see [1]

Best,
Markus

[1] https://ci.debian.net/packages/o/openmpi/unstable/i386/46207866/
Bug#1070300: pmix_psquash_base_select failed during MPI_INIT on 32bit architectures
Source: openmpi
Version: 4.1.6-13
Severity: serious
Justification: unknown
Control: affects -1 src:dune-grid

Dear Maintainer,

I just uploaded a new version of package dune-grid and noticed that
none of our parallel tests start successfully on 32bit architectures.

2/66 Test #2: scsgmappertest ***Failed 0.15 sec
--
It looks like pmix_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during pmix_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
PMIX developer):

  pmix_psquash_base_select failed
  --> Returned value -46 instead of PMIX_SUCCESS
--
[arm-ubc-05:12560] PMIX ERROR: NOT-FOUND in file ../../../../../../../../opal/mca/pmix/pmix3x/pmix/src/server/pmix_server.c at line 237
[arm-ubc-05:12559] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ../../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 716
[arm-ubc-05:12559] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ../../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 172
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_init failed
  --> Returned value Unable to start a daemon on the local node (-127) instead of ORTE_SUCCESS
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[arm-ubc-05:12559] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

See [1] for a complete build where the tests using mpirun fail in this
way. This happens on these architectures: armel, armhf, i386, hppa

Best,
Markus

[1] https://buildd.debian.org/status/fetch.php?pkg=dune-grid=armel=2.9.0-4=1714724856=0

-- System Information:
Debian Release: 12.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.1.0-20-amd64 (SMP w/64 CPU threads; PREEMPT)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages openmpi-bin depends on:
ii  libc6                        2.36-9+deb12u6
ii  libevent-core-2.1-7          2.1.12-stable-8
ii  libopenmpi3                  4.1.4-3+b1
ii  openmpi-common               4.1.4-3
ii  openssh-client [ssh-client]  1:9.2p1-2+deb12u2

openmpi-bin recommends no packages.

Versions of packages openmpi-bin suggests:
ii  gfortran [fortran-compiler]  4:12.2.0-3

-- no debconf information