On 31/08/2018 11:04, Drew Parsons wrote:
On 2018-08-30 14:18, Alastair McKinstry wrote:
On 30/08/2018 09:39, Drew Parsons wrote:
If you want a break from the openmpi angst then go ahead and drop
mpich 3.3b3 into unstable. It won't make the overall MPI situation
any worse... :)
Drew
Ok, I've pushed 3.3b3 to unstable.
Great!
For me there are two concerns:
(1) The current setup (openmpi default) shakes out issues in openmpi3
that should be fixed. It would be good to get that done.
That's fair. If we're going to "drop" openmpi, it's a good policy to
leave it in as stable a state as possible.
At this stage it appears there is a remaining "hang" / threading issue
that's affecting 32-bit platforms
(see #907267). Once that's fixed, I'm favouring no further updates before
Buster, i.e. ship openmpi 3.1.2 with pmix 3.0.1.
(openmpi now has a dependency on libpmix, the Process Management
Interface for exascale, which handles the launching of processes (up to
millions, hierarchically).
The openmpi/pmix interface has been flaky, I suspect, and not well
tested on non-traditional HPC architectures (e.g. I suspect it's the
source of the 32-bit issue).)
mpich _can_ be built with pmix but I'm recommending not doing so for Buster.
(2) Moving to mpich as default is a transition and should be pushed
before the deadline - say, setting a target of 30 Sept?
This is probably a good point to confer with the Release Team, so I'm
cc:ing them.
Release Team: we have nearly completed the openmpi3 transition. But
there is a broader question of switching mpi-defaults to mpich instead
of openmpi. mpich is reported to be more stable than openmpi and is
recommended by several upstream authors of the HPC software
libraries. We have some consensus that switching to mpich is probably
a good idea; it's just a question of timing at this point.
Does an MPI / mpich transition overlap with other transitions planned
for Buster - say hwloc, hdf5?
hdf5 already builds against both openmpi and mpich, so it should not
be a particular problem. It has had more build failures on the minor
arches (with the new hdf5 version in experimental), but there's no
reason to blame mpich for that.
I don't know about hwloc, but the builds in experimental look clean.
Drew
--
Alastair McKinstry, <alast...@sceal.ie>, <mckins...@debian.org>,
https://diaspora.sceal.ie/u/amckinstry
Misentropy: doubting that the Universe is becoming more disordered.