Good morning,

We have a cluster with two kinds of InfiniBand cards: ConnectX-4 and ConnectX-6. openmpi-3.1.3 works fine, but when we started using the ConnectX-6 cards we moved to openmpi-4.0.3 (which supports ConnectX-6). Since then, programs that run in several stages, first a call to a sequential program and, inside it, a call to a parallel program (in our case the program is WRF, but we have others like it with the same problem), suddenly stop:
…..
    0 S 4556 87383 87361 0 80 0 - 126676 hrtime ? 00:05:25 real.exe
    0 S 4556 87384 87361 0 80 0 - 126677 hrtime ? 00:05:33 real.exe
    0 S 4556 87385 87361 0 80 0 - 126675 hrtime ? 00:05:28 real.exe
……

WCHAN=hrtime, and the processes appear to be running, but they are not actually making any progress.

We don't know whether this could be a problem with Slurm and this version of openmpi.

Any ideas?

________________________________________________
Angelines Alberto Morillas
Unidad de Arquitectura Informática
Office: 22.1.32
Tel.: +34 91 346 6119
Fax: +34 91 346 6537
skype: angelines.alberto
CIEMAT
Avenida Complutense, 40
28040 MADRID
________________________________________________