I was able to create the fix - it is in OMPI master. I have provided a patch for OMPI v3.1.5 here:
https://github.com/open-mpi/ompi/pull/7276

Ralph

> On Jan 3, 2020, at 6:04 PM, Ralph Castain via devel
> <devel@lists.open-mpi.org> wrote:
>
> I'm afraid the fix uncovered an issue in the ds21 component that will require
> Mellanox to address it - unsure of the timetable for that to happen.
>
>
>> On Jan 3, 2020, at 6:28 AM, Ralph Castain via devel
>> <devel@lists.open-mpi.org> wrote:
>>
>> I committed something upstream in PMIx master and v3.1 that probably
>> resolves this - another user reported it over there and provided a patch. I
>> can probably backport it to v2.x and give you a patch for OMPI v3.1.
>>
>>
>>> On Jan 3, 2020, at 3:25 AM, Jeff Squyres (jsquyres) via devel
>>> <devel@lists.open-mpi.org> wrote:
>>>
>>> Is there a configure test we can add to make this kind of behavior be the
>>> default?
>>>
>>>
>>>> On Jan 1, 2020, at 11:50 PM, Marco Atzeri via devel
>>>> <devel@lists.open-mpi.org> wrote:
>>>>
>>>> Thanks Ralph,
>>>>
>>>> gds = ^ds21
>>>> works as expected.
>>>>
>>>> On 31.12.2019 at 19:27, Ralph Castain via devel wrote:
>>>>> PMIx likely defaults to the ds12 component - which will work fine but a
>>>>> tad slower than ds21. It is likely something to do with the way Cygwin
>>>>> handles memory locks. You can avoid the error message by simply adding
>>>>> "gds = ^ds21" to your default MCA param file (the PMIx one - it should be
>>>>> named pmix-mca-params.conf).
>>>>> Artem - any advice here?
>>>>>> On Dec 25, 2019, at 9:56 AM, Marco Atzeri via devel
>>>>>> <devel@lists.open-mpi.org> wrote:
>>>>>>
>>>>>> I have no multinode setup around for testing.
>>>>>> I will need to set one up after the holidays.
>>>>>>
>>>>>> On 24.12.2019 at 23:27, Jeff Squyres (jsquyres) wrote:
>>>>>>> That actually looks like a legit error -- it's failing to initialize a
>>>>>>> shared mutex.
>>>>>>> I'm not sure what the consequence of this failure is, though, since the
>>>>>>> job seemed to run OK.
>>>>>>> Are you able to run multi-node jobs OK?
>>>>>>>> On Dec 22, 2019, at 1:20 AM, Marco Atzeri via devel
>>>>>>>> <devel@lists.open-mpi.org> wrote:
>>>>>>>>
>>>>>>>> Hi Developers,
>>>>>>>>
>>>>>>>> Cygwin 64-bit, openmpi-3.1.5-1.
>>>>>>>> While testing the Cygwin package before releasing it, I see
>>>>>>>> never-seen-before spurious error messages that do not seem to be
>>>>>>>> about an error at all:
>>>>>>>>
>>>>>>>> $ mpirun -n 4 ./hello_c.exe
>>>>>>>> [LAPTOP-82F08ILC:02395] PMIX ERROR: INIT in file
>>>>>>>> /cygdrive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.5-1.x86_64/src/openmpi-3.1.5/opal/mca/pmix/pmix2x/pmix/src/mca/gds/ds21/gds_ds21_lock_pthread.c
>>>>>>>> at line 188
>>>>>>>> [LAPTOP-82F08ILC:02395] PMIX ERROR: SUCCESS in file
>>>>>>>> /cygdrive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.5-1.x86_64/src/openmpi-3.1.5/opal/mca/pmix/pmix2x/pmix/src/mca/common/dstore/dstore_base.c
>>>>>>>> at line 2432
>>>>>>>> Hello, world, I am 0 of 4, (Open MPI v3.1.5, package: Open MPI
>>>>>>>> Marco@LAPTOP-82F08ILC Distribution, ident: 3.1.5, repo rev: v3.1.5,
>>>>>>>> Nov 15, 2019, 116)
>>>>>>>> Hello, world, I am 1 of 4, (Open MPI v3.1.5, package: Open MPI
>>>>>>>> Marco@LAPTOP-82F08ILC Distribution, ident: 3.1.5, repo rev: v3.1.5,
>>>>>>>> Nov 15, 2019, 116)
>>>>>>>> Hello, world, I am 2 of 4, (Open MPI v3.1.5, package: Open MPI
>>>>>>>> Marco@LAPTOP-82F08ILC Distribution, ident: 3.1.5, repo rev: v3.1.5,
>>>>>>>> Nov 15, 2019, 116)
>>>>>>>> Hello, world, I am 3 of 4, (Open MPI v3.1.5, package: Open MPI
>>>>>>>> Marco@LAPTOP-82F08ILC Distribution, ident: 3.1.5, repo rev: v3.1.5,
>>>>>>>> Nov 15, 2019, 116)
>>>>>>>> [LAPTOP-82F08ILC:02395] [[20101,0],0] unable to open debugger attach
>>>>>>>> fifo
>>>>>>>>
>>>>>>>> Is there a known workaround?
>>>>>>>> I have not found anything in the issue list.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Marco
>>>
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
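[Editor's note: the workaround Ralph describes in the thread amounts to a one-line MCA parameter file. The sketch below writes it to a scratch directory purely for illustration; the real file location depends on how Open MPI/PMIx was installed (typically under the install prefix's etc/ directory), so the path here is an assumption, not something stated in the thread.]

```shell
# Sketch of the "gds = ^ds21" workaround from the thread. A scratch
# directory stands in for the real config location (path is an
# assumption; on a normal install it would live under <prefix>/etc).
CONF_DIR="$(mktemp -d)"

# The leading ^ excludes the ds21 component, so PMIx falls back to ds12.
echo "gds = ^ds21" >> "${CONF_DIR}/pmix-mca-params.conf"

# Show the parameter that was written:
cat "${CONF_DIR}/pmix-mca-params.conf"

# The same exclusion can likely be set per-run via the environment, e.g.:
#   PMIX_MCA_gds=^ds21 mpirun -n 4 ./hello_c.exe
```

As the thread notes, ds12 is a little slower than ds21, so this trades some performance for suppressing the spurious error on platforms (like Cygwin) where ds21's pthread-based shared-memory locks fail to initialize.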