I was able to create the fix - it is in OMPI master. I have provided a patch 
for OMPI v3.1.5 here:

https://github.com/open-mpi/ompi/pull/7276
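
If you want to try it before the next Cygwin package update, something like 
the following should apply it to a 3.1.5 source tree (a rough sketch, 
untested on Cygwin; it relies on GitHub serving the raw patch when ".patch" 
is appended to the PR URL):

$ cd openmpi-3.1.5
$ curl -L https://github.com/open-mpi/ompi/pull/7276.patch | patch -p1
$ ./configure --prefix=... && make && make install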

Ralph


> On Jan 3, 2020, at 6:04 PM, Ralph Castain via devel 
> <devel@lists.open-mpi.org> wrote:
> 
> I'm afraid the fix uncovered an issue in the ds21 component that will require 
> Mellanox to address - I'm unsure of the timetable for that to happen.
> 
> 
>> On Jan 3, 2020, at 6:28 AM, Ralph Castain via devel 
>> <devel@lists.open-mpi.org> wrote:
>> 
>> I committed something upstream in PMIx master and v3.1 that probably 
>> resolves this - another user reported it over there and provided a patch. I 
>> can probably backport it to v2.x and give you a patch for OMPI v3.1.
>> 
>> 
>>> On Jan 3, 2020, at 3:25 AM, Jeff Squyres (jsquyres) via devel 
>>> <devel@lists.open-mpi.org> wrote:
>>> 
>>> Is there a configure test we can add to make this kind of behavior the 
>>> default?
>>> 
>>> 
>>>> On Jan 1, 2020, at 11:50 PM, Marco Atzeri via devel 
>>>> <devel@lists.open-mpi.org> wrote:
>>>> 
>>>> thanks Ralph
>>>> 
>>>> gds = ^ds21
>>>> works as expected
>>>> 
>>>> On 31.12.2019 at 19:27, Ralph Castain via devel wrote:
>>>>> PMIx likely defaults to the ds12 component - which will work fine but is a 
>>>>> tad slower than ds21. It likely has something to do with the way Cygwin 
>>>>> handles memory locks. You can avoid the error message by simply adding 
>>>>> "gds = ^ds21" to your default MCA param file (the PMIx one - it should be 
>>>>> named pmix-mca-params.conf).
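>>>>> For example, a minimal sketch (the exact file location is an assumption 
>>>>> and depends on the install prefix; setting the PMIX_MCA_gds environment 
>>>>> variable for a single run should also work):
>>>>> 
>>>>> # in <prefix>/etc/pmix-mca-params.conf
>>>>> gds = ^ds21
>>>>> 
>>>>> # or just for one run:
>>>>> $ PMIX_MCA_gds='^ds21' mpirun -n 4 ./hello_c.exe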
>>>>> Artem - any advice here?
>>>>>> On Dec 25, 2019, at 9:56 AM, Marco Atzeri via devel 
>>>>>> <devel@lists.open-mpi.org> wrote:
>>>>>> 
>>>>>> I have no multi-node setup around for testing.
>>>>>> 
>>>>>> I will need to set one up for testing after the holidays.
>>>>>> 
>>>>>>> On 24.12.2019 at 23:27, Jeff Squyres (jsquyres) wrote:
>>>>>>> That actually looks like a legit error -- it's failing to initialize a 
>>>>>>> shared mutex.
>>>>>>> I'm not sure what the consequence of this failure is, though, since the 
>>>>>>> job seemed to run OK.
>>>>>>> Are you able to run multi-node jobs ok?
>>>>>>>> On Dec 22, 2019, at 1:20 AM, Marco Atzeri via devel 
>>>>>>>> <devel@lists.open-mpi.org> wrote:
>>>>>>>> 
>>>>>>>> Hi Developers,
>>>>>>>> 
>>>>>>>> Cygwin 64-bit, openmpi-3.1.5-1
>>>>>>>> While testing the Cygwin package before releasing it,
>>>>>>>> I see spurious error messages that I have never seen before and that do
>>>>>>>> not seem to indicate an actual error:
>>>>>>>> 
>>>>>>>> $ mpirun -n 4 ./hello_c.exe
>>>>>>>> [LAPTOP-82F08ILC:02395] PMIX ERROR: INIT in file 
>>>>>>>> /cygdrive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.5-1.x86_64/src/openmpi-3.1.5/opal/mca/pmix/pmix2x/pmix/src/mca/gds/ds21/gds_ds21_lock_pthread.c
>>>>>>>>  at line 188
>>>>>>>> [LAPTOP-82F08ILC:02395] PMIX ERROR: SUCCESS in file 
>>>>>>>> /cygdrive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.5-1.x86_64/src/openmpi-3.1.5/opal/mca/pmix/pmix2x/pmix/src/mca/common/dstore/dstore_base.c
>>>>>>>>  at line 2432
>>>>>>>> Hello, world, I am 0 of 4, (Open MPI v3.1.5, package: Open MPI 
>>>>>>>> Marco@LAPTOP-82F08ILC Distribution, ident: 3.1.5, repo rev: v3.1.5, 
>>>>>>>> Nov 15, 2019, 116)
>>>>>>>> Hello, world, I am 1 of 4, (Open MPI v3.1.5, package: Open MPI 
>>>>>>>> Marco@LAPTOP-82F08ILC Distribution, ident: 3.1.5, repo rev: v3.1.5, 
>>>>>>>> Nov 15, 2019, 116)
>>>>>>>> Hello, world, I am 2 of 4, (Open MPI v3.1.5, package: Open MPI 
>>>>>>>> Marco@LAPTOP-82F08ILC Distribution, ident: 3.1.5, repo rev: v3.1.5, 
>>>>>>>> Nov 15, 2019, 116)
>>>>>>>> Hello, world, I am 3 of 4, (Open MPI v3.1.5, package: Open MPI 
>>>>>>>> Marco@LAPTOP-82F08ILC Distribution, ident: 3.1.5, repo rev: v3.1.5, 
>>>>>>>> Nov 15, 2019, 116)
>>>>>>>> [LAPTOP-82F08ILC:02395] [[20101,0],0] unable to open debugger attach 
>>>>>>>> fifo
>>>>>>>> 
>>>>>>>> Is there a known workaround?
>>>>>>>> I have not found anything in the issue list.
>>>>>>>> 
>>>>>>>> Regards
>>>>>>>> Marco
>>> 
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> 
>> 
>> 
> 
> 

