Re: [OMPI users] [External] Re: mpi/pmix: ERROR: Error handler invoked: status = -25: No such file or directory (2)

2020-11-12 Thread Prentice Bisbal via users
I should give more background. In the Slurm error log for this job, another 
error, about a failed memcpy operation, was listed first, and that is what 
caused the job to fail. I suspect the errors below are the result of the other 
MPI ranks being killed at slightly different times rather than all at once, 
which is to be expected. I just want to make sure that this was the case, and 
that the error below isn't a sign of another issue with the job.


Prentice

On 11/11/20 5:47 PM, Ralph Castain via users wrote:

Looks like it is coming from the Slurm PMIx plugin, not OMPI.

Artem - any ideas?
Ralph



On Nov 11, 2020, at 10:03 AM, Prentice Bisbal via users 
 wrote:

One of my users recently reported a failed job that was using OpenMPI 4.0.4 
compiled with PGI 20.4. There were two different errors reported. One was 
reported once, and I think it had nothing to do with OpenMPI or PMIx; the other 
was repeated multiple times in the Slurm error output for the job:

pmixp_client_v2.c:210 [_errhandler] mpi/pmix: ERROR: Error handler invoked: status = -25: No such file or directory (2)

Has anyone else seen this before? Any idea what would cause this error? I did a 
Google search but couldn't find any discussion of this error anywhere.

--
Prentice




--
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory
http://www.pppl.gov



Re: [OMPI users] [External] Re: mpi/pmix: ERROR: Error handler invoked: status = -25: No such file or directory (2)

2020-11-12 Thread Ralph Castain via users
Yeah - this can be safely ignored. Basically, what's happening is an async 
cleanup of a tmp directory, and the code is barking that the directory wasn't 
found (because it had already been deleted).
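To illustrate the kind of race described above, here is a minimal Python sketch, assuming a per-job tmp directory that two independent cleanup paths both try to delete. This is not the Slurm or PMIx code: the function names remove_dir_strict and remove_dir_tolerant are hypothetical, and the PMIx status value (-25) is not modeled. It only shows why the second deletion fails with errno 2 (ENOENT, "No such file or directory" on Linux), which matches the tail of the logged message, and how a cleanup routine could treat that case as harmless.

```python
import errno
import os
import tempfile


def remove_dir_strict(path: str) -> str:
    """Hypothetical naive cleanup: reports every failure, even "already gone"."""
    try:
        os.rmdir(path)
        return "ok"
    except OSError as exc:
        # Format like the tail of the log line: "<message> (<errno>)".
        return f"ERROR: {os.strerror(exc.errno)} ({exc.errno})"


def remove_dir_tolerant(path: str) -> str:
    """Hypothetical cleanup that treats a missing directory as already done."""
    try:
        os.rmdir(path)
        return "ok"
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            # Someone else already deleted it; nothing left to clean up.
            return "already removed"
        raise  # Any other failure is a real problem and is propagated.


if __name__ == "__main__":
    # Stand-in for a job's tmp directory that two cleanup paths both own.
    path = tempfile.mkdtemp(prefix="job-tmp-")
    print(remove_dir_strict(path))    # first cleanup succeeds
    print(remove_dir_strict(path))    # second cleanup "barks" (ENOENT)
    print(remove_dir_tolerant(path))  # tolerant version stays quiet
```

Checking for ENOENT specifically, rather than swallowing every OSError, keeps genuine failures such as permission errors visible while silencing the benign "already deleted" case.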






Re: [OMPI users] [External] Re: mpi/pmix: ERROR: Error handler invoked: status = -25: No such file or directory (2)

2020-11-12 Thread Prentice Bisbal via users

That's what I suspected. Thanks for confirming.

Prentice
