Re: [OMPI users] [External] Re: mpi/pmix: ERROR: Error handler invoked: status = -25: No such file or directory (2)
I should give more background. In the Slurm error log for this job, another error, about a failed memcpy operation, was listed first, and that is what caused the job to fail. I suspect the errors below are the result of the other MPI ranks being killed at slightly different times, which is to be expected. I just want to make sure that this was the case, and that the error below wasn't a sign of another issue with the job.

Prentice

On 11/11/20 5:47 PM, Ralph Castain via users wrote:
> Looks like it is coming from the Slurm PMIx plugin, not OMPI.
>
> Artem - any ideas?
> Ralph
>
> On Nov 11, 2020, at 10:03 AM, Prentice Bisbal via users wrote:
>> One of my users recently reported a failed job that was using OpenMPI 4.0.4
>> compiled with PGI 20.4. There were two different errors reported. One was
>> reported once, and I think had nothing to do with OpenMPI or PMIx, and then
>> this error was repeated multiple times in the Slurm error output for the
>> job:
>>
>> pmixp_client_v2.c:210 [_errhandler] mpi/pmix: ERROR: Error handler invoked:
>> status = -25: No such file or directory (2)
>>
>> Anyone else see this before? Any idea what would cause this error? I did a
>> Google search but couldn't find any discussion of this error anywhere.
>>
>> --
>> Prentice

--
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory
http://www.pppl.gov
Re: [OMPI users] [External] Re: mpi/pmix: ERROR: Error handler invoked: status = -25: No such file or directory (2)
Yeah - this can be safely ignored. Basically, what's happening is an async cleanup of a tmp directory and the code is barking that it wasn't found (because it was already deleted).

> On Nov 12, 2020, at 8:16 AM, Prentice Bisbal via users wrote:
>
> I should give more background. In the Slurm error log for this job, another
> error, about a failed memcpy operation, was listed first, and that is what
> caused the job to fail. I suspect the errors below are the result of the
> other MPI ranks being killed at slightly different times, which is to be
> expected. I just want to make sure that this was the case, and that the error
> below wasn't a sign of another issue with the job.
>
> Prentice
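The situation described above can be illustrated with a small sketch. This is not the Slurm PMIx plugin's code, and it is not how the plugin handles the case internally; the function name `remove_dir_tolerant` and the scratch directory are hypothetical, chosen only for illustration. It shows that a second cleanup attempt on a directory that has already been removed fails with ENOENT (errno 2, "No such file or directory"), the same errno text that appears in the log line, and that such a failure can be treated as harmless.

```python
import errno
import os
import tempfile


def remove_dir_tolerant(path):
    """Remove an empty directory; return False if it was already gone."""
    try:
        os.rmdir(path)
        return True
    except FileNotFoundError as exc:
        # ENOENT is the errno behind "No such file or directory".
        assert exc.errno == errno.ENOENT
        return False


# Hypothetical stand-in for a per-job temporary directory.
scratch = tempfile.mkdtemp()

first = remove_dir_tolerant(scratch)   # one cleanup path removes the directory
second = remove_dir_tolerant(scratch)  # a later cleanup path finds nothing left

print(first, second)  # prints "True False"
```

In a real job the two removals would come from different processes shutting down at slightly different times, which is why the message can show up once per rank.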
Re: [OMPI users] [External] Re: mpi/pmix: ERROR: Error handler invoked: status = -25: No such file or directory (2)
That's what I suspected. Thanks for confirming.

Prentice

On 11/12/20 1:46 PM, Ralph Castain via users wrote:
> Yeah - this can be safely ignored. Basically, what's happening is an async
> cleanup of a tmp directory and the code is barking that it wasn't found
> (because it was already deleted).

--
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory
http://www.pppl.gov