Paul,

Can you please double-check that pmix1xx is compiled with -D_REENTRANT? We ran into similar issues in the past, and they only occurred on Solaris.
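Here is a minimal sketch of why the flag can matter. It rests on one assumption: on Solaris, `<errno.h>` maps `errno` to a per-thread location only when `_REENTRANT` (which the Studio compilers' `-mt` option defines) or a comparable macro is set, and otherwise every thread shares one global `errno`. The helpers below are hypothetical and not part of pmix1xx. `built_with_reentrant()` reports whether the flag was seen at compile time. `errno_is_thread_local()` checks whether a write to `errno` in a worker thread leaks into the caller.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper (not part of pmix1xx): returns 1 if this translation
   unit was compiled with _REENTRANT defined. Assumption: on Solaris,
   <errno.h> makes errno per-thread only when _REENTRANT (or _TS_ERRNO, or a
   recent enough _POSIX_C_SOURCE) is defined. */
int built_with_reentrant(void)
{
#ifdef _REENTRANT
    return 1;
#else
    return 0;
#endif
}

/* Worker thread: overwrite errno with a recognizable value. */
static void *clobber_errno(void *arg)
{
    (void)arg;
    errno = EAGAIN;
    return NULL;
}

/* Hypothetical helper: returns 1 if the worker's write to errno is invisible
   to the calling thread (errno is thread-local), 0 if it leaked through a
   shared global errno, or -1 if the thread could not be created/joined. */
int errno_is_thread_local(void)
{
    pthread_t tid;

    errno = EINTR;                 /* marker value in the calling thread */
    if (pthread_create(&tid, NULL, clobber_errno, NULL) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return errno != EAGAIN;        /* still EINTR => per-thread errno */
}
```

To see the difference, call `errno_is_thread_local()` from a small `main()` on the Solaris host, compiled once with `-mt` and once without. You would expect 1 in the first case and 0 in the second. On Linux with glibc, `errno` is always per-thread, so it returns 1 either way.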
Cheers,

Gilles

On Sunday, September 20, 2015, Paul Hargrove <phhargr...@lbl.gov> wrote:

> Ralph,
>
> The output from the requested run is attached.
>
> -Paul
>
> On Sat, Sep 19, 2015 at 9:46 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Ah, okay - that makes more sense. I’ll have to let Brice see if he can
>> figure out how to silence the hwloc error message, as I can’t find where it
>> came from. The other errors are real and are the reason why the job was
>> terminated.
>>
>> The problem is that we are trying to establish communication between
>> the app and the daemon via a Unix domain socket, and we failed to do so. The
>> error tells me that we were able to create and connect to the socket, but
>> failed when the daemon tried to do a blocking send to the app.
>>
>> Can you rerun it with -mca pmix_base_verbose 10? It will tell us the
>> value of the error number that was returned.
>>
>> Thanks,
>> Ralph
>>
>>
>> On Sep 19, 2015, at 9:37 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>> Ralph,
>>
>> No, it did not run.
>> The complete output (which I really should have included in the first
>> place) is below.
>>
>> -Paul
>>
>> $ mpirun -mca btl sm,self -np 2 examples/ring_c
>> Error opening /devices/pci@0,0:reg: Permission denied
>> [pcp-d-3:26054] PMIX ERROR: ERROR in file /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c at line 181
>> [pcp-d-3:26053] PMIX ERROR: UNREACHABLE in file /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c at line 463
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort.
>> There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> ompi_mpi_init: ompi_rte_init failed
>> --> Returned "(null)" (-43) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** on a NULL communicator
>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> *** and potentially your MPI job)
>> [pcp-d-3:26054] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
>> -------------------------------------------------------
>> Primary job terminated normally, but 1 process returned
>> a non-zero exit code.. Per user-direction, the job has been aborted.
>> -------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun detected that one or more processes exited with non-zero status, thus causing
>> the job to be terminated. The first process to do so was:
>>
>> Process name: [[11371,1],0]
>> Exit code: 1
>> --------------------------------------------------------------------------
>>
>> On Sat, Sep 19, 2015 at 8:50 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Paul, can you clarify something for me? The error in this case indicates
>>> that the client wasn’t able to reach the daemon - this should have resulted
>>> in termination of the job. Did the job actually run?
>>>
>>>
>>> On Sep 18, 2015, at 2:50 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>> I'm on travel right now, but it should be an easy fix when I return.
>>> Sorry for the annoyance.
>>>
>>>
>>> On Thu, Sep 17, 2015 at 11:13 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>
>>>> Any suggestion how I (as a non-root user) can avoid seeing this hwloc
>>>> error message on every run?
>>>>
>>>> -Paul
>>>>
>>>> On Thu, Sep 17, 2015 at 11:00 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>>>
>>>>> Paul,
>>>>>
>>>>> IIRC, the "Permission denied" is coming from hwloc, which cannot collect
>>>>> all the info it would like.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> On 9/18/2015 2:34 PM, Paul Hargrove wrote:
>>>>>
>>>>> Tried tonight's master tarball on Solaris 11.2 on x86-64 with the
>>>>> Studio Compilers (default ILP32 output) and saw the following result:
>>>>>
>>>>> $ mpirun -mca btl sm,self -np 2 examples/ring_c
>>>>> Error opening /devices/pci@0,0:reg: Permission denied
>>>>> [pcp-d-4:00492] PMIX ERROR: ERROR in file /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c at line 181
>>>>> [pcp-d-4:00491] PMIX ERROR: UNREACHABLE in file /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c at line 463
>>>>>
>>>>> I don't know if the "Permission denied" error is related to the
>>>>> subsequent PMIX errors, but any message that says "UNREACHABLE" is clearly
>>>>> worth reporting.
>>>>>
>>>>> -Paul
>>>>>
>>>>> --
>>>>> Paul H. Hargrove phhargr...@lbl.gov
>>>>> Computer Languages & Systems Software (CLaSS) Group
>>>>> Computer Science Department Tel: +1-510-495-2352
>>>>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post:
>>>>> http://www.open-mpi.org/community/lists/devel/2015/09/18074.php
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post:
>>>>> http://www.open-mpi.org/community/lists/devel/2015/09/18075.php
>>>>>
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/devel/2015/09/18076.php
>>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2015/09/18078.php
>>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2015/09/18080.php
>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2015/09/18081.php
>>
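Ralph's description in the thread is that the socket was created and connected, and then the daemon's blocking send failed. That step can be illustrated with a hypothetical sketch. It is not the PMIx implementation, and `send_all_blocking()` is an assumed name. The function writes a whole buffer to a connected Unix domain socket and retries on `EINTR`. When a non-blocking socket reports `EAGAIN`/`EWOULDBLOCK`, it waits with `poll()` before retrying. On real failures it returns `-errno`, so the caller can log the actual error number. A retry loop of this shape is also where a non-thread-local `errno` could plausibly turn a harmless `EAGAIN` into a fatal error.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not the PMIx implementation): write all `len` bytes
   of `buf` to the connected socket `sd`. Returns 0 on success, or -errno on
   failure so the caller can report the real error number instead of a
   generic "unreachable" status. SIGPIPE handling is intentionally omitted. */
int send_all_blocking(int sd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = send(sd, p + sent, len - sent, 0);
        if (n >= 0) {
            sent += (size_t)n;          /* partial writes are normal */
            continue;
        }
        if (errno == EINTR)
            continue;                   /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Non-blocking socket with a full send buffer: wait until it is
               writable, then retry. This branch relies on errno being
               per-thread; a stale shared value would skip it. */
            struct pollfd pfd = { .fd = sd, .events = POLLOUT, .revents = 0 };
            if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
                return -errno;
            continue;
        }
        return -errno;                  /* genuine failure, e.g. EPIPE, EBADF */
    }
    return 0;
}
```

As a quick check, connect two ends with `socketpair(AF_UNIX, SOCK_STREAM, 0, sv)`, send `"hello"` through `send_all_blocking(sv[0], "hello", 5)`, and read it back from `sv[1]`. Passing an invalid descriptor such as -1 yields `-EBADF`.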