Paul - can you please confirm that you gave mpirun a level of 10 for the pmix_base_verbose param? This output isn’t what I would have expected from that level - it looks more like the verbosity was set to 5, and so the error number isn’t printed.
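In case it is useful, here is a minimal sketch of a rerun that echoes the exact command line into the log before launching, so the requested verbosity is recorded alongside the output. The mpirun arguments are the ones already used in this thread; the rerun_cmd helper, the VERBOSE variable, and the pmix_verbose.log file name are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch only: the mpirun arguments are copied from earlier in this thread.
# rerun_cmd, VERBOSE and pmix_verbose.log are hypothetical names.
VERBOSE=${VERBOSE:-10}

# Print the full command line for a given pmix_base_verbose level.
rerun_cmd() {
    echo "mpirun -mca pmix_base_verbose $1 -mca btl sm,self -np 2 examples/ring_c"
}

# Record the exact command in the output before running anything.
echo "+ $(rerun_cmd "$VERBOSE")"

# Only launch if mpirun is actually on PATH; capture stdout and stderr together.
if command -v mpirun >/dev/null 2>&1; then
    $(rerun_cmd "$VERBOSE") 2>&1 | tee pmix_verbose.log
fi
```

Setting VERBOSE=5 in the environment before running the script would give the lower level for comparison.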
Thanks
Ralph

> On Sep 20, 2015, at 3:42 AM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
>
> Paul,
>
> I do not remember it like that ...
>
> at that time, the issue in ompi was that the global errno was used instead of
> the per thread errno.
> though the man page says -mt should be used for multithreaded apps, you
> tried -D_REENTRANT on all your platforms, and it was enough to get the
> expected result.
>
> I just wanted to check pmix1xx (sub)configure did correctly pass the
> -D_REENTRANT flag, and it does. so this is very likely a new and unrelated
> error
>
> Cheers,
>
> Gilles
>
> On Sunday, September 20, 2015, Paul Hargrove <phhargr...@lbl.gov> wrote:
> Gilles,
>
> Yes every $CC invocation in opal/mca/pmix/pmix1xx includes "-D_REENTRANT".
> However, they don't include "-mt".
> I believe we concluded (when we had problems previously) that "-mt" was the
> proper flag (at compile and link) for multi-threaded with the Studio
> compilers.
>
> -Paul
>
> On Sat, Sep 19, 2015 at 11:29 PM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
> Paul,
>
> Can you please double check pmix1xx is compiled with -D_REENTRANT ?
> We ran into similar issues in the past, and they only occurred with Solaris
>
> Cheers,
>
> Gilles
>
>
> On Sunday, September 20, 2015, Paul Hargrove <phhargr...@lbl.gov> wrote:
> Ralph,
> The output from the requested run is attached.
> -Paul
>
> On Sat, Sep 19, 2015 at 9:46 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Ah, okay - that makes more sense. I’ll have to let Brice see if he can figure
> out how to silence the hwloc error message as I can’t find where it came
> from. The other errors are real and are the reason why the job was terminated.
>
> The problem is that we are trying to establish a communication between the
> app and the daemon via unix domain socket, and we failed to do so.
> The error tells me that we were able to create and connect to the socket, but
> failed when the daemon tried to do a blocking send to the app.
>
> Can you rerun it with -mca pmix_base_verbose 10? It will tell us the value of
> the error number that was returned
>
> Thanks
> Ralph
>
>
>> On Sep 19, 2015, at 9:37 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>> Ralph,
>>
>> No it did not run.
>> The complete output (which I really should have included in the first place)
>> is below.
>>
>> -Paul
>>
>> $ mpirun -mca btl sm,self -np 2 examples/ring_c'
>> Error opening /devices/pci@0,0:reg: Permission denied
>> [pcp-d-3:26054] PMIX ERROR: ERROR in file
>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c
>> at line 181
>> [pcp-d-3:26053] PMIX ERROR: UNREACHABLE in file
>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c
>> at line 463
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems.
>> This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> ompi_mpi_init: ompi_rte_init failed
>> --> Returned "(null)" (-43) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** on a NULL communicator
>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> *** and potentially your MPI job)
>> [pcp-d-3:26054] Local abort before MPI_INIT completed completed
>> successfully, but am not able to aggregate error messages, and not able to
>> guarantee that all other processes were killed!
>> -------------------------------------------------------
>> Primary job terminated normally, but 1 process returned
>> a non-zero exit code.. Per user-direction, the job has been aborted.
>> -------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun detected that one or more processes exited with non-zero status, thus
>> causing
>> the job to be terminated. The first process to do so was:
>>
>> Process name: [[11371,1],0]
>> Exit code: 1
>> --------------------------------------------------------------------------
>>
>> On Sat, Sep 19, 2015 at 8:50 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> Paul, can you clarify something for me? The error in this case indicates
>> that the client wasn’t able to reach the daemon - this should have resulted
>> in termination of the job. Did the job actually run?
>>
>>
>>> On Sep 18, 2015, at 2:50 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>> I'm on travel right now, but it should be an easy fix when I return. Sorry
>>> for the annoyance
>>>
>>>
>>> On Thu, Sep 17, 2015 at 11:13 PM, Paul Hargrove <phhargr...@lbl.gov>
>>> wrote:
>>> Any suggestion how I (as a non-root user) can avoid seeing this hwloc error
>>> message on every run?
>>>
>>> -Paul
>>>
>>> On Thu, Sep 17, 2015 at 11:00 PM, Gilles Gouaillardet <gil...@rist.or.jp>
>>> wrote:
>>> Paul,
>>>
>>> IIRC, the "Permission denied" is coming from hwloc that cannot collect all
>>> the info it would like.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 9/18/2015 2:34 PM, Paul Hargrove wrote:
>>>> Tried tonight's master tarball on Solaris 11.2 on x86-64 with the Studio
>>>> Compilers (default ILP32 output) and saw the following result
>>>>
>>>> $ mpirun -mca btl sm,self -np 2 examples/ring_c'
>>>> Error opening /devices/pci@0,0:reg: Permission denied
>>>> [pcp-d-4:00492] PMIX ERROR: ERROR in file
>>>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c
>>>> at line 181
>>>> [pcp-d-4:00491] PMIX ERROR: UNREACHABLE in file
>>>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c
>>>> at line 463
>>>>
>>>> I don't know if the Permission denied error is related to the subsequent
>>>> PMIX errors, but any message that says "UNREACHABLE" is clearly worth
>>>> reporting.
>>>>
>>>> -Paul
>>>>
>>>> --
>>>> Paul H. Hargrove phhargr...@lbl.gov
>>>> Computer Languages & Systems Software (CLaSS) Group
>>>> Computer Science Department Tel: +1-510-495-2352
>>>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/devel/2015/09/18074.php