Yes, it is definitely at 10. Another attempt is attached.

-Paul
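Ralph's question below turns on the fact that this kind of component only prints certain details above a given verbosity level, so the error number can show up in a level-10 run but not a level-5 run. The fragment below is a minimal, self-contained sketch of that pattern; the function name, threshold, and message text are illustrative assumptions, not the actual PMIx source.

    #include <stdio.h>
    #include <string.h>
    #include <errno.h>

    /* Illustrative only: the real PMIx verbosity plumbing differs. */
    static int pmix_base_verbose = 5;   /* e.g. taken from "-mca pmix_base_verbose N" */

    static void report_unreachable(int err, const char *file, int line)
    {
        /* The headline error is printed unconditionally... */
        fprintf(stderr, "PMIX ERROR: UNREACHABLE in file %s at line %d\n", file, line);

        /* ...but the underlying error number only appears at high verbosity,
         * which is why a level-5 run can look like a level-10 run minus the detail. */
        if (pmix_base_verbose >= 10) {
            fprintf(stderr, "  underlying error: %d (%s)\n", err, strerror(err));
        }
    }

    int main(void)
    {
        report_unreachable(EPIPE, "pmix_server_listener.c", 463);  /* hypothetical failure */
        return 0;
    }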
On Sun, Sep 20, 2015 at 8:19 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Paul - can you please confirm that you gave mpirun a level of 10 for the
> pmix_base_verbose param? This output isn't what I would have expected from
> that level - it looks more like the verbosity was set to 5, and so the
> error number isn't printed.
>
> Thanks
> Ralph
>
> On Sep 20, 2015, at 3:42 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>
> Paul,
>
> I do not remember it like that ...
>
> At that time, the issue in ompi was that the global errno was used instead
> of the per-thread errno. Though the man pages say -mt should be used for
> multithreaded apps, you tried -D_REENTRANT on all your platforms, and it
> was enough to get the expected result.
>
> I just wanted to check that the pmix1xx (sub)configure correctly passed
> the -D_REENTRANT flag, and it does. So this is very likely a new and
> unrelated error.
>
> Cheers,
>
> Gilles
>
> On Sunday, September 20, 2015, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
>> Gilles,
>>
>> Yes, every $CC invocation in opal/mca/pmix/pmix1xx includes "-D_REENTRANT".
>> However, they don't include "-mt".
>> I believe we concluded (when we had problems previously) that "-mt" was
>> the proper flag (at compile and link) for multi-threaded builds with the
>> Studio compilers.
>>
>> -Paul
>>
>> On Sat, Sep 19, 2015 at 11:29 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>>
>>> Paul,
>>>
>>> Can you please double-check that pmix1xx is compiled with -D_REENTRANT?
>>> We ran into similar issues in the past, and they only occurred on Solaris.
>>>
>>> Cheers,
>>>
>>> Gilles
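To recap what the earlier Solaris problem looked like: without _REENTRANT (or the Studio compilers' -mt option), errno can resolve to a single global int rather than a per-thread location, so one thread may observe an error code set by another. The sketch below is only an illustration of that distinction, not Open MPI or PMIx code; the file paths are chosen just to provoke two different errors, and whether any race is visible depends on timing and the C library. Build with cc -D_REENTRANT ... -lpthread (or -mt) to exercise the per-thread case.

    #include <stdio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <pthread.h>
    #include <unistd.h>

    static void *worker(void *arg)
    {
        const char *path = arg;
        errno = 0;
        if (open(path, O_RDONLY) < 0) {
            /* With per-thread errno each worker reports the error from its own
             * open(); with a single global errno the two threads can race and
             * report each other's (or a stale) value. */
            printf("open(%s) failed: errno=%d\n", path, errno);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, "/no/such/file");        /* expect ENOENT */
        pthread_create(&t2, NULL, worker, "/devices/pci@0,0:reg"); /* may hit EACCES */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }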
>>> On Sunday, September 20, 2015, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>
>>>> Ralph,
>>>> The output from the requested run is attached.
>>>> -Paul
>>>>
>>>> On Sat, Sep 19, 2015 at 9:46 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>>> Ah, okay - that makes more sense. I'll have to let Brice see if he can
>>>>> figure out how to silence the hwloc error message, as I can't find where
>>>>> it came from. The other errors are real and are the reason why the job
>>>>> was terminated.
>>>>>
>>>>> The problem is that we are trying to establish communication between
>>>>> the app and the daemon via a Unix domain socket, and we failed to do so.
>>>>> The error tells me that we were able to create and connect to the socket,
>>>>> but failed when the daemon tried to do a blocking send to the app.
>>>>>
>>>>> Can you rerun it with -mca pmix_base_verbose 10? It will tell us the
>>>>> value of the error number that was returned.
>>>>>
>>>>> Thanks
>>>>> Ralph
>>>>>
>>>>> On Sep 19, 2015, at 9:37 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>>>
>>>>> Ralph,
>>>>>
>>>>> No, it did not run.
>>>>> The complete output (which I really should have included in the first
>>>>> place) is below.
>>>>>
>>>>> -Paul
>>>>>
>>>>> $ mpirun -mca btl sm,self -np 2 examples/ring_c'
>>>>> Error opening /devices/pci@0,0:reg: Permission denied
>>>>> [pcp-d-3:26054] PMIX ERROR: ERROR in file
>>>>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c
>>>>> at line 181
>>>>> [pcp-d-3:26053] PMIX ERROR: UNREACHABLE in file
>>>>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c
>>>>> at line 463
>>>>> --------------------------------------------------------------------------
>>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>>> likely to abort. There are many reasons that a parallel process can
>>>>> fail during MPI_INIT; some of which are due to configuration or environment
>>>>> problems. This failure appears to be an internal failure; here's some
>>>>> additional information (which may only be relevant to an Open MPI
>>>>> developer):
>>>>>
>>>>>   ompi_mpi_init: ompi_rte_init failed
>>>>>   --> Returned "(null)" (-43) instead of "Success" (0)
>>>>> --------------------------------------------------------------------------
>>>>> *** An error occurred in MPI_Init
>>>>> *** on a NULL communicator
>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>> ***    and potentially your MPI job)
>>>>> [pcp-d-3:26054] Local abort before MPI_INIT completed completed
>>>>> successfully, but am not able to aggregate error messages, and not able to
>>>>> guarantee that all other processes were killed!
>>>>> -------------------------------------------------------
>>>>> Primary job terminated normally, but 1 process returned
>>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>>>> -------------------------------------------------------
>>>>> --------------------------------------------------------------------------
>>>>> mpirun detected that one or more processes exited with non-zero status,
>>>>> thus causing the job to be terminated. The first process to do so was:
>>>>>
>>>>>   Process name: [[11371,1],0]
>>>>>   Exit code:    1
>>>>> --------------------------------------------------------------------------
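The failure Ralph describes above (the connect succeeds, then the daemon's blocking send to the app fails) corresponds to the pattern sketched below. The socket path, message, and structure are invented for illustration and are not the actual pmix_server_listener.c code; the program simply waits for one connection and reports errno if the send fails, which is the number the verbose-10 run is expected to reveal.

    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    int main(void)
    {
        struct sockaddr_un addr;
        int listener = socket(AF_UNIX, SOCK_STREAM, 0);
        if (listener < 0) { perror("socket"); return 1; }

        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, "/tmp/example-pmix-sock", sizeof(addr.sun_path) - 1);
        unlink(addr.sun_path);

        if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
            listen(listener, 1) < 0) {
            perror("bind/listen");
            return 1;
        }

        int client = accept(listener, NULL, NULL);   /* blocks until the app connects */
        if (client < 0) { perror("accept"); return 1; }

        const char msg[] = "ack";
        if (send(client, msg, sizeof(msg), 0) < 0) {
            /* Connect succeeded, but the blocking send failed: this is the
             * point in the handshake Ralph refers to. */
            fprintf(stderr, "blocking send failed: errno=%d (%s)\n",
                    errno, strerror(errno));
            return 1;
        }

        close(client);
        close(listener);
        return 0;
    }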
>>>>> On Sat, Sep 19, 2015 at 8:50 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>
>>>>>> Paul, can you clarify something for me? The error in this case
>>>>>> indicates that the client wasn't able to reach the daemon - this should
>>>>>> have resulted in termination of the job. Did the job actually run?
>>>>>>
>>>>>> On Sep 18, 2015, at 2:50 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>
>>>>>> I'm on travel right now, but it should be an easy fix when I return.
>>>>>> Sorry for the annoyance.
>>>>>>
>>>>>> On Thu, Sep 17, 2015 at 11:13 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>>>>
>>>>>>> Any suggestion how I (as a non-root user) can avoid seeing this
>>>>>>> hwloc error message on every run?
>>>>>>>
>>>>>>> -Paul
>>>>>>>
>>>>>>> On Thu, Sep 17, 2015 at 11:00 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>>>>>>
>>>>>>>> Paul,
>>>>>>>>
>>>>>>>> IIRC, the "Permission denied" message is coming from hwloc, which
>>>>>>>> cannot collect all the info it would like.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Gilles
>>>>>>>>
>>>>>>>> On 9/18/2015 2:34 PM, Paul Hargrove wrote:
>>>>>>>>
>>>>>>>> Tried tonight's master tarball on Solaris 11.2 on x86-64 with the
>>>>>>>> Studio Compilers (default ILP32 output) and saw the following result:
>>>>>>>>
>>>>>>>> $ mpirun -mca btl sm,self -np 2 examples/ring_c'
>>>>>>>> Error opening /devices/pci@0,0:reg: Permission denied
>>>>>>>> [pcp-d-4:00492] PMIX ERROR: ERROR in file
>>>>>>>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c
>>>>>>>> at line 181
>>>>>>>> [pcp-d-4:00491] PMIX ERROR: UNREACHABLE in file
>>>>>>>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c
>>>>>>>> at line 463
>>>>>>>>
>>>>>>>> I don't know if the Permission denied error is related to the
>>>>>>>> subsequent PMIX errors, but any message that says "UNREACHABLE" is
>>>>>>>> clearly worth reporting.
>>>>>>>>
>>>>>>>> -Paul
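On the hwloc side, the "Error opening /devices/pci@0,0:reg: Permission denied" line is consistent with an unprivileged open() of that device node failing with EACCES; whether hwloc does exactly that is an assumption here. A non-root user can confirm the permission problem independently of hwloc with a few lines of C (or simply by inspecting the node's mode bits with ls -l):

    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        /* The same device node named in the hwloc message. */
        const char *path = "/devices/pci@0,0:reg";
        int fd = open(path, O_RDONLY);
        if (fd < 0) {
            /* As an unprivileged user on Solaris this is expected to fail
             * with EACCES (Permission denied). */
            fprintf(stderr, "open(%s) failed: errno=%d (%s)\n",
                    path, errno, strerror(errno));
            return 1;
        }
        close(fd);
        printf("open(%s) succeeded\n", path);
        return 0;
    }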
--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
[Attachment: typescript]