Ralph,
The output from the requested run is attached.
-Paul

On Sat, Sep 19, 2015 at 9:46 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Ah, okay - that makes more sense. I’ll have to let Brice see if he can
> figure out how to silence the hwloc error message as I can’t find where it
> came from. The other errors are real and are the reason why the job was
> terminated.
>
> The problem is that we are trying to establish communication between the
> app and the daemon via a Unix domain socket, and we failed to do so. The
> error tells me that we were able to create and connect to the socket, but
> failed when the daemon tried to do a blocking send to the app.
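>
> To make concrete what I mean by a "blocking send", here is a rough sketch
> (illustrative C only, not the actual PMIx code; the socketpair stand-in and
> names are made up) of a send loop on a local socket that reports the errno
> when the peer cannot be reached:
>
> #include <errno.h>
> #include <signal.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/socket.h>
> #include <sys/types.h>
> #include <unistd.h>
>
> /* Illustrative only: a blocking send over a Unix domain socket that
>  * loops over partial writes and reports errno on failure. */
> static int blocking_send(int sd, const void *buf, size_t len)
> {
>     const char *p = (const char *)buf;
>     while (len > 0) {
>         ssize_t n = send(sd, p, len, 0);
>         if (n < 0) {
>             if (errno == EINTR) continue;   /* retry if interrupted */
>             fprintf(stderr, "blocking send failed: %s (errno=%d)\n",
>                     strerror(errno), errno);
>             return -1;   /* a failure of this kind is what was reported */
>         }
>         p += n;
>         len -= (size_t)n;
>     }
>     return 0;
> }
>
> int main(void)
> {
>     int sv[2];
>     signal(SIGPIPE, SIG_IGN);          /* report EPIPE instead of dying */
>     if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return 1;
>     close(sv[1]);                      /* peer goes away ...            */
>     return blocking_send(sv[0], "ack", 3) == 0 ? 0 : 1;  /* ... send fails */
> }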
>
> Can you rerun it with -mca pmix_base_verbose 10? It will tell us the value
> of the error number that was returned.
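> For example, something like this (re-using your original command line)
> should capture the extra output:
>
>   mpirun -mca btl sm,self -mca pmix_base_verbose 10 -np 2 examples/ring_c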
>
> Thanks
> Ralph
>
>
> On Sep 19, 2015, at 9:37 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> Ralph,
>
> No, it did not run.
> The complete output (which I really should have included in the first
> place) is below.
>
> -Paul
>
> $ mpirun -mca btl sm,self -np 2 examples/ring_c
> Error opening /devices/pci@0,0:reg: Permission denied
> [pcp-d-3:26054] PMIX ERROR: ERROR in file
> /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c
> at line 181
> [pcp-d-3:26053] PMIX ERROR: UNREACHABLE in file
> /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c
> at line 463
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   ompi_mpi_init: ompi_rte_init failed
>   --> Returned "(null)" (-43) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [pcp-d-3:26054] Local abort before MPI_INIT completed completed
> successfully, but am not able to aggregate error messages, and not able to
> guarantee that all other processes were killed!
> -------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status,
> thus causing
> the job to be terminated. The first process to do so was:
>
>   Process name: [[11371,1],0]
>   Exit code:    1
> --------------------------------------------------------------------------
>
> On Sat, Sep 19, 2015 at 8:50 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Paul, can you clarify something for me? The error in this case indicates
>> that the client wasn’t able to reach the daemon - this should have resulted
>> in termination of the job. Did the job actually run?
>>
>>
>> On Sep 18, 2015, at 2:50 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>> I'm on travel right now, but it should be an easy fix when I return.
>> Sorry for the annoyance.
>>
>>
>> On Thu, Sep 17, 2015 at 11:13 PM, Paul Hargrove <phhargr...@lbl.gov>
>> wrote:
>>
>>> Any suggestions for how I (as a non-root user) can avoid seeing this hwloc
>>> error message on every run?
>>>
>>> -Paul
>>>
>>> On Thu, Sep 17, 2015 at 11:00 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>>
>>>> Paul,
>>>>
>>>> IIRC, the "Permission denied" is coming from hwloc that cannot collect
>>>> all the info it would like.
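>>>>
>>>> As a rough illustration (just a sketch, not the actual hwloc code; the
>>>> device path is taken from your output), the failure amounts to an
>>>> unprivileged open() of that device node being refused:
>>>>
>>>> #include <errno.h>
>>>> #include <fcntl.h>
>>>> #include <stdio.h>
>>>> #include <string.h>
>>>> #include <unistd.h>
>>>>
>>>> int main(void)
>>>> {
>>>>     /* Topology discovery wants to read the PCI device node; as a
>>>>      * non-root user the open() is typically refused, which is what
>>>>      * produces the "Permission denied" message you see. */
>>>>     int fd = open("/devices/pci@0,0:reg", O_RDONLY);
>>>>     if (fd < 0) {
>>>>         fprintf(stderr, "Error opening /devices/pci@0,0:reg: %s\n",
>>>>                 strerror(errno));
>>>>         return 1;
>>>>     }
>>>>     close(fd);
>>>>     return 0;
>>>> }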
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On 9/18/2015 2:34 PM, Paul Hargrove wrote:
>>>>
>>>> Tried tonight's master tarball on Solaris 11.2 on x86-64 with the
>>>> Studio Compilers (default ILP32 output) and saw the following result:
>>>>
>>>> $ mpirun -mca btl sm,self -np 2 examples/ring_c
>>>> Error opening /devices/pci@0,0:reg: Permission denied
>>>> [pcp-d-4:00492] PMIX ERROR: ERROR in file
>>>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c
>>>> at line 181
>>>> [pcp-d-4:00491] PMIX ERROR: UNREACHABLE in file
>>>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c
>>>> at line 463
>>>>
>>>> I don't know if the Permission denied error is related to the
>>>> subsequent PMIX errors, but any message that says "UNREACHABLE" is clearly
>>>> worth reporting.
>>>>
>>>> -Paul
>>>>
>>>> --
>>>> Paul H. Hargrove                          phhargr...@lbl.gov
>>>> Computer Languages & Systems Software (CLaSS) Group
>>>> Computer Science Department               Tel: +1-510-495-2352
>>>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Paul H. Hargrove                          phhargr...@lbl.gov
>>> Computer Languages & Systems Software (CLaSS) Group
>>> Computer Science Department               Tel: +1-510-495-2352
>>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>
>>
>
>
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>



-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
Error opening /devices/pci@0,0:reg: Permission denied
[pcp-d-3:25836] mca: base: components_register: registering framework pmix 
components
[pcp-d-3:25836] mca: base: components_register: found loaded component pmix1xx
[pcp-d-3:25836] mca: base: components_register: component pmix1xx has no 
register or open function
[pcp-d-3:25836] mca: base: components_open: opening pmix components
[pcp-d-3:25836] mca: base: components_open: found loaded component pmix1xx
[pcp-d-3:25836] mca: base: components_open: component pmix1xx open function 
successful
[pcp-d-3:25836] mca:base:select: Auto-selecting pmix components
[pcp-d-3:25836] mca:base:select:( pmix) Querying component [pmix1xx]
[pcp-d-3:25836] mca:base:select:( pmix) Query of component [pmix1xx] set 
priority to 5
[pcp-d-3:25836] mca:base:select:( pmix) Selected component [pmix1xx]
[pcp-d-3:25836] pmix:server init called
[pcp-d-3:25836] sec: native init
[pcp-d-3:25836] sec: SPC native active
[pcp-d-3:25836] pmix:server constructed uri 
pmix-server:25836:/tmp/openmpi-sessions-19214@pcp-d-3_0/11586/0/0/pmix-25836
[pcp-d-3:25836] listen_thread: active
[pcp-d-3:25836] pmix:server register client 759300097:0
[pcp-d-3:25836] pmix:server register client 759300097:1
[pcp-d-3:25836] pmix:server _register_client for nspace 759300097 rank 0
[pcp-d-3:25836] pmix:server setup_fork for nspace 759300097 rank 0
[pcp-d-3:25836] pmix:server _register_client for nspace 759300097 rank 1
[pcp-d-3:25836] pmix:server _register_nspace
[pcp-d-3:25836] pmix:server _register_nspace recording pmix.ltopo
[pcp-d-3:25836] pmix:server _register_nspace recording pmix.jobid
[pcp-d-3:25836] pmix:server _register_nspace recording pmix.offset
[pcp-d-3:25836] pmix:server _register_nspace recording pmix.nmap
[pcp-d-3:25836] pmix:extract:nodes: checking list: pcp-d-3
[pcp-d-3:25836] pmix:server _register_nspace recording pmix.pmap
[pcp-d-3:25836] pmix:server _register_nspace recording pmix.nodeid
[pcp-d-3:25836] pmix:server _register_nspace recording pmix.node.size
[pcp-d-3:25836] pmix:server _register_nspace recording pmix.lpeers
[pcp-d-3:25836] pmix:server _register_nspace recording pmix.lcpus
[pcp-d-3:25836] pmix:server _register_nspace recording pmix.lldr
[pcp-d-3:25836] pmix:server _register_nspace recording pmix.univ.size
[pcp-d-3:25836] pmix:server _register_nspace recording pmix.job.size
[pcp-d-3:25836] pmix:server _register_nspace recording pmix.local.size
[pcp-d-3:25836] pmix:server _register_nspace recording pmix.max.size
[pcp-d-3:25836] pmix:server _register_nspace recording pmix.pdata
[pcp-d-3:25836] pmix:server _register_nspace recording pmix.pdata
[pcp-d-3:25836] pmix:server setup_fork for nspace 759300097 rank 1
[pcp-d-3:25839] mca: base: components_register: registering framework pmix 
components
[pcp-d-3:25839] mca: base: components_register: found loaded component pmix1xx
[pcp-d-3:25839] mca: base: components_register: component pmix1xx has no 
register or open function
[pcp-d-3:25839] mca: base: components_open: opening pmix components
[pcp-d-3:25839] mca: base: components_open: found loaded component pmix1xx
[pcp-d-3:25839] mca: base: components_open: component pmix1xx open function 
successful
[pcp-d-3:25839] mca:base:select: Auto-selecting pmix components
[pcp-d-3:25839] mca:base:select:( pmix) Querying component [pmix1xx]
[pcp-d-3:25839] mca:base:select:( pmix) Query of component [pmix1xx] set 
priority to 100
[pcp-d-3:25839] mca:base:select:( pmix) Selected component [pmix1xx]
[pcp-d-3:25839] PMIx_client init
[pcp-d-3:25839] pmix: init called
[pcp-d-3:25839] posting notification recv on tag 0
[pcp-d-3:25839] sec: native init
[pcp-d-3:25839] sec: SPC native active
[pcp-d-3:25839] usock_peer_try_connect: attempting to connect to server
[pcp-d-3:25839] usock_peer_try_connect: attempting to connect to server on 
socket 15
[pcp-d-3:25839] pmix: SEND CONNECT ACK
[pcp-d-3:25839] sec: native create_cred
[pcp-d-3:25839] sec: using credential 19214:5513
[pcp-d-3:25839] send blocking of 49 bytes to socket 15
[pcp-d-3:25839] blocking send complete to socket 15
[pcp-d-3:25839] pmix: RECV CONNECT ACK FROM SERVER
[pcp-d-3:25839] PMIX ERROR: ERROR in file 
/export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c
 at line 181
[pcp-d-3:25836] listen_thread: new connection: (32, 0)
[pcp-d-3:25839] mca: base: close: component pmix1xx closed
[pcp-d-3:25839] mca: base: close: unloading component pmix1xx
[pcp-d-3:25836] connection_handler: new connection: 32
[pcp-d-3:25836] RECV CONNECT ACK FROM PEER ON SOCKET 32
[pcp-d-3:25836] waiting for blocking recv of 16 bytes
[pcp-d-3:25836] blocking receive complete from remote
[pcp-d-3:25836] waiting for blocking recv of 33 bytes
[pcp-d-3:25836] blocking receive complete from remote
[pcp-d-3:25836] connect-ack recvd from peer 759300097:1
[pcp-d-3:25836] sec: native validate_cred 19214:5513
[pcp-d-3:25836] sec: native credential valid
[pcp-d-3:25836] client credential validated
[pcp-d-3:25836] send blocking of 4 bytes to socket 32
[pcp-d-3:25836] PMIX ERROR: UNREACHABLE in file 
/export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c
 at line 463
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (-43) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[pcp-d-3:25839] Local abort before MPI_INIT completed completed successfully, 
but am not able to aggregate error messages, and not able to guarantee that all 
other processes were killed!
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus 
causing
the job to be terminated. The first process to do so was:

  Process name: [[11586,1],1]
  Exit code:    1
--------------------------------------------------------------------------
[pcp-d-3:25836] pmix:server finalize called
[pcp-d-3:25836] listen_thread: shutdown
[pcp-d-3:25836] sec: native finalize
[pcp-d-3:25836] pmix:server finalize complete
[pcp-d-3:25836] mca: base: close: component pmix1xx closed
[pcp-d-3:25836] mca: base: close: unloading component pmix1xx
