Argh - found a typo in the output line. Could you please apply the attached patch and run it again? This might fix it, but if not, it will at least give me some idea of the returned error.

Thanks
Ralph

Attachment: paul.diff
Description: Binary data


On Sep 20, 2015, at 12:40 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

Yes, it is definitely at 10.
Another attempt is attached.
-Paul

On Sun, Sep 20, 2015 at 8:19 AM, Ralph Castain <r...@open-mpi.org> wrote:
Paul - can you please confirm that you gave mpirun a level of 10 for the pmix_base_verbose param? This output isn’t what I would have expected from that level - it looks more like the verbosity was set to 5, and so the error number isn’t printed.

Thanks
Ralph


On Sep 20, 2015, at 3:42 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

Paul,

I do not remember it like that ...

At that time, the issue in ompi was that the global errno was used instead of the per-thread errno.
Though the man page says -mt should be used for multithreaded apps, you tried -D_REENTRANT on all your platforms, and it was enough to get the expected result.

I just wanted to check that the pmix1xx (sub)configure correctly passes the -D_REENTRANT flag, and it does. So this is very likely a new and unrelated error.
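
For reference, the global-versus-per-thread errno distinction can be checked with a small C routine. This is only a sketch, not code from Open MPI or PMIx: the function names are made up for illustration, and it assumes a POSIX threads environment. On a toolchain where errno is a single global (reportedly the Studio default without -D_REENTRANT or -mt), both threads would see the same address and the function would return 0.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical helper: record where this thread's errno lives. */
static void *record_errno_address(void *out)
{
    *(uintptr_t *)out = (uintptr_t)&errno;
    return NULL;
}

/*
 * Returns 1 if a second thread sees a different errno object than the
 * caller (per-thread errno), 0 if both share one global errno, and -1
 * if the helper thread could not be created or joined.
 */
int errno_is_per_thread(void)
{
    pthread_t tid;
    uintptr_t other = 0;
    uintptr_t mine = (uintptr_t)&errno;

    if (pthread_create(&tid, NULL, record_errno_address, &other) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return other != mine;
}
```

Calling errno_is_per_thread() from a small main() built with and without the flag on each platform should show whether the flag changes the answer.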

Cheers,

Gilles

On Sunday, September 20, 2015, Paul Hargrove <phhargr...@lbl.gov> wrote:
Gilles,

Yes, every $CC invocation in opal/mca/pmix/pmix1xx includes "-D_REENTRANT".
However, they don't include "-mt".
I believe we concluded (when we had problems previously) that "-mt" was the proper flag (at compile and link) for multi-threaded with the Studio compilers.

-Paul

On Sat, Sep 19, 2015 at 11:29 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
Paul,

Can you please double-check that pmix1xx is compiled with -D_REENTRANT?
We ran into similar issues in the past, and they only occurred on Solaris.

Cheers,

Gilles


On Sunday, September 20, 2015, Paul Hargrove <phhargr...@lbl.gov> wrote:
Ralph,
The output from the requested run is attached.
-Paul

On Sat, Sep 19, 2015 at 9:46 PM, Ralph Castain <r...@open-mpi.org> wrote:
Ah, okay - that makes more sense. I’ll have to let Brice see if he can figure out how to silence the hwloc error message as I can’t find where it came from. The other errors are real and are the reason why the job was terminated.

The problem is that we are trying to establish communication between the app and the daemon via a Unix domain socket, and we failed to do so. The error tells me that we were able to create and connect to the socket, but failed when the daemon tried to do a blocking send to the app.

Can you rerun it with -mca pmix_base_verbose 10? It will tell us the value of the error number that was returned.
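
To make the failure mode concrete, here is a minimal sketch of a blocking send over a Unix domain socket that hands back the errno value on failure, which is the number the verbose run should reveal. It is not the PMIx code: send_all_blocking and demo_send_over_unix_socket are hypothetical names, and socketpair() stands in for the real listen/connect sequence between daemon and app.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send exactly len bytes on a blocking socket.
 * Returns 0 on success, or the errno value of the failing send().
 */
int send_all_blocking(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return errno;          /* the value worth logging */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical self-check: send 4 bytes across a connected pair of
 * Unix domain sockets and read them back on the other end.
 * Returns 0 on success, or an errno-style code on failure.
 */
int demo_send_over_unix_socket(void)
{
    int sv[2];
    char reply[4];
    int rc;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return errno;

    rc = send_all_blocking(sv[0], "ping", 4);
    if (rc == 0) {
        ssize_t got = recv(sv[1], reply, sizeof reply, MSG_WAITALL);
        if (got != 4 || memcmp(reply, "ping", 4) != 0)
            rc = EIO;
    }
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Passing the returned value to strerror() gives the kind of message that would narrow down why the send failed.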

Thanks
Ralph


On Sep 19, 2015, at 9:37 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

Ralph,

No, it did not run.
The complete output (which I really should have included in the first place) is below.

-Paul

$ mpirun -mca btl sm,self -np 2 examples/ring_c
Error opening /devices/pci@0,0:reg: Permission denied
[pcp-d-3:26054] PMIX ERROR: ERROR in file /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c at line 181
[pcp-d-3:26053] PMIX ERROR: UNREACHABLE in file /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c at line 463
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (-43) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[pcp-d-3:26054] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[11371,1],0]
  Exit code:    1
--------------------------------------------------------------------------

On Sat, Sep 19, 2015 at 8:50 PM, Ralph Castain <r...@open-mpi.org> wrote:
Paul, can you clarify something for me? The error in this case indicates that the client wasn’t able to reach the daemon - this should have resulted in termination of the job. Did the job actually run?


On Sep 18, 2015, at 2:50 AM, Ralph Castain <r...@open-mpi.org> wrote:

I'm on travel right now, but it should be an easy fix when I return. Sorry for the annoyance.


On Thu, Sep 17, 2015 at 11:13 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
Any suggestions on how I (as a non-root user) can avoid seeing this hwloc error message on every run?

-Paul

On Thu, Sep 17, 2015 at 11:00 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
Paul,

IIRC, the "Permission denied" is coming from hwloc that cannot collect all the info it would like.

Cheers,

Gilles 

On 9/18/2015 2:34 PM, Paul Hargrove wrote:
Tried tonight's master tarball on Solaris 11.2 on x86-64 with the Studio compilers (default ILP32 output) and saw the following result:

$ mpirun -mca btl sm,self -np 2 examples/ring_c
Error opening /devices/pci@0,0:reg: Permission denied
[pcp-d-4:00492] PMIX ERROR: ERROR in file /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c at line 181
[pcp-d-4:00491] PMIX ERROR: UNREACHABLE in file /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c at line 463

I don't know if the Permission denied error is related to the subsequent PMIX errors, but any message that says "UNREACHABLE" is clearly worth reporting.

-Paul

-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900


_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: http://www.open-mpi.org/community/lists/devel/2015/09/18074.php

