On Sep 20, 2015, at 3:42 AM, Gilles
Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
Paul,
I do not remember it like that ...
At that time, the issue in ompi was that the
global errno was used instead of the per-thread
errno.
Though the man pages say -mt should be used
for multithreaded apps, you tried -D_REENTRANT
on all your platforms, and it was enough to get
the expected result.
I just wanted to check that the pmix1xx (sub)configure
correctly passes the -D_REENTRANT flag, and
it does. So this is very likely a new and
unrelated error.
Cheers,
Gilles
On Sunday, September 20, 2015, Paul Hargrove
<phhargr...@lbl.gov> wrote:
Gilles,
Yes, every $CC invocation
in opal/mca/pmix/pmix1xx includes
"-D_REENTRANT".
However, they don't include "-mt".
I believe we concluded (when we had
problems previously) that "-mt" was the
proper flag (at compile and link time) for
multi-threaded code with the Studio compilers.
-Paul
On Sat, Sep 19, 2015 at 11:29 PM, Gilles
Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
Paul,
Can you please double-check that pmix1xx is
compiled with -D_REENTRANT?
We ran into similar issues in the past,
and they only occurred on Solaris.
Cheers,
Gilles
On Sunday, September 20, 2015, Paul
Hargrove <phhargr...@lbl.gov> wrote:
Ralph,
The output from the requested run
is attached.
-Paul
On Sat, Sep 19, 2015 at 9:46 PM,
Ralph Castain <r...@open-mpi.org> wrote:
Ah, okay - that makes more
sense. I’ll have to let Brice
see if he can figure out how to
silence the hwloc error message
as I can’t find where it came
from. The other errors are real
and are the reason why the job
was terminated.
The problem is that we are
trying to establish communication
between the app
and the daemon via a unix domain
socket, and we failed to do so.
The error tells me that we were
able to create and connect to
the socket, but failed when the
daemon tried to do a blocking
send to the app.
Can you rerun it with -mca
pmix_base_verbose 10? It will
tell us the value of the error
number that was returned.
Thanks
Ralph
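
The failure mode described above (the socket is created and connected, but a later blocking send fails and the error number is what matters) can be sketched as follows. This is a minimal illustration in Python, not PMIx code: the helper names blocking_send and demo are hypothetical, a socketpair stands in for the daemon/app connection, and closing the app side is only one assumed way to make the send fail.

```python
import errno
import os
import socket


def blocking_send(sock, payload):
    """Send all of payload on a blocking socket.

    Returns 0 on success, or the errno value if the send failed.
    """
    sock.setblocking(True)
    try:
        sock.sendall(payload)
        return 0
    except OSError as exc:
        return exc.errno


def demo():
    # A connected pair of Unix domain stream sockets stands in for the
    # daemon <-> app connection (hypothetical roles, not PMIx internals).
    daemon_side, app_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    # First send: the peer is alive, so the blocking send succeeds.
    ok = blocking_send(daemon_side, b"hello")
    received = app_side.recv(16)

    # Second send: the peer has gone away, so the send fails and we
    # capture the error number instead of a generic failure.
    app_side.close()
    failed = blocking_send(daemon_side, b"hello again")
    daemon_side.close()
    return ok, received, failed


if __name__ == "__main__":
    ok, received, failed = demo()
    print("first send status:", ok, "payload:", received)
    print("second send errno:", failed,
          errno.errorcode.get(failed), os.strerror(failed))
```

Reporting the numeric errno rather than a generic status is what allows a connection problem to be told apart from, for example, a thread-safety problem in how errno itself is read.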
On Sep 19, 2015, at 9:37 PM,
Paul Hargrove
<phhargr...@lbl.gov> wrote:
Ralph,
No it did not run.
The complete output (which I
really should have included in
the first place) is below.
-Paul
$ mpirun -mca btl sm,self -np 2 examples/ring_c
Error opening /devices/pci@0,0:reg: Permission denied
[pcp-d-3:26054] PMIX ERROR: ERROR in file /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c at line 181
[pcp-d-3:26053] PMIX ERROR: UNREACHABLE in file /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c at line 463
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (-43) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[pcp-d-3:26054] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
  Process name: [[11371,1],0]
  Exit code: 1
--------------------------------------------------------------------------
On Sat, Sep 19, 2015 at 8:50
PM, Ralph
Castain <r...@open-mpi.org> wrote:
Paul, can you clarify
something for me? The
error in this case
indicates that the client
wasn’t able to reach the
daemon - this should have
resulted in termination of
the job. Did the job
actually run?
On Sep 18, 2015, at 2:50
AM, Ralph Castain
<r...@open-mpi.org> wrote:
I'm on travel right now,
but it should be an easy
fix when I return. Sorry
for the annoyance
On Thu, Sep 17, 2015 at
11:13 PM, Paul
Hargrove <phhargr...@lbl.gov> wrote:
Any suggestions on how I
(as a non-root user)
can avoid seeing this
hwloc error message
on every run?
-Paul
On Thu, Sep 17, 2015
at 11:00 PM, Gilles
Gouaillardet <gil...@rist.or.jp> wrote:
Paul,
IIRC, the
"Permission
denied" is coming
from hwloc that
cannot collect
all the info it
would like.
Cheers,
Gilles
On 9/18/2015 2:34
PM, Paul Hargrove
wrote:
I tried tonight's
master tarball
on Solaris 11.2
on x86-64 with
the Studio
compilers
(default ILP32
output) and saw
the following
result:
$ mpirun -mca btl sm,self -np 2 examples/ring_c
Error opening /devices/pci@0,0:reg: Permission denied
[pcp-d-4:00492] PMIX ERROR: ERROR in file /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c at line 181
[pcp-d-4:00491] PMIX ERROR: UNREACHABLE in file /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c at line 463
I don't know if
the Permission
denied error is
related to the
subsequent PMIX
errors, but any
message that
says
"UNREACHABLE" is
clearly worth
reporting.
-Paul
--
Paul H. Hargrove <phhargr...@lbl.gov>
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department
Lawrence Berkeley National Laboratory
Tel: +1-510-495-2352
Fax: +1-510-486-6900
_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: http://www.open-mpi.org/community/lists/devel/2015/09/18074.php