Re: [OMPI devel] Non-threaded test fails with thread-safe library

2006-05-16 Thread Rolf Vandevaart


Hi Brian:

Here is the stack trace from the core dump.  I am also trying to understand
better what is happening here, but I figured I needed to get this off
to you.
Rolf

burl-ct-v440-4 96 =>dbx connectivity core
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.4' in 
your .dbxrc

Reading connectivity
core file header read successfully
[...snip...]
(dbx) where
current thread: t@1
 [1] _lwp_kill(0x0, 0x6, 0x0, 0x6, 0xfc00, 0x0), at 0xfd840f90
 [2] raise(0x6, 0x0, 0xfd824a98, 0x, 0xfd868284, 0x6), at 0xfd7dfd78
 [3] abort(0xffbfee00, 0x1, 0x0, 0xa83f0, 0xfd86b298, 0x0), at 0xfd7bff98
=>[4] opal_mutex_lock(m = 0xfd0b12e8), line 101 in "mutex_unix.h"
 [5] __ompi_free_list_wait(fl = 0xfd0b1298, item = 0xffbfef88), line 167 in "ompi_free_list.h"
 [6] mca_pml_ob1_recv_frag_match(btl = 0xfcfbc778, hdr = 0xdc897260, segments = 0xdc897218, num_segments = 1U), line 550 in "pml_ob1_recvfrag.c"
 [7] mca_pml_ob1_recv_frag_callback(btl = 0xfcfbc778, tag = '\001', des = 0xdc8971d0, cbdata = (nil)), line 80 in "pml_ob1_recvfrag.c"
 [8] mca_btl_sm_component_progress(), line 396 in "btl_sm_component.c"
 [9] mca_bml_r2_progress(), line 103 in "bml_r2.c"
 [10] opal_progress(), line 288 in "opal_progress.c"
 [11] opal_condition_wait(c = 0xff29d3b8, m = 0xff29d430), line 75 in "condition.h"
 [12] mca_pml_ob1_recv(addr = 0xffbff4b0, count = 1U, datatype = 0x21458, src = 0, tag = 0, comm = 0x215a0, status = 0xffbff4c0), line 101 in "pml_ob1_irecv.c"
 [13] PMPI_Recv(buf = 0xffbff4b0, count = 1, type = 0x21458, source = 0, tag = 0, comm = 0x215a0, status = 0xffbff4c0), line 66 in "precv.c"
 [14] main(argc = 2, argv = 0xffbff53c), line 69 in "connectivity.c"
(dbx)



Brian Barrett wrote on 05/11/06 02:57:

Eeeks!  That sounds like a bug.  Can you attach a debugger and get a  
stack trace for the situation where that occurs?


Brian

On May 10, 2006, at 10:17 PM, Rolf Vandevaart wrote:

 

I have built a library with "--enable-mpi-threads --with-threads=posix"
(using the trunk) and tried running a simple non-threaded program linked
against it.  The program just calls MPI_Send and MPI_Recv so that every
process sends an MPI_INT to every other process.
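
For reference, here is a minimal sketch of what such an all-pairs test
might look like (an illustration of the pattern only, not the actual
connectivity.c from the test suite):

/* Hypothetical all-pairs connectivity check: every pair of ranks
 * exchanges a single MPI_INT over MPI_COMM_WORLD. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, i, j, token;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* For each pair (i, j): i sends first, j receives first,
     * so the exchange cannot deadlock. */
    for (i = 0; i < size - 1; i++) {
        for (j = i + 1; j < size; j++) {
            if (rank == i) {
                MPI_Send(&rank, 1, MPI_INT, j, 0, MPI_COMM_WORLD);
                MPI_Recv(&token, 1, MPI_INT, j, 0, MPI_COMM_WORLD, &status);
            } else if (rank == j) {
                MPI_Recv(&token, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &status);
                MPI_Send(&rank, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            }
        }
    }

    if (rank == 0)
        printf("connectivity check complete\n");
    MPI_Finalize();
    return 0;
}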

When I run it I see the following:

burl-ct-v440-4 86 =>mpirun -np 4 connectivity -v
burl-ct-v440-4: checking connection 0 <-> 1
burl-ct-v440-4: checking connection 1 <-> 2
burl-ct-v440-4: checking connection 0 <-> 2
opal_mutex_lock(): Deadlock situation detected/avoided
Signal:6 info.si_errno:0(Error 0) si_code:-1()
*** End of error message ***
burl-ct-v440-4 87 =>

Since I had debugging enabled, I got to see that one of the processes
was trying to grab a lock that it already held.  (Nice feature having
that error printed out!)
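
For context, "Deadlock situation detected/avoided" matches the Solaris
error string for EDEADLK, which a POSIX error-checking mutex returns
when a thread tries to relock a mutex it already holds.  A small
plain-pthreads sketch of that behavior (this is not Open MPI's actual
opal_mutex_lock implementation):

/* Sketch: an error-checking mutex reports EDEADLK on a recursive
 * lock attempt instead of hanging silently. */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    pthread_mutex_t m;
    pthread_mutexattr_t attr;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);

    pthread_mutex_lock(&m);        /* first lock succeeds */
    rc = pthread_mutex_lock(&m);   /* same thread relocks: EDEADLK */
    if (rc != 0)
        printf("pthread_mutex_lock: %s\n", strerror(rc));

    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    pthread_mutexattr_destroy(&attr);
    return 0;
}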

Has anyone else seen this?  As I said, this is a non-threaded program,
so there is only one thread per process.  I am wondering if I am
missing something basic in the building of my library.  This test works
fine against a library configured without "--enable-mpi-threads
--with-threads=posix".


Rolf



--

=========================
rolf.vandeva...@sun.com
781-442-3043
=========================



Re: [OMPI devel] Non-threaded test fails with thread-safe library

2006-05-16 Thread George Bosilca
Commit 9946 solves the problem.  I mixed up the return value of the
trylock call, treating any non-zero value as a success when in fact 0
is the success.  Anyway, now it's fixed on the trunk.
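
For anyone following along, the bug pattern George describes looks
roughly like this (a sketch of the pattern, not the actual commit 9946
diff): pthread_mutex_trylock() returns 0 when the lock is acquired and
an error code such as EBUSY otherwise, so the result must be compared
against 0 rather than treated as a boolean success flag.

/* Sketch of the inverted-return-value bug (hypothetical wrappers,
 * not Open MPI's real code). */
#include <pthread.h>

/* Buggy: non-zero from trylock means FAILURE, not success. */
static int trylock_buggy(pthread_mutex_t *m)
{
    return pthread_mutex_trylock(m) != 0;   /* inverted sense */
}

/* Fixed: 0 from trylock means the lock was acquired. */
static int trylock_fixed(pthread_mutex_t *m)
{
    return pthread_mutex_trylock(m) == 0;   /* 1 = acquired */
}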


  george.

On May 16, 2006, at 11:07 AM, Rolf Vandevaart wrote:



[...snip...]

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


"Half of what I say is meaningless; but I say it so that the other  
half may reach you"

  Kahlil Gibran