Hi Zhi,

Zhi Huang wrote:
> Hi all,
>
> I am working on a multi-threaded application hang problem on Solaris. The 
> stack trace shows that a thread is trying to grab a mutex by calling: 
>
> -----------------  lwp# 12 / thread# 11  --------------------
>  ff11f7f8 lwp_mutex_lock (fe7a73c8)
>  ff26ae80 _mutex_lwp_lock (ff299218, ff28c000, ffffabcd, 561b94, fe7a73c8, 0) 
> + 4
> ....
>
> Then it sits there and waits forever for mutex (fe7a73c8). The resources 
> held by this thread (lwp# 12) block other threads, and thus the entire 
> application hangs. 
>
> I am just wondering if there is any method to print out information 
> about mutex (fe7a73c8) so that it tells which ** THREAD ** within the 
> application is holding that mutex. Ideally, I could add diagnostic code 
> to the application so that it collects that mutex information when the 
> problem recurs, allowing me to investigate further. 
>   
With mdb, you can print the mutex information (in Solaris 10).  I have 
an example application that deadlocks on purpose, and my stack trace 
looks a little different from yours, but this technique should still 
work.  There may be a simpler way to do this with dbx/gdb, but I'll 
show it with mdb.

bash-3.00$ ./deadlock &   <-- first, start the program that has deadlock
[2] 1177

bash-3.00$ pstack 1177   <-- here is the stack trace
1177:   ./deadlock
-----------------  lwp# 1 / thread# 1  --------------------
 d2b3ec05 pause    ()
 08050c8c main     (1, 80471b0, 80471b8) + 4c
 08050aea _start   (1, 80472ec, 0, 80472f7, 80473d7, 80473e9) + 7a
-----------------  lwp# 2 / thread# 2  --------------------
 d2b3dc09 lwp_park (0, 0, 0)
 d2b36ab3 mutex_lock_impl (8060ea8, 0) + f3   <-- thread 2 is waiting on 
lock at 8060ea8
 d2b36b42 mutex_lock (8060ea8) + 10
 08050ba4 dead1    (0) + 24
 d2b3d952 _thr_setup (d2950200) + 52
 d2b3dbb0 _lwp_start (d2950200, 0, 0, 0, 0, 0)
-----------------  lwp# 3 / thread# 3  --------------------
 d2b3dc09 lwp_park (0, 0, 0)
 d2b36ab3 mutex_lock_impl (8060e90, 0) + f3   <-- thread 3 is waiting on 
lock at 8060e90
 d2b36b42 mutex_lock (8060e90) + 10
 08050c04 dead2    (0) + 24
 d2b3d952 _thr_setup (d2950a00) + 52
 d2b3dbb0 _lwp_start (d2950a00, 0, 0, 0, 0, 0)

bash-3.00$ mdb ./deadlock  <-- start mdb on the program
 > 0t1177:A    <-- attach to the deadlocked process
Loading modules: [ ld.so.1 libc.so.1 ]

 > 8060ea8::print mutex_t data  <-- print the mutex_t  (the mutex_t 
looks almost identical to _lwp_mutex)
data = 0xd2950a00  <-- this is the owner ulwp_t

 > d2950a00::print ulwp_t ul_lwpid ul_wchan  <-- print the lwpid of the 
owner and what it is waiting on (wchan)
ul_lwpid = 0x3
ul_wchan = deadlock`l1   <-- you will probably see an address instead of 
deadlock`l1

 > deadlock`l1::print mutex_t data   <-- now print that mutex_t (again, 
you will probably use an address, not deadlock`l1)
data = 0xd2950200

 > d2950200::print ulwp_t ul_lwpid ul_wchan  <-- print the lwpid and 
wchan of the owner
ul_lwpid = 0x2
ul_wchan = deadlock`l2   <-- the owner of the first mutex is waiting on 
the owner of the second, and vice versa

 > deadlock`l2=K  <-- print the address of the second mutex
                8060ea8     <-- this is back where we started, cycle in 
blocking chain indicates deadlock  
 >

Your situation looks a little different, but the technique should be the same.

I hope this helps.

max

PS.  Here is the deadlock.c program...

bash-3.00$ cat deadlock.c
#include <pthread.h>
#include <unistd.h>     /* sleep(), pause() */

pthread_mutex_t l1, l2;

void *
dead1(void *arg)
{
  pthread_mutex_lock(&l1);
  sleep(5);
  pthread_mutex_lock(&l2);   /* deadlocks here: dead2 holds l2, wants l1 */
  sleep(5);
  pthread_mutex_unlock(&l2);
  sleep(5);
  pthread_mutex_unlock(&l1);
  return NULL;
}

void *
dead2(void *arg)
{
  pthread_mutex_lock(&l2);
  sleep(5);
  pthread_mutex_lock(&l1);   /* deadlocks here: dead1 holds l1, wants l2 */
  sleep(5);
  pthread_mutex_unlock(&l1);
  sleep(5);
  pthread_mutex_unlock(&l2);
  return NULL;
}

int
main(void)
{
  pthread_t t1, t2;   /* passing NULL for the thread arg is non-portable */

  pthread_mutex_init(&l1, NULL);
  pthread_mutex_init(&l2, NULL);

  pthread_create(&t1, NULL, dead1, NULL);
  pthread_create(&t2, NULL, dead2, NULL);

  pause();
  return 0;
}
bash-3.00$


_______________________________________________
opensolaris-code mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code
