Hi Zhi,
Zhi Huang wrote:
> Hi all,
>
> I am working on a multi-threaded application hang problem on Solaris. The stack
> trace shows that a thread is trying to grab a mutex by calling:
>
> ----------------- lwp# 12 / thread# 11 --------------------
> ff11f7f8 lwp_mutex_lock (fe7a73c8)
> ff26ae80 _mutex_lwp_lock (ff299218, ff28c000, ffffabcd, 561b94, fe7a73c8, 0)
> + 4
> ....
>
> Then it sits there and waits forever for the mutex (fe7a73c8); the resources held
> by this thread (lwp# 12) block other threads and thus cause the entire
> application to hang.
>
> I am just wondering if there are any methods to print out information
> about the mutex (fe7a73c8) that tell which ** THREAD ** within the
> application is holding it. Ideally, I could add diagnostic code to the
> application so that it collects the mutex information when the problem
> recurs, allowing me to investigate further.
>
With mdb, you can print the mutex information (in Solaris 10). I have an
example application that deadlocks on purpose, and my stack trace looks a
little different from yours, but the same approach should work. There may
be a simpler way to do this with dbx/gdb, but I'll show it with mdb.
bash-3.00$ ./deadlock & <-- first, start the program that has deadlock
[2] 1177
bash-3.00$ pstack 1177 <-- here is the stack trace
1177: ./deadlock
----------------- lwp# 1 / thread# 1 --------------------
d2b3ec05 pause ()
08050c8c main (1, 80471b0, 80471b8) + 4c
08050aea _start (1, 80472ec, 0, 80472f7, 80473d7, 80473e9) + 7a
----------------- lwp# 2 / thread# 2 --------------------
d2b3dc09 lwp_park (0, 0, 0)
d2b36ab3 mutex_lock_impl (8060ea8, 0) + f3 <-- thread 2 is waiting on lock at 8060ea8
d2b36b42 mutex_lock (8060ea8) + 10
08050ba4 dead1 (0) + 24
d2b3d952 _thr_setup (d2950200) + 52
d2b3dbb0 _lwp_start (d2950200, 0, 0, 0, 0, 0)
----------------- lwp# 3 / thread# 3 --------------------
d2b3dc09 lwp_park (0, 0, 0)
d2b36ab3 mutex_lock_impl (8060e90, 0) + f3 <-- thread 3 is waiting on lock at 8060e90
d2b36b42 mutex_lock (8060e90) + 10
08050c04 dead2 (0) + 24
d2b3d952 _thr_setup (d2950a00) + 52
d2b3dbb0 _lwp_start (d2950a00, 0, 0, 0, 0, 0)
bash-3.00$ mdb ./deadlock <-- start mdb on the program
> 0t1177:A <-- attach to the deadlocked process
Loading modules: [ ld.so.1 libc.so.1 ]
> 8060ea8::print mutex_t data <-- print the mutex_t (the mutex_t looks almost identical to _lwp_mutex)
data = 0xd2950a00 <-- this is the owner ulwp_t
> d2950a00::print ulwp_t ul_lwpid ul_wchan <-- print the lwpid of the owner and what it is waiting on (wchan)
ul_lwpid = 0x3
ul_wchan = deadlock`l1 <-- you will probably see an address instead of deadlock`l1
> deadlock`l1::print mutex_t data <-- now print that mutex_t (again, you will probably use an address, not deadlock`l1)
data = 0xd2950200
> d2950200::print ulwp_t ul_lwpid ul_wchan <-- print the lwpid and wchan of the owner
ul_lwpid = 0x2
ul_wchan = deadlock`l2 <-- the owner of the first mutex is waiting on the owner of the second, and vice versa
> deadlock`l2=K <-- print the address of the second mutex
8060ea8 <-- this is back where we started; a cycle in the blocking chain indicates deadlock
>
Your situation looks a little different, but the technique should be the same.
I hope this helps.
max
PS. Here is the deadlock.c program...
bash-3.00$ cat deadlock.c
#include <pthread.h>
#include <unistd.h>	/* for sleep() and pause() */

pthread_mutex_t l1, l2;

void *
dead1(void *arg)
{
	pthread_mutex_lock(&l1);
	sleep(5);
	pthread_mutex_lock(&l2);	/* blocks forever once dead2 holds l2 */
	sleep(5);
	pthread_mutex_unlock(&l2);
	sleep(5);
	pthread_mutex_unlock(&l1);
	return (NULL);
}

void *
dead2(void *arg)
{
	pthread_mutex_lock(&l2);
	sleep(5);
	pthread_mutex_lock(&l1);	/* blocks forever once dead1 holds l1 */
	sleep(5);
	pthread_mutex_unlock(&l1);
	sleep(5);
	pthread_mutex_unlock(&l2);
	return (NULL);
}

int
main(void)
{
	pthread_t t1, t2;

	pthread_mutex_init(&l1, NULL);
	pthread_mutex_init(&l2, NULL);
	pthread_create(&t1, NULL, dead1, NULL);	/* first arg must not be NULL */
	pthread_create(&t2, NULL, dead2, NULL);
	pause();
	return (0);
}
bash-3.00$
_______________________________________________
opensolaris-code mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code