Douglas Atique wrote:
>Are all these threads waiting on the same condition variable? What are they
>waiting for? Just avoiding busy looping while a mutex doesn't get released?
>Which thread is supposed to wake them up? Where in the source tree is the code
>that does ndi_devi_enter/ndi_devi_online?
>
>
A beautiful thing in Solaris is that, using mdb and DTrace, one can
easily find out exactly what is happening in the kernel. This is ideal
for developers.
I'll use a crash dump taken after I reproduced the problem on my laptop
with a GPRS card.
Let's start with the kernel thread for prtconf -D:
bash# mdb 0
> ::threadlist -v 9 ! less
d5245800 d4dd01b0 d519c920 1 59 db0c0f70
PC: 0xfe82cea8 CMD: prtconf -D
stack pointer for thread d5245800: d534ece8
swtch+0x165()
cv_wait+0x4e(db0c0f70, db0c0ee0)
ndi_devi_enter+0x4a(db0c0ea8, dadfa940)
di_copytree+0x60(db0c0ea8, d56bf020, d5c83500)
di_snapshot+0x13c(d5c83500)
di_snapshot_and_clean+0x16(d5c83500)
di_ioctl+0x3c2()
cdev_ioctl+0x2e(1600006, df01, 804775c, 100001, d58e4330, d534ef78)
spec_ioctl+0x65(d5aa8f00, df01, 804775c, 100001, d58e4330, d534ef78)
fop_ioctl+0x27(d5aa8f00, df01, 804775c, 100001, d58e4330, d534ef78)
ioctl+0x151()
sys_sysenter+0x100()
The first argument of ndi_devi_enter() is a dev_info structure. Its
devi_busy_thread member identifies the thread that currently holds the
node busy, i.e. the thread that is blocking the caller of
ndi_devi_enter(). So prtconf is blocked by this thread:
> db0c0ea8::print struct dev_info devi_busy_thread
devi_busy_thread = 0xd4763de0
> 0xd4763de0::findstack -v
stack pointer for thread d4763de0: d4763b58
d4763b84 swtch+0x165()
d4763b94 cv_wait+0x4e(db0c08b0, db0c0820)
d4763bb4 ndi_devi_enter+0x4a(db0c07e8, d4763bdc)
d4763be0 find_dip+0x109(db0c07e8, d5543eb8, 1)
d4763c18 find_dip+0x11f(db0c0b48, d4763c31, 1)
d4763d44 pm_name_to_dip+0x60(d5543e80, 1)
d4763d7c pm_keeper+0x31(d5543e80)
d4763d98 pm_process_dep_request+0x71(d5ac7210)
d4763dc8 pm_dep_thread+0xce(0, 0)
d4763dd8 thread_start+8()
This thread is also blocked in ndi_devi_enter(), so it is in turn
waiting for yet another thread:
> db0c07e8::print struct dev_info devi_busy_thread
devi_busy_thread = 0xd4655de0
> 0xd4655de0::findstack -v
stack pointer for thread d4655de0: d4655c28
d4655c54 swtch+0x165()
d4655c64 cv_wait+0x4e(d5543f18, d5543f10)
d4655c80 mt_config_fini+0x25(d5543f10)
d4655c94 config_grand_children+0x2f(d52b25a0, 1020008, ffffffff)
d4655cbc devi_config_common+0xb2()
d4655cd4 ndi_devi_config+0x13(d52b25a0, 1020008)
d4655d04 ndi_devi_online+0xe0(d52b25a0, 0)
d4655d68 hubd_hotplug_thread+0x3e3()
d4655dc8 taskq_d_thread+0x9c(d44d9fa0, 0)
d4655dd8 thread_start+8()
mt_config_fini() is waiting on a condition variable (cv_wait()) that
needs to be signaled by cv_broadcast() in mt_config_thread(); see
usr/src/uts/common/os/devcfg.c (config_grand_children()).
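To make that handshake concrete, here is a small user-space sketch in Python. It is not the kernel code: it assumes the waiter tracks a count of unfinished worker threads, and ConfigHandle, worker, wait_for_workers and run are made-up names for illustration only.

```python
import threading

class ConfigHandle:
    """Hypothetical stand-in for the handle that mt_config_fini() waits on."""
    def __init__(self, nworkers):
        self.lock = threading.Lock()
        self.cv = threading.Condition(self.lock)
        self.pending = nworkers  # assumed: number of workers not yet finished

def worker(hdl, work):
    """Plays the role of mt_config_thread()."""
    work()  # if this call blocks forever, the broadcast below never happens
    with hdl.cv:
        hdl.pending -= 1
        if hdl.pending == 0:
            hdl.cv.notify_all()  # plays the role of cv_broadcast()

def wait_for_workers(hdl, timeout=None):
    """Plays the role of mt_config_fini(); returns False if the wait times out."""
    with hdl.cv:
        return hdl.cv.wait_for(lambda: hdl.pending == 0, timeout)

def run(work, nworkers=2, timeout=None):
    """Start the workers, then wait for all of them like the parent thread does."""
    hdl = ConfigHandle(nworkers)
    for _ in range(nworkers):
        threading.Thread(target=worker, args=(hdl, work), daemon=True).start()
    return wait_for_workers(hdl, timeout)
```

The timeout exists only so the sketch can show a stuck worker without hanging; the cv_wait() in the stack above has no timeout, so the real waiter sleeps until the broadcast arrives.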
Let's look at what mt_config_thread() is doing:
d4d07de0 fec1f0bc 0 0 60 db0c08b0
PC: 0xfe82cea8 THREAD: mt_config_thread()
stack pointer for thread d4d07de0: d4d07968
swtch+0x165()
cv_wait+0x4e(db0c08b0, db0c0820)
ndi_devi_enter+0x4a(db0c07e8, d4d07ae0)
pm_lock_power_single+0x7e(db0c07e8, d4d07ae0)
pm_default_ctlops+0x78()
pm_ctlops+0x4c(0, db0c07e8, 15, d4d07a54, d4d07a80)
pm_lock_power+0x37(db0c07e8, d4d07ae0)
pm_busop_set_power+0x1ba(db0c07e8, 0, 0, d4d07b18, d4d07bb8)
pm_busop_bus_power+0x1bf(db0c07e8, 0, 1, d4d07b80, d4d07bb8)
pm_all_to_normal_nexus+0x88(db0c07e8, 0)
pm_busop_bus_power+0x6f(db0c0b48, 0, 0, d4d07cc8, d4d07cfc)
pm_busop_bus_power+0x97(db0c0ea8, 0, 0, d4d07cc8, d4d07cfc)
pm_set_power+0xbc(d52b26c0, 0, 3, fffffffe, 0, 0, d4d07d54)
pm_all_to_normal+0x62(d52b26c0, 0)
pm_pre_config+0x30(d52b26c0, 0)
devi_config_common+0x2a(d52b26c0, 1020008, ffffffff)
mt_config_thread+0x40(d5a9aba0, 0)
thread_start+8()
It's blocked in ndi_devi_enter(), waiting for the previous thread (the
one sitting in mt_config_fini()):
> db0c07e8::print struct dev_info devi_busy_thread
devi_busy_thread = 0xd4655de0
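So mt_config_thread() and config_grand_children() are deadlocking each
other: the hotplug thread (d4655de0) holds db0c07e8 busy while it waits
in mt_config_fini() for mt_config_thread() to finish, and
mt_config_thread() cannot finish until it can enter db0c07e8. The
pm_dep_thread and prtconf are not on the same condition variable; they
are simply queued up behind this pair.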
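The same chain can be written down as a wait-for graph and checked mechanically. The sketch below is plain Python, not an mdb feature. The thread addresses come from the stacks above, and the edge from d4655de0 to d4d07de0 is my reading of mt_config_fini() waiting for that worker rather than something devi_busy_thread shows directly. find_cycle is a made-up helper name.

```python
# Wait-for edges read off the stacks above: waiter -> thread it is waiting for.
WAITS_FOR = {
    "d5245800": "d4763de0",  # prtconf -D       -> pm_dep_thread (busy on db0c0ea8)
    "d4763de0": "d4655de0",  # pm_dep_thread    -> hotplug thread (busy on db0c07e8)
    "d4655de0": "d4d07de0",  # hotplug thread   -> mt_config_thread (inferred edge)
    "d4d07de0": "d4655de0",  # mt_config_thread -> hotplug thread (busy on db0c07e8)
}

def find_cycle(waits_for, start):
    """Follow wait-for edges from start; return the cycle reached, or None."""
    path = []
    seen = {}  # thread -> its position in path
    t = start
    while t is not None and t not in seen:
        seen[t] = len(path)
        path.append(t)
        t = waits_for.get(t)
    if t is None:
        return None  # the chain ended at a thread that is not waiting
    return path[seen[t]:]

print(find_cycle(WAITS_FOR, "d5245800"))  # prints ['d4655de0', 'd4d07de0']
```

Starting from prtconf, the walk ends in the two-thread cycle, which is why none of the threads behind it can ever be woken up.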
Vincent.