I'm running snv_108 and I see something that looks a lot like 6545725,
which was supposedly fixed in snv_105. When I search the history via
opengrok and mercurial I see no mention of 6545725. The hang may be
similar to 6799144 as well.
After snv_108, the putback for PSARC/2008/242 changed the code where
the thread is blocked (the mutex_enter/cv_wait sequence in dld_close).
I'm not sure whether those changes would have any effect on the
behavior that I see.
If more info is needed, please ask for it ASAP.
What I did:
On a T5210, I was trying to see if any packets were passing on vsw0
using "snoop -d vsw0"
# ps -o pid,args -p `pgrep snoop`
PID COMMAND
12009 snoop -d vsw0
After seeing no packets, I tried to quit snoop with ^C and ^Z, with no
effect. kill -9 didn't work either.
Poking around in mdb using the same steps used for 6545725, I find...
> ::pgrep snoop
S PID PPID PGID SID UID FLAGS ADDR NAME
R 12009 11967 12009 11895 0 0x4a004900 0000060027e438f8 snoop
> 0000060027e438f8::walk thread | ::findstack -v
stack pointer for thread 30008303720: 2a100506bf1
[ 000002a100506bf1 cv_wait+0x3c() ]
000002a100506ca1 dld_close+0x2c(3000c420a58, 3, 60017ace550, 8000, ff00, 60017ace488)
000002a100506d51 qdetach+0x8c(3000c420a58, 7bebb6a0, 3, 60028857ca8, 6001173dcc8, 3000c420b50)
000002a100506e01 strclose+0x36c(6002b559b80, 3, 60028857ca8, 6002a8d7128, 0, 6002bd723a0)
000002a100506ec1 device_close+0x84(6002b559c80, 3, 6002bd72318, 60028857ca8, 2000, 4)
000002a100506f71 spec_close+0x13c(6002b559c80, 3, 1, 6002a7f2278, 60028857ca8, 6002a7f2398)
000002a100507021 fop_close+0x48(6002b559c80, 3, 1, 0, 60028857ca8, 0)
000002a1005070d1 closef+0x50(60023810150, 3, 1, 6002a0ea890, 0, 18fc800)
000002a100507181 closeandsetf+0x348(3, 0, 60023810150, 6002b9be000, c0, 60027e438f8)
000002a100507231 close+8(3, 0, 2710, 2400, ffbfaf90, 0)
000002a1005072e1 syscall_trap32+0xcc(3, 0, 2710, 2400, ffbfaf90, 0)
> dld_close+0x2c::dis
dld_close+4: ldx [%i0 + 0x28], %i5
dld_close+8: sethi %hi(0x8000), %i3
dld_close+0xc: add %i5, 0xa0, %l7
dld_close+0x10: call -0x7ae6d8b0 <mutex_enter>
dld_close+0x14: mov %l7, %o0
dld_close+0x18: ld [%i5 + 0xc8], %i2
dld_close+0x1c: btst %i3, %i2
dld_close+0x20: be,pn %icc, +0x24 <dld_close+0x44>
dld_close+0x24: add %i5, 0xc8, %i2
dld_close+0x28: mov %l7, %o1
dld_close+0x2c: call -0x7adcfcd0 <cv_wait>
dld_close+0x30: mov %i2, %o0
dld_close+0x34: ld [%i5 + 0xc8], %i1
dld_close+0x38: btst %i3, %i1
dld_close+0x3c: bne,pt %icc, -0x10 <dld_close+0x2c>
dld_close+0x40: mov %l7, %o1
dld_close+0x44: call -0x7ae6d864 <mutex_exit>
dld_close+0x48: mov %l7, %o0
dld_close+0x4c: call -0x7adc2f4c <qprocsoff>
dld_close+0x50: mov %i0, %o0
dld_close+0x54: ld [%i5 + 0x24], %i4
> 3000c420a58::print -t queue_t q_ptr
void *q_ptr = 0x60017ace488
> 0x60017ace488::print -t dld_str_t
; (forward declaration)
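
Since ::print cannot show dld_str_t here, the field addresses can be
read off the disassembly instead. The following is only a sketch of
that arithmetic: the offsets 0xa0 and 0xc8 and the mask 0x8000 come
from the dld_close+4 .. dld_close+0x2c instructions above, the mapping
to ds_lock and ds_dlpi_pending_cv is inferred from the argument order
of mutex_enter and cv_wait, and the helper names (field_addr,
pending_bit_set, print_layout) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Values read from the disassembly, not from the dld_str_t definition. */
#define DS_LOCK_OFF     0xa0u   /* add %i5, 0xa0, %l7 -> mutex_enter argument */
#define DS_CV_OFF       0xc8u   /* add %i5, 0xc8, %i2 -> first cv_wait argument */
#define DS_PENDING_MASK 0x8000u /* sethi %hi(0x8000), %i3 ; btst %i3, ... */

/* Hypothetical helper: address of a field at byte offset off inside dsp. */
uint64_t field_addr(uint64_t dsp, uint32_t off)
{
    return dsp + off;
}

/*
 * Hypothetical helper: given the 32-bit word loaded from dsp + 0xc8,
 * report whether the loop in dld_close would call cv_wait again.
 */
int pending_bit_set(uint32_t word)
{
    return (word & DS_PENDING_MASK) != 0;
}

/* Print the two addresses worth inspecting for a given q_ptr value. */
void print_layout(uint64_t dsp)
{
    assert(dsp != 0);
    printf("lock (presumably ds_lock):          %llx\n",
        (unsigned long long)field_addr(dsp, DS_LOCK_OFF));
    printf("cv (presumably ds_dlpi_pending_cv): %llx\n",
        (unsigned long long)field_addr(dsp, DS_CV_OFF));
}
```

Calling print_layout(0x60017ace488ULL) with the q_ptr value above
gives 60017ace528 for the lock and 60017ace550 for the cv; the latter
matches the third value shown in the dld_close frame of the stack.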
The snv_108 code looks like:
dld_close(queue_t *rq)
{
    dld_str_t *dsp = rq->q_ptr;

    /*
     * All modules on top have been popped off. So there can't be any
     * threads from the top.
     */
    ASSERT(dsp->ds_datathr_cnt == 0);

    /*
     * Wait until pending DLPI requests are processed.
     */
    mutex_enter(&dsp->ds_lock);
    while (dsp->ds_dlpi_pending)
        cv_wait(&dsp->ds_dlpi_pending_cv, &dsp->ds_lock);
    mutex_exit(&dsp->ds_lock);

    /*
     * Disable the queue srv(9e) routine.
     */
    qprocsoff(rq);
As I mentioned before, the snv_108 code above differs from what I see
in the current source, due to the putback for PSARC/2008/242.
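
For anyone who wants to experiment with the wait pattern outside the
kernel, here is a minimal user-space sketch of the same loop. It is an
analogy only: POSIX threads stand in for mutex_enter/cv_wait/mutex_exit,
and fake_str_t and all of the fake_* functions are hypothetical names,
not part of dld. The point it illustrates is that the closer only gets
out if some other thread both clears the flag and signals the cv.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the three dld_str_t fields dld_close uses. */
typedef struct {
    pthread_mutex_t lock;       /* plays the role of ds_lock */
    pthread_cond_t  pending_cv; /* plays the role of ds_dlpi_pending_cv */
    bool            pending;    /* plays the role of ds_dlpi_pending */
} fake_str_t;

void fake_init(fake_str_t *sp, bool pending)
{
    pthread_mutex_init(&sp->lock, NULL);
    pthread_cond_init(&sp->pending_cv, NULL);
    sp->pending = pending;
}

/* Same shape as the wait in dld_close: sleep until pending is cleared. */
void fake_close_wait(fake_str_t *sp)
{
    pthread_mutex_lock(&sp->lock);
    while (sp->pending)
        pthread_cond_wait(&sp->pending_cv, &sp->lock);
    pthread_mutex_unlock(&sp->lock);
}

/* What the request-processing side has to do for the closer to wake up. */
void *fake_request_done(void *arg)
{
    fake_str_t *sp = arg;

    pthread_mutex_lock(&sp->lock);
    sp->pending = false;
    pthread_cond_signal(&sp->pending_cv);
    pthread_mutex_unlock(&sp->lock);
    return NULL;
}

/* One closer against one completer; returns 0 once the closer got out. */
int fake_demo(void)
{
    fake_str_t s;
    pthread_t t;

    fake_init(&s, true);
    if (pthread_create(&t, NULL, fake_request_done, &s) != 0)
        return -1;
    fake_close_wait(&s);
    pthread_join(t, NULL);
    assert(!s.pending);
    return 0;
}
```

If fake_request_done never runs, fake_close_wait sleeps forever, which
is the user-space equivalent of the stack shown above. In the kernel
the situation is worse because cv_wait, unlike cv_wait_sig, does not
return when a signal is posted, which would be consistent with kill -9
having no effect on snoop.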
--
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
networking-discuss mailing list