I'm running snv_108 and I see something that looks a lot like 6545725,
which was supposedly fixed in snv_105.  When I search the history via
opengrok and mercurial I find no mention of 6545725.  The hang may also
be similar to 6799144.

The putback for PSARC/2008/242, which landed after snv_108, changed the
code where the kernel is trying to acquire a mutex.  I'm not sure
whether those changes would have any effect on the behavior that I see.

If more info is needed, please ask for it ASAP.

What I did:

On a T5210, I was trying to see if any packets were passing on vsw0
using "snoop -d vsw0"

# ps -o pid,args -p `pgrep snoop`
  PID COMMAND
12009 snoop -d vsw0

After seeing no packets I tried to quit snoop with ^C and ^Z, without
success.  kill -9 didn't work either.

Poking around in mdb using the same steps used for 6545725, I find...

> ::pgrep snoop
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R  12009  11967  12009  11895      0 0x4a004900 0000060027e438f8 snoop
> 0000060027e438f8::walk thread | ::findstack -v
stack pointer for thread 30008303720: 2a100506bf1
[ 000002a100506bf1 cv_wait+0x3c() ]
  000002a100506ca1 dld_close+0x2c(3000c420a58, 3, 60017ace550, 8000, ff00, 60017ace488)
  000002a100506d51 qdetach+0x8c(3000c420a58, 7bebb6a0, 3, 60028857ca8, 6001173dcc8, 3000c420b50)
  000002a100506e01 strclose+0x36c(6002b559b80, 3, 60028857ca8, 6002a8d7128, 0, 6002bd723a0)
  000002a100506ec1 device_close+0x84(6002b559c80, 3, 6002bd72318, 60028857ca8, 2000, 4)
  000002a100506f71 spec_close+0x13c(6002b559c80, 3, 1, 6002a7f2278, 60028857ca8, 6002a7f2398)
  000002a100507021 fop_close+0x48(6002b559c80, 3, 1, 0, 60028857ca8, 0)
  000002a1005070d1 closef+0x50(60023810150, 3, 1, 6002a0ea890, 0, 18fc800)
  000002a100507181 closeandsetf+0x348(3, 0, 60023810150, 6002b9be000, c0, 60027e438f8)
  000002a100507231 close+8(3, 0, 2710, 2400, ffbfaf90, 0)
  000002a1005072e1 syscall_trap32+0xcc(3, 0, 2710, 2400, ffbfaf90, 0)
> dld_close+0x2c::dis
dld_close+4:                    ldx       [%i0 + 0x28], %i5
dld_close+8:                    sethi     %hi(0x8000), %i3
dld_close+0xc:                  add       %i5, 0xa0, %l7
dld_close+0x10:                 call      -0x7ae6d8b0   <mutex_enter>
dld_close+0x14:                 mov       %l7, %o0
dld_close+0x18:                 ld        [%i5 + 0xc8], %i2
dld_close+0x1c:                 btst      %i3, %i2
dld_close+0x20:                 be,pn     %icc, +0x24   <dld_close+0x44>
dld_close+0x24:                 add       %i5, 0xc8, %i2
dld_close+0x28:                 mov       %l7, %o1
dld_close+0x2c:                 call      -0x7adcfcd0   <cv_wait>
dld_close+0x30:                 mov       %i2, %o0
dld_close+0x34:                 ld        [%i5 + 0xc8], %i1
dld_close+0x38:                 btst      %i3, %i1
dld_close+0x3c:                 bne,pt    %icc, -0x10   <dld_close+0x2c>
dld_close+0x40:                 mov       %l7, %o1
dld_close+0x44:                 call      -0x7ae6d864   <mutex_exit>
dld_close+0x48:                 mov       %l7, %o0
dld_close+0x4c:                 call      -0x7adc2f4c   <qprocsoff>
dld_close+0x50:                 mov       %i0, %o0
dld_close+0x54:                 ld        [%i5 + 0x24], %i4
> 3000c420a58::print -t queue_t q_ptr
void *q_ptr = 0x60017ace488
> 0x60017ace488::print -t dld_str_t
; (forward declaration)
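
Reading the loop straight off the disassembly: %i5 is loaded from
[%i0 + 0x28], the mutex passed to mutex_enter is at %i5 + 0xa0, the
address passed to cv_wait as the condition variable is %i5 + 0xc8, and
the loop keeps calling cv_wait while bit 0x8000 is set in the 32-bit
word loaded from [%i5 + 0xc8].  A minimal C sketch of just that test
follows.  The macro and function names are made up for illustration,
and nothing here assumes anything about the layout of dld_str_t beyond
what the instructions show.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values read from the dld_close disassembly; names are illustrative. */
#define PENDING_WORD_OFFSET 0xc8u   /* ld    [%i5 + 0xc8], %i2 */
#define PENDING_MASK        0x8000u /* sethi %hi(0x8000), %i3  */

/*
 * Given the 32-bit word found at PENDING_WORD_OFFSET from the pointer
 * in %i5, report whether the loop would call cv_wait again.
 */
bool wait_loop_would_block(uint32_t word)
{
    return (word & PENDING_MASK) != 0;
}
```

In the stack above, the dld_close frame shows 60017ace488 in the last
argument slot and 60017ace550 in the third, which is 60017ace488 + 0xc8.
Since ::print could not resolve dld_str_t, dumping the raw word at that
address (for example with mdb's /X format) and checking it against this
mask should show whether the flag is still set.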

The snv_108 code looks like:

dld_close(queue_t *rq)
{
        dld_str_t       *dsp = rq->q_ptr;

        /*
         * All modules on top have been popped off. So there can't be any
         * threads from the top.
         */
        ASSERT(dsp->ds_datathr_cnt == 0);

        /*
         * Wait until pending DLPI requests are processed.
         */
        mutex_enter(&dsp->ds_lock);
        while (dsp->ds_dlpi_pending)
                cv_wait(&dsp->ds_dlpi_pending_cv, &dsp->ds_lock);
        mutex_exit(&dsp->ds_lock);

        /*
         * Disable the queue srv(9e) routine.
         */
        qprocsoff(rq);

As I mentioned above, the current source looks different from this
snv_108 code because of the putback for PSARC/2008/242.
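
To make the suspected hang concrete, here is a rough userland analogue
of the wait loop in dld_close, written with POSIX threads instead of
kernel primitives.  All of the demo_* names are hypothetical and are
not from the dld source.  It only illustrates the pattern: the waiter
returns once some other thread clears the flag and signals the
condition variable.  If that never happens, the waiter sleeps
indefinitely, and since cv_wait (unlike cv_wait_sig) is not interrupted
by signals, that would be consistent with kill -9 having no effect.
This is my reading of the stack, not a confirmed diagnosis.

```c
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for dld_str_t; only what the loop needs. */
typedef struct demo_str {
    pthread_mutex_t lock;       /* plays the role of ds_lock */
    pthread_cond_t  pending_cv; /* plays the role of ds_dlpi_pending_cv */
    bool            pending;    /* plays the role of ds_dlpi_pending */
} demo_str_t;

void demo_init(demo_str_t *sp, bool pending)
{
    pthread_mutex_init(&sp->lock, NULL);
    pthread_cond_init(&sp->pending_cv, NULL);
    sp->pending = pending;
}

/* Same shape as the wait in dld_close(): block until pending clears. */
void demo_close_wait(demo_str_t *sp)
{
    pthread_mutex_lock(&sp->lock);
    while (sp->pending)
        pthread_cond_wait(&sp->pending_cv, &sp->lock);
    pthread_mutex_unlock(&sp->lock);
}

/* What a completion path has to do for the waiter to return. */
void demo_request_done(demo_str_t *sp)
{
    pthread_mutex_lock(&sp->lock);
    sp->pending = false;
    pthread_cond_broadcast(&sp->pending_cv);
    pthread_mutex_unlock(&sp->lock);
}

/* Thread entry point: finish the "request" after a short delay. */
void *demo_completer(void *arg)
{
    usleep(100 * 1000);
    demo_request_done((demo_str_t *)arg);
    return NULL;
}

/* Run waiter and completer together; returns 0 once the wait ends. */
int demo_run(void)
{
    demo_str_t s;
    pthread_t tid;

    demo_init(&s, true);
    if (pthread_create(&tid, NULL, demo_completer, &s) != 0)
        return -1;
    demo_close_wait(&s);
    pthread_join(tid, NULL);
    return s.pending ? 1 : 0;
}
```

Calling demo_run() from a small main() returns 0 after roughly 100 ms.
Dropping the demo_request_done() call from demo_completer() makes
demo_close_wait() block forever, which is the behavior I appear to be
seeing in close().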

-- 
Mike Gerdts
http://mgerdts.blogspot.com/