On Tue, Feb 10, 2009 at 4:41 PM, Jose Borrego <Jose.Borrego at sun.com> wrote:
> This could be related to the oplocks being enabled. To know more I need the
> crash dump. If you can send it to me I'll take a look at it.
>
> - Jose
>
>
> On 02/10/09 00:11, Brent Jones wrote:
>>
>> Hello all,
>> I got an alert this morning from one of my X4540's running OpenSolaris
>> snv_105 that it had rebooted. Upon closer inspection, it had a
>> kernel panic while many machines were connected to it to perform
>> backups (over CIFS).
>> Of note, it says "rw_destroy: lock still active" down below.
>> I had enabled the CIFS oplocks to increase performance (I had some bad
>> performance under some load conditions) but I am aware this is not a
>> tested or recommended setting.
>> May I be seeing some effects of this? I tried all day to reproduce the
>> situation unsuccessfully, but when tomorrow's backups run again, I'll
>> see if it happens again.
>>
>> Feb  9 14:47:31 pdxfilu01 unix: [ID 836849 kern.notice]
>> Feb  9 14:47:31 pdxfilu01 ^Mpanic[cpu0]/thread=ffffff007a99cc60:
>> Feb  9 14:47:31 pdxfilu01 genunix: [ID 763660 kern.notice] rw_destroy:
>> lock still active, lp=ffffff11aa2760d0 wwwh=ffffff007ae97c64
>> thread=ffffff007a99cc60
>> Feb  9 14:47:31 pdxfilu01 unix: [ID 100000 kern.notice]
>> Feb  9 14:47:31 pdxfilu01 genunix: [ID 655072 kern.notice]
>> ffffff007a99c990 unix:rw_panic+6f ()
>> Feb  9 14:47:31 pdxfilu01 genunix: [ID 655072 kern.notice]
>> ffffff007a99c9b0 unix:rw_destroy+33 ()
>> Feb  9 14:47:31 pdxfilu01 genunix: [ID 655072 kern.notice]
>> ffffff007a99c9d0 smbsrv:smb_rwx_destroy+2c ()
>> Feb  9 14:47:31 pdxfilu01 genunix: [ID 655072 kern.notice]
>> ffffff007a99ca10 smbsrv:smb_node_release+d7 ()
>> Feb  9 14:47:31 pdxfilu01 genunix: [ID 655072 kern.notice]
>> ffffff007a99ca50 smbsrv:smb_node_release+a9 ()
>> Feb  9 14:47:31 pdxfilu01 genunix: [ID 655072 kern.notice]
>> ffffff007a99ca70 smbsrv:smb_ofile_delete+98 ()
>> Feb  9 14:47:31 pdxfilu01 genunix: [ID 655072 kern.notice]
>> ffffff007a99caa0 smbsrv:smb_ofile_release+53 ()
>> Feb  9 14:47:31 pdxfilu01 genunix: [ID 655072 kern.notice]
>> ffffff007a99cac0 smbsrv:smbsr_disconnect_file+29 ()
>> Feb  9 14:47:31 pdxfilu01 genunix: [ID 655072 kern.notice]
>> ffffff007a99caf0 smbsrv:smbsr_cleanup+29 ()
>> Feb  9 14:47:31 pdxfilu01 genunix: [ID 655072 kern.notice]
>> ffffff007a99cb80 smbsrv:smb_dispatch_request+495 ()
>> Feb  9 14:47:31 pdxfilu01 genunix: [ID 655072 kern.notice]
>> ffffff007a99cbb0 smbsrv:smb_session_worker+3a ()
>> Feb  9 14:47:31 pdxfilu01 genunix: [ID 655072 kern.notice]
>> ffffff007a99cc40 genunix:taskq_d_thread+b1 ()
>> Feb  9 14:47:31 pdxfilu01 genunix: [ID 655072 kern.notice]
>> ffffff007a99cc50 unix:thread_start+8 ()
>> Feb  9 14:47:31 pdxfilu01 unix: [ID 100000 kern.notice]
>> Feb  9 14:47:31 pdxfilu01 genunix: [ID 672855 kern.notice] syncing
>> file systems...
>> Feb  9 14:47:34 pdxfilu01 genunix: [ID 904073 kern.notice]  done
>> Feb  9 14:47:35 pdxfilu01 genunix: [ID 111219 kern.notice] dumping to
>> /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
>> Feb  9 14:56:18 pdxfilu01 genunix: [ID 409368 kern.notice] ^M100%
>> done: 6271242 pages dumped, compression ratio 3.62,
>> Feb  9 14:56:18 pdxfilu01 genunix: [ID 851671 kern.notice] dump succeeded
>>
>>
>> I'll keep you all posted on my findings.
>>
>>
>
>

Well, I got another dump even with oplocks set back to default. This
time it references something entirely different:

Feb 11 16:07:59 pdxfilu01 unix: [ID 836849 kern.notice]
Feb 11 16:07:59 pdxfilu01 ^Mpanic[cpu1]/thread=ffffff007abd4c60:
Feb 11 16:07:59 pdxfilu01 genunix: [ID 335743 kern.notice] BAD TRAP:
type=e (#pf Page fault) rp=ffffff007abd4870 addr=0 occurred in module
"ip" due to a NULL pointer dereference
Feb 11 16:07:59 pdxfilu01 unix: [ID 100000 kern.notice]
Feb 11 16:07:59 pdxfilu01 unix: [ID 839527 kern.notice] sched:
Feb 11 16:07:59 pdxfilu01 unix: [ID 753105 kern.notice] #pf Page fault
Feb 11 16:07:59 pdxfilu01 unix: [ID 532287 kern.notice] Bad kernel
fault at addr=0x0
Feb 11 16:07:59 pdxfilu01 unix: [ID 243837 kern.notice] pid=0,
pc=0xfffffffff7a87e7a, sp=0xffffff007abd4960, eflags=0x10286
Feb 11 16:07:59 pdxfilu01 unix: [ID 211416 kern.notice] cr0:
8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
Feb 11 16:07:59 pdxfilu01 unix: [ID 624947 kern.notice] cr2: 0
Feb 11 16:07:59 pdxfilu01 unix: [ID 625075 kern.notice] cr3: 3400000
Feb 11 16:07:59 pdxfilu01 unix: [ID 625715 kern.notice] cr8: c
Feb 11 16:07:59 pdxfilu01 unix: [ID 100000 kern.notice]
Feb 11 16:07:59 pdxfilu01 unix: [ID 592667 kern.notice]         rdi:
ffffff115aa8b030 rsi:                0 rdx: ffffff1144fa53a8
Feb 11 16:07:59 pdxfilu01 unix: [ID 592667 kern.notice]         rcx:
             0  r8: ffffff11585b2000  r9:                0
Feb 11 16:07:59 pdxfilu01 unix: [ID 592667 kern.notice]         rax:
             0 rbx:                0 rbp: ffffff007abd4a00
Feb 11 16:07:59 pdxfilu01 unix: [ID 592667 kern.notice]         r10:
ffffff116a4d0e00 r11:                0 r12: ffffff114f20b000
Feb 11 16:07:59 pdxfilu01 unix: [ID 592667 kern.notice]         r13:
ffffff1144fa53a8 r14: ffffff115e708940 r15:                0
Feb 11 16:07:59 pdxfilu01 unix: [ID 592667 kern.notice]         fsb:
             0 gsb: ffffff1158e98540  ds:               4b
Feb 11 16:07:59 pdxfilu01 unix: [ID 592667 kern.notice]          es:
            4b  fs:                0  gs:              1c3
Feb 11 16:07:59 pdxfilu01 unix: [ID 592667 kern.notice]         trp:
             e err:                0 rip: fffffffff7a87e7a
Feb 11 16:07:59 pdxfilu01 unix: [ID 592667 kern.notice]          cs:
            30 rfl:            10286 rsp: ffffff007abd4960
Feb 11 16:07:59 pdxfilu01 unix: [ID 266532 kern.notice]          ss:
            38
Feb 11 16:07:59 pdxfilu01 unix: [ID 100000 kern.notice]
Feb 11 16:07:59 pdxfilu01 genunix: [ID 655072 kern.notice]
ffffff007abd4750 unix:die+dd ()
Feb 11 16:07:59 pdxfilu01 genunix: [ID 655072 kern.notice]
ffffff007abd4860 unix:trap+1752 ()
Feb 11 16:07:59 pdxfilu01 genunix: [ID 655072 kern.notice]
ffffff007abd4870 unix:_cmntrap+e9 ()
Feb 11 16:07:59 pdxfilu01 genunix: [ID 655072 kern.notice]
ffffff007abd4a00 ip:ip_tcp_input+6a ()
Feb 11 16:07:59 pdxfilu01 genunix: [ID 655072 kern.notice]
ffffff007abd4bb0 ip:ip_accept_tcp+779 ()
Feb 11 16:07:59 pdxfilu01 genunix: [ID 655072 kern.notice]
ffffff007abd4c40 ip:squeue_polling_thread+13f ()
Feb 11 16:07:59 pdxfilu01 genunix: [ID 655072 kern.notice]
ffffff007abd4c50 unix:thread_start+8 ()
Feb 11 16:07:59 pdxfilu01 unix: [ID 100000 kern.notice]
Feb 11 16:07:59 pdxfilu01 genunix: [ID 672855 kern.notice] syncing
file systems...
Feb 11 16:08:01 pdxfilu01 genunix: [ID 904073 kern.notice]  done
Feb 11 16:08:02 pdxfilu01 genunix: [ID 111219 kern.notice] dumping to
/dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Feb 11 16:16:30 pdxfilu01 genunix: [ID 409368 kern.notice] ^M100%
done: 7891464 pages dumped, compression ratio 3.94,
Feb 11 16:16:30 pdxfilu01 genunix: [ID 851671 kern.notice] dump succeeded
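
Since mail wrapping splits every stack frame in the traces above across
two lines, a small script can flatten them for easier comparison. This is
only a rough sketch: the regular expression is an assumption based on the
frame lines shown in this thread, and the names FRAME_RE, extract_frames
and SAMPLE are made up for illustration.

```python
import re

# One frame as printed in the panic output above:
# <frame pointer> <module>:<function>+<hex offset> ()
# Assumed format, inferred only from the lines in this thread.
FRAME_RE = re.compile(r"\b([0-9a-f]{16})\s+(\w+):(\w+)\+([0-9a-f]+)\s+\(\)")


def extract_frames(log_text):
    """Return (module, function, offset) tuples in the order they appear."""
    # Drop mail quote prefixes ("> ", ">> ") so quoted traces parse too.
    lines = [re.sub(r"^(>\s?)+", "", line) for line in log_text.splitlines()]
    # Each syslog record was wrapped onto two lines, so rejoin before matching.
    flat = " ".join(lines)
    return [
        (m.group(2), m.group(3), int(m.group(4), 16))
        for m in FRAME_RE.finditer(flat)
    ]


# Two frames copied from the second panic above.
SAMPLE = """\
Feb 11 16:07:59 pdxfilu01 genunix: [ID 655072 kern.notice]
ffffff007abd4a00 ip:ip_tcp_input+6a ()
Feb 11 16:07:59 pdxfilu01 genunix: [ID 655072 kern.notice]
ffffff007abd4bb0 ip:ip_accept_tcp+779 ()
"""

for module, func, offset in extract_frames(SAMPLE):
    print(f"{module}`{func}+{offset:#x}")
```

Feeding it the full text of either panic should list the frames from the
innermost call outward, which makes the two traces easier to set side by
side.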


I tried giving rpool/dump a mountpoint to get the data off, but I
guess it doesn't work that way.
I'll try to find some way to pull the dump data and get it to you.


-- 
Brent Jones
brent at servuhome.net
