Hi there. I'm getting kernel panics, and I don't know why.
I have a 6-drive SCSI multipack connected to an LSI Logic / Symbios Logic 53c875
(using the ncrs driver). The box itself is an older Dell 1600SC (32-bit Xeon)
with 1.GB RAM. The box, SCSI card, and multipack have been rock solid for the
past 7 years.
I installed OpenSolaris 2008.05 (snv_86) and created a ZFS volume (RAID 1+0)
across the 6 drives. When I copy files across the network to the volume, the
machine eventually panics (anywhere between 5 minutes and 2 hours in).
Interestingly, I have a second card of the same model, another SCSI disk pack,
and another machine (PowerEdge SC440, Core 2 Duo). That box is also running
OpenSolaris 2008.05, and I get identical panics there, whether using the 64-bit
(glm?) driver or the 32-bit ncrs driver.
I upgraded the Dell 1600SC to snv_91 in the hope that the problem would
magically go away. It didn't :-(
I added "set kmem_flags=0xf" to /etc/system & here's the most recent panic:
Jun 26 21:31:03 barcelona genunix: [ID 478202 kern.notice] kernel memory
allocator:
Jun 26 21:31:03 barcelona genunix: [ID 432124 kern.notice] buffer freed to
wrong cache
Jun 26 21:31:03 barcelona genunix: [ID 815666 kern.notice] buffer was allocated
from kmem_alloc_320,
Jun 26 21:31:03 barcelona genunix: [ID 530907 kern.notice] caller attempting
free to kmem_alloc_8.
Jun 26 21:31:03 barcelona genunix: [ID 563406 kern.notice] buffer=e52c7400
bufctl=e5279200 cache: kmem_alloc_8
Jun 26 21:31:03 barcelona genunix: [ID 341866 kern.notice] previous transaction
on buffer e52c7400:
Jun 26 21:31:03 barcelona genunix: [ID 991227 kern.notice] thread=e12e7ce0
time=T-0.013422618 slab=e509c088 cache: kmem_alloc_320
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice]
kmem_cache_alloc_debug+258
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] kmem_cache_alloc+8d
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] kmem_zalloc+4b
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice]
glm_pkt_alloc_extern+83
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] glm_scsi_init_pkt+129
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] scsi_init_pkt+48
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice]
sd_initpkt_for_uscsi+9e
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sd_start_cmds+15f
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sd_core_iostart+158
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sd_uscsi_strategy+108
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] default_physio+31b
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] physio+1d
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice]
scsi_uscsi_handle_cmd+16d
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sd_send_scsi_cmd+13f
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sdioctl+c86
Jun 26 21:31:03 barcelona unix: [ID 836849 kern.notice]
Jun 26 21:31:03 barcelona ^Mpanic[cpu0]/thread=d391cde0:
Jun 26 21:31:03 barcelona genunix: [ID 812275 kern.notice] kernel heap
corruption detected
Jun 26 21:31:03 barcelona unix: [ID 100000 kern.notice]
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cc20
genunix:kmem_error+421 (6, d1024398, e52c74)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cc5c
genunix:kmem_free+bf (e52c7400, 8)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cc78
ncrs:glm_pkt_destroy_extern+60 (d7a77600, e9767388)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cc90
ncrs:glm_scsi_destroy_pkt+42 (e97674a8, e97674a4)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cca8
scsi:scsi_destroy_pkt+16 (e97674a4)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391ccc8
sd:sd_destroypkt_for_uscsi+89 (d9365de0)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391ccf4
sd:sd_return_command+124 (d4106a80, d9365de0)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cd28
sd:sdintr+499 (e97674a4)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cd4c
ncrs:glm_doneq_empty+3b (d7a77600)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cd60
ncrs:glm_intr+75 (d7a77600, 0)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cdac
unix:av_dispatch_autovect+69 (14)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cdcc
unix:dispatch_hardint+1a (14, 0)
jwa@barcelona:/var/crash/barcelona# mdb -k unix.8 vmcore.8
Loading modules: [ unix genunix specfs dtrace cpu.generic uppc pcplusmp
scsi_vhci zfs mpt sd ip hook neti sctp arp usba fctl md lofs random sppp crypto
ptm nfs fcip fcp cpc logindmux nsctl ii sdbc ufs rdc nsmb sv ]
> ::status
debugging crash dump vmcore.8 (32-bit) from barcelona
operating system: 5.11 snv_91 (i86pc)
panic message: kernel heap corruption detected
dump content: kernel pages only
> ::panicinfo
cpu 0
thread d391cde0
message kernel heap corruption detected
gs fec301b0
fs fec30000
es fec30160
ds fec30160
edi f
esi e5279200
ebp d391cbd4
esp d391cbc4
ebx e5279264
edx 0
ecx f
eax d391cbe0
trapno 0
err 0
eip fe838350
cs fec30158
eflags 282
uesp 0
ss fec30160
gdt fe7fe00002cf
idt fe7fd00007ff
ldt 0
task 150
cr0 8005003b
cr2 cfe23174
cr3 24c0000
cr4 6d8
> $C
d391cbd4 vpanic(fea67a08)
d391cc20 kmem_error+0x421(6, d1024398, e52c7400)
d391cc5c kmem_free+0xbf(e52c7400, 8)
d391cc78 glm_pkt_destroy_extern+0x60(d7a77600, e9767388)
d391cc90 glm_scsi_destroy_pkt+0x42(e97674a8, e97674a4)
d391cca8 scsi_destroy_pkt+0x16(e97674a4)
d391ccc8 sd_destroypkt_for_uscsi+0x89(d9365de0)
d391ccf4 sd_return_command+0x124(d4106a80, d9365de0)
d391cd28 sdintr+0x499(e97674a4)
d391cd4c glm_doneq_empty+0x3b(d7a77600)
d391cd60 glm_intr+0x75(d7a77600, 0)
d391cdac av_dispatch_autovect+0x69(14)
d391cdcc dispatch_hardint+0x1a(14, 0)
d918bc6c switch_sp_and_call+0xf(d391cddc, fe8196c4, 14, 0)
d918bca8 do_interrupt+0x7c(d918bcb8, f6c57c80)
d918bcb8 _interrupt+0x59()
d918bd38 bcopy+0x13(d42e8b68)
d918bd60 zio_done+0x2a(d42e8b68)
d918bd78 zio_execute+0x66()
d918bdc8 taskq_thread+0x176(d547e388, 0)
d918bdd8 thread_start+8()
jwa@barcelona:/var/crash/barcelona# modinfo | grep ncrs
163 f8c1c000 abb4 75 1 ncrs (NCRS SCSI HBA Driver 1.25)
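In case it helps anyone reading the dump: as far as I can tell, the allocator
is saying that the buffer at e52c7400 came from the kmem_alloc_320 cache, but
glm_pkt_destroy_extern passed a size of 8 to kmem_free, which maps to
kmem_alloc_8. Below is a toy Python model of that check, only to illustrate my
reading of the message. The size-class list, cache_for, and ToyAllocator are
made up for illustration and are not the kernel's real implementation; only
the 8 and 320 cache names come from the log.

```python
import bisect

# Illustrative size classes only. Apart from 8 and 320 (seen in the panic
# message), these values are assumptions, not the kernel's actual table.
SIZE_CLASSES = [8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128,
                160, 192, 224, 256, 320, 384, 448, 512]


def cache_for(size):
    """Return the name of the smallest size-class cache that fits `size`."""
    i = bisect.bisect_left(SIZE_CLASSES, size)
    if i == len(SIZE_CLASSES):
        raise ValueError("size too large for this toy model")
    return "kmem_alloc_%d" % SIZE_CLASSES[i]


class ToyAllocator:
    """Tracks which cache each buffer came from, like a debug allocator."""

    def __init__(self):
        self.owner = {}          # fake buffer address -> originating cache
        self.next_addr = 0x1000  # fake addresses, just unique integers

    def alloc(self, size):
        addr = self.next_addr
        self.next_addr += 0x1000
        self.owner[addr] = cache_for(size)
        return addr

    def free(self, addr, size):
        # The caller supplies the size, and the size picks the target cache.
        target = cache_for(size)
        origin = self.owner[addr]
        if origin != target:
            raise RuntimeError(
                "buffer freed to wrong cache: allocated from %s, "
                "caller attempting free to %s" % (origin, target))
        del self.owner[addr]


if __name__ == "__main__":
    a = ToyAllocator()
    buf = a.alloc(320)      # assumed request size; anything that lands in the 320 class
    try:
        a.free(buf, 8)      # wrong size on free, as in kmem_free(e52c7400, 8)
    except RuntimeError as err:
        print(err)
```

If that reading is right, the driver is remembering the wrong length for
something it allocated in glm_pkt_alloc_extern, and without kmem_flags set the
mismatch would presumably go unnoticed until the heap was already corrupted.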
I've also booted off of the 2008.05 CD and tried to do I/O (mostly tars &
copying large files around); it panics from there, too. So it's not some funny
thing I've done to /etc/system or a /kernel/drv/*.conf file.
Because this is affecting two different machines, each with its own card of the
same model, I'm tempted to point the finger at the SCSI driver... but about two
years ago, I put one of these SCSI cards in an older x86 box running Solaris 10
(01/06, I believe) as well as an Ultra 10 running 06/06, and it worked w/o
panicking.
Another tidbit: sometimes it panics when I run the 'format' command.
Any suggestions?
thanks,
James
This message posted from opensolaris.org