On Fri, Dec 26, 2008 at 04:00:28PM -0500, John Morrissey wrote:
> I'm encountering a kernel BUG() in guests using SCSI-interfaced disk
> images. I've tried with the Debian packaging of KVM 79 and 82; both
> exhibit the same behavior (disclaimer: Debian has about a dozen patches in
> their kvm packaging, but they all seem to be changes to the build/install
> process or security-related).

Not to be pushy, but does anyone have any ideas on this, or can I provide
any additional information? I'm afraid I'm a bit over my head when debugging
kernel internals.

john

> IDE-interfaced disk images seem fine. Host and guest are up-to-date Debian
> lenny (32-bit/i386) running kernel 2.6.26 (Debian
> linux-image-2.6.26-1-amd64 2.6.26-12).
>
> After a few minutes of disk activity (fsck(8)ing a fairly empty ~20GB
> filesystem is a reliable trigger), the kernel BUGs (oops output below).
>
> I was previously using KVM 72, and tried upgrading to 79 because both
> Debian lenny and Ubuntu hardy guests were panicking due to sym
> disconnects/timeouts. 79 makes the lenny guest start BUGging as described
> above. 82 is not perceivably different from 79 for the lenny guest.
>
> FWIW, the upgrade to 79 allowed the Ubuntu hardy guest to stay up,
> although it emits:
>
> Dec 25 00:28:51 vicar kernel: [106621.553272] sd 2:0:0:0: [sda] Sense Key :
> No Sense [current]
> Dec 25 00:28:51 vicar kernel: [106621.553279] Info fld=0x0
> Dec 25 00:28:51 vicar kernel: [106621.553280] sd 2:0:0:0: [sda] Add. Sense:
> No additional sense information
>
> at seemingly random intervals. The upgrade to 82 made the hardy guest
> start BUGging on soft lockups at random intervals (I can provide the full
> output if anyone's interested, but I'm much more interested in the lenny
> guest oops at this point).
>
> john
>
>
> run via libvirt:
> /usr/bin/kvm -S -M pc -m 512 -smp 1 -name test -monitor pty \
> -boot c -drive file=image.qcow,if=scsi,index=0,boot=on
> -net nic,macaddr=00:0c:29:1e:ea:b9,vlan=0,model=e1000 \
> -net tap,fd=17,script=,vlan=0,ifname=vnet2 \
> -net nic,macaddr=00:0c:29:1e:ea:c3,vlan=1,model=e1000 \
> -net tap,fd=18,script=,vlan=1,ifname=vnet3 \
> -serial pty -parallel none -usb -vnc 0.0.0.0:1
>
> [The KVMWiki asks whether the problem is reproducible with
> -no-kvm-irqchip, -no-kvm-pit, or -no-kvm, but when I tried invoking the
> above command line by hand (outside of libvirt), the VNC console was
> always blank and there was no console output on the serial pty. If this
> would be useful information to have in this case, I'd love to know what
> I'm doing wrong, or if there's a way to specify additional command line
> arguments with libvirt.]
>
> oops generated in the guest:
> [ 140.101828] sym0: unexpected disconnect
> [ 140.102748] BUG: unable to handle kernel NULL pointer dereference at
> 00000358
> [ 140.103818] IP: [<e08e2670>] :sym53c8xx:sym_int_sir+0x547/0x118f
> [ 140.106449] *pdpt = 000000001f5f9001 *pde = 0000000000000000
> [ 140.107356] Oops: 0000 [#1] SMP
> [ 140.107864] Modules linked in: loop virtio_balloon psmouse pcspkr
> serio_raw i2c_piix4 i2c_core button evdev ext3 jbd mbcache sd_mod ide_cd_mod
> cdrom ata_generic libata dock ide_pci_generic floppy virtio_pci virtio_ring
> virtio sym53c8xx scsi_transport_spi scsi_mod e1000 uhci_hcd usbcore piix
> ide_core thermal processor fan thermal_sys
> [ 140.108062]
> [ 140.108062] Pid: 131, comm: pdflush Not tainted (2.6.26-1-686-bigmem #1)
> [ 140.108062] EIP: 0060:[<e08e2670>] EFLAGS: 00010287 CPU: 0
> [ 140.108062] EIP is at sym_int_sir+0x547/0x118f [sym53c8xx]
> [ 140.108062] EAX: 0000000a EBX: 00000000 ECX: 1f98c084 EDX: 00000030
> [ 140.108062] ESI: df98c084 EDI: df98c000 EBP: df98c000 ESP: de0f3ba0
> [ 140.108062] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 140.108062] Process pdflush (pid: 131, ti=de0f2000 task=df48e520
> task.ti=de0f2000)
> [ 140.108062] Stack: 00000000 000144d6 7f5a222c c011a853 0021d496 00000000
> 00000000 00000000
> [ 140.108062] 00000000 df98c000 e08e08cd 00000000 00000000 00000001
> 00000000 df98c000
> [ 140.108062] 00000084 e08e3f2f df988c00 00000046 00000000 df544400
> 00000196 00000000
> [ 140.108062] Call Trace:
> [ 140.108062] [<c011a853>] pvclock_clocksource_read+0x4b/0xd0
> [ 140.108062] [<e08e08cd>] sym_recover_scsi_int+0xb3/0x10d [sym53c8xx]
> [ 140.108062] [<e08e3f2f>] sym_interrupt+0x3ee/0x5fd [sym53c8xx]
> [ 140.108062] [<e08df3dc>] sym53c8xx_intr+0x35/0x56 [sym53c8xx]
> [ 140.108062] [<c0158e4e>] handle_IRQ_event+0x23/0x51
> [ 140.108062] [<c0159f4d>] handle_fasteoi_irq+0x71/0xa4
> [ 140.108062] [<c010afd2>] do_IRQ+0x4d/0x63
> [ 140.108062] [<c01092a7>] common_interrupt+0x23/0x28
> [ 140.108062] [<c01300d8>] ptrace_request+0x1ec/0x278
> [ 140.108062] [<c012d0c6>] __do_softirq+0x57/0xd3
> [ 140.108062] [<c012d187>] do_softirq+0x45/0x53
> [ 140.108062] [<c012d43e>] irq_exit+0x35/0x67
> [ 140.108062] [<c01152b6>] smp_apic_timer_interrupt+0x6b/0x75
> [ 140.108062] [<c0109364>] apic_timer_interrupt+0x28/0x30
> [ 140.108062] [<c02c9953>] _spin_unlock_irqrestore+0x7/0x10
> [ 140.108062] [<e0865a94>] scsi_dispatch_cmd+0x197/0x205 [scsi_mod]
> [ 140.108062] [<e086ab2e>] scsi_request_fn+0x264/0x32a [scsi_mod]
> [ 140.108063] [<c01dcbd6>] __generic_unplug_device+0x1a/0x1c
> [ 140.108063] [<c01dd3e9>] __make_request+0x2fe/0x348
> [ 140.108063] [<c01dc008>] generic_make_request+0x34d/0x37b
> [ 140.108063] [<c015f9f1>] mempool_alloc+0x1c/0xba
> [ 140.108063] [<c01dd0e4>] submit_bio+0xc6/0xcd
> [ 140.108063] [<c019cdff>] bio_alloc_bioset+0x9b/0xf3
> [ 140.108063] [<c0199983>] submit_bh+0xcf/0xed
> [ 140.108063] [<c019b32e>] __block_write_full_page+0x1fa/0x2da
> [ 140.108063] [<c019eb73>] blkdev_get_block+0x0/0x43
> [ 140.108063] [<c019b4ef>] block_write_full_page+0xe1/0xea
> [ 140.108063] [<c019eb73>] blkdev_get_block+0x0/0x43
> [ 140.108063] [<c01626d5>] __writepage+0x8/0x21
> [ 140.108063] [<c0162b50>] write_cache_pages+0x16a/0x27b
> [ 140.108063] [<c01626cd>] __writepage+0x0/0x21
> [ 140.108063] [<c0162c61>] generic_writepages+0x0/0x21
> [ 140.108063] [<c0162c7b>] generic_writepages+0x1a/0x21
> [ 140.108063] [<c0162ca2>] do_writepages+0x20/0x30
> [ 140.108063] [<c0196525>] __writeback_single_inode+0x127/0x251
> [ 140.108063] [<c019691c>] sync_sb_inodes+0x17c/0x233
> [ 140.108063] [<c0196c93>] writeback_inodes+0x53/0x99
> [ 140.108063] [<c01638c1>] pdflush+0x0/0x1cc
> [ 140.108063] [<c016357c>] wb_kupdate+0x7b/0xdb
> [ 140.108063] [<c01639f0>] pdflush+0x12f/0x1cc
> [ 140.108063] [<c0163501>] wb_kupdate+0x0/0xdb
> [ 140.108063] [<c0138643>] kthread+0x38/0x5d
> [ 140.108063] [<c013860b>] kthread+0x0/0x5d
> [ 140.108063] [<c01094f3>] kernel_thread_helper+0x7/0x10
> [ 140.108063] =======================
> [ 140.108063] Code: 93 4c 01 00 00 52 50 68 42 76 8e e0 eb 4e 8d 83 b0 00 00
> 00 e8 32 71 96 df 8d 93 4c 01 00 00 52 50 68 7c 76 8e e0 eb 59 8b 1c 24 <8b>
> 93 58 03 00 00 8b 82 84 00 00 00 8b 1a 8b 70 60 85 f6 74 29
> [ 140.108063] EIP: [<e08e2670>] sym_int_sir+0x547/0x118f [sym53c8xx] SS:ESP
> 0068:de0f3ba0
> [ 140.162446] Kernel panic - not syncing: Fatal exception in interrupt
>
> vendor_id : GenuineIntel
> cpu family : 6
> model : 23
> model name : Intel(R) Xeon(R) CPU L5420 @ 2.50GHz
> stepping : 6
> cpu MHz : 2500.087
> cache size : 6144 KB
> [...]
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall lm
> constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2
> ssse3 cx16 xtpr dca sse4_1 lahf_lm
> bogomips : 5000.23
> clflush size : 64
> cache_alignment : 64
> address sizes : 38 bits physical, 48 bits virtual
> power management:

-- 
John Morrissey          _o            /\       ----  __o
j...@horde.net      _-< \_           /  \      ----  <  \,
www.horde.net/    __(_)/_(_)________/    \_______(_) /_(_)__