Re: BUG() with SCSI-interfaced disk images
* John Morrissey j...@horde.net [2009-01-07 20:16]:
> On Wed, Jan 07, 2009 at 04:34:50PM -0600, Ryan Harper wrote:
> > * John Morrissey j...@horde.net [2009-01-07 15:59]:
> > > On Fri, Dec 26, 2008 at 04:00:28PM -0500, John Morrissey wrote:
> > > > I'm encountering a kernel BUG() in guests using SCSI-interfaced disk images. I've tried with the Debian packaging of KVM 79 and 82; both exhibit the same behavior (disclaimer: Debian has about a dozen patches in their kvm packaging, but they all seem to be changes to the build/install process or security-related).
> > >
> > > Not to be pushy, but does anyone have any ideas on this, or can I provide any additional information? I'm afraid I'm a bit over my head when debugging kernel internals.
> >
> > Sorry, I meant to respond. This is more than likely a SCSI emulation error rather than a kernel error. I've seen the error a couple of times, but I don't have a fix for the issue yet as I don't have a reliable way to reproduce the error. If you have an easy way to reproduce the bug, I'll see if I can figure out a fix.
>
> I can reproduce this reliably when fscking a filesystem in a .vmdk I have. I can't give you the vmdk or a dump of the filesystem, but I can devote some time to troubleshoot this if you can guide me a little. If having the vmdk is really important, I might be able to sanitize it enough to send it to you (hopefully not making this bug unreproducible in the process).

I don't need the vmdk, but if there is some other repeatable process that can trigger this for you, getting that will allow me to recreate the issue myself. For example, installing Debian into a vmdk, reboot, and then fsck'ing from inside the vm would trigger it. Finding some sort of repeatable process that can trip the bug but without using any of your specific data would be the best way to move forward with the bug.
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com
Re: BUG() with SCSI-interfaced disk images
On Thu, Jan 08, 2009 at 08:01:03AM -0600, Ryan Harper wrote:
> I don't need the vmdk, but if there is some other repeatable process that can trigger this for you, getting that will allow me to recreate the issue myself. For example, installing Debian into a vmdk, reboot, and then fsck'ing from inside the vm would trigger it. Finding some sort of repeatable process that can trip the bug but without using any of your specific data would be the best way to move forward with the bug.

I can reproduce this when installing Debian lenny i386 (using the lenny rc1 install images from http://www.debian.org/devel/debian-installer/). The installer will complain of I/O problems when trying to mkfs(8) the filesystem and will prompt you to retry/ignore. Shortly thereafter, the domain kernel panics.

Attached is the libvirt configuration; it's pretty straightforward and translates into this kvm(1) invocation:

/usr/bin/kvm -S -M pc -m 512 -smp 1 -name scsi -monitor pty -boot n \
    -drive file=/var/lib/libvirt/images/scsi.qcow,if=scsi,index=0 \
    -net nic,macaddr=02:00:00:4d:58:13,vlan=0,model=e1000 \
    -net tap,fd=11,script=,vlan=0,ifname=vnet0 \
    -net nic,macaddr=02:00:00:23:7f:3d,vlan=1,model=e1000 \
    -net tap,fd=13,script=,vlan=1,ifname=vnet1 \
    -serial pty -parallel none -usb -vnc 0.0.0.0:0

john
--
John Morrissey
j...@horde.net
www.horde.net/

scsi.xml
Description: XML document
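For anyone who wants to try the same reproduction outside libvirt, the steps above might be scripted roughly as follows. This is only a sketch under several assumptions that are not from the thread: the image path, the 20G size, the installer ISO name, and booting from a CD image instead of the network boot (-boot n) used above. The libvirt-specific tap/fd networking is dropped because those file descriptors only exist when libvirt launches the guest, and -S is dropped because it leaves the guest CPU stopped until "c" is typed at the monitor. By default the script only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# Data-free reproduction sketch. All names below are hypothetical defaults.
IMAGE=${IMAGE:-/tmp/scsi-repro.qcow2}
ISO=${ISO:-debian-lenny-netinst.iso}

# Print a command; execute it only when RUN=1.
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Build the guest command line for a given disk interface (scsi or ide).
# Machine, memory and -drive options follow the invocation in the thread.
kvm_cmd() {
    iface=$1
    echo "kvm -M pc -m 512 -smp 1 -boot d -cdrom $ISO" \
         "-drive file=$IMAGE,if=$iface,index=0 -net none -vnc 127.0.0.1:0"
}

# Create an empty disk image, then start the installer with the disk on SCSI.
run qemu-img create -f qcow2 "$IMAGE" 20G
run $(kvm_cmd scsi)

# Control case: the same guest with the disk on IDE, reported as unaffected.
# run $(kvm_cmd ide)
```

If the installer reports I/O errors during mkfs with if=scsi but completes with if=ide, that would match the behaviour described in this thread.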
Re: BUG() with SCSI-interfaced disk images
On Fri, Dec 26, 2008 at 04:00:28PM -0500, John Morrissey wrote:
> I'm encountering a kernel BUG() in guests using SCSI-interfaced disk images. I've tried with the Debian packaging of KVM 79 and 82; both exhibit the same behavior (disclaimer: Debian has about a dozen patches in their kvm packaging, but they all seem to be changes to the build/install process or security-related).

Not to be pushy, but does anyone have any ideas on this, or can I provide any additional information? I'm afraid I'm a bit over my head when debugging kernel internals.

john
BUG() with SCSI-interfaced disk images
I'm encountering a kernel BUG() in guests using SCSI-interfaced disk images. I've tried with the Debian packaging of KVM 79 and 82; both exhibit the same behavior (disclaimer: Debian has about a dozen patches in their kvm packaging, but they all seem to be changes to the build/install process or security-related).

IDE-interfaced disk images seem fine.

Host and guest are up-to-date Debian lenny (32-bit/i386) running kernel 2.6.26 (Debian linux-image-2.6.26-1-amd64 2.6.26-12). After a few minutes of disk activity (fsck(8)ing a fairly empty ~20GB filesystem is a reliable trigger), the kernel BUGs (oops output below).

I was previously using KVM 72, and tried upgrading to 79 because both Debian lenny and Ubuntu hardy guests were panicking due to sym disconnects/timeouts. 79 makes the lenny guest start BUGging as described above. 82 is not perceivably different from 79 for the lenny guest.

FWIW, the upgrade to 79 allowed the Ubuntu hardy guest to stay up, although it emits:

Dec 25 00:28:51 vicar kernel: [106621.553272] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
Dec 25 00:28:51 vicar kernel: [106621.553279] Info fld=0x0
Dec 25 00:28:51 vicar kernel: [106621.553280] sd 2:0:0:0: [sda] Add. Sense: No additional sense information

at seemingly random intervals. The upgrade to 82 made the hardy guest start BUGging on soft lockups at random intervals (I can provide the full output if anyone's interested, but I'm much more interested in the lenny guest oops at this point).
john

run via libvirt:

/usr/bin/kvm -S -M pc -m 512 -smp 1 -name test -monitor pty \
    -boot c -drive file=image.qcow,if=scsi,index=0,boot=on \
    -net nic,macaddr=00:0c:29:1e:ea:b9,vlan=0,model=e1000 \
    -net tap,fd=17,script=,vlan=0,ifname=vnet2 \
    -net nic,macaddr=00:0c:29:1e:ea:c3,vlan=1,model=e1000 \
    -net tap,fd=18,script=,vlan=1,ifname=vnet3 \
    -serial pty -parallel none -usb -vnc 0.0.0.0:1

[The KVMWiki asks whether the problem is reproducible with -no-kvm-irqchip, -no-kvm-pit, or -no-kvm, but when I tried invoking the above command line by hand (outside of libvirt), the VNC console was always blank and there was no console output on the serial pty. If this would be useful information to have in this case, I'd love to know what I'm doing wrong, or if there's a way to specify additional command line arguments with libvirt.]

oops generated in the guest:

[ 140.101828] sym0: unexpected disconnect
[ 140.102748] BUG: unable to handle kernel NULL pointer dereference at 0358
[ 140.103818] IP: [e08e2670] :sym53c8xx:sym_int_sir+0x547/0x118f
[ 140.106449] *pdpt = 1f5f9001 *pde =
[ 140.107356] Oops: [#1] SMP
[ 140.107864] Modules linked in: loop virtio_balloon psmouse pcspkr serio_raw i2c_piix4 i2c_core button evdev ext3 jbd mbcache sd_mod ide_cd_mod cdrom ata_generic libata dock ide_pci_generic floppy virtio_pci virtio_ring virtio sym53c8xx scsi_transport_spi scsi_mod e1000 uhci_hcd usbcore piix ide_core thermal processor fan thermal_sys
[ 140.108062]
[ 140.108062] Pid: 131, comm: pdflush Not tainted (2.6.26-1-686-bigmem #1)
[ 140.108062] EIP: 0060:[e08e2670] EFLAGS: 00010287 CPU: 0
[ 140.108062] EIP is at sym_int_sir+0x547/0x118f [sym53c8xx]
[ 140.108062] EAX: 000a EBX: ECX: 1f98c084 EDX: 0030
[ 140.108062] ESI: df98c084 EDI: df98c000 EBP: df98c000 ESP: de0f3ba0
[ 140.108062] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068
[ 140.108062] Process pdflush (pid: 131, ti=de0f2000 task=df48e520 task.ti=de0f2000)
[ 140.108062] Stack: 000144d6 7f5a222c c011a853 0021d496
[ 140.108062] df98c000 e08e08cd 0001 df98c000
[ 140.108062] 0084 e08e3f2f df988c00 0046 df544400 0196
[ 140.108062] Call Trace:
[ 140.108062] [c011a853] pvclock_clocksource_read+0x4b/0xd0
[ 140.108062] [e08e08cd] sym_recover_scsi_int+0xb3/0x10d [sym53c8xx]
[ 140.108062] [e08e3f2f] sym_interrupt+0x3ee/0x5fd [sym53c8xx]
[ 140.108062] [e08df3dc] sym53c8xx_intr+0x35/0x56 [sym53c8xx]
[ 140.108062] [c0158e4e] handle_IRQ_event+0x23/0x51
[ 140.108062] [c0159f4d] handle_fasteoi_irq+0x71/0xa4
[ 140.108062] [c010afd2] do_IRQ+0x4d/0x63
[ 140.108062] [c01092a7] common_interrupt+0x23/0x28
[ 140.108062] [c01300d8] ptrace_request+0x1ec/0x278
[ 140.108062] [c012d0c6] __do_softirq+0x57/0xd3
[ 140.108062] [c012d187] do_softirq+0x45/0x53
[ 140.108062] [c012d43e] irq_exit+0x35/0x67
[ 140.108062] [c01152b6] smp_apic_timer_interrupt+0x6b/0x75
[ 140.108062] [c0109364] apic_timer_interrupt+0x28/0x30
[ 140.108062] [c02c9953] _spin_unlock_irqrestore+0x7/0x10
[ 140.108062] [e0865a94] scsi_dispatch_cmd+0x197/0x205 [scsi_mod]
[ 140.108062] [e086ab2e] scsi_request_fn+0x264/0x32a [scsi_mod]
[ 140.108063] [c01dcbd6] __generic_unplug_device+0x1a/0x1c
[ 140.108063] [c01dd3e9] __make_request+0x2fe/0x348
[ 140.108063] [c01dc008]