Re: BUG() with SCSI-interfaced disk images

2009-01-08 Thread Ryan Harper
* John Morrissey j...@horde.net [2009-01-07 20:16]:
 On Wed, Jan 07, 2009 at 04:34:50PM -0600, Ryan Harper wrote:
  * John Morrissey j...@horde.net [2009-01-07 15:59]:
   On Fri, Dec 26, 2008 at 04:00:28PM -0500, John Morrissey wrote:
    I'm encountering a kernel BUG() in guests using SCSI-interfaced disk
    images. I've tried with the Debian packaging of KVM 79 and 82; both
    exhibit the same behavior (disclaimer: Debian has about a dozen
    patches in their kvm packaging, but they all seem to be changes to the
    build/install process or security-related).
   
   Not to be pushy, but does anyone have any ideas on this, or can I
   provide any additional information? I'm afraid I'm a bit over my head
   when debugging kernel internals.
  
  Sorry, I meant to respond.  This is more than likely a SCSI emulation
  error rather than a kernel error.  I've seen the error a couple of
  times, but I don't have a fix for the issue yet as I don't have a
  reliable way to reproduce the error.  If you have an easy way to
  reproduce the bug, I'll see if I can figure out a fix.  
 
 I can reproduce this reliably when fscking a filesystem in a .vmdk I have.
 I can't give you the vmdk or a dump of the filesystem, but I can devote some
 time to troubleshoot this if you can guide me a little. If having the vmdk
 is really important, I might be able to sanitize it enough to send it to you
 (hopefully not making this bug unreproducible in the process).

I don't need the vmdk, but if there is some other repeatable process
that can trigger this for you, getting that will allow me to recreate
the issue myself.  For example, if installing Debian into a vmdk, rebooting,
and then fsck'ing from inside the VM triggers it, that would do.  Finding some
sort of repeatable process that can trip the bug but without using any
of your specific data would be the best way to move forward with the
bug.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com


Re: BUG() with SCSI-interfaced disk images

2009-01-08 Thread John Morrissey
On Thu, Jan 08, 2009 at 08:01:03AM -0600, Ryan Harper wrote:
 I don't need the vmdk, but if there is some other repeatable process
 that can trigger this for you, getting that will allow me to recreate
 the issue myself.  For example, if installing Debian into a vmdk, rebooting,
 and then fsck'ing from inside the VM triggers it, that would do.  Finding some
 sort of repeatable process that can trip the bug but without using any
 of your specific data would be the best way to move forward with the
 bug.

I can reproduce this when installing Debian lenny i386 (using the lenny rc1
install images from http://www.debian.org/devel/debian-installer/). The
installer will complain of I/O problems when trying to mkfs(8) the
filesystem and will prompt you to retry/ignore. Shortly thereafter, the
domain kernel panics.

Attached is the libvirt configuration; it's pretty straightforward and
translates into this kvm(1) invocation:

/usr/bin/kvm -S -M pc -m 512 -smp 1 -name scsi -monitor pty -boot n \
-drive file=/var/lib/libvirt/images/scsi.qcow,if=scsi,index=0 \
-net nic,macaddr=02:00:00:4d:58:13,vlan=0,model=e1000 \
-net tap,fd=11,script=,vlan=0,ifname=vnet0 \
-net nic,macaddr=02:00:00:23:7f:3d,vlan=1,model=e1000 \
-net tap,fd=13,script=,vlan=1,ifname=vnet1 \
-serial pty -parallel none -usb -vnc 0.0.0.0:0

john
-- 
John Morrissey  _o/\   __o
j...@horde.net_- \_  /  \   \,
www.horde.net/__(_)/_(_)/\___(_) /_(_)__


scsi.xml
Description: XML document


Re: BUG() with SCSI-interfaced disk images

2009-01-07 Thread John Morrissey
On Fri, Dec 26, 2008 at 04:00:28PM -0500, John Morrissey wrote:
 I'm encountering a kernel BUG() in guests using SCSI-interfaced disk
 images. I've tried with the Debian packaging of KVM 79 and 82; both
 exhibit the same behavior (disclaimer: Debian has about a dozen patches in
 their kvm packaging, but they all seem to be changes to the build/install
 process or security-related).

Not to be pushy, but does anyone have any ideas on this, or can I provide
any additional information? I'm afraid I'm a bit over my head when debugging
kernel internals.

john

 IDE-interfaced disk images seem fine. Host and guest are up-to-date Debian
 lenny (32-bit/i386) running kernel 2.6.26 (Debian
 linux-image-2.6.26-1-amd64 2.6.26-12).
 
 After a few minutes of disk activity (fsck(8)ing a fairly empty ~20GB
 filesystem is a reliable trigger), the kernel BUGs (oops output below).
 
 I was previously using KVM 72, and tried upgrading to 79 because both
 Debian lenny and Ubuntu hardy guests were panicking due to sym
 disconnects/timeouts. 79 makes the lenny guest start BUGging as described
 above. 82 is not perceivably different from 79 for the lenny guest.
 
 FWIW, the upgrade to 79 allowed the Ubuntu hardy guest to stay up,
 although it emits:
 
 Dec 25 00:28:51 vicar kernel: [106621.553272] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
 Dec 25 00:28:51 vicar kernel: [106621.553279] Info fld=0x0
 Dec 25 00:28:51 vicar kernel: [106621.553280] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
 
 at seemingly random intervals. The upgrade to 82 made the hardy guest
 start BUGging on soft lockups at random intervals (I can provide the full
 output if anyone's interested, but I'm much more interested in the lenny
 guest oops at this point).
 
 john
 
 
 run via libvirt:
 /usr/bin/kvm -S -M pc -m 512 -smp 1 -name test -monitor pty \
   -boot c -drive file=image.qcow,if=scsi,index=0,boot=on \
   -net nic,macaddr=00:0c:29:1e:ea:b9,vlan=0,model=e1000 \
   -net tap,fd=17,script=,vlan=0,ifname=vnet2 \
   -net nic,macaddr=00:0c:29:1e:ea:c3,vlan=1,model=e1000 \
   -net tap,fd=18,script=,vlan=1,ifname=vnet3 \
   -serial pty -parallel none -usb -vnc 0.0.0.0:1
 
 [The KVMWiki asks whether the problem is reproducible with
  -no-kvm-irqchip, -no-kvm-pit, or -no-kvm, but when I tried invoking the
  above command line by hand (outside of libvirt), the VNC console was
  always blank and there was no console output on the serial pty. If this
  would be useful information to have in this case, I'd love to know what
  I'm doing wrong, or if there's a way to specify additional command line
  arguments with libvirt.]
 
 oops generated in the guest:
 [  140.101828] sym0: unexpected disconnect
 [  140.102748] BUG: unable to handle kernel NULL pointer dereference at 0358
 [  140.103818] IP: [e08e2670] :sym53c8xx:sym_int_sir+0x547/0x118f
 [  140.106449] *pdpt = 1f5f9001 *pde =  
 [  140.107356] Oops:  [#1] SMP 
 [  140.107864] Modules linked in: loop virtio_balloon psmouse pcspkr serio_raw i2c_piix4 i2c_core button evdev ext3 jbd mbcache sd_mod ide_cd_mod cdrom ata_generic libata dock ide_pci_generic floppy virtio_pci virtio_ring virtio sym53c8xx scsi_transport_spi scsi_mod e1000 uhci_hcd usbcore piix ide_core thermal processor fan thermal_sys
 [  140.108062] 
 [  140.108062] Pid: 131, comm: pdflush Not tainted (2.6.26-1-686-bigmem #1)
 [  140.108062] EIP: 0060:[e08e2670] EFLAGS: 00010287 CPU: 0
 [  140.108062] EIP is at sym_int_sir+0x547/0x118f [sym53c8xx]
 [  140.108062] EAX: 000a EBX:  ECX: 1f98c084 EDX: 0030
 [  140.108062] ESI: df98c084 EDI: df98c000 EBP: df98c000 ESP: de0f3ba0
 [  140.108062]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
 [  140.108062] Process pdflush (pid: 131, ti=de0f2000 task=df48e520 task.ti=de0f2000)
 [  140.108062] Stack:  000144d6 7f5a222c c011a853 0021d496
 [  140.108062] df98c000 e08e08cd   0001  df98c000
 [  140.108062]0084 e08e3f2f df988c00 0046  df544400 0196
 [  140.108062] Call Trace:
 [  140.108062]  [c011a853] pvclock_clocksource_read+0x4b/0xd0
 [  140.108062]  [e08e08cd] sym_recover_scsi_int+0xb3/0x10d [sym53c8xx]
 [  140.108062]  [e08e3f2f] sym_interrupt+0x3ee/0x5fd [sym53c8xx]
 [  140.108062]  [e08df3dc] sym53c8xx_intr+0x35/0x56 [sym53c8xx]
 [  140.108062]  [c0158e4e] handle_IRQ_event+0x23/0x51
 [  140.108062]  [c0159f4d] handle_fasteoi_irq+0x71/0xa4
 [  140.108062]  [c010afd2] do_IRQ+0x4d/0x63
 [  140.108062]  [c01092a7] common_interrupt+0x23/0x28
 [  140.108062]  [c01300d8] ptrace_request+0x1ec/0x278
 [  140.108062]  [c012d0c6] __do_softirq+0x57/0xd3
 [  140.108062]  [c012d187] do_softirq+0x45/0x53
 [  140.108062]  [c012d43e] irq_exit+0x35/0x67
 [  140.108062]  [c01152b6] smp_apic_timer_interrupt+0x6b/0x75
 [  140.108062]  [c0109364] apic_timer_interrupt+0x28/0x30
 [  140.108062]  

BUG() with SCSI-interfaced disk images

2008-12-26 Thread John Morrissey
I'm encountering a kernel BUG() in guests using SCSI-interfaced disk images.
I've tried with the Debian packaging of KVM 79 and 82; both exhibit the same
behavior (disclaimer: Debian has about a dozen patches in their kvm
packaging, but they all seem to be changes to the build/install process or
security-related).

IDE-interfaced disk images seem fine. Host and guest are up-to-date Debian
lenny (32-bit/i386) running kernel 2.6.26 (Debian linux-image-2.6.26-1-amd64
2.6.26-12).

After a few minutes of disk activity (fsck(8)ing a fairly empty ~20GB
filesystem is a reliable trigger), the kernel BUGs (oops output below).

I was previously using KVM 72, and tried upgrading to 79 because both Debian
lenny and Ubuntu hardy guests were panicking due to sym disconnects/timeouts.
79 makes the lenny guest start BUGging as described above. 82 is not
perceivably different from 79 for the lenny guest.

FWIW, the upgrade to 79 allowed the Ubuntu hardy guest to stay up, although
it emits:

Dec 25 00:28:51 vicar kernel: [106621.553272] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
Dec 25 00:28:51 vicar kernel: [106621.553279] Info fld=0x0
Dec 25 00:28:51 vicar kernel: [106621.553280] sd 2:0:0:0: [sda] Add. Sense: No additional sense information

at seemingly random intervals. The upgrade to 82 made the hardy guest start
BUGging on soft lockups at random intervals (I can provide the full output
if anyone's interested, but I'm much more interested in the lenny guest
oops at this point).

john


run via libvirt:
/usr/bin/kvm -S -M pc -m 512 -smp 1 -name test -monitor pty \
-boot c -drive file=image.qcow,if=scsi,index=0,boot=on \
-net nic,macaddr=00:0c:29:1e:ea:b9,vlan=0,model=e1000 \
-net tap,fd=17,script=,vlan=0,ifname=vnet2 \
-net nic,macaddr=00:0c:29:1e:ea:c3,vlan=1,model=e1000 \
-net tap,fd=18,script=,vlan=1,ifname=vnet3 \
-serial pty -parallel none -usb -vnc 0.0.0.0:1

[The KVMWiki asks whether the problem is reproducible with -no-kvm-irqchip,
 -no-kvm-pit, or -no-kvm, but when I tried invoking the above command line
 by hand (outside of libvirt), the VNC console was always blank and there
 was no console output on the serial pty. If this would be useful
 information to have in this case, I'd love to know what I'm doing wrong, or
 if there's a way to specify additional command line arguments with
 libvirt.]
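
For anyone retrying this by hand, here is a rough sketch of how the three
variants the KVMWiki asks about could be generated from the command line
above. It is only an illustration under some assumptions: build_variants is
a made-up helper name, the network options are left out because the tap fd=
values refer to descriptors that libvirt opened, and -S is dropped because
it starts the guest with its CPU stopped until the monitor is told to
continue. Nothing here has been checked against the KVM builds discussed in
this thread.

```python
import shlex

# Base arguments copied from the invocation above.
# Assumption: -S and the -net options are omitted (see the note above).
BASE_ARGS = [
    "/usr/bin/kvm", "-M", "pc", "-m", "512", "-smp", "1", "-name", "test",
    "-monitor", "pty", "-boot", "c",
    "-drive", "file=image.qcow,if=scsi,index=0,boot=on",
    "-serial", "pty", "-parallel", "none", "-usb", "-vnc", "0.0.0.0:1",
]

# Flags the KVMWiki asks reporters to try, one at a time.
DEBUG_FLAGS = ["-no-kvm-irqchip", "-no-kvm-pit", "-no-kvm"]


def build_variants(base_args, flags):
    """Return one argument list per flag, each with a single flag appended."""
    return [list(base_args) + [flag] for flag in flags]


if __name__ == "__main__":
    # Print each variant as a shell-quoted command line for copy and paste.
    for argv in build_variants(BASE_ARGS, DEBUG_FLAGS):
        print(shlex.join(argv))
```

Each printed line is a complete command that differs from the base only in
its last argument, so a failing variant can be identified directly.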

oops generated in the guest:
[  140.101828] sym0: unexpected disconnect
[  140.102748] BUG: unable to handle kernel NULL pointer dereference at 0358
[  140.103818] IP: [e08e2670] :sym53c8xx:sym_int_sir+0x547/0x118f
[  140.106449] *pdpt = 1f5f9001 *pde =  
[  140.107356] Oops:  [#1] SMP 
[  140.107864] Modules linked in: loop virtio_balloon psmouse pcspkr serio_raw i2c_piix4 i2c_core button evdev ext3 jbd mbcache sd_mod ide_cd_mod cdrom ata_generic libata dock ide_pci_generic floppy virtio_pci virtio_ring virtio sym53c8xx scsi_transport_spi scsi_mod e1000 uhci_hcd usbcore piix ide_core thermal processor fan thermal_sys
[  140.108062] 
[  140.108062] Pid: 131, comm: pdflush Not tainted (2.6.26-1-686-bigmem #1)
[  140.108062] EIP: 0060:[e08e2670] EFLAGS: 00010287 CPU: 0
[  140.108062] EIP is at sym_int_sir+0x547/0x118f [sym53c8xx]
[  140.108062] EAX: 000a EBX:  ECX: 1f98c084 EDX: 0030
[  140.108062] ESI: df98c084 EDI: df98c000 EBP: df98c000 ESP: de0f3ba0
[  140.108062]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
[  140.108062] Process pdflush (pid: 131, ti=de0f2000 task=df48e520 task.ti=de0f2000)
[  140.108062] Stack:  000144d6 7f5a222c c011a853 0021d496
[  140.108062] df98c000 e08e08cd   0001  df98c000
[  140.108062]0084 e08e3f2f df988c00 0046  df544400 0196
[  140.108062] Call Trace:
[  140.108062]  [c011a853] pvclock_clocksource_read+0x4b/0xd0
[  140.108062]  [e08e08cd] sym_recover_scsi_int+0xb3/0x10d [sym53c8xx]
[  140.108062]  [e08e3f2f] sym_interrupt+0x3ee/0x5fd [sym53c8xx]
[  140.108062]  [e08df3dc] sym53c8xx_intr+0x35/0x56 [sym53c8xx]
[  140.108062]  [c0158e4e] handle_IRQ_event+0x23/0x51
[  140.108062]  [c0159f4d] handle_fasteoi_irq+0x71/0xa4
[  140.108062]  [c010afd2] do_IRQ+0x4d/0x63
[  140.108062]  [c01092a7] common_interrupt+0x23/0x28
[  140.108062]  [c01300d8] ptrace_request+0x1ec/0x278
[  140.108062]  [c012d0c6] __do_softirq+0x57/0xd3
[  140.108062]  [c012d187] do_softirq+0x45/0x53
[  140.108062]  [c012d43e] irq_exit+0x35/0x67
[  140.108062]  [c01152b6] smp_apic_timer_interrupt+0x6b/0x75
[  140.108062]  [c0109364] apic_timer_interrupt+0x28/0x30
[  140.108062]  [c02c9953] _spin_unlock_irqrestore+0x7/0x10
[  140.108062]  [e0865a94] scsi_dispatch_cmd+0x197/0x205 [scsi_mod]
[  140.108062]  [e086ab2e] scsi_request_fn+0x264/0x32a [scsi_mod]
[  140.108063]  [c01dcbd6] __generic_unplug_device+0x1a/0x1c
[  140.108063]  [c01dd3e9] __make_request+0x2fe/0x348
[  140.108063]  [c01dc008]
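
To make it easier to see which frames of a trace like the one above come
from the sym53c8xx module and which from the core kernel, a small script
along these lines could be used. This is a sketch only: parse_frames and
FRAME_RE are illustrative names, the pattern assumes the trace layout
exactly as it appears in this message (timestamp, bracketed address,
symbol+offset/size, optional module in brackets), and the sample is a few
lines copied from the trace.

```python
import re
from collections import Counter

# Matches lines such as:
# [  140.108062]  [e08e08cd] sym_recover_scsi_int+0xb3/0x10d [sym53c8xx]
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+"                 # dmesg timestamp
    r"\[([0-9a-f]+)\]\s+"                 # address
    r"([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"  # symbol+offset/size
    r"(?:\s+\[(\w+)\])?"                  # optional module name
)


def parse_frames(oops_text):
    """Return (address, symbol, module) tuples; core frames get 'kernel'."""
    frames = []
    for line in oops_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            address, symbol, module = match.groups()
            frames.append((address, symbol, module or "kernel"))
    return frames


SAMPLE = """\
[  140.108062]  [c011a853] pvclock_clocksource_read+0x4b/0xd0
[  140.108062]  [e08e08cd] sym_recover_scsi_int+0xb3/0x10d [sym53c8xx]
[  140.108062]  [e08e3f2f] sym_interrupt+0x3ee/0x5fd [sym53c8xx]
[  140.108062]  [e08df3dc] sym53c8xx_intr+0x35/0x56 [sym53c8xx]
[  140.108062]  [c0158e4e] handle_IRQ_event+0x23/0x51
"""

if __name__ == "__main__":
    # Count how many frames belong to each module.
    frames = parse_frames(SAMPLE)
    print(Counter(module for _, _, module in frames))
```

Lines without a symbol, such as a truncated final frame, are skipped rather
than guessed at.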