This bug still appears to be relevant: our dual quad-core Xeon server (2x E5405) fell victim to it today after an uptime of a week or two with ten active DomUs (so perhaps 20 `xm create`s). Dom0 and all DomUs froze completely, which required an emergency visit to the off-site datacenter. It's a critical bug!

Citing syslog from the `xm create` onwards:

Mar 25 15:34:57 hyper logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/11/2049
Mar 25 15:34:57 hyper logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/11/2050
Mar 25 15:34:57 hyper logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/11/2065
Mar 25 15:34:57 hyper logger: /etc/xen/scripts/vif-bridge: online XENBUS_PATH=backend/vif/11/0
Mar 25 15:34:57 hyper logger: /etc/xen/scripts/vif-bridge: online XENBUS_PATH=backend/vif/11/1
Mar 25 15:34:57 hyper kernel: device vif11.0 entered promiscuous mode
Mar 25 15:34:57 hyper kernel: audit(1206452097.949:30): dev=vif11.0 prom=256 old_prom=0 auid=4294967295
Mar 25 15:34:57 hyper kernel: ADDRCONF(NETDEV_UP): vif11.0: link is not ready
Mar 25 15:34:57 hyper logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge online for vif11.0, bridge xenbr0.
Mar 25 15:34:57 hyper logger: /etc/xen/scripts/vif-bridge: Writing backend/vif/11/0/hotplug-status connected to xenstore.
Mar 25 15:34:58 hyper kernel: device vif11.1 entered promiscuous mode
Mar 25 15:34:58 hyper kernel: audit(1206452098.029:31): dev=vif11.1 prom=256 old_prom=0 auid=4294967295
Mar 25 15:34:58 hyper kernel: ADDRCONF(NETDEV_UP): vif11.1: link is not ready
Mar 25 15:34:58 hyper logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge online for vif11.1, bridge br0.
Mar 25 15:34:58 hyper logger: /etc/xen/scripts/vif-bridge: Writing backend/vif/11/1/hotplug-status connected to xenstore.
Mar 25 15:34:58 hyper logger: /etc/xen/scripts/block: Writing backend/vbd/11/2049/physical-device fd:6 to xenstore.
Mar 25 15:34:58 hyper logger: /etc/xen/scripts/block: Writing backend/vbd/11/2049/hotplug-status connected to xenstore.
Mar 25 15:34:59 hyper logger: /etc/xen/scripts/block: Writing backend/vbd/11/2050/physical-device fd:7 to xenstore.
Mar 25 15:34:59 hyper logger: /etc/xen/scripts/block: Writing backend/vbd/11/2050/hotplug-status connected to xenstore.
Mar 25 15:35:00 hyper logger: /etc/xen/scripts/block: Writing backend/vbd/11/2065/physical-device fd:15 to xenstore.
Mar 25 15:35:00 hyper logger: /etc/xen/scripts/block: Writing backend/vbd/11/2065/hotplug-status connected to xenstore.
Mar 25 15:35:02 hyper kernel: ADDRCONF(NETDEV_CHANGE): vif11.0: link becomes ready
Mar 25 15:35:02 hyper kernel: ----------- [cut here ] --------- [please bite here ] ---------
Mar 25 15:35:02 hyper kernel: Kernel BUG at drivers/xen/core/evtchn.c:481
Mar 25 15:35:02 hyper kernel: invalid opcode: 0000 [1] SMP
Mar 25 15:35:02 hyper kernel: CPU 7
Mar 25 15:35:02 hyper kernel: Modules linked in: xt_physdev netloop xt_tcpudp xt_state ip_conntrack nfnetlink iptable_filter ip_tables x_tables ipv6 bridge loop shpchp serial_core pci_hotplug i2c_i801 psmouse serio_raw i2c_core pcspkr evdev ext3 jbd mbcache dm_mirror dm_snapshot dm_mod raid1 md_mod ide_generic usb_storage sd_mod ata_piix libata scsi_mod generic ide_core ehci_hcd uhci_hcd e1000 fan
Mar 25 15:35:02 hyper kernel: Pid: 37, comm: xenwatch Not tainted 2.6.18-6-xen-amd64 #1
Mar 25 15:35:02 hyper kernel: RIP: e030:[<ffffffff80360fe1>]  [<ffffffff80360fe1>] retrigger+0x26/0x3e
Mar 25 15:35:02 hyper kernel: RSP: e02b:ffff8801ee32fd88  EFLAGS: 00010046
Mar 25 15:35:02 hyper kernel: RAX: 0000000000000000 RBX: 000000000000a080 RCX: ffffffffff578000
Mar 25 15:35:02 hyper kernel: RDX: 000000000000005f RSI: ffff8801ee32fd30 RDI: 0000000000000141
Mar 25 15:35:02 hyper kernel: RBP: ffffffff804ce500 R08: 00000000000000f8 R09: ffff8801ee2f2880
Mar 25 15:35:02 hyper kernel: R10: ffff88002ddf26c0 R11: ffffffff80360fbb R12: 0000000000000141
Mar 25 15:35:02 hyper kernel: R13: ffffffff804ce53c R14: 0000000000000000 R15: 000000000000001f
Mar 25 15:35:02 hyper kernel: FS:  00002b69e31f36d0(0000) GS:ffffffff804c3380(0000) knlGS:0000000000000000
Mar 25 15:35:02 hyper kernel: CS:  e033 DS: 0000 ES: 0000
Mar 25 15:35:02 hyper kernel: Process xenwatch (pid: 37, threadinfo ffff8801ee32e000, task ffff8801ee324080)
Mar 25 15:35:02 hyper kernel: Stack: ffffffff802a0679 ffff8801eae08500 ffff8801eae08500 0000000000000000
Mar 25 15:35:02 hyper kernel: ffff8801ee32fde0 000000000000040e ffffffff8036db4c 0000000000000000
Mar 25 15:35:02 hyper kernel: ffffffff8036dfc4 ffff8801ee32fea4
Mar 25 15:35:02 hyper kernel: Call Trace:
Mar 25 15:35:02 hyper kernel:  [<ffffffff802a0679>] enable_irq+0x9d/0xbc
Mar 25 15:35:02 hyper kernel:  [<ffffffff8036db4c>] __netif_up+0xc/0x15
Mar 25 15:35:02 hyper kernel:  [<ffffffff8036dfc4>] netif_map+0x2a6/0x2d8
Mar 25 15:35:02 hyper kernel:  [<ffffffff8035c325>] bus_for_each_dev+0x61/0x6e
Mar 25 15:35:02 hyper kernel:  [<ffffffff803667ce>] xenwatch_thread+0x0/0x145
Mar 25 15:35:02 hyper kernel:  [<ffffffff803667ce>] xenwatch_thread+0x0/0x145
Mar 25 15:35:02 hyper kernel:  [<ffffffff8036830e>] frontend_changed+0x2ba/0x4f9
Mar 25 15:35:02 hyper kernel:  [<ffffffff803667ce>] xenwatch_thread+0x0/0x145
Mar 25 15:35:02 hyper kernel:  [<ffffffff8028f865>] keventd_create_kthread+0x0/0x61
Mar 25 15:35:02 hyper kernel:  [<ffffffff80365bdc>] xenwatch_handle_callback+0x15/0x48
Mar 25 15:35:02 hyper kernel:  [<ffffffff803668fb>] xenwatch_thread+0x12d/0x145
Mar 25 15:35:02 hyper kernel:  [<ffffffff8028fa28>] autoremove_wake_function+0x0/0x2e
Mar 25 15:35:02 hyper kernel:  [<ffffffff8028f865>] keventd_create_kthread+0x0/0x61
Mar 25 15:35:02 hyper kernel:  [<ffffffff803667ce>] xenwatch_thread+0x0/0x145
Mar 25 15:35:02 hyper kernel:  [<ffffffff8023352b>] kthread+0xd4/0x107
Mar 25 15:35:02 hyper kernel:  [<ffffffff8025c830>] child_rip+0xa/0x12
Mar 25 15:35:02 hyper kernel:  [<ffffffff8028f865>] keventd_create_kthread+0x0/0x61
Mar 25 15:35:02 hyper kernel:  [<ffffffff80233457>] kthread+0x0/0x107
Mar 25 15:35:02 hyper kernel:  [<ffffffff8025c826>] child_rip+0x0/0x12
Mar 25 15:35:02 hyper kernel:
Mar 25 15:35:02 hyper kernel:
Mar 25 15:35:02 hyper kernel: Code: 0f 0b 68 74 db 41 80 c2 e1 01 f0 0f ab 91 00 08 00 00 b8 01
Mar 25 15:35:02 hyper kernel: RIP [<ffffffff80360fe1>] retrigger+0x26/0x3e
Mar 25 15:35:02 hyper kernel:  RSP <ffff8801ee32fd88>
Mar 25 16:53:02 hyper syslogd 1.4.1#18: restart.
Mar 25 16:53:02 hyper kernel: klogd 1.4.1#18, log source = /proc/kmsg started.
Mar 25 16:53:02 hyper kernel: Bootdata ok (command line is root=/dev/mapper/fixme-hyper ro console=tty0)
Mar 25 16:53:02 hyper kernel: Linux version 2.6.18-6-xen-amd64 (Debian 2.6.18.dfsg.1-18etch1) ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Sun Feb 10 18:02:52 UTC 2008


This stack trace is more or less identical to the one posted here over a year ago.

We've applied the workaround mentioned earlier (setting dom0-cpus to 1; cpuinfo and vcpu-list now show only one CPU active for Domain-0), but haven't yet had an opportunity to stress-test the system to see whether this solves the issue.
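
For anyone wanting to try the same workaround, here is a minimal sketch of the setting. It assumes the stock xend configuration file at /etc/xen/xend-config.sxp (the path may differ on other installations) and that xend has to be restarted, or the host rebooted, before the change takes effect:

```
# /etc/xen/xend-config.sxp (assumed default location)
# Limit Dom0 to a single CPU; a value of 0 lets Dom0 use all available CPUs.
(dom0-cpus 1)
```

Afterwards, `xm vcpu-list Domain-0` should show only one VCPU online for Domain-0.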

Dom0 is running on a Debian 4.0 host with xen-hypervisor-3.2-1-amd64 version 3.2.0-3~bpo4+1 and linux-image-2.6.18-6-xen-amd64 version 2.6.18.dfsg.1-18etch1 installed via etch-backports.
