Re: Networking-related crash?
On Mon, Dec 14, 2009 at 1:16 PM, Patrick McHardy wrote: > Adam Huffman wrote: >> On Thu, Dec 10, 2009 at 11:01 AM, Patrick McHardy wrote: >>> Eric Dumazet wrote: Le 09/12/2009 16:11, Avi Kivity a écrit : > On 12/09/2009 03:46 PM, Adam Huffman wrote: >> I've been seeing lots of crashes on a new Dell Precision T7500, >> running the KVM in Fedora 12. Finally managed to capture an Oops, >> which is shown below (hand-transcribed): >> >> BUG: unable to handle kernel paging request at 00200200 >> IP: [] destroy_conntrack+0x82/0x11f >> PGD 332d0e067 PUD 33453c067 PMD 0 >> RIP: 0010:[] [] >> destroy_conntrack+0x82/0x11f >> RSP: 0018:c9803bf0 EFLAGS: 00010202 >> RAX: 8001 RBX: 816fb1a0 RCX: 752f >> RDX: 00200200 RSI: 0011 RDI: 816fb1a0 >> RBP: c9803c00 R08: 880336699438 R09: 00aaa5e0 >> R10: 0002f54189d5 R11: 0001 R12: 819a92e0 >> R13: a029adcc R14: R15: 880632866c38 >> FS: 7fdd34b17710() GS:c980() >> knlGS: >> CS: 0010 DS: 002B ES: 002B CR0: 80050033 >> CR2: 00200200 CR3: 0003349c CR4: 26e0 >> DR0: DR1: DR2: >> DR3: DR6: 0ff0 DR7: 0400 >> Process qemu-kvm (pid: 1759, threadinfo 88062e9e8000, task >> 880634945e00) >> Stack: >> 880632866c00 880634640c30 c9803c10 813989c2 >> <0> c9803c30 81374092 c9803c30 880632866c00 >> <0> c9803c50 81373dd3 0002 880632866c00 >> Call Trace: >> >> [] nf_conntrack_destroy+0x1b/0x1d >> [] skb_release_head_state+0x95/0xd7 >> [] __kfree_skb+0x16/0x81 >> [] kfree_skb+0x6a/0x72 >> [] ip6_mc_input+0x220/0x230 [ipv6] >> [] ip6_rcv_finish+0x27/0x2b [ipv6] >> [] ipv6_rcv+0x38e/0x3e5 [ipv6] >> [] netif_receive_skb+0x402/0x427 >> ... 
>>>> crash in :
>>>>    48 8b 43 08             mov    0x8(%rbx),%rax
>>>>    a8 01                   test   $0x1,%al
>>>>    48 89 02                mov    %rax,(%rdx)   << HERE >> RDX=0x200200 (LIST_POISON2)
>>>>    75 04                   jne    1f
>>>>    48 89 50 08             mov    %rdx,0x8(%rax)
>>>> 1: 48 c7 43 10 00 02 20    movq   $0x200200,0x10(%rbx)
>>>>
>>>> if (!nf_ct_is_confirmed(ct)) {
>>>>         BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
>>>>         hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);   << HERE >>
>>>> }
>>>> NF_CT_STAT_INC(net, delete);
>>>
>>> I can't spot the problem. Adam, please send me your .config file.
>>
>> It's the standard Fedora .config, which is attached.
>>
>> As I stated in another message, the oops seems related to VT-d. With
>> that disabled, the machine has been stable for nearly a day now.
>
> That probably only affects the timing of some race. Please also
> send me the IPv6 ruleset used on that machine. Thanks.

Just to note that if I disable IPv6 completely, the machine is stable -
certainly compared with the crashes after a few minutes when IPv6 is
enabled.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Networking-related crash?
On Mon, Dec 14, 2009 at 1:16 PM, Patrick McHardy wrote: > Adam Huffman wrote: >> On Thu, Dec 10, 2009 at 11:01 AM, Patrick McHardy wrote: >>> Eric Dumazet wrote: Le 09/12/2009 16:11, Avi Kivity a écrit : > On 12/09/2009 03:46 PM, Adam Huffman wrote: >> I've been seeing lots of crashes on a new Dell Precision T7500, >> running the KVM in Fedora 12. Finally managed to capture an Oops, >> which is shown below (hand-transcribed): >> >> BUG: unable to handle kernel paging request at 00200200 >> IP: [] destroy_conntrack+0x82/0x11f >> PGD 332d0e067 PUD 33453c067 PMD 0 >> RIP: 0010:[] [] >> destroy_conntrack+0x82/0x11f >> RSP: 0018:c9803bf0 EFLAGS: 00010202 >> RAX: 8001 RBX: 816fb1a0 RCX: 752f >> RDX: 00200200 RSI: 0011 RDI: 816fb1a0 >> RBP: c9803c00 R08: 880336699438 R09: 00aaa5e0 >> R10: 0002f54189d5 R11: 0001 R12: 819a92e0 >> R13: a029adcc R14: R15: 880632866c38 >> FS: 7fdd34b17710() GS:c980() >> knlGS: >> CS: 0010 DS: 002B ES: 002B CR0: 80050033 >> CR2: 00200200 CR3: 0003349c CR4: 26e0 >> DR0: DR1: DR2: >> DR3: DR6: 0ff0 DR7: 0400 >> Process qemu-kvm (pid: 1759, threadinfo 88062e9e8000, task >> 880634945e00) >> Stack: >> 880632866c00 880634640c30 c9803c10 813989c2 >> <0> c9803c30 81374092 c9803c30 880632866c00 >> <0> c9803c50 81373dd3 0002 880632866c00 >> Call Trace: >> >> [] nf_conntrack_destroy+0x1b/0x1d >> [] skb_release_head_state+0x95/0xd7 >> [] __kfree_skb+0x16/0x81 >> [] kfree_skb+0x6a/0x72 >> [] ip6_mc_input+0x220/0x230 [ipv6] >> [] ip6_rcv_finish+0x27/0x2b [ipv6] >> [] ipv6_rcv+0x38e/0x3e5 [ipv6] >> [] netif_receive_skb+0x402/0x427 >> ... 
>>>> crash in :
>>>>    48 8b 43 08             mov    0x8(%rbx),%rax
>>>>    a8 01                   test   $0x1,%al
>>>>    48 89 02                mov    %rax,(%rdx)   << HERE >> RDX=0x200200 (LIST_POISON2)
>>>>    75 04                   jne    1f
>>>>    48 89 50 08             mov    %rdx,0x8(%rax)
>>>> 1: 48 c7 43 10 00 02 20    movq   $0x200200,0x10(%rbx)
>>>>
>>>> if (!nf_ct_is_confirmed(ct)) {
>>>>         BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
>>>>         hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);   << HERE >>
>>>> }
>>>> NF_CT_STAT_INC(net, delete);
>>>
>>> I can't spot the problem. Adam, please send me your .config file.
>>
>> It's the standard Fedora .config, which is attached.
>>
>> As I stated in another message, the oops seems related to VT-d. With
>> that disabled, the machine has been stable for nearly a day now.
>
> That probably only affects the timing of some race. Please also
> send me the IPv6 ruleset used on that machine. Thanks.

Again, it's the Fedora 12 default. Here you go:

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p ipv6-icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp6-adm-prohibited
-A FORWARD -j REJECT --reject-with icmp6-adm-prohibited
COMMIT
Re: Networking-related crash?
Adam Huffman wrote: > On Thu, Dec 10, 2009 at 11:01 AM, Patrick McHardy wrote: >> Eric Dumazet wrote: >>> Le 09/12/2009 16:11, Avi Kivity a écrit : On 12/09/2009 03:46 PM, Adam Huffman wrote: > I've been seeing lots of crashes on a new Dell Precision T7500, > running the KVM in Fedora 12. Finally managed to capture an Oops, > which is shown below (hand-transcribed): > > BUG: unable to handle kernel paging request at 00200200 > IP: [] destroy_conntrack+0x82/0x11f > PGD 332d0e067 PUD 33453c067 PMD 0 > RIP: 0010:[] [] > destroy_conntrack+0x82/0x11f > RSP: 0018:c9803bf0 EFLAGS: 00010202 > RAX: 8001 RBX: 816fb1a0 RCX: 752f > RDX: 00200200 RSI: 0011 RDI: 816fb1a0 > RBP: c9803c00 R08: 880336699438 R09: 00aaa5e0 > R10: 0002f54189d5 R11: 0001 R12: 819a92e0 > R13: a029adcc R14: R15: 880632866c38 > FS: 7fdd34b17710() GS:c980() > knlGS: > CS: 0010 DS: 002B ES: 002B CR0: 80050033 > CR2: 00200200 CR3: 0003349c CR4: 26e0 > DR0: DR1: DR2: > DR3: DR6: 0ff0 DR7: 0400 > Process qemu-kvm (pid: 1759, threadinfo 88062e9e8000, task > 880634945e00) > Stack: > 880632866c00 880634640c30 c9803c10 813989c2 > <0> c9803c30 81374092 c9803c30 880632866c00 > <0> c9803c50 81373dd3 0002 880632866c00 > Call Trace: > > [] nf_conntrack_destroy+0x1b/0x1d > [] skb_release_head_state+0x95/0xd7 > [] __kfree_skb+0x16/0x81 > [] kfree_skb+0x6a/0x72 > [] ip6_mc_input+0x220/0x230 [ipv6] > [] ip6_rcv_finish+0x27/0x2b [ipv6] > [] ipv6_rcv+0x38e/0x3e5 [ipv6] > [] netif_receive_skb+0x402/0x427 > ... > >>> crash in : >>> 48 8b 43 08 mov0x8(%rbx),%rax >>> a8 01 test $0x1,%al >>> 48 89 02mov%rax,(%rdx) << HERE >> RDX=0x200200 >>> (LIST_POISON2) >>> 75 04 jne1f >>> 48 89 50 08 mov%rdx,0x8(%rax) >>> 1:48 c7 43 10 00 02 20movq $0x200200,0x10(%rbx) >>> >>> if (!nf_ct_is_confirmed(ct)) { >>> >>> BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode)); >>> >>> hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode); << HERE >> >>> } >>> NF_CT_STAT_INC(net, delete); >> >> I can't spot the problem. 
Adam, please send me your .config file.
>
> It's the standard Fedora .config, which is attached.
>
> As I stated in another message, the oops seems related to VT-d. With
> that disabled, the machine has been stable for nearly a day now.

That probably only affects the timing of some race. Please also
send me the IPv6 ruleset used on that machine. Thanks.
Re: Networking-related crash?
Eric Dumazet wrote: > Le 09/12/2009 16:11, Avi Kivity a écrit : >> On 12/09/2009 03:46 PM, Adam Huffman wrote: >>> I've been seeing lots of crashes on a new Dell Precision T7500, >>> running the KVM in Fedora 12. Finally managed to capture an Oops, >>> which is shown below (hand-transcribed): >>> >>> BUG: unable to handle kernel paging request at 00200200 >>> IP: [] destroy_conntrack+0x82/0x11f >>> PGD 332d0e067 PUD 33453c067 PMD 0 >>> RIP: 0010:[] [] >>> destroy_conntrack+0x82/0x11f >>> RSP: 0018:c9803bf0 EFLAGS: 00010202 >>> RAX: 8001 RBX: 816fb1a0 RCX: 752f >>> RDX: 00200200 RSI: 0011 RDI: 816fb1a0 >>> RBP: c9803c00 R08: 880336699438 R09: 00aaa5e0 >>> R10: 0002f54189d5 R11: 0001 R12: 819a92e0 >>> R13: a029adcc R14: R15: 880632866c38 >>> FS: 7fdd34b17710() GS:c980() >>> knlGS: >>> CS: 0010 DS: 002B ES: 002B CR0: 80050033 >>> CR2: 00200200 CR3: 0003349c CR4: 26e0 >>> DR0: DR1: DR2: >>> DR3: DR6: 0ff0 DR7: 0400 >>> Process qemu-kvm (pid: 1759, threadinfo 88062e9e8000, task >>> 880634945e00) >>> Stack: >>> 880632866c00 880634640c30 c9803c10 813989c2 >>> <0> c9803c30 81374092 c9803c30 880632866c00 >>> <0> c9803c50 81373dd3 0002 880632866c00 >>> Call Trace: >>> >>> [] nf_conntrack_destroy+0x1b/0x1d >>> [] skb_release_head_state+0x95/0xd7 >>> [] __kfree_skb+0x16/0x81 >>> [] kfree_skb+0x6a/0x72 >>> [] ip6_mc_input+0x220/0x230 [ipv6] >>> [] ip6_rcv_finish+0x27/0x2b [ipv6] >>> [] ipv6_rcv+0x38e/0x3e5 [ipv6] >>> [] netif_receive_skb+0x402/0x427 >>> ... >>> > crash in : > 48 8b 43 08 mov0x8(%rbx),%rax > a8 01 test $0x1,%al > 48 89 02mov%rax,(%rdx) << HERE >> RDX=0x200200 > (LIST_POISON2) > 75 04 jne1f > 48 89 50 08 mov%rdx,0x8(%rax) > 1:48 c7 43 10 00 02 20movq $0x200200,0x10(%rbx) > > if (!nf_ct_is_confirmed(ct)) { > > BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode)); > hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode); > << HERE >> > } > NF_CT_STAT_INC(net, delete); I can't spot the problem. Adam, please send me your .config file. 
Re: Networking-related crash?
Le 09/12/2009 16:11, Avi Kivity a écrit : > On 12/09/2009 03:46 PM, Adam Huffman wrote: >> I've been seeing lots of crashes on a new Dell Precision T7500, >> running the KVM in Fedora 12. Finally managed to capture an Oops, >> which is shown below (hand-transcribed): >> >> BUG: unable to handle kernel paging request at 00200200 >> IP: [] destroy_conntrack+0x82/0x11f >> PGD 332d0e067 PUD 33453c067 PMD 0 >> Oops: 0002 [#1] SMP >> last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map >> CPU 4 >> Modules linked in: tun bridge stp llc sunrpc ipt_MASQUERADE >> iptable_nat nf_nat ipt_LOG xt_physdev ip6t_REJECT nf_conntrack_ipv6 >> ip6table_filter ip6 >> _tables ipv6 dm_multipath kvm_intel kvm uinput snd_hda_codec_analog >> nouveau snd_hda_intel snd_hda_codec ttm drm_kms_helper snd_hwdep >> snd_seq drm sn >> d_seq_device snd_pcm firewire_ohci i2c_i801 snd_timer ppdev >> firewire_core snd i2c_algo_bit iTCO_wdt crc_itu_t parport_pc i2c_core >> soundcore parport >> iTCO_vendor_support tg3 snd_page_alloc shpchp dcdbas wmi mptsas >> mptscsih mptbase scsi_transport_sas megaraid_sas [last_unloaded: >> speedstep_lib] >> Pid: 1759, comm: qemu-kvm Not tainted 2.6.31.6-162.fc12.x86_64 #1 >> Precision WorkStation T7500 >> RIP: 0010:[] [] >> destroy_conntrack+0x82/0x11f >> RSP: 0018:c9803bf0 EFLAGS: 00010202 >> RAX: 8001 RBX: 816fb1a0 RCX: 752f >> RDX: 00200200 RSI: 0011 RDI: 816fb1a0 >> RBP: c9803c00 R08: 880336699438 R09: 00aaa5e0 >> R10: 0002f54189d5 R11: 0001 R12: 819a92e0 >> R13: a029adcc R14: R15: 880632866c38 >> FS: 7fdd34b17710() GS:c980() >> knlGS: >> CS: 0010 DS: 002B ES: 002B CR0: 80050033 >> CR2: 00200200 CR3: 0003349c CR4: 26e0 >> DR0: DR1: DR2: >> DR3: DR6: 0ff0 DR7: 0400 >> Process qemu-kvm (pid: 1759, threadinfo 88062e9e8000, task >> 880634945e00) >> Stack: >> 880632866c00 880634640c30 c9803c10 813989c2 >> <0> c9803c30 81374092 c9803c30 880632866c00 >> <0> c9803c50 81373dd3 0002 880632866c00 >> Call Trace: >> >> [] nf_conntrack_destroy+0x1b/0x1d >> 
[] skb_release_head_state+0x95/0xd7 >> [] __kfree_skb+0x16/0x81 >> [] kfree_skb+0x6a/0x72 >> [] ip6_mc_input+0x220/0x230 [ipv6] >> [] ip6_rcv_finish+0x27/0x2b [ipv6] >> [] ipv6_rcv+0x38e/0x3e5 [ipv6] >> [] netif_receive_skb+0x402/0x427 >> [] napi_skb_finish+0x29/0x3d >> [] napi_gro_receive+0x2f/0x34 >> [] tg3_poll+0x6c6/0x8c3 [tg3] >> [] net_rx_action+0xaf/0x1c9 >> [] ? list-add_tail+0x15/0x17 >> [] __do_softirq+0xdd/0x1ad >> [] ? apic_write+0x16/0x18 >> [] call_softirq+0x1c/0x30 >> [] do_softirq+0x47/0x8d >> [] irq_exit+0x44/0x86 >> [] do_IRQ+0xa5/0xbc >> [] ret_from_intr+0x0/0x11 >> >> [] ? kvm_arch_vcpu_ioctl_run+0x84b/0xb34 [kvm] >> [] ? kvm_arch_vcpu_ioctl_run+0x83a/0xb34 [kvm] >> [] ? kvm_vcpu_ioctl+0xfd/0x556 [kvm] >> [] ? vfs_ioctl+0x22/0x87 >> [] ? do_vfs_ioctl+0x47b/0x4c1 >> [] ? sys_ioctl+0x56/0x79 >> [] ? stub_clone+0x13/0x20 >> [] ? system_call_fastpath+0x16/0x1b >> Code: c7 00 a6 9a 81 e8 23 04 08 00 48 89 df e8 68 29 00 00 f6 43 78 >> 08 75 24 48 8b 53 10 48 85 d2 75 04 0f 0b eb fe 48 8b 43 08 a8 01<48> >> 89 02 7 >> 5 04 48 89 50 08 48 c7 43 10 00 02 20 00 65 8b 14 25 >> RIP [] destroy_conntrack+0x82/0x11f >> RSP >> CR2: 00200200 >> > > Looks unrelated to kvm - softirq happened to trigger during a kvm > ioctl. Fault looks like list poison. Copying netdev. > crash in : 48 8b 43 08 mov0x8(%rbx),%rax a8 01 test $0x1,%al 48 89 02mov%rax,(%rdx) << HERE >> RDX=0x200200 (LIST_POISON2) 75 04 jne1f 48 89 50 08 mov%rdx,0x8(%rax) 1: 48 c7 43 10 00 02 20movq $0x200200,0x10(%rbx) if (!nf_ct_is_confirmed(ct)) { BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode)); hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode); << HERE >> } NF_CT_STAT_INC(net, delete); -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Networking-related crash?
On Wed, Dec 9, 2009 at 3:11 PM, Avi Kivity wrote: > On 12/09/2009 03:46 PM, Adam Huffman wrote: >> >> I've been seeing lots of crashes on a new Dell Precision T7500, >> running the KVM in Fedora 12. Finally managed to capture an Oops, >> which is shown below (hand-transcribed): >> >> BUG: unable to handle kernel paging request at 00200200 >> IP: [] destroy_conntrack+0x82/0x11f >> PGD 332d0e067 PUD 33453c067 PMD 0 >> Oops: 0002 [#1] SMP >> last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map >> CPU 4 >> Modules linked in: tun bridge stp llc sunrpc ipt_MASQUERADE >> iptable_nat nf_nat ipt_LOG xt_physdev ip6t_REJECT nf_conntrack_ipv6 >> ip6table_filter ip6 >> _tables ipv6 dm_multipath kvm_intel kvm uinput snd_hda_codec_analog >> nouveau snd_hda_intel snd_hda_codec ttm drm_kms_helper snd_hwdep >> snd_seq drm sn >> d_seq_device snd_pcm firewire_ohci i2c_i801 snd_timer ppdev >> firewire_core snd i2c_algo_bit iTCO_wdt crc_itu_t parport_pc i2c_core >> soundcore parport >> iTCO_vendor_support tg3 snd_page_alloc shpchp dcdbas wmi mptsas >> mptscsih mptbase scsi_transport_sas megaraid_sas [last_unloaded: >> speedstep_lib] >> Pid: 1759, comm: qemu-kvm Not tainted 2.6.31.6-162.fc12.x86_64 #1 >> Precision WorkStation T7500 >> RIP: 0010:[] [] >> destroy_conntrack+0x82/0x11f >> RSP: 0018:c9803bf0 EFLAGS: 00010202 >> RAX: 8001 RBX: 816fb1a0 RCX: 752f >> RDX: 00200200 RSI: 0011 RDI: 816fb1a0 >> RBP: c9803c00 R08: 880336699438 R09: 00aaa5e0 >> R10: 0002f54189d5 R11: 0001 R12: 819a92e0 >> R13: a029adcc R14: R15: 880632866c38 >> FS: 7fdd34b17710() GS:c980() >> knlGS: >> CS: 0010 DS: 002B ES: 002B CR0: 80050033 >> CR2: 00200200 CR3: 0003349c CR4: 26e0 >> DR0: DR1: DR2: >> DR3: DR6: 0ff0 DR7: 0400 >> Process qemu-kvm (pid: 1759, threadinfo 88062e9e8000, task >> 880634945e00) >> Stack: >> 880632866c00 880634640c30 c9803c10 813989c2 >> <0> c9803c30 81374092 c9803c30 880632866c00 >> <0> c9803c50 81373dd3 0002 880632866c00 >> Call Trace: >> >> [] 
nf_conntrack_destroy+0x1b/0x1d >> [] skb_release_head_state+0x95/0xd7 >> [] __kfree_skb+0x16/0x81 >> [] kfree_skb+0x6a/0x72 >> [] ip6_mc_input+0x220/0x230 [ipv6] >> [] ip6_rcv_finish+0x27/0x2b [ipv6] >> [] ipv6_rcv+0x38e/0x3e5 [ipv6] >> [] netif_receive_skb+0x402/0x427 >> [] napi_skb_finish+0x29/0x3d >> [] napi_gro_receive+0x2f/0x34 >> [] tg3_poll+0x6c6/0x8c3 [tg3] >> [] net_rx_action+0xaf/0x1c9 >> [] ? list-add_tail+0x15/0x17 >> [] __do_softirq+0xdd/0x1ad >> [] ? apic_write+0x16/0x18 >> [] call_softirq+0x1c/0x30 >> [] do_softirq+0x47/0x8d >> [] irq_exit+0x44/0x86 >> [] do_IRQ+0xa5/0xbc >> [] ret_from_intr+0x0/0x11 >> >> [] ? kvm_arch_vcpu_ioctl_run+0x84b/0xb34 [kvm] >> [] ? kvm_arch_vcpu_ioctl_run+0x83a/0xb34 [kvm] >> [] ? kvm_vcpu_ioctl+0xfd/0x556 [kvm] >> [] ? vfs_ioctl+0x22/0x87 >> [] ? do_vfs_ioctl+0x47b/0x4c1 >> [] ? sys_ioctl+0x56/0x79 >> [] ? stub_clone+0x13/0x20 >> [] ? system_call_fastpath+0x16/0x1b >> Code: c7 00 a6 9a 81 e8 23 04 08 00 48 89 df e8 68 29 00 00 f6 43 78 >> 08 75 24 48 8b 53 10 48 85 d2 75 04 0f 0b eb fe 48 8b 43 08 a8 01<48> >> 89 02 7 >> 5 04 48 89 50 08 48 c7 43 10 00 02 20 00 65 8b 14 25 >> RIP [] destroy_conntrack+0x82/0x11f >> RSP >> CR2: 00200200 >> > > Looks unrelated to kvm - softirq happened to trigger during a kvm ioctl. > Fault looks like list poison. Copying netdev. > Disabling VT-d support in the BIOS seems to have stopped the crashes. At least it's been running without crashing for several hours now, while it would only last minutes before. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Networking-related crash?
On 12/09/2009 03:46 PM, Adam Huffman wrote:
> I've been seeing lots of crashes on a new Dell Precision T7500,
> running the KVM in Fedora 12. Finally managed to capture an Oops,
> which is shown below (hand-transcribed):
>
> BUG: unable to handle kernel paging request at 00200200
> IP: [] destroy_conntrack+0x82/0x11f
> PGD 332d0e067 PUD 33453c067 PMD 0
> Oops: 0002 [#1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
> CPU 4
> Modules linked in: tun bridge stp llc sunrpc ipt_MASQUERADE
> iptable_nat nf_nat ipt_LOG xt_physdev ip6t_REJECT nf_conntrack_ipv6
> ip6table_filter ip6_tables ipv6 dm_multipath kvm_intel kvm uinput
> snd_hda_codec_analog nouveau snd_hda_intel snd_hda_codec ttm
> drm_kms_helper snd_hwdep snd_seq drm snd_seq_device snd_pcm
> firewire_ohci i2c_i801 snd_timer ppdev firewire_core snd i2c_algo_bit
> iTCO_wdt crc_itu_t parport_pc i2c_core soundcore parport
> iTCO_vendor_support tg3 snd_page_alloc shpchp dcdbas wmi mptsas
> mptscsih mptbase scsi_transport_sas megaraid_sas
> [last_unloaded: speedstep_lib]
> Pid: 1759, comm: qemu-kvm Not tainted 2.6.31.6-162.fc12.x86_64 #1
> Precision WorkStation T7500
> RIP: 0010:[] [] destroy_conntrack+0x82/0x11f
> RSP: 0018:c9803bf0 EFLAGS: 00010202
> RAX: 8001 RBX: 816fb1a0 RCX: 752f
> RDX: 00200200 RSI: 0011 RDI: 816fb1a0
> RBP: c9803c00 R08: 880336699438 R09: 00aaa5e0
> R10: 0002f54189d5 R11: 0001 R12: 819a92e0
> R13: a029adcc R14: R15: 880632866c38
> FS: 7fdd34b17710() GS:c980() knlGS:
> CS: 0010 DS: 002B ES: 002B CR0: 80050033
> CR2: 00200200 CR3: 0003349c CR4: 26e0
> DR0: DR1: DR2:
> DR3: DR6: 0ff0 DR7: 0400
> Process qemu-kvm (pid: 1759, threadinfo 88062e9e8000, task 880634945e00)
> Stack:
> 880632866c00 880634640c30 c9803c10 813989c2
> <0> c9803c30 81374092 c9803c30 880632866c00
> <0> c9803c50 81373dd3 0002 880632866c00
> Call Trace:
> [] nf_conntrack_destroy+0x1b/0x1d
> [] skb_release_head_state+0x95/0xd7
> [] __kfree_skb+0x16/0x81
> [] kfree_skb+0x6a/0x72
> [] ip6_mc_input+0x220/0x230 [ipv6]
> [] ip6_rcv_finish+0x27/0x2b [ipv6]
> [] ipv6_rcv+0x38e/0x3e5 [ipv6]
> [] netif_receive_skb+0x402/0x427
> [] napi_skb_finish+0x29/0x3d
> [] napi_gro_receive+0x2f/0x34
> [] tg3_poll+0x6c6/0x8c3 [tg3]
> [] net_rx_action+0xaf/0x1c9
> [] ? list-add_tail+0x15/0x17
> [] __do_softirq+0xdd/0x1ad
> [] ? apic_write+0x16/0x18
> [] call_softirq+0x1c/0x30
> [] do_softirq+0x47/0x8d
> [] irq_exit+0x44/0x86
> [] do_IRQ+0xa5/0xbc
> [] ret_from_intr+0x0/0x11
> [] ? kvm_arch_vcpu_ioctl_run+0x84b/0xb34 [kvm]
> [] ? kvm_arch_vcpu_ioctl_run+0x83a/0xb34 [kvm]
> [] ? kvm_vcpu_ioctl+0xfd/0x556 [kvm]
> [] ? vfs_ioctl+0x22/0x87
> [] ? do_vfs_ioctl+0x47b/0x4c1
> [] ? sys_ioctl+0x56/0x79
> [] ? stub_clone+0x13/0x20
> [] ? system_call_fastpath+0x16/0x1b
> Code: c7 00 a6 9a 81 e8 23 04 08 00 48 89 df e8 68 29 00 00 f6 43 78
> 08 75 24 48 8b 53 10 48 85 d2 75 04 0f 0b eb fe 48 8b 43 08 a8 01 <48>
> 89 02 75 04 48 89 50 08 48 c7 43 10 00 02 20 00 65 8b 14 25
> RIP [] destroy_conntrack+0x82/0x11f
> RSP
> CR2: 00200200

Looks unrelated to kvm - softirq happened to trigger during a kvm ioctl.
Fault looks like list poison. Copying netdev.

--
Do not meddle in the internals of kernels, for they are subtle and quick
to panic.
Networking-related crash?
I've been seeing lots of crashes on a new Dell Precision T7500,
running the KVM in Fedora 12. Finally managed to capture an Oops,
which is shown below (hand-transcribed):

BUG: unable to handle kernel paging request at 00200200
IP: [] destroy_conntrack+0x82/0x11f
PGD 332d0e067 PUD 33453c067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
CPU 4
Modules linked in: tun bridge stp llc sunrpc ipt_MASQUERADE
iptable_nat nf_nat ipt_LOG xt_physdev ip6t_REJECT nf_conntrack_ipv6
ip6table_filter ip6_tables ipv6 dm_multipath kvm_intel kvm uinput
snd_hda_codec_analog nouveau snd_hda_intel snd_hda_codec ttm
drm_kms_helper snd_hwdep snd_seq drm snd_seq_device snd_pcm
firewire_ohci i2c_i801 snd_timer ppdev firewire_core snd i2c_algo_bit
iTCO_wdt crc_itu_t parport_pc i2c_core soundcore parport
iTCO_vendor_support tg3 snd_page_alloc shpchp dcdbas wmi mptsas
mptscsih mptbase scsi_transport_sas megaraid_sas
[last_unloaded: speedstep_lib]
Pid: 1759, comm: qemu-kvm Not tainted 2.6.31.6-162.fc12.x86_64 #1
Precision WorkStation T7500
RIP: 0010:[] [] destroy_conntrack+0x82/0x11f
RSP: 0018:c9803bf0 EFLAGS: 00010202
RAX: 8001 RBX: 816fb1a0 RCX: 752f
RDX: 00200200 RSI: 0011 RDI: 816fb1a0
RBP: c9803c00 R08: 880336699438 R09: 00aaa5e0
R10: 0002f54189d5 R11: 0001 R12: 819a92e0
R13: a029adcc R14: R15: 880632866c38
FS: 7fdd34b17710() GS:c980() knlGS:
CS: 0010 DS: 002B ES: 002B CR0: 80050033
CR2: 00200200 CR3: 0003349c CR4: 26e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process qemu-kvm (pid: 1759, threadinfo 88062e9e8000, task 880634945e00)
Stack:
880632866c00 880634640c30 c9803c10 813989c2
<0> c9803c30 81374092 c9803c30 880632866c00
<0> c9803c50 81373dd3 0002 880632866c00
Call Trace:
[] nf_conntrack_destroy+0x1b/0x1d
[] skb_release_head_state+0x95/0xd7
[] __kfree_skb+0x16/0x81
[] kfree_skb+0x6a/0x72
[] ip6_mc_input+0x220/0x230 [ipv6]
[] ip6_rcv_finish+0x27/0x2b [ipv6]
[] ipv6_rcv+0x38e/0x3e5 [ipv6]
[] netif_receive_skb+0x402/0x427
[] napi_skb_finish+0x29/0x3d
[] napi_gro_receive+0x2f/0x34
[] tg3_poll+0x6c6/0x8c3 [tg3]
[] net_rx_action+0xaf/0x1c9
[] ? list-add_tail+0x15/0x17
[] __do_softirq+0xdd/0x1ad
[] ? apic_write+0x16/0x18
[] call_softirq+0x1c/0x30
[] do_softirq+0x47/0x8d
[] irq_exit+0x44/0x86
[] do_IRQ+0xa5/0xbc
[] ret_from_intr+0x0/0x11
[] ? kvm_arch_vcpu_ioctl_run+0x84b/0xb34 [kvm]
[] ? kvm_arch_vcpu_ioctl_run+0x83a/0xb34 [kvm]
[] ? kvm_vcpu_ioctl+0xfd/0x556 [kvm]
[] ? vfs_ioctl+0x22/0x87
[] ? do_vfs_ioctl+0x47b/0x4c1
[] ? sys_ioctl+0x56/0x79
[] ? stub_clone+0x13/0x20
[] ? system_call_fastpath+0x16/0x1b
Code: c7 00 a6 9a 81 e8 23 04 08 00 48 89 df e8 68 29 00 00 f6 43 78
08 75 24 48 8b 53 10 48 85 d2 75 04 0f 0b eb fe 48 8b 43 08 a8 01 <48>
89 02 75 04 48 89 50 08 48 c7 43 10 00 02 20 00 65 8b 14 25
RIP [] destroy_conntrack+0x82/0x11f
RSP
CR2: 00200200