Re: Networking-related crash?

2010-01-05 Thread Adam Huffman
On Mon, Dec 14, 2009 at 1:16 PM, Patrick McHardy  wrote:
> Adam Huffman wrote:
>> On Thu, Dec 10, 2009 at 11:01 AM, Patrick McHardy  wrote:
>>> Eric Dumazet wrote:
>>>> On 09/12/2009 16:11, Avi Kivity wrote:
>>>>> On 12/09/2009 03:46 PM, Adam Huffman wrote:
>>>>>> I've been seeing lots of crashes on a new Dell Precision T7500,
>>>>>> running the KVM in Fedora 12.  Finally managed to capture an Oops,
>>>>>> which is shown below (hand-transcribed):
>>>>>>
>>>>>> BUG: unable to handle kernel paging request at 00200200
>>>>>> IP: [] destroy_conntrack+0x82/0x11f
>>>>>> PGD 332d0e067 PUD 33453c067 PMD 0
>>>>>> RIP: 0010:[]  []
>>>>>> destroy_conntrack+0x82/0x11f
>>>>>> RSP: 0018:c9803bf0  EFLAGS: 00010202
>>>>>> RAX: 8001 RBX: 816fb1a0 RCX: 752f
>>>>>> RDX: 00200200 RSI: 0011 RDI: 816fb1a0
>>>>>> RBP: c9803c00 R08: 880336699438 R09: 00aaa5e0
>>>>>> R10: 0002f54189d5 R11: 0001 R12: 819a92e0
>>>>>> R13: a029adcc R14:  R15: 880632866c38
>>>>>> FS:  7fdd34b17710() GS:c980()
>>>>>> knlGS:
>>>>>> CS:  0010 DS: 002B ES: 002B CR0: 80050033
>>>>>> CR2: 00200200 CR3: 0003349c CR4: 26e0
>>>>>> DR0:  DR1:  DR2: 
>>>>>> DR3:  DR6: 0ff0 DR7: 0400
>>>>>> Process qemu-kvm (pid: 1759, threadinfo 88062e9e8000, task
>>>>>> 880634945e00)
>>>>>> Stack:
>>>>>>   880632866c00 880634640c30 c9803c10 813989c2
>>>>>> <0>  c9803c30 81374092 c9803c30 880632866c00
>>>>>> <0>  c9803c50 81373dd3 0002 880632866c00
>>>>>> Call Trace:
>>>>>>
>>>>>>   [] nf_conntrack_destroy+0x1b/0x1d
>>>>>>   [] skb_release_head_state+0x95/0xd7
>>>>>>   [] __kfree_skb+0x16/0x81
>>>>>>   [] kfree_skb+0x6a/0x72
>>>>>>   [] ip6_mc_input+0x220/0x230 [ipv6]
>>>>>>   [] ip6_rcv_finish+0x27/0x2b [ipv6]
>>>>>>   [] ipv6_rcv+0x38e/0x3e5 [ipv6]
>>>>>>   [] netif_receive_skb+0x402/0x427
>>>>>>   ...
>>>>>>
>>>> crash in :
>>>>       48 8b 43 08             mov    0x8(%rbx),%rax
>>>>       a8 01                   test   $0x1,%al
>>>>       48 89 02                mov    %rax,(%rdx)  << HERE >> RDX=0x200200 (LIST_POISON2)
>>>>       75 04                   jne    1f
>>>>       48 89 50 08             mov    %rdx,0x8(%rax)
>>>> 1:    48 c7 43 10 00 02 20    movq   $0x200200,0x10(%rbx)
>>>>
>>>>       if (!nf_ct_is_confirmed(ct)) {
>>>>               BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
>>>>               hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);  << HERE >>
>>>>       }
>>>>       NF_CT_STAT_INC(net, delete);
>>>
>>> I can't spot the problem. Adam, please send me your .config file.
>>>
>>>
>>
>> It's the standard Fedora .config, which is attached.
>>
>> As I stated in another message, the oops seems related to VT-d.  With
>> that disabled, the machine has been stable for nearly a day now.
>
> That probably only affects the timing of some race. Please also
> send me the IPv6 ruleset used on that machine. Thanks.
>

Just to note that if I disable IPv6 completely, the machine is stable
- certainly compared with the crashes after a few minutes when IPv6 is
enabled.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Networking-related crash?

2009-12-14 Thread Adam Huffman
On Mon, Dec 14, 2009 at 1:16 PM, Patrick McHardy  wrote:
> Adam Huffman wrote:
>> On Thu, Dec 10, 2009 at 11:01 AM, Patrick McHardy  wrote:
>>> Eric Dumazet wrote:
>>>> On 09/12/2009 16:11, Avi Kivity wrote:
>>>>> On 12/09/2009 03:46 PM, Adam Huffman wrote:
>>>>>> I've been seeing lots of crashes on a new Dell Precision T7500,
>>>>>> running the KVM in Fedora 12.  Finally managed to capture an Oops,
>>>>>> which is shown below (hand-transcribed):
>>>>>>
>>>>>> BUG: unable to handle kernel paging request at 00200200
>>>>>> IP: [] destroy_conntrack+0x82/0x11f
>>>>>> PGD 332d0e067 PUD 33453c067 PMD 0
>>>>>> RIP: 0010:[]  []
>>>>>> destroy_conntrack+0x82/0x11f
>>>>>> RSP: 0018:c9803bf0  EFLAGS: 00010202
>>>>>> RAX: 8001 RBX: 816fb1a0 RCX: 752f
>>>>>> RDX: 00200200 RSI: 0011 RDI: 816fb1a0
>>>>>> RBP: c9803c00 R08: 880336699438 R09: 00aaa5e0
>>>>>> R10: 0002f54189d5 R11: 0001 R12: 819a92e0
>>>>>> R13: a029adcc R14:  R15: 880632866c38
>>>>>> FS:  7fdd34b17710() GS:c980()
>>>>>> knlGS:
>>>>>> CS:  0010 DS: 002B ES: 002B CR0: 80050033
>>>>>> CR2: 00200200 CR3: 0003349c CR4: 26e0
>>>>>> DR0:  DR1:  DR2: 
>>>>>> DR3:  DR6: 0ff0 DR7: 0400
>>>>>> Process qemu-kvm (pid: 1759, threadinfo 88062e9e8000, task
>>>>>> 880634945e00)
>>>>>> Stack:
>>>>>>   880632866c00 880634640c30 c9803c10 813989c2
>>>>>> <0>  c9803c30 81374092 c9803c30 880632866c00
>>>>>> <0>  c9803c50 81373dd3 0002 880632866c00
>>>>>> Call Trace:
>>>>>>
>>>>>>   [] nf_conntrack_destroy+0x1b/0x1d
>>>>>>   [] skb_release_head_state+0x95/0xd7
>>>>>>   [] __kfree_skb+0x16/0x81
>>>>>>   [] kfree_skb+0x6a/0x72
>>>>>>   [] ip6_mc_input+0x220/0x230 [ipv6]
>>>>>>   [] ip6_rcv_finish+0x27/0x2b [ipv6]
>>>>>>   [] ipv6_rcv+0x38e/0x3e5 [ipv6]
>>>>>>   [] netif_receive_skb+0x402/0x427
>>>>>>   ...
>>>>>>
>>>> crash in :
>>>>       48 8b 43 08             mov    0x8(%rbx),%rax
>>>>       a8 01                   test   $0x1,%al
>>>>       48 89 02                mov    %rax,(%rdx)  << HERE >> RDX=0x200200 (LIST_POISON2)
>>>>       75 04                   jne    1f
>>>>       48 89 50 08             mov    %rdx,0x8(%rax)
>>>> 1:    48 c7 43 10 00 02 20    movq   $0x200200,0x10(%rbx)
>>>>
>>>>       if (!nf_ct_is_confirmed(ct)) {
>>>>               BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
>>>>               hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);  << HERE >>
>>>>       }
>>>>       NF_CT_STAT_INC(net, delete);
>>>
>>> I can't spot the problem. Adam, please send me your .config file.
>>>
>>>
>>
>> It's the standard Fedora .config, which is attached.
>>
>> As I stated in another message, the oops seems related to VT-d.  With
>> that disabled, the machine has been stable for nearly a day now.
>
> That probably only affects the timing of some race. Please also
> send me the IPv6 ruleset used on that machine. Thanks.
>

Again, it's the Fedora 12 default.  Here you go:

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p ipv6-icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp6-adm-prohibited
-A FORWARD -j REJECT --reject-with icmp6-adm-prohibited
COMMIT


Re: Networking-related crash?

2009-12-14 Thread Patrick McHardy
Adam Huffman wrote:
> On Thu, Dec 10, 2009 at 11:01 AM, Patrick McHardy  wrote:
>> Eric Dumazet wrote:
>>> On 09/12/2009 16:11, Avi Kivity wrote:
>>>> On 12/09/2009 03:46 PM, Adam Huffman wrote:
>>>>> I've been seeing lots of crashes on a new Dell Precision T7500,
>>>>> running the KVM in Fedora 12.  Finally managed to capture an Oops,
>>>>> which is shown below (hand-transcribed):
>>>>>
>>>>> BUG: unable to handle kernel paging request at 00200200
>>>>> IP: [] destroy_conntrack+0x82/0x11f
>>>>> PGD 332d0e067 PUD 33453c067 PMD 0
>>>>> RIP: 0010:[]  []
>>>>> destroy_conntrack+0x82/0x11f
>>>>> RSP: 0018:c9803bf0  EFLAGS: 00010202
>>>>> RAX: 8001 RBX: 816fb1a0 RCX: 752f
>>>>> RDX: 00200200 RSI: 0011 RDI: 816fb1a0
>>>>> RBP: c9803c00 R08: 880336699438 R09: 00aaa5e0
>>>>> R10: 0002f54189d5 R11: 0001 R12: 819a92e0
>>>>> R13: a029adcc R14:  R15: 880632866c38
>>>>> FS:  7fdd34b17710() GS:c980()
>>>>> knlGS:
>>>>> CS:  0010 DS: 002B ES: 002B CR0: 80050033
>>>>> CR2: 00200200 CR3: 0003349c CR4: 26e0
>>>>> DR0:  DR1:  DR2: 
>>>>> DR3:  DR6: 0ff0 DR7: 0400
>>>>> Process qemu-kvm (pid: 1759, threadinfo 88062e9e8000, task
>>>>> 880634945e00)
>>>>> Stack:
>>>>>   880632866c00 880634640c30 c9803c10 813989c2
>>>>> <0>  c9803c30 81374092 c9803c30 880632866c00
>>>>> <0>  c9803c50 81373dd3 0002 880632866c00
>>>>> Call Trace:
>>>>>
>>>>>   [] nf_conntrack_destroy+0x1b/0x1d
>>>>>   [] skb_release_head_state+0x95/0xd7
>>>>>   [] __kfree_skb+0x16/0x81
>>>>>   [] kfree_skb+0x6a/0x72
>>>>>   [] ip6_mc_input+0x220/0x230 [ipv6]
>>>>>   [] ip6_rcv_finish+0x27/0x2b [ipv6]
>>>>>   [] ipv6_rcv+0x38e/0x3e5 [ipv6]
>>>>>   [] netif_receive_skb+0x402/0x427
>>>>>   ...
>>>>>
>>> crash in :
>>>       48 8b 43 08             mov    0x8(%rbx),%rax
>>>       a8 01                   test   $0x1,%al
>>>       48 89 02                mov    %rax,(%rdx)  << HERE >> RDX=0x200200 (LIST_POISON2)
>>>       75 04                   jne    1f
>>>       48 89 50 08             mov    %rdx,0x8(%rax)
>>> 1:    48 c7 43 10 00 02 20    movq   $0x200200,0x10(%rbx)
>>>
>>>       if (!nf_ct_is_confirmed(ct)) {
>>>               BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
>>>               hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);  << HERE >>
>>>       }
>>>       NF_CT_STAT_INC(net, delete);
>>
>> I can't spot the problem. Adam, please send me your .config file.
>>
>>
> 
> It's the standard Fedora .config, which is attached.
> 
> As I stated in another message, the oops seems related to VT-d.  With
> that disabled, the machine has been stable for nearly a day now.

That probably only affects the timing of some race. Please also
send me the IPv6 ruleset used on that machine. Thanks.


Re: Networking-related crash?

2009-12-10 Thread Patrick McHardy
Eric Dumazet wrote:
> On 09/12/2009 16:11, Avi Kivity wrote:
>> On 12/09/2009 03:46 PM, Adam Huffman wrote:
>>> I've been seeing lots of crashes on a new Dell Precision T7500,
>>> running the KVM in Fedora 12.  Finally managed to capture an Oops,
>>> which is shown below (hand-transcribed):
>>>
>>> BUG: unable to handle kernel paging request at 00200200
>>> IP: [] destroy_conntrack+0x82/0x11f
>>> PGD 332d0e067 PUD 33453c067 PMD 0
>>> RIP: 0010:[]  []
>>> destroy_conntrack+0x82/0x11f
>>> RSP: 0018:c9803bf0  EFLAGS: 00010202
>>> RAX: 8001 RBX: 816fb1a0 RCX: 752f
>>> RDX: 00200200 RSI: 0011 RDI: 816fb1a0
>>> RBP: c9803c00 R08: 880336699438 R09: 00aaa5e0
>>> R10: 0002f54189d5 R11: 0001 R12: 819a92e0
>>> R13: a029adcc R14:  R15: 880632866c38
>>> FS:  7fdd34b17710() GS:c980()
>>> knlGS:
>>> CS:  0010 DS: 002B ES: 002B CR0: 80050033
>>> CR2: 00200200 CR3: 0003349c CR4: 26e0
>>> DR0:  DR1:  DR2: 
>>> DR3:  DR6: 0ff0 DR7: 0400
>>> Process qemu-kvm (pid: 1759, threadinfo 88062e9e8000, task
>>> 880634945e00)
>>> Stack:
>>>   880632866c00 880634640c30 c9803c10 813989c2
>>> <0>  c9803c30 81374092 c9803c30 880632866c00
>>> <0>  c9803c50 81373dd3 0002 880632866c00
>>> Call Trace:
>>>   
>>>   [] nf_conntrack_destroy+0x1b/0x1d
>>>   [] skb_release_head_state+0x95/0xd7
>>>   [] __kfree_skb+0x16/0x81
>>>   [] kfree_skb+0x6a/0x72
>>>   [] ip6_mc_input+0x220/0x230 [ipv6]
>>>   [] ip6_rcv_finish+0x27/0x2b [ipv6]
>>>   [] ipv6_rcv+0x38e/0x3e5 [ipv6]
>>>   [] netif_receive_skb+0x402/0x427
>>>   ...
>>>
> crash in :
>       48 8b 43 08             mov    0x8(%rbx),%rax
>       a8 01                   test   $0x1,%al
>       48 89 02                mov    %rax,(%rdx)  << HERE >> RDX=0x200200 (LIST_POISON2)
>       75 04                   jne    1f
>       48 89 50 08             mov    %rdx,0x8(%rax)
> 1:    48 c7 43 10 00 02 20    movq   $0x200200,0x10(%rbx)
>
>       if (!nf_ct_is_confirmed(ct)) {
>               BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
>               hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);  << HERE >>
>       }
>       NF_CT_STAT_INC(net, delete);


I can't spot the problem. Adam, please send me your .config file.



Re: Networking-related crash?

2009-12-09 Thread Eric Dumazet
On 09/12/2009 16:11, Avi Kivity wrote:
> On 12/09/2009 03:46 PM, Adam Huffman wrote:
>> I've been seeing lots of crashes on a new Dell Precision T7500,
>> running the KVM in Fedora 12.  Finally managed to capture an Oops,
>> which is shown below (hand-transcribed):
>>
>> BUG: unable to handle kernel paging request at 00200200
>> IP: [] destroy_conntrack+0x82/0x11f
>> PGD 332d0e067 PUD 33453c067 PMD 0
>> Oops: 0002 [#1] SMP
>> last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
>> CPU 4
>> Modules linked in: tun bridge stp llc sunrpc ipt_MASQUERADE
>> iptable_nat nf_nat ipt_LOG xt_physdev ip6t_REJECT nf_conntrack_ipv6
>> ip6table_filter ip6_tables ipv6 dm_multipath kvm_intel kvm uinput
>> snd_hda_codec_analog nouveau snd_hda_intel snd_hda_codec ttm
>> drm_kms_helper snd_hwdep snd_seq drm snd_seq_device snd_pcm
>> firewire_ohci i2c_i801 snd_timer ppdev firewire_core snd i2c_algo_bit
>> iTCO_wdt crc_itu_t parport_pc i2c_core soundcore parport
>> iTCO_vendor_support tg3 snd_page_alloc shpchp dcdbas wmi mptsas
>> mptscsih mptbase scsi_transport_sas megaraid_sas [last_unloaded:
>> speedstep_lib]
>> Pid: 1759, comm: qemu-kvm Not tainted 2.6.31.6-162.fc12.x86_64 #1
>> Precision WorkStation T7500
>> RIP: 0010:[]  []
>> destroy_conntrack+0x82/0x11f
>> RSP: 0018:c9803bf0  EFLAGS: 00010202
>> RAX: 8001 RBX: 816fb1a0 RCX: 752f
>> RDX: 00200200 RSI: 0011 RDI: 816fb1a0
>> RBP: c9803c00 R08: 880336699438 R09: 00aaa5e0
>> R10: 0002f54189d5 R11: 0001 R12: 819a92e0
>> R13: a029adcc R14:  R15: 880632866c38
>> FS:  7fdd34b17710() GS:c980()
>> knlGS:
>> CS:  0010 DS: 002B ES: 002B CR0: 80050033
>> CR2: 00200200 CR3: 0003349c CR4: 26e0
>> DR0:  DR1:  DR2: 
>> DR3:  DR6: 0ff0 DR7: 0400
>> Process qemu-kvm (pid: 1759, threadinfo 88062e9e8000, task
>> 880634945e00)
>> Stack:
>>   880632866c00 880634640c30 c9803c10 813989c2
>> <0>  c9803c30 81374092 c9803c30 880632866c00
>> <0>  c9803c50 81373dd3 0002 880632866c00
>> Call Trace:
>>   
>>   [] nf_conntrack_destroy+0x1b/0x1d
>>   [] skb_release_head_state+0x95/0xd7
>>   [] __kfree_skb+0x16/0x81
>>   [] kfree_skb+0x6a/0x72
>>   [] ip6_mc_input+0x220/0x230 [ipv6]
>>   [] ip6_rcv_finish+0x27/0x2b [ipv6]
>>   [] ipv6_rcv+0x38e/0x3e5 [ipv6]
>>   [] netif_receive_skb+0x402/0x427
>>   [] napi_skb_finish+0x29/0x3d
>>   [] napi_gro_receive+0x2f/0x34
>>   [] tg3_poll+0x6c6/0x8c3 [tg3]
>>   [] net_rx_action+0xaf/0x1c9
>>   [] ? list_add_tail+0x15/0x17
>>   [] __do_softirq+0xdd/0x1ad
>>   [] ? apic_write+0x16/0x18
>>   [] call_softirq+0x1c/0x30
>>   [] do_softirq+0x47/0x8d
>>   [] irq_exit+0x44/0x86
>>   [] do_IRQ+0xa5/0xbc
>>   [] ret_from_intr+0x0/0x11
>>   
>>   [] ? kvm_arch_vcpu_ioctl_run+0x84b/0xb34 [kvm]
>>   [] ? kvm_arch_vcpu_ioctl_run+0x83a/0xb34 [kvm]
>>   [] ? kvm_vcpu_ioctl+0xfd/0x556 [kvm]
>>   [] ? vfs_ioctl+0x22/0x87
>>   [] ? do_vfs_ioctl+0x47b/0x4c1
>>   [] ? sys_ioctl+0x56/0x79
>>   [] ? stub_clone+0x13/0x20
>>   [] ? system_call_fastpath+0x16/0x1b
>> Code: c7 00 a6 9a 81 e8 23 04 08 00 48 89 df e8 68 29 00 00 f6 43 78
>> 08 75 24 48 8b 53 10 48 85 d2 75 04 0f 0b eb fe 48 8b 43 08 a8 01 <48>
>> 89 02 75 04 48 89 50 08 48 c7 43 10 00 02 20 00 65 8b 14 25
>> RIP  [] destroy_conntrack+0x82/0x11f
>>   RSP
>> CR2: 00200200
>>
> 
> Looks unrelated to kvm - softirq happened to trigger during a kvm
> ioctl.  Fault looks like list poison.  Copying netdev.
> 

crash in :
       48 8b 43 08             mov    0x8(%rbx),%rax
       a8 01                   test   $0x1,%al
       48 89 02                mov    %rax,(%rdx)  << HERE >> RDX=0x200200 (LIST_POISON2)
       75 04                   jne    1f
       48 89 50 08             mov    %rdx,0x8(%rax)
 1:    48 c7 43 10 00 02 20    movq   $0x200200,0x10(%rbx)

       if (!nf_ct_is_confirmed(ct)) {
               BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
               hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);  << HERE >>
       }
       NF_CT_STAT_INC(net, delete);
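
To make the faulting store easier to follow, here is a minimal userspace sketch of what the nulls-list delete does and why a node whose pprev already holds LIST_POISON2 would fault at that store. One way a node can get into that state is being deleted twice; nothing in this thread establishes that this is what actually happened. The struct names mirror the kernel's, but every sketch_* function and the SKETCH_PPREV_POISON macro are hypothetical names for illustration, and the explicit poison check stands in for the page fault (the kernel code has no such check).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct hlist_nulls_node {
    struct hlist_nulls_node *next;
    struct hlist_nulls_node **pprev;
};

struct hlist_nulls_head {
    struct hlist_nulls_node *first;
};

/* Value seen in RDX/CR2 in the oops (0x200200, LIST_POISON2). */
#define SKETCH_PPREV_POISON ((struct hlist_nulls_node **)(uintptr_t)0x200200)

/* A nulls list ends in an odd-valued marker instead of NULL. */
static int sketch_is_a_nulls(const struct hlist_nulls_node *ptr)
{
    return ((uintptr_t)ptr & 1) != 0;
}

static void sketch_add_head(struct hlist_nulls_node *n,
                            struct hlist_nulls_head *h)
{
    struct hlist_nulls_node *first = h->first;

    n->next = first;
    n->pprev = &h->first;
    h->first = n;
    if (!sketch_is_a_nulls(first))
        first->pprev = &n->next;
}

/* Returns 0 on success, -1 where the real code would have faulted. */
static int sketch_del(struct hlist_nulls_node *n)
{
    struct hlist_nulls_node *next = n->next;    /* mov 0x8(%rbx),%rax */
    struct hlist_nulls_node **pprev = n->pprev; /* already in %rdx */

    if (pprev == SKETCH_PPREV_POISON)
        return -1;                  /* stand-in for the fault at 0x200200 */

    *pprev = next;                  /* mov %rax,(%rdx)  << faulting store */
    if (!sketch_is_a_nulls(next))   /* test $0x1,%al ; jne 1f */
        next->pprev = pprev;        /* mov %rdx,0x8(%rax) */
    n->pprev = SKETCH_PPREV_POISON; /* movq $0x200200,0x10(%rbx) */
    return 0;
}

/* Deletes the same node twice; returns 1 if only the second call "faults". */
static int sketch_double_delete(void)
{
    struct hlist_nulls_head head = { (struct hlist_nulls_node *)(uintptr_t)1 };
    struct hlist_nulls_node node;
    int first, second;

    sketch_add_head(&node, &head);
    first = sketch_del(&node);
    second = sketch_del(&node);
    return first == 0 && second == -1;
}
```

Calling sketch_double_delete() exercises both paths: the first delete unlinks the node and poisons pprev, and the second reaches the store with pprev equal to the poison value, which is the register state shown in the oops.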


Re: Networking-related crash?

2009-12-09 Thread Adam Huffman
On Wed, Dec 9, 2009 at 3:11 PM, Avi Kivity  wrote:
> On 12/09/2009 03:46 PM, Adam Huffman wrote:
>>
>> I've been seeing lots of crashes on a new Dell Precision T7500,
>> running the KVM in Fedora 12.  Finally managed to capture an Oops,
>> which is shown below (hand-transcribed):
>>
>> BUG: unable to handle kernel paging request at 00200200
>> IP: [] destroy_conntrack+0x82/0x11f
>> PGD 332d0e067 PUD 33453c067 PMD 0
>> Oops: 0002 [#1] SMP
>> last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
>> CPU 4
>> Modules linked in: tun bridge stp llc sunrpc ipt_MASQUERADE
>> iptable_nat nf_nat ipt_LOG xt_physdev ip6t_REJECT nf_conntrack_ipv6
>> ip6table_filter ip6_tables ipv6 dm_multipath kvm_intel kvm uinput
>> snd_hda_codec_analog nouveau snd_hda_intel snd_hda_codec ttm
>> drm_kms_helper snd_hwdep snd_seq drm snd_seq_device snd_pcm
>> firewire_ohci i2c_i801 snd_timer ppdev firewire_core snd i2c_algo_bit
>> iTCO_wdt crc_itu_t parport_pc i2c_core soundcore parport
>> iTCO_vendor_support tg3 snd_page_alloc shpchp dcdbas wmi mptsas
>> mptscsih mptbase scsi_transport_sas megaraid_sas [last_unloaded:
>> speedstep_lib]
>> Pid: 1759, comm: qemu-kvm Not tainted 2.6.31.6-162.fc12.x86_64 #1
>> Precision WorkStation T7500
>> RIP: 0010:[]  []
>> destroy_conntrack+0x82/0x11f
>> RSP: 0018:c9803bf0  EFLAGS: 00010202
>> RAX: 8001 RBX: 816fb1a0 RCX: 752f
>> RDX: 00200200 RSI: 0011 RDI: 816fb1a0
>> RBP: c9803c00 R08: 880336699438 R09: 00aaa5e0
>> R10: 0002f54189d5 R11: 0001 R12: 819a92e0
>> R13: a029adcc R14:  R15: 880632866c38
>> FS:  7fdd34b17710() GS:c980()
>> knlGS:
>> CS:  0010 DS: 002B ES: 002B CR0: 80050033
>> CR2: 00200200 CR3: 0003349c CR4: 26e0
>> DR0:  DR1:  DR2: 
>> DR3:  DR6: 0ff0 DR7: 0400
>> Process qemu-kvm (pid: 1759, threadinfo 88062e9e8000, task
>> 880634945e00)
>> Stack:
>>  880632866c00 880634640c30 c9803c10 813989c2
>> <0>  c9803c30 81374092 c9803c30 880632866c00
>> <0>  c9803c50 81373dd3 0002 880632866c00
>> Call Trace:
>>  
>>  [] nf_conntrack_destroy+0x1b/0x1d
>>  [] skb_release_head_state+0x95/0xd7
>>  [] __kfree_skb+0x16/0x81
>>  [] kfree_skb+0x6a/0x72
>>  [] ip6_mc_input+0x220/0x230 [ipv6]
>>  [] ip6_rcv_finish+0x27/0x2b [ipv6]
>>  [] ipv6_rcv+0x38e/0x3e5 [ipv6]
>>  [] netif_receive_skb+0x402/0x427
>>  [] napi_skb_finish+0x29/0x3d
>>  [] napi_gro_receive+0x2f/0x34
>>  [] tg3_poll+0x6c6/0x8c3 [tg3]
>>  [] net_rx_action+0xaf/0x1c9
>>  [] ? list_add_tail+0x15/0x17
>>  [] __do_softirq+0xdd/0x1ad
>>  [] ? apic_write+0x16/0x18
>>  [] call_softirq+0x1c/0x30
>>  [] do_softirq+0x47/0x8d
>>  [] irq_exit+0x44/0x86
>>  [] do_IRQ+0xa5/0xbc
>>  [] ret_from_intr+0x0/0x11
>>  
>>  [] ? kvm_arch_vcpu_ioctl_run+0x84b/0xb34 [kvm]
>>  [] ? kvm_arch_vcpu_ioctl_run+0x83a/0xb34 [kvm]
>>  [] ? kvm_vcpu_ioctl+0xfd/0x556 [kvm]
>>  [] ? vfs_ioctl+0x22/0x87
>>  [] ? do_vfs_ioctl+0x47b/0x4c1
>>  [] ? sys_ioctl+0x56/0x79
>>  [] ? stub_clone+0x13/0x20
>>  [] ? system_call_fastpath+0x16/0x1b
>> Code: c7 00 a6 9a 81 e8 23 04 08 00 48 89 df e8 68 29 00 00 f6 43 78
>> 08 75 24 48 8b 53 10 48 85 d2 75 04 0f 0b eb fe 48 8b 43 08 a8 01 <48>
>> 89 02 75 04 48 89 50 08 48 c7 43 10 00 02 20 00 65 8b 14 25
>> RIP  [] destroy_conntrack+0x82/0x11f
>>  RSP
>> CR2: 00200200
>>
>
> Looks unrelated to kvm - softirq happened to trigger during a kvm ioctl.
>  Fault looks like list poison.  Copying netdev.
>

Disabling VT-d support in the BIOS seems to have stopped the crashes.
At least it's been running without crashing for several hours now,
while it would only last minutes before.


Re: Networking-related crash?

2009-12-09 Thread Avi Kivity

On 12/09/2009 03:46 PM, Adam Huffman wrote:

I've been seeing lots of crashes on a new Dell Precision T7500,
running the KVM in Fedora 12.  Finally managed to capture an Oops,
which is shown below (hand-transcribed):

BUG: unable to handle kernel paging request at 00200200
IP: [] destroy_conntrack+0x82/0x11f
PGD 332d0e067 PUD 33453c067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
CPU 4
Modules linked in: tun bridge stp llc sunrpc ipt_MASQUERADE
iptable_nat nf_nat ipt_LOG xt_physdev ip6t_REJECT nf_conntrack_ipv6
ip6table_filter ip6_tables ipv6 dm_multipath kvm_intel kvm uinput
snd_hda_codec_analog nouveau snd_hda_intel snd_hda_codec ttm
drm_kms_helper snd_hwdep snd_seq drm snd_seq_device snd_pcm
firewire_ohci i2c_i801 snd_timer ppdev firewire_core snd i2c_algo_bit
iTCO_wdt crc_itu_t parport_pc i2c_core soundcore parport
iTCO_vendor_support tg3 snd_page_alloc shpchp dcdbas wmi mptsas
mptscsih mptbase scsi_transport_sas megaraid_sas [last_unloaded:
speedstep_lib]
Pid: 1759, comm: qemu-kvm Not tainted 2.6.31.6-162.fc12.x86_64 #1
Precision WorkStation T7500
RIP: 0010:[]  []
destroy_conntrack+0x82/0x11f
RSP: 0018:c9803bf0  EFLAGS: 00010202
RAX: 8001 RBX: 816fb1a0 RCX: 752f
RDX: 00200200 RSI: 0011 RDI: 816fb1a0
RBP: c9803c00 R08: 880336699438 R09: 00aaa5e0
R10: 0002f54189d5 R11: 0001 R12: 819a92e0
R13: a029adcc R14:  R15: 880632866c38
FS:  7fdd34b17710() GS:c980() knlGS:
CS:  0010 DS: 002B ES: 002B CR0: 80050033
CR2: 00200200 CR3: 0003349c CR4: 26e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process qemu-kvm (pid: 1759, threadinfo 88062e9e8000, task 880634945e00)
Stack:
  880632866c00 880634640c30 c9803c10 813989c2
<0>  c9803c30 81374092 c9803c30 880632866c00
<0>  c9803c50 81373dd3 0002 880632866c00
Call Trace:
  
  [] nf_conntrack_destroy+0x1b/0x1d
  [] skb_release_head_state+0x95/0xd7
  [] __kfree_skb+0x16/0x81
  [] kfree_skb+0x6a/0x72
  [] ip6_mc_input+0x220/0x230 [ipv6]
  [] ip6_rcv_finish+0x27/0x2b [ipv6]
  [] ipv6_rcv+0x38e/0x3e5 [ipv6]
  [] netif_receive_skb+0x402/0x427
  [] napi_skb_finish+0x29/0x3d
  [] napi_gro_receive+0x2f/0x34
  [] tg3_poll+0x6c6/0x8c3 [tg3]
  [] net_rx_action+0xaf/0x1c9
  [] ? list_add_tail+0x15/0x17
  [] __do_softirq+0xdd/0x1ad
  [] ? apic_write+0x16/0x18
  [] call_softirq+0x1c/0x30
  [] do_softirq+0x47/0x8d
  [] irq_exit+0x44/0x86
  [] do_IRQ+0xa5/0xbc
  [] ret_from_intr+0x0/0x11
  
  [] ? kvm_arch_vcpu_ioctl_run+0x84b/0xb34 [kvm]
  [] ? kvm_arch_vcpu_ioctl_run+0x83a/0xb34 [kvm]
  [] ? kvm_vcpu_ioctl+0xfd/0x556 [kvm]
  [] ? vfs_ioctl+0x22/0x87
  [] ? do_vfs_ioctl+0x47b/0x4c1
  [] ? sys_ioctl+0x56/0x79
  [] ? stub_clone+0x13/0x20
  [] ? system_call_fastpath+0x16/0x1b
Code: c7 00 a6 9a 81 e8 23 04 08 00 48 89 df e8 68 29 00 00 f6 43 78
08 75 24 48 8b 53 10 48 85 d2 75 04 0f 0b eb fe 48 8b 43 08 a8 01 <48>
89 02 75 04 48 89 50 08 48 c7 43 10 00 02 20 00 65 8b 14 25
RIP  [] destroy_conntrack+0x82/0x11f
  RSP
CR2: 00200200
   


Looks unrelated to kvm - softirq happened to trigger during a kvm 
ioctl.  Fault looks like list poison.  Copying netdev.
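
The "list poison" reading can be checked from two values in the oops: the error code ("Oops: 0002") and the faulting address (CR2). The sketch below decodes the x86 page-fault error code bits and compares the address against the list poison constants. The sketch_* names and struct fault_summary are hypothetical, and the poison values are the ones used by kernels of this era (0x00100100 and 0x00200200); treat them as an assumption for other kernel versions or configurations.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 page-fault error code bits (defined by the hardware). */
#define SKETCH_PF_PROT  (1UL << 0) /* 0: page not present, 1: protection fault */
#define SKETCH_PF_WRITE (1UL << 1) /* 0: read access,      1: write access */
#define SKETCH_PF_USER  (1UL << 2) /* 0: kernel mode,      1: user mode */

/* List poison values; assumed to match the 2.6.31 kernel in this report. */
#define SKETCH_LIST_POISON1_ADDR 0x00100100UL
#define SKETCH_LIST_POISON2_ADDR 0x00200200UL

struct fault_summary {
    bool not_present; /* fault hit an unmapped page */
    bool write;       /* faulting access was a store */
    bool kernel_mode; /* fault happened in kernel mode */
    bool list_poison; /* address equals a list poison value */
};

static struct fault_summary sketch_summarize_fault(unsigned long error_code,
                                                   unsigned long cr2)
{
    struct fault_summary s;

    s.not_present = (error_code & SKETCH_PF_PROT) == 0;
    s.write       = (error_code & SKETCH_PF_WRITE) != 0;
    s.kernel_mode = (error_code & SKETCH_PF_USER) == 0;
    s.list_poison = cr2 == SKETCH_LIST_POISON1_ADDR ||
                    cr2 == SKETCH_LIST_POISON2_ADDR;
    return s;
}
```

With the values from this report, sketch_summarize_fault(0x0002, 0x00200200) describes a kernel-mode write to an unmapped page at a list poison address, which matches the "PMD 0" line and the store flagged in the Code: bytes.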


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Networking-related crash?

2009-12-09 Thread Adam Huffman
I've been seeing lots of crashes on a new Dell Precision T7500,
running the KVM in Fedora 12.  Finally managed to capture an Oops,
which is shown below (hand-transcribed):

BUG: unable to handle kernel paging request at 00200200
IP: [] destroy_conntrack+0x82/0x11f
PGD 332d0e067 PUD 33453c067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
CPU 4
Modules linked in: tun bridge stp llc sunrpc ipt_MASQUERADE
iptable_nat nf_nat ipt_LOG xt_physdev ip6t_REJECT nf_conntrack_ipv6
ip6table_filter ip6_tables ipv6 dm_multipath kvm_intel kvm uinput
snd_hda_codec_analog nouveau snd_hda_intel snd_hda_codec ttm
drm_kms_helper snd_hwdep snd_seq drm snd_seq_device snd_pcm
firewire_ohci i2c_i801 snd_timer ppdev firewire_core snd i2c_algo_bit
iTCO_wdt crc_itu_t parport_pc i2c_core soundcore parport
iTCO_vendor_support tg3 snd_page_alloc shpchp dcdbas wmi mptsas
mptscsih mptbase scsi_transport_sas megaraid_sas [last_unloaded:
speedstep_lib]
Pid: 1759, comm: qemu-kvm Not tainted 2.6.31.6-162.fc12.x86_64 #1
Precision WorkStation T7500
RIP: 0010:[]  []
destroy_conntrack+0x82/0x11f
RSP: 0018:c9803bf0  EFLAGS: 00010202
RAX: 8001 RBX: 816fb1a0 RCX: 752f
RDX: 00200200 RSI: 0011 RDI: 816fb1a0
RBP: c9803c00 R08: 880336699438 R09: 00aaa5e0
R10: 0002f54189d5 R11: 0001 R12: 819a92e0
R13: a029adcc R14:  R15: 880632866c38
FS:  7fdd34b17710() GS:c980() knlGS:
CS:  0010 DS: 002B ES: 002B CR0: 80050033
CR2: 00200200 CR3: 0003349c CR4: 26e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process qemu-kvm (pid: 1759, threadinfo 88062e9e8000, task 880634945e00)
Stack:
 880632866c00 880634640c30 c9803c10 813989c2
<0> c9803c30 81374092 c9803c30 880632866c00
<0> c9803c50 81373dd3 0002 880632866c00
Call Trace:
 
 [] nf_conntrack_destroy+0x1b/0x1d
 [] skb_release_head_state+0x95/0xd7
 [] __kfree_skb+0x16/0x81
 [] kfree_skb+0x6a/0x72
 [] ip6_mc_input+0x220/0x230 [ipv6]
 [] ip6_rcv_finish+0x27/0x2b [ipv6]
 [] ipv6_rcv+0x38e/0x3e5 [ipv6]
 [] netif_receive_skb+0x402/0x427
 [] napi_skb_finish+0x29/0x3d
 [] napi_gro_receive+0x2f/0x34
 [] tg3_poll+0x6c6/0x8c3 [tg3]
 [] net_rx_action+0xaf/0x1c9
 [] ? list_add_tail+0x15/0x17
 [] __do_softirq+0xdd/0x1ad
 [] ? apic_write+0x16/0x18
 [] call_softirq+0x1c/0x30
 [] do_softirq+0x47/0x8d
 [] irq_exit+0x44/0x86
 [] do_IRQ+0xa5/0xbc
 [] ret_from_intr+0x0/0x11
 
 [] ? kvm_arch_vcpu_ioctl_run+0x84b/0xb34 [kvm]
 [] ? kvm_arch_vcpu_ioctl_run+0x83a/0xb34 [kvm]
 [] ? kvm_vcpu_ioctl+0xfd/0x556 [kvm]
 [] ? vfs_ioctl+0x22/0x87
 [] ? do_vfs_ioctl+0x47b/0x4c1
 [] ? sys_ioctl+0x56/0x79
 [] ? stub_clone+0x13/0x20
 [] ? system_call_fastpath+0x16/0x1b
Code: c7 00 a6 9a 81 e8 23 04 08 00 48 89 df e8 68 29 00 00 f6 43 78
08 75 24 48 8b 53 10 48 85 d2 75 04 0f 0b eb fe 48 8b 43 08 a8 01 <48>
89 02 75 04 48 89 50 08 48 c7 43 10 00 02 20 00 65 8b 14 25
RIP  [] destroy_conntrack+0x82/0x11f
 RSP 
CR2: 00200200