Hello, I'm currently hitting a null pointer dereference and kernel panic that seems to be in Open vSwitch (OVS). The problem is sporadic: I have one production machine that has hit it four times in the past 24 hours, and one lab machine that I can't get to hit it at all.
We rebuilt openvswitch with debugging symbols turned on, and traced the null pointer dereference to datapath/linux/flow.c:814. Do you have any advice on how to trace this back to a root cause (or, ideally, a fix)? I've scoured Google for related issues but come up short. (I'll happily accept that my google-fu is lacking, though.) I would greatly appreciate any guidance you could offer.

Here's some more information about my system, for context. All nodes have the following versions:

root@node:~# uname -a
Linux node 3.2.0-58-generic #88-Ubuntu SMP Tue Dec 3 17:37:58 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
root@node:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.4 LTS
Release: 12.04
Codename: precise
root@node:~# dpkg --list | grep openvswitch
ii openvswitch-common 1.10.2-0ubuntu2~cloud0 Open vSwitch common components
ii openvswitch-datapath-dkms 1.10.2-0ubuntu2~cloud0 Open vSwitch datapath module source - DKMS version
ii openvswitch-switch 1.10.2-0ubuntu2~cloud0 Open vSwitch switch implementations
root@node:~#

The stack trace from the console of a panic'd machine:

[259616.202845] Pid: 28568, comm: vhost-28567 Tainted: G WC O 3.2.0-58-generic #88-Ubuntu /0PXXHP
[259616.213437] RIP: 0010:[<ffffffffa024ecb2>] [<ffffffffa024ecb2>] ovs_flow_tbl_lookup+0xb2/0x100 [openvswitch]
[259616.224611] RSP: 0018:ffff88180f243cb8 EFLAGS: 00010282
[259616.230630] RAX: 0000000000000020 RBX: ffff880c72c407c0 RCX: ffffffffffffffe0
[259616.238678] RDX: ffff88010a0ba678 RSI: 0000000000000004 RDI: ffff8807aa5ac000
[259616.246728] RBP: ffff88180f243cf8 R08: 000000000000002c R09: 000000003fc3955c
[259616.254776] R10: 0000000000000001 R11: 0000000000000001 R12: 000000000c2aa5f8
[259616.262825] R13: 0000000000000018 R14: ffff88180f243d48 R15: 000000000000002c
[259616.270877] FS: 0000000000000000(0000) GS:ffff88180f240000(0000) knlGS:0000000000000000
[259616.279990] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[259616.286490] CR2: 0000000000000010 CR3: 0000000924208000 CR4: 00000000000426e0
[259616.294539] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[259616.302588] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[259616.310640] Process vhost-28567 (pid: 28568, threadinfo ffff8805e20e6000, task ffff880bed3a0000)
[259616.320527] Stack:
[259616.322866] ffff881206128c00 ffff880d00000044 ffff88180f243cf8 ffff880d2f61f100
[259616.331256] ffffe8ffffa42188 ffff8817f0faf2c0 ffff881206128c00 ffff88180f254454
[259616.339649] ffff88180f243dd8 ffffffffa024cf15 ffffffff8108f501 ffff88180f243d30
[259616.348041] Call Trace:
[259616.350863] <IRQ>
[259616.353321] [<ffffffffa024cf15>] ovs_dp_process_received_packet+0xc5/0x140 [openvswitch]
[259616.362543] [<ffffffff8108f501>] ? hrtimer_forward+0x51/0xd0
[259616.369057] [<ffffffffa025117c>] ovs_vport_receive+0x4c/0x50 [openvswitch]
[259616.376926] [<ffffffffa0252203>] netdev_frame_hook+0xa3/0xf0 [openvswitch]
[259616.384795] [<ffffffffa0252160>] ? netdev_create+0x110/0x110 [openvswitch]
[259616.392660] [<ffffffff81546c60>] __netif_receive_skb+0x1d0/0x560
[259616.399558] [<ffffffff81547411>] process_backlog+0xb1/0x190
[259616.405973] [<ffffffff81548734>] net_rx_action+0x134/0x290
[259616.412288] [<ffffffff8106fa08>] __do_softirq+0xa8/0x210
[259616.418413] [<ffffffff8166c62c>] call_softirq+0x1c/0x30
[259616.424429] <EOI>
[259616.426883] [<ffffffff810162f5>] do_softirq+0x65/0xa0
[259616.432714] [<ffffffff81548c08>] netif_rx_ni+0x28/0x30
[259616.438643] [<ffffffff8147d89b>] tun_get_user+0x2fb/0x4a0
[259616.444863] [<ffffffff8147da65>] tun_sendmsg+0x25/0x30
[259616.450790] [<ffffffffa040f9d6>] handle_tx+0x296/0x520 [vhost_net]
[259616.457880] [<ffffffffa040fc95>] handle_tx_kick+0x15/0x20 [vhost_net]
[259616.465260] [<ffffffffa040ce4d>] vhost_worker+0xdd/0x170 [vhost_net]
[259616.472543] [<ffffffffa040cd70>] ? vhost_set_memory+0x130/0x130 [vhost_net]
[259616.480506] [<ffffffff8108b63c>] kthread+0x8c/0xa0
[259616.486048] [<ffffffff8166c534>] kernel_thread_helper+0x4/0x10
[259616.492752] [<ffffffff8108b5b0>] ? flush_kthread_worker+0xa0/0xa0
[259616.499747] [<ffffffff8166c530>] ? gs_change+0x13/0x13
[259616.505665] Code: 00 48 63 53 20 48 8d 42 01 48 c1 e0 04 48 01 c1 48 8b 01 48 85 c0 74 51 48 8b 09 48 c1 e2 04 48 83 c2 10 48 29 d1 48 85 c9 74 26 <44> 39 61 30 75 d0 4a 8d 7c 29 38 4c 89 fa 4c 89 f6 48 89 4d c8
[259616.527456] RIP [<ffffffffa024ecb2>] ovs_flow_tbl_lookup+0xb2/0x100 [openvswitch]
[259616.536016] RSP <ffff88180f243cb8>
[259616.540000] CR2: 0000000000000010
[259616.544395] ---[ end trace 7cd7ddd24540f1d3 ]---
[259616.549662] Kernel panic - not syncing: Fatal exception in interrupt
[259616.556849] Pid: 28568, comm: vhost-28567 Tainted: G D WC O 3.2.0-58-generic #88-Ubuntu
[259616.566357] Call Trace:
[259616.569183] <IRQ> [<ffffffff81649285>] panic+0x91/0x1a4
[259616.575345] [<ffffffff81662f5a>] oops_end+0xea/0xf0
[259616.580994] [<ffffffff8164812f>] no_context+0x150/0x15d
[259616.587028] [<ffffffff81648307>] __bad_area_nosemaphore+0x1cb/0x1ea
[259616.594227] [<ffffffff811645eb>] ? kfree+0x3b/0x140
[259616.599873] [<ffffffff810e234e>] ? rcu_irq_exit+0xe/0x10
[259616.606009] [<ffffffff81648339>] bad_area_nosemaphore+0x13/0x15
[259616.612820] [<ffffffff81665bab>] do_page_fault+0x46b/0x540
[259616.619143] [<ffffffff8153a455>] ? kfree_skb+0x45/0xc0
[259616.625082] [<ffffffff81571479>] ? netlink_attachskb+0x1d9/0x220
[259616.631989] [<ffffffff810608e0>] ? try_to_wake_up+0x200/0x200
[259616.638608] [<ffffffff816624f5>] page_fault+0x25/0x30
[259616.644456] [<ffffffffa024ecb2>] ? ovs_flow_tbl_lookup+0xb2/0x100 [openvswitch]
[259616.652817] [<ffffffffa024ec5a>] ? ovs_flow_tbl_lookup+0x5a/0x100 [openvswitch]
[259616.661180] [<ffffffffa024cf15>] ovs_dp_process_received_packet+0xc5/0x140 [openvswitch]
[259616.670410] [<ffffffff8108f501>] ? hrtimer_forward+0x51/0xd0
[259616.676936] [<ffffffffa025117c>] ovs_vport_receive+0x4c/0x50 [openvswitch]
[259616.684813] [<ffffffffa0252203>] netdev_frame_hook+0xa3/0xf0 [openvswitch]
[259616.692696] [<ffffffffa0252160>] ? netdev_create+0x110/0x110 [openvswitch]
[259616.700575] [<ffffffff81546c60>] __netif_receive_skb+0x1d0/0x560
[259616.707482] [<ffffffff81547411>] process_backlog+0xb1/0x190
[259616.713915] [<ffffffff81548734>] net_rx_action+0x134/0x290
[259616.720242] [<ffffffff8106fa08>] __do_softirq+0xa8/0x210
[259616.726384] [<ffffffff8166c62c>] call_softirq+0x1c/0x30
[259616.732410] <EOI> [<ffffffff810162f5>] do_softirq+0x65/0xa0

--
/thor
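
For anyone following along: below is a minimal sketch, not part of our actual debugging, of how the faulting symbol, offset, and CR2 value can be pulled out of an oops dump like the one above. The helper names (parse_oops, gdb_list_command) and the SAMPLE excerpt are made up for illustration, and the regular expressions only assume the x86_64 line layout that appears in this trace.

```python
import re

# Symbolized RIP line, e.g.
#   RIP: 0010:[<addr>] [<addr>] ovs_flow_tbl_lookup+0xb2/0x100 [openvswitch]
# Assumption: the layout printed by this 3.2 kernel; other kernels may differ.
RIP_RE = re.compile(
    r"RIP:?\s.*?\[<(?P<addr>[0-9a-f]{16})>\]\s+"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)
# CR2 holds the address whose access caused the page fault.
CR2_RE = re.compile(r"CR2:\s*(?P<cr2>[0-9a-f]{16})")

# Three lines copied from the trace above (hypothetical test input).
SAMPLE = """\
[259616.213437] RIP: 0010:[<ffffffffa024ecb2>] [<ffffffffa024ecb2>] ovs_flow_tbl_lookup+0xb2/0x100 [openvswitch]
[259616.224611] RSP: 0018:ffff88180f243cb8 EFLAGS: 00010282
[259616.540000] CR2: 0000000000000010
"""


def parse_oops(text):
    """Return symbol, offset, size, module and CR2 from an oops dump, or None."""
    rip = RIP_RE.search(text)
    if rip is None:
        return None
    cr2 = CR2_RE.search(text)
    return {
        "symbol": rip.group("sym"),
        "offset": int(rip.group("off"), 16),
        "size": int(rip.group("size"), 16),
        "module": rip.group("module"),  # None for built-in kernel symbols
        "cr2": int(cr2.group("cr2"), 16) if cr2 else None,
    }


def gdb_list_command(info):
    """Build a gdb 'list' command for the faulting symbol+offset."""
    return "list *({}+{:#x})".format(info["symbol"], info["offset"])


info = parse_oops(SAMPLE)
print(info["module"], gdb_list_command(info), hex(info["cr2"]))
```

The command it builds is meant to be run in gdb against a copy of openvswitch.ko that has debugging symbols; that is one common way to map an offset like this back to a source line such as flow.c:814. A CR2 this small is consistent with a field access through a NULL (or near-NULL) pointer rather than a wild pointer.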
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev