Package: linux-image-3.16.0-4-amd64
Version: 3.16.7-ckt11-1+deb8u6
Severity: important

Hi!

Inside a Xen domU, with the combination of

 * latest kernel of jessie (3.16.7-ckt11-1+deb8u6) or
   related kernel from wheezy-backports (3.16.7-ckt11-1+deb8u6~bpo70+1) and
 * 2 network interfaces and
 * 24 VCPUs

I see the error "unable to handle kernel NULL pointer dereference" during
start-up:

...
[ 0.755434] xen_netfront: can't alloc rx grant refs
[ 0.758359] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 0.761622] IP: [<ffffffffa018bc09>] netback_changed+0x989/0xf00 [xen_netfront]
[ 0.761622] PGD 0
[ 0.761622] Oops: 0000 [#1] SMP
[ 0.761622] Modules linked in: ata_piix xen_blkfront(+) xen_netfront(+) libata crc32c_intel floppy scsi_mod
[ 0.761622] CPU: 1 PID: 129 Comm: xenwatch Not tainted 3.16.0-0.bpo.4-amd64 #1 Debian 3.16.7-ckt11-1+deb8u6~bpo70+1
[ 0.761622] Hardware name: Xen HVM domU, BIOS 4.4.1 10/26/2015
[ 0.761622] task: ffff88003bbd53f0 ti: ffff88003bbd8000 task.ti: ffff88003bbd8000
[ 0.761622] RIP: 0010:[<ffffffffa018bc09>] [<ffffffffa018bc09>] netback_changed+0x989/0xf00 [xen_netfront]
[ 0.761622] RSP: 0018:ffff88003bbdbde8 EFLAGS: 00010202
[ 0.761622] RAX: 0000000000000000 RBX: ffff880032398d00 RCX: 0000000000000001
[ 0.761622] RDX: 00000000000322a7 RSI: ffff880032398d98 RDI: 0000000000005729
[ 0.761622] RBP: 0000000000098d00 R08: 0000000000000001 R09: ffffffff8172b600
[ 0.761622] R10: ffffea0000af94c0 R11: ffffea0000af9b38 R12: ffff880036a61000
[ 0.761622] R13: ffff8800322a6000 R14: ffff880036a618c0 R15: ffff8800322a7000
[ 0.761622] FS: 0000000000000000(0000) GS:ffff88003ce20000(0000) knlGS:0000000000000000
[ 0.761622] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.761622] CR2: 0000000000000018 CR3: 0000000001811000 CR4: 00000000001406e0
[ 0.761622] Stack:
[ 0.761622] ffff88003b5e0c20 ffff880032391381 ffff8800323912c4 ffff880000000018
[ 0.761622] ffff88003b5e0c00 0000001400000001 ffff880032398d98 ffff88003b5e0c00
[ 0.761622] 0000000000000000 ffff8800328798f1 0000000800000001 0000003800000001
[ 0.761622] Call Trace:
[ 0.761622] [<ffffffff81381d50>] ? xenbus_thread+0x2a0/0x2a0
[ 0.761622] [<ffffffff81381dea>] ? xenwatch_thread+0x9a/0x140
[ 0.761622] [<ffffffff810b13b0>] ? __wake_up_sync+0x20/0x20
[ 0.761622] [<ffffffff81090741>] ? kthread+0xc1/0xe0
[ 0.761622] [<ffffffff81090680>] ? flush_kthread_worker+0xb0/0xb0
[ 0.761622] [<ffffffff8154be58>] ? ret_from_fork+0x58/0x90
[ 0.761622] [<ffffffff81090680>] ? flush_kthread_worker+0xb0/0xb0
[ 0.761622] Code: 63 38 fe e9 5c fb ff ff 48 8b 7c 24 20 48 c7 c2 cb d2 18 a0 be f4 ff ff ff 31 c0 e8 72 4a 1f e1 eb a2 48 8b 43 20 48 8b 74 24 30 <48> 8b 78 18 e8 8e 4b 1f e1 85 c0 0f 88 d5 fd ff ff 48 8b 43 20
[ 0.761622] RIP [<ffffffffa018bc09>] netback_changed+0x989/0xf00 [xen_netfront]
[ 0.761622] RSP <ffff88003bbdbde8>
[ 0.761622] CR2: 0000000000000018
[ 0.761622] ---[ end trace 6123087ce2740115 ]---
...

and the second network interface ends up unusable.

It turns out that what's happening is this:

 * By default, the hypervisor allocates 32 grant table entries, and
   a network interface can need more than 32.

 * Function talk_to_netback (drivers/net/xen-netfront.c) calls
   function xennet_create_queues (drivers/net/xen-netfront.c)
   to create num_queues many queues.

 * xennet_create_queues goes on as long as it can and stores the
   number of queues actually created in
   info->netdev->real_num_tx_queues.

 * Function talk_to_netback then continues with the (wrong)
   assumption that num_queues queues are in place, while there
   may be fewer than that.

So syncing num_queues with info->netdev->real_num_tx_queues after the
call fixes the problem.

Viktor Dukhovni already published a patch on 2015-09-09 at

  http://lists.xenproject.org/archives/html/xen-users/2015-09/txtbaRgWqxpT4.txt

His patch also fixes the "only created %d queues" message: unpatched,
it mistakenly prints the wanted number of queues rather than the number
of queues created.

I'm hoping for an updated kernel package including Viktor's patch soon.

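To illustrate the failure mode described above, here is a small
user-space model in C. It is not the driver code and not Viktor's
patch: all names (model_dev, model_create_queues, model_count_unset,
model_destroy) and the budget numbers are made up for the example. It
only shows why iterating over the wanted queue count instead of the
created queue count runs into queues that were never set up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one netfront queue. */
struct model_queue {
    int *rx_ring;                 /* NULL until the queue has been set up */
};

/* Hypothetical stand-in for the per-device state. */
struct model_dev {
    struct model_queue *queues;   /* array sized for the wanted count */
    unsigned int capacity;        /* number of slots in 'queues' */
    unsigned int real_num_queues; /* how many queues were really created */
};

/*
 * Create up to 'wanted' queues, stopping early once the simulated grant
 * budget is used up. Returns 0 if at least one queue exists, else -1.
 * The caller must check real_num_queues, not assume 'wanted'.
 */
int model_create_queues(struct model_dev *dev, unsigned int wanted,
                        unsigned int budget, unsigned int cost_per_queue)
{
    unsigned int i;

    assert(wanted > 0 && cost_per_queue > 0);
    dev->real_num_queues = 0;
    dev->capacity = 0;
    dev->queues = calloc(wanted, sizeof(*dev->queues));
    if (dev->queues == NULL)
        return -1;
    dev->capacity = wanted;

    for (i = 0; i < wanted; i++) {
        if (budget < cost_per_queue)
            break;                /* out of (simulated) grant references */
        dev->queues[i].rx_ring = malloc(sizeof(int));
        if (dev->queues[i].rx_ring == NULL)
            break;
        budget -= cost_per_queue;
        dev->real_num_queues++;
    }
    return dev->real_num_queues > 0 ? 0 : -1;
}

/*
 * Count how many of the first 'n' queues were never set up. Using any
 * of those would be the NULL pointer dereference from the report.
 */
unsigned int model_count_unset(const struct model_dev *dev, unsigned int n)
{
    unsigned int i, unset = 0;

    assert(n <= dev->capacity);
    for (i = 0; i < n; i++)
        if (dev->queues[i].rx_ring == NULL)
            unset++;
    return unset;
}

/* Release everything allocated by model_create_queues. */
void model_destroy(struct model_dev *dev)
{
    unsigned int i;

    for (i = 0; i < dev->capacity; i++)
        free(dev->queues[i].rx_ring);
    free(dev->queues);
    dev->queues = NULL;
    dev->capacity = 0;
    dev->real_num_queues = 0;
}
```

With 24 wanted queues, a budget of 32 and a cost of 2 per queue (all
three numbers are arbitrary), only 16 queues get created. Walking all
24 slots finds 8 unset queues, while walking real_num_queues slots
finds none, which is the effect of syncing the count after creation.
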
As a workaround, one can use something like

  gnttab_max_nr_frames=256

to increase the size of the grant table (with GRUB_CMDLINE_XEN_DEFAULT
in /etc/default/grub). Again, it's no more than a workaround, and it
requires rebooting the hypervisor (which upgrading the domU to a fixed
kernel does not).

Many thanks in advance,

Sebastian

-- System Information:
Debian Release: 7.9
  APT prefers oldstable-updates
  APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.2.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

