Public bug reported: ---Problem Description--- I can not get hotplug attach to work in Ubuntu but if I try to detach a CX4 VF from a guest I am getting some issues: Like in this case: [ 474.393308] vfio-pci 0001:01:00.3: No device request channel registered, blocked until released by user [ 474.393543] pci 0001:01: 0.3: [PE# 006] Removing DMA window #0 [ 474.393553] pci 0001:01: 0.3: [PE# 006] Removing DMA window #1 [ 474.393906] mlx5_core 0001:01:00.3: enabling device (0000 -> 0002) [ 474.393939] mlx5_core 0001:01:00.3: Using 32-bit DMA via iommu [ 474.400360] pci 0001:01: 0.3: [PE# 006] Setting up window#0 0..7fffffff pg=1000 [ 474.400380] mlx5_core 0001:01:00.3: firmware version: 12.17.226 [ 474.401341] pci 0001:01: 0.3: [PE# 006] Enabling 64-bit DMA bypass [ 474.402284] EEH: Frozen PE#6 on PHB#1 detected [ 474.402475] EEH: PE location: Slot4, PHB location: N/A [ 474.403699] EEH: This PCI device has failed 1 times in the last hour [ 474.403700] EEH: Notify device drivers to shutdown [ 474.403707] mlx5_core 0001:01:00.3: mlx5_pci_err_detected was called [ 474.403711] mlx5_core 0001:01:00.3: 0001:01:00.3:mlx5_enter_error_state:115:(pid 779): start [ 474.403870] mlx5_core 0001:01:00.3: 0001:01:00.3:mlx5_enter_error_state:120:(pid 779): end
One time I saw SSep 13 09:41:32 ltc-fire1 kernel: [70437.943722] vfio-pci 0001:01:00.3: No device request channel registered, blocked until released by user Sep 13 09:41:32 ltc-fire1 kernel: [70437.944076] mlx5_core 0001:01:00.3: enabling device (0000 -> 0002) Sep 13 09:41:32 ltc-fire1 kernel: [70437.944110] mlx5_core 0001:01:00.3: Using 32-bit DMA via iommu Sep 13 09:41:32 ltc-fire1 kernel: [70437.944145] pci 0001:01: 0.3: [PE# 006] Removing DMA window #0 Sep 13 09:41:32 ltc-fire1 kernel: [70437.944152] pci 0001:01: 0.3: [PE# 006] Removing DMA window #1 Sep 13 09:41:32 ltc-fire1 kernel: [70437.944195] mlx5_core 0001:01:00.3: firmware version: 12.17.226 Sep 13 09:41:32 ltc-fire1 kernel: [70437.944260] Unable to handle kernel paging request for data at address 0x00000000 Sep 13 09:41:32 ltc-fire1 kernel: [70437.944533] Faulting instruction address: 0xc0000000005b37e0 Sep 13 09:41:32 ltc-fire1 kernel: [70437.944592] Oops: Kernel access of bad area, sig: 11 [#1] Sep 13 09:41:32 ltc-fire1 kernel: [70437.944636] SMP NR_CPUS=2048 NUMA PowerNV Sep 13 09:41:32 ltc-fire1 kernel: [70437.944851] Modules linked in: vfio_pci irqbypass vfio_iommu_spapr_tce vfio_virqfd vfio vfio_spapr_eeh xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp kvm_hv kvm_pr kvm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) configfs ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlx4_ib(OE) ib_sa(OE) ib_mad(OE) ib_core(OE) mlx4_en(OE) ib_addr(OE) ib_netlink(OE) mlx4_core(OE) mlx_compat(OE) bridge stp llc joydev input_leds mac_hid ofpart at24 cmdlinepart powernv_flash ipmi_powernv nvmem_core uio_pdrv_genirq opal_prd mtd ipmi_msghandler uio ibmpowernv powernv_rng binfmt_misc dm_multipath knem(OE) ip_tables x_tables autofs4 hid_generic usbhid hid uas usb_storage ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysim gblt fb_sys_fops drm ahci devlink libahci [last unloaded: mlx4_core] Sep 13 09:41:32 ltc-fire1 kernel: [70437.946007] CPU: 40 PID: 12501 Comm: libvirtd Tainted: G OE 4.7.0unofficial #5 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946074] task: c000000ec319a200 ti: c000000ec324c000 task.ti: c000000ec324c000 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946140] NIP: c0000000005b37e0 LR: c0000000005ad070 CTR: 0000000000000000 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946208] REGS: c000000ec324f100 TRAP: 0300 Tainted: G OE (4.7.0unofficial) Sep 13 09:41:32 ltc-fire1 kernel: [70437.946286] MSR: 900000010280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 84028844 XER: 20000000 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] CFAR: c000000000008468 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 0 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR00: c0000000005d19c8 c000000ec324f380 c0000000013bef00 0000000000000000 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR04: ffffffffffffffff 0000000000000000 ffffffffffffffff 000000000000ffff Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR08: 0000000000000000 0000000000000000 ffffffffffffffff f000000003803140 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR12: 0000000024048840 c00000000fb96800 c000000ec0def080 c000000ec0def000 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR16: 00003fff93ef0000 0000000000000001 0000000000000000 c000000ec0def100 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR20: c000000ec0def118 0000000000000000 0000000000010000 0000000000000000 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR24: 0000000000010000 ffffffffffff0000 0000000000000000 000000000000ffff Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR28: 0000000000000000 ffffffffffffffff 0000000000010000 0000000000000000 Sep 13 09:41:32 ltc-fire1 kernel: [70437.947437] NIP [c0000000005b37e0] _find_next_bit.part.0+0x10/0x90 Sep 13 09:41:32 ltc-fire1 kernel: [70437.947496] LR [c0000000005ad070] bitmap_find_next_zero_area_off+0x70/0x110 Sep 13 09:41:32 ltc-fire1 kernel: [70437.947553] Call Trace: Sep 13 09:41:32 ltc-fire1 kernel: [70437.947578] [c000000ec324f380] [c000000ec324f3c0] 0xc000000ec324f3c0 (unreliable) Sep 13 09:41:32 ltc-fire1 kernel: [70437.947659] [c000000ec324f3e0] [c0000000005d19c8] iommu_area_alloc+0x78/0x130 Sep 13 09:41:32 ltc-fire1 kernel: [70437.947731] [c000000ec324f440] [c000000000041a0c] iommu_range_alloc+0x18c/0x410 Sep 13 09:41:32 ltc-fire1 kernel: [70437.947812] [c000000ec324f500] [c000000000041f24] iommu_alloc+0x74/0x190 Sep 13 09:41:32 ltc-fire1 kernel: [70437.947885] [c000000ec324f570] [c0000000000437b8] iommu_alloc_coherent+0x138/0x1e0 Sep 13 09:41:32 ltc-fire1 kernel: [70437.947968] [c000000ec324f620] [c000000000041538] dma_iommu_alloc_coherent+0x58/0x80 Sep 13 09:41:32 ltc-fire1 kernel: [70437.948057] [c000000ec324f670] [d000000012e75eb0] mlx5_cmd_init+0x118/0x940 [mlx5_core] Sep 13 09:41:32 ltc-fire1 kernel: [70437.948146] [c000000ec324f740] [d000000012e722f0] mlx5_load_one+0x198/0x16c0 [mlx5_core] Sep 13 09:41:32 ltc-fire1 kernel: [70437.948235] [c000000ec324f8b0] [d000000012e73e5c] init_one+0x3b4/0x5e0 [mlx5_core] Sep 13 09:41:32 ltc-fire1 kernel: [70437.948314] [c000000ec324f950] [c0000000006042dc] local_pci_probe+0x6c/0x140 Sep 13 09:41:32 ltc-fire1 kernel: [70437.948381] [c000000ec324f9e0] [c0000000006051b8] pci_device_probe+0x178/0x200 Sep 13 09:41:32 ltc-fire1 kernel: [70437.948462] [c000000ec324fa40] [c0000000006f6cb0] driver_probe_device+0x220/0x580 Sep 13 09:41:32 ltc-fire1 kernel: [70437.948542] [c000000ec324fad0] [c0000000006f348c] bus_for_each_drv+0x8c/0x100 Sep 13 09:41:32 ltc-fire1 kernel: [70437.948611] [c000000ec324fb20] [c0000000006f67b0] __device_attach+0x140/0x220 Sep 13 09:41:32 ltc-fire1 kernel: [70437.948679] [c000000ec324fbb0] [c0000000006f4510] bus_rescan_devices_helper+0x70/0xc0 I think the key is that I am seeing this trace with 32bit dma which is maybe causing issue Sep 13 09:41:32 ltc-fire1 kernel: [70437.943722] vfio-pci 0001:01:00.3: No device request channel registered, blocked until released by user Sep 13 09:41:32 ltc-fire1 kernel: [70437.944076] mlx5_core 0001:01:00.3: enabling device (0000 -> 0002) Sep 13 09:41:32 ltc-fire1 kernel: [70437.944110] mlx5_core 0001:01:00.3: Using 32-bit DMA via iommu Sep 13 09:41:32 ltc-fire1 kernel: [70437.944145] pci 0001:01: 0.3: [PE# 006] Removing DMA window #0 Sep 13 09:41:32 ltc-fire1 kernel: [70437.944152] pci 0001:01: 0.3: [PE# 006] Removing DMA window #1 Sep 13 09:41:32 ltc-fire1 kernel: [70437.944195] mlx5_core 0001:01:00.3: firmware version: 12.17.226 ---uname output--- Linux ltc-fire1 4.7.0unofficial #5 SMP Mon Sep 5 08:53:38 EDT 2016 ppc64le ppc64le ppc64le GNU/Linux Machine Type = P8 ---Debugger--- A debugger is not configured ---Steps to Reproduce--- Im using a kernel 4.7 plus the qemu is from Mike Roth qemu with DDW. To recreate just need to bring a guest with VF and then I did virsh detach-device ltc-fire1-vm2-ubuntu-16.10 ./add_cx4vf.xml --live This should be ship issue given that we now have whole sriov hotplug broken on Ubuntu 16.10 > Hi Carol, > > Just to confirm, have you hit this with the new QEMU package from Jon's PPA? > It should be based on QEMU 2.6.1, which should have some fixes related to EEH I have not tried. Will do during the day. As far as the unplug issues, I'm unable to reproduce any of the host errors noted in the initial comments using the latest 2.6.1-based package. I'm only able to reproduce the following guest errors in dmesg. Carol, can you confirm whether or not this is also the case for you? Here's what I'm seeing, but it's intermittent. Sometimes I see up to 7 successful unplugs before reproducing, but once I do reproduce from what I can tell the trace is always this: [ 63.314271] Unable to handle kernel paging request for data at address 0x00000000 [ 63.327986] Faulting instruction address: 0xd000000000ea5cbc [ 63.328388] Oops: Kernel access of bad area, sig: 11 [#1] [ 63.328741] SMP NR_CPUS=2048 NUMA pSeries [ 63.329085] Modules linked in: rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) configfs ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) ib_ipoib(OE) ib_cm(OE) mlx4_ib(OE) ib_sa(OE) ib_mad(OE) ib_core(OE) ib_addr(OE) ib_netlink(OE) mlx4_en(OE) mlx4_core(OE) sg ghash_generic gf128mul vmx_crypto ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache sr_mod cdrom virtio_net virtio_blk ibmvscsi scsi_transport_srp mlx5_core(OE) crc32c_vpmsum ptp pps_core mlx_compat(OE) virtio_pci virtio_ring virtio [ 63.332759] CPU: 14 PID: 5234 Comm: drmgr Tainted: G W OE 4.8.0-15-generic #16-Ubuntu [ 63.332911] task: c000000003bb9780 task.stack: c000000003cd8000 [ 63.333025] NIP: d000000000ea5cbc LR: d000000000ea5cb4 CTR: c0000000002ddc10 [ 63.333151] REGS: c000000003cdb6d0 TRAP: 0300 Tainted: G W OE (4.8.0-15-generic) [ 63.333305] MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22000228 XER: 20000000 [ 63.333716] CFAR: c000000000142f3c DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1 GPR00: d000000000ea5cb4 c000000003cdb950 d000000000f288e8 0000000000000000 GPR04: c0000000b3716b40 d000000000ea5b48 c0000000010f6a38 c0000000010d7b00 GPR08: c000000000fa7b00 0000000000000000 0000000000000000 d000000000eeff70 GPR12: c0000000002ddc10 c00000000fb87e00 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: c000000004ff5b58 c000000004ff5b20 0000000000000001 c000000004fc04e8 GPR28: c000000004fc0000 0000000000000000 c000000004fc0290 0000000000000000 [ 63.335484] NIP [d000000000ea5cbc] destroy_msg_cache+0x84/0x130 [mlx5_core] [ 63.335609] LR [d000000000ea5cb4] destroy_msg_cache+0x7c/0x130 [mlx5_core] [ 63.335713] Call Trace: [ 63.335805] [c000000003cdb950] [d000000000ea5cb4] destroy_msg_cache+0x7c/0x130 [mlx5_core] (unreliable) [ 63.336003] [c000000003cdb9a0] [d000000000ea66e8] mlx5_cmd_cleanup+0x40/0x80 [mlx5_core] [ 63.336206] [c000000003cdb9d0] [d000000000ea1a70] mlx5_unload_one+0x248/0x420 [mlx5_core] [ 63.336383] [c000000003cdba90] [d000000000ea2028] remove_one+0xb0/0x170 [mlx5_core] [ 63.336539] [c000000003cdbb20] [c0000000005660e0] pci_device_remove+0x70/0x110 [ 63.336688] [c000000003cdbb60] [c000000000610e80] __device_release_driver+0xc0/0x1a0 [ 63.336828] [c000000003cdbb90] [c000000000610fa0] device_release_driver+0x40/0x70 [ 63.337015] [c000000003cdbbc0] [c00000000055a730] pci_stop_bus_device+0xe0/0x100 [ 63.337156] [c000000003cdbc00] [c00000000055a914] pci_stop_and_remove_bus_device_locked+0x34/0x60 [ 63.337321] [c000000003cdbc30] [c0000000005692b0] remove_store+0xb0/0xd0 [ 63.337458] [c000000003cdbc80] [c000000000608ef8] dev_attr_store+0x68/0xa0 [ 63.337571] [c000000003cdbcc0] [c0000000003cb3f0] sysfs_kf_write+0x80/0xb0 [ 63.337692] [c000000003cdbd00] [c0000000003ca25c] kernfs_fop_write+0x17c/0x250 [ 63.337825] [c000000003cdbd50] [c00000000030fc3c] __vfs_write+0x6c/0xe0 [ 63.337950] [c000000003cdbd90] [c000000000311144] vfs_write+0xd4/0x240 [ 63.338094] [c000000003cdbde0] [c000000000312e5c] SyS_write+0x6c/0x110 [ 63.338233] [c000000003cdbe30] [c0000000000095e0] system_call+0x38/0x108 [ 63.338364] Instruction dump: [ 63.338429] 60420000 480498b9 e8410018 813e0060 7fa4eb78 7f83e378 7ffdfb78 3929ffff [ 63.338697] 913e0060 4bfffe21 7fbff040 7fe3fb78 <ebff0000> 409effd0 3bde0078 7fbed840 [ 63.338925] ---[ end trace a4fbfe789eac3ffc ]--- [ 63.344490] I'm using QEMU directly, and see the above after the device_del. I see no other errors in guest/host: root@ltc-fire1:~/mdroth# ./vfio-bind 0001:01:00.2 unbinding 0001:01:00.2 via /sys/bus/pci/devices/0001:01:00.2/driver/unbind binding 0001:01:00.2 echo 0x15b3 0x1014 >/sys/bus/pci/drivers/vfio-pci/new_id root@ltc-fire1:~/mdroth# ./vfio-bind 0001:01:00.2 qemu-system-ppc64 --smp sockets=1,cores=2,threads=8 --enable-kvm -M pseries -m 3G,slots=32,maxmem=8G -device virtio-blk- pci,id=disk1,drive=drive1 -drive file=ubuntu16.10.base.img.4,aio=threads,cache=none,if=none,id=drive1 -device spapr-vscsi,id=scsi0,reg=0x2000 -monitor unix:/tmp/mdroth- vm4-hmp.sock,server,nowait -qmp unix:/tmp/mdroth- vm4-qmp.sock,server,nowait -device virtio-net- pci,mac=52:54:00:12:34:01,netdev=net1,id=nic1 -netdev user,hostfwd=tcp:0.0.0.0:2224-:22,id=net1 -device vfio- pci,host=0001:01:00.2,id=vfio0 -L /root/mdroth-qemu/pc-bios -boot c -vga none -nographic root@ltc-fire1:~# rlwrap socat stdio unix-connect:/tmp/mdroth-vm4-hmp.sock QEMU 2.6.1 monitor - type 'help' for more information (qemu) device_del vfio0 I'm planning to do more debugging of the guest driver's mlx5_cmd_cleanup -> destroy_msg_cache cleanup path which seems to be triggering the error, but want to confirm I'm looking at the same issue that Carol is currently tracking. > Sometimes I see that one, but if I do not see that one then I see issues at > host which could be the stack trace that I put or an EEH. But the problems > at host the EEH or the stack trace I always see that the enable device is > happening before the 64bit dma is set back again to the card. You mean the enable device for the card in the guest after hotplug? or is this in the unplug path? Here's what I see when doing unplug: <bind 0001:01: 0.2 to vfio-pci> [ 8056.733150] pci 0001:01: 0.2: [PE# 005] Disabling 64-bit DMA bypass [ 8056.733157] pci 0001:01: 0.2: [PE# 005] Removing DMA window #0 [ 8056.733927] pci 0001:01: 0.2: [PE# 005] Setting up window#0 0..7fffffff pg=1000 [ 8057.039487] pci 0001:01: 0.2: [PE# 005] Removing DMA window #0 [ 8057.073152] vfio-pci 0001:01:00.2: enabling device (0000 -> 0002) <start guest> [ 7805.107856] KVM guest htab at c000000f2b000000 (order 26), LPID 1 <guest immediately sets up default 32-bit window> [ 7805.216462] pci 0001:01: 0.2: [PE# 005] Setting up window#0 0..3fffffff pg=1000 <guest kernel boots, enables 64-bit window> [ 7815.386476] pci 0001:01: 0.2: [PE# 005] Setting up window#1 800000000000000..8000001ffffffff pg=10000 <unplug 0001:01: 0.2 (if guest kernel doesn't crash, if it crashes we never get past here and hang in guest> [ 7906.714683] pci 0001:01: 0.2: [PE# 005] Removing DMA window #0 [ 7906.714690] pci 0001:01: 0.2: [PE# 005] Removing DMA window #1 <host re-enables 32-bit window and re-enables 64-bit DMA bypass> [ 7906.714722] window_size 0x80000000 [ 7906.715205] Calling iommu_init_table [ 7906.721714] Calling pnv_pci_ioda2_set_window [ 7906.721851] pci 0001:01: 0.2: [PE# 005] Setting up window#0 0..7fffffff pg=1000 [ 7906.722334] pci 0001:01: 0.2: [PE# 005] Enabling 64-bit DMA bypass I'm unable to reproduce any host crashes in either case. If you're hitting an issue where maybe we're rebinding the vf back to mlx5_core in the host before the window re-init is complete then maybe that might explain what you're seeing. If that's the case it might be a race somewhere in libvirt when it "un-stubs"/re-assigns the device back to default mlx5_core driver. I recall some previous libvirt BZs around avoiding this sort of scenario but i'm not sure if they are still pending or not. I'll do some more digging to see if I can reproduce without libvirt later today (perhaps by attempting to re-bind to mlx5_core immediately after unplug...in my scenario i think no such re-binding occurs so maybe that's why i don't see the host crashes (or other guest crashes...though it's also strange to me even in this race scenario that the guest would crash do to host race since drmgr fixes are already in place to make sure guest kernel has released the device before unplug completes in host/libvirt. maybe i just haven't seen the guest crashes yet and those are also a separate issue from the host ones)) I was consulting with Mike about this and the last experiment that wanted to try was to put a sleep in vfio so in that way qemu can do all the dma setup before libvirt does the enable device at the KVM host. I put a sleep in drivers/vfio/vfio.c in function vfio_del_group_dev and that kind of helps in my case. Not sure if that will be the solution but it works for my case. diff -Naurp vfio.c.save vfio.c --- vfio.c.save 2016-09-28 10:11:30.047333880 -0500 +++ vfio.c 2016-09-30 13:57:15.376564520 -0500 @@ -34,6 +34,7 @@ #include <linux/uaccess.h> #include <linux/vfio.h> #include <linux/wait.h> +#include <linux/delay.h> #define DRIVER_VERSION "0.3" #define DRIVER_AUTHOR "Alex Williamson <alex.william...@redhat.com>" @@ -910,6 +911,7 @@ void *vfio_del_group_dev(struct device * vfio_device_put(device); + msleep(2000); /* * If the device is still present in the group after the above * 'put', then it is in use and we need to request it from the Then in dmesg I see [ 1122.568409] pci 0001:01: 0.3: [PE# 006] Removing DMA window #0 [ 1122.568415] pci 0001:01: 0.3: [PE# 006] Removing DMA window #1 [ 1122.575117] pci 0001:01: 0.3: [PE# 006] Setting up window#0 0..7fffffff pg=1000 [ 1122.575121] pci 0001:01: 0.3: [PE# 006] Enabling 64-bit DMA bypass [ 1124.584345] mlx5_core 0001:01:00.3: enabling device (0000 -> 0002) [ 1124.584379] mlx5_core 0001:01:00.3: Using 64-bit DMA iommu bypass [ 1124.584445] mlx5_core 0001:01:00.3: firmware version: 12.17.1010 [ 1125.062922] mlx5_core 0001:01:00.3: Assigned random MAC address f2:c8:5a:8b:2a:b2 [ 1125.108710] mlx5_core 0001:01:00.3 enP1p1s0f3: renamed from eth0 Mirroring bug to Launchpad for Canonical's awareness... ** Affects: linux (Ubuntu) Importance: Undecided Assignee: Taco Screen team (taco-screen-team) Status: New ** Tags: architecture-ppc64le bugnameltc-146212 severity-critical targetmilestone-inin1610 ** Tags added: architecture-ppc64le bugnameltc-146212 severity-critical targetmilestone-inin1610 ** Changed in: ubuntu Assignee: (unassigned) => Taco Screen team (taco-screen-team) ** Package changed: ubuntu => linux (Ubuntu) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1630304 Title: Ubuntu 16.10 KVM: Issue doing hotplug detach to SRIOV VF Status in linux package in Ubuntu: New Bug description: ---Problem Description--- I can not get hotplug attach to work in Ubuntu but if I try to detach a CX4 VF from a guest I am getting some issues: Like in this case: [ 474.393308] vfio-pci 0001:01:00.3: No device request channel registered, blocked until released by user [ 474.393543] pci 0001:01: 0.3: [PE# 006] Removing DMA window #0 [ 474.393553] pci 0001:01: 0.3: [PE# 006] Removing DMA window #1 [ 474.393906] mlx5_core 0001:01:00.3: enabling device (0000 -> 0002) [ 474.393939] mlx5_core 0001:01:00.3: Using 32-bit DMA via iommu [ 474.400360] pci 0001:01: 0.3: [PE# 006] Setting up window#0 0..7fffffff pg=1000 [ 474.400380] mlx5_core 0001:01:00.3: firmware version: 12.17.226 [ 474.401341] pci 0001:01: 0.3: [PE# 006] Enabling 64-bit DMA bypass [ 474.402284] EEH: Frozen PE#6 on PHB#1 detected [ 474.402475] EEH: PE location: Slot4, PHB location: N/A [ 474.403699] EEH: This PCI device has failed 1 times in the last hour [ 474.403700] EEH: Notify device drivers to shutdown [ 474.403707] mlx5_core 0001:01:00.3: mlx5_pci_err_detected was called [ 474.403711] mlx5_core 0001:01:00.3: 0001:01:00.3:mlx5_enter_error_state:115:(pid 779): start [ 474.403870] mlx5_core 0001:01:00.3: 0001:01:00.3:mlx5_enter_error_state:120:(pid 779): end One time I saw SSep 13 09:41:32 ltc-fire1 kernel: [70437.943722] vfio-pci 0001:01:00.3: No device request channel registered, blocked until released by user Sep 13 09:41:32 ltc-fire1 kernel: [70437.944076] mlx5_core 0001:01:00.3: enabling device (0000 -> 0002) Sep 13 09:41:32 ltc-fire1 kernel: [70437.944110] mlx5_core 0001:01:00.3: Using 32-bit DMA via iommu Sep 13 09:41:32 ltc-fire1 kernel: [70437.944145] pci 0001:01: 0.3: [PE# 006] Removing DMA window #0 Sep 13 09:41:32 ltc-fire1 kernel: [70437.944152] pci 0001:01: 0.3: [PE# 006] Removing DMA window #1 Sep 13 09:41:32 ltc-fire1 kernel: [70437.944195] mlx5_core 0001:01:00.3: firmware version: 12.17.226 Sep 13 09:41:32 ltc-fire1 kernel: [70437.944260] Unable to handle kernel paging request for data at address 0x00000000 Sep 13 09:41:32 ltc-fire1 kernel: [70437.944533] Faulting instruction address: 0xc0000000005b37e0 Sep 13 09:41:32 ltc-fire1 kernel: [70437.944592] Oops: Kernel access of bad area, sig: 11 [#1] Sep 13 09:41:32 ltc-fire1 kernel: [70437.944636] SMP NR_CPUS=2048 NUMA PowerNV Sep 13 09:41:32 ltc-fire1 kernel: [70437.944851] Modules linked in: vfio_pci irqbypass vfio_iommu_spapr_tce vfio_virqfd vfio vfio_spapr_eeh xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp kvm_hv kvm_pr kvm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) configfs ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlx4_ib(OE) ib_sa(OE) ib_mad(OE) ib_core(OE) mlx4_en(OE) ib_addr(OE) ib_netlink(OE) mlx4_core(OE) mlx_compat(OE) bridge stp llc joydev input_leds mac_hid ofpart at24 cmdlinepart powernv_flash ipmi_powernv nvmem_core uio_pdrv_genirq opal_prd mtd ipmi_msghandler uio ibmpowernv powernv_rng binfmt_misc dm_multipath knem(OE) ip_tables x_tables autofs4 hid_generic usbhid hid uas usb_storage ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sys imgblt fb_sys_fops drm ahci devlink libahci [last unloaded: mlx4_core] Sep 13 09:41:32 ltc-fire1 kernel: [70437.946007] CPU: 40 PID: 12501 Comm: libvirtd Tainted: G OE 4.7.0unofficial #5 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946074] task: c000000ec319a200 ti: c000000ec324c000 task.ti: c000000ec324c000 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946140] NIP: c0000000005b37e0 LR: c0000000005ad070 CTR: 0000000000000000 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946208] REGS: c000000ec324f100 TRAP: 0300 Tainted: G OE (4.7.0unofficial) Sep 13 09:41:32 ltc-fire1 kernel: [70437.946286] MSR: 900000010280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 84028844 XER: 20000000 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] CFAR: c000000000008468 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 0 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR00: c0000000005d19c8 c000000ec324f380 c0000000013bef00 0000000000000000 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR04: ffffffffffffffff 0000000000000000 ffffffffffffffff 000000000000ffff Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR08: 0000000000000000 0000000000000000 ffffffffffffffff f000000003803140 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR12: 0000000024048840 c00000000fb96800 c000000ec0def080 c000000ec0def000 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR16: 00003fff93ef0000 0000000000000001 0000000000000000 c000000ec0def100 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR20: c000000ec0def118 0000000000000000 0000000000010000 0000000000000000 Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR24: 0000000000010000 ffffffffffff0000 0000000000000000 000000000000ffff Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR28: 0000000000000000 ffffffffffffffff 0000000000010000 0000000000000000 Sep 13 09:41:32 ltc-fire1 kernel: [70437.947437] NIP [c0000000005b37e0] _find_next_bit.part.0+0x10/0x90 Sep 13 09:41:32 ltc-fire1 kernel: [70437.947496] LR [c0000000005ad070] bitmap_find_next_zero_area_off+0x70/0x110 Sep 13 09:41:32 ltc-fire1 kernel: [70437.947553] Call Trace: Sep 13 09:41:32 ltc-fire1 kernel: [70437.947578] [c000000ec324f380] [c000000ec324f3c0] 0xc000000ec324f3c0 (unreliable) Sep 13 09:41:32 ltc-fire1 kernel: [70437.947659] [c000000ec324f3e0] [c0000000005d19c8] iommu_area_alloc+0x78/0x130 Sep 13 09:41:32 ltc-fire1 kernel: [70437.947731] [c000000ec324f440] [c000000000041a0c] iommu_range_alloc+0x18c/0x410 Sep 13 09:41:32 ltc-fire1 kernel: [70437.947812] [c000000ec324f500] [c000000000041f24] iommu_alloc+0x74/0x190 Sep 13 09:41:32 ltc-fire1 kernel: [70437.947885] [c000000ec324f570] [c0000000000437b8] iommu_alloc_coherent+0x138/0x1e0 Sep 13 09:41:32 ltc-fire1 kernel: [70437.947968] [c000000ec324f620] [c000000000041538] dma_iommu_alloc_coherent+0x58/0x80 Sep 13 09:41:32 ltc-fire1 kernel: [70437.948057] [c000000ec324f670] [d000000012e75eb0] mlx5_cmd_init+0x118/0x940 [mlx5_core] Sep 13 09:41:32 ltc-fire1 kernel: [70437.948146] [c000000ec324f740] [d000000012e722f0] mlx5_load_one+0x198/0x16c0 [mlx5_core] Sep 13 09:41:32 ltc-fire1 kernel: [70437.948235] [c000000ec324f8b0] [d000000012e73e5c] init_one+0x3b4/0x5e0 [mlx5_core] Sep 13 09:41:32 ltc-fire1 kernel: [70437.948314] [c000000ec324f950] [c0000000006042dc] local_pci_probe+0x6c/0x140 Sep 13 09:41:32 ltc-fire1 kernel: [70437.948381] [c000000ec324f9e0] [c0000000006051b8] pci_device_probe+0x178/0x200 Sep 13 09:41:32 ltc-fire1 kernel: [70437.948462] [c000000ec324fa40] [c0000000006f6cb0] driver_probe_device+0x220/0x580 Sep 13 09:41:32 ltc-fire1 kernel: [70437.948542] [c000000ec324fad0] [c0000000006f348c] bus_for_each_drv+0x8c/0x100 Sep 13 09:41:32 ltc-fire1 kernel: [70437.948611] [c000000ec324fb20] [c0000000006f67b0] __device_attach+0x140/0x220 Sep 13 09:41:32 ltc-fire1 kernel: [70437.948679] [c000000ec324fbb0] [c0000000006f4510] bus_rescan_devices_helper+0x70/0xc0 I think the key is that I am seeing this trace with 32bit dma which is maybe causing issue Sep 13 09:41:32 ltc-fire1 kernel: [70437.943722] vfio-pci 0001:01:00.3: No device request channel registered, blocked until released by user Sep 13 09:41:32 ltc-fire1 kernel: [70437.944076] mlx5_core 0001:01:00.3: enabling device (0000 -> 0002) Sep 13 09:41:32 ltc-fire1 kernel: [70437.944110] mlx5_core 0001:01:00.3: Using 32-bit DMA via iommu Sep 13 09:41:32 ltc-fire1 kernel: [70437.944145] pci 0001:01: 0.3: [PE# 006] Removing DMA window #0 Sep 13 09:41:32 ltc-fire1 kernel: [70437.944152] pci 0001:01: 0.3: [PE# 006] Removing DMA window #1 Sep 13 09:41:32 ltc-fire1 kernel: [70437.944195] mlx5_core 0001:01:00.3: firmware version: 12.17.226 ---uname output--- Linux ltc-fire1 4.7.0unofficial #5 SMP Mon Sep 5 08:53:38 EDT 2016 ppc64le ppc64le ppc64le GNU/Linux Machine Type = P8 ---Debugger--- A debugger is not configured ---Steps to Reproduce--- Im using a kernel 4.7 plus the qemu is from Mike Roth qemu with DDW. To recreate just need to bring a guest with VF and then I did virsh detach-device ltc-fire1-vm2-ubuntu-16.10 ./add_cx4vf.xml --live This should be ship issue given that we now have whole sriov hotplug broken on Ubuntu 16.10 > Hi Carol, > > Just to confirm, have you hit this with the new QEMU package from Jon's PPA? > It should be based on QEMU 2.6.1, which should have some fixes related to EEH I have not tried. Will do during the day. As far as the unplug issues, I'm unable to reproduce any of the host errors noted in the initial comments using the latest 2.6.1-based package. I'm only able to reproduce the following guest errors in dmesg. Carol, can you confirm whether or not this is also the case for you? Here's what I'm seeing, but it's intermittent. Sometimes I see up to 7 successful unplugs before reproducing, but once I do reproduce from what I can tell the trace is always this: [ 63.314271] Unable to handle kernel paging request for data at address 0x00000000 [ 63.327986] Faulting instruction address: 0xd000000000ea5cbc [ 63.328388] Oops: Kernel access of bad area, sig: 11 [#1] [ 63.328741] SMP NR_CPUS=2048 NUMA pSeries [ 63.329085] Modules linked in: rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) configfs ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) ib_ipoib(OE) ib_cm(OE) mlx4_ib(OE) ib_sa(OE) ib_mad(OE) ib_core(OE) ib_addr(OE) ib_netlink(OE) mlx4_en(OE) mlx4_core(OE) sg ghash_generic gf128mul vmx_crypto ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache sr_mod cdrom virtio_net virtio_blk ibmvscsi scsi_transport_srp mlx5_core(OE) crc32c_vpmsum ptp pps_core mlx_compat(OE) virtio_pci virtio_ring virtio [ 63.332759] CPU: 14 PID: 5234 Comm: drmgr Tainted: G W OE 4.8.0-15-generic #16-Ubuntu [ 63.332911] task: c000000003bb9780 task.stack: c000000003cd8000 [ 63.333025] NIP: d000000000ea5cbc LR: d000000000ea5cb4 CTR: c0000000002ddc10 [ 63.333151] REGS: c000000003cdb6d0 TRAP: 0300 Tainted: G W OE (4.8.0-15-generic) [ 63.333305] MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22000228 XER: 20000000 [ 63.333716] CFAR: c000000000142f3c DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1 GPR00: d000000000ea5cb4 c000000003cdb950 d000000000f288e8 0000000000000000 GPR04: c0000000b3716b40 d000000000ea5b48 c0000000010f6a38 c0000000010d7b00 GPR08: c000000000fa7b00 0000000000000000 0000000000000000 d000000000eeff70 GPR12: c0000000002ddc10 c00000000fb87e00 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: c000000004ff5b58 c000000004ff5b20 0000000000000001 c000000004fc04e8 GPR28: c000000004fc0000 0000000000000000 c000000004fc0290 0000000000000000 [ 63.335484] NIP [d000000000ea5cbc] destroy_msg_cache+0x84/0x130 [mlx5_core] [ 63.335609] LR [d000000000ea5cb4] destroy_msg_cache+0x7c/0x130 [mlx5_core] [ 63.335713] Call Trace: [ 63.335805] [c000000003cdb950] [d000000000ea5cb4] destroy_msg_cache+0x7c/0x130 [mlx5_core] (unreliable) [ 63.336003] [c000000003cdb9a0] [d000000000ea66e8] mlx5_cmd_cleanup+0x40/0x80 [mlx5_core] [ 63.336206] [c000000003cdb9d0] [d000000000ea1a70] mlx5_unload_one+0x248/0x420 [mlx5_core] [ 63.336383] [c000000003cdba90] [d000000000ea2028] remove_one+0xb0/0x170 [mlx5_core] [ 63.336539] [c000000003cdbb20] [c0000000005660e0] pci_device_remove+0x70/0x110 [ 63.336688] [c000000003cdbb60] [c000000000610e80] __device_release_driver+0xc0/0x1a0 [ 63.336828] [c000000003cdbb90] [c000000000610fa0] device_release_driver+0x40/0x70 [ 63.337015] [c000000003cdbbc0] [c00000000055a730] pci_stop_bus_device+0xe0/0x100 [ 63.337156] [c000000003cdbc00] [c00000000055a914] pci_stop_and_remove_bus_device_locked+0x34/0x60 [ 63.337321] [c000000003cdbc30] [c0000000005692b0] remove_store+0xb0/0xd0 [ 63.337458] [c000000003cdbc80] [c000000000608ef8] dev_attr_store+0x68/0xa0 [ 63.337571] [c000000003cdbcc0] [c0000000003cb3f0] sysfs_kf_write+0x80/0xb0 [ 63.337692] [c000000003cdbd00] [c0000000003ca25c] kernfs_fop_write+0x17c/0x250 [ 63.337825] [c000000003cdbd50] [c00000000030fc3c] __vfs_write+0x6c/0xe0 [ 63.337950] [c000000003cdbd90] [c000000000311144] vfs_write+0xd4/0x240 [ 63.338094] [c000000003cdbde0] [c000000000312e5c] SyS_write+0x6c/0x110 [ 63.338233] [c000000003cdbe30] [c0000000000095e0] system_call+0x38/0x108 [ 63.338364] Instruction dump: [ 63.338429] 60420000 480498b9 e8410018 813e0060 7fa4eb78 7f83e378 7ffdfb78 3929ffff [ 63.338697] 913e0060 4bfffe21 7fbff040 7fe3fb78 <ebff0000> 409effd0 3bde0078 7fbed840 [ 63.338925] ---[ end trace a4fbfe789eac3ffc ]--- [ 63.344490] I'm using QEMU directly, and see the above after the device_del. I see no other errors in guest/host: root@ltc-fire1:~/mdroth# ./vfio-bind 0001:01:00.2 unbinding 0001:01:00.2 via /sys/bus/pci/devices/0001:01:00.2/driver/unbind binding 0001:01:00.2 echo 0x15b3 0x1014 >/sys/bus/pci/drivers/vfio-pci/new_id root@ltc-fire1:~/mdroth# ./vfio-bind 0001:01:00.2 qemu-system-ppc64 --smp sockets=1,cores=2,threads=8 --enable-kvm -M pseries -m 3G,slots=32,maxmem=8G -device virtio-blk- pci,id=disk1,drive=drive1 -drive file=ubuntu16.10.base.img.4,aio=threads,cache=none,if=none,id=drive1 -device spapr-vscsi,id=scsi0,reg=0x2000 -monitor unix:/tmp/mdroth- vm4-hmp.sock,server,nowait -qmp unix:/tmp/mdroth- vm4-qmp.sock,server,nowait -device virtio-net- pci,mac=52:54:00:12:34:01,netdev=net1,id=nic1 -netdev user,hostfwd=tcp:0.0.0.0:2224-:22,id=net1 -device vfio- pci,host=0001:01:00.2,id=vfio0 -L /root/mdroth-qemu/pc-bios -boot c -vga none -nographic root@ltc-fire1:~# rlwrap socat stdio unix-connect:/tmp/mdroth-vm4-hmp.sock QEMU 2.6.1 monitor - type 'help' for more information (qemu) device_del vfio0 I'm planning to do more debugging of the guest driver's mlx5_cmd_cleanup -> destroy_msg_cache cleanup path which seems to be triggering the error, but want to confirm I'm looking at the same issue that Carol is currently tracking. > Sometimes I see that one, but if I do not see that one then I see issues at > host which could be the stack trace that I put or an EEH. But the problems > at host the EEH or the stack trace I always see that the enable device is > happening before the 64bit dma is set back again to the card. You mean the enable device for the card in the guest after hotplug? or is this in the unplug path? Here's what I see when doing unplug: <bind 0001:01: 0.2 to vfio-pci> [ 8056.733150] pci 0001:01: 0.2: [PE# 005] Disabling 64-bit DMA bypass [ 8056.733157] pci 0001:01: 0.2: [PE# 005] Removing DMA window #0 [ 8056.733927] pci 0001:01: 0.2: [PE# 005] Setting up window#0 0..7fffffff pg=1000 [ 8057.039487] pci 0001:01: 0.2: [PE# 005] Removing DMA window #0 [ 8057.073152] vfio-pci 0001:01:00.2: enabling device (0000 -> 0002) <start guest> [ 7805.107856] KVM guest htab at c000000f2b000000 (order 26), LPID 1 <guest immediately sets up default 32-bit window> [ 7805.216462] pci 0001:01: 0.2: [PE# 005] Setting up window#0 0..3fffffff pg=1000 <guest kernel boots, enables 64-bit window> [ 7815.386476] pci 0001:01: 0.2: [PE# 005] Setting up window#1 800000000000000..8000001ffffffff pg=10000 <unplug 0001:01: 0.2 (if guest kernel doesn't crash, if it crashes we never get past here and hang in guest> [ 7906.714683] pci 0001:01: 0.2: [PE# 005] Removing DMA window #0 [ 7906.714690] pci 0001:01: 0.2: [PE# 005] Removing DMA window #1 <host re-enables 32-bit window and re-enables 64-bit DMA bypass> [ 7906.714722] window_size 0x80000000 [ 7906.715205] Calling iommu_init_table [ 7906.721714] Calling pnv_pci_ioda2_set_window [ 7906.721851] pci 0001:01: 0.2: [PE# 005] Setting up window#0 0..7fffffff pg=1000 [ 7906.722334] pci 0001:01: 0.2: [PE# 005] Enabling 64-bit DMA bypass I'm unable to reproduce any host crashes in either case. If you're hitting an issue where maybe we're rebinding the vf back to mlx5_core in the host before the window re-init is complete then maybe that might explain what you're seeing. If that's the case it might be a race somewhere in libvirt when it "un-stubs"/re-assigns the device back to default mlx5_core driver. I recall some previous libvirt BZs around avoiding this sort of scenario but i'm not sure if they are still pending or not. I'll do some more digging to see if I can reproduce without libvirt later today (perhaps by attempting to re-bind to mlx5_core immediately after unplug...in my scenario i think no such re-binding occurs so maybe that's why i don't see the host crashes (or other guest crashes...though it's also strange to me even in this race scenario that the guest would crash do to host race since drmgr fixes are already in place to make sure guest kernel has released the device before unplug completes in host/libvirt. maybe i just haven't seen the guest crashes yet and those are also a separate issue from the host ones)) I was consulting with Mike about this and the last experiment that wanted to try was to put a sleep in vfio so in that way qemu can do all the dma setup before libvirt does the enable device at the KVM host. I put a sleep in drivers/vfio/vfio.c in function vfio_del_group_dev and that kind of helps in my case. Not sure if that will be the solution but it works for my case. diff -Naurp vfio.c.save vfio.c --- vfio.c.save 2016-09-28 10:11:30.047333880 -0500 +++ vfio.c 2016-09-30 13:57:15.376564520 -0500 @@ -34,6 +34,7 @@ #include <linux/uaccess.h> #include <linux/vfio.h> #include <linux/wait.h> +#include <linux/delay.h> #define DRIVER_VERSION "0.3" #define DRIVER_AUTHOR "Alex Williamson <alex.william...@redhat.com>" @@ -910,6 +911,7 @@ void *vfio_del_group_dev(struct device * vfio_device_put(device); + msleep(2000); /* * If the device is still present in the group after the above * 'put', then it is in use and we need to request it from the Then in dmesg I see [ 1122.568409] pci 0001:01: 0.3: [PE# 006] Removing DMA window #0 [ 1122.568415] pci 0001:01: 0.3: [PE# 006] Removing DMA window #1 [ 1122.575117] pci 0001:01: 0.3: [PE# 006] Setting up window#0 0..7fffffff pg=1000 [ 1122.575121] pci 0001:01: 0.3: [PE# 006] Enabling 64-bit DMA bypass [ 1124.584345] mlx5_core 0001:01:00.3: enabling device (0000 -> 0002) [ 1124.584379] mlx5_core 0001:01:00.3: Using 64-bit DMA iommu bypass [ 1124.584445] mlx5_core 0001:01:00.3: firmware version: 12.17.1010 [ 1125.062922] mlx5_core 0001:01:00.3: Assigned random MAC address f2:c8:5a:8b:2a:b2 [ 1125.108710] mlx5_core 0001:01:00.3 enP1p1s0f3: renamed from eth0 Mirroring bug to Launchpad for Canonical's awareness... To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1630304/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp