** Changed in: ubuntu-power-systems
       Status: New => In Progress

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1702768

Title:
  Ubuntu 17.04 KVM: stack trace generated when enabling SRIOV in power

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  In Progress

Bug description:
  ---Problem Description---
  Enabling SRIOV with kernel 4.10.0-26-generic on a Power system produces this
  stack trace:
  [ 2084.079575] ------------[ cut here ]------------
  [ 2084.079583] WARNING: CPU: 120 PID: 734 at /build/linux-TAhFXm/linux-4.10.0/arch/powerpc/platforms/powernv/npu-dma.c:78 pnv_pci_get_npu_dev+0x40/0xb0
  [ 2084.079584] Modules linked in: mst_pciconf(OE) mst_pci(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp kvm_hv kvm_pr kvm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rdma_ucm(OE) ib_ucm(OE) ib_ipoib(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx4_ib(OE) binfmt_misc bridge stp llc ipmi_powernv ipmi_devintf ipmi_msghandler powernv_rng powernv_op_panel uio_pdrv_genirq leds_powernv uio ibmpowernv vmx_crypto sunrpc ib_iser(OE) rdma_cm(OE) iw_cm(OE) ib_cm(OE) ib_core(OE) configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi knem(OE) ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
  [ 2084.079640]  xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx4_en(OE) ses enclosure scsi_transport_sas crc32c_vpmsum tg3 mlx5_core(OE) mlx4_core(OE) ipr devlink mlx_compat(OE)
  [ 2084.079658] CPU: 120 PID: 734 Comm: kworker/120:0 Tainted: G        W  OE   4.10.0-26-generic #30-Ubuntu
  [ 2084.079663] Workqueue: events work_for_cpu_fn
  [ 2084.079665] task: c000000fee60dc00 task.stack: c000000fee534000
  [ 2084.079666] NIP: c00000000009c210 LR: c00000000009d404 CTR: 0000000000000000
  [ 2084.079668] REGS: c000000fee537700 TRAP: 0700   Tainted: G        W  OE    (4.10.0-26-generic)
  [ 2084.079669] MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
  [ 2084.079677]   CR: 42004428  XER: 20000000
  [ 2084.079678] CFAR: c00000000009d400 SOFTE: 1
                 GPR00: c00000000009d404 c000000fee537980 c00000000145d100 0000000000000000
                 GPR04: 0000000000000000 0000000000000aa6 c000001fff700000 0000000000049188
                 GPR08: 0000000000000007 0000000000000001 0000000000000001 0000000000000000
                 GPR12: 0000000000002200 c00000000fbc3800 c00000000010ef48 c000000ff70ec540
                 GPR16: c000000ffa622c58 c000000ffa622a10 c000000ffa6229a0 0000000000000001
                 GPR20: 0000000000000000 c000000001318de8 c000000000d700e8 0000000000000001
                 GPR24: c000000000d6f070 c000000000d6f050 c000000003d02000 c000000003d02098
                 GPR28: c000000e92680060 0800001fffffffff ffffffffffffffff 0000000000000000
  [ 2084.079702] NIP [c00000000009c210] pnv_pci_get_npu_dev+0x40/0xb0
  [ 2084.079704] LR [c00000000009d404] pnv_npu_try_dma_set_bypass+0x144/0x250
  [ 2084.079705] Call Trace:
  [ 2084.079708] [c000000fee5379b0] [c00000000009d404] pnv_npu_try_dma_set_bypass+0x144/0x250
  [ 2084.079710] [c000000fee537a80] [c000000000096c74] pnv_pci_ioda_dma_set_mask+0xa4/0x150
  [ 2084.079714] [c000000fee537b00] [c0000000000291a0] dma_set_mask+0x40/0xc0
  [ 2084.079728] [c000000fee537b20] [d0000000143531e4] init_one+0x33c/0x6a0 [mlx5_core]
  [ 2084.079732] [c000000fee537bd0] [c00000000066ba9c] local_pci_probe+0x6c/0x140
  [ 2084.079734] [c000000fee537c60] [c0000000001016b8] work_for_cpu_fn+0x38/0x60
  [ 2084.079737] [c000000fee537c90] [c0000000001061a0] process_one_work+0x2b0/0x5a0
  [ 2084.079740] [c000000fee537d20] [c000000000106780] worker_thread+0x2f0/0x650
  [ 2084.079742] [c000000fee537dc0] [c00000000010f0a4] kthread+0x164/0x1b0
  [ 2084.079746] [c000000fee537e30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
  [ 2084.079747] Instruction dump:
  [ 2084.079748] 7c0802a6 fbe1fff8 f8010010 f821ffd1 7c690074 7929d182 0b090000 2fa30000
  [ 2084.079753] 419e0060 e8630330 7c690074 7929d182 <0b090000> 2fa30000 419e0048 7c852378
  [ 2084.079759] ---[ end trace 7bf01a937efd69d8 ]---

  This issue was introduced by this commit:
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4c3b89effc281704d5395282c800c45e453235f6
  (Subject: powerpc/powernv: Add sanity checks to pnv_pci_get_{gpu|npu}_dev)

  The fix is to add this commit:
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=377aa6b0efbaa29cfeecd8b9244641217f9544ca
  (Subject: powerpc/npu-dma: Remove spurious WARN_ON when a PCI device has no of_node)

  Requesting inclusion of the fix in 17.04 and probably 16.04.3.

  ---uname output---
  4.10.0-26-generic #30-Ubuntu SMP Tue Jun 27 09:29:34 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
   
  ---Additional Hardware Info---
  A Mellanox card that supports SRIOV is required.

  Machine Type = P8

  ---Steps to Reproduce---
  Enable SRIOV on a Power system with a Mellanox CX4 or CX5 card, for example:
  echo 1 > /sys/class/infiniband/mlx5_0/device/sriov_numvfs
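
  The reproduction step above could be wrapped in a small helper, sketched
  here under stated assumptions. The function names enable_vfs and
  warning_seen are hypothetical and not part of the original report. The
  sriov_totalvfs attribute is the standard PCI sysfs file next to
  sriov_numvfs and is also not mentioned in the report. The helper resets the
  VF count to 0 first because the kernel refuses to change one non-zero VF
  count directly to another.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not from the bug report.

# enable_vfs DEVICE_DIR COUNT
# Enable COUNT virtual functions on the PCI device whose sysfs directory
# is DEVICE_DIR (e.g. /sys/class/infiniband/mlx5_0/device).
enable_vfs() {
    dev_dir=$1
    want=$2
    # sriov_totalvfs reports the maximum number of VFs the device supports.
    total=$(cat "$dev_dir/sriov_totalvfs") || return 1
    if [ "$want" -gt "$total" ]; then
        echo "requested $want VFs, device supports at most $total" >&2
        return 1
    fi
    # Reset to 0 first, then request the new count.
    echo 0 > "$dev_dir/sriov_numvfs" || return 1
    echo "$want" > "$dev_dir/sriov_numvfs"
}

# warning_seen
# Read kernel log text on stdin; succeed if the WARNING from this report
# (raised in pnv_pci_get_npu_dev) is present.
warning_seen() {
    grep -q 'WARNING: .*pnv_pci_get_npu_dev'
}

# Usage (as root, on the affected system):
#   enable_vfs /sys/class/infiniband/mlx5_0/device 1 &&
#       dmesg | warning_seen && echo "warning reproduced"
```

  On a kernel that carries the fix, the same check would be expected to find
  no such WARNING after the VF is enabled.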
   
  Stack trace output: the same trace shown in the problem description, logged
  between these two mlx5_core messages:
  [ 2084.079567] mlx5_core 0004:01:04.0: Using 64-bit DMA iommu bypass
  [ 2084.080096] mlx5_core 0004:01:04.0: firmware version: 12.20.1010

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1702768/+subscriptions
