The proposed kernel in Zesty boots fine on the AW SDP at Canonical, and
does not report any NMI softlockups.

ubuntu@ubuntu:~$ uname -a
Linux ubuntu 4.10.0-23-generic #25-Ubuntu SMP Fri Jun 9 09:36:27 UTC 2017 aarch64 aarch64 aarch64 GNU/Linux

ubuntu@ubuntu:~$ apt-cache policy linux-image-4.10.0-23-generic
linux-image-4.10.0-23-generic:
  Installed: 4.10.0-23.25
  Candidate: 4.10.0-23.25
  Version table:
 *** 4.10.0-23.25 500
        500 http://us.ports.ubuntu.com/ubuntu-ports zesty-proposed/main arm64 Packages
        100 /var/lib/dpkg/status


** Tags removed: verification-needed-zesty
** Tags added: verification-done-zesty

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1680549

Title:
  [Zesty] QDF2400 ARM64 server - NMI watchdog: BUG: soft lockup - CPU#8
  stuck for 22s!

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Zesty:
  Fix Committed

Bug description:
  [IMPACT]
  Booting the Zesty 4.10 kernel on Qualcomm Centriq 2400 ARM64 servers causes
  soft lockups on multiple CPUs.

  [  104.205397] Modules linked in: nls_iso8859_1 cdc_acm bridge stp llc
  ipmi_ssif ipmi_devintf ipmi_msghandler shpchp hdma hdma_mgmt i2c_qup
  cppc_cpufreq ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp
  libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10
  raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
  raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage at803x
  aes_ce_blk aes_ce_cipher crc32_ce crct10dif_ce ghash_ce sha2_ce
  sha1_ce mlx5_core devlink ptp pps_core ahci_platform libahci_platform
  libahci qcom_emac sdhci_acpi sdhci xhci_plat_hcd pinctrl_qdf2xxx fjes
  aes_neon_blk crypto_simd cryptd

  [  104.205442] CPU: 47 PID: 0 Comm: swapper/47 Tainted: G             L  4.10.0-16-generic #18ubuntuRC03+<redacted>.1
  [  104.205443] Hardware name: Qualcomm QDF2400 DP/ABW|SYS|CVR,1DPC|V3         
  , BIOS XBL.DF.2.0.R3-00153 QDF2400_REL CRM 02/ 8/2017
  [  104.205444] task: ffff9fa30ed49c00 task.stack: ffff9fa30ed5c000
  [  104.205447] PC is at _raw_spin_unlock_irqrestore+0x2c/0x38
  [  104.205450] LR is at alloc_iova+0x1cc/0x2a0
  [  104.205451] pc : [<ffff3f0624a00974>] lr : [<ffff3f0624682e8c>] pstate: 20400145
  [  104.205452] sp : ffff9fa31fbecc00
  [  104.205453] x29: ffff9fa31fbecc00 x28: 0000000ffffefe46
  [  104.205455] x27: 0000000000000040 x26: 0000000fffffffff
  [  104.205458] x25: ffff3f06251f8000 x24: 0000000000000001
  [  104.205460] x23: ffff9fa30da06008 x22: 0000000000000000
  [  104.205462] x21: ffff9fa2e2af8740 x20: ffff9fa30da06008
  [  104.205464] x19: 0000000000000140 x18: 00000000a5e112c1
  [  104.205466] x17: 000000004d48a1ed x16: 00000000b0f9c455
  [  104.205468] x15: 00000000aa4269e9 x14: 0000000085094ac4
  [  104.205471] x13: 000000009b3b00da x12: 000000008aae8d9c
  [  104.205473] x11: ffff9fa31fbf90b0 x10: ffff3f0624eb70eb
  [  104.205475] x9 : 0000000000000000 x8 : 0000000000000004
  [  104.205477] x7 : ffff9fa2e2875400 x6 : 0000000000000000
  [  104.205479] x5 : ffff9fa2e2875401 x4 : 0000000000000000
  [  104.205481] x3 : ffff9fa2e2a27b00 x2 : ffff9fa2e2875400
  [  104.205483] x1 : 0000000000000140 x0 : 000000000000f7c2

  [  111.198062] INFO: rcu_sched self-detected stall on CPU
  [  111.198971] INFO: rcu_sched detected stalls on CPUs/tasks:
  [  111.198977]        31-...: (1 GPs behind) idle=1b3/2/0 softirq=432/433 fqs=6805
  [  111.198979]        32-...: (1 GPs behind) idle=291/1/0 softirq=469/470 fqs=6805
  [  111.198980]        (detected by 2, t=15002 jiffies, g=143, c=142, q=6968)
  [  111.199000] Task dump for CPU 31:
  [  111.199002] swapper/31      R  running task        0     0      1 0x00000002
  [  111.199006] Call trace:
  [  111.199012] [<ffff3f0624086250>] __switch_to+0x98/0xb0
  [  111.199014] [<0000000b7160dcd2>] 0xb7160dcd2
  [  111.199015] Task dump for CPU 32:
  [  111.199016] swapper/32      R  running task        0     0      1 0x00000002
  [  111.199018] Call trace:
  [  111.199019] [<ffff3f0624086250>] __switch_to+0x98/0xb0
  [  111.199020] [<0000000bcde2fa4e>] 0xbcde2fa4e
  [  111.227703]        31-...: (1 GPs behind) idle=1b3/2/0 softirq=432/433 fqs=6809
  [  111.234558]         (t=15010 jiffies g=143 c=142 q=6968)
  [  111.239334] Task dump for CPU 31:
  [  111.239335] swapper/31      R  running task        0     0      1 0x00000002
  [  111.239338] Call trace:
  [  111.239344] [<ffff3f062408b030>] dump_backtrace+0x0/0x2b0
  [  111.239346] [<ffff3f062408b304>] show_stack+0x24/0x30
  [  111.239350] [<ffff3f0624103f80>] sched_show_task+0x128/0x178
  [  111.239352] [<ffff3f0624106d68>] dump_cpu_task+0x48/0x58
  [  111.239356] [<ffff3f0624200d38>] rcu_dump_cpu_stacks+0xbc/0xf0
  [  111.239359] [<ffff3f06241409e8>] rcu_check_callbacks+0x7a8/0x968
  [  111.239362] [<ffff3f0624146e1c>] update_process_times+0x34/0x60
  [  111.239365] [<ffff3f0624159118>] tick_sched_handle.isra.7+0x38/0x70
  [  111.239367] [<ffff3f062415919c>] tick_sched_timer+0x4c/0x98
  [  111.239369] [<ffff3f06241477a0>] __hrtimer_run_queues+0xe8/0x2e8
  [  111.239371] [<ffff3f0624148340>] hrtimer_interrupt+0xa8/0x228
  [  111.239376] [<ffff3f062487c02c>] arch_timer_handler_phys+0x3c/0x50
  [  111.239379] [<ffff3f0624133964>] handle_percpu_devid_irq+0x8c/0x230
  [  111.239383] [<ffff3f062412d8b4>] generic_handle_irq+0x34/0x50
  [  111.239385] [<ffff3f062412dfe0>] __handle_domain_irq+0x68/0xc0
  [  111.239386] [<ffff3f06240818b4>] gic_handle_irq+0xc4/0x170
  [  111.239388] Exception stack(0xffff9fa31fa7caa0 to 0xffff9fa31fa7cbd0)
  [  111.239390] caa0: ffff9fa31fa7cad0 0001000000000000 ffff9fa31fa7cc00 ffff3f0624a00974
  [  111.239392] cac0: 0000000020400145 0000000000000001 00000000000000fe 0000000000000140
  [  111.239394] cae0: ffff9fa2e10b1c00 ffff9fa2e11c8800 0000000000000000 ffff9fa2e10b1c01
  [  111.239396] cb00: 0000000000000000 ffff9fa2e10b1c00 ffff9fa3035ee681 0000000000000000
  [  111.239397] cb20: ffff7e7e8b8533e0 ffff9fa31fa890b0 0000000000000000 000000009b3b00da
  [  111.239399] cb40: 0000000085094ac4 00000000aa4269e9 0000000046e68d43 000000004d48a1ed
  [  111.239401] cb60: 00000000a5e112c1 0000000000000140 ffff9fa30da06008 ffff9fa2e1073ac0
  [  111.239403] cb80: 0000000000000000 ffff9fa30da06008 0000000000000001 ffff3f06251f8000
  [  111.239404] cba0: 0000000fffffffff 0000000000000040 0000000ffffef50a ffff9fa31fa7cc00
  [  111.239406] cbc0: ffff3f0624682e8c ffff9fa31fa7cc00
  [  111.239407] [<ffff3f062408315c>] el1_irq+0xdc/0x180
  [  111.239411] [<ffff3f0624682e8c>] alloc_iova+0x1cc/0x2a0
  [  111.239413] [<ffff3f0624680488>] __alloc_iova+0x78/0x88
  [  111.239414] [<ffff3f0624680528>] __iommu_dma_map+0x90/0x128
  [  111.239416] [<ffff3f0624680e30>] iommu_dma_map_page+0x60/0x78
  [  111.239420] [<ffff3f062409c8fc>] __iommu_map_page+0x5c/0xd0
  [  111.239565] [<ffff3f06201046d0>] mlx5e_alloc_rx_wqe+0x118/0x318 [mlx5_core]
  [  111.239607] [<ffff3f06201050e8>] mlx5e_post_rx_wqes+0xa0/0x110 [mlx5_core]
  [  111.239647] [<ffff3f06201075dc>] mlx5e_napi_poll+0x22c/0x518 [mlx5_core]
  [  111.239650] [<ffff3f06248cdda0>] net_rx_action+0x2e8/0x3f0
  [  111.239652] [<ffff3f0624081aa8>] __do_softirq+0x148/0x31c
  [  111.239656] [<ffff3f06240d3d68>] irq_exit+0xd0/0x120
  [  111.239658] [<ffff3f062412dfe4>] __handle_domain_irq+0x6c/0xc0
  [  111.239660] [<ffff3f06240818b4>] gic_handle_irq+0xc4/0x170
  [  111.239661] Exception stack(0xffff9fa30ecffd80 to 0xffff9fa30ecffeb0)
  [  111.239663] fd80: ffff9fa31fa85200 0000609cfabd2000 0000000006400000 0000000000000004
  [  111.239665] fda0: 0000000000003296 0000000000000015 000000005c57e302 0000000000000000
  [  111.239667] fdc0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  [  111.239668] fde0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  [  111.239670] fe00: 0000000000000000 0000000000000000 00000000ffffffff 0000000b7179114e
  [  111.239672] fe20: ffff9fa3041c8000 0000000000000003 ffff3f0625292eb8 0000000000000000
  [  111.239673] fe40: 0000000b7160dcd2 0000000000000003 0000000000000000 0000000000000000
  [  111.239675] fe60: 0000000000000000 ffff9fa30ecffeb0 ffff3f06248549bc ffff9fa30ecffeb0
  [  111.239677] fe80: ffff3f06248549c4 0000000060400145 ffff9fa30ecffeb0 ffff3f06248549bc
  [  111.239678] fea0: ffffffffffffffff 0000000b7160dcd2
  [  111.239680] [<ffff3f062408315c>] el1_irq+0xdc/0x180
  [  111.239684] [<ffff3f06248549c4>] cpuidle_enter_state+0x124/0x318
  [  111.239686] [<ffff3f0624854c2c>] cpuidle_enter+0x34/0x48
  [  111.239689] [<ffff3f062411c030>] call_cpuidle+0x40/0x70
  [  111.239691] [<ffff3f062411c344>] do_idle+0x1ac/0x1f8
  [  111.239693] [<ffff3f062411c5c4>] cpu_startup_entry+0x2c/0x30
  [  111.239695] [<ffff3f0624091008>] secondary_start_kernel+0x158/0x198
  [  111.239696] [<00000000112091a4>] 0x112091a4
  [  111.239697] Task dump for CPU 32:
  [  111.239699] swapper/32      R  running task        0     0      1 0x00000002
  [  111.239701] Call trace:
  [  111.239704] [<ffff3f0624086250>] __switch_to+0x98/0xb0
  [  111.239705] [<0000000bcde2fa4e>] 0xbcde2fa4e
  [  129.361765] ip_tables: (C) 2000-2006 Netfilter Core Team
  [  129.397270] ip6_tables: (C) 2000-2006 Netfilter Core Team
  [  129.438584] Ebtables v2.0 registered

  [FIX]
  The following patches, cherry-picked from linux-next, fix this issue:
  5016bdb796b3 iommu/iova: Fix underflow bug in __alloc_and_insert_iova_range
  d9a5f8adaec9 iommu/dma: Plumb in the per-CPU IOVA caches
  fc7f6142bacb iommu/dma: Clean up MSI IOVA allocation
  568c61384ea1 iommu/dma: Convert to address-based allocation
  dddd632b072f iommu/dma: Implement PCI allocation optimisation
  de84f5f049d9 iommu/dma: Stop getting dma_32bit_pfn wrong

  [Test case]
  After applying the patches, the kernel boots with no soft lockups. I tested
  this with the Zesty 4.10.0-20.22 kernel on a QDF2400 SDP.
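
  A minimal sketch of how a captured boot log could be checked for the stall
  signatures quoted in this report. The function name find_stalls and the
  idea of scanning a saved log are assumptions for illustration, not part of
  the original test procedure; the patterns are taken from the messages shown
  above.

```python
import re

# Message fragments copied from the kernel log in this bug report.
STALL_PATTERNS = (
    re.compile(r"BUG: soft lockup - CPU#\d+ stuck for \d+s"),
    re.compile(r"INFO: rcu_sched (self-)?detected stall"),
)


def find_stalls(log_lines):
    """Return the log lines that match any of the stall signatures.

    find_stalls is a hypothetical helper; it accepts any iterable of
    strings, such as an open file holding saved dmesg output.
    """
    return [
        line.rstrip("\n")
        for line in log_lines
        if any(pattern.search(line) for pattern in STALL_PATTERNS)
    ]
```

  An empty result for the log of a patched kernel would be consistent with
  the lockups being gone, although it only covers the two message types
  listed in STALL_PATTERNS.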

  [Regression Potential]
  These patches apply to the IOMMU driver and do not affect any platform
  code. Please see the comments section for regression tests on ARM64, Power8,
  and Intel platforms.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1680549/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
