[Kernel-packages] [Bug 1874058] Comment bridged from LTC Bugzilla
--- Comment From heinz-werner_se...@de.ibm.com 2020-06-26 02:52 EDT---
IBM Bugzilla status-> closed, Fix Released for focal

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1874058

Title: [UBUNTU 20.04] mlx5: alloc_pages_nodemask stack trace

Status in Ubuntu on IBM z Systems: Fix Released
Status in linux package in Ubuntu: Fix Released
Status in linux source package in Focal: Fix Released

Bug description:
Using the mlx5 device driver on Ubuntu 20.04 (beta), the alloc_pages_nodemask code generates a stack trace when initializing a device. The driver tries to allocate more contiguous memory than is allowed by the platform specific FORCE_MAX_ZONEORDER setting.

FORCE_MAX_ZONEORDER on s390x: 9
FORCE_MAX_ZONEORDER on other platforms: 11 or more

This issue only occurs on ConnectX5 devices because the mlx5_fw_tracer code is only used for physical functions.

A fix for this has recently been pulled into David Miller's net tree as part of a series of Mellanox fixes:
https://lore.kernel.org/netdev/20200420213606.44292-1-sae...@mellanox.com/
It hasn't landed in Linus' tree yet though

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1874058/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp
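For readers unfamiliar with the zone order limit named in the bug description, the arithmetic can be sketched as follows. This is a rough illustration, not taken from the bug report: it assumes a 4 KiB base page size and assumes that, as on kernels of this era, the page allocator only serves orders strictly below the FORCE_MAX_ZONEORDER value. The helper names and the 2 MiB request size are hypothetical; the actual size requested by mlx5_fw_tracer_create is not stated in this thread.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def max_contiguous_bytes(max_zoneorder: int, page_size: int = PAGE_SIZE) -> int:
    """Largest contiguous block, assuming valid orders are 0 .. max_zoneorder - 1."""
    return page_size << (max_zoneorder - 1)


def order_for(size: int, page_size: int = PAGE_SIZE) -> int:
    """Smallest order whose block (page_size << order) can hold `size` bytes."""
    pages = -(-size // page_size)  # ceiling division
    return max(pages - 1, 0).bit_length()


def allocation_fits(size: int, max_zoneorder: int, page_size: int = PAGE_SIZE) -> bool:
    """True if a contiguous request of `size` bytes stays below the order limit."""
    return order_for(size, page_size) < max_zoneorder


if __name__ == "__main__":
    hypothetical_request = 2 * 1024 * 1024  # illustrative size only
    for name, zoneorder in (("s390x", 9), ("other platforms", 11)):
        print(
            f"{name}: limit {max_contiguous_bytes(zoneorder) // 1024} KiB, "
            f"2 MiB request fits: {allocation_fits(hypothetical_request, zoneorder)}"
        )
```

Under these assumptions a limit of 9 caps a single contiguous allocation at 1 MiB, while a limit of 11 allows 4 MiB, so a request that succeeds elsewhere can trip the warning on s390x.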
[Kernel-packages] [Bug 1874058] Comment bridged from LTC Bugzilla
--- Comment From niklas.schne...@ibm.com 2020-05-20 08:55 EDT---
Verified working with proposed kernel!

Status in Ubuntu on IBM z Systems: In Progress
Status in linux package in Ubuntu: In Progress
Status in linux source package in Focal: In Progress
[Kernel-packages] [Bug 1874058] Comment bridged from LTC Bugzilla
--- Comment From niklas.schne...@ibm.com 2020-04-27 04:31 EDT---
The commit has landed upstream in v5.7-rc3 as:
a019b36123aec9700b21ae0724710f62928a8bc1 ("net/mlx5: Fix failing fw tracer allocation on s390")

Status in Ubuntu on IBM z Systems: Incomplete
Status in linux package in Ubuntu: New
[Kernel-packages] [Bug 1874058] Comment bridged from LTC Bugzilla
--- Comment From niklas.schne...@ibm.com 2020-04-22 10:06 EDT---
I do, though for this commit this of course also depends on the Mellanox maintainers. The commit has a Fixes tag, so I think it should hopefully be picked up by auto selection.

Status in Ubuntu on IBM z Systems: Incomplete
Status in linux package in Ubuntu: New
[Kernel-packages] [Bug 1874058] Comment bridged from LTC Bugzilla
--- Comment From niklas.schne...@ibm.com 2020-04-22 09:08 EDT---
The bug described in this particular Bugzilla is fixed by "net/mlx5: Fix failing fw tracer allocation on s390". I just wanted to point to the thread because it also contains the note that the patches were added to David Miller's tree.

Status in Ubuntu on IBM z Systems: Incomplete
Status in linux package in Ubuntu: New
[Kernel-packages] [Bug 1874058] Comment bridged from LTC Bugzilla
--- Comment From niklas.schne...@ibm.com 2020-04-22 04:59 EDT---

---Problem Description---
Using the mlx5 device driver on Ubuntu 20.04 (beta), the alloc_pages_nodemask code generates a stack trace when initializing a device. The driver tries to allocate more contiguous memory than is allowed by the platform specific FORCE_MAX_ZONEORDER setting.

FORCE_MAX_ZONEORDER on s390x: 9
FORCE_MAX_ZONEORDER on other platforms: 11 or more

This issue only occurs on ConnectX5 devices because the mlx5_fw_tracer code is only used for physical functions.

---Additional Hardware Info---
Z15 partition with Mojave (ConnectX5) adapter

---uname output---
Linux pok1-qz1-sr1-rk011-s21 5.4.0-14-generic #17-Ubuntu SMP Thu Feb 6 22:46:43 UTC 2020 s390x s390x s390x GNU/Linux

Machine Type = Z15

---Debugger---
A debugger is not configured

---Steps to Reproduce---
Start a partition with a Mojave (ConnectX5) adapter

Stack trace output:
[ 331.531813] [ cut here ]
[ 331.531819] WARNING: CPU: 7 PID: 2156 at mm/page_alloc.c:4727 __alloc_pages_nodemask+0x25c/0x320
[ 331.531820] Modules linked in: mlx5_core(+) mlxfw tls ptp pps_core s390_trng chsc_sch vfio_ccw vfio_mdev mdev eadm_sch vfio_iommu_type1 vfio sch_fq_codel ip_tables x_tables btrfs zstd_compress zlib_deflate raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear dm_service_time pkey zcrypt crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 qeth_l2 sha512_s390 sha256_s390 sha1_s390 sha_common zfcp qeth scsi_transport_fc qdio ccwgroup scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
[ 331.531833] CPU: 7 PID: 2156 Comm: systemd-udevd Not tainted 5.4.0-14-generic #17-Ubuntu
[ 331.531833] Hardware name: IBM 8562 GT2 A00 (LPAR)
[ 331.531834] Krnl PSW : 0704c0018000 735d720c (__alloc_pages_nodemask+0x25c/0x320)
[ 331.531836]  R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 331.531837] Krnl GPRS: 7418d687 00040dc0 00040dc0 000a
[ 331.531837]  000a 03ff8042607e
[ 331.531838]  0dc0 002203b0 000a 0001c9480120
[ 331.531838]  0001ecda4400 0055 03e001943680 03e001943600
[ 331.531844] Krnl Code:
  735d7200: a7212000 tmll %r2,8192
  735d7204: a774ff87 brc 7,735d7112
 #735d7208: a7f40001 brc 15,735d720a
 >735d720c: a789 lghi %r8,0
  735d7210: a7f4ff83 brc 15,735d7116
  735d7214: a718 lhi %r1,0
  735d7218: a7f4ff1b brc 15,735d704e
  735d721c: e3100344 lg %r1,832
[ 331.531851] Call Trace:
[ 331.531852] ([<0201>] 0x201)
[ 331.531856] [<735a20c4>] kmalloc_order+0x34/0xb0
[ 331.531856] [<735a2172>] kmalloc_order_trace+0x32/0xe0
[ 331.531880] [<03ff8042607e>] mlx5_fw_tracer_create+0x3e/0x500 [mlx5_core]
[ 331.531899] [<03ff803ffa88>] mlx5_init_once+0x148/0x3c0 [mlx5_core]
[ 331.531917] [<03ff8040152a>] mlx5_load_one+0x7a/0x240 [mlx5_core]
[ 331.531935] [<03ff804018d8>] init_one+0x1e8/0x310 [mlx5_core]
[ 331.531939] [<73916e16>] local_pci_probe+0x56/0xc0
[ 331.531941] [<73917ef2>] pci_device_probe+0x132/0x1e0
[ 331.531942] [<739a1374>] really_probe+0xf4/0x460
[ 331.531943] [<739a1a60>] driver_probe_device+0x130/0x190
[ 331.531944] [<739a1dae>] device_driver_attach+0x7e/0xa0
[ 331.531945] [<739a1e86>] __driver_attach+0xb6/0x180
[ 331.531947] [<7399eae2>] bus_for_each_dev+0x82/0xc0
[ 331.531948] [<739a030a>] bus_add_driver+0x16a/0x260
[ 331.531949] [<739a2b38>] driver_register+0x88/0x150
[ 331.531967] [<03ff80362080>] init+0x80/0xb0 [mlx5_core]
[ 331.531968] [<733648bc>] do_one_initcall+0x3c/0x200
[ 331.531970] [<73495fc0>] do_init_module+0x70/0x270
[ 331.531970] [<734983b2>] load_module+0x1142/0x1440
[ 331.531971] [<734988e4>] __do_sys_finit_module+0xa4/0xf0
[ 331.531973] [<73c54ec2>] system_call+0x2a6/0x2c8
[ 331.531974] Last Breaking-Event-Address:
[ 331.531975] [<735d7208>] __alloc_pages_nodemask+0x258/0x320
[ 331.531975] ---[ end trace 5985b580c6dbfd3e ]---

Oops output:
[ 331.244901] pci 0100:00:00.0: [15b3:1019] type 00 class 0x02
[ 331.245195] pci 0100:00:00.0: reg 0x10: [mem 0xc000-0xc1ff 64bit pref]
[ 331.245479] pci 0100:00:00.0: reg 0x30: [mem 0x-0x000f pref]
[ 331.245518] pci 0100:00:00.0: enabling Extended Tags
[ 331.246291] pci 0100:00:00.0: PME# supported from D3cold
[ 331.246619] pci 0100:00:00.0: reg 0x1a4: [mem