[Group.of.nepali.translators] [Bug 1824864] Re: CONFIG_LOG_BUF_SHIFT set to 14 is too low on arm64
** Changed in: linux (Ubuntu Cosmic)
       Status: Fix Committed => Invalid

-- 
You received this bug notification because you are a member of नेपाली भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1824864

Title:
  CONFIG_LOG_BUF_SHIFT set to 14 is too low on arm64

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Invalid
Status in linux source package in Disco:
  Fix Released
Status in linux source package in Eoan:
  Fix Released

Bug description:
  [Impact]

   * A kernel log (dmesg) ring buffer that is too small leads to losing early-boot kernel messages that are emitted before journald starts reading them and storing them on disk. This results in messages similar to this one on boot: "missed NN kernel messages on boot". The problem is especially pronounced on arm64, where the default setting is far lower than on any other 32-bit or 64-bit architecture we ship. amd64 has the highest setting, 18, among all architectures we ship. The best course of action is to bump all 64-bit arches to 18 and keep all 32-bit arches at the current upstream default of 17.

  [Test Case]

   * $ cat /boot/config-`uname -r` | grep CONFIG_LOG_BUF_SHIFT
     on 64-bit arches the result should be: CONFIG_LOG_BUF_SHIFT=18
     on 32-bit arches the result should be: CONFIG_LOG_BUF_SHIFT=17

   * Run the systemd adt test; the boot-and-services test case should not fail the journald tests with "missed kernel messages" visible in the error logs.

  [Regression Potential]

   * Increasing the size of log_buf increases kernel memory usage that cannot be reclaimed. It becomes 256 KB on arm64, ppc64el and s390x instead of 16 KB/128 KB/128 KB respectively. 32-bit arches remain unchanged at 128 KB.
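  The buffer sizes quoted in this report follow from the log buffer being 2 to the power of CONFIG_LOG_BUF_SHIFT bytes. As a rough illustration only (this is not part of the original report, and the helper names log_buf_bytes and parse_log_buf_shift are hypothetical), a small Python sketch can read the setting out of kernel config text and convert it to a size:

```python
import re
from typing import Optional


def log_buf_bytes(shift: int) -> int:
    """Return the log buffer size in bytes for a given CONFIG_LOG_BUF_SHIFT."""
    return 1 << shift


def parse_log_buf_shift(config_text: str) -> Optional[int]:
    """Extract CONFIG_LOG_BUF_SHIFT from kernel config text, or None if absent."""
    match = re.search(r"^CONFIG_LOG_BUF_SHIFT=(\d+)$", config_text, re.MULTILINE)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Shift values mentioned in the report: old arm64 value, 32-bit default,
    # and the proposed value for 64-bit arches.
    for shift in (14, 17, 18):
        print(f"shift {shift}: {log_buf_bytes(shift) // 1024} KB")
```

  The config text would normally come from /boot/config-`uname -r`, as in the test case; passing it in as a string keeps the sketch independent of the machine it runs on.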
  [Other Info]

   * Original bug report

  CONFIG_LOG_BUF_SHIFT policy<{
  'amd64'  : '18',
  'arm64'  : '14',
  'armhf'  : '17',
  'i386'   : '17',
  'ppc64el': '17',
  's390x'  : '17'}>

  Please set CONFIG_LOG_BUF_SHIFT to at least 17 on arm64. Potentially bump all 64-bit arches to 18 (or higher!) as was done on amd64, meaning set 18 on arm64, s390x and ppc64el.

  I have a systemd autopkgtest test that asserts that we see the Linux kernel command line in the dmesg output (journalctl -k -b). It is consistently failing on arm64 scalingstack KVM EFI machines with messages of "missing 81 kernel messages".

  config LOG_BUF_SHIFT
          int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
          range 12 25
          default 17
          depends on PRINTK
          help
            Select the minimal kernel log buffer size as a power of 2.
            The final size is affected by LOG_CPU_MAX_BUF_SHIFT config
            parameter, see below. Any higher size also might be forced
            by "log_buf_len" boot parameter.

            Examples:
                       17 => 128 KB
                       16 => 64 KB
                       15 => 32 KB
                       14 => 16 KB
                       13 =>  8 KB
                       12 =>  4 KB

  14 sounds ridiculously low for arm64, given that 17 is the default across 32-bit arches and 18 is the default on amd64.

  On a related note, we have
  CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT policy<{'amd64': '13', 'arm64': '13', 'armhf': '13', 'i386': '13', 'ppc64el': '13', 's390x': '13'}>

  I'm not sure if we want to bump these up to the LOG_BUF_SHIFT size or not.

  Please backport this to xenial and up.

  === systemd ===

  In systemd, the boot-and-services test case can bump the ring buffer before running the tests.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1824864/+subscriptions

___
Mailing list: https://launchpad.net/~group.of.nepali.translators
Post to     : group.of.nepali.translators@lists.launchpad.net
Unsubscribe : https://launchpad.net/~group.of.nepali.translators
More help   : https://help.launchpad.net/ListHelp
[Group.of.nepali.translators] [Bug 1824687] Re: 4.4.0-145-generic Kernel Panic ip6_expire_frag_queue
** Changed in: linux (Ubuntu Cosmic)
       Status: Incomplete => Invalid

-- 
https://bugs.launchpad.net/bugs/1824687

Title:
  4.4.0-145-generic Kernel Panic ip6_expire_frag_queue

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Cosmic:
  Invalid
Status in linux source package in Disco:
  Triaged

Bug description:
  [SRU Justification]

  == Impact ==

  Since 05c0b86b96 "ipv6: frags: rewrite ip6_expire_frag_queue()" the 16.04/4.4 kernel crashes whenever that function gets called (on busy systems this can be every 3-4 hours). While this potentially affects Cosmic and later, too, the fix differs on later kernels (Bionic is not yet affected as it does not yet carry updates to the frags handling).

  == Fix ==

  For Xenial and Cosmic, the proposed fix would be additional changes to ip6_expire_frag_queue(), taken from follow-up changes to ip_expire(). For Disco, I would hold back because we have a backlog of stable patches there and, depending on what got backported to 5.0.y, there would be a simpler fix. For current development kernels, one just needs to ensure that the following upstream change is included: 47d3d7fdb10a "ip6: fix skb leak in ip6frag_expire_frag_queue()".

  == Testcase ==

  Unfortunately this could not be re-created locally, but a test kernel with the proposed fix applied tested well in production (see comments #37 and #38).

  == Risk of Regression ==

  The modified function is only called in rare cases and the positive testing in production would cover this. So I would consider it low.

  ---

  Description: Ubuntu 16.04.6 LTS
  Release: 16.04

  After upgrading our server to this kernel we experience frequent kernel panics (see attachment), about every 3 hours. Our machine has a throughput of about 600 Mbit/s.

  The panics are around the area of ip6_expire_frag_queue:

  __pskb_pull_tail
  ip6_dst_lookup_tail
  _decode_session6
  __xfrm_decode_session
  icmpv6_route_lookup
  icmp6_send

  It seems similar to a bug report in Debian:
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=922488

  According to the reporter of that bug, it also occurred after switching to a kernel containing the ip6_expire_frag_queue() rewrite.

  Intermediate solution: we disabled IPv6 on this machine to avoid further panics.

  Please let me know what information is missing. The "ubuntu-bug linux" report was sent, and I hope it is attached to this report.

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: linux-image-4.4.0-145-generic 4.4.0-145.171
  ProcVersionSignature: Ubuntu 4.4.0-145.171-generic 4.4.176
  Uname: Linux 4.4.0-145-generic x86_64
  ApportVersion: 2.20.1-0ubuntu2.18
  Architecture: amd64
  Date: Sun Apr 14 11:40:11 2019
  InstallationDate: Installed on 2018-03-18 (391 days ago)
  InstallationMedia: Ubuntu-Server 16.04.4 LTS "Xenial Xerus" - Release amd64 (20180228)
  ProcEnviron:
   LANGUAGE=en_GB:en
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=en_GB.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux-signed
  UpgradeStatus: Upgraded to xenial on 2018-10-21 (174 days ago)
  ---
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Apr 12 21:04 seq
   crw-rw 1 root audio 116, 33 Apr 12 21:04 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.20.1-0ubuntu2.18
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 16.04
  HibernationDevice: RESUME=/dev/mapper/tor3--vg-swap_1
  InstallationDate: Installed on 2018-03-18 (393 days ago)
  InstallationMedia: Ubuntu-Server 16.04.4 LTS "Xenial Xerus" - Release amd64 (20180228)
  IwConfig: Error: [Errno 2] No such file or directory
  Lsusb:
   Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
   Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
   Bus 001 Device 003: ID 0557:2221 ATEN International Co., Ltd Winbond Hermon
   Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F
  Package: linux (not installed)
  PciMultimedia:
  ProcEnviron:
   LANGUAGE=en_GB:en
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=en_GB.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 VESA VGA
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-145-generic root=/dev/mapper/hostname--vg-root ro
  ProcVersionSignature: Ubuntu 4.4.0-145.171-generic 4.4.176
  RelatedPackageVersions:
   linux-restricted-modules-4.4.0-145-generic N/A
   linux-backports-modules-4.4.0-145-generic  N/A
   linux-firmware
[Group.of.nepali.translators] [Bug 1835322] Re: [linux-azure] panic in ext4_resize_fs() found during storage testing
** Changed in: linux-azure (Ubuntu Cosmic)
       Status: Fix Committed => Invalid

-- 
https://bugs.launchpad.net/bugs/1835322

Title:
  [linux-azure] panic in ext4_resize_fs() found during storage testing

Status in linux-azure package in Ubuntu:
  Fix Released
Status in linux-azure source package in Xenial:
  Fix Released
Status in linux-azure source package in Cosmic:
  Invalid

Bug description:
  A panic was observed during file system testing. The trace is the following:

  [ 8783.243586] kernel BUG at /build/linux-azure-3iFJ9j/linux-azure-4.18.0/fs/ext4/resize.c:266!
  [ 8783.252751] invalid opcode: [#1] SMP PTI
  [ 8783.256735] CPU: 7 PID: 39476 Comm: resize2fs Not tainted 4.18.0-1023-azure #24~18.04.1-Ubuntu
  [ 8783.256735] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
  [ 8783.256735] RIP: 0010:ext4_resize_fs+0x73b/0xf10
  [ 8783.256735] Code: 50 ff ff ff 41 8b 75 10 4d 8b 65 00 85 f6 0f 94 c0 4d 85 e4 0f 94 c1 09 c8 83 bd 5c ff ff ff 01 7e 48 84 c0 0f 84 43 06 00 00 <0f> 0b 48 c7 c2 68 a7 8d 8f 48 c7 c6 00 fb 88 8f 4c 89 f7 e8 0d f8
  [ 8783.256735] RSP: 0018:984e8dce7cb0 EFLAGS: 00010202
  [ 8783.256735] RAX: 00205c01 RBX: 001f RCX:
  [ 8783.256735] RDX: 8b1dbe1367d0 RSI: RDI:
  [ 8783.256735] RBP: 984e8dce7d88 R08: 984e8dce7d4c R09: 984e8dce7d54
  [ 8783.256735] R10: 0120 R11: 0001 R12: 8b1dbe136800
  [ 8783.256735] R13: 8b1d74aefe80 R14: 8b1dbdeb9000 R15:
  [ 8783.256735] FS: 7f213fed30c0() GS:8b1ded7c() knlGS:
  [ 8783.256735] CS: 0010 DS: ES: CR0: 80050033
  [ 8783.256735] CR2: 556aa08ae9b8 CR3: 001b8e324005 CR4: 003606e0
  [ 8783.256735] DR0: DR1: DR2:
  [ 8783.256735] DR3: DR6: fffe0ff0 DR7: 0400
  [ 8783.256735] Call Trace:
  [ 8783.256735] ? security_capable+0x3c/0x60
  [ 8783.256735] ext4_ioctl+0xf91/0x14d0
  [ 8783.256735] ? audit_filter_rules.constprop.14+0x325/0xf90
  [ 8783.256735] ? audit_filter_rules.constprop.14+0x24b/0xf90
  [ 8783.256735] do_vfs_ioctl+0xa8/0x630
  [ 8783.256735] ksys_ioctl+0x75/0x80
  [ 8783.256735] __x64_sys_ioctl+0x1a/0x20
  [ 8783.256735] do_syscall_64+0x6a/0x1a0
  [ 8783.256735] entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 8783.256735] RIP: 0033:0x7f213f3825d7
  [ 8783.256735] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
  [ 8783.256735] RSP: 002b:7ffe8effd688 EFLAGS: 0246 ORIG_RAX: 0010
  [ 8783.256735] RAX: ffda RBX: 556aa08aa980 RCX: 7f213f3825d7
  [ 8783.256735] RDX: 7ffe8effd7d0 RSI: 40086610 RDI: 0004
  [ 8783.256735] RBP: 0004 R08: R09:
  [ 8783.256735] R10: R11: 0246 R12: 556aa08ac980
  [ 8783.256735] R13: 7ffe8effd7d0 R14: 556aa08a92d0 R15:

  This issue is resolved by the following upstream commit:
  f96c3ac8dfc2 ("ext4: fix crash during online resizing")

  Commit f96c3ac8dfc2 is in mainline as of v5.1-rc1.

  This commit was requested in the upstream stable kernels. However, the Ubuntu kernels are EOL upstream. Please include this commit in the 16.04 and 18.04 linux-azure kernels.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1835322/+subscriptions
[Group.of.nepali.translators] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer
** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
       Status: New

-- 
https://bugs.launchpad.net/bugs/1814095

Title:
  bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  New

Bug description:
  [Impact]

  The bnxt_en_bpo driver experienced TX timeouts, causing the system to experience network stalls and fail to send data and heartbeat packets.

  The following 25Gb Broadcom NIC error was seen (just once) on Xenial running the 4.4.0-141-generic kernel on an amd64 host under moderate-to-heavy network traffic:

  * The bnxt_en_bpo driver froze on a "TX timed out" error and triggered the Netdev Watchdog timer under load.

  * From the kernel log:
    "NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out"
    See the attached kern.log excerpt file for the full error log.

  * Release = Xenial
    Kernel = 4.4.0-141-generic #167
    eno2d1 = Product Name: Broadcom Adv. Dual 25Gb Ethernet

  * This caused the driver to reset in order to recover:
    "bnxt_en_bpo :19:00.1 eno2d1: TX timeout detected, starting reset task!"

    driver: bnxt_en_bpo
    version: 1.8.1
    source: ubuntu/bnxt/bnxt.c: bnxt_tx_timeout()

  * The loss of connectivity and the softirq stall caused other failures on the system.

  * The bnxt_en_bpo driver is the imported Broadcom driver pulled in to support newer Broadcom hardware (specific boards), while the bnxt_en module continues to support the older hardware. The current Linux upstream driver does not compile easily with the 4.4 kernel (too many changes).

  * This fix from the upstream bnxt_en driver is a likely solution:
    "bnxt_en: Fix TX timeout during netpoll"
    commit: 73f21c653f930f438d53eed29b5e4c65c8a0f906

    This fix has not been applied to the bnxt_en_bpo driver version, but a review of the code indicates that it is susceptible to the bug, and the fix would be reasonable.

  [Test Case]

  * Unfortunately, this is not easy to reproduce. It is also only seen on 4.4 kernels with newer Broadcom NICs supported by the bnxt_en_bpo driver.

  [Regression Potential]

  * The patch is restricted to the bpo driver, with a very constrained scope: just the newest Broadcom NICs being used by the Xenial 4.4 kernel (as opposed to the hwe 4.15 etc. kernels, which would have the fixed in-tree driver).

  * The patch is very small, and the backport is fairly minimal and simple.

  * The fix has been running on the in-tree driver in upstream mainline as well as on the Ubuntu Linux in-tree driver. Although much of the lower-level code in the Broadcom driver is different, this piece is still the same.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1814095/+subscriptions