[Kernel-packages] [Bug 1970672] Re: makedumpfile falls back to cp with "__vtop4_x86_64: Can't get a valid pmd_pte."
For item 1:

* Confirm that makedumpfile works as expected by triggering a kdump.

I can confirm that makedumpfile 1:1.6.7-1ubuntu2.5 from focal-proposed/main worked well when I triggered a dump in a system:

ubuntu@fabio-small-makedumpfile:~$ sudo hostnamectl
Static hostname: fabio-small-makedumpfile
Icon name: computer-vm
Chassis: vm
Machine ID: dee0adfb9aa54246b4d1e2fc62dd50f7
Boot ID: adba6ba3977f4c758a7008013a7a6d1e
Virtualization: oracle
Operating System: Ubuntu 20.04.6 LTS
Kernel: Linux 5.15.0-1049-oracle
Architecture: x86-64

ubuntu@fabio-small-makedumpfile:~$ sudo kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0x2c00 0xfd7f00
/boot/vmlinuz-5.15.0-1049-oracle
kdump initrd:
/boot/initrd.img-5.15.0-1049-oracle
current state: ready to kdump
kexec command:
/sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-5.15.0-1049-oracle root=UUID=7d8611b4-d3e7-4f1a-a8f9-e1a7e5a2d2f9 ro console=tty1 console=ttyS0 nvme.shutdown_timeout=10 libiscsi.debug_libiscsi_eh=1 crash_kexec_post_notifiers reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1 irqpoll nousb" --initrd=/boot/initrd.img-5.15.0-1049-oracle /boot/vmlinuz-5.15.0-1049-oracle

ubuntu@fabio-small-makedumpfile:~$ sudo dpkg -l makedumpfile
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==-==--=
ii makedumpfile 1:1.6.7-1ubuntu2.5 amd64 VMcore extraction tool

ubuntu@fabio-small-makedumpfile:~$ sudo apt-cache policy makedumpfile
makedumpfile:
Installed: 1:1.6.7-1ubuntu2.5
Candidate: 1:1.6.7-1ubuntu2.5
Version table:
*** 1:1.6.7-1ubuntu2.5 500
500 http://archive.ubuntu.com/ubuntu focal-proposed/main amd64 Packages
100 /var/lib/dpkg/status
1:1.6.7-1ubuntu2.4 500
500 http://phx-ad-3.clouds.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
1:1.6.7-1ubuntu2 500 500 http://phx-ad-3.clouds.archive.ubuntu.com/ubuntu focal/main amd64 Packages Output showing that it completed well: [ 54.490112] kdump-tools[676]: Starting kdump-tools: [ 54.876357] kdump-tools[686]: * running makedumpfile -c -d 31 /proc/vmcore /var/crash/202312151524/dump-incomplete Checking for memory holes : [100.0 %] \ [ 204.391465] reboot: Restarting system And when I look at the crash, it's properly compressed (system had 1TB of RAM): ubuntu@fabio-small-makedumpfile:~$ ls -lh /var/crash/202312151524 total 2.3G -rw--- 1 root root 126K Dec 15 15:26 dmesg.202312151524 -rw--- 1 root root 2.3G Dec 15 15:26 dump.202312151524 Regards, Fabio Martins -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to makedumpfile in Ubuntu. https://bugs.launchpad.net/bugs/1970672 Title: makedumpfile falls back to cp with "__vtop4_x86_64: Can't get a valid pmd_pte." Status in makedumpfile package in Ubuntu: Fix Released Status in makedumpfile source package in Focal: Fix Committed Bug description: [Impact] * On Focal with an HWE (>=5.12) kernel, makedumpfile can sometimes fail with "__vtop4_x86_64: Can't get a valid pmd_pte." * makedumpfile falls back to cp for the dump, resulting in extremely large vmcores. This can impact both collection and analysis due to lack of space for the resulting vmcore. * This is fixed in upstream commit present in versions 1.7.0 and 1.7.1: https://github.com/makedumpfile/makedumpfile/commit/646456862df8926ba10dd7330abf3bf0f887e1b6 commit 646456862df8926ba10dd7330abf3bf0f887e1b6 Author: Kazuhito Hagio Date: Wed May 26 14:31:26 2021 +0900 [PATCH] Increase SECTION_MAP_LAST_BIT to 5 * Required for kernel 5.12 Kernel commit 1f90a3477df3 ("mm: teach pfn_to_online_page() about ZONE_DEVICE section collisions") added a section flag (SECTION_TAINT_ZONE_DEVICE) and causes makedumpfile an error on some machines like this: __vtop4_x86_64: Can't get a valid pmd_pte. 
readmem: Can't convert a virtual address(e2bdc200) to physical address. readmem: type_addr: 0, addr:e2bdc200, size:32768 __exclude_unnecessary_pages: Can't read the buffer of struct page. create_2nd_bitmap: Can't exclude unnecessary pages. Increase SECTION_MAP_LAST_BIT to 5 to fix this. The bit had not been used until the change, so we can just increase the value. Signed-off-by: Kazuhito Hagio [Test Plan] * Confirm that makedumpfile works as expected by triggering a kdump. * Confirm that the p
[Kernel-packages] [Bug 1970672] Re: makedumpfile falls back to cp with "__vtop4_x86_64: Can't get a valid pmd_pte."
For item 2:

* Confirm that the patched makedumpfile works as expected on a system known to experience the issue.

Unfortunately, I'm no longer able to reproduce the original issue. Even on the same hardware where this was originally noticed, with the same kernel version (5.13.0-1027-oracle), makedumpfile from focal-updates/main (1:1.6.7-1ubuntu2.4) works fine:

[ 53.223512] kdump-tools[693]: Starting kdump-tools:
[ 53.623944] kdump-tools[702]: * running makedumpfile -c -d 31 /proc/vmcore /var/crash/202312151415/dump-incomplete
Copying data : [ 196.965120] reboot: Restarting system [ 22.0 %] |

I no longer have access to the original system and have no record of which makedumpfile version it was running back then, so I cannot test the exact same makedumpfile+kernel combination.

I also tested kernel 5.13.0-1027-oracle + makedumpfile 1:1.6.7-1ubuntu2 from focal/main. In this combination, makedumpfile fails with a similar, but slightly different, error and then falls back to cp:

[ 53.721130] kdump-tools[690]: Starting kdump-tools:
[ 54.121624] kdump-tools[699]: * running makedumpfile -c -d 31 /proc/vmcore /var/crash/202312151434/dump-incomplete
[ 54.249624] kdump-tools[719]: get_mm_sparsemem: Can't get the address of mem_section.
[ 54.345410] kdump-tools[719]: The kernel version is not supported.
[ 54.425405] kdump-tools[719]: The makedumpfile operation may be incomplete.
[ 54.517391] kdump-tools[719]: makedumpfile Failed.
[ 54.577916] kdump-tools[699]: * kdump-tools: makedumpfile failed, falling back to 'cp'

However, using the latest makedumpfile from focal-updates/main (1:1.6.7-1ubuntu2.4) fixes this situation, as shown above.

For this reason, I can't conclude item 2. I'll work on item 1 now.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1970672 Title: makedumpfile falls back to cp with "__vtop4_x86_64: Can't get a valid pmd_pte." Status in makedumpfile package in Ubuntu: Fix Released Status in makedumpfile source package in Focal: Fix Committed Bug description: [Impact] * On Focal with an HWE (>=5.12) kernel, makedumpfile can sometimes fail with "__vtop4_x86_64: Can't get a valid pmd_pte." * makedumpfile falls back to cp for the dump, resulting in extremely large vmcores. This can impact both collection and analysis due to lack of space for the resulting vmcore. * This is fixed in upstream commit present in versions 1.7.0 and 1.7.1: https://github.com/makedumpfile/makedumpfile/commit/646456862df8926ba10dd7330abf3bf0f887e1b6 commit 646456862df8926ba10dd7330abf3bf0f887e1b6 Author: Kazuhito Hagio Date: Wed May 26 14:31:26 2021 +0900 [PATCH] Increase SECTION_MAP_LAST_BIT to 5 * Required for kernel 5.12 Kernel commit 1f90a3477df3 ("mm: teach pfn_to_online_page() about ZONE_DEVICE section collisions") added a section flag (SECTION_TAINT_ZONE_DEVICE) and causes makedumpfile an error on some machines like this: __vtop4_x86_64: Can't get a valid pmd_pte. readmem: Can't convert a virtual address(e2bdc200) to physical address. readmem: type_addr: 0, addr:e2bdc200, size:32768 __exclude_unnecessary_pages: Can't read the buffer of struct page. create_2nd_bitmap: Can't exclude unnecessary pages. Increase SECTION_MAP_LAST_BIT to 5 to fix this. The bit had not been used until the change, so we can just increase the value. Signed-off-by: Kazuhito Hagio [Test Plan] * Confirm that makedumpfile works as expected by triggering a kdump. * Confirm that the patched makedumpfile works as expected on a system known to experience the issue. * Confirm that the patched makedumpfile is able to work with a cp- generated known affected vmcore to compress it. The unpatched version fails. 
[Where problems could occur] * This change could adversely affect the collection/compression of vmcores during a kdump situation resulting in fallback to cp. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/makedumpfile/+bug/1970672/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
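The effect described in the quoted commit message can be illustrated with a small sketch. It rests on assumptions that are not stated in this thread: that the low bits of a section's mem_map pointer carry flags, that SECTION_TAINT_ZONE_DEVICE occupies bit 4 on kernels >= 5.12, and that a reader strips the flags with a mask derived from SECTION_MAP_LAST_BIT. The pointer value and the helper names are hypothetical and are not taken from makedumpfile's source.

```python
# Illustrative sketch only; the flag layout is an assumption based on
# the kernel's sparsemem code for versions >= 5.12.
SECTION_TAINT_ZONE_DEVICE = 1 << 4  # assumed bit position of the new flag
U64 = 0xFFFFFFFFFFFFFFFF


def section_map_mask(last_bit_shift: int) -> int:
    """Return a 64-bit mask that clears all bits below 1 << last_bit_shift."""
    return ~((1 << last_bit_shift) - 1) & U64


def decode_mem_map(encoded: int, last_bit_shift: int) -> int:
    """Strip the flag bits from an encoded mem_map pointer."""
    return encoded & section_map_mask(last_bit_shift)


# Hypothetical mem_map address with the four older flags and the new flag set.
real_addr = 0xFFFFEA0000000000
encoded = real_addr | 0xF | SECTION_TAINT_ZONE_DEVICE

old_decode = decode_mem_map(encoded, 4)  # mask too narrow: bit 4 leaks through
new_decode = decode_mem_map(encoded, 5)  # mask widened by one bit

print(hex(old_decode))
print(hex(new_decode))
```

Under these assumptions, the narrower mask leaves the new flag inside the decoded address, which would plausibly lead to the failed virtual-to-physical translation reported in the error messages, while the wider mask recovers the original address.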
[Kernel-packages] [Bug 1970672] Re: makedumpfile falls back to cp with "__vtop4_x86_64: Can't get a valid pmd_pte."
Hi Chris,

You're correct, I'm sorry. My test on comment #23 is the 3rd item you listed. Let me work on 1 and 2 and I'll get back here.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1970672
[Kernel-packages] [Bug 1970672] Re: makedumpfile falls back to cp with "__vtop4_x86_64: Can't get a valid pmd_pte."
I've tested makedumpfile from -proposed on Focal and it looks good to me. Using a vmcore file with 2TB as an input: - Original makedumpfile 1.6.7-1ubuntu2.4 fails: ubuntu@kdump-instance:~$ sudo apt-cache policy makedumpfile makedumpfile: Installed: 1:1.6.7-1ubuntu2.4 Candidate: 1:1.6.7-1ubuntu2.4 Version table: *** 1:1.6.7-1ubuntu2.4 500 500 http://phx-ad-3.clouds.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages 100 /var/lib/dpkg/status 1:1.6.7-1ubuntu2 500 500 http://phx-ad-3.clouds.archive.ubuntu.com/ubuntu focal/main amd64 Packages ubuntu@kdump-instance:/mnt/202204202351$ makedumpfile -c -d 31 ./vmcore.202204202351 ./dump-incomplete-fabio The kernel version is not supported. The makedumpfile operation may be incomplete. Checking for memory holes : [100.0 %] / __vtop4_x86_64: Can't get a valid pmd_pte. readmem: Can't convert a virtual address(ecff8180) to physical address. readmem: type_addr: 0, addr:ecff8180, size:32768 __exclude_unnecessary_pages: Can't read the buffer of struct page. create_2nd_bitmap: Can't exclude unnecessary pages. makedumpfile Failed. - Makedumpfile 1.6.7-1ubuntu2.5 from proposed works: ubuntu@kdump-instance:~$ sudo apt-cache policy makedumpfile makedumpfile: Installed: 1:1.6.7-1ubuntu2.5 Candidate: 1:1.6.7-1ubuntu2.5 Version table: *** 1:1.6.7-1ubuntu2.5 500 500 http://archive.ubuntu.com/ubuntu focal-proposed/main amd64 Packages 100 /var/lib/dpkg/status 1:1.6.7-1ubuntu2.4 500 500 http://phx-ad-3.clouds.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages 1:1.6.7-1ubuntu2 500 500 http://phx-ad-3.clouds.archive.ubuntu.com/ubuntu focal/main amd64 Packages ubuntu@kdump-instance:/mnt/202204202351$ makedumpfile -c -d 31 ./vmcore.202204202351 ./dump-incomplete-fabio The kernel version is not supported. The makedumpfile operation may be incomplete. Copying data : [100.0 %] - eta: 0s The dumpfile is saved to ./dump-incomplete-fabio. makedumpfile Completed. 
It reduced the dump file from 2TB down to 4.5G:

ubuntu@kdump-instance:/mnt/202204202351$ ls -lh vmcore.202204202351
-r 1 ubuntu ubuntu 2.0T Apr 21 2022 vmcore.202204202351
ubuntu@kdump-instance:/mnt/202204202351$ ls -lh dump-incomplete-fabio
-rw--- 1 ubuntu ubuntu 4.5G Dec 12 14:23 dump-incomplete-fabio

The vmcore in the comment reported by Heather was as large as the installed RAM because makedumpfile was being forced to fail, either by passing "-c -d 32" (a dump level that doesn't exist, as the maximum is 31) or by moving the makedumpfile binary away. kdump then falls back to cp, which produces a vmcore the size of the installed RAM.

Let me know if this is enough to have focal verification concluded.

** Tags removed: verification-failed-focal verification-needed
** Tags added: verification-done-focal

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1970672
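To make the "-d 31" versus "-d 32" point above concrete, here is a minimal sketch of how the dump level can be read as a bitmask. The bit meanings follow my understanding of the makedumpfile(8) manual page; the dictionary labels and the function name are my own and hypothetical.

```python
from typing import List

# Bit meanings as I understand them from makedumpfile(8); labels are mine.
DUMP_LEVEL_BITS = {
    1: "zero pages",
    2: "non-private cache pages",
    4: "private cache pages",
    8: "user process data pages",
    16: "free pages",
}
MAX_DUMP_LEVEL = sum(DUMP_LEVEL_BITS)  # all five bits set


def excluded_page_types(level: int) -> List[str]:
    """List the page types that a given -d level asks makedumpfile to drop."""
    if not 0 <= level <= MAX_DUMP_LEVEL:
        raise ValueError(f"dump level {level} is outside 0..{MAX_DUMP_LEVEL}")
    return [name for bit, name in DUMP_LEVEL_BITS.items() if level & bit]


print(MAX_DUMP_LEVEL)
print(excluded_page_types(31))
```

Level 31 sets every bit, so all five page types are excluded, which is why the filtered dump is so much smaller than the raw vmcore. Level 32 has no defined bit, so the sketch rejects it, mirroring the forced failure described above.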
[Kernel-packages] [Bug 2020319] Re: Encountering an issue with memcpy_fromio causing failed boot of SEV-enabled guest
Verified a Focal guest as follows:

1. Reproduced the problem with kernel 5.4.0-152-generic: https://pastebin.ubuntu.com/p/Cgj6j4Prbc/

2. As a workaround removed: 0x0003

3. Installed kernel from -proposed:

root@ubuntu:~# apt-cache policy linux-image-virtual linux-virtual
linux-image-virtual:
Installed: 5.4.0.154.151
Candidate: 5.4.0.154.151
Version table:
*** 5.4.0.154.151 500
500 http://archive.ubuntu.com/ubuntu focal-proposed/main amd64 Packages
100 /var/lib/dpkg/status
5.4.0.152.149 500
500 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages
5.4.0.26.32 500
500 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages
linux-virtual:
Installed: 5.4.0.154.151
Candidate: 5.4.0.154.151
Version table:
*** 5.4.0.154.151 500
500 http://archive.ubuntu.com/ubuntu focal-proposed/main amd64 Packages
100 /var/lib/dpkg/status
5.4.0.152.149 500
500 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages
5.4.0.26.32 500
500 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages

4. Added back: 0x0003

5. Instance booted fine:

ubuntu@ubuntu:~$ uname -a
Linux ubuntu 5.4.0-154-generic #171-Ubuntu SMP Fri Jun 16 16:29:04 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@ubuntu:~$ sudo dmesg | grep -i sev
[0.172491] AMD Secure Encrypted Virtualization (SEV) active
[5.318658] SVM: KVM is unsupported when running as an SEV guest

6. Full dmesg: https://paste.ubuntu.com/p/dP4Zp8pKfm/

** Tags removed: verification-needed-focal verification-needed-jammy
** Tags added: verification-done verification-done-focal verification-done-jammy

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2020319 Title: Encountering an issue with memcpy_fromio causing failed boot of SEV- enabled guest Status in linux package in Ubuntu: In Progress Status in linux source package in Bionic: New Status in linux source package in Focal: Fix Committed Status in linux source package in Jammy: Fix Committed Bug description: [Impact] When launching a SEV-enabled guest, the guest kernel panics with the following call trace, indicating a critical error in the system. == [1.090638] software IO TLB: Memory encryption is active and system is using DMA bounce buffers [1.092105] Linux agpgart interface v0.103 [1.092716] BUG: unable to handle page fault for address: 9b820003d068 [1.093445] #PF: supervisor read access in kernel mode [1.093966] #PF: error_code(0x) - not-present page [1.094481] PGD 80010067 P4D 80010067 PUD 8001001d7067 PMD 8001001da067 PTE 8000fed40173 [1.094629] Oops: [#1] SMP NOPTI [1.094629] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.15.0-46-generic #49-Ubuntu [1.094629] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 [1.094629] RIP: 0010:memcpy_fromio+0x27/0x50 [1.094629] Code: cc cc cc 0f 1f 44 00 00 55 48 89 e5 48 85 d2 74 28 40 f6 c6 01 75 30 48 83 fa 01 76 06 40 f6 c6 02 75 1c 48 89 d1 48 c1 e9 02 a5 f6 c2 02 74 02 66 a5 f6 c2 01 74 01 a4 5d e9 14 b3 97 00 66 [1.094629] RSP: 0018:9b820001ba50 EFLAGS: 00010212 [1.094629] RAX: 9b820003d040 RBX: 9b820001bac0 RCX: 0002 [1.094629] RDX: 0008 RSI: 9b820003d068 RDI: 9b820001ba90 [1.094629] RBP: 9b820001ba50 R08: 0f80 R09: 0f80 [1.094629] R10: fed40080 R11: 9b820001bac0 R12: 8cc7068eca48 [1.094629] R13: 8cc700a64288 R14: R15: fed40080 [1.094629] FS: () GS:8cc77bd0() knlGS: [1.094629] CS: 0010 DS: ES: CR0: 80050033 [1.094629] CR2: 9b820003d068 CR3: 800174a1 CR4: 00350ee0 [1.094629] Call Trace: [1.094629] [1.094629] crb_map_io+0x315/0x870 [1.094629] ? radix_tree_iter_tag_clear+0x12/0x20 [1.094629] ? 
_raw_spin_unlock_irqrestore+0xe/0x30 [1.094629] crb_acpi_add+0xc2/0x140 [1.094629] acpi_device_probe+0x4c/0x170 [1.094629] really_probe+0x222/0x420 [1.094629] __driver_probe_device+0x119/0x190 [1.094629] driver_probe_device+0x23/0xc0 [1.094629] __driver_attach+0xbd/0x1e0 [1.094629] ? __device_attach_driver+0x120/0x120 [1.094629] bus_for_each_dev+0x7e/0xd0 [1.094629] driver_attach+0x1e/0x30 [1.094629] bus_add_driver+0x139/0x200 [1.094629] driver_register+0x95/0x100 [1.094629] ? init_tis+0xfd/0xfd [1.094629] acpi_bus_register_driver+0x39/0x50
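The "dmesg | grep -i sev" check used in the guest verification reports in this thread can be scripted. The sketch below is only an illustration: the two marker strings are the ones quoted in those reports (one from the 5.4 guest, one from the 5.15 guest), while the function name and the idea of feeding it a list of lines are hypothetical choices of mine.

```python
from typing import Iterable

# Marker strings copied from the dmesg excerpts quoted in this thread.
SEV_MARKERS = (
    "AMD Secure Encrypted Virtualization (SEV) active",  # 5.4 guest
    "AMD Memory Encryption Features active: SEV",        # 5.15 guest
)


def sev_active(dmesg_lines: Iterable[str]) -> bool:
    """Return True if any line carries one of the known SEV markers."""
    return any(marker in line for line in dmesg_lines for marker in SEV_MARKERS)


sample = [
    "[0.172491] AMD Secure Encrypted Virtualization (SEV) active",
    "[5.318658] SVM: KVM is unsupported when running as an SEV guest",
]
print(sev_active(sample))
```

Other kernel versions may word the message differently, so an unmatched log is not proof that SEV is inactive.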
[Kernel-packages] [Bug 2020319] Re: Encountering an issue with memcpy_fromio causing failed boot of SEV-enabled guest
I've verified a Jammy guest as follows:

1. Reproduced the problem with kernel 5.15.0-75-generic: https://pastebin.ubuntu.com/p/844W5SzjR8/

2. As a workaround removed: 0x0003

3. Installed kernel from -proposed:

root@ubuntu:~# apt-cache policy linux-image-virtual linux-virtual
linux-image-virtual:
Installed: 5.15.0.77.75
Candidate: 5.15.0.77.75
Version table:
*** 5.15.0.77.75 500
500 http://archive.ubuntu.com/ubuntu jammy-proposed/main amd64 Packages
100 /var/lib/dpkg/status
5.15.0.75.73 500
500 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
5.15.0.25.27 500
500 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages
linux-virtual:
Installed: 5.15.0.77.75
Candidate: 5.15.0.77.75
Version table:
*** 5.15.0.77.75 500
500 http://archive.ubuntu.com/ubuntu jammy-proposed/main amd64 Packages
100 /var/lib/dpkg/status
5.15.0.75.73 500
500 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
5.15.0.25.27 500
500 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages

4. Added back: 0x0003

5. Instance booted fine:

ubuntu@ubuntu:~$ uname -a
Linux ubuntu 5.15.0-77-generic #84-Ubuntu SMP Fri Jun 16 16:16:44 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@ubuntu:~$ sudo dmesg | grep -i sev
[0.217323] AMD Memory Encryption Features active: SEV
[5.296555] SVM: KVM is unsupported when running as an SEV guest

6. Full dmesg: https://paste.ubuntu.com/p/5MDcKbVzPv/

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2020319 Title: Encountering an issue with memcpy_fromio causing failed boot of SEV- enabled guest Status in linux package in Ubuntu: In Progress Status in linux source package in Bionic: New Status in linux source package in Focal: Fix Committed Status in linux source package in Jammy: Fix Committed Bug description: [Impact] When launching a SEV-enabled guest, the guest kernel panics with the following call trace, indicating a critical error in the system. == [1.090638] software IO TLB: Memory encryption is active and system is using DMA bounce buffers [1.092105] Linux agpgart interface v0.103 [1.092716] BUG: unable to handle page fault for address: 9b820003d068 [1.093445] #PF: supervisor read access in kernel mode [1.093966] #PF: error_code(0x) - not-present page [1.094481] PGD 80010067 P4D 80010067 PUD 8001001d7067 PMD 8001001da067 PTE 8000fed40173 [1.094629] Oops: [#1] SMP NOPTI [1.094629] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.15.0-46-generic #49-Ubuntu [1.094629] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 [1.094629] RIP: 0010:memcpy_fromio+0x27/0x50 [1.094629] Code: cc cc cc 0f 1f 44 00 00 55 48 89 e5 48 85 d2 74 28 40 f6 c6 01 75 30 48 83 fa 01 76 06 40 f6 c6 02 75 1c 48 89 d1 48 c1 e9 02 a5 f6 c2 02 74 02 66 a5 f6 c2 01 74 01 a4 5d e9 14 b3 97 00 66 [1.094629] RSP: 0018:9b820001ba50 EFLAGS: 00010212 [1.094629] RAX: 9b820003d040 RBX: 9b820001bac0 RCX: 0002 [1.094629] RDX: 0008 RSI: 9b820003d068 RDI: 9b820001ba90 [1.094629] RBP: 9b820001ba50 R08: 0f80 R09: 0f80 [1.094629] R10: fed40080 R11: 9b820001bac0 R12: 8cc7068eca48 [1.094629] R13: 8cc700a64288 R14: R15: fed40080 [1.094629] FS: () GS:8cc77bd0() knlGS: [1.094629] CS: 0010 DS: ES: CR0: 80050033 [1.094629] CR2: 9b820003d068 CR3: 800174a1 CR4: 00350ee0 [1.094629] Call Trace: [1.094629] [1.094629] crb_map_io+0x315/0x870 [1.094629] ? radix_tree_iter_tag_clear+0x12/0x20 [1.094629] ? 
_raw_spin_unlock_irqrestore+0xe/0x30 [1.094629] crb_acpi_add+0xc2/0x140 [1.094629] acpi_device_probe+0x4c/0x170 [1.094629] really_probe+0x222/0x420 [1.094629] __driver_probe_device+0x119/0x190 [1.094629] driver_probe_device+0x23/0xc0 [1.094629] __driver_attach+0xbd/0x1e0 [1.094629] ? __device_attach_driver+0x120/0x120 [1.094629] bus_for_each_dev+0x7e/0xd0 [1.094629] driver_attach+0x1e/0x30 [1.094629] bus_add_driver+0x139/0x200 [1.094629] driver_register+0x95/0x100 [1.094629] ? init_tis+0xfd/0xfd [1.094629] acpi_bus_register_driver+0x39/0x50 [1.094629] crb_acpi_driver_init+0x15/0x1b [1.094629] do_one_initcall+0x48/0x1e0 [1.094629] do_initcalls+0x12f/0x159 [1.094629]
[Kernel-packages] [Bug 1990167] Re: cma alloc failure in large 5.15 arm instances
I believe this patch might have been dropped for newer linux-aws kernels. I just reproduced this problem while running 5.15.0-1026-aws -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-aws in Ubuntu. https://bugs.launchpad.net/bugs/1990167 Title: cma alloc failure in large 5.15 arm instances Status in linux-aws package in Ubuntu: Invalid Status in linux-aws source package in Jammy: Fix Committed Status in linux-aws source package in Kinetic: Invalid Bug description: When launching large arm64 instances on the focal or jammy ami, cma allocation errors appear in the dmesg out: [0.063255] cma: cma_alloc: reserved: alloc failed, req-size: 4096 pages, ret: -12 As far as I can tell, this does not impact instance launch in a meaningful way, but I am unsure of the other implications of this. I was able to confirm that these messages are only present in 5.15, as they do not show up in the bionic image, and rolling back focal to linux-aws 5.4 avoids them as well. This was present in at least 2 instance types and only appears to pop up in large sizes (2x4 does not produce them, 64x124 (c6gn.16xlarge) does) This could be as simple as just disabling CMA in the linux-aws pkg, as it appears this is already the case in linux-azure(LP: #1949770). Attaching dmesg out to the report. # Replication + Launch a large arm64 instance (c6gn.16xlarge) + Observe the messages in kern.log / dmesg To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/1990167/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
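For anyone triaging the cma_alloc message quoted in the bug description above, a minimal parsing sketch follows. It assumes the message format matches the single line quoted in this report and that the page size is 4 KiB. The page size is not stated in the report and may differ on arm64 kernels, so it is a parameter. The function and field names are hypothetical.

```python
import errno
import re

# Pattern modelled on the one dmesg line quoted in this bug report.
CMA_FAIL_RE = re.compile(
    r"cma: cma_alloc: (?P<area>\S+): alloc failed, "
    r"req-size: (?P<pages>\d+) pages, ret: (?P<ret>-?\d+)"
)


def parse_cma_failure(line: str, page_size: int = 4096):
    """Extract the CMA area, request size and errno name from a dmesg line.

    page_size is an assumption (4 KiB); pass the real value if it differs.
    Returns None when the line is not a cma_alloc failure.
    """
    match = CMA_FAIL_RE.search(line)
    if match is None:
        return None
    pages = int(match["pages"])
    ret = int(match["ret"])
    return {
        "area": match["area"],
        "pages": pages,
        "bytes": pages * page_size,
        "errno": errno.errorcode.get(-ret, "unknown"),
    }


line = "[0.063255] cma: cma_alloc: reserved: alloc failed, req-size: 4096 pages, ret: -12"
print(parse_cma_failure(line))
```

The return code -12 maps to ENOMEM, so the message reports that the reserved CMA area could not satisfy the request.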
[Kernel-packages] [Bug 1955655] Re: kernel-5.13.0-23-generic : Unable to boot when Secure Encrypted Virtualization( SEV) is enabled without setting swiotlb boot param
This is a grub bug and it is being tracked here: [SRU] unable to boot guest with large memory when SEV is enabled on host https://bugs.launchpad.net/ubuntu/+source/grub2-unsigned/+bug/1989446 ** Changed in: grub2 (Ubuntu) Status: Confirmed => Invalid ** Changed in: grub2 (Ubuntu Impish) Status: Confirmed => Invalid ** Changed in: grub2 (Ubuntu Jammy) Status: Confirmed => Invalid -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1955655 Title: kernel-5.13.0-23-generic : Unable to boot when Secure Encrypted Virtualization( SEV) is enabled without setting swiotlb boot param Status in grub2 package in Ubuntu: Invalid Status in linux package in Ubuntu: Invalid Status in grub2 source package in Impish: Invalid Status in linux source package in Impish: Invalid Status in grub2 source package in Jammy: Invalid Status in linux source package in Jammy: Invalid Bug description: While investigating LP: #1955395 by using the -generic kernel image, it appeared that it is impossible to boot the kernel unless the boot parameter swiotlb is set to 512M (swiotlb=262144). 
When not set, the kernel tries to adjust the bounce buffer to 1024MB, fails, and later triggers a kernel panic with the following trace:

$ grep TLB /tmp/console.log
[0.003665] software IO TLB: SWIOTLB bounce buffer size adjusted to 1024MB
[0.034219] kvm-guest: KVM setup pv remote TLB flush
[0.037063] software IO TLB: Cannot allocate buffer
[0.223009] Last level iTLB entries: 4KB 512, 2MB 255, 4MB 127
[0.223634] Last level dTLB entries: 4KB 512, 2MB 255, 4MB 127, 1GB 0
[0.297424] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[0.297424] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[1.018860] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[1.019552] software IO TLB: No low mem
[1.451497] Kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer
[1.491589] ---[ end Kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer ]---

The SWIOTLB adjustment comes from the following kernel commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e998879d4fb7991856916972168cf27c0d86ed12

For some reason, the LowMem allocation fails (as seen by the "software IO TLB: No low mem" message), hence the SWIOTLB adjustment cannot be completed.
When booting with the swiotlb=262144 value, we get the following output:

$ grep TLB /tmp/console.log
[0.050908] kvm-guest: KVM setup pv remote TLB flush
[0.308896] Last level iTLB entries: 4KB 512, 2MB 255, 4MB 127
[0.309494] Last level dTLB entries: 4KB 512, 2MB 255, 4MB 127, 1GB 0
[0.373162] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[0.373162] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[1.071136] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[1.071837] software IO TLB: mapped [mem 0x5bebe000-0x7bebe000] (512MB)
[1.529804] software IO TLB: Memory encryption is active and system is using DMA bounce buffers

For comparison, the Fedora 34 kernel (5.15.4-101.fc34.x86_64), which has the same adjustment mechanism, correctly adjusts the SWIOTLB bounce buffer without the need to set the swiotlb= value at boot time.

The SWIOTLB buffer adjustment was introduced in kernel 5.11.

We can make SEV-enabled resources available for testing if needed.

...Louis
---
ProblemType: Bug
AlsaDevices: total 0 crw-rw 1 root audio 116, 1 Dec 23 13:32 seq crw-rw 1 root audio 116, 33 Dec 23 13:32 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu74
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CasperMD5CheckResult: unknown
DistroRelease: Ubuntu 22.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb: Error: command ['lsusb'] failed with exit code 1:
Lsusb-t:
Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
MachineType: Scaleway SCW-ENT1-S
Package: linux (not installed)
PciMultimedia:
ProcEnviron: TERM=screen PATH=(custom, no user) XDG_RUNTIME_DIR= LANG=C.UTF-8 SHELL=/bin/bash
ProcFB:
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.13.0-23-generic root=UUID=1f577236-6bf2-48ef-998a-ba45f71aca7f ro
console=tty1 console=ttyS0 swiotlb=262144 ProcVersionSignature: Ubuntu 5.13.0-23.23-generic 5.13.19 RelatedPackageVersions: linux-restricted-modules-5.13.0-23-generic
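The relationship between swiotlb=262144 and the 512MB buffer reported in this bug can be checked with a little arithmetic. The sketch assumes that the swiotlb= value counts I/O TLB slabs of 2 KiB each, which is my understanding of the kernel parameter and is not stated in the report. The helper names are hypothetical.

```python
IO_TLB_SLAB_BYTES = 2048  # assumed slab size (2 KiB per I/O TLB slab)
MIB = 1 << 20


def swiotlb_slabs_to_mib(slabs: int) -> int:
    """Convert a swiotlb= slab count into MiB under the 2 KiB slab assumption."""
    return slabs * IO_TLB_SLAB_BYTES // MIB


def region_mib(start: int, end: int) -> int:
    """Size in MiB of a [start, end) physical memory range."""
    return (end - start) // MIB


# Value from the kernel command line quoted in this bug.
print(swiotlb_slabs_to_mib(262144))
# Range from the "software IO TLB: mapped [mem ...]" line quoted in this bug.
print(region_mib(0x5BEBE000, 0x7BEBE000))
```

Both computations give 512, which agrees with the "(512MB)" figure in the boot log and with the description of swiotlb=262144 as a 512M buffer.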
[Kernel-packages] [Bug 1983625] Re: 5.15.0-1013-oracle: Unable to boot large memory SEV guest without setting swiotlb parameter.
This is a grub bug and it is being tracked here: [SRU] unable to boot guest with large memory when SEV is enabled on host https://bugs.launchpad.net/ubuntu/+source/grub2-unsigned/+bug/1989446 ** Changed in: linux (Ubuntu) Status: Confirmed => Invalid -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1983625 Title: 5.15.0-1013-oracle: Unable to boot large memory SEV guest without setting swiotlb parameter. Status in linux package in Ubuntu: Invalid Bug description: When launching a SEV Ubuntu22.04 guest with e.g. memory > 16876M, Ubuntu kernel 5.15.0-1013-oracle panics unless swiotlb=262144 is specified on guest kernel parameters. It seems that the kernel tries to adjust swiotlb buffer size but can not do that and crashes. With a memory size such as 8G, the guest boots fine. HOST INFO Host type : OCI Bare-Metal Server Server/Machine: ORACLE SERVER E2-2c CPU model : AMD EPYC 7742 64-Core Processor Architecture : x86_64 Hostname : atanveer-amd-sme OS: Oracle Linux Server release 7.9 Kernel: 5.4.17-2136.309.4.el7uek.x86_64 #2 SMP Tue Jun 28 17:35:13 PDT 2022 Hypervisor: QEMU emulator version 4.2.1 (qemu-4.2.1-18.oci.el7) OVMF/AAVMF: OVMF-1.6.3-1.el7.noarch Qemu command to launch SEV guest: /bin/qemu-system-x86_64 -name OL22.04-uefi \ -machine q35 \ -enable-kvm \ -cpu host,+host-phys-bits \ -m 16877M \ -smp 8,maxcpus=240 \ -D ./22.04-uefi.log \ -nodefaults \ -monitor stdio \ -vnc 0.0.0.0:0,to=999 \ -vga std \ -drive file=/usr/share/OVMF/OVMF_CODE.pure-efi.fd,index=0,if=pflash,format=raw,readonly \ -drive file=OVMF_VARS.pure-efi.fd.ol22.04,index=1,if=pflash,format=raw \ -device virtio-scsi-pci,id=virtio-scsi-pci0,disable-legacy=on,iommu_platform=true \ -drive file=Ubuntu-22.04-2022.06.16-0-uefi-x86_64.qcow2,if=none,id=local_disk0,format=qcow2,media=disk \ -device ide-hd,drive=local_disk0,id=local_disk1,bootindex=0 \ -qmp tcp:127.0.0.1:3334,server,nowait \ -serial 
telnet:127.0.0.1:,server,nowait \ -device virtio-rng-pci,disable-legacy=on,iommu_platform=true \ -object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1 \ -machine memory-encryption=sev0 Console log: [0.005025] software IO TLB: SWIOTLB bounce buffer size adjusted to 1011MB [0.033881] kvm-guest: KVM setup pv remote TLB flush [0.054931] software IO TLB: Cannot allocate buffer [0.248933] Last level iTLB entries: 4KB 512, 2MB 255, 4MB 127 [0.249582] Last level dTLB entries: 4KB 512, 2MB 255, 4MB 127, 1GB 0 [0.317440] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages [0.317440] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages [0.424952] iommu: DMA domain TLB invalidation policy: lazy mode [0.570923] PCI-DMA: Using software bounce buffering for IO (SWIOTLB) [0.571669] software IO TLB: No low mem . . . . [1.233515] ata1: SATA max UDMA/133 abar m4096@0xc101 port 0xc1010100 irq 28 [1.234985] ata2: SATA max UDMA/133 abar m4096@0xc101 port 0xc1010180 irq 28 [1.236464] ata3: SATA max UDMA/133 abar m4096@0xc101 port 0xc1010200 irq 28 [1.237863] ata4: SATA max UDMA/133 abar m4096@0xc101 port 0xc1010280 irq 28 [1.239257] ata5: SATA max UDMA/133 abar m4096@0xc101 port 0xc1010300 irq 28 [1.240659] ata6: SATA max UDMA/133 abar m4096@0xc101 port 0xc1010380 irq 28 [1.555165] ata5: SATA link down (SStatus 0 SControl 300) [1.556661] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [1.558232] ata2: SATA link down (SStatus 0 SControl 300) [1.559728] ata4: SATA link down (SStatus 0 SControl 300) [1.560996] ata3: SATA link down (SStatus 0 SControl 300) [1.562134] ata1: SATA link down (SStatus 0 SControl 300) [1.563566] ata6.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80) [6.911450] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [6.912906] ata6.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80) [ 12.288045] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 12.289904] ata6.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, 
err_mask=0x80) [ 17.663548] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 242.883993] INFO: task kworker/u480:0:9 blocked for more than 120 seconds. [ 242.885619] Not tainted 5.15.0-1013-oracle #17-Ubuntu [ 242.886743] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 242.888198] task:kworker/u480:0 state:D stack:0 pid:9 ppid: 2 flags:0x4000 [ 242.889703] Workqueue: events_unbound async_run_entry_fn [ 242.890882] Call Trace: [ 242.891727] Full console log is attached. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+so
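For readers trying to relate the guest memory size to the "SWIOTLB bounce buffer size adjusted to 1011MB" line, here is a minimal sketch of the sizing heuristic. It assumes the upstream SEV logic (6% of guest memory, clamped between a 64 MiB default and 1 GiB); that heuristic is my reading of mainline kernels from 5.11 on and is not confirmed anywhere in this bug report, so treat the constants and the function name as assumptions.

```python
MIB = 1024 * 1024

# Assumptions taken from mainline, not from this bug report:
IO_TLB_DEFAULT_SIZE = 64 * MIB   # default bounce buffer size
SZ_1G = 1024 * MIB               # upper bound for the adjusted size
PERCENT_OF_MEMORY = 6            # share of guest memory used for the buffer


def sev_swiotlb_size(total_mem_bytes: int) -> int:
    """Estimate the adjusted SWIOTLB size for an SEV guest (hypothetical helper)."""
    size = total_mem_bytes * PERCENT_OF_MEMORY // 100
    return max(IO_TLB_DEFAULT_SIZE, min(size, SZ_1G))


for mem_mib in (8192, 16877, 65536):
    print(mem_mib, "MiB guest ->", sev_swiotlb_size(mem_mib * MIB) // MIB, "MiB SWIOTLB")
```

With "-m 16877M" this estimate gives roughly 1012 MiB, close to the 1011MB in the console log (the kernel presumably sees slightly less memory than the value passed to QEMU), while an 8G guest would ask for under 500 MiB. That would fit the observation that only the larger guests fail, but it does not by itself explain why the allocation fails.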
[Kernel-packages] [Bug 1980884] Re: ubuntu guest kernel panics when a sev guest with passthrough mlx5 VF is used
Hi Si-Wei, the 5.11 kernel has reached EOL in Feb 2022. Kernel 5.15 is the one currently being used for linux-oracle kernel on Focal (20.04) and Jammy (22.04), and it has the commit that you mentioned above: $ git log --oneline | grep -i "Fix page DMA map/unmap attributes" a865fe280b96 net/mlx5e: Fix page DMA map/unmap attributes $ git tag --contains a865fe280b96 Ubuntu-oracle-5.15.0-1001.1 Ubuntu-oracle-5.15.0-1001.2 Ubuntu-oracle-5.15.0-1001.3 Ubuntu-oracle-5.15.0-1002.4 Ubuntu-oracle-5.15.0-1003.5 Ubuntu-oracle-5.15.0-1004.6 Ubuntu-oracle-5.15.0-1005.7 Ubuntu-oracle-5.15.0-1006.8 Ubuntu-oracle-5.15.0-1007.9 Ubuntu-oracle-5.15.0-1009.12 Ubuntu-oracle-5.15.0-1011.15 Ubuntu-oracle-5.15.0-1012.16 Ubuntu-oracle-5.15.0-1013.17 Can you test a guest running 5.15 to see if this addresses the problem? Regards, Fabio Martins -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-oracle-5.11 in Ubuntu. https://bugs.launchpad.net/bugs/1980884 Title: ubuntu guest kernel panics when a sev guest with passthrough mlx5 VF is used Status in linux-oracle-5.11 package in Ubuntu: New Bug description: Guest kernel panic can be observed when Ubuntu SEV guest with mlx5 vfio-pci is started as iperf3 server using "iperf3 -s" and as soon as the client tries to connect with it. 
Steps to reproduce: HOST INFO Host type : OCI (Oracle Cloud) Bare-Metal Server Server/Machine: ORACLE SERVER E4-2c CPU model : AMD EPYC 7J13 64-Core Processor Architecture : x86_64 Host OS : Oracle Linux Server release 7.9 Host Kernel : 5.4.17-2136.309.3.el7uek.x86_64 #2 SMP Tue Jun 14 21:58:29 PDT 2022 Hypervisor: QEMU emulator version 4.2.1 (qemu-4.2.1-17.1.el7) OVMF/AAVMF: OVMF-1.6.2-2.el7.noarch libiscsi : libiscsi-1.19.0-1.el7.x86_64 Guest Kernel : 5.11.0-1028-ORACLE 1) Start Ubuntu 20.04/18.04 SEV guest with vfio-pci: /usr/bin/qemu-system-x86_64 -machine q35 -name OL20.04-uefi -enable-kvm -nodefaults -cpu host,+host-phys-bits -m 8G -smp 8,maxcpus=240 -monitor stdio -vnc 0.0.0.0:0,to=999 -vga std -drive file=/usr/share/OVMF/OVMF_CODE.pure-efi.fd,index=0,if=pflash,format=raw,readonly -drive file=OVMF_VARS.pure-efi.fd.ol20.04,index=1,if=pflash,format=raw -device virtio-scsi-pci,id=virtio-scsi-pci0,disable-legacy=on,iommu_platform=true -drive file=/systest/atanveer/scripts/Ubuntu-20.04-2022.02.15-0-uefi-x86_64.qcow2,if=none,id=local_disk0,format=qcow2,media=disk -device ide-hd,drive=local_disk0,id=local_disk1,bootindex=0 -net none -device vfio-pci,host=:21:10.1 -qmp tcp:127.0.0.1:3334,server,nowait -serial telnet:127.0.0.1:,server,nowait -D ./OL20.04-uefi.log -device virtio-rng-pci,disable-legacy=on,iommu_platform=true -object sev-guest,id=sev0,cbitpos=51,reduced-phys-bits=1 -machine memory-encryption=sev0 2) Start a client guest OL/Ubuntu: /usr/bin/qemu-system-x86_64 -machine q35 -name OL18.04-uefi -enable-kvm -nodefaults -cpu host,+host-phys-bits -m 8G -smp 8,maxcpus=240 -monitor stdio -vnc 0.0.0.0:0,to=999 -vga std -drive file=/usr/share/OVMF/OVMF_CODE.pure-efi.fd,index=0,if=pflash,format=raw,readonly -drive file=OVMF_VARS.pure-efi.fd.ol18.04,index=1,if=pflash,format=raw -device virtio-scsi-pci,id=virtio-scsi-pci0,disable-legacy=on,iommu_platform=true -drive 
file=/systest/atanveer/scripts/Ubuntu-18.04-2022.02.13-0-uefi-x86_64.qcow2,if=none,id=local_disk0,format=qcow2,media=disk -device ide-hd,drive=local_disk0,id=local_disk1,bootindex=0 -net none -device vfio-pci,host=:21:10.2 -qmp tcp:127.0.0.1:,server,nowait -serial telnet:127.0.0.1:,server,nowait -D ./OL18.04-uefi.log -device virtio-rng-pci,disable-legacy=on,iommu_platform=true -object sev-guest,id=sev0,cbitpos=51,reduced-phys-bits=1 -machine memory-encryption=sev0 3) Flush iptables on both the VMs using "iptables -F" 4) Start the iperf3 server on the first VM using "iperf3 -s" 5) Start the iperf3 client on the second VM using "iperf3 -c -4 -f M -i 0 -t 70 -O 10 -P 64" The kernel panic is seen on the first VM i.e. Ubuntu 20.04 with iperf3 also showing "Bad Address" error. Console logs: root@ubuntu-20-04:~# iperf3 -s --- Server listening on 5201 --- Accepted connection from 10.196.246.104, port 33732 [ 5] local 10.196.247.88 port 5201 connected to 10.196.246.104 port 33734 [ 8] local 10.196.247.88 port 5201 connected to 10.196.246.104 port 33736 [ 10] local 10.196.247.88 port 5201 connected to 10.196.246.104 port 33738 iperf3: error - unable to read from stream socket: Bad address --- Server listening on 5201 --- [ 91.083856] general protection fault: [#1] SMP NOPTI [ 91
[Kernel-packages] [Bug 1980884] Re: ubuntu guest kernel panics when a sev guest with passthrough mlx5 VF is used
That is also available in the 5.4 kernel, so that also covers Bionic (18.04) guests if needed: $ git log --oneline | grep -i "Fix page DMA map/unmap attributes" 53176ef0d809 net/mlx5e: Fix page DMA map/unmap attributes $ git tag --contains 53176ef0d809 Ubuntu-oracle-5.4.0-1071.77 Ubuntu-oracle-5.4.0-1072.78 Ubuntu-oracle-5.4.0-1073.79 Ubuntu-oracle-5.4.0-1074.80 Ubuntu-oracle-5.4.0-1076.83 Ubuntu-oracle-5.4.0-1078.86 Ubuntu-oracle-5.4.0-1079.87 Ubuntu-oracle-5.4.0-1080.88 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-oracle-5.11 in Ubuntu. https://bugs.launchpad.net/bugs/1980884 Title: ubuntu guest kernel panics when a sev guest with passthrough mlx5 VF is used Status in linux-oracle-5.11 package in Ubuntu: New Bug description: Guest kernel panic can be observed when Ubuntu SEV guest with mlx5 vfio-pci is started as iperf3 server using "iperf3 -s" and as soon as the client tries to connect with it. Steps to reproduce: HOST INFO Host type : OCI (Oracle Cloud) Bare-Metal Server Server/Machine: ORACLE SERVER E4-2c CPU model : AMD EPYC 7J13 64-Core Processor Architecture : x86_64 Host OS : Oracle Linux Server release 7.9 Host Kernel : 5.4.17-2136.309.3.el7uek.x86_64 #2 SMP Tue Jun 14 21:58:29 PDT 2022 Hypervisor: QEMU emulator version 4.2.1 (qemu-4.2.1-17.1.el7) OVMF/AAVMF: OVMF-1.6.2-2.el7.noarch libiscsi : libiscsi-1.19.0-1.el7.x86_64 Guest Kernel : 5.11.0-1028-ORACLE 1) Start Ubuntu 20.04/18.04 SEV guest with vfio-pci: /usr/bin/qemu-system-x86_64 -machine q35 -name OL20.04-uefi -enable-kvm -nodefaults -cpu host,+host-phys-bits -m 8G -smp 8,maxcpus=240 -monitor stdio -vnc 0.0.0.0:0,to=999 -vga std -drive file=/usr/share/OVMF/OVMF_CODE.pure-efi.fd,index=0,if=pflash,format=raw,readonly -drive file=OVMF_VARS.pure-efi.fd.ol20.04,index=1,if=pflash,format=raw -device virtio-scsi-pci,id=virtio-scsi-pci0,disable-legacy=on,iommu_platform=true -drive 
file=/systest/atanveer/scripts/Ubuntu-20.04-2022.02.15-0-uefi-x86_64.qcow2,if=none,id=local_disk0,format=qcow2,media=disk -device ide-hd,drive=local_disk0,id=local_disk1,bootindex=0 -net none -device vfio-pci,host=:21:10.1 -qmp tcp:127.0.0.1:3334,server,nowait -serial telnet:127.0.0.1:,server,nowait -D ./OL20.04-uefi.log -device virtio-rng-pci,disable-legacy=on,iommu_platform=true -object sev-guest,id=sev0,cbitpos=51,reduced-phys-bits=1 -machine memory-encryption=sev0 2) Start a client guest OL/Ubuntu: /usr/bin/qemu-system-x86_64 -machine q35 -name OL18.04-uefi -enable-kvm -nodefaults -cpu host,+host-phys-bits -m 8G -smp 8,maxcpus=240 -monitor stdio -vnc 0.0.0.0:0,to=999 -vga std -drive file=/usr/share/OVMF/OVMF_CODE.pure-efi.fd,index=0,if=pflash,format=raw,readonly -drive file=OVMF_VARS.pure-efi.fd.ol18.04,index=1,if=pflash,format=raw -device virtio-scsi-pci,id=virtio-scsi-pci0,disable-legacy=on,iommu_platform=true -drive file=/systest/atanveer/scripts/Ubuntu-18.04-2022.02.13-0-uefi-x86_64.qcow2,if=none,id=local_disk0,format=qcow2,media=disk -device ide-hd,drive=local_disk0,id=local_disk1,bootindex=0 -net none -device vfio-pci,host=:21:10.2 -qmp tcp:127.0.0.1:,server,nowait -serial telnet:127.0.0.1:,server,nowait -D ./OL18.04-uefi.log -device virtio-rng-pci,disable-legacy=on,iommu_platform=true -object sev-guest,id=sev0,cbitpos=51,reduced-phys-bits=1 -machine memory-encryption=sev0 3) Flush iptables on both the VMs using "iptables -F" 4) Start the iperf3 server on the first VM using "iperf3 -s" 5) Start the iperf3 client on the second VM using "iperf3 -c -4 -f M -i 0 -t 70 -O 10 -P 64" The kernel panic is seen on the first VM i.e. Ubuntu 20.04 with iperf3 also showing "Bad Address" error. 
Console logs: root@ubuntu-20-04:~# iperf3 -s --- Server listening on 5201 --- Accepted connection from 10.196.246.104, port 33732 [ 5] local 10.196.247.88 port 5201 connected to 10.196.246.104 port 33734 [ 8] local 10.196.247.88 port 5201 connected to 10.196.246.104 port 33736 [ 10] local 10.196.247.88 port 5201 connected to 10.196.246.104 port 33738 iperf3: error - unable to read from stream socket: Bad address --- Server listening on 5201 --- [ 91.083856] general protection fault: [#1] SMP NOPTI [ 91.084591] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.11.0-1028-oracle #31~20.04.1-Ubuntu [ 91.085393] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.6.2 06/01/2022 [ 91.086205] RIP: 0010:memcpy_erms+0x6/0x10 [ 91.086640] Code: cc cc cc cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1
[Kernel-packages] [Bug 1977919] Re: Docker container creation causes kernel oops on linux-aws 5.13.0.1028.31~20.04.22
Just tested this 5.13.0-1029.32~lp1977919.1 kernel and confirmed that it fixes the issue (doesn't crash when running the same docker container that would crash in the -1028 kernel) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-aws in Ubuntu. https://bugs.launchpad.net/bugs/1977919 Title: Docker container creation causes kernel oops on linux-aws 5.13.0.1028.31~20.04.22 Status in linux-aws package in Ubuntu: Confirmed Status in linux-gcp package in Ubuntu: Confirmed Bug description: Running the attached script on the latest AWS AMI for Ubuntu 20.04, I get a kernel panic and hard reset of the node. [ 12.314552] VFS: Close: file count is 0 [ 12.351090] [ cut here ] [ 12.351093] kernel BUG at include/linux/fs.h:3104! [ 12.355272] invalid opcode: [#1] SMP PTI [ 12.358963] CPU: 1 PID: 863 Comm: sed Not tainted 5.13.0-1028-aws #31~20.04.1-Ubuntu [ 12.366241] Hardware name: Amazon EC2 m5.large/, BIOS 1.0 10/16/2017 [ 12.371130] RIP: 0010:__fput+0x247/0x250 [ 12.374897] Code: 00 48 85 ff 0f 84 8b fe ff ff f6 c7 40 0f 85 82 fe ff ff e8 ab 38 00 00 e9 78 fe ff ff 4c 89 f7 e8 2e 88 02 00 e9 b5 fe ff ff <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 31 db 48 [ 12.389075] RSP: 0018:b50280d9fd88 EFLAGS: 00010246 [ 12.393425] RAX: RBX: 000a801d RCX: 9152e0716000 [ 12.398679] RDX: 9152cf075280 RSI: 0001 RDI: [ 12.403879] RBP: b50280d9fdb0 R08: 0001 R09: 9152dfcba2c8 [ 12.409102] R10: b50280d9fd88 R11: 9152d04e9d10 R12: 9152d04e9d00 [ 12.414333] R13: 9152dfcba2c8 R14: 9152cf0752a0 R15: 9152dfc2e180 [ 12.419533] FS: () GS:9153ea90() knlGS: [ 12.426937] CS: 0010 DS: ES: CR0: 80050033 [ 12.431506] CR2: 556cf30250a8 CR3: bce10006 CR4: 007706e0 [ 12.436716] DR0: DR1: DR2: [ 12.441941] DR3: DR6: fffe0ff0 DR7: 0400 [ 12.447170] PKRU: 5554 [ 12.450355] Call Trace: [ 12.453408] [ 12.456296] fput+0xe/0x10 [ 12.459633] task_work_run+0x70/0xb0 [ 12.463157] do_exit+0x37b/0xaf0 [ 12.466570] do_group_exit+0x43/0xb0 [ 
12.470142] __x64_sys_exit_group+0x18/0x20 [ 12.473989] do_syscall_64+0x61/0xb0 [ 12.477565] ? exit_to_user_mode_prepare+0x9b/0x1c0 [ 12.481734] ? do_user_addr_fault+0x1d0/0x650 [ 12.485665] ? irqentry_exit_to_user_mode+0x9/0x20 [ 12.489790] ? irqentry_exit+0x19/0x30 [ 12.493443] ? exc_page_fault+0x8f/0x170 [ 12.497199] ? asm_exc_page_fault+0x8/0x30 [ 12.501013] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 12.505289] RIP: 0033:0x7f80d42a1bd6 [ 12.508868] Code: Unable to access opcode bytes at RIP 0x7f80d42a1bac. [ 12.513783] RSP: 002b:7ffe924f9ed8 EFLAGS: 0246 ORIG_RAX: 00e7 [ 12.520897] RAX: ffda RBX: 7f80d45a4740 RCX: 7f80d42a1bd6 [ 12.526115] RDX: RSI: 003c RDI: [ 12.531328] RBP: R08: 00e7 R09: fe98 [ 12.536484] R10: 7f80d3d422a0 R11: 0246 R12: 7f80d45a4740 [ 12.541687] R13: 0002 R14: 7f80d45ad708 R15: [ 12.546916] [ 12.549829] Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bpfilter br_netfilter bridge stp llc aufs overlay nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua crct10dif_pclmul ppdev crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd psmouse cryptd parport_pc input_leds parport ena serio_raw sch_fq_codel ipmi_devintf ipmi_msghandler msr drm ip_tables x_tables autofs4 [ 12.583913] ---[ end trace 77367fed4d782aa4 ]--- [ 12.587963] RIP: 0010:__fput+0x247/0x250 [ 12.591729] Code: 00 48 85 ff 0f 84 8b fe ff ff f6 c7 40 0f 85 82 fe ff ff e8 ab 38 00 00 e9 78 fe ff ff 4c 89 f7 e8 2e 88 02 00 e9 b5 fe ff ff <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 31 db 48 [ 12.605796] RSP: 0018:b50280d9fd88 EFLAGS: 00010246 [ 12.610166] RAX: RBX: 000a801d RCX: 9152e0716000 [ 12.615417] RDX: 9152cf075280 RSI: 0001 RDI: [ 12.620635] RBP: b50280d9fdb0 R08: 0001 R09: 9152dfcba2c8 [ 12.625878] R10: b50280d9fd88 R11: 9152d04e9d10 R12: 9152d04e9d00 [ 12.631121] R13: 9152dfcba2c8 R14: 9152cf0752a0 
R15: 9152dfc2e180 [ 12.636358] FS: () GS:9153ea90() knlGS:
[Kernel-packages] [Bug 1944574] Re: EFI stub: ERROR: FIRMWARE BUG: kernel image not aligned on 64k boundary
I believe the "EFI stub: ERROR: FIRMWARE BUG: kernel image not aligned on 64k boundary" message reported in this bug is not what is preventing the VM from booting. If you look at the full log you provided, that message is logged both on the boot that succeeded and on the one that got stuck. I believe whatever was done in the VM between these reboots might have caused the problem you are observing.

There's also a known problem in the linux-oracle kernel that prevents you from seeing the serial console. That is being fixed for Focal images in the linux-oracle 5.13.0.1023.28~20.04.1 kernel, which is currently in focal-proposed. If this is something you can reproduce with Focal images, using the kernel from proposed might help you see what is really preventing the VM from starting.

We'll review the 'kernel image not aligned on 64k boundary' error anyway to assess what might be causing it.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-oracle in Ubuntu.
https://bugs.launchpad.net/bugs/1944574

Title: EFI stub: ERROR: FIRMWARE BUG: kernel image not aligned on 64k boundary

Status in linux-oracle package in Ubuntu: Confirmed

Bug description:
While reviewing some dkms failures on arm64 impish/linux-oracle, I noticed this:

...
EFI stub: Booting Linux Kernel...
EFI stub: EFI_RNG_PROTOCOL unavailable
EFI stub: ERROR: FIRMWARE BUG: kernel image not aligned on 64k boundary
EFI stub: ERROR: FIRMWARE BUG: Image BSS overlaps adjacent EFI memory region
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services and installing virtual address map...
...

and the VM doesn't come back.
Full log here:
https://autopkgtest.ubuntu.com/results/autopkgtest-impish/impish/arm64/b/backport-iwlwifi-dkms/20210921_165307_4f8e0@/log.gz

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-oracle/+bug/1944574/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1921104] Re: net/mlx5e: Add missing capability check for uplink follow
Another customer has provided positive feedback that it fixes the issue on Focal:

5.4.0-73-generic #82-Ubuntu SMP Wed Apr 14 17:39:42 UTC 2021

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1921104

Title: net/mlx5e: Add missing capability check for uplink follow

Status in Ubuntu on IBM z Systems: Fix Committed
Status in linux package in Ubuntu: Fix Released
Status in linux source package in Focal: Fix Committed
Status in linux source package in Groovy: Fix Committed
Status in linux source package in Hirsute: Fix Released

Bug description:
SRU Justification:

[Impact]
* Since older firmware may not support the uplink state setting, this can lead to problems.
* Now expose firmware indication that it supports setting eswitch uplink state to follow the physical link.
* If a kernel without the backport is used on an adapter which does not have the latest adapter firmware, the adapter silently drops outgoing traffic.
* This is a regression which was introduced with kernel 5.4.0-48.

[Fix]
* upstream fix (as in 5.11): 9c9be85f6b59d80efe4705109c0396df18d4e11d 9c9be85f6b59 "net/mlx5e: Add missing capability check for uplink follow"
* backport for focal: https://launchpadlibrarian.net/529543695/0001-Backport-net-mlx5e-Add-missing-capability-check-for-.patch
* backport for groovy: https://launchpadlibrarian.net/529775887/0001-Backport-groovy-net-mlx5e-Add-missing-capability-che.patch

[Test Case]
* Two IBM Z or LinuxONE systems, installed with Ubuntu Server 20.04 or 20.10 on LPAR, are needed.
* Each with RoCE Express 2.x adapters (Mellanox ConnectX4/5) attached and firmware 16.29.1006 or earlier.
* Assign an IP address to the adapters on both systems and try to ping one node from the other.
* The ping will just fail with the stock Ubuntu kernels (not having the patch), but will succeed with kernels that include the patches (like the test builds from the PPA mentioned below).
* Due to the lack of hardware this needs to be verified by IBM.

[Regression Potential]
* Undesired / erroneous behavior in case the modified if condition is assembled in a wrong way.
* Again wrong behavior in case the modifications of the capability bits in mlx5_ifc_cmd_hca_cap_bits are wrong.
* All modifications are limited to the mlx5 driver only.
* The changes are relatively limited, with effectively two lines removed and 4 added (three of them adjustments of the capability bits only).
* The modifications were done and tested by IBM and reviewed by Mellanox (see LP comments), based on a PPA test build.

[Other]
* The above patch/commit was accepted upstream with kernel 5.11.
* Hence the patch is not needed for hirsute; it just needs to be SRUed for groovy and focal.
* The commit couldn't be cleanly cherry-picked, mainly due to changed context, hence the backport(s).
__

Expose firmware indication that it supports setting eswitch uplink state to follow (follow the physical link). Condition setting the eswitch uplink admin-state with this capability bit. Older FW may not support the uplink state setting.

Available fix with kernel 5.11.
https://github.com/torvalds/linux/commit/9c9be85f6b59d80efe4705109c0396df18d4e11d

Now required for Ubuntu 20.04 via backport patch.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1921104/+subscriptions
[Kernel-packages] [Bug 1852077] Re: Backport: bonding: fix state transition issue in link monitoring
Hi Po, IIUC this bug is related to commit 627450ac21f7f4a44b949c5d5e2c35829ff1784f, which is in 4.15.0-74, which I see now in -updates / -security. Isn't it completed yet? $ git tag --contains 627450ac21f7f4a44b949c5d5e2c35829ff1784f Ubuntu-4.15.0-73.82 Ubuntu-4.15.0-74.84 Ubuntu-raspi2-4.15.0-1053.57 Ubuntu-snapdragon-4.15.0-1070.77 $ rmadison linux-image-4.15.0-74-generic linux-image-4.15.0-74-generic | 4.15.0-74.83~16.04.1 | xenial-security | amd64, arm64, armhf, i386, ppc64el, s390x linux-image-4.15.0-74-generic | 4.15.0-74.83~16.04.1 | xenial-updates | amd64, arm64, armhf, i386, ppc64el, s390x linux-image-4.15.0-74-generic | 4.15.0-74.84 | bionic-security | amd64, arm64, armhf, i386, ppc64el, s390x linux-image-4.15.0-74-generic | 4.15.0-74.84 | bionic-updates | amd64, arm64, armhf, i386, ppc64el, s390x -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1852077 Title: Backport: bonding: fix state transition issue in link monitoring Status in linux package in Ubuntu: In Progress Status in linux source package in Bionic: Fix Committed Status in linux source package in Disco: Fix Committed Status in linux source package in Eoan: Fix Committed Status in linux source package in Focal: In Progress Bug description: == Justification == From the well explained commit message: Since de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring"), the bonding driver has utilized two separate variables to indicate the next link state a particular slave should transition to. Each is used to communicate to a different portion of the link state change commit logic; one to the bond_miimon_commit function itself, and another to the state transition logic. Unfortunately, the two variables can become unsynchronized, resulting in incorrect link state transitions within bonding. 
This can cause slaves to become stuck in an incorrect link state until a subsequent carrier state transition.

The issue occurs when a special case in bond_slave_netdev_event sets slave->link directly to BOND_LINK_FAIL. On the next pass through bond_miimon_inspect after the slave goes carrier up, the BOND_LINK_FAIL case will set the proposed next state (link_new_state) to BOND_LINK_UP, but the new_link to BOND_LINK_DOWN. The setting of the final link state from new_link comes after that from link_new_state, and so the slave will end up incorrectly in _DOWN state.

Resolve this by combining the two variables into one.

== Fixes ==
* 1899bb32 (bonding: fix state transition issue in link monitoring)

This patch can be cherry-picked into E/F.

For older releases like B/D, it will need to be backported, as they are missing the slave_err() printk macro added in 5237ff79 (bonding: add slave_foo printk macros) as well as the commit that replaces netdev_err() with slave_err(), e2a7420d (bonding/main: convert to using slave printk macros).

For Xenial, the commit that causes this issue, de77ecd4, does not exist.

== Test ==
Test kernels can be found here:
https://people.canonical.com/~phlin/kernel/lp-1852077-bonding/

The X-hwe and Disco kernels were tested by the bug reporter, Aleksei; the patched kernel works as expected.

== Regression Potential ==
Low. This patch just unifies the variable used in the link state change commit logic to prevent the occurrence of an incorrect state. The changes are limited to the bonding driver itself. (Although include/net/bonding.h is used by other drivers, the changes to that file only affect the bond_main.c driver.)

== Original Bug Report ==
There's an issue with the bonding driver in the current Ubuntu kernels. Sometimes one link gets stuck in a weird state. It was fixed upstream with the patch https://www.spinics.net/lists/netdev/msg609506.html (commit 1899bb325149e481de31a4f32b59ea6f24e176ea).
We see this bug with linux 4.15 (ubuntu xenial, hwe kernel), but it should be reproducible with other current kernel versions.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852077/+subscriptions
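To make the state-transition problem quoted in the bug description above easier to follow, here is a toy model of it. This is only a sketch built from the commit message text: the class and function names are invented, the real bond_miimon_inspect/bond_miimon_commit code does far more, and the "fixed" variant simply mirrors the "combine the two variables into one" resolution.

```python
from dataclasses import dataclass
from enum import Enum


class Link(Enum):
    UP = "up"
    FAIL = "fail"
    DOWN = "down"
    NOCHANGE = "nochange"


@dataclass
class Slave:
    link: Link
    link_new_state: Link = Link.NOCHANGE  # proposed next state
    new_link: Link = Link.NOCHANGE        # second variable, only in the buggy model


def inspect_buggy(slave: Slave, carrier_up: bool) -> None:
    """Toy version of the inspect pass: the two variables disagree."""
    if slave.link is Link.FAIL and carrier_up:
        slave.link_new_state = Link.UP  # proposed state says UP
        slave.new_link = Link.DOWN      # but new_link is set to DOWN


def commit_buggy(slave: Slave) -> None:
    """Toy commit: new_link is applied after link_new_state, so it wins."""
    if slave.link_new_state is not Link.NOCHANGE:
        slave.link = slave.link_new_state
    if slave.new_link is not Link.NOCHANGE:
        slave.link = slave.new_link
    slave.link_new_state = slave.new_link = Link.NOCHANGE


def inspect_fixed(slave: Slave, carrier_up: bool) -> None:
    """Toy version of the fix: a single variable carries the next state."""
    if slave.link is Link.FAIL and carrier_up:
        slave.link_new_state = Link.UP


def commit_fixed(slave: Slave) -> None:
    if slave.link_new_state is not Link.NOCHANGE:
        slave.link = slave.link_new_state
    slave.link_new_state = Link.NOCHANGE


buggy = Slave(link=Link.FAIL)   # the special case set slave->link to FAIL directly
inspect_buggy(buggy, carrier_up=True)
commit_buggy(buggy)

fixed = Slave(link=Link.FAIL)
inspect_fixed(fixed, carrier_up=True)
commit_fixed(fixed)

print("buggy:", buggy.link.value, "fixed:", fixed.link.value)
```

In the toy model the slave with carrier up ends in the DOWN state when two variables are used, and in the UP state once a single variable drives the commit.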
[Kernel-packages] [Bug 1793430] Re: Page leaking in cachefiles_read_backing_file while vmscan is active
** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1793430

Title: Page leaking in cachefiles_read_backing_file while vmscan is active

Status in linux package in Ubuntu: Fix Committed
Status in linux source package in Xenial: Fix Committed
Status in linux source package in Bionic: Fix Committed
Status in linux source package in Cosmic: Fix Committed

Bug description:
SRU Justification

[Description]
In a heavily loaded system where the system pagecache is nearing memory limits and fscache is enabled, pages can be leaked by fscache while trying to read pages from the cachefiles backend. This can happen because two applications can be reading the same page from a single mount, so two threads can be trying to read the backing page at the same time. This results in one of the threads finding that a page for the backing file or netfs file is already in the radix tree. During the error handling, cachefiles does not clean up the reference on the backing page, leading to a page leak.

[Fix]
The fix is straightforward: decrement the reference when the error is encountered.

[Testing]
A user has tested the fix using the following method for 12+ hrs.

1) mkdir -p /mnt/nfs ; mount -o vers=3,fsc :/export /mnt/nfs
2) create 1 files of 2.8MB in a NFS mount.
3) start a thread to simulate heavy VM pressure
   (while true ; do echo 3 > /proc/sys/vm/drop_caches ; sleep 1 ; done)&
4) start multiple parallel readers for the data set at the same time
   find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
   find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
   find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
   ..
   ..
   find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
   find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
5) finally check using cat /proc/fs/fscache/stats | grep -i pages ; free -h , cat /proc/meminfo and page-types -r -b lru to ensure all pages are freed.

[Regression Potential]
Limited to cachefiles.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1793430/+subscriptions
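For the final check in the test steps above, comparing /proc/meminfo snapshots by hand is error-prone, so here is a small sketch that parses the usual "Name: value kB" lines into a dictionary and reports the change in one field. The helper names and the sample numbers are invented for illustration; they are not measurements from this bug.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; values are mostly in kB."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue  # skip lines without a "Name: value" shape
        values[name.strip()] = int(fields[0])
    return values


def field_delta(before: dict[str, int], after: dict[str, int], field: str) -> int:
    """Change of one field between two snapshots (positive means it grew)."""
    return after[field] - before[field]


# Illustrative snapshots only; real ones would come from open("/proc/meminfo").read().
BEFORE = "MemTotal:  8000000 kB\nMemFree:   6000000 kB\nCached:     500000 kB\n"
AFTER = "MemTotal:  8000000 kB\nMemFree:   5990000 kB\nCached:     505000 kB\n"

before, after = parse_meminfo(BEFORE), parse_meminfo(AFTER)
print("MemFree change (kB):", field_delta(before, after, "MemFree"))
```

Taking one snapshot before the readers start and another after they finish and caches are dropped gives a quick indication of whether free memory returns to roughly its starting level.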
[Kernel-packages] [Bug 1734327] Re: Kernel panic on a nfsroot system
** Changed in: linux (Ubuntu Artful)
Status: Fix Committed => Confirmed

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1734327

Title: Kernel panic on a nfsroot system

Status in linux package in Ubuntu: Fix Committed
Status in linux source package in Artful: Confirmed

Bug description:
== SRU Justification ==
The following commit introduced a regression identified in bug 1734327:
ac8f82a0b6d9 ("UBUNTU: SAUCE: LSM stacking: LSM: Infrastructure management of the remaining blobs")

The regression causes a kernel panic to occur after multiple TCP connection creations/closures to the localhost. The bug was found using STAF RPC calls, but is easily reproducible with SSH.

A revert of commit ac8f82a0b6d9 is needed to resolve this bug. However, commit 4ae2508f0bed also needs to be reverted because it depends on commit ac8f82a0b6d9.

== Fix ==
Revert 4ae2508f0bed ("UBUNTU: SAUCE: LSM stacking: add stacking support to apparmor network hooks")
Revert ac8f82a0b6d9 ("UBUNTU: SAUCE: LSM stacking: LSM: Infrastructure management of the remaining blobs")

== Test Case ==
A test kernel was built with these two commits reverted and tested by the original bug reporter. The bug reporter states the test kernel resolved the bug.

== Original Bug Description ==
Summary:
Kernel panic occurs after multiple TCP connection creations/closures to the localhost. The bug was found using STAF RPC calls, but is easily reproducible with SSH.

The bug doesn't appear on an identical virtual machine booting from the disk.
The bug is not reproducible on a similarly-prepared Ubuntu 16.04 machine.
The bug is reproducible using an older 4.13.0-16-generic kernel.
Reproducible on multiple hardware types.
Unable to create a kernel memory dump due to makedumpfile errors.
apport-bug save attached.
  NFSRoot boot options:
  vmlinuz initrd=initrd.img boot=nfs root=/dev/nfs nfsroot=190.0.0.254:/diskless/host/u1616/Ubuntu/17.10 intel_iommu=on net.ifnames=0 biosdevname=0 apparmor=0 ip=:eth0:dhcp blacklist=i40e,ixgbe,fm10k crashkernel=384M-:768M rw

  Software:
  OS: Ubuntu 17.10
  Kernel: 4.13.0-17-generic x86_64

  Reproduction steps:
  1. Boot a system from a nfsroot
  2. Configure password-less localhost ssh access
  3. Run a loop: `while true; do ssh localhost 'uname -a'; done`
  4. Wait for system to crash

  Trace:
  4,1151,52372730,-;general protection fault: [#1] SMP
  4,1152,52372771,-;Modules linked in: arc4 md4 rpcsec_gss_krb5 nls_utf8 auth_rpcgss cifs nfsv4 ccm ipmi_ssif intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp intel_cstate mei_me input_leds joydev intel_rapl_perf mei kvm_intel lpc_ich ioatdma kvm irqbypass ipmi_si ipmi_devintf ipmi_msghandler shpchp acpi_pad acpi_power_meter mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 nfsv3 nfs_acl nfs lockd grace sunrpc fscache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ast ttm aesni_intel igb drm_kms_helper aes_x86_64 crypto_simd syscopyarea glue_helper
  4,1153,52373251,c; sysfillrect dca cryptd sysimgblt i2c_algo_bit fb_sys_fops ahci ptp drm libahci pps_core wmi
  4,1154,52373322,-;CPU: 11 PID: 1848 Comm: STAFProc Not tainted 4.13.0-17-generic #20-Ubuntu
  4,1155,52373371,-;Hardware name: Supermicro Super Server/X10SRD-F, BIOS 2.0 12/17/2015
  4,1156,52373418,-;task: 9d09267f5d00 task.stack: afddc3a7
  4,1157,52373461,-;RIP: 0010:kfree+0x53/0x160
  4,1158,52373486,-;RSP: 0018:9d092ecc3bc8 EFLAGS: 00010207
  4,1159,52373521,-;RAX: RBX: 241c89490001 RCX: 0004
  4,1160,52373566,-;RDX: 32d49081cc08 RSI: 00010080 RDI: 62fac000
  4,1161,52373611,-;RBP: 9d092ecc3be0 R08: 0001f4c0 R09: 943bb839
  4,1162,52373656,-;R10: 00904c789100 R11: R12: 9d09267ef000
  4,1163,52373701,-;R13: 93fa155e R14: 9d09267ef000 R15: 9d09267ef000
  4,1164,52373746,-;FS: 7f3a53313700() GS:9d092ecc() knlGS:
  4,1165,52373797,-;CS: 0010 DS: ES: CR0: 80050033
  4,1166,52373834,-;CR2: 7fd5c9ffa780 CR3: 0004666d7000 CR4: 003406e0
  4,1167,52373878,-;DR0: DR1: DR2:
  4,1168,52373923,-;DR3: DR6: fffe0ff0 DR7: 0400
  4,1169,52373968,-;Call Trace:
  4,1170,52373987,-;
  4,1171,52374009,-; security_sk_free+0x3e/0x50
  4,1172,52374042,-; __sk_destruct+0x108/0x190
  4,1173,52374070,-; sk_destruct+0x20/0x30
  4,1174,52374095,-; __sk_free+0x82/0xa0
  4,1175,52374120,-; sk_free