from:"Fabio Augusto Miranda Martins"

[Kernel-packages] [Bug 1970672] Re: makedumpfile falls back to cp with "__vtop4_x86_64: Can't get a valid pmd_pte."

2023-12-15 Thread Fabio Augusto Miranda Martins

For item 1:

 * Confirm that makedumpfile works as expected by triggering a kdump.

I can confirm that makedumpfile 1:1.6.7-1ubuntu2.5 from focal-proposed/main
worked as expected when I triggered a crash dump on this system:

ubuntu@fabio-small-makedumpfile:~$ sudo hostnamectl
   Static hostname: fabio-small-makedumpfile
         Icon name: computer-vm
           Chassis: vm
        Machine ID: dee0adfb9aa54246b4d1e2fc62dd50f7
           Boot ID: adba6ba3977f4c758a7008013a7a6d1e
    Virtualization: oracle
  Operating System: Ubuntu 20.04.6 LTS
            Kernel: Linux 5.15.0-1049-oracle
      Architecture: x86-64
ubuntu@fabio-small-makedumpfile:~$ sudo kdump-config show
DUMP_MODE:        kdump
USE_KDUMP:        1
KDUMP_SYSCTL:     kernel.panic_on_oops=1
KDUMP_COREDIR:    /var/crash
crashkernel addr: 0x2c00
0xfd7f00
   /boot/vmlinuz-5.15.0-1049-oracle
kdump initrd: 
   /boot/initrd.img-5.15.0-1049-oracle
current state:    ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-5.15.0-1049-oracle 
root=UUID=7d8611b4-d3e7-4f1a-a8f9-e1a7e5a2d2f9 ro console=tty1 console=ttyS0 
nvme.shutdown_timeout=10 libiscsi.debug_libiscsi_eh=1 
crash_kexec_post_notifiers reset_devices systemd.unit=kdump-tools-dump.service 
nr_cpus=1 irqpoll nousb" --initrd=/boot/initrd.img-5.15.0-1049-oracle 
/boot/vmlinuz-5.15.0-1049-oracle
ubuntu@fabio-small-makedumpfile:~$ sudo dpkg -l makedumpfile
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version            Architecture Description
+++-==============-==================-============-======================
ii  makedumpfile   1:1.6.7-1ubuntu2.5 amd64        VMcore extraction tool
ubuntu@fabio-small-makedumpfile:~$ sudo apt-cache policy makedumpfile
makedumpfile:
  Installed: 1:1.6.7-1ubuntu2.5
  Candidate: 1:1.6.7-1ubuntu2.5
  Version table:
 *** 1:1.6.7-1ubuntu2.5 500
        500 http://archive.ubuntu.com/ubuntu focal-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     1:1.6.7-1ubuntu2.4 500
        500 http://phx-ad-3.clouds.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
     1:1.6.7-1ubuntu2 500
        500 http://phx-ad-3.clouds.archive.ubuntu.com/ubuntu focal/main amd64 Packages


Console output showing that the dump completed successfully:

[   54.490112] kdump-tools[676]: Starting kdump-tools:
[   54.876357] kdump-tools[686]:  * running makedumpfile -c -d 31 /proc/vmcore /var/crash/202312151524/dump-incomplete
Checking for memory holes : [100.0 %] \
[  204.391465] reboot: Restarting system
   
Looking at the resulting crash directory, the dump is properly compressed (the system has 1TB of RAM):

ubuntu@fabio-small-makedumpfile:~$ ls -lh /var/crash/202312151524
total 2.3G
-rw------- 1 root root 126K Dec 15 15:26 dmesg.202312151524
-rw------- 1 root root 2.3G Dec 15 15:26 dump.202312151524
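
As a rough illustration of the size check above: when makedumpfile falls back to cp, the vmcore ends up close to the size of the installed RAM, while a filtered and compressed dump is a small fraction of it. The sketch below encodes that comparison. The function name and the 0.5 threshold are hypothetical choices for illustration, not part of kdump-tools.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4


def looks_like_cp_fallback(dump_bytes: int, ram_bytes: int, threshold: float = 0.5) -> bool:
    """Heuristic (hypothetical): a dump that is a large fraction of RAM
    was probably written by the cp fallback rather than by makedumpfile."""
    if ram_bytes <= 0:
        raise ValueError("ram_bytes must be positive")
    return dump_bytes / ram_bytes >= threshold


# Sizes taken from the listing above: a 2.3G dump on a 1TB system.
print(looks_like_cp_fallback(int(2.3 * GIB), 1 * TIB))
# A vmcore the size of RAM, as produced by a cp fallback.
print(looks_like_cp_fallback(1 * TIB, 1 * TIB))
```

The threshold would need tuning for real use, since a heavily used system can legitimately produce a larger filtered dump.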

Regards,
Fabio Martins

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1970672

Title:
  makedumpfile falls back to cp with "__vtop4_x86_64: Can't get a valid
  pmd_pte."

Status in makedumpfile package in Ubuntu:
  Fix Released
Status in makedumpfile source package in Focal:
  Fix Committed

Bug description:
  [Impact] 
   * On Focal with an HWE (>=5.12) kernel, makedumpfile can sometimes fail with 
"__vtop4_x86_64: Can't get a valid pmd_pte."

   * makedumpfile falls back to cp for the dump, resulting in extremely
  large vmcores. This can impact both collection and analysis due to
  lack of space for the resulting vmcore.

   * This is fixed in upstream commit present in versions 1.7.0 and 1.7.1:
  
https://github.com/makedumpfile/makedumpfile/commit/646456862df8926ba10dd7330abf3bf0f887e1b6

  commit 646456862df8926ba10dd7330abf3bf0f887e1b6
  Author: Kazuhito Hagio 
  Date:   Wed May 26 14:31:26 2021 +0900

  [PATCH] Increase SECTION_MAP_LAST_BIT to 5
  
  * Required for kernel 5.12
  
  Kernel commit 1f90a3477df3 ("mm: teach pfn_to_online_page() about
  ZONE_DEVICE section collisions") added a section flag
  (SECTION_TAINT_ZONE_DEVICE) and causes makedumpfile an error on
  some machines like this:
  
    __vtop4_x86_64: Can't get a valid pmd_pte.
    readmem: Can't convert a virtual address(e2bdc200) to physical address.
    readmem: type_addr: 0, addr:e2bdc200, size:32768
    __exclude_unnecessary_pages: Can't read the buffer of struct page.
    create_2nd_bitmap: Can't exclude unnecessary pages.
  
  Increase SECTION_MAP_LAST_BIT to 5 to fix this.  The bit had not
  been used until the change, so we can just increase the value.
  
  Signed-off-by: Kazuhito Hagio 
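
  To illustrate why the extra section flag breaks address decoding, here is a
  minimal sketch. It assumes the mask is derived as ~(SECTION_MAP_LAST_BIT - 1),
  as in the kernel's mmzone.h, and that the value changes from 1<<4 to 1<<5.
  The mem_map address is a made-up example, not taken from a real vmcore.

```python
MASK64 = (1 << 64) - 1

# Flag bit added by kernel commit 1f90a3477df3 (bit 4 in kernel 5.12).
SECTION_TAINT_ZONE_DEVICE = 1 << 4

OLD_LAST_BIT = 1 << 4  # assumed value before the fix
NEW_LAST_BIT = 1 << 5  # assumed value after the fix


def section_map_mask(last_bit: int) -> int:
    """Mask that clears every flag bit below last_bit (64-bit)."""
    return ~(last_bit - 1) & MASK64


def decode_mem_map(encoded: int, last_bit: int) -> int:
    """Strip the low flag bits from an encoded section_mem_map value."""
    return encoded & section_map_mask(last_bit)


mem_map = 0xFFFFEA0000000000  # hypothetical, flag bits all zero
encoded = mem_map | SECTION_TAINT_ZONE_DEVICE | 0b1111

# With the old mask, bit 4 leaks into the decoded address.
print(hex(decode_mem_map(encoded, OLD_LAST_BIT)))
# With the new mask, the original address is recovered.
print(hex(decode_mem_map(encoded, NEW_LAST_BIT)))
```

  A decoded address that is off by the leaked flag bit is the kind of bad
  pointer that would then fail virtual-to-physical translation.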

  [Test Plan]
   * Confirm that makedumpfile works as expected by triggering a kdump.

   * Confirm that the p

[Kernel-packages] [Bug 1970672] Re: makedumpfile falls back to cp with "__vtop4_x86_64: Can't get a valid pmd_pte."

2023-12-15 Thread Fabio Augusto Miranda Martins

For item 2:

 * Confirm that the patched makedumpfile works as expected on a system
known to experience the issue.

Unfortunately, I'm no longer able to reproduce the original issue.

Even on the same hardware where it was originally noticed, with the same
kernel version (5.13.0-1027-oracle), makedumpfile from focal-updates/main
(1:1.6.7-1ubuntu2.4) works fine:

[   53.223512] kdump-tools[693]: Starting kdump-tools:
[   53.623944] kdump-tools[702]:  * running makedumpfile -c -d 31 /proc/vmcore /var/crash/202312151415/dump-incomplete
Copying data  : [ 22.0 %] |
[  196.965120] reboot: Restarting system

Unfortunately, I have no record of which makedumpfile version the original
system was using at the time, and I no longer have access to it to check,
so I can't test the exact same makedumpfile and kernel combination.

I also tested kernel 5.13.0-1027-oracle with makedumpfile 1:1.6.7-1ubuntu2
from focal/main. In this combination, makedumpfile fails with a similar
but slightly different error, then falls back to cp:

[   53.721130] kdump-tools[690]: Starting kdump-tools:
[   54.121624] kdump-tools[699]:  * running makedumpfile -c -d 31 /proc/vmcore /var/crash/202312151434/dump-incomplete
[   54.249624] kdump-tools[719]: get_mm_sparsemem: Can't get the address of mem_section.
[   54.345410] kdump-tools[719]: The kernel version is not supported.
[   54.425405] kdump-tools[719]: The makedumpfile operation may be incomplete.
[   54.517391] kdump-tools[719]: makedumpfile Failed.
[   54.577916] kdump-tools[699]:  * kdump-tools: makedumpfile failed, falling back to 'cp'
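
For checking captured console logs in bulk, a minimal sketch along these lines could flag the cp fallback. It matches only the kdump-tools message shown above; the function name is hypothetical, and the whitespace handling is there because captured serial logs are often line-wrapped.

```python
import re

# Message printed by kdump-tools in the log above; \s+ tolerates a line wrap.
FALLBACK_RE = re.compile(r"makedumpfile failed, falling\s+back to 'cp'")


def fell_back_to_cp(console_log: str) -> bool:
    """Return True if the console log shows kdump-tools falling back to cp."""
    return FALLBACK_RE.search(console_log) is not None


wrapped = (
    "[   54.517391] kdump-tools[719]: makedumpfile Failed.\n"
    "[   54.577916] kdump-tools[699]:  * kdump-tools: makedumpfile failed, falling \n"
    "back to 'cp'\n"
)
print(fell_back_to_cp(wrapped))
print(fell_back_to_cp("[  196.965120] reboot: Restarting system\n"))
```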

However, using the latest makedumpfile from focal-updates/main
(1:1.6.7-1ubuntu2.4) resolves this, as shown above.

For this reason, I can't complete item 2.

I'll now work on item 1.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1970672

Title:
  makedumpfile falls back to cp with "__vtop4_x86_64: Can't get a valid
  pmd_pte."

Status in makedumpfile package in Ubuntu:
  Fix Released
Status in makedumpfile source package in Focal:
  Fix Committed

Bug description:
  [Impact] 
   * On Focal with an HWE (>=5.12) kernel, makedumpfile can sometimes fail with 
"__vtop4_x86_64: Can't get a valid pmd_pte."

   * makedumpfile falls back to cp for the dump, resulting in extremely
  large vmcores. This can impact both collection and analysis due to
  lack of space for the resulting vmcore.

   * This is fixed in upstream commit present in versions 1.7.0 and 1.7.1:
  
https://github.com/makedumpfile/makedumpfile/commit/646456862df8926ba10dd7330abf3bf0f887e1b6

  commit 646456862df8926ba10dd7330abf3bf0f887e1b6
  Author: Kazuhito Hagio 
  Date:   Wed May 26 14:31:26 2021 +0900

  [PATCH] Increase SECTION_MAP_LAST_BIT to 5
  
  * Required for kernel 5.12
  
  Kernel commit 1f90a3477df3 ("mm: teach pfn_to_online_page() about
  ZONE_DEVICE section collisions") added a section flag
  (SECTION_TAINT_ZONE_DEVICE) and causes makedumpfile an error on
  some machines like this:
  
    __vtop4_x86_64: Can't get a valid pmd_pte.
    readmem: Can't convert a virtual address(e2bdc200) to physical address.
    readmem: type_addr: 0, addr:e2bdc200, size:32768
    __exclude_unnecessary_pages: Can't read the buffer of struct page.
    create_2nd_bitmap: Can't exclude unnecessary pages.
  
  Increase SECTION_MAP_LAST_BIT to 5 to fix this.  The bit had not
  been used until the change, so we can just increase the value.
  
  Signed-off-by: Kazuhito Hagio 

  [Test Plan]
   * Confirm that makedumpfile works as expected by triggering a kdump.

   * Confirm that the patched makedumpfile works as expected on a system
  known to experience the issue.

   * Confirm that the patched makedumpfile is able to work with a cp-
  generated known affected vmcore to compress it. The unpatched version
  fails.

  [Where problems could occur]

   * This change could adversely affect the collection/compression of
  vmcores during a kdump situation resulting in fallback to cp.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/makedumpfile/+bug/1970672/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

[Kernel-packages] [Bug 1970672] Re: makedumpfile falls back to cp with "__vtop4_x86_64: Can't get a valid pmd_pte."

2023-12-14 Thread Fabio Augusto Miranda Martins

Hi Chris,

You're correct, I'm sorry. My test in comment #23 covers the 3rd item you
listed.

Let me work on items 1 and 2 and I'll report back here.


[Kernel-packages] [Bug 1970672] Re: makedumpfile falls back to cp with "__vtop4_x86_64: Can't get a valid pmd_pte."

2023-12-12 Thread Fabio Augusto Miranda Martins

I've tested makedumpfile from -proposed on Focal and it looks good to
me.

Using a 2TB vmcore file as input:

- Original makedumpfile 1.6.7-1ubuntu2.4 fails:

ubuntu@kdump-instance:~$ sudo apt-cache policy makedumpfile
makedumpfile:
  Installed: 1:1.6.7-1ubuntu2.4
  Candidate: 1:1.6.7-1ubuntu2.4
  Version table:
 *** 1:1.6.7-1ubuntu2.4 500
        500 http://phx-ad-3.clouds.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1:1.6.7-1ubuntu2 500
        500 http://phx-ad-3.clouds.archive.ubuntu.com/ubuntu focal/main amd64 Packages


ubuntu@kdump-instance:/mnt/202204202351$ makedumpfile -c -d 31 ./vmcore.202204202351 ./dump-incomplete-fabio
The kernel version is not supported.
The makedumpfile operation may be incomplete.
Checking for memory holes : [100.0 %] /
__vtop4_x86_64: Can't get a valid pmd_pte.
readmem: Can't convert a virtual address(ecff8180) to physical address.
readmem: type_addr: 0, addr:ecff8180, size:32768
__exclude_unnecessary_pages: Can't read the buffer of struct page.
create_2nd_bitmap: Can't exclude unnecessary pages.

makedumpfile Failed.


- Makedumpfile 1.6.7-1ubuntu2.5 from proposed works:

ubuntu@kdump-instance:~$ sudo apt-cache policy makedumpfile
makedumpfile:
  Installed: 1:1.6.7-1ubuntu2.5
  Candidate: 1:1.6.7-1ubuntu2.5
  Version table:
 *** 1:1.6.7-1ubuntu2.5 500
        500 http://archive.ubuntu.com/ubuntu focal-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     1:1.6.7-1ubuntu2.4 500
        500 http://phx-ad-3.clouds.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
     1:1.6.7-1ubuntu2 500
        500 http://phx-ad-3.clouds.archive.ubuntu.com/ubuntu focal/main amd64 Packages

ubuntu@kdump-instance:/mnt/202204202351$ makedumpfile -c -d 31 ./vmcore.202204202351 ./dump-incomplete-fabio
The kernel version is not supported.
The makedumpfile operation may be incomplete.
Copying data  : [100.0 %] -   eta: 0s

The dumpfile is saved to ./dump-incomplete-fabio.

makedumpfile Completed.

It reduced the dump file from 2TB down to 4.5G:

ubuntu@kdump-instance:/mnt/202204202351$ ls -lh vmcore.202204202351 
-r-------- 1 ubuntu ubuntu 2.0T Apr 21  2022 vmcore.202204202351

ubuntu@kdump-instance:/mnt/202204202351$ ls -lh dump-incomplete-fabio 
-rw------- 1 ubuntu ubuntu 4.5G Dec 12 14:23 dump-incomplete-fabio

The vmcore in the comment reported by Heather was the size of the
installed RAM because makedumpfile was being forced to fail, either by
passing "-c -d 32" (a dump level that doesn't exist, as the maximum is 31)
or by moving the makedumpfile binary away. In both cases kdump falls back
to cp, which produces a vmcore the size of the installed RAM.
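
To make the dump-level limit concrete, here is a minimal sketch of how the -d value can be read as a bitmask. The bit meanings follow the dump_level table in the makedumpfile man page; the function and constant names are my own and are not part of makedumpfile.

```python
from typing import List

# Page types excluded by each dump-level bit (per the makedumpfile man page).
DUMP_LEVEL_BITS = {
    1: "zero pages",
    2: "non-private cache pages",
    4: "private cache pages",
    8: "user data pages",
    16: "free pages",
}
MAX_DUMP_LEVEL = sum(DUMP_LEVEL_BITS)  # 31: every bit set


def excluded_page_types(level: int) -> List[str]:
    """List the page types a given -d level excludes; reject invalid levels."""
    if not 0 <= level <= MAX_DUMP_LEVEL:
        raise ValueError(f"dump level must be 0..{MAX_DUMP_LEVEL}, got {level}")
    return [name for bit, name in DUMP_LEVEL_BITS.items() if level & bit]


print(excluded_page_types(31))  # all five page types are excluded

try:
    excluded_page_types(32)
except ValueError as err:
    print(err)  # 32 has no valid bit assigned, so it is rejected
```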

Let me know if this is enough to conclude the Focal verification.

** Tags removed: verification-failed-focal verification-needed
** Tags added: verification-done-focal


[Kernel-packages] [Bug 2020319] Re: Encountering an issue with memcpy_fromio causing failed boot of SEV-enabled guest

2023-06-19 Thread Fabio Augusto Miranda Martins

Verified a Focal guest as follows:

1. Reproduced the problem with kernel 5.4.0-152-generic:

https://pastebin.ubuntu.com/p/Cgj6j4Prbc/

2. As a workaround, removed:

  
0x0003
  
  
3. Installed kernel from -proposed:

root@ubuntu:~# apt-cache policy linux-image-virtual linux-virtual
linux-image-virtual:
  Installed: 5.4.0.154.151
  Candidate: 5.4.0.154.151
  Version table:
 *** 5.4.0.154.151 500
        500 http://archive.ubuntu.com/ubuntu focal-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     5.4.0.152.149 500
        500 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages
     5.4.0.26.32 500
        500 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages
linux-virtual:
  Installed: 5.4.0.154.151
  Candidate: 5.4.0.154.151
  Version table:
 *** 5.4.0.154.151 500
        500 http://archive.ubuntu.com/ubuntu focal-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     5.4.0.152.149 500
        500 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages
     5.4.0.26.32 500
        500 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages


4. Added back:

  
0x0003
  
  

5. Instance booted fine:

ubuntu@ubuntu:~$ uname -a
Linux ubuntu 5.4.0-154-generic #171-Ubuntu SMP Fri Jun 16 16:29:04 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@ubuntu:~$ sudo dmesg | grep -i sev
[0.172491] AMD Secure Encrypted Virtualization (SEV) active
[5.318658] SVM: KVM is unsupported when running as an SEV guest

6. Full dmesg: https://paste.ubuntu.com/p/dP4Zp8pKfm/

** Tags removed: verification-needed-focal verification-needed-jammy
** Tags added: verification-done verification-done-focal verification-done-jammy

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2020319

Title:
  Encountering an issue with memcpy_fromio causing failed boot of SEV-
  enabled guest

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  New
Status in linux source package in Focal:
  Fix Committed
Status in linux source package in Jammy:
  Fix Committed

Bug description:
  [Impact]
  When launching a SEV-enabled guest, the guest kernel panics with the 
following call trace,
  indicating a critical error in the system.

  ==
  [1.090638] software IO TLB: Memory encryption is active and system is 
using DMA bounce buffers
  [1.092105] Linux agpgart interface v0.103
  [1.092716] BUG: unable to handle page fault for address: 9b820003d068
  [1.093445] #PF: supervisor read access in kernel mode
  [1.093966] #PF: error_code(0x) - not-present page
  [1.094481] PGD 80010067 P4D 80010067 PUD 8001001d7067 PMD 
8001001da067 PTE 8000fed40173
  [1.094629] Oops:  [#1] SMP NOPTI
  [1.094629] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.15.0-46-generic 
#49-Ubuntu
  [1.094629] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 
02/06/2015
  [1.094629] RIP: 0010:memcpy_fromio+0x27/0x50
  [1.094629] Code: cc cc cc 0f 1f 44 00 00 55 48 89 e5 48 85 d2 74 28 40 f6 
c6 01 75 30 48 83 fa 01 76 06 40 f6 c6 02 75 1c 48 89 d1 48 c1 e9 02  a5 f6 
c2 02 74 02 66 a5 f6 c2 01 74 01 a4 5d e9 14 b3 97 00 66
  [1.094629] RSP: 0018:9b820001ba50 EFLAGS: 00010212
  [1.094629] RAX: 9b820003d040 RBX: 9b820001bac0 RCX: 
0002
  [1.094629] RDX: 0008 RSI: 9b820003d068 RDI: 
9b820001ba90
  [1.094629] RBP: 9b820001ba50 R08: 0f80 R09: 
0f80
  [1.094629] R10: fed40080 R11: 9b820001bac0 R12: 
8cc7068eca48
  [1.094629] R13: 8cc700a64288 R14:  R15: 
fed40080
  [1.094629] FS:  () GS:8cc77bd0() 
knlGS:
  [1.094629] CS:  0010 DS:  ES:  CR0: 80050033
  [1.094629] CR2: 9b820003d068 CR3: 800174a1 CR4: 
00350ee0
  [1.094629] Call Trace:
  [1.094629]  
  [1.094629]  crb_map_io+0x315/0x870
  [1.094629]  ? radix_tree_iter_tag_clear+0x12/0x20
  [1.094629]  ? _raw_spin_unlock_irqrestore+0xe/0x30
  [1.094629]  crb_acpi_add+0xc2/0x140
  [1.094629]  acpi_device_probe+0x4c/0x170
  [1.094629]  really_probe+0x222/0x420
  [1.094629]  __driver_probe_device+0x119/0x190
  [1.094629]  driver_probe_device+0x23/0xc0
  [1.094629]  __driver_attach+0xbd/0x1e0
  [1.094629]  ? __device_attach_driver+0x120/0x120
  [1.094629]  bus_for_each_dev+0x7e/0xd0
  [1.094629]  driver_attach+0x1e/0x30
  [1.094629]  bus_add_driver+0x139/0x200
  [1.094629]  driver_register+0x95/0x100
  [1.094629]  ? init_tis+0xfd/0xfd
  [1.094629]  acpi_bus_register_driver+0x39/0x50

[Kernel-packages] [Bug 2020319] Re: Encountering an issue with memcpy_fromio causing failed boot of SEV-enabled guest

2023-06-19 Thread Fabio Augusto Miranda Martins

I've verified a Jammy guest as follows:

1. Reproduced the problem with kernel 5.15.0-75-generic:

https://pastebin.ubuntu.com/p/844W5SzjR8/


2. As a workaround, removed:

  
0x0003
  
  
  
3. Installed kernel from -proposed:

root@ubuntu:~# apt-cache policy linux-image-virtual linux-virtual
linux-image-virtual:
  Installed: 5.15.0.77.75
  Candidate: 5.15.0.77.75
  Version table:
 *** 5.15.0.77.75 500
        500 http://archive.ubuntu.com/ubuntu jammy-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     5.15.0.75.73 500
        500 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
     5.15.0.25.27 500
        500 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages
linux-virtual:
  Installed: 5.15.0.77.75
  Candidate: 5.15.0.77.75
  Version table:
 *** 5.15.0.77.75 500
        500 http://archive.ubuntu.com/ubuntu jammy-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     5.15.0.75.73 500
        500 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
     5.15.0.25.27 500
        500 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages


4. Added back:

  
0x0003
  
  
5. Instance booted fine:

ubuntu@ubuntu:~$ uname -a
Linux ubuntu 5.15.0-77-generic #84-Ubuntu SMP Fri Jun 16 16:16:44 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@ubuntu:~$ sudo dmesg | grep -i sev
[0.217323] AMD Memory Encryption Features active: SEV
[5.296555] SVM: KVM is unsupported when running as an SEV guest


6. Full dmesg: https://paste.ubuntu.com/p/5MDcKbVzPv/
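
The SEV banner in dmesg is worded differently on the two kernels tested (5.4 prints "AMD Secure Encrypted Virtualization (SEV) active", 5.15 prints "AMD Memory Encryption Features active: SEV"). A minimal sketch that accepts either form, assuming only those two strings, could look like this. The function name is hypothetical.

```python
import re

# The two banner formats observed in the Focal (5.4) and Jammy (5.15) guests.
SEV_PATTERNS = (
    re.compile(r"AMD Secure Encrypted Virtualization \(SEV\) active"),
    re.compile(r"AMD Memory Encryption Features active:.*\bSEV\b"),
)


def sev_active(dmesg_text: str) -> bool:
    """Return True if any dmesg line carries a known 'SEV active' banner."""
    return any(
        pattern.search(line)
        for line in dmesg_text.splitlines()
        for pattern in SEV_PATTERNS
    )


focal = "[0.172491] AMD Secure Encrypted Virtualization (SEV) active\n"
jammy = "[0.217323] AMD Memory Encryption Features active: SEV\n"
other = "[5.296555] SVM: KVM is unsupported when running as an SEV guest\n"

print(sev_active(focal), sev_active(jammy), sev_active(other))
```

Other kernel versions may word the banner differently, so the pattern list would need extending before relying on a negative result.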

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2020319

Title:
  Encountering an issue with memcpy_fromio causing failed boot of SEV-
  enabled guest

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  New
Status in linux source package in Focal:
  Fix Committed
Status in linux source package in Jammy:
  Fix Committed

Bug description:
  [Impact]
  When launching a SEV-enabled guest, the guest kernel panics with the 
following call trace,
  indicating a critical error in the system.

  ==
  [1.090638] software IO TLB: Memory encryption is active and system is 
using DMA bounce buffers
  [1.092105] Linux agpgart interface v0.103
  [1.092716] BUG: unable to handle page fault for address: 9b820003d068
  [1.093445] #PF: supervisor read access in kernel mode
  [1.093966] #PF: error_code(0x) - not-present page
  [1.094481] PGD 80010067 P4D 80010067 PUD 8001001d7067 PMD 
8001001da067 PTE 8000fed40173
  [1.094629] Oops:  [#1] SMP NOPTI
  [1.094629] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.15.0-46-generic 
#49-Ubuntu
  [1.094629] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 
02/06/2015
  [1.094629] RIP: 0010:memcpy_fromio+0x27/0x50
  [1.094629] Code: cc cc cc 0f 1f 44 00 00 55 48 89 e5 48 85 d2 74 28 40 f6 
c6 01 75 30 48 83 fa 01 76 06 40 f6 c6 02 75 1c 48 89 d1 48 c1 e9 02  a5 f6 
c2 02 74 02 66 a5 f6 c2 01 74 01 a4 5d e9 14 b3 97 00 66
  [1.094629] RSP: 0018:9b820001ba50 EFLAGS: 00010212
  [1.094629] RAX: 9b820003d040 RBX: 9b820001bac0 RCX: 
0002
  [1.094629] RDX: 0008 RSI: 9b820003d068 RDI: 
9b820001ba90
  [1.094629] RBP: 9b820001ba50 R08: 0f80 R09: 
0f80
  [1.094629] R10: fed40080 R11: 9b820001bac0 R12: 
8cc7068eca48
  [1.094629] R13: 8cc700a64288 R14:  R15: 
fed40080
  [1.094629] FS:  () GS:8cc77bd0() 
knlGS:
  [1.094629] CS:  0010 DS:  ES:  CR0: 80050033
  [1.094629] CR2: 9b820003d068 CR3: 800174a1 CR4: 
00350ee0
  [1.094629] Call Trace:
  [1.094629]  
  [1.094629]  crb_map_io+0x315/0x870
  [1.094629]  ? radix_tree_iter_tag_clear+0x12/0x20
  [1.094629]  ? _raw_spin_unlock_irqrestore+0xe/0x30
  [1.094629]  crb_acpi_add+0xc2/0x140
  [1.094629]  acpi_device_probe+0x4c/0x170
  [1.094629]  really_probe+0x222/0x420
  [1.094629]  __driver_probe_device+0x119/0x190
  [1.094629]  driver_probe_device+0x23/0xc0
  [1.094629]  __driver_attach+0xbd/0x1e0
  [1.094629]  ? __device_attach_driver+0x120/0x120
  [1.094629]  bus_for_each_dev+0x7e/0xd0
  [1.094629]  driver_attach+0x1e/0x30
  [1.094629]  bus_add_driver+0x139/0x200
  [1.094629]  driver_register+0x95/0x100
  [1.094629]  ? init_tis+0xfd/0xfd
  [1.094629]  acpi_bus_register_driver+0x39/0x50
  [1.094629]  crb_acpi_driver_init+0x15/0x1b
  [1.094629]  do_one_initcall+0x48/0x1e0
  [1.094629]  do_initcalls+0x12f/0x159
  [1.094629]

[Kernel-packages] [Bug 1990167] Re: cma alloc failure in large 5.15 arm instances

2023-01-01 Thread Fabio Augusto Miranda Martins

I believe this patch might have been dropped from newer linux-aws
kernels. I just reproduced this problem while running 5.15.0-1026-aws.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/1990167

Title:
  cma alloc failure in large 5.15 arm instances

Status in linux-aws package in Ubuntu:
  Invalid
Status in linux-aws source package in Jammy:
  Fix Committed
Status in linux-aws source package in Kinetic:
  Invalid

Bug description:
  When launching large arm64 instances on the focal or jammy ami, cma
  allocation errors appear in the dmesg out:

  [0.063255] cma: cma_alloc: reserved: alloc failed, req-size: 4096
  pages, ret: -12
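
  For reading this message: the "ret" value is a negated errno, and -12
  corresponds to ENOMEM. The sketch below parses the line and converts the
  request size to bytes. The 4096-byte page size is an assumption (arm64
  kernels can be built with other page sizes), and the function name is
  hypothetical.

```python
import errno
import re

CMA_RE = re.compile(
    r"cma: cma_alloc: (?P<area>\S+): alloc failed, "
    r"req-size: (?P<pages>\d+)\s+pages, ret: (?P<ret>-?\d+)"
)


def parse_cma_failure(line: str, page_size: int = 4096):
    """Parse a cma_alloc failure line; page_size is an assumed default."""
    match = CMA_RE.search(line)
    if match is None:
        return None
    pages = int(match["pages"])
    ret = int(match["ret"])
    return {
        "area": match["area"],
        "pages": pages,
        "bytes": pages * page_size,
        # Kernel return codes are negated errno values.
        "errno": errno.errorcode.get(-ret, "UNKNOWN"),
    }


line = "[0.063255] cma: cma_alloc: reserved: alloc failed, req-size: 4096 pages, ret: -12"
print(parse_cma_failure(line))
```

  With a 4KiB page size, the failed request of 4096 pages is 16MiB.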

  As far as I can tell, this does not impact instance launch in a
  meaningful way, but I am unsure of the other implications of this. I
  was able to confirm that these messages are only present in 5.15, as
  they do not show up in the bionic image, and rolling back focal to
  linux-aws 5.4 avoids them as well.

  This was present in at least 2 instance types and only appears to pop
  up in large sizes (2x4 does not produce them, 64x124 (c6gn.16xlarge)
  does)

  This could be as simple as just disabling CMA in the linux-aws pkg, as
  it appears this is already the case in linux-azure(LP:  #1949770).

  Attaching dmesg out to the report.

  
  # Replication
  + Launch a large arm64 instance (c6gn.16xlarge)
  + Observe the messages in kern.log / dmesg

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/1990167/+subscriptions



[Kernel-packages] [Bug 1955655] Re: kernel-5.13.0-23-generic : Unable to boot when Secure Encrypted Virtualization( SEV) is enabled without setting swiotlb boot param

2022-09-15 Thread Fabio Augusto Miranda Martins

This is a grub bug and it is being tracked here:

[SRU] unable to boot guest with large memory when SEV is enabled on host
https://bugs.launchpad.net/ubuntu/+source/grub2-unsigned/+bug/1989446


** Changed in: grub2 (Ubuntu)
   Status: Confirmed => Invalid

** Changed in: grub2 (Ubuntu Impish)
   Status: Confirmed => Invalid

** Changed in: grub2 (Ubuntu Jammy)
   Status: Confirmed => Invalid

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1955655

Title:
  kernel-5.13.0-23-generic : Unable to boot when Secure Encrypted
  Virtualization( SEV) is enabled without setting swiotlb boot param

Status in grub2 package in Ubuntu:
  Invalid
Status in linux package in Ubuntu:
  Invalid
Status in grub2 source package in Impish:
  Invalid
Status in linux source package in Impish:
  Invalid
Status in grub2 source package in Jammy:
  Invalid
Status in linux source package in Jammy:
  Invalid

Bug description:
  While investigating LP: #1955395 by using the -generic kernel image,
  it appeared that it is impossible to boot the kernel unless the boot
  parameter swiotlb is set to 512M (swiotlb=262144).
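
  The swiotlb= boot parameter is expressed in I/O TLB slabs rather than bytes.
  Assuming the kernel's 2KiB slab size (IO_TLB_SHIFT = 11), the conversion
  between the two can be sketched as follows. The helper names are
  hypothetical.

```python
IO_TLB_SHIFT = 11                     # kernel constant: each slab is 2 KiB
IO_TLB_SLAB_BYTES = 1 << IO_TLB_SHIFT
MIB = 1024 * 1024


def swiotlb_slabs_to_mib(slabs: int) -> float:
    """Convert a swiotlb= slab count into MiB."""
    return slabs * IO_TLB_SLAB_BYTES / MIB


def mib_to_swiotlb_slabs(mib: int) -> int:
    """Convert a size in MiB into the slab count expected by swiotlb=."""
    return mib * MIB // IO_TLB_SLAB_BYTES


print(swiotlb_slabs_to_mib(262144))  # the value used in this report
print(mib_to_swiotlb_slabs(1024))    # slab count for a 1024MB buffer
```

  This confirms that swiotlb=262144 corresponds to a 512MiB bounce buffer.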

  When not set, the kernel tries to adjust the bounce buffer to 1024MB;
  this fails and later triggers a kernel panic with the following trace:

  $ grep TLB /tmp/console.log
  [0.003665] software IO TLB: SWIOTLB bounce buffer size adjusted to 1024MB
  [0.034219] kvm-guest: KVM setup pv remote TLB flush
  [0.037063] software IO TLB: Cannot allocate buffer
  [0.223009] Last level iTLB entries: 4KB 512, 2MB 255, 4MB 127
  [0.223634] Last level dTLB entries: 4KB 512, 2MB 255, 4MB 127, 1GB 0
  [0.297424] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
  [0.297424] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
  [1.018860] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
  [1.019552] software IO TLB: No low mem
  [1.451497] Kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer
  [1.491589] ---[ end Kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer ]---

  The SWIOTLB adjustment comes from the following kernel commit :
  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e998879d4fb7991856916972168cf27c0d86ed12

  For some reason, the LowMem allocation fails (as seen by the "software
  IO TLB: No low mem" message), hence the SWIOTLB adjustment cannot be
  completed.

  When booting with the swiotlb=262144 value, we get the following output :
  $ grep TLB /tmp/console.log

  [0.050908] kvm-guest: KVM setup pv remote TLB flush
  [0.308896] Last level iTLB entries: 4KB 512, 2MB 255, 4MB 127
  [0.309494] Last level dTLB entries: 4KB 512, 2MB 255, 4MB 127, 1GB 0
  [0.373162] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
  [0.373162] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
  [1.071136] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
  [1.071837] software IO TLB: mapped [mem 0x5bebe000-0x7bebe000] (512MB)
  [1.529804] software IO TLB: Memory encryption is active and system is using DMA bounce buffers
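
  As a quick sanity check on the numbers above: the swiotlb= parameter
  counts I/O TLB slabs rather than bytes, and a slab is 2 KiB in the
  kernels discussed here (IO_TLB_SHIFT is 11), so 262144 slabs should come
  out at the 512MB that the log reports. The short Python sketch below
  redoes that arithmetic and also checks the size of the mapped range from
  the log. The helper names are made up for illustration.

```python
# Sanity check of the swiotlb= arithmetic; helper names are illustrative only.
IO_TLB_SHIFT = 11                 # one I/O TLB slab is 1 << 11 = 2048 bytes
SLAB_BYTES = 1 << IO_TLB_SHIFT
MIB = 1 << 20


def swiotlb_slabs_to_mib(nslabs):
    """Convert a swiotlb=<nslabs> boot value to a size in MiB."""
    return nslabs * SLAB_BYTES // MIB


def range_mib(start, end):
    """Size in MiB of a [start, end) physical address range."""
    return (end - start) // MIB


# Value passed on the kernel command line in this report.
print(swiotlb_slabs_to_mib(262144))
# Range taken from the "software IO TLB: mapped [mem ...]" log line.
print(range_mib(0x5BEBE000, 0x7BEBE000))
```

  Both values should agree with the "(512MB)" in the log line.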

  
  For comparison, the Fedora 34 kernel (5.15.4-101.fc34.x86_64), which has
  the same adjustment mechanism, correctly adjusts the SWIOTLB bounce
  buffer without the need to set the swiotlb= value at boot time.

  The SWIOTLB buffer adjustment has been introduced in kernel 5.11.

  We can make SEV enabled resources available for testing if needed.

  ...Louis
  --- 
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Dec 23 13:32 seq
   crw-rw 1 root audio 116, 33 Dec 23 13:32 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu74
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  CasperMD5CheckResult: unknown
  DistroRelease: Ubuntu 22.04
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  Lsusb-t:
   
  Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
  MachineType: Scaleway SCW-ENT1-S
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=screen
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.13.0-23-generic 
root=UUID=1f577236-6bf2-48ef-998a-ba45f71aca7f ro console=tty1 console=ttyS0 
swiotlb=262144
  ProcVersionSignature: Ubuntu 5.13.0-23.23-generic 5.13.19
  RelatedPackageVersions:
   linux-restricted-modules-5.13.0-23-generic

[Kernel-packages] [Bug 1983625] Re: 5.15.0-1013-oracle: Unable to boot large memory SEV guest without setting swiotlb parameter.

2022-09-15 Thread Fabio Augusto Miranda Martins

This is a grub bug and it is being tracked here:

[SRU] unable to boot guest with large memory when SEV is enabled on host
https://bugs.launchpad.net/ubuntu/+source/grub2-unsigned/+bug/1989446


** Changed in: linux (Ubuntu)
   Status: Confirmed => Invalid

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1983625

Title:
  5.15.0-1013-oracle: Unable to boot large memory SEV guest without
  setting swiotlb parameter.

Status in linux package in Ubuntu:
  Invalid

Bug description:
  When launching a SEV Ubuntu 22.04 guest with e.g. memory > 16876M, Ubuntu
  kernel 5.15.0-1013-oracle panics unless swiotlb=262144 is specified in the
  guest kernel parameters. It seems that the kernel tries to adjust the
  swiotlb buffer size but cannot do so, and crashes. With a memory size such
  as 8G, the guest boots fine.
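
  For orientation, here is a rough Python model of the automatic sizing
  that the kernel attempts for SEV guests. It rests on an assumption that
  is not stated in this report: that the adjustment takes about 6% of guest
  memory and clamps it between the 64MB swiotlb default and 1GB, as
  recalled from the upstream SEV SWIOTLB commit. The function name is
  hypothetical and this is an approximation, not kernel code.

```python
# Rough model of the SEV SWIOTLB auto-sizing. ASSUMPTION: 6% of guest memory,
# clamped to [64 MiB, 1 GiB]; this is an approximation, not kernel code.
MIB = 1 << 20
GIB = 1 << 30
IO_TLB_DEFAULT_SIZE = 64 * MIB    # default bounce buffer size


def sev_swiotlb_bytes(total_mem_bytes):
    """Return 6% of guest memory, clamped to [64 MiB, 1 GiB]."""
    size = total_mem_bytes * 6 // 100
    return max(IO_TLB_DEFAULT_SIZE, min(size, GIB))


for mem_mib in (8192, 16877, 65536):
    adjusted = sev_swiotlb_bytes(mem_mib * MIB) // MIB
    print(mem_mib, "MiB guest ->", adjusted, "MiB bounce buffer")
```

  Under that assumption a 16877M guest lands at roughly 1012 MiB, close to
  the 1011MB in the console log of this report; the kernel starts from the
  memory it actually sees, which can be slightly less than the -m value.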

  HOST INFO

  Host type      : OCI Bare-Metal Server
  Server/Machine : ORACLE SERVER E2-2c
  CPU model      : AMD EPYC 7742 64-Core Processor
  Architecture   : x86_64
  Hostname       : atanveer-amd-sme
  OS             : Oracle Linux Server release 7.9
  Kernel         : 5.4.17-2136.309.4.el7uek.x86_64 #2 SMP Tue Jun 28 17:35:13 PDT 2022
  Hypervisor     : QEMU emulator version 4.2.1 (qemu-4.2.1-18.oci.el7)
  OVMF/AAVMF     : OVMF-1.6.3-1.el7.noarch

  Qemu command to launch SEV guest:

  /bin/qemu-system-x86_64 -name OL22.04-uefi \
  -machine q35 \
  -enable-kvm \
  -cpu host,+host-phys-bits \
  -m 16877M \
  -smp 8,maxcpus=240 \
  -D ./22.04-uefi.log \
  -nodefaults \
  -monitor stdio \
  -vnc 0.0.0.0:0,to=999 \
  -vga std \
  -drive file=/usr/share/OVMF/OVMF_CODE.pure-efi.fd,index=0,if=pflash,format=raw,readonly \
  -drive file=OVMF_VARS.pure-efi.fd.ol22.04,index=1,if=pflash,format=raw \
  -device virtio-scsi-pci,id=virtio-scsi-pci0,disable-legacy=on,iommu_platform=true \
  -drive file=Ubuntu-22.04-2022.06.16-0-uefi-x86_64.qcow2,if=none,id=local_disk0,format=qcow2,media=disk \
  -device ide-hd,drive=local_disk0,id=local_disk1,bootindex=0 \
  -qmp tcp:127.0.0.1:3334,server,nowait \
  -serial telnet:127.0.0.1:,server,nowait \
  -device virtio-rng-pci,disable-legacy=on,iommu_platform=true \
  -object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1 \
  -machine memory-encryption=sev0

  Console log:

  [0.005025] software IO TLB: SWIOTLB bounce buffer size adjusted to 1011MB
  [0.033881] kvm-guest: KVM setup pv remote TLB flush
  [0.054931] software IO TLB: Cannot allocate buffer
  [0.248933] Last level iTLB entries: 4KB 512, 2MB 255, 4MB 127
  [0.249582] Last level dTLB entries: 4KB 512, 2MB 255, 4MB 127, 1GB 0
  [0.317440] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
  [0.317440] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
  [0.424952] iommu: DMA domain TLB invalidation policy: lazy mode
  [0.570923] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
  [0.571669] software IO TLB: No low mem
  .
  .
  .
  .
  [1.233515] ata1: SATA max UDMA/133 abar m4096@0xc101 port 0xc1010100 irq 28
  [1.234985] ata2: SATA max UDMA/133 abar m4096@0xc101 port 0xc1010180 irq 28
  [1.236464] ata3: SATA max UDMA/133 abar m4096@0xc101 port 0xc1010200 irq 28
  [1.237863] ata4: SATA max UDMA/133 abar m4096@0xc101 port 0xc1010280 irq 28
  [1.239257] ata5: SATA max UDMA/133 abar m4096@0xc101 port 0xc1010300 irq 28
  [1.240659] ata6: SATA max UDMA/133 abar m4096@0xc101 port 0xc1010380 irq 28
  [1.555165] ata5: SATA link down (SStatus 0 SControl 300)
  [1.556661] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
  [1.558232] ata2: SATA link down (SStatus 0 SControl 300)
  [1.559728] ata4: SATA link down (SStatus 0 SControl 300)
  [1.560996] ata3: SATA link down (SStatus 0 SControl 300)
  [1.562134] ata1: SATA link down (SStatus 0 SControl 300)
  [1.563566] ata6.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
  [6.911450] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
  [6.912906] ata6.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
  [   12.288045] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
  [   12.289904] ata6.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
  [   17.663548] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
  [  242.883993] INFO: task kworker/u480:0:9 blocked for more than 120 seconds.
  [  242.885619]   Not tainted 5.15.0-1013-oracle #17-Ubuntu
  [  242.886743] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  242.888198] task:kworker/u480:0  state:D stack:0 pid:9 ppid: 2 flags:0x4000
  [  242.889703] Workqueue: events_unbound async_run_entry_fn
  [  242.890882] Call Trace:
  [  242.891727]  

  
  Full console log is attached.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+so

[Kernel-packages] [Bug 1980884] Re: ubuntu guest kernel panics when a sev guest with passthrough mlx5 VF is used

2022-07-21 Thread Fabio Augusto Miranda Martins

Hi Si-Wei, the 5.11 kernel has reached EOL in Feb 2022. Kernel 5.15 is
the one currently being used for linux-oracle kernel on Focal (20.04)
and Jammy (22.04), and it has the commit that you mentioned above:

$ git log --oneline | grep -i "Fix page DMA map/unmap attributes"
a865fe280b96 net/mlx5e: Fix page DMA map/unmap attributes

$ git tag --contains a865fe280b96
Ubuntu-oracle-5.15.0-1001.1
Ubuntu-oracle-5.15.0-1001.2
Ubuntu-oracle-5.15.0-1001.3
Ubuntu-oracle-5.15.0-1002.4
Ubuntu-oracle-5.15.0-1003.5
Ubuntu-oracle-5.15.0-1004.6
Ubuntu-oracle-5.15.0-1005.7
Ubuntu-oracle-5.15.0-1006.8
Ubuntu-oracle-5.15.0-1007.9
Ubuntu-oracle-5.15.0-1009.12
Ubuntu-oracle-5.15.0-1011.15
Ubuntu-oracle-5.15.0-1012.16
Ubuntu-oracle-5.15.0-1013.17

Can you test a guest running 5.15 to see if this addresses the problem?
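
To check quickly whether the kernel a guest is running is among the tags
listed above, something like the following Python sketch could be used. It
only compares the ABI part of `uname -r` against pasted `git tag
--contains` output; the function names are made up for illustration, and
the flavour (oracle, generic, ...) is deliberately not compared.

```python
import re

# Matches the ABI part of an Ubuntu kernel tag, e.g. "5.15.0-1013" in
# "Ubuntu-oracle-5.15.0-1013.17". Helper names here are illustrative.
ABI_RE = re.compile(r"(\d+\.\d+\.\d+-\d+)\.\d+$")


def abis_from_tags(tag_output):
    """Collect kernel ABIs from `git tag --contains <commit>` output."""
    abis = set()
    for line in tag_output.splitlines():
        match = ABI_RE.search(line.strip())
        if match:
            abis.add(match.group(1))
    return abis


def release_in_tags(uname_r, tag_output):
    """True if a `uname -r` string such as "5.15.0-1013-oracle" is listed."""
    abi = uname_r.rsplit("-", 1)[0]   # drop the flavour suffix
    return abi in abis_from_tags(tag_output)


# Two of the tags from the list above, pasted as sample input.
TAGS = """\
Ubuntu-oracle-5.15.0-1012.16
Ubuntu-oracle-5.15.0-1013.17
"""

print(release_in_tags("5.15.0-1013-oracle", TAGS))
print(release_in_tags("5.11.0-1028-oracle", TAGS))
```

A release missing from the list is not proof that the fix is absent: tags
created after the local checkout was last fetched will not show up.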

Regards,
Fabio Martins

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-oracle-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1980884

Title:
  ubuntu guest kernel panics when a sev guest with passthrough mlx5 VF
  is used

Status in linux-oracle-5.11 package in Ubuntu:
  New


[Kernel-packages] [Bug 1980884] Re: ubuntu guest kernel panics when a sev guest with passthrough mlx5 VF is used

2022-07-21 Thread Fabio Augusto Miranda Martins

That is also available in the 5.4 kernel, so that also covers Bionic
(18.04) guests if needed:

$ git log --oneline | grep -i "Fix page DMA map/unmap attributes"
53176ef0d809 net/mlx5e: Fix page DMA map/unmap attributes

$ git tag --contains 53176ef0d809
Ubuntu-oracle-5.4.0-1071.77
Ubuntu-oracle-5.4.0-1072.78
Ubuntu-oracle-5.4.0-1073.79
Ubuntu-oracle-5.4.0-1074.80
Ubuntu-oracle-5.4.0-1076.83
Ubuntu-oracle-5.4.0-1078.86
Ubuntu-oracle-5.4.0-1079.87
Ubuntu-oracle-5.4.0-1080.88

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-oracle-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1980884

Title:
  ubuntu guest kernel panics when a sev guest with passthrough mlx5 VF
  is used

Status in linux-oracle-5.11 package in Ubuntu:
  New

Bug description:
  A guest kernel panic can be observed when an Ubuntu SEV guest with an mlx5
  vfio-pci device is started as an iperf3 server using "iperf3 -s", as soon
  as the client tries to connect to it.

  Steps to reproduce:

  HOST INFO
  Host type      : OCI (Oracle Cloud) Bare-Metal Server
  Server/Machine : ORACLE SERVER E4-2c
  CPU model      : AMD EPYC 7J13 64-Core Processor
  Architecture   : x86_64
  Host OS        : Oracle Linux Server release 7.9
  Host Kernel    : 5.4.17-2136.309.3.el7uek.x86_64 #2 SMP Tue Jun 14 21:58:29 PDT 2022
  Hypervisor     : QEMU emulator version 4.2.1 (qemu-4.2.1-17.1.el7)
  OVMF/AAVMF     : OVMF-1.6.2-2.el7.noarch
  libiscsi       : libiscsi-1.19.0-1.el7.x86_64
  Guest Kernel   : 5.11.0-1028-ORACLE

  1) Start Ubuntu 20.04/18.04 SEV guest with vfio-pci:

  /usr/bin/qemu-system-x86_64 -machine q35 -name OL20.04-uefi -enable-kvm \
  -nodefaults -cpu host,+host-phys-bits -m 8G -smp 8,maxcpus=240 -monitor stdio \
  -vnc 0.0.0.0:0,to=999 -vga std \
  -drive file=/usr/share/OVMF/OVMF_CODE.pure-efi.fd,index=0,if=pflash,format=raw,readonly \
  -drive file=OVMF_VARS.pure-efi.fd.ol20.04,index=1,if=pflash,format=raw \
  -device virtio-scsi-pci,id=virtio-scsi-pci0,disable-legacy=on,iommu_platform=true \
  -drive file=/systest/atanveer/scripts/Ubuntu-20.04-2022.02.15-0-uefi-x86_64.qcow2,if=none,id=local_disk0,format=qcow2,media=disk \
  -device ide-hd,drive=local_disk0,id=local_disk1,bootindex=0 -net none \
  -device vfio-pci,host=:21:10.1 -qmp tcp:127.0.0.1:3334,server,nowait \
  -serial telnet:127.0.0.1:,server,nowait -D ./OL20.04-uefi.log \
  -device virtio-rng-pci,disable-legacy=on,iommu_platform=true \
  -object sev-guest,id=sev0,cbitpos=51,reduced-phys-bits=1 \
  -machine memory-encryption=sev0

  2) Start a client guest OL/Ubuntu:

  /usr/bin/qemu-system-x86_64 -machine q35 -name OL18.04-uefi -enable-kvm \
  -nodefaults -cpu host,+host-phys-bits -m 8G -smp 8,maxcpus=240 -monitor stdio \
  -vnc 0.0.0.0:0,to=999 -vga std \
  -drive file=/usr/share/OVMF/OVMF_CODE.pure-efi.fd,index=0,if=pflash,format=raw,readonly \
  -drive file=OVMF_VARS.pure-efi.fd.ol18.04,index=1,if=pflash,format=raw \
  -device virtio-scsi-pci,id=virtio-scsi-pci0,disable-legacy=on,iommu_platform=true \
  -drive file=/systest/atanveer/scripts/Ubuntu-18.04-2022.02.13-0-uefi-x86_64.qcow2,if=none,id=local_disk0,format=qcow2,media=disk \
  -device ide-hd,drive=local_disk0,id=local_disk1,bootindex=0 -net none \
  -device vfio-pci,host=:21:10.2 -qmp tcp:127.0.0.1:,server,nowait \
  -serial telnet:127.0.0.1:,server,nowait -D ./OL18.04-uefi.log \
  -device virtio-rng-pci,disable-legacy=on,iommu_platform=true \
  -object sev-guest,id=sev0,cbitpos=51,reduced-phys-bits=1 \
  -machine memory-encryption=sev0

  3) Flush iptables on both the VMs using "iptables -F"

  4) Start the iperf3 server on the first VM using "iperf3 -s"

  5) Start the iperf3 client on the second VM using "iperf3 -c  -4
  -f M -i 0 -t 70 -O 10 -P 64"

  The kernel panic is seen on the first VM i.e. Ubuntu 20.04 with iperf3 also
  showing "Bad Address" error.

  Console logs:

  root@ubuntu-20-04:~# iperf3 -s
  ---
  Server listening on 5201
  ---
  Accepted connection from 10.196.246.104, port 33732
  [  5] local 10.196.247.88 port 5201 connected to 10.196.246.104 port 33734
  [  8] local 10.196.247.88 port 5201 connected to 10.196.246.104 port 33736
  [ 10] local 10.196.247.88 port 5201 connected to 10.196.246.104 port 33738
  iperf3: error - unable to read from stream socket: Bad address
  ---
  Server listening on 5201
  ---
  [   91.083856] general protection fault:  [#1] SMP NOPTI
  [   91.084591] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.11.0-1028-oracle #31~20.04.1-Ubuntu
  [   91.085393] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.6.2 06/01/2022
  [   91.086205] RIP: 0010:memcpy_erms+0x6/0x10
  [   91.086640] Code: cc cc cc cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1

[Kernel-packages] [Bug 1977919] Re: Docker container creation causes kernel oops on linux-aws 5.13.0.1028.31~20.04.22

2022-06-08 Thread Fabio Augusto Miranda Martins

Just tested this 5.13.0-1029.32~lp1977919.1 kernel and confirmed that it
fixes the issue (doesn't crash when running the same docker container
that would crash in the -1028 kernel)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/1977919

Title:
  Docker container creation causes kernel oops on linux-aws
  5.13.0.1028.31~20.04.22

Status in linux-aws package in Ubuntu:
  Confirmed
Status in linux-gcp package in Ubuntu:
  Confirmed

Bug description:
  Running the attached script on the latest AWS AMI for Ubuntu 20.04, I
  get a kernel panic and hard reset of the node.

  [   12.314552] VFS: Close: file count is 0
  [   12.351090] ------------[ cut here ]------------
  [   12.351093] kernel BUG at include/linux/fs.h:3104!
  [   12.355272] invalid opcode:  [#1] SMP PTI
  [   12.358963] CPU: 1 PID: 863 Comm: sed Not tainted 5.13.0-1028-aws #31~20.04.1-Ubuntu
  [   12.366241] Hardware name: Amazon EC2 m5.large/, BIOS 1.0 10/16/2017
  [   12.371130] RIP: 0010:__fput+0x247/0x250
  [   12.374897] Code: 00 48 85 ff 0f 84 8b fe ff ff f6 c7 40 0f 85 82 fe ff ff e8 ab 38 00 00 e9 78 fe ff ff 4c 89 f7 e8 2e 88 02 00 e9 b5 fe ff ff <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 31 db 48
  [   12.389075] RSP: 0018:b50280d9fd88 EFLAGS: 00010246
  [   12.393425] RAX:  RBX: 000a801d RCX: 9152e0716000
  [   12.398679] RDX: 9152cf075280 RSI: 0001 RDI:
  [   12.403879] RBP: b50280d9fdb0 R08: 0001 R09: 9152dfcba2c8
  [   12.409102] R10: b50280d9fd88 R11: 9152d04e9d10 R12: 9152d04e9d00
  [   12.414333] R13: 9152dfcba2c8 R14: 9152cf0752a0 R15: 9152dfc2e180
  [   12.419533] FS:  () GS:9153ea90() knlGS:
  [   12.426937] CS:  0010 DS:  ES:  CR0: 80050033
  [   12.431506] CR2: 556cf30250a8 CR3: bce10006 CR4: 007706e0
  [   12.436716] DR0:  DR1:  DR2:
  [   12.441941] DR3:  DR6: fffe0ff0 DR7: 0400
  [   12.450355] Call Trace:
  [   12.453408]  
  [   12.456296]  fput+0xe/0x10
  [   12.459633]  task_work_run+0x70/0xb0
  [   12.463157]  do_exit+0x37b/0xaf0
  [   12.466570]  do_group_exit+0x43/0xb0
  [   12.470142]  __x64_sys_exit_group+0x18/0x20
  [   12.473989]  do_syscall_64+0x61/0xb0
  [   12.477565]  ? exit_to_user_mode_prepare+0x9b/0x1c0
  [   12.481734]  ? do_user_addr_fault+0x1d0/0x650
  [   12.485665]  ? irqentry_exit_to_user_mode+0x9/0x20
  [   12.489790]  ? irqentry_exit+0x19/0x30
  [   12.493443]  ? exc_page_fault+0x8f/0x170
  [   12.497199]  ? asm_exc_page_fault+0x8/0x30
  [   12.501013]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [   12.505289] RIP: 0033:0x7f80d42a1bd6
  [   12.508868] Code: Unable to access opcode bytes at RIP 0x7f80d42a1bac.
  [   12.513783] RSP: 002b:7ffe924f9ed8 EFLAGS: 0246 ORIG_RAX: 00e7
  [   12.520897] RAX: ffda RBX: 7f80d45a4740 RCX: 7f80d42a1bd6
  [   12.526115] RDX:  RSI: 003c RDI:
  [   12.531328] RBP:  R08: 00e7 R09: fe98
  [   12.536484] R10: 7f80d3d422a0 R11: 0246 R12: 7f80d45a4740
  [   12.541687] R13: 0002 R14: 7f80d45ad708 R15:
  [   12.546916]  
  [   12.549829] Modules linked in: xt_conntrack xt_MASQUERADE 
nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter 
iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c 
bpfilter br_netfilter bridge stp llc aufs overlay nls_iso8859_1 dm_multipath 
scsi_dh_rdac scsi_dh_emc scsi_dh_alua crct10dif_pclmul ppdev crc32_pclmul 
ghash_clmulni_intel aesni_intel crypto_simd psmouse cryptd parport_pc 
input_leds parport ena serio_raw sch_fq_codel ipmi_devintf ipmi_msghandler msr 
drm ip_tables x_tables autofs4
  [   12.583913] ---[ end trace 77367fed4d782aa4 ]---
  [   12.587963] RIP: 0010:__fput+0x247/0x250
  [   12.591729] Code: 00 48 85 ff 0f 84 8b fe ff ff f6 c7 40 0f 85 82 fe ff ff e8 ab 38 00 00 e9 78 fe ff ff 4c 89 f7 e8 2e 88 02 00 e9 b5 fe ff ff <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 31 db 48
  [   12.605796] RSP: 0018:b50280d9fd88 EFLAGS: 00010246
  [   12.610166] RAX:  RBX: 000a801d RCX: 9152e0716000
  [   12.615417] RDX: 9152cf075280 RSI: 0001 RDI:
  [   12.620635] RBP: b50280d9fdb0 R08: 0001 R09: 9152dfcba2c8
  [   12.625878] R10: b50280d9fd88 R11: 9152d04e9d10 R12: 9152d04e9d00
  [   12.631121] R13: 9152dfcba2c8 R14: 9152cf0752a0 R15: 9152dfc2e180
  [   12.636358] FS:  () GS:9153ea90() knlGS:

[Kernel-packages] [Bug 1944574] Re: EFI stub: ERROR: FIRMWARE BUG: kernel image not aligned on 64k boundary

2022-03-31 Thread Fabio Augusto Miranda Martins

I believe the "EFI stub: ERROR: FIRMWARE BUG: kernel image not aligned
on 64k boundary" that was reported on this bug is not what is preventing
the VM from booting. If you look into the full log you provided, that
message is logged both on the boot that succeeded and in the one that
got stuck.

I believe whatever was done in the VM in between these reboots might
have caused the problem you are observing.

There's also a known problem in the linux-oracle kernel that prevents you
from seeing the serial console. That is being fixed for Focal images in
the linux-oracle 5.13.0.1023.28~20.04.1 kernel, which is currently in
focal-proposed. If this is something you can reproduce with Focal
images, using the kernel from proposed might help you see what is really
preventing the VM from starting.

We'll review the 'kernel image not aligned on 64k boundary' error anyway
to assess what might be causing it.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-oracle in Ubuntu.
https://bugs.launchpad.net/bugs/1944574

Title:
  EFI stub: ERROR: FIRMWARE BUG: kernel image not aligned on 64k
  boundary

Status in linux-oracle package in Ubuntu:
  Confirmed

Bug description:
  While reviewing some dkms failures on arm64 impish/linux-oracle, I
  noticed this:

  ...
  EFI stub: Booting Linux Kernel...
  EFI stub: EFI_RNG_PROTOCOL unavailable
  EFI stub: ERROR: FIRMWARE BUG: kernel image not aligned on 64k boundary
  EFI stub: ERROR: FIRMWARE BUG: Image BSS overlaps adjacent EFI memory region
  EFI stub: Using DTB from configuration table
  EFI stub: Exiting boot services and installing virtual address map...
  ...

  and the VM doesn't come back.

  Full log here:
  https://autopkgtest.ubuntu.com/results/autopkgtest-impish/impish/arm64/b/backport-iwlwifi-dkms/20210921_165307_4f8e0@/log.gz

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-oracle/+bug/1944574/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

[Kernel-packages] [Bug 1921104] Re: net/mlx5e: Add missing capability check for uplink follow

2021-04-21 Thread Fabio Augusto Miranda Martins

Another customer has provided positive feedback that it fixes the issue
on Focal:

5.4.0-73-generic #82-Ubuntu SMP Wed Apr 14 17:39:42 UTC 2021

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1921104

Title:
  net/mlx5e: Add missing capability check for uplink follow

Status in Ubuntu on IBM z Systems:
  Fix Committed
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Focal:
  Fix Committed
Status in linux source package in Groovy:
  Fix Committed
Status in linux source package in Hirsute:
  Fix Released

Bug description:
  SRU Justification:
  ==

  [Impact]

  * Since older firmware may not support the uplink state setting, this
  can lead to problems.

  * Now expose firmware indication that it supports setting eswitch
  uplink state to follow the physical link.

  * If a kernel without the backport is used on an adapter which does
  not have the latest adapter firmware, the adapter silently drops
  outgoing traffic.

  * This is a regression which was introduced with kernel 5.4.0-48.

  [Fix]

  * upstream fix (as in 5.11):
    9c9be85f6b59d80efe4705109c0396df18d4e11d 9c9be85f6b59 "net/mlx5e: Add missing capability check for uplink follow"

  * backport for focal:
    https://launchpadlibrarian.net/529543695/0001-Backport-net-mlx5e-Add-missing-capability-check-for-.patch

  * backport for groovy:
    https://launchpadlibrarian.net/529775887/0001-Backport-groovy-net-mlx5e-Add-missing-capability-che.patch

  [Test Case]

  * Two IBM Z or LinuxONE systems, installed with Ubuntu Server 20.04 or
  20.10 on LPAR, are needed.

  * Each with RoCE Express 2.x adapters (Mellanox ConnectX4/5) attached
  and firmware 16.29.1006 or earlier.

  * Assign an IP address to the adapters on both systems and try to ping
  one node from the other.

  * The ping will just fail with the stock Ubuntu kernels (not having
  the patch), but will succeed with kernels that incl. the patches (like
  the test builds from the PPA mentioned below).

  * Due to the lack of hardware this needs to be verified by IBM.

  [Regression Potential]

  * Undesired / erroneous behavior in case the modified if condition is
  assembled in a wrong way.

  * Wrong behavior, again, in case the modification of the capability bits
  in mlx5_ifc_cmd_hca_cap_bits is wrong.

  * All modifications are limited to the mlx5 driver only.

  * The changes are relatively limited, with effectively two lines
  removed and four added (three of them adjustments of the capability bits
  only).

  * The modifications were done and tested by IBM and reviewed by
  Mellanox (see LP comments), based on a PPA test build.

  [Other]

  * The above patch/commit was accepted upstream in kernel 5.11.

  * Hence the patch is not needed for hirsute, just needs to be SRUed
  for groovy and focal.

  * The commit couldn't be cleanly cherry-picked, mainly due to changed
  context, hence the backport(s).

  __

  Expose firmware indication that it supports setting eswitch uplink state
  to follow (follow the physical link). Condition setting the eswitch
  uplink admin-state with this capability bit. Older FW may not support
  the uplink state setting.

  Available fix with kernel 5.11.
  
https://github.com/torvalds/linux/commit/9c9be85f6b59d80efe4705109c0396df18d4e11d

  Now required for Ubuntu 20.04 via backport patch.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1921104/+subscriptions


[Kernel-packages] [Bug 1852077] Re: Backport: bonding: fix state transition issue in link monitoring

2020-01-10 Thread Fabio Augusto Miranda Martins

Hi Po,

IIUC this bug is related to commit
627450ac21f7f4a44b949c5d5e2c35829ff1784f, which is in 4.15.0-74,  which
I see now in -updates / -security. Isn't it completed yet?

$ git tag --contains 627450ac21f7f4a44b949c5d5e2c35829ff1784f
Ubuntu-4.15.0-73.82
Ubuntu-4.15.0-74.84
Ubuntu-raspi2-4.15.0-1053.57
Ubuntu-snapdragon-4.15.0-1070.77


$ rmadison linux-image-4.15.0-74-generic
 linux-image-4.15.0-74-generic | 4.15.0-74.83~16.04.1 | xenial-security | amd64, arm64, armhf, i386, ppc64el, s390x
 linux-image-4.15.0-74-generic | 4.15.0-74.83~16.04.1 | xenial-updates  | amd64, arm64, armhf, i386, ppc64el, s390x
 linux-image-4.15.0-74-generic | 4.15.0-74.84         | bionic-security | amd64, arm64, armhf, i386, ppc64el, s390x
 linux-image-4.15.0-74-generic | 4.15.0-74.84         | bionic-updates  | amd64, arm64, armhf, i386, ppc64el, s390x

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1852077

Title:
  Backport: bonding: fix state transition issue in link monitoring

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Committed
Status in linux source package in Focal:
  In Progress

Bug description:
  == Justification ==
  From the well explained commit message:

  Since de77ecd4ef02 ("bonding: improve link-status update in
  mii-monitoring"), the bonding driver has utilized two separate variables
  to indicate the next link state a particular slave should transition to.
  Each is used to communicate to a different portion of the link state
  change commit logic; one to the bond_miimon_commit function itself, and
  another to the state transition logic.

   Unfortunately, the two variables can become unsynchronized,
  resulting in incorrect link state transitions within bonding.  This can
  cause slaves to become stuck in an incorrect link state until a
  subsequent carrier state transition.

   The issue occurs when a special case in bond_slave_netdev_event
  sets slave->link directly to BOND_LINK_FAIL.  On the next pass through
  bond_miimon_inspect after the slave goes carrier up, the BOND_LINK_FAIL
  case will set the proposed next state (link_new_state) to BOND_LINK_UP,
  but the new_link to BOND_LINK_DOWN.  The setting of the final link state
  from new_link comes after that from link_new_state, and so the slave
  will end up incorrectly in _DOWN state.

   Resolve this by combining the two variables into one.

  == Fixes ==
  * 1899bb32 (bonding: fix state transition issue in link monitoring)

  This patch can be cherry-picked into E/F

  For older releases like B/D, it will need to be backported, as they are
  missing the slave_err() printk macro added in 5237ff79 (bonding: add
  slave_foo printk macros) as well as the commit that replaces netdev_err()
  with slave_err(), e2a7420d (bonding/main: convert to using slave
  printk macros).

  For Xenial, the commit that causes this issue, de77ecd4, does not
  exist.

  == Test ==
  Test kernels can be found here:
  https://people.canonical.com/~phlin/kernel/lp-1852077-bonding/

  The X-hwe and Disco kernels were tested by the bug reporter, Aleksei;
  the patched kernels work as expected.

  == Regression Potential ==
  Low.
  This patch just unifies the variables used in the link state change commit
  logic to prevent the occurrence of an incorrect state, and the changes
  are limited to the bonding driver itself.

  (Although include/net/bonding.h is used by other drivers, the changes
  to that file only affect the bond_main.c driver.)

  == Original Bug Report ==
  There's an issue with the bonding driver in the current Ubuntu kernels.
  Sometimes one link gets stuck in a weird state.
  It was fixed upstream with the patch
  https://www.spinics.net/lists/netdev/msg609506.html
  (commit 1899bb325149e481de31a4f32b59ea6f24e176ea).

  We see this bug with linux 4.15 (ubuntu xenial, hwe kernel), but it
  should be reproducible with other current kernel versions.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852077/+subscriptions


[Kernel-packages] [Bug 1793430] Re: Page leaking in cachefiles_read_backing_file while vmscan is active

2018-10-10 Thread Fabio Augusto Miranda Martins

** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1793430

Title:
  Page leaking in cachefiles_read_backing_file while vmscan is active

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Committed

Bug description:
  SRU Justification
  -

  [Description]
  In a heavily loaded system where the system pagecache is nearing memory
  limits and fscache is enabled, pages can be leaked by fscache while trying
  to read pages from the cachefiles backend. This can happen when two
  applications read the same page from a single mount, so that two threads
  try to read the backing page at the same time. One of the threads then
  finds that a page for the backing file or netfs file is already in the
  radix tree. During error handling, cachefiles does not clean up the
  reference on the backing page, leading to a page leak.
  
  [Fix]
  The fix is straightforward: decrement the reference when an error is
  encountered.
  
  [Testing]
  A user tested the fix for 12+ hours using the following method:

  1) mkdir -p /mnt/nfs ; mount -o vers=3,fsc :/export /mnt/nfs
  2) Create 1 files of 2.8MB in an NFS mount.
  3) Start a thread to simulate heavy VM pressure:
     (while true ; do echo 3 > /proc/sys/vm/drop_caches ; sleep 1 ; done)&
  4) Start multiple parallel readers for the data set at the same time:
     find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
     find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
     find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
     ..
     ..
     find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
     find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
  5) Finally, check with cat /proc/fs/fscache/stats | grep -i pages,
     free -h, cat /proc/meminfo and page-types -r -b lru
     to ensure all pages are freed.
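
  As a rough aid for the final check in step 5, the sketch below pulls a
  single field out of /proc/meminfo-style text, so that a value captured
  before the readers start can be compared with one captured after they
  finish. The function name meminfo_kb and the use of the Cached field are
  illustrative assumptions; the report does not say which field best
  reflects the leak.

```shell
# Hypothetical helper (not from the bug report): print the numeric value
# of one field from /proc/meminfo-formatted text supplied on stdin.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# Example use on a live system (values are in kB):
#   before=$(meminfo_kb Cached < /proc/meminfo)
#   ... run the reader workload, then drop caches ...
#   after=$(meminfo_kb Cached < /proc/meminfo)
#   echo "Cached changed by $((after - before)) kB"
```

  Reading from stdin keeps the helper usable on a saved copy of
  /proc/meminfo as well as on the live file.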

  [Regression Potential]
  Limited to cachefiles.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1793430/+subscriptions


[Kernel-packages] [Bug 1734327] Re: Kernel panic on a nfsroot system

2018-04-11 Thread Fabio Augusto Miranda Martins

** Changed in: linux (Ubuntu Artful)
   Status: Fix Committed => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1734327

Title:
  Kernel panic on a nfsroot system

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Artful:
  Confirmed

Bug description:
  == SRU Justification ==
  The following commit introduced a regression identified in bug 1734327:
  ac8f82a0b6d9 ("UBUNTU: SAUCE: LSM stacking: LSM: Infrastructure
  management of the remaining blobs")

  The regression causes a kernel panic to occur after multiple TCP connection 
  creations/closures to the localhost.  The bug was found using STAF RPC calls, 
  but is easily reproducible with SSH.

  A revert of commit ac8f82a0b6d9 is needed to resolve this bug.  However,
  commit 4ae2508f0bed also needs to be reverted because it depends on
  commit ac8f82a0b6d9.

  == Fix ==
  Revert 4ae2508f0bed ("UBUNTU: SAUCE: LSM stacking: add stacking support
  to apparmor network hooks")
  Revert ac8f82a0b6d9 ("UBUNTU: SAUCE: LSM stacking: LSM: Infrastructure
  management of the remaining blobs")

  == Test Case ==
  A test kernel was built with these two commits reverted and tested by the
  original bug reporter.
  The bug reporter states the test kernel resolved the bug.



  
  == Original Bug Description ==
  Summary:
  Kernel panic occurs after multiple TCP connection creations/closures to the
  localhost.
  The bug was found using STAF RPC calls, but is easily reproducible with SSH.
  The bug doesn't appear on an identical virtual machine booting from the disk.
  The bug is not reproducible on a similarly-prepared Ubuntu 16.04 machine.
  The bug is reproducible using an older 4.13.0-16-generic kernel.
  Reproducible on multiple hardware types.
  Unable to create a kernel memory dump due to makedumpfile errors.
  apport-bug save attached.

  NFSRoot boot options:
  vmlinuz initrd=initrd.img boot=nfs root=/dev/nfs 
nfsroot=190.0.0.254:/diskless/host/u1616/Ubuntu/17.10 intel_iommu=on 
net.ifnames=0 biosdevname=0 apparmor=0 ip=:eth0:dhcp 
blacklist=i40e,ixgbe,fm10k crashkernel=384M-:768M rw

  Software:
  OS: Ubuntu 17.10
  Kernel: 4.13.0-17-generic x86_64

  Reproduction steps:
  1. Boot a system from a nfsroot
  2. Configure password-less localhost ssh access
  3. Run a loop: `while true; do ssh localhost 'uname -a'; done`
  4. Wait for system to crash

  Trace:
  4,1151,52372730,-;general protection fault:  [#1] SMP
  4,1152,52372771,-;Modules linked in: arc4 md4 rpcsec_gss_krb5 nls_utf8 
auth_rpcgss cifs nfsv4 ccm ipmi_ssif intel_rapl sb_edac x86_pkg_temp_thermal 
intel_powerclamp coretemp intel_cstate mei_me input_leds joydev intel_rapl_perf 
mei kvm_intel lpc_ich ioatdma kvm irqbypass ipmi_si ipmi_devintf 
ipmi_msghandler shpchp acpi_pad acpi_power_meter mac_hid ib_iser rdma_cm iw_cm 
ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables 
x_tables autofs4 nfsv3 nfs_acl nfs lockd grace sunrpc fscache raid10 raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic 
usbhid hid raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ast ttm aesni_intel igb 
drm_kms_helper aes_x86_64 crypto_simd syscopyarea glue_helper
  4,1153,52373251,c; sysfillrect dca cryptd sysimgblt i2c_algo_bit fb_sys_fops 
ahci ptp drm libahci pps_core wmi
  4,1154,52373322,-;CPU: 11 PID: 1848 Comm: STAFProc Not tainted 4.13.0-17-generic #20-Ubuntu
  4,1155,52373371,-;Hardware name: Supermicro Super Server/X10SRD-F, BIOS 2.0 12/17/2015
  4,1156,52373418,-;task: 9d09267f5d00 task.stack: afddc3a7
  4,1157,52373461,-;RIP: 0010:kfree+0x53/0x160
  4,1158,52373486,-;RSP: 0018:9d092ecc3bc8 EFLAGS: 00010207
  4,1159,52373521,-;RAX:  RBX: 241c89490001 RCX: 0004
  4,1160,52373566,-;RDX: 32d49081cc08 RSI: 00010080 RDI: 62fac000
  4,1161,52373611,-;RBP: 9d092ecc3be0 R08: 0001f4c0 R09: 943bb839
  4,1162,52373656,-;R10: 00904c789100 R11:  R12: 9d09267ef000
  4,1163,52373701,-;R13: 93fa155e R14: 9d09267ef000 R15: 9d09267ef000
  4,1164,52373746,-;FS:  7f3a53313700() GS:9d092ecc() knlGS:
  4,1165,52373797,-;CS:  0010 DS:  ES:  CR0: 80050033
  4,1166,52373834,-;CR2: 7fd5c9ffa780 CR3: 0004666d7000 CR4: 003406e0
  4,1167,52373878,-;DR0:  DR1:  DR2:
  4,1168,52373923,-;DR3:  DR6: fffe0ff0 DR7: 0400
  4,1169,52373968,-;Call Trace:
  4,1170,52373987,-; 
  4,1171,52374009,-; security_sk_free+0x3e/0x50
  4,1172,52374042,-; __sk_destruct+0x108/0x190
  4,1173,52374070,-; sk_destruct+0x20/0x30
  4,1174,52374095,-; __sk_free+0x82/0xa0
  4,1175,52374120,-; sk_free
