https://bugzilla.kernel.org/show_bug.cgi?id=220749

--- Comment #25 from Mario Limonciello (AMD) ([email protected]) ---
> If serial console output would still be useful after this round of testing, I
> can pick up a PCIe serial card and try to capture logs that way. For now, the
> information below is from dmesg/journalctl on successful boots.

At this point we're still looking for a needle in a haystack and guessing. No
promises, of course, but a serial console might give us a better hint at where
things are going awry.

> the system also fails to reach userspace, but in this case it drops to the
> initramfs prompt with an error that it cannot find the root filesystem and
> offers a rescue shell. This behavior is consistent whenever `iommu=pt` is
> removed, and has occurred on each Debian kernel I’ve tested on this machine
> (e.g. 6.12.x, 6.16.8, and previously 6.18.x before I removed it).

OK, this is really helpful to know. It points at an IOMMU issue, and it would
be helpful to see the logs from that failing configuration. You should be able
to get a working network stack at the initramfs prompt and use something like
netconsole or ssh (you'll have to include these in your initramfs) to send the
full kernel log from this configuration to another machine.
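
For the receiving side of netconsole, a small UDP listener on the other machine
is enough. The sketch below is one way to do that in Python, assuming the
failing machine is configured to send to UDP port 6666 (netconsole's documented
default target port). The function names are made up for illustration, and a
plain netcat listener would work just as well.

```python
import socket
import sys

# Assumption: the sender uses netconsole's default target port.
NETCONSOLE_PORT = 6666


def open_listener(host="0.0.0.0", port=NETCONSOLE_PORT):
    """Bind a UDP socket that netconsole packets can be sent to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_lines(sock, max_packets=None):
    """Yield decoded packet payloads; stop after max_packets if given."""
    count = 0
    while max_packets is None or count < max_packets:
        data, _addr = sock.recvfrom(65535)
        count += 1
        yield data.decode("utf-8", errors="replace")


def main():
    """Print everything received until interrupted with Ctrl-C."""
    listener = open_listener()
    for chunk in receive_lines(listener):
        sys.stdout.write(chunk)
        sys.stdout.flush()

# Call main() on the receiving machine to start listening.
```

Start the listener before booting the failing configuration, and redirect its
output to a file so the full log can be attached to the bug.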

Instead of iommu=pt, can you please try amd_iommu=off, both with and without
pci=noacpi?

> I’ve double-checked the firmware setup utility on this HP OmniDesk M02-0310.
> There is no TSME / SME / “Memory Guard” / “Transparent Secure Memory
> Encryption” option exposed anywhere in the BIOS (I checked under Security,
> Advanced, and CPU-related menus). So there is nothing I can toggle for TSME
> on this system.

Your previous testing with memory encryption was pointless (where did this
come from?), so testing TSME would be pointless as well.

> Under the working combination (`pci=noacpi iommu=pt`), GPU acceleration
> appears to be working correctly:

OK, so at least this is a much more usable system.

> [    0.033598] AMD-Vi: ivrs, add hid:AMDI0020, uid:ID00, rdevid:0xa0
> [    0.033600] AMD-Vi: ivrs, add hid:AMDI0020, uid:ID01, rdevid:0xa0
> [    0.033603] AMD-Vi: ivrs, add hid:AMDI0020, uid:ID02, rdevid:0xa0
> [    0.033604] AMD-Vi: ivrs, add hid:AMDI0020, uid:ID03, rdevid:0xa0
> [    0.033605] AMD-Vi: Using global IVHD EFR:0x246577efa2254afa, EFR2:0x10
> [    0.033646] AMD-Vi: [Firmware Bug]: : No southbridge IOAPIC found
> [    0.033650] AMD-Vi: Disabling interrupt remapping
> [    0.033655] clocksource: tsc-early: mask: 0xfffffff

I noticed this in your logs from the boot with the IOMMU in passthrough mode.
The kernel itself is telling you that there is definitely a firmware bug in
the IVRS table entries.

https://github.com/torvalds/linux/blob/23cb64fb76257309e396ea4cec8396d4a1dbae68/drivers/iommu/amd/init.c#L3103

If you look at the commit history, you can see this message was added so the
kernel could turn off interrupt remapping and try to work around the BIOS bug.

https://github.com/torvalds/linux/commit/c2ff5cf5294bcbd7fa50f7d860e90a66db7e5059

This is probably enough to /start/ your conversation with HP support about
fixing the firmware as it's tangible and definitely not a Linux problem.
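
To collect that evidence for a support ticket, something like the following
could pull the firmware bug lines out of a saved kernel log. This is only a
rough sketch: the function name is hypothetical, the sample text is copied from
the log quoted above, and it assumes the usual "[ seconds] message" dmesg
layout.

```python
import re

# Optional "[    0.033646] " style timestamp, followed by the message text.
_LINE = re.compile(r"^\s*(?:\[\s*(?P<ts>\d+\.\d+)\]\s*)?(?P<msg>.*)$")

# Sample lines taken from the log quoted in this bug.
SAMPLE = """\
[    0.033605] AMD-Vi: Using global IVHD EFR:0x246577efa2254afa, EFR2:0x10
[    0.033646] AMD-Vi: [Firmware Bug]: : No southbridge IOAPIC found
[    0.033650] AMD-Vi: Disabling interrupt remapping
"""


def firmware_bug_lines(log_text):
    """Return (timestamp, message) pairs for lines tagged [Firmware Bug]."""
    hits = []
    for line in log_text.splitlines():
        if "[Firmware Bug]" not in line:
            continue
        match = _LINE.match(line)
        ts = float(match.group("ts")) if match.group("ts") else None
        hits.append((ts, match.group("msg").strip()))
    return hits


for ts, msg in firmware_bug_lines(SAMPLE):
    print(ts, msg)
```

Feeding it the full output of dmesg or journalctl -k instead of SAMPLE would
list every firmware bug the kernel reported during that boot.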

