This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.
If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: verification-needed-focal

** Tags added: verification-needed-eoan

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1865988

Title:
  Performing function level reset of AMD onboard USB and audio devices causes system lockup

Status in linux package in Ubuntu:
  In Progress
Status in linux-oem-5.6 package in Ubuntu:
  Invalid
Status in linux-oem-osp1 package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  Fix Committed
Status in linux-oem-5.6 source package in Bionic:
  Invalid
Status in linux-oem-osp1 source package in Bionic:
  In Progress
Status in linux source package in Eoan:
  Fix Committed
Status in linux-oem-5.6 source package in Eoan:
  Invalid
Status in linux-oem-osp1 source package in Eoan:
  Invalid
Status in linux source package in Focal:
  Fix Committed
Status in linux-oem-5.6 source package in Focal:
  In Progress
Status in linux-oem-osp1 source package in Focal:
  Invalid
Status in linux source package in Groovy:
  In Progress
Status in linux-oem-5.6 source package in Groovy:
  Invalid
Status in linux-oem-osp1 source package in Groovy:
  Invalid

Bug description:
  [SRU Justification]

  [Impact]

  Devices affected:
  * [1022:148c] USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Starship USB 3.0 Host Controller
  * [1022:149c] USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
  * [1022:1487] Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller

  Despite advertising FLReset device capabilities, performing a function level reset of any of these devices causes the system to lock up.
  This is a particular problem where these devices appear in their own IOMMU groups and are well suited to VFIO passthrough.

  The issue was introduced in AMD's "AGESA Combo-AM4 1.0.0.4 Patch B" microcode update, and affects dozens of motherboard models across various vendors.

  Additional discussion of this issue:
  https://www.reddit.com/r/VFIO/comments/eba5mh/workaround_patch_for_passing_through_usb_and/

  [Fix]

  Two commits currently landed in linux-pci pci/virtualization:
  * 0d14f06cd665 PCI: Avoid FLR for AMD Matisse HD Audio & USB 3.0
  * 5727043c73fd PCI: Avoid FLR for AMD Starship USB 3.0

  [Test Case]

  Perform the test on an impacted system:
  * B350, B450, X370, X470, X570 motherboards (practically anything with an AM4 socket);
  * Ryzen 3000-series CPU (2000-series possibly also affected);
  * BIOS/UEFI firmware that includes "AGESA Combo-AM4 1.0.0.4 Patch B" (check vendor release notes)

  For example, on a system where '0000:10:00.3' is the USB controller '1022:149c', issue a reset command:

  $ echo 1 | sudo tee /sys/bus/pci/devices/0000\:10\:00.3/reset

  Impacted systems will not return successfully and become unstable, requiring a reboot.

  `/var/log/syslog` will show something resembling the following:

  xhci_hcd 0000:10:00.3: not ready 1023ms after FLR; waiting
  xhci_hcd 0000:10:00.3: not ready 2047ms after FLR; waiting
  xhci_hcd 0000:10:00.3: not ready 4095ms after FLR; waiting
  xhci_hcd 0000:10:00.3: not ready 8191ms after FLR; waiting
  xhci_hcd 0000:10:00.3: not ready 16383ms after FLR; waiting
  xhci_hcd 0000:10:00.3: not ready 32767ms after FLR; waiting
  xhci_hcd 0000:10:00.3: not ready 65535ms after FLR; giving up
  clocksource: timekeeping watchdog on CPU14: Marking clocksource 'tsc' as unstable because the skew is too large:
  clocksource: 'hpet' wd_now: f63fcfe wd_last: d468894 mask: ffffffff
  clocksource: 'tsc' cs_now: 60e67e17758 cs_last: 60d2a81ce24 mask: ffffffffffffffff
  tsc: Marking TSC unstable due to clocksource watchdog
  TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
  sched_clock: Marking unstable (1817664630139, 314261908)<-(1817981099530, -2209419)

  [Regression Risk]

  Low. These two patches affect only systems with a device that needs the fix.

  ========== Original Bug Description ==========

  $ lsb_release -rd
  Description: Ubuntu 19.10
  Release: 19.10

  [Impact]

  Devices affected:
  * [1022:149c] USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
  * [1022:1487] Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller

  Despite advertising FLReset device capabilities, performing a function level reset of either of these devices causes the system to lock up. This is a particular problem where these devices appear in their own IOMMU groups and are well suited to VFIO passthrough.

  The issue was introduced in AMD's "AGESA Combo-AM4 1.0.0.4 Patch B" microcode update, and affects dozens of motherboard models across various vendors.

  Additional discussion of this issue:
  https://www.reddit.com/r/VFIO/comments/eba5mh/workaround_patch_for_passing_through_usb_and/

  [Fix]

  Add a quirk to disable FLR on these devices. Sample patch attached.

  [Test Case]

  Perform the test on an impacted system:
  * B350, B450, X370, X470, X570 motherboards (practically anything with an AM4 socket);
  * Ryzen 3000-series CPU (2000-series possibly also affected);
  * BIOS/UEFI firmware that includes "AGESA Combo-AM4 1.0.0.4 Patch B" (check vendor release notes)

  For example, on a system where '0000:10:00.3' is the USB controller '1022:149c', issue a reset command:

  $ echo 1 | sudo tee /sys/bus/pci/devices/0000\:10\:00.3/reset

  Impacted systems will not return successfully and become unstable, requiring a reboot.
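The PCI address used in the reset command differs from system to system, and a failed reset leaves a recognisable signature in the kernel log. As a minimal sketch (not part of the original report), the two shell helpers below pick out devices by the IDs listed under "Devices affected" and extract the address from any "giving up" FLR message. The function names `find_affected` and `flr_gave_up` are hypothetical, and `find_affected` assumes `lspci -Dnn` output (from pciutils) on stdin.

```shell
#!/bin/sh
# Vendor:device IDs copied from the "Devices affected" list in this report.
AFFECTED_IDS='1022:148c 1022:149c 1022:1487'

# Read `lspci -Dnn` output on stdin and print the PCI address (first field)
# of every device whose [vendor:device] ID is in AFFECTED_IDS.
find_affected() {
    while IFS= read -r line; do
        for id in $AFFECTED_IDS; do
            case "$line" in
                *"[$id]"*) printf '%s\n' "${line%% *}" ;;
            esac
        done
    done
}

# Read kernel log text on stdin and print the PCI address from each
# "not ready ...ms after FLR; giving up" message. "waiting" lines are ignored.
flr_gave_up() {
    sed -n 's/.* \([0-9a-f:.]*\): not ready [0-9]*ms after FLR; giving up.*/\1/p'
}
```

For example, `lspci -Dnn | find_affected` prints candidate addresses for the reset command, and `flr_gave_up < /var/log/syslog` prints the address of any device whose reset timed out.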
  `/var/log/syslog` will show something resembling the following:

  Mar 4 14:51:26 bunty kernel: [ 1745.043914] xhci_hcd 0000:10:00.3: not ready 1023ms after FLR; waiting
  Mar 4 14:51:28 bunty kernel: [ 1747.091910] xhci_hcd 0000:10:00.3: not ready 2047ms after FLR; waiting
  Mar 4 14:51:32 bunty kernel: [ 1750.163972] xhci_hcd 0000:10:00.3: not ready 4095ms after FLR; waiting
  Mar 4 14:51:37 bunty kernel: [ 1755.283933] xhci_hcd 0000:10:00.3: not ready 8191ms after FLR; waiting
  Mar 4 14:51:46 bunty kernel: [ 1764.499943] xhci_hcd 0000:10:00.3: not ready 16383ms after FLR; waiting
  Mar 4 14:52:04 bunty kernel: [ 1782.164126] xhci_hcd 0000:10:00.3: not ready 32767ms after FLR; waiting
  Mar 4 14:52:39 bunty kernel: [ 1816.979432] xhci_hcd 0000:10:00.3: not ready 65535ms after FLR; giving up
  Mar 4 14:52:39 bunty kernel: [ 1817.978790] clocksource: timekeeping watchdog on CPU14: Marking clocksource 'tsc' as unstable because the skew is too large:
  Mar 4 14:52:39 bunty kernel: [ 1817.978806] clocksource: 'hpet' wd_now: f63fcfe wd_last: d468894 mask: ffffffff
  Mar 4 14:52:39 bunty kernel: [ 1817.978809] clocksource: 'tsc' cs_now: 60e67e17758 cs_last: 60d2a81ce24 mask: ffffffffffffffff
  Mar 4 14:52:39 bunty kernel: [ 1817.978818] tsc: Marking TSC unstable due to clocksource watchdog
  Mar 4 14:52:40 bunty kernel: [ 1817.978892] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
  Mar 4 14:52:40 bunty kernel: [ 1817.978894] sched_clock: Marking unstable (1817664630139, 314261908)<-(1817981099530, -2209419)

  [Regression Risk]

  Unknown
  ---
  ProblemType: Bug
  ApportVersion: 2.20.11-0ubuntu8.2
  Architecture: amd64
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D7p', '/dev/snd/pcmC0D3p', '/dev/snd/by-path', '/dev/snd/controlC1', '/dev/snd/hwC1D0', '/dev/snd/pcmC1D2c', '/dev/snd/pcmC1D1p', '/dev/snd/pcmC1D0c', '/dev/snd/pcmC1D0p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 19.10
  MachineType: To Be Filled By O.E.M. To Be Filled By O.E.M.
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair nvidia_modeset nvidia
  Package: linux (not installed)
  ProcFB: 0 EFI VGA
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.3.0-40+amdnoflr-generic root=UUID=f2f3748c-f017-47ae-aa38-943e5f5189e0 ro amd_iommu=on
  ProcVersionSignature: Ubuntu 5.3.0-40.32+amdnoflr-generic 5.3.18
  RelatedPackageVersions:
   linux-restricted-modules-5.3.0-40+amdnoflr-generic N/A
   linux-backports-modules-5.3.0-40+amdnoflr-generic N/A
   linux-firmware 1.183.3
  Tags: eoan
  Uname: Linux 5.3.0-40+amdnoflr-generic x86_64
  UnreportableReason: This report is about a package that is not installed.
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm cdrom dip libvirt lpadmin lxd plugdev sambashare sudo
  _MarkForUpload: False
  dmi.bios.date: 11/14/2019
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: L3.77
  dmi.board.name: X470 Taichi
  dmi.board.vendor: ASRock
  dmi.chassis.asset.tag: To Be Filled By O.E.M.
  dmi.chassis.type: 3
  dmi.chassis.vendor: To Be Filled By O.E.M.
  dmi.chassis.version: To Be Filled By O.E.M.
  dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrL3.77:bd11/14/2019:svnToBeFilledByO.E.M.:pnToBeFilledByO.E.M.:pvrToBeFilledByO.E.M.:rvnASRock:rnX470Taichi:rvr:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
  dmi.product.family: To Be Filled By O.E.M.
  dmi.product.name: To Be Filled By O.E.M.
  dmi.product.sku: To Be Filled By O.E.M.
  dmi.product.version: To Be Filled By O.E.M.
  dmi.sys.vendor: To Be Filled By O.E.M.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1865988/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp