FYI, I have captured the `sudo lspci -vv` output on kernel 5.8 *before* the issue occurs: https://pastebin.ubuntu.com/p/GtZyTWzKTd/ It is subtly different from the output on the 5.4 kernel (which has not had the issue), in case that matters.
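To see exactly how the 5.4 and 5.8 captures differ, a unified diff of the two saved outputs is enough. Plain `diff -u` does the same job. The following is a minimal Python sketch using the standard-library `difflib`. The file names `lspci-5.4.txt` and `lspci-5.8.txt` are hypothetical placeholders for wherever the two outputs were saved.

```python
import difflib
import sys
from typing import List


def diff_lspci(before: str, after: str,
               before_name: str = "lspci-5.4.txt",
               after_name: str = "lspci-5.8.txt") -> List[str]:
    """Return unified-diff lines between two captured `lspci -vv` outputs.

    The default file names are hypothetical labels used only in the diff header.
    """
    return list(difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=before_name,
        tofile=after_name,
        lineterm="",  # splitlines() already dropped the newlines
    ))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diff_lspci.py lspci-5.4.txt lspci-5.8.txt
    old_path, new_path = sys.argv[1], sys.argv[2]
    with open(old_path) as f_old, open(new_path) as f_new:
        for line in diff_lspci(f_old.read(), f_new.read(), old_path, new_path):
            print(line)
```

Only the changed lines (prefixed `-`/`+`) and a few lines of context are printed, which makes small differences in the 200-odd lines of `-vv` output easier to spot.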
I was also able to reproduce the issue again by causing high disk I/O. Specifically, writes needed to be occurring for it to happen (I was recursively grepping the whole filesystem while installing apt/pip packages inside a Docker container). This froze the system for 120 seconds until write timeouts occurred, and then the disk was remounted read-only. After this point, commands on the system failed with I/O errors (even basic ones such as "top", although some, such as "mount", still worked).

Our plan was to retrieve more information by copying the lspci binary and its libraries into a tmpfs filesystem in RAM, so that it would still be accessible after the disk stopped responding. This almost worked, but it appears a few more configuration files would need to be placed in RAM (I could run "lspci --help" but not "lspci" or "lspci -vv"). Instead, popey has suggested using a USB key with debootstrap/chroot. Any suggestions for how we can retrieve more information at this point are welcome, as are any commands that would be useful to run.

Also, as a note: if I use REISUB ( https://en.m.wikipedia.org/wiki/Magic_SysRq_key#Uses ) to reboot the machine, it enters a Dell BIOS/recovery screen that states "No Hard Disk is found". After a full power off, the machine works again.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1910866

Title:
  nvme drive fails after some time

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Sorry for the vague title. I thought this was a hardware issue until someone else online mentioned that their nvme drive goes "read only" after some time. I tend not to reboot my system much, so I have a large journal. Either way, this happens once in a while. The / drive is fine, but /home is on nvme, which just disappears. I reboot and everything is fine, but if I leave it long enough it fails again.
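The tmpfs idea from the comment above can be sketched in Python. The sketch asks `ldd` which shared libraries a binary needs and copies the binary plus those libraries under a RAM-backed directory, keeping their paths. It is illustrative only. The helpers `parse_ldd` and `copy_with_libs` and the mount point `/mnt/ram` are hypothetical names, not part of this report. `ldd` lists only shared libraries, so any data or configuration files the program opens at run time are not copied. That may be why `lspci --help` worked from RAM while `lspci` did not.

```python
import os
import re
import shutil
import subprocess
from typing import List

# ldd prints library lines in two shapes that carry a real path:
#   libpci.so.3 => /lib/x86_64-linux-gnu/libpci.so.3 (0x00007f...)
#   /lib64/ld-linux-x86-64.so.2 (0x00007f...)
# Virtual entries (linux-vdso.so.1) and "not found" lines have no path and are skipped.
_LDD_PATH = re.compile(r"(?:=>\s*)?(/\S+)\s+\(0x[0-9a-fA-F]+\)")


def parse_ldd(output: str) -> List[str]:
    """Return the absolute library paths listed in `ldd` output (hypothetical helper)."""
    paths = []
    for line in output.splitlines():
        match = _LDD_PATH.search(line)
        if match:
            paths.append(match.group(1))
    return paths


def copy_with_libs(binary: str, dest_root: str) -> List[str]:
    """Copy an absolute-path `binary` and its shared libraries under dest_root.

    Directory structure is preserved (/usr/bin/lspci -> dest_root/usr/bin/lspci).
    Only shared libraries are found this way; data/config files the program
    opens at run time are NOT included.
    """
    result = subprocess.run(["ldd", binary], capture_output=True, text=True, check=True)
    copied = []
    for src in [binary] + parse_ldd(result.stdout):
        dst = os.path.join(dest_root, src.lstrip("/"))
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)  # follows symlinks, so the real file content is copied
        copied.append(dst)
    return copied
```

As root, and before triggering the failure, this could be run as `copy_with_libs(shutil.which("lspci"), "/mnt/ram")` after `mount -t tmpfs tmpfs /mnt/ram`. Files the tool reads at run time would still need to be identified and copied by hand, or avoided altogether with the debootstrap/chroot USB key approach.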
  Here's the most recent snippet about the nvme drive before I restarted the system.

  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 449 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 450 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 451 QID 5 timeout, aborting
  Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, reset controller
  Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 22 QID 0 timeout, reset controller
  Jan 08 19:21:04 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:25 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:25 robot kernel: nvme nvme1: Removing after probe failure status: -19
  Jan 08 19:21:41 robot kernel: INFO: task jbd2/nvme1n1p1-:731 blocked for more than 120 seconds.
  Jan 08 19:21:41 robot kernel: jbd2/nvme1n1p1- D 0 731 2 0x00004000
  Jan 08 19:21:45 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993784 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123967, lost async page write
  Jan 08 19:21:45 robot kernel: EXT4-fs error (device nvme1n1p1): __ext4_find_entry:1535: inode #57278595: comm gsd-print-notif: reading directory lblock 0
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993384 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123917, lost async page write
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993320 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1833166472 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123909, lost async page write
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1909398624 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 0, lost sync page write
  Jan 08 19:21:45 robot kernel: EXT4-fs (nvme1n1p1): I/O error while writing superblock

  ProblemType: Bug
  DistroRelease: Ubuntu 20.10
  Package: linux-image-5.8.0-34-generic 5.8.0-34.37
  ProcVersionSignature: Ubuntu 5.8.0-34.37-generic 5.8.18
  Uname: Linux 5.8.0-34-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu50.3
  Architecture: amd64
  CasperMD5CheckResult: skip
  CurrentDesktop: ubuntu:GNOME
  Date: Sat Jan 9 11:56:28 2021
  InstallationDate: Installed on 2020-08-15 (146 days ago)
  InstallationMedia: Ubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
  MachineType: Intel Corporation NUC8i7HVK
  ProcFB: 0 amdgpudrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.8.0-34-generic root=UUID=c212e9d4-a049-4da0-8e34-971cb7414e60 ro quiet splash vt.handoff=7
  RebootRequiredPkgs: linux-image-5.8.0-36-generic linux-base
  RelatedPackageVersions:
   linux-restricted-modules-5.8.0-34-generic N/A
   linux-backports-modules-5.8.0-34-generic  N/A
   linux-firmware                            1.190.2
  SourcePackage: linux
  UpgradeStatus: Upgraded to groovy on 2020-09-20 (110 days ago)
  dmi.bios.date: 12/17/2018
  dmi.bios.release: 5.6
  dmi.bios.vendor: Intel Corp.
  dmi.bios.version: HNKBLi70.86A.0053.2018.1217.1739
  dmi.board.name: NUC8i7HVB
  dmi.board.vendor: Intel Corporation
  dmi.board.version: J68196-502
  dmi.chassis.type: 3
  dmi.chassis.vendor: Intel Corporation
  dmi.chassis.version: 2.0
  dmi.modalias: dmi:bvnIntelCorp.:bvrHNKBLi70.86A.0053.2018.1217.1739:bd12/17/2018:br5.6:svnIntelCorporation:pnNUC8i7HVK:pvrJ71485-502:rvnIntelCorporation:rnNUC8i7HVB:rvrJ68196-502:cvnIntelCorporation:ct3:cvr2.0:
  dmi.product.family: Intel NUC
  dmi.product.name: NUC8i7HVK
  dmi.product.version: J71485-502
  dmi.sys.vendor: Intel Corporation

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1910866/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp