I have implemented a similar workaround to @fatordee:

$ sudo smartctl -a /dev/nvme0
(...)
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.00W       -        -    0  0  0  0        0       0
 1 +     2.60W       -        -    1  1  1  1        0       0
 2 +     1.70W       -        -    2  2  2  2        0       0
 3 -   0.0250W       -        -    3  3  3  3     5000    9000
 4 -   0.0025W       -        -    4  4  4  4     5000   44000
(...)
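
For anyone who wants to derive the value from their own drive's table, here is a minimal sketch of the rule used for the kernel parameter further down: take Ex_Lat of the state right before the last one. It assumes the 11-column "Supported Power States" layout shown above, with Ent_Lat and Ex_Lat in microseconds. The function names are hypothetical helpers for illustration only; they are not part of smartctl or the kernel.

```python
# Power state table copied from the smartctl output in this comment.
SMARTCTL_TABLE = """\
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.00W       -        -    0  0  0  0        0       0
 1 +     2.60W       -        -    1  1  1  1        0       0
 2 +     1.70W       -        -    2  2  2  2        0       0
 3 -   0.0250W       -        -    3  3  3  3     5000    9000
 4 -   0.0025W       -        -    4  4  4  4     5000   44000
"""


def parse_power_states(text):
    """Return (state, operational, ent_lat_us, ex_lat_us) for each table row."""
    states = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has 11 columns: a numeric state index, then "+" or "-".
        if len(fields) == 11 and fields[0].isdigit() and fields[1] in ("+", "-"):
            states.append(
                (int(fields[0]), fields[1] == "+", int(fields[9]), int(fields[10]))
            )
    return states


def suggested_max_latency_us(states):
    """Ex_Lat of the state right before the last one (assumed rule from this comment)."""
    if len(states) < 2:
        raise ValueError("need at least two power states")
    ordered = sorted(states)  # tuples sort by state index first
    return ordered[-2][3]


if __name__ == "__main__":
    value = suggested_max_latency_us(parse_power_states(SMARTCTL_TABLE))
    print(f"nvme_core.default_ps_max_latency_us={value}")
```

With the table from this drive, the second-to-last state is state 3, so the sketch yields the same 9000 used in the GRUB line. Drives with a different table layout would need the column indices adjusted.
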

$ cat /etc/default/grub | grep latency
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=9000"

I used Ex_Lat from the state right before the last one, as per [1]. It's
a less aggressive workaround, as this one just disables the lowest power
state, instead of them all. Seems to be working pretty well.

[1] https://wiki.archlinux.org/title/Solid_state_drive/NVMe#Controller_failure_due_to_broken_APST_support

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1910866

Title:
  nvme drive fails after some time

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Groovy:
  Fix Released
Status in Debian:
  New

Bug description:
  Sorry for the vague title. I thought this was a hardware issue until
  someone else online mentioned their nvme drive goes "read only" after
  some time. I tend not to reboot my system much, so have a large
  journal. Either way this happens once in a while. The / drive is fine,
  but /home is on nvme which just disappears. I reboot and everything is
  fine. But leave it long enough and it'll fail again.

  Here's the most recent snippet about the nvme drive before I restarted
  the system.

  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 449 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 450 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 451 QID 5 timeout, aborting
  Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, reset controller
  Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 22 QID 0 timeout, reset controller
  Jan 08 19:21:04 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:25 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:25 robot kernel: nvme nvme1: Removing after probe failure status: -19
  Jan 08 19:21:41 robot kernel: INFO: task jbd2/nvme1n1p1-:731 blocked for more than 120 seconds.
  Jan 08 19:21:41 robot kernel: jbd2/nvme1n1p1- D 0 731 2 0x00004000
  Jan 08 19:21:45 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993784 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123967, lost async page write
  Jan 08 19:21:45 robot kernel: EXT4-fs error (device nvme1n1p1): __ext4_find_entry:1535: inode #57278595: comm gsd-print-notif: reading directory lblock 0
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993384 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123917, lost async page write
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993320 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1833166472 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123909, lost async page write
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1909398624 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 0, lost sync page write
  Jan 08 19:21:45 robot kernel: EXT4-fs (nvme1n1p1): I/O error while writing superblock

  ProblemType: Bug
  DistroRelease: Ubuntu 20.10
  Package: linux-image-5.8.0-34-generic 5.8.0-34.37
  ProcVersionSignature: Ubuntu 5.8.0-34.37-generic 5.8.18
  Uname: Linux 5.8.0-34-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu50.3
  Architecture: amd64
  CasperMD5CheckResult: skip
  CurrentDesktop: ubuntu:GNOME
  Date: Sat Jan 9 11:56:28 2021
  InstallationDate: Installed on 2020-08-15 (146 days ago)
  InstallationMedia: Ubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
  MachineType: Intel Corporation NUC8i7HVK
  ProcFB: 0 amdgpudrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.8.0-34-generic root=UUID=c212e9d4-a049-4da0-8e34-971cb7414e60 ro quiet splash vt.handoff=7
  RebootRequiredPkgs:
   linux-image-5.8.0-36-generic
   linux-base
  RelatedPackageVersions:
   linux-restricted-modules-5.8.0-34-generic N/A
   linux-backports-modules-5.8.0-34-generic  N/A
   linux-firmware                            1.190.2
  SourcePackage: linux
  UpgradeStatus: Upgraded to groovy on 2020-09-20 (110 days ago)
  dmi.bios.date: 12/17/2018
  dmi.bios.release: 5.6
  dmi.bios.vendor: Intel Corp.
  dmi.bios.version: HNKBLi70.86A.0053.2018.1217.1739
  dmi.board.name: NUC8i7HVB
  dmi.board.vendor: Intel Corporation
  dmi.board.version: J68196-502
  dmi.chassis.type: 3
  dmi.chassis.vendor: Intel Corporation
  dmi.chassis.version: 2.0
  dmi.modalias: dmi:bvnIntelCorp.:bvrHNKBLi70.86A.0053.2018.1217.1739:bd12/17/2018:br5.6:svnIntelCorporation:pnNUC8i7HVK:pvrJ71485-502:rvnIntelCorporation:rnNUC8i7HVB:rvrJ68196-502:cvnIntelCorporation:ct3:cvr2.0:
  dmi.product.family: Intel NUC
  dmi.product.name: NUC8i7HVK
  dmi.product.version: J71485-502
  dmi.sys.vendor: Intel Corporation

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1910866/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp