I have implemented a workaround similar to @fatordee's:

$ sudo smartctl -a /dev/nvme0
(...)
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.00W       -        -    0  0  0  0        0       0
 1 +     2.60W       -        -    1  1  1  1        0       0
 2 +     1.70W       -        -    2  2  2  2        0       0
 3 -   0.0250W       -        -    3  3  3  3     5000    9000
 4 -   0.0025W       -        -    4  4  4  4     5000   44000
(...)

$ cat /etc/default/grub | grep latency
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=9000"
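
To confirm that the parameter is active after a reboot, something like
the following may help. It is only a sketch: it assumes Ubuntu's
update-grub has been run after editing /etc/default/grub, and the helper
name cmdline_ps_max_latency is made up for illustration.

```shell
# Hypothetical helper (the name is illustrative): read a kernel command line
# on stdin and print the value of nvme_core.default_ps_max_latency_us, if set.
cmdline_ps_max_latency() {
    tr ' ' '\n' | sed -n 's/^nvme_core\.default_ps_max_latency_us=//p'
}

# Assumed workflow on Ubuntu (run manually, needs root):
#   sudo update-grub && sudo reboot
# After the reboot, check what the running kernel was booted with:
#   cmdline_ps_max_latency < /proc/cmdline
# and what the nvme_core module currently reports:
#   cat /sys/module/nvme_core/parameters/default_ps_max_latency_us
```

If the helper prints nothing, the option did not reach the kernel command
line.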

I used the Ex_Lat of the state right before the last one (state 3, 9000
microseconds), as per [1].

This workaround is less aggressive: it disables only the lowest power
state instead of all of them.
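
The selection rule can be sketched as a small shell function. It only
illustrates the rule as described here (take Ex_Lat from the
second-to-last row of the power state table) and says nothing about how
the kernel applies the threshold. The function name pick_ps_max_latency
is hypothetical, and the parsing assumes the smartctl table layout shown
above.

```shell
# Hypothetical helper: read `smartctl -a` output on stdin and print the Ex_Lat
# of the second-to-last entry in the "Supported Power States" table.
pick_ps_max_latency() {
    awk '
        /^St[ \t]+Op/ { in_table = 1; next }        # header row of the table
        in_table && /^[ \t]*[0-9]+[ \t]+[+-]/ {     # a power-state row
            prev = last; last = $NF; rows++; next   # $NF is the Ex_Lat column
        }
        in_table { in_table = 0 }                   # first non-row ends the table
        END {
            if (rows < 2) exit 1                    # need at least two states
            print prev
        }
    '
}

# Usage (assumes smartmontools is installed and the drive is /dev/nvme0):
#   sudo smartctl -a /dev/nvme0 | pick_ps_max_latency
```

For the table above this prints 9000, the value used in the GRUB line.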

Seems to be working pretty well.

[1]
https://wiki.archlinux.org/title/Solid_state_drive/NVMe#Controller_failure_due_to_broken_APST_support

https://bugs.launchpad.net/bugs/1910866

Title:
  nvme drive fails after some time

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Groovy:
  Fix Released
Status in Debian:
  New

Bug description:
  Sorry for the vague title. I thought this was a hardware issue until
  someone else online mentioned that their NVMe drive goes "read only"
  after some time. I tend not to reboot my system much, so I have a large
  journal. Either way, this happens once in a while. The / drive is fine,
  but /home is on an NVMe drive, which just disappears. I reboot and
  everything is fine. But leave it long enough and it'll fail again.

  Here's the most recent snippet about the nvme drive before I restarted
  the system.

  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 449 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 450 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 451 QID 5 timeout, aborting
  Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, reset controller
  Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 22 QID 0 timeout, reset controller
  Jan 08 19:21:04 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:25 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:25 robot kernel: nvme nvme1: Removing after probe failure status: -19
  Jan 08 19:21:41 robot kernel: INFO: task jbd2/nvme1n1p1-:731 blocked for more than 120 seconds.
  Jan 08 19:21:41 robot kernel: jbd2/nvme1n1p1- D    0   731      2 0x00004000
  Jan 08 19:21:45 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993784 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123967, lost async page write
  Jan 08 19:21:45 robot kernel: EXT4-fs error (device nvme1n1p1): __ext4_find_entry:1535: inode #57278595: comm gsd-print-notif: reading directory lblock 0
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993384 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123917, lost async page write
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993320 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1833166472 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123909, lost async page write
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1909398624 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 0, lost sync page write
  Jan 08 19:21:45 robot kernel: EXT4-fs (nvme1n1p1): I/O error while writing superblock

  ProblemType: Bug
  DistroRelease: Ubuntu 20.10
  Package: linux-image-5.8.0-34-generic 5.8.0-34.37
  ProcVersionSignature: Ubuntu 5.8.0-34.37-generic 5.8.18
  Uname: Linux 5.8.0-34-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu50.3
  Architecture: amd64
  CasperMD5CheckResult: skip
  CurrentDesktop: ubuntu:GNOME
  Date: Sat Jan  9 11:56:28 2021
  InstallationDate: Installed on 2020-08-15 (146 days ago)
  InstallationMedia: Ubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
  MachineType: Intel Corporation NUC8i7HVK
  ProcFB: 0 amdgpudrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.8.0-34-generic root=UUID=c212e9d4-a049-4da0-8e34-971cb7414e60 ro quiet splash vt.handoff=7
  RebootRequiredPkgs:
   linux-image-5.8.0-36-generic
   linux-base
  RelatedPackageVersions:
   linux-restricted-modules-5.8.0-34-generic N/A
   linux-backports-modules-5.8.0-34-generic  N/A
   linux-firmware                            1.190.2
  SourcePackage: linux
  UpgradeStatus: Upgraded to groovy on 2020-09-20 (110 days ago)
  dmi.bios.date: 12/17/2018
  dmi.bios.release: 5.6
  dmi.bios.vendor: Intel Corp.
  dmi.bios.version: HNKBLi70.86A.0053.2018.1217.1739
  dmi.board.name: NUC8i7HVB
  dmi.board.vendor: Intel Corporation
  dmi.board.version: J68196-502
  dmi.chassis.type: 3
  dmi.chassis.vendor: Intel Corporation
  dmi.chassis.version: 2.0
  dmi.modalias: dmi:bvnIntelCorp.:bvrHNKBLi70.86A.0053.2018.1217.1739:bd12/17/2018:br5.6:svnIntelCorporation:pnNUC8i7HVK:pvrJ71485-502:rvnIntelCorporation:rnNUC8i7HVB:rvrJ68196-502:cvnIntelCorporation:ct3:cvr2.0:
  dmi.product.family: Intel NUC
  dmi.product.name: NUC8i7HVK
  dmi.product.version: J71485-502
  dmi.sys.vendor: Intel Corporation
