FYI, the drives are super common:

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron BX/MX1/2/3/500, M5/600, 1100 SSDs
Device Model:     Crucial_CT1050MX300SSD1
Serial Number:    164914EE5DDD
LU WWN Device Id: 5 00a075 114ee5ddd
Firmware Version: M0CR031
User Capacity:    1,050,214,588,416 bytes [1.05 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Sep 4 08:48:51 2020 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N5VD5C7Z
LU WWN Device Id: 5 0014ee 263d50d79
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Sep 4 08:49:19 2020 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1894230

Title:
  Device queue depth should be 31 not 32

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Lowering the drives' device/queue_depth from 32 to 31 resolved an
  issue where I saw numerous zpool and ATA errors, but only under high
  load (zpool scrub) or when trimming the drives.

  I was able to reduce the incidence by setting libata.force=noncqtrim,
  and to eliminate it with libata.force=noncq, but with an obvious
  performance impact.

  The upstream kernel seems to be aware of this issue, so I'm assuming
  this is a downstream or udev configuration issue, see:
  https://ata.wiki.kernel.org/index.php/Libata_FAQ#Enabling.2C_disabling_and_checking_NCQ

  The scrub repaired all errors. Because repairs were needed, this does
  not look like a mere communications issue: there is the potential for
  DATA LOSS on non-redundant and/or non-ZFS configurations.
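For anyone who wants to try the workaround described above, here is a minimal Python sketch for reading and changing the queue depth through sysfs. It assumes the disk exposes the usual /sys/block/<disk>/device/queue_depth attribute. The helper names and the sysfs_root parameter are made up for this illustration and are not part of any kernel or Ubuntu tooling.

```python
from pathlib import Path


def queue_depth_path(disk: str, sysfs_root: str = "/sys") -> Path:
    """Build the sysfs path of the queue_depth attribute for a disk such as 'sda'."""
    return Path(sysfs_root) / "block" / disk / "device" / "queue_depth"


def get_queue_depth(disk: str, sysfs_root: str = "/sys") -> int:
    """Return the queue depth currently reported for the disk."""
    return int(queue_depth_path(disk, sysfs_root).read_text().strip())


def set_queue_depth(disk: str, depth: int, sysfs_root: str = "/sys") -> None:
    """Write a new queue depth; on a real system this needs root privileges."""
    # Assumption for this sketch: NCQ depths outside 1..32 are rejected up front.
    if not 1 <= depth <= 32:
        raise ValueError(f"queue depth {depth} is outside 1..32")
    queue_depth_path(disk, sysfs_root).write_text(f"{depth}\n")
```

Calling set_queue_depth("sda", 31) as root would apply the setting from the bug description to one disk. A value written this way does not survive a reboot, so a permanent fix still needs a udev rule or a kernel change.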
  Sample syslog error:

  [ 33.688898] ata1.00: exception Emask 0x50 SAct 0x1003000 SErr 0x4c0900 action 0x6 frozen
  [ 33.688908] ata1.00: irq_stat 0x08000000, interface fatal error
  [ 33.688913] ata1: SError: { UnrecovData HostInt CommWake 10B8B Handshk }
  [ 33.688917] ata1.00: failed command: WRITE FPDMA QUEUED
  [ 33.688923] ata1.00: cmd 61/00:60:df:28:3d/01:00:2a:00:00/40 tag 12 ncq dma 131072 out
  [ 33.688929] ata1.00: status: { DRDY }
  [ 33.688931] ata1.00: failed command: WRITE FPDMA QUEUED
  [ 33.688937] ata1.00: cmd 61/08:68:18:a9:d2/00:00:04:00:00/40 tag 13 ncq dma 4096 out
  [ 33.688942] ata1.00: status: { DRDY }
  [ 33.688945] ata1.00: failed command: WRITE FPDMA QUEUED
  [ 33.688951] ata1.00: cmd 61/00:c0:df:27:3d/01:00:2a:00:00/40 tag 24 ncq dma 131072 out
  [ 33.688956] ata1.00: status: { DRDY }
  [ 33.688963] ata1: hard resetting link
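As a reading aid for the sample syslog error above: the SAct value in the exception line is the SActive bitmask, with one bit set per NCQ tag that was outstanding when the error hit. The small Python sketch below decodes it. The function name is made up for this illustration.

```python
from typing import List


def active_ncq_tags(sact: int) -> List[int]:
    """Return the NCQ tag numbers (0..31) whose bits are set in an SActive mask."""
    return [tag for tag in range(32) if sact & (1 << tag)]


# SAct value taken from the exception line in the sample syslog error.
print(active_ncq_tags(0x1003000))  # prints [12, 13, 24]
```

The decoded tags 12, 13 and 24 match the three WRITE FPDMA QUEUED commands that the log reports as failed.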