On 3/21/2024 8:46 AM, mike tancsa wrote:

Summary: WD Blue SA510 SSDs attached to the mpr controller seem to start throwing errors on random disks in the pool (see https://lists.freebsd.org/archives/freebsd-hardware/2024-March/000100.html for examples) after copying and destroying a 200G zfs dataset with many small files 3 or 4 times on a set of 4 disks in raidz1. Doing a hard trim -f /dev/da[x] on each disk and recreating the pool lets me run the tests 3 or 4 more times before hitting the errors again.  The same tests with the same disks attached to a SATA controller don't show the errors. I also ran into the same problem with a similar LSI controller that uses the mrsas driver (<AVAGO Invader SAS Controller>).  It seems to be trim related?  Using Samsung SSDs on the mpr controller does not seem to show the issue.

I decided to try the same tests on the exact same hardware but booting TrueNAS SCALE (the Linux variant) to see if the problem persists.  If I do a manual trim between each zfs send | zfs recv, zfs destroy cycle, the performance seems fairly consistent and there are no crashes/resets of the drives in the pool on Linux (6.6.20-production+truenas).

I'm not a Linux person, so it's hard to say if there are some quirks for these disks on Linux.

root@truenas[/var/log]# hdparm -I /dev/sda | grep -i tri
           *    Data Set Management TRIM supported (limit 8 blocks)
           *    Deterministic read data after TRIM
root@truenas[/var/log]#

If I don't do the manual TRIM between send|recv loops (i.e. zpool trim -w pool), I get the same pattern as on FreeBSD, where I do a manual trim -f /dev/da[x] on each disk one by one: 3 full-speed loops and after that, super slow until a proper trim is done. On FreeBSD I do this on the raidz1 pool by running trim -f /dev/da[1-4] one disk at a time and resilvering.

So it does seem to point to TRIM via zfs (be that manual or autotrim) being somehow broken with this drive on FreeBSD, via the mpr driver and via the ATA driver.

Given the output of hdparm on Linux and trim being limited to 8 blocks, does anyone know if there is a quirk I can try on FreeBSD to maybe get TRIM working for these SSDs?
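For reference, the "limit 8 blocks" figure can be turned into a byte count. The sketch below assumes 512-byte logical sectors and the usual ATA DATA SET MANAGEMENT layout, where each 512-byte payload block carries 64 range entries of 8 bytes and each entry covers at most 65535 sectors. The function name is made up for illustration. Under those assumptions the result equals the kern.cam.da.6.delete_max value in the sysctl output further down, which suggests FreeBSD sees the same 8-block limit.

```python
# Upper bound on one ATA DATA SET MANAGEMENT (TRIM) command.
# Assumptions (not taken from the drive itself): 512-byte logical sectors
# and the standard 8-byte LBA range entry (48-bit LBA + 16-bit count).
SECTOR_BYTES = 512               # assumed logical sector size
ENTRY_BYTES = 8                  # size of one LBA range entry
MAX_SECTORS_PER_ENTRY = 0xFFFF   # largest value of the 16-bit count field


def max_trim_bytes(limit_blocks: int) -> int:
    """Bytes addressable by one TRIM command with `limit_blocks` payload blocks."""
    entries_per_block = SECTOR_BYTES // ENTRY_BYTES  # 64 entries per block
    return limit_blocks * entries_per_block * MAX_SECTORS_PER_ENTRY * SECTOR_BYTES


# 8 is the limit hdparm reports for this drive.
print(max_trim_bytes(8))  # 17179607040
```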

Details are captured in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277992

The attachment in the PR, https://bugs.freebsd.org/bugzilla/attachment.cgi?id=250268, has a PNG showing the performance when the TRIM is not done.

    ---Mike




OK, some updates.  I took the same 4 disks off the mpr controller and attached them to the motherboard SATA ports, and the problem seems to disappear.  If it is still related to trim, I notice that on the mpr controller the trim method is ATA_TRIM and when attached to the motherboard SATA it's DSM_TRIM.  Not sure if there is any difference there? Or it's some other problem.  PR time for the mpr driver?

kern.cam.ada.1.trim_ticks: 0
kern.cam.ada.1.trim_goal: 0
kern.cam.ada.1.flags: 0x1be3bde<CAN_48BIT,CAN_FLUSHCACHE,CAN_NCQ,CAN_DMA,WAS_OTAG,CAN_TRIM,OPEN,SCTX_INIT,CAN_POWERMGT,CAN_DMA48,CAN_LOG,CAN_WCACHE,CAN_RAHEAD,PROBED,ANNOUNCED,DIRTY,PIM_ATA_EXT,UNMAPPEDIO>
kern.cam.ada.1.trim_lbas: 6356918872
kern.cam.ada.1.trim_ranges: 171552
kern.cam.ada.1.trim_count: 84205
kern.cam.ada.1.delete_method: DSM_TRIM

kern.cam.da.6.trim_ticks: 0
kern.cam.da.6.trim_goal: 0
kern.cam.da.6.sort_io_queue: 0
kern.cam.da.6.unmapped_io: 1
kern.cam.da.6.rotating: 0
kern.cam.da.6.flags: 0x10ef40<WAS_OTAG,OPEN,SCTX_INIT,CAN_RC16,PROBED,ANNOUCNED,CAN_ATA_DMA,CAN_ATA_LOG,UNMAPPEDIO>
kern.cam.da.6.p_type: 0
kern.cam.da.6.error_inject: 0
kern.cam.da.6.max_seq_zones: 0
kern.cam.da.6.optimal_nonseq_zones: 0
kern.cam.da.6.optimal_seq_zones: 0
kern.cam.da.6.zone_support: None
kern.cam.da.6.zone_mode: Not Zoned
kern.cam.da.6.trim_lbas: 0
kern.cam.da.6.trim_ranges: 0
kern.cam.da.6.trim_count: 0
kern.cam.da.6.minimum_cmd_size: 6
kern.cam.da.6.delete_max: 17179607040
kern.cam.da.6.delete_method: ATA_TRIM
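
To line up the TRIM-related values from the two listings above side by side, something like the following could be used. It is only a sketch: the helper names are hypothetical, and SAMPLE is a subset of the sysctl output above pasted in as text rather than read from a live system.

```python
import re
from collections import defaultdict

# Subset of the sysctl output quoted above, pasted in as sample input.
SAMPLE = """\
kern.cam.ada.1.trim_lbas: 6356918872
kern.cam.ada.1.trim_ranges: 171552
kern.cam.ada.1.trim_count: 84205
kern.cam.ada.1.delete_method: DSM_TRIM
kern.cam.da.6.trim_lbas: 0
kern.cam.da.6.trim_ranges: 0
kern.cam.da.6.trim_count: 0
kern.cam.da.6.delete_max: 17179607040
kern.cam.da.6.delete_method: ATA_TRIM
"""

# Matches lines such as "kern.cam.da.6.delete_method: ATA_TRIM".
LINE = re.compile(r"^kern\.cam\.(ada|da)\.(\d+)\.(\w+):\s*(.*)$")


def parse_cam_sysctl(text):
    """Group 'kern.cam.<driver>.<unit>.<key>: value' lines by device name."""
    devices = defaultdict(dict)
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            driver, unit, key, value = match.groups()
            devices[driver + unit][key] = value
    return dict(devices)


def trim_summary(devices):
    """Keep only the TRIM-related keys for each device (None if absent)."""
    keys = ("delete_method", "trim_count", "trim_ranges", "trim_lbas")
    return {dev: {k: vals.get(k) for k in keys} for dev, vals in devices.items()}


for dev, info in trim_summary(parse_cam_sysctl(SAMPLE)).items():
    print(dev, info)
```

On a live system the same text could come from the output of sysctl kern.cam saved to a file.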

camcontrol identify doesn't show much difference really:

 diff -bu wd.mpr wd.ata
--- wd.mpr      2024-03-21 08:27:02.995734000 -0400
+++ wd.ata      2024-03-21 08:21:42.310055000 -0400
@@ -1,5 +1,6 @@
+# camcontrol ide ada1
 pass6: <WD Blue SA510 2.5 1000GB 52046100> ACS-4 ATA SATA 3.x device
-pass6: 600.000MB/s transfers, Command Queueing Enabled
+pass6: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)

 protocol              ACS-4 ATA SATA 3.x
 device model          WD Blue SA510 2.5 1000GB


The controller is:

 mprutil show adapter
mpr0 Adapter:
       Board Name: INSPUR 3008IT
   Board Assembly: INSPUR
        Chip Name: LSISAS3008
    Chip Revision: ALL
    BIOS Revision: 18.00.00.00
Firmware Revision: 16.00.12.00
  Integrated RAID: no
         SATA NCQ: ENABLED
 PCIe Width/Speed: x8 (8.0 GB/sec)
        IOC Speed: Full
      Temperature: 51 C

PhyNum  CtlrHandle  DevHandle  Disabled  Speed   Min    Max Device
0       0001        0009       N         6.0     3.0    12     SAS Initiator
1       0001        0009       N         6.0     3.0    12     SAS Initiator
2       0001        0009       N         6.0     3.0    12     SAS Initiator
3       0001        0009       N         6.0     3.0    12     SAS Initiator
4                              N                 3.0    12     SAS Initiator
5                              N                 3.0    12     SAS Initiator
6                              N                 3.0    12     SAS Initiator
7                              N                 3.0    12     SAS Initiator


