Hi All,

I just had to recover one of my boxes and am looking for a steer or any advice as to what might be going on.

I returned to a console which had dropped into ddb. The host is running a -current release and was updated on the 23rd Feb. The logs I’ve managed to pull from it that seem pertinent are:

Feb 24 04:05:00 fw0 vnstatd[58770]: Error: Commit transaction to database failed (10): disk I/O error
Feb 24 04:08:06 fw0 /bsd: ahci0: NCQ errored slot 0 is idle (70003000 active)
Feb 24 04:09:10 fw0 /bsd: ahci0: attempting to idle device
Feb 24 04:09:10 fw0 /bsd: ahci0: stopping the port, softreset slot 31 was still active.
Feb 24 04:09:10 fw0 /bsd: ahci0: failed to soft reset device
Feb 24 04:09:10 fw0 /bsd: ahci0: couldn't recover NCQ error, failing all outstanding commands.
Feb 24 04:09:10 fw0 /bsd: ahci0: log page read failed, slot 31 was still active.
Feb 24 04:09:10 fw0 /bsd: ahci0: stopping the port, softreset slot 31 was still active.
Feb 24 04:09:10 fw0 /bsd: ahci0: attempting to idle device
Feb 24 04:09:10 fw0 /bsd: ahci0: stopping the port, softreset slot 31 was still active.
Feb 24 04:09:10 fw0 /bsd: ahci0: failed to soft reset device
Feb 24 04:09:10 fw0 /bsd: ahci0: couldn't recover NCQ error, failing all outstanding commands.
Feb 24 04:09:10 fw0 pflogd[90658]: Logging suspended: fwrite: Input/output error
Feb 24 04:10:00 fw0 vnstatd[58770]: Error: Exec step failed (11: database disk image is malformed): "update hour set rx=rx+0, tx=tx+1500 where interface=4 and date=strftime('%Y-%m-%d %H:00:00', datetime(1645675500, 'unixepoch'), 'localtime')"
Feb 24 04:10:00 fw0 vnstatd[58770]: Error: Fatal database error detected, exiting.
Feb 24 04:10:10 fw0 /bsd: ahci0: NCQ errored slot 29 is idle (00002000 active)
Feb 24 04:10:10 fw0 /bsd: ahci0: attempting to idle device
Feb 24 04:10:10 fw0 /bsd: ahci0: stopping the port, softreset slot 31 was still active.
Feb 24 04:10:10 fw0 /bsd: ahci0: failed to soft reset device
Feb 24 04:10:10 fw0 /bsd: ahci0: couldn't recover NCQ error, failing all outstanding commands.
Feb 24 04:11:12 fw0 /bsd: ahci0: attempting to idle device
Feb 24 04:11:12 fw0 /bsd: ahci0: stopping the port, softreset slot 31 was still active.
Feb 24 04:11:12 fw0 /bsd: ahci0: failed to soft reset device
Feb 24 04:11:12 fw0 /bsd: ahci0: NCQ errored slot 8 is idle (00000200 active)

The box should have been quiet at this time, with no heavy load expected.

Running fsck on the filesystem didn’t end well for me: it resulted in a slew of NCQ error messages and lost data. The partitions that I didn’t run fsck against kept all their data. I’ve since wiped and restored all the filesystem partitions.

I’ve also replaced the SATA cable, but I’m wondering if anyone can shed light on what might have happened. The disk (SSD) is only 30 days old, and seems to be OK after restoring a backup onto it.

Disk Info:
Model Family: Phison Driven SSDs
Device Model: KINGSTON SA400S37240G

Thanks,
Simon.