Hi All,

Just had to recover one of my boxes.  

Looking for a steer or any advice as to what might be going on.  I returned to 
a console which had dropped into ddb.  The host is running -current and was 
last updated on 23rd Feb.  The log entries I’ve managed to pull from it that 
seem pertinent are:

Feb 24 04:05:00 fw0 vnstatd[58770]: Error: Commit transaction to database 
failed (10): disk I/O error
Feb 24 04:08:06 fw0 /bsd: ahci0: NCQ errored slot 0 is idle (70003000 active)
Feb 24 04:09:10 fw0 /bsd: ahci0: attempting to idle device
Feb 24 04:09:10 fw0 /bsd: ahci0: stopping the port, softreset slot 31 was still 
active.
Feb 24 04:09:10 fw0 /bsd: ahci0: failed to soft reset device
Feb 24 04:09:10 fw0 /bsd: ahci0: couldn't recover NCQ error, failing all 
outstanding commands.
Feb 24 04:09:10 fw0 /bsd: ahci0: log page read failed, slot 31 was still active.
Feb 24 04:09:10 fw0 /bsd: ahci0: stopping the port, softreset slot 31 was still 
active.
Feb 24 04:09:10 fw0 /bsd: ahci0: attempting to idle device
Feb 24 04:09:10 fw0 /bsd: ahci0: stopping the port, softreset slot 31 was still 
active.
Feb 24 04:09:10 fw0 /bsd: ahci0: failed to soft reset device
Feb 24 04:09:10 fw0 /bsd: ahci0: couldn't recover NCQ error, failing all 
outstanding commands.
Feb 24 04:09:10 fw0 pflogd[90658]: Logging suspended: fwrite: Input/output error
Feb 24 04:10:00 fw0 vnstatd[58770]: Error: Exec step failed (11: database disk 
image is malformed): "update hour set rx=rx+0, tx=tx+1500 where interface=4 and 
date=strftime('%Y-%m-%d %H:00:00', datetime(1645675500, 'unixepoch'), 
'localtime')"
Feb 24 04:10:00 fw0 vnstatd[58770]: Error: Fatal database error detected, 
exiting.
Feb 24 04:10:10 fw0 /bsd: ahci0: NCQ errored slot 29 is idle (00002000 active)
Feb 24 04:10:10 fw0 /bsd: ahci0: attempting to idle device
Feb 24 04:10:10 fw0 /bsd: ahci0: stopping the port, softreset slot 31 was still 
active.
Feb 24 04:10:10 fw0 /bsd: ahci0: failed to soft reset device
Feb 24 04:10:10 fw0 /bsd: ahci0: couldn't recover NCQ error, failing all 
outstanding commands.
Feb 24 04:11:12 fw0 /bsd: ahci0: attempting to idle device
Feb 24 04:11:12 fw0 /bsd: ahci0: stopping the port, softreset slot 31 was still 
active.
Feb 24 04:11:12 fw0 /bsd: ahci0: failed to soft reset device
Feb 24 04:11:12 fw0 /bsd: ahci0: NCQ errored slot 8 is idle (00000200 active)
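
In case it helps with reading the log, here is a minimal sketch that decodes 
the hex values in parentheses.  It assumes (I have not checked this against the 
ahci driver source) that each value is a 32-bit mask of NCQ command slots that 
were still active, with bit n standing for slot n.  The function name 
active_slots is my own and not part of any tool.

```python
def active_slots(mask_hex: str) -> list[int]:
    """Return the slot numbers whose bits are set in a 32-bit hex mask.

    Assumption: bit n of the mask corresponds to NCQ command slot n.
    """
    mask = int(mask_hex, 16)
    return [slot for slot in range(32) if mask & (1 << slot)]


# The three masks that appear in the log excerpt above.
for mask_hex in ("70003000", "00002000", "00000200"):
    print(mask_hex, active_slots(mask_hex))
```

If that assumption holds, the reported errored slot (0, 29 and 8 respectively) 
is absent from the decoded set each time, which would fit the "is idle" 
wording.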


The box should have been quiet at this time, no heavy load expected.  

Running fsck didn’t end well for me - it produced a slew of NCQ error messages, 
and I lost data on the partitions I checked.  The partitions that I didn’t run 
fsck against kept all their data.  I’ve since wiped and restored all the 
partitions.

I’ve also replaced the SATA cable, but I’m wondering if anyone can shed light 
on what might have happened - the disk (SSD) is only 30 days old, and seems to 
be OK after restoring a backup onto it.

Disk Info:
Model Family:     Phison Driven SSDs
Device Model:     KINGSTON SA400S37240G

Thanks,

Simon.

