I'm running OpenBSD on a Protectli box as a router/firewall. The disk is
an SSD. Every now and then I reboot it ("sudo shutdown -r now") just to
make sure it comes back up. Several times it has hung at boot on disk
errors that the automatic 'fsck' couldn't fix. I was able to run 'fsck'
manually and answer its prompts to clean up the problems, which were
sometimes unreferenced inodes or similar things. It deleted some files
in /var. The system runs OK, so perhaps the files aren't used in my
minimal setup.

I have two questions:

(1) In "/etc/rc" I changed [fsck -p "$@"] to [fsck -f "$@"] in an
attempt to force it to fix problems, so the system could recover
without someone doing it manually. That didn't work (startup still
stopped on the disk errors), so I tried making it [do_fsck -f -y], but
that didn't work either. How does one make the system recover
unattended (e.g., how would an unstaffed/dark computer operations
center do it)?

(2) Why would the system develop disk problems? Might the SSD be
failing? Should I proactively replace it? If I do replace it, should I
start fresh with a clean install rather than cloning the current disk?

By the way, the SSD is a Samsung SSD 870 EVO 500GB (I'm only using a
tiny bit of it). Micromat's Lifespan says it has 100% life left, and
their TechTool Pro found no bad blocks.

--Randall
- File corruption on SSD disk Randall Gellens