Hi All,

I wanted to post an update with some further research on the topic.

You can start on the thread here, if you wish:
https://marc.info/?l=openbsd-bugs&m=175795080000738&w=2

Or, too long; didn't read:

I found that using softraid RAID 1, with either one or two drives in
the mirror, practically guarantees FFS corruption when power is lost
during heavy writes.

I found that FFS mounted with sync, without RAID 1, is very reliable.
I have not yet been able to corrupt a sync-mounted FFS badly enough
that the automatic fsck at boot fails to repair it.

However, when mounted async (the default), corruption happens readily
under heavy writes. Most of my heavy-write testing has involved Monero
syncing its blockchain (often with Bitcoin syncing at the same time),
dd writing small files, dd writing large files, bonnie++, and iozone.
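A rough, hypothetical stand-in for the "dd small files" part of this
workload (not the exact commands used in these tests) is sketched
below in Python. The STRESS_DIR environment variable, the file count,
and the file size are all assumptions to adjust; no fsync is issued,
so durability depends only on how the filesystem is mounted.

```python
import os
import tempfile


def write_small_files(directory: str, count: int, size: int = 4096) -> int:
    """Write `count` files of `size` random bytes into `directory`.

    Returns the total number of bytes written. No fsync() is called, so
    whether the data survives a power cut depends on the mount options
    (e.g. sync vs. the default) of the filesystem under test.
    """
    total = 0
    for i in range(count):
        path = os.path.join(directory, f"small_{i:06d}.bin")
        with open(path, "wb") as f:
            total += f.write(os.urandom(size))
    return total


if __name__ == "__main__":
    # STRESS_DIR is a hypothetical variable: point it at a directory on
    # the filesystem under test. Falls back to a throwaway temp dir.
    target = os.environ.get("STRESS_DIR") or tempfile.mkdtemp()
    written = write_small_files(target, count=100)
    print(f"wrote {written} bytes to {target}")
```

For a power-pull test, this would typically be run repeatedly (for
example in a shell loop) alongside the large-file and benchmark
workloads, with the power cut while it is still writing.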

But even when mounted sync, which is stable without RAID 1, adding
RAID 1 almost always left the partition in a state that the automatic
fsck could not repair, requiring manual intervention. Sometimes it was
so corrupt that kernel panics resulted.

Up to this point I had been testing with both SSDs and HDDs. To my
knowledge, all of them were 4K-sector drives with 512-byte sector
emulation (512e).
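For readers unfamiliar with 512e: such a drive reports 512-byte
logical sectors but stores data in 4096-byte physical sectors, so
eight logical sectors share one physical sector. Any write that does
not cover whole physical sectors forces the drive to read, merge, and
rewrite the affected 4K sector internally. The Python sketch below
only illustrates that arithmetic; the helper names are hypothetical,
and whether read-modify-write plays any role in the corruption
described here is not established by these tests.

```python
# Illustrative arithmetic for a 512e drive (assumed geometry: 512-byte
# logical sectors emulated on 4096-byte physical sectors).
LOGICAL_SECTOR = 512
PHYSICAL_SECTOR = 4096
LOGICAL_PER_PHYSICAL = PHYSICAL_SECTOR // LOGICAL_SECTOR  # 8


def lba_to_physical(lba: int) -> tuple[int, int]:
    """Map a 512-byte logical block address to (physical sector, slot 0-7)."""
    return divmod(lba, LOGICAL_PER_PHYSICAL)


def physical_sectors_touched(offset: int, length: int) -> range:
    """Physical sectors covered by a write of `length` (> 0) bytes at byte `offset`."""
    first = offset // PHYSICAL_SECTOR
    last = (offset + length - 1) // PHYSICAL_SECTOR
    return range(first, last + 1)


def needs_read_modify_write(offset: int, length: int) -> bool:
    """True if the write only partially covers at least one physical sector,
    so the drive must read, merge, and rewrite that whole 4K sector."""
    return offset % PHYSICAL_SECTOR != 0 or length % PHYSICAL_SECTOR != 0


if __name__ == "__main__":
    print(lba_to_physical(7))                          # (0, 7)
    print(list(physical_sectors_touched(3584, 1024)))  # [0, 1]
    print(needs_read_modify_write(512, 512))           # True
    print(needs_read_modify_write(8192, 8192))         # False
```

On a native 512-byte drive, every single-sector write is already a
whole physical sector, so this read-modify-write step does not arise.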

I had a very far-fetched idea that the 4K sectors were somehow
involved, and my initial testing appears to bear this out.

So I pulled out some old 512-byte-sector drives and tested this.

If I mount my partitions with sync on RAID 1, using two 512-byte-sector
drives in a healthy mirror, I cannot get FFS corruption to the point
where automatic fsck is unable to fix it. This held across three power
pulls under very heavy writes: the same conditions under which bare
FFS (no RAID) mounted async, or RAID 1 + FFS mounted sync or async,
will cause notable corruption.

Thus, it appears the corruption issue is related to softraid combined
with 4K-sector drives. I have no idea whether 4K-native drives are
affected; so far it appears specific to 512e. To be clear, both
512-byte and 512e drives are fine with bare FFS (no softraid),
provided sync is used.

Crystal Kolipe shared some great wisdom about RAID 1 in this thread
that still applies, but I was curious and wanted to test further.

Can someone else try to reproduce this?

I would like to test the CRYPTO discipline in a similar manner, and
perhaps RAID 1C.

Just to note, so far all testing has been done on OpenBSD 7.7.

Thank you for reading.

-Henrich
