Re: softraid RAID 1 makes corruption more likely

H. Hartzer Fri, 03 Oct 2025 11:03:21 -0700

On Fri Oct 3, 2025 at 4:09 PM UTC, Crystal Kolipe wrote:
>> the default is 'nosync', not 'async'.
>
> The sync and async options are independent.
>
> The default is nosync AND noasync.
>
> You can set sync and async simultaneously.  One does not remove the other.
>
> The defaults work well in the most common situations.


I understand.

>> I am not sure how HDDs handle power
>> loss
>
> Decent mechanical drives can make use of the inertia of the spinning platters
> to write data that is in volatile memory to the media.

This makes sense.

>> There may be another way to test this that's safer, other than cutting
>> power. I have had crashes also cause this type of corruption.
>> 
>> Perhaps something like halt -n might work?
>
> Or just use a normal filesystem on a normal drive without RAID.
>
> I see nothing in your setup that would benefit from the complexities you are
> creating.
>
> Simple solutions such as storing highly variable data on its own dedicated
> partition can probably do more to reduce downtime due to unexpected events
> than worrying about how a particular drive will behave.

I am not saying that RAID is the best option for me here. I merely
wanted to share my further testing. It is alarming to me that
softraid RAID 1 with any FFS mount configuration is so fragile under
heavy writes, and that FFS with the default mount options, sans RAID,
is likewise so fragile. It's true that most usage patterns won't
trigger this, but my case of syncing a blockchain certainly does. My
tests involved additional writes just to be more consistent and
thorough -- I don't normally run bonnie++, iozone, and dd on
production systems.

Even with a dedicated partition, a system may not come back after
power loss or a crash. If the automatic fsck fails, manual intervention
is required via the console. On a remote system, this can be a
pretty big deal. Of course I would love simple serial console and
power access to every system I have, but this is not the reality.
And even if it were, there's still the chance that the corruption
will be so severe that a manual fsck can't repair it and the
kernel will panic trying to read data from the partition. Then
I have to reformat the partition and start over.

Starting over could mean syncing a 200GB+ blockchain, or transferring
that from another machine. This just isn't a fast operation by any
means. If I can avoid this, which it appears I can, I will be much
better off.

Again, this behavior occurs with out-of-the-box defaults, with or
without RAID 1.  I did not see this tested or documented anywhere.
And it's the first time I've had this issue after many years of
Linux and FreeBSD use (with both FFS and ZFS).

I felt it was useful to share these caveats and workarounds. Solene
Rapenne stopped using OpenBSD partly because of data loss issues[1].
I am sure she is not the only person to have done so. I am still
new to OpenBSD after encountering serious issues with FreeBSD,
mostly involving desktop use. OpenBSD has been far better on the
desktop for me, but reliable filesystems that can handle crashes
and power loss seem quite important.

Sincerely,

Henrich

1:
https://dataswamp.org/~solene/2024-11-15-why-i-stopped-using-openbsd.html
