Hi bugs@,

I started down this rabbit hole in the "panic: ffs_valloc: dup alloc"
thread: https://marc.info/?l=openbsd-bugs&m=175745729132070&w=2

That panic seems to be its own "bug," to one degree or another,
triggered by significant filesystem corruption.

Now, my new findings.

I did a fair bit of testing to see which configurations give stable
writes, or at least keep a partition from getting so corrupt that the
system can no longer boot.

I generally tested with Bitcoin syncing, Monero syncing, a bunch of
dd's, bonnie++, and iozone -- usually all at the same time, on the same
partition. "Failure," in this case, means the system required a manual
fsck after a hard reboot.
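
For anyone who wants a comparable write load without the blockchain
daemons, here is a minimal Python sketch. It is not what I ran: the
function names, file names, and sizes are made up for illustration, and
it only approximates the "several writers on one partition" part of the
mix.

```python
import os
import threading


def write_worker(path, total_bytes, block_size=1 << 20, do_fsync=False):
    """Write total_bytes of random data to path in block_size chunks.

    Returns the number of bytes written.
    """
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = os.urandom(min(block_size, total_bytes - written))
            f.write(chunk)
            if do_fsync:
                # Push this chunk to stable storage before continuing.
                f.flush()
                os.fsync(f.fileno())
            written += len(chunk)
    return written


def run_load(directory, workers=4, bytes_per_worker=8 << 20):
    """Run several writers against the same directory at the same time.

    Returns the list of files that were written.
    """
    paths = []
    threads = []
    for i in range(workers):
        # Hypothetical file naming, one file per writer.
        path = os.path.join(directory, f"stress-{i}.bin")
        paths.append(path)
        t = threading.Thread(target=write_worker,
                             args=(path, bytes_per_worker))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return paths
```

Point run_load() at a directory on the partition under test, raise
workers and bytes_per_worker until the disk stays busy, then hard reboot
partway through and see whether the boot-time fsck needs manual help.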

This has been on amd64-type hardware, with SSDs.

# What is robust:

Automatic partitioning, no RAID, and sync mount option.

I could not get this to require any manual fsck interventions. It was
perfectly stable. Now, monerod itself had its own lmdb corruption, but
no manual fsck was required. I don't know enough about lmdb to say
whether it's supposed to be ACID compliant.

# What is *not* robust under sustained writes:

Mounting without sync.

RAID 1, regardless of sync.

RAID 1, on a single disk, regardless of sync.

So to be very clear, using the install instructions in the FAQ for RAID
1, you are *much more likely* to sustain some kind of filesystem damage
with RAID 1 present (during sustained writes) than without it. I
suspected that perhaps the two disks were getting written out of sync,
so I offlined one of the disks, and the same corruption happened.

Thus it seems that the softraid layer's RAID 1 is doing "something"
that is perhaps not synchronous or power-safe.

This is a little bit frustrating as one usually uses RAID 1 to
improve reliability, not decrease it!

I don't know if this is an inevitability of the design or if there's
something that can be done to remedy it.

Thanks!

-Henrich
