Hi bugs@,

I started down this rabbit hole in the "panic: ffs_valloc: dup alloc" thread: https://marc.info/?l=openbsd-bugs&m=175745729132070&w=2
That seems to be its own "bug," to one degree or another, caused by notable filesystem corruption.

Now, my new findings. I did a fair bit of testing to see which scenarios result in stable writes, or at least keep a partition from getting so corrupt that the system can no longer boot. I generally tested with Bitcoin syncing, Monero syncing, a bunch of dd's, bonnie++, and iozone, usually all at the same time, on the same partition. "Failure," in this case, means requiring a manual fsck after a hard reboot. This has been on amd64-type hardware, with SSDs.

# What is robust:

Automatic partitioning, no RAID, and the sync mount option. I could not get this setup to require any manual fsck intervention. It was perfectly stable. monerod itself did have its own lmdb corruption, but no manual fsck was required. I don't know much about lmdb, or whether it's supposed to be ACID compliant.

# What is *not* robust under sustained writes:

Mounting without sync.
RAID 1, regardless of sync.
RAID 1, on a single disk, regardless of sync.

So to be very clear: if you follow the install instructions in the FAQ for RAID 1, you are *much more likely* to sustain some kind of filesystem damage during sustained writes with RAID 1 present than without it.

I suspected that perhaps it was a write-out-of-sync issue, so I offlined one of the disks, and the same corruption happened. Thus it seems that the softraid layer's RAID 1 is doing "something" that is perhaps not synchronous or power safe.

This is a little bit frustrating, as one usually uses RAID 1 to improve reliability, not decrease it! I don't know if this is an inevitability of the design or if there's something that can be done to remedy it.

Thanks!
-Henrich
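
P.S. For anyone who wants to separate application-level data loss (like the lmdb corruption mentioned above) from filesystem damage that needs fsck, here is a minimal sketch of a checksummed write test. It is not the workload used in the tests above. The record layout and the names write_records and verify_records are made up for illustration. The idea is to write records with fsync, hard reboot mid-run, and then count how many leading records survived intact. It only checks file contents and says nothing about filesystem metadata, so it complements fsck rather than replacing it.

```python
import hashlib
import os
import struct

RECORD_PAYLOAD = 4096             # payload bytes per record (arbitrary choice)
HEADER = struct.Struct("<Q32s")   # sequence number + SHA-256 of the payload


def write_records(path, count, do_fsync=True):
    """Write `count` checksummed records, optionally fsync()ing each one."""
    with open(path, "wb") as f:
        for seq in range(count):
            payload = os.urandom(RECORD_PAYLOAD)
            digest = hashlib.sha256(payload).digest()
            f.write(HEADER.pack(seq, digest) + payload)
            if do_fsync:
                f.flush()             # push Python's buffer to the kernel
                os.fsync(f.fileno())  # ask the kernel to push it to the device


def verify_records(path):
    """Return how many leading records are complete and match their checksum."""
    good = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                break                 # end of file or truncated header
            seq, digest = HEADER.unpack(header)
            payload = f.read(RECORD_PAYLOAD)
            if len(payload) < RECORD_PAYLOAD:
                break                 # truncated payload
            if seq != good or hashlib.sha256(payload).digest() != digest:
                break                 # out-of-order or corrupted record
            good += 1
    return good
```

Usage would be to run write_records() on the partition under test, cut power partway through, and call verify_records() on the same file after the reboot. With do_fsync=True, every record that was fsynced before the power cut should still verify if the storage stack honours fsync. Fewer surviving records would point at a layer below the application.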
