Hello,

while working on checksumming support for SR RAID1 I touched SR RAID internals (workunit scheduling), so I'd like to test SR RAID5/6 functionality both on a snapshot and on my tree, to see that I have not broken anything while hacking on it. My current problem is that I'm not able to come up with a test that does not break RAID5 (I'm starting with it) after several hours of execution, even when using the snapshot.

My test is basically:

- on one console, in a loop:
    mount the RAID on /raid
    rsync /usr/src/ to /raid
    compute sha1 sums of all files in /raid
    umount /raid
    mount /raid
    check the sha1 sums: on a mismatch fail the test, otherwise just repeat

- on another console, in a loop:
    - offline a random drive
    - wait a random time (up to a minute)
    - rebuild the RAID with the offlined drive
    - wait a random time (up to 2 minutes)
    - repeat
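To make the checksum step of the first loop concrete, here is a minimal Python sketch of it. It is only an illustration, not the script actually used in the test; the helper names `sha1_manifest` and `changed_files` are made up for this example, and the mount, rsync and umount steps are left out.

```python
import hashlib
import os


def sha1_manifest(root):
    """Map each regular file under root (by relative path) to its SHA-1 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = hashlib.sha1()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large files do not fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(path, root)] = digest.hexdigest()
    return manifest


def changed_files(before, after):
    """Return sorted paths whose digest differs or that exist in only one manifest."""
    paths = before.keys() | after.keys()
    return sorted(p for p in paths if before.get(p) != after.get(p))
```

The idea would be to call `sha1_manifest("/raid")` once after the rsync and once more after the umount/mount cycle, and to fail the test if `changed_files` returns a non-empty list.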
Now, the issue with this is that I get sha1 errors from time to time. Usually in such a case the problematic source file contains some garbage.

Since I do not yet have a machine dedicated to this testing, I'm using a ThinkPad T500 with one drive; I just created 4 RAID slices in the OpenBSD partition. Last week I was using vndX devices (and files), but this way I even got a kernel panic (on the snapshot) like this one:

http://openbsd-archive.7691.n7.nabble.com/panic-ffs-valloc-dup-alloc-td254738.html

So this weekend I started testing with slices, and so far there has been no panic, but the data corruption issue is still there. The latest snapshot I'm using for testing is from last Sunday.

Let me ask: should SR RAID5 survive such testing, or is, for example, rebuilding with an offlined drive considered an unsupported feature?

Thanks!
Karel