Hello,

While working on SR RAID1 checksumming support I touched SR RAID
internals (workunit scheduling), so I'd like to test SR RAID5/6
functionality both on a snapshot and on my tree, to make sure I have
not broken anything while hacking on it. My current problem is that I
am not able to come up with a test that RAID5 (I'm starting with it)
survives for more than several hours, even when running a snapshot.
My test is basically:
- on one console, in a loop:
  - mount the RAID volume on /raid
  - rsync /usr/src/ to /raid
  - compute sha1 sums of all files in /raid
  - umount /raid
  - mount /raid again
  - check the sha1 sums -- on a mismatch fail the test, otherwise repeat
- on another console, in a loop:
  - take a random drive offline
  - wait a random time (up to a minute)
  - rebuild the RAID with the offlined drive
  - wait a random time (up to 2 minutes)
  - repeat
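
To make the checksum step concrete, here is a minimal sketch of how it
could look in Python. It is only an illustration, not the script I
actually run: the function names (sha1_of_file, sha1_tree,
changed_files) are made up for this example, and the mount/umount and
rsync steps are deliberately left out. The idea is to record one sha1
per file before the umount, recompute after the remount, and report
every path whose digest differs.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of one file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def sha1_tree(root):
    """Map each regular file under root (relative path) to its SHA-1."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only file contents matter here.
            if os.path.isfile(full) and not os.path.islink(full):
                sums[os.path.relpath(full, root)] = sha1_of_file(full)
    return sums


def changed_files(before, after):
    """Return sorted paths whose digest differs or that exist on one side only."""
    return sorted(
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )
```

In the loop above this would be used as before = sha1_tree("/raid")
ahead of the umount and after = sha1_tree("/raid") once the volume is
mounted again; a non-empty changed_files(before, after) fails the test.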

Now, the issue with this is that I get sha1 errors from time to time.
Usually in such a case the problematic source file contains some
garbage. Since I do not yet have a machine dedicated to this testing,
I'm using a ThinkPad T500 with one drive for it. I just created 4 RAID
slices in the OpenBSD partition. Last week I was using vndX devices
(and files), but that way I even got a kernel panic (on a snapshot)
like this one:
http://openbsd-archive.7691.n7.nabble.com/panic-ffs-valloc-dup-alloc-td254738.html
-- so this weekend I started testing with slices instead. So far there
has been no panic, but the data corruption issue is still there. The
latest snapshot I'm using for testing is from last Sunday.

Let me ask: should SR RAID5 survive such testing, or is, for example,
rebuilding with an offlined drive considered an unsupported feature?

Thanks!
Karel
