I've been messing with this for days now, and it's driving me insane. Here are the particulars:
- SunFire V100 w/ two 40GB disks, identically partitioned
- Kernel 2.4.25 (kernel.org source) compiled with egcs64 from Woody
- Kernel 2.6.3 (kernel.org source) compiled with gcc 3.3.3 from Sid
- raidtools2 1.00.3-5 backported to Woody (also tried the Woody version)
- ext3 is being used in all test cases

Both disks work perfectly fine with Linux native partitions and a root filesystem installed on them. Both disks also work perfectly fine if one or the other is the single member of a degraded RAID1 array. Both of these configurations have been stress-tested (under both kernels), and all is well after a fair number of reads and writes.

The problem comes in when I make both disks members of the same array (in this case, hda2 and hdc2 are members of md1). As soon as the array has synced and I write to it, it pretty much instantly corrupts. Unmounting md1 and running fsck on it shows a large number of illegal blocks, md5sums of various binaries on the system are wrong, and apt's and dpkg's status files get so horribly corrupted that they segfault or refuse to run -- all within minutes or even seconds of writing to the md device.

Now, this looks like a pretty huge, glaring bug that I would have expected others to run into, but Google hasn't turned up anything yet, so I'm stumped. Am I going insane, or is there something horribly wrong here? I've tested with several disks in two different SunFire V100 machines, always with the same results: all works well with one disk, everything blows up with two.

Any help anyone can offer would be VERY much appreciated, as I was supposed to have this box online several days ago. <sigh>

... Adam Conrad

P.S. I can't get at the machine right now as I'm at home and it's not online, but I'll post the relevant configs (fdisk -l, /etc/raidtab, etc.) tomorrow if someone hasn't already replied with "yes, there's a glaring bug in Sparc's RAID1; here's more about it".
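And from memory (so the details may be off until I can check the real file tomorrow), the raidtab in question looks essentially like this -- nothing exotic, just a plain two-disk mirror:

```
# /etc/raidtab -- RAID1 across hda2 and hdc2 (reconstructed from memory;
# chunk-size and persistent-superblock are the usual defaults I'd have used)
raiddev /dev/md1
    raid-level              1
    nr-raid-disks           2
    nr-spare-disks          0
    persistent-superblock   1
    chunk-size              4
    device                  /dev/hda2
    raid-disk               0
    device                  /dev/hdc2
    raid-disk               1
```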
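For what it's worth, the corruption check I've been running boils down to something like this (a sketch, not the exact commands; MNT defaults to a temp directory here so it runs anywhere, but on the V100 I point it at wherever md1 is mounted):

```shell
#!/bin/sh
# Write a known file to the filesystem, sync, and re-verify its checksum.
# MNT is an assumption for illustration: on the real box it would be the
# md1 mount point, e.g. MNT=/mnt/md1 sh check.sh
MNT="${MNT:-$(mktemp -d)}"

# Lay down 1MB of random data and record its checksum.
dd if=/dev/urandom of="$MNT/testfile" bs=1024 count=1024 2>/dev/null
md5sum "$MNT/testfile" > "$MNT/testfile.md5"

# Flush to disk, then re-read and compare.
sync
if md5sum -c "$MNT/testfile.md5" >/dev/null 2>&1; then
    echo "OK: testfile re-reads cleanly"
else
    echo "FAIL: checksum mismatch -- corruption under $MNT"
fi
```

On a single-disk (degraded) array this passes every time; on the two-disk array the re-read fails almost immediately, which matches the fsck and md5sum symptoms above.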