Hello,

I am trying to understand what I may be missing. I have been noticing this issue for a year or so.

I have a machine running -current that is set up with 2 SSD drives. Each SSD is fdisk'ed with one OpenBSD partition:

# fdisk sd0
Disk: sd0       geometry: 19457/255/63 [312581808 Sectors]
Offset: 0       Signature: 0xAA55
            Starting         Ending         LBA Info:
 #: id      C   H   S -      C   H   S [       start:        size ]
-------------------------------------------------------------------------------
 0: 00      0   0   0 -      0   0   0 [           0:           0 ] unused
 1: 00      0   0   0 -      0   0   0 [           0:           0 ] unused
 2: 00      0   0   0 -      0   0   0 [           0:           0 ] unused
*3: A6      0   1   2 -  19456 254  63 [          64:   312576641 ] OpenBSD

The disklabel on each disk has an "a" 4.2BSD partition, a "b" swap partition, and an "m" RAID partition:

# disklabel sd0
# /dev/rsd0c:
type: SCSI
disk: SCSI disk
label: INTEL SSDSA2BW16
duid: 43d094716532e926
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 19457
total sectors: 312581808
boundstart: 64
boundend: 312576705
drivedata: 0

16 partitions:
#                size           offset  fstype [fsize bsize  cpg]
  a:          2104448               64  4.2BSD   2048 16384    1 # /
  b:         18860313          2104512    swap                   # none
  c:        312581808                0  unused
  m:        291611880         20964825    RAID

Most of the time, everything is fine:

# bioctl -i sd2
Volume      Status               Size Device
softraid0 0 Online       149305012224 sd2     RAID1
          0 Online       149305012224 0:0.0   noencl <sd0m>
          1 Online       149305012224 0:1.0   noencl <sd1m>

BUT, every once in a while (say, after a couple of weeks, then after a couple of months), all of a sudden the array will report as being degraded. However, other than the notice that the array is degraded and that a mirror is offline, I can find nothing in any log, nor any change in the dmesg, to suggest what may have happened.

I have changed the hard drive cables. I have swapped out the SSDs. But it still happens every so often.

When the array is degraded, I can still fdisk/disklabel the "offline" disk without a problem.
I can rebuild the degraded array with the "offline" disk (# bioctl -R /dev/sd1m sd2), and the rebuild completes without a problem, and the array is stable for weeks/months until, randomly, it happens again.

I am wondering if there is anything I should be looking at/for to help figure out what the issue is? As I said, I have already swapped out hardware (at least) once. If it is a hardware issue, I can keep swapping out hardware, but (at this point) it seems that the probability is really low that I would have multiple drives that have the same intermittent problem (but, obviously, not zero).

I would appreciate any advice on how to track down what the problem may be the next time it happens.

Thanks,
Ted
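
P.S. Since nothing shows up in the logs after the fact, one thing I could do is poll the volume and save a timestamped snapshot the moment a chunk drops out. Below is a minimal sketch of that idea, not something I have running. The function names (not_online, check_volume) and the log path /var/log/softraid-watch.log are made up for illustration, and the parsing assumes the bioctl output layout shown above, where the volume line starts with "softraid" and the chunk lines start with the chunk number.

```shell
#!/bin/sh
# Hypothetical helper: read `bioctl <volume>` output on stdin and print
# every volume/chunk line whose status is anything other than "Online".
not_online() {
    awk '
        NR == 1 { next }    # skip the "Volume Status Size Device" header
        NF == 0 { next }    # skip blank lines
        {
            # Volume line: "softraid0 0 Online ..." (status is field 3).
            # Chunk line:  "0 Online ..."           (status is field 2).
            status = ($1 ~ /^softraid/) ? $3 : $2
            if (status != "Online") print
        }
    '
}

# Hypothetical wrapper: if anything is not Online, append the time, the
# offending lines, and the tail of dmesg to a log file (path is an assumption).
check_volume() {
    vol=$1
    bad=$(bioctl "$vol" | not_online)
    [ -z "$bad" ] && return 0
    {
        date
        printf '%s\n' "$bad"
        dmesg | tail -n 50
    } >> /var/log/softraid-watch.log
    return 1
}
```

The intent would be to put these in a small script that ends with check_volume sd2 and run it from root's crontab every few minutes, so that the next time the mirror goes offline there is at least a narrow time window and a dmesg snapshot to look at.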