Dear misc@,

Does anyone have any tools or patches to verify the consistency of 
softraid(4) RAID1 volumes?


If one adds a new disc (i.e. chunk) to a volume with the RAID1 discipline, the 
resilvering process of softraid(4) will read data from one of the existing 
discs and write it back to all the discs, destroying any divergent copies that 
could otherwise have been used to reconstruct flipped bits correctly.

This resilvering process is also really slow.  Per my notes from a few years 
ago, softraid resilvers in fixed 64KB (MAXPHYS) blocks.  Spindle-based HDDs 
only manage about 80 random IOPS at 7.2k RPM, half of which go to reads and 
half to writes, so it'll take (1TB/64KB/(80/s/2)) = ~4.5 days to resilver 
each 1TB of an average 7.2k RPM HDD.  Compare this with sequential 
resilvering, which would take (1TB/120MB/s) = ~2.3 hours.  Reality may vary 
from these imprecise calculations, but these numbers do seem representative 
of the experience.

The write-to-all-discs behaviour is implemented in sr_raid1_rw():

http://bxr.su/o/sys/dev/softraid_raid1.c#sr_raid1_rw

    } else {
        /* writes go on all working disks */
        chunk = i;
        scp = sd->sd_vol.sv_chunks[chunk];
        switch (scp->src_meta.scm_status) {
        case BIOC_SDONLINE:
        case BIOC_SDSCRUB:
        case BIOC_SDREBUILD:
            break;

        case BIOC_SDHOTSPARE: /* should never happen */
        case BIOC_SDOFFLINE:
            continue;

        default:
            goto bad;
        }
    }


What we could do is something like the following: pretend that any online 
chunk is unavailable for writes when the wu (work unit) we're handling is part 
of the rebuild process from http://bxr.su/o/sys/dev/softraid.c#sr_rebuild, 
i.e. mimic the BIOC_SDOFFLINE behaviour for BIOC_SDONLINE chunks (discs) 
whenever the SR_WUF_REBUILD flag is set on the work unit:

                        switch (scp->src_meta.scm_status) {
                        case BIOC_SDONLINE:
+                               if (wu->swu_flags & SR_WUF_REBUILD)
+                                       continue;       /* must be same as BIOC_SDOFFLINE case */
+                               /* FALLTHROUGH */
                        case BIOC_SDSCRUB:
                        case BIOC_SDREBUILD:


Obviously, there are both pros and cons to such an approach.  I've tested a 
variation of the above in production (I'm not a fan of weeks-long 
random-read/write rebuilds), but use it at your own risk.

...

But back to the original problem: this consistency check would have to be 
filesystem-specific, because we need to know which blocks of the softraid 
volume have and have not been used by the filesystem, and softraid itself is 
filesystem-agnostic.  I'd imagine it would be somewhat similar in concept to 
the fstrim(8) utility on GNU/Linux -- 
http://man7.org/linux/man-pages/man8/fstrim.8.html -- and would also open the 
door to cron-based TRIM support (it would additionally need to understand the 
softraid format itself).  Any pointers or hints on where to get started, or 
has anyone worked on this in the past?


Cheers,
Constantine.                                            http://cm.su/
