Marco Peereboom wrote:
> First let me defend softraid.  The rebuild code is designed to offer
> maximum data protection.  With this in mind certain assumptions were
> made.

Sorry... I hadn't stated that I, too, think data protection is king. Any performance increase that could compromise disk consistency is bad in my view. So if my patch is not good enough for you to include, it (probably) isn't good enough for me to trust with my data.

My instinct, when editing the code, was that you had created the SR_WUF_REBUILD tag to enable this kind of optimisation later... oh well, it seems that I _cannot_ read between the lines.

---

With luck (assuming that it was I, not my mail client, who messed up the tabs
last time), this is my original 'nasty' patch:

Index: softraid_raid1.c
===================================================================
RCS file: /cvs/src/sys/dev/softraid_raid1.c,v
retrieving revision 1.23
diff -u softraid_raid1.c
--- softraid_raid1.c    26 Mar 2010 11:20:34 -0000      1.23
+++ softraid_raid1.c    15 Apr 2010 17:34:48 -0000
@@ -447,13 +447,25 @@
                        }
                } else {
                        /* writes go on all working disks */
+                       /* XXX - copy and paste code warning */
                        x = i;
                        scp = sd->sd_vol.sv_chunks[x];
                        switch (scp->src_meta.scm_status) {
-                       case BIOC_SDONLINE:
                        case BIOC_SDSCRUB:
                        case BIOC_SDREBUILD:
                                b->b_flags |= B_WRITE;
+                               break;
+
+                       case BIOC_SDONLINE:
+                               /* when rebuilding, take care to only send the
+                                * rebuild writes to disks that need them
+                                */
+                               if (wu->swu_flags & SR_WUF_REBUILD) {
+                                       wu->swu_io_count--;
+                                       sr_ccb_put(ccb);
+                                       continue;
+                               } else
+                                       b->b_flags |= B_WRITE;
                                break;

                        case BIOC_SDHOTSPARE: /* should never happen */



And at the end of this email is a slightly better version (i.e. less copy and pasting). That patch still feels a bit nasty, because it assumes that b->b_flags will not already have B_WRITE set (which is true) without stating that assumption anywhere. But those are 'just' details... your concerns were not (yet) about the exact shape of the code, after all.
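If I wanted to state that assumption explicitly, something like this (an
untested one-liner; KASSERT only fires in DIAGNOSTIC kernels) could sit just
before the switch:

			/* document the assumption that no write flag
			 * is carried in from earlier processing */
			KASSERT((b->b_flags & B_WRITE) == 0);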

---

Testing:

When I said that I had tested the original patch, I did slightly more than
you may have guessed.  When I said 'no data corruption', what I meant was
that I diff'd the chunks after a rebuild to check that I hadn't introduced
any. Obviously the chunks _did_ differ, but only in the metadata (see the caveat below).
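
For the curious, this is roughly the kind of check I mean, as a standalone
sketch (the DATA_OFFSET value is a placeholder for wherever softraid's
per-chunk metadata ends, not a value taken from the real headers):

/*
 * Hypothetical sketch: compare the data areas of two chunk files,
 * skipping the leading region where softraid keeps its metadata.
 */
#include <stdio.h>
#include <string.h>

#define DATA_OFFSET	(528 * 512)	/* placeholder metadata size */
#define BUFSZ		(64 * 1024)

int
main(int argc, char *argv[])
{
	static char ba[BUFSZ], bb[BUFSZ];
	long long off = DATA_OFFSET;
	size_t ra, rb;
	FILE *a, *b;

	if (argc != 3) {
		fprintf(stderr, "usage: %s chunk0 chunk1\n", argv[0]);
		return 2;
	}
	if ((a = fopen(argv[1], "rb")) == NULL ||
	    (b = fopen(argv[2], "rb")) == NULL) {
		perror("fopen");
		return 2;
	}
	fseeko(a, DATA_OFFSET, SEEK_SET);
	fseeko(b, DATA_OFFSET, SEEK_SET);
	for (;;) {
		ra = fread(ba, 1, BUFSZ, a);
		rb = fread(bb, 1, BUFSZ, b);
		if (ra != rb || memcmp(ba, bb, ra) != 0) {
			fprintf(stderr, "chunks differ near offset %lld\n",
			    off);
			return 1;
		}
		if (ra == 0)
			break;
		off += ra;
	}
	printf("data areas identical\n");
	return 0;
}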

While rebuilding, I did write data to the array, so I am reasonably sure that writes issued during a rebuild get passed to all the members of the raid1.

What I didn't test, and may struggle to test, is what happens on a hardware failure during the rebuild (or during normal operation). The only failure mode I tested was a disk missing at boot. It _may_ be that I cannot hotplug my virtual hard disks; I'll find out later.

I also tested that I could fail any of the chunks without ill effect. This probably also checked that I hadn't messed up the roaming detection.

There is one way in which the chunks can still legitimately differ: blocks
that have never been written may not be consistent between chunks.  The
original rebuild code would, as a side effect, take one good source chunk's version of each such block and write it to all disks. My patched code doesn't. (For example, if such a block holds X on chunk A and Y on chunk B, the old code propagates one of the two everywhere, while the new code leaves A and B untouched and only populates the rebuilt chunk.) So after a rebuild the original code always produces a perfectly consistent set of chunks, whereas the patched code only does so if the good chunks were already consistent.

---

Thought experiment time:

I think that the patch doesn't cause any problems. I have included my thoughts below, but I am a novice at this kernel hackery, so please tell me where I have made an error (if I have).

I haven't edited the reading half of the rebuild process. So I cannot have introduced any bugs there (except, perhaps, exposing race conditions).

Assuming that the chunks marked BIOC_SDONLINE are consistent (i.e. they all
contain the same data), a rebuild write (tagged SR_WUF_REBUILD) carries the
same data as is already on those disks.  If the source chunks are not
consistent for some block then, as noted in the caveat above, that block has never been written, so its contents don't matter. Either way we _can_ safely ignore the request to write the data to an online chunk.

For normal writes (not tagged SR_WUF_REBUILD) the data needs to be written to all the working disks, as before (here the data being written will probably differ from what is on disk!).

For chunks marked BIOC_SDREBUILD or BIOC_SDSCRUB the data needs to be
written, as their contents will (probably) differ - just as the original code did.

For chunks marked BIOC_SDOFFLINE (and BIOC_SDHOTSPARE) the write shouldn't be done, so the request is ignored - again, as the original code did.

The code that I use for either enacting or ignoring writes to a given disk is the same code that was used originally - so I don't think that I could have caused problems with colliding IO (or race conditions).
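
To summarise that case analysis as a table, here is a minimal standalone
sketch of the write decision the patch implements (my illustration only;
the enum and function names are made up, not softraid's):

#include <stdbool.h>

enum chunk_state { ONLINE, SCRUB, REBUILD, HOTSPARE, OFFLINE };

static bool
chunk_needs_write(enum chunk_state state, bool is_rebuild_write)
{
	switch (state) {
	case ONLINE:
		/* online chunks already hold what a rebuild write
		 * carries, so only normal writes go to them */
		return !is_rebuild_write;
	case SCRUB:
	case REBUILD:
		/* these chunks may hold stale data: always write */
		return true;
	case HOTSPARE:	/* should never happen */
	case OFFLINE:
	default:
		return false;
	}
}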

---

Slightly improved patch...

Index: softraid_raid1.c
===================================================================
RCS file: /cvs/src/sys/dev/softraid_raid1.c,v
retrieving revision 1.23
diff -u softraid_raid1.c
--- softraid_raid1.c    26 Mar 2010 11:20:34 -0000      1.23
+++ softraid_raid1.c    15 Apr 2010 17:35:02 -0000
@@ -446,11 +446,18 @@
                                goto bad;
                        }
                } else {
-                       /* writes go on all working disks */
+                       /* 'normal' writes go on all working disks,
+                        * 'rebuild' writes go on all disks that need the data
+                        */
                        x = i;
                        scp = sd->sd_vol.sv_chunks[x];
+
                        switch (scp->src_meta.scm_status) {
                        case BIOC_SDONLINE:
+                               if (!(wu->swu_flags & SR_WUF_REBUILD))
+                                       b->b_flags |= B_WRITE;
+                               break;
+
                        case BIOC_SDSCRUB:
                        case BIOC_SDREBUILD:
                                b->b_flags |= B_WRITE;
@@ -458,14 +465,17 @@

                        case BIOC_SDHOTSPARE: /* should never happen */
                        case BIOC_SDOFFLINE:
-                               wu->swu_io_count--;
-                               sr_ccb_put(ccb);
-                               continue;
+                               break;

                        default:
                                goto bad;
                        }

+                       if (!(b->b_flags & B_WRITE)) {
+                               wu->swu_io_count--;
+                               sr_ccb_put(ccb);
+                               continue;
+                       }
                }
                ccb->ccb_target = x;
                b->b_dev = sd->sd_vol.sv_chunks[x]->src_dev_mm;
