On Thu, Feb 02, 2017 at 06:02:57PM +0000, Muneendra Kumar M wrote: > Hi Ben, > Thanks for the review. > So can I push my changes as mentioned by you in the below mail using git.
Sure. -Ben > > Regards, > Muneendra. > > > -----Original Message----- > From: Benjamin Marzinski [mailto:bmarz...@redhat.com] > Sent: Thursday, February 02, 2017 11:09 PM > To: Muneendra Kumar M <mmand...@brocade.com> > Cc: device-mapper development <dm-devel@redhat.com> > Subject: Re: [dm-devel] deterministic io throughput in multipath > > This looks fine. Thanks for all your work on this > > ACK > > -Ben > > On Thu, Feb 02, 2017 at 11:48:39AM +0000, Muneendra Kumar M wrote: > > Hi Ben, > > The below changes suggested by you are good. Thanks for it. > > I have taken your changes and made few changes to make the functionality > > working. > > I have tested the same on the setup which works fine. > > > > We need to increment the path_failures every time checker fails. > > if a device is down for a while, when it comes back up, it will get delayed > > only if the path failures exceeds the error threshold. > > Whether checker fails or kernel identifies the failures we need to capture > > those as it tells the state of the path and target. > > The below code has already taken care of this. > > > > Could you please review the attached patch and provide us your valuable > > comments . > > > > Below are the files that has been changed . > > > > libmultipath/config.c | 6 ++++++ > > libmultipath/config.h | 9 +++++++++ > > libmultipath/configure.c | 3 +++ > > libmultipath/defaults.h | 3 ++- > > libmultipath/dict.c | 86 > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------- > > libmultipath/dict.h | 3 +-- > > libmultipath/propsel.c | 48 > > ++++++++++++++++++++++++++++++++++++++++++++++-- > > libmultipath/propsel.h | 3 +++ > > libmultipath/structs.h | 14 ++++++++++---- > > multipath/multipath.conf.5 | 57 > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > multipathd/main.c | 83 > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > 11 files changed, 281 insertions(+), 34 deletions(-) > > > > Regards, > > Muneendra. > > > > > > > > > > -----Original Message----- > > From: Benjamin Marzinski [mailto:bmarz...@redhat.com] > > Sent: Thursday, February 02, 2017 7:20 AM > > To: Muneendra Kumar M <mmand...@brocade.com> > > Cc: device-mapper development <dm-devel@redhat.com> > > Subject: RE: [dm-devel] deterministic io throughput in multipath > > > > This is certainly moving in the right direction. There are a couple of > > things I would change. check_path_reinstate_state() will automatically > > disable the path if there are configuration problems. If things aren't > > configured correctly, or the code can't get the current time, it seems like > > it should allow the path to get reinstated, to avoid keeping a perfectly > > good path down indefinitely. Also, if you look at the delay_*_checks code, > > it automatically reinstates a problematic path if there are no other paths > > to use. This seems like a good idea as well. > > > > Also, your code increments path_failures every time the checker fails. > > This means that if a device is down for a while, when it comes back up, it > > will get delayed. I'm not sure if this is intentional, or if you were > > trying to track the number of times the path was restored and then failed > > again, instead of the total time a path was failed for. > > > > Perhaps it would be easier to show the kind of changes I would make with a > > patch. What do you think about this? I haven't done much testing on it at > > all, but these are the changes I would make. > > > > Signed-off-by: Benjamin Marzinski <bmarz...@redhat.com> > > --- > > libmultipath/config.c | 3 + > > libmultipath/dict.c | 2 +- > > multipathd/main.c | 149 > > +++++++++++++++++++++++--------------------------- > > 3 files changed, 72 insertions(+), 82 deletions(-) > > > > diff --git a/libmultipath/config.c b/libmultipath/config.c index > > be384af..5837dc6 100644 > > --- a/libmultipath/config.c > > +++ b/libmultipath/config.c > > @@ -624,6 +624,9 @@ load_config (char * file) > > conf->disable_changed_wwids = DEFAULT_DISABLE_CHANGED_WWIDS; > > conf->remove_retries = 0; > > conf->max_sectors_kb = DEFAULT_MAX_SECTORS_KB; > > + conf->san_path_err_threshold = DEFAULT_ERR_CHECKS; > > + conf->san_path_err_forget_rate = DEFAULT_ERR_CHECKS; > > + conf->san_path_err_recovery_time = DEFAULT_ERR_CHECKS; > > > > /* > > * preload default hwtable > > diff --git a/libmultipath/dict.c b/libmultipath/dict.c index > > 4754572..ae94c88 100644 > > --- a/libmultipath/dict.c > > +++ b/libmultipath/dict.c > > @@ -1050,7 +1050,7 @@ print_off_int_undef(char * buff, int len, void *ptr) > > case NU_UNDEF: > > return 0; > > case NU_NO: > > - return snprintf(buff, len, "\"off\""); > > + return snprintf(buff, len, "\"no\""); > > default: > > return snprintf(buff, len, "%i", *int_ptr); > > } > > diff --git a/multipathd/main.c b/multipathd/main.c index d6d68a4..305e236 > > 100644 > > --- a/multipathd/main.c > > +++ b/multipathd/main.c > > @@ -1488,69 +1488,70 @@ void repair_path(struct path * pp) } > > > > static int check_path_reinstate_state(struct path * pp) { > > - struct timespec start_time; > > - int disable_reinstate = 1; > > - > > - if (!((pp->mpp->san_path_err_threshold > 0) && > > - (pp->mpp->san_path_err_forget_rate > 0) && > > - (pp->mpp->san_path_err_recovery_time >0))) { > > - return disable_reinstate; > > - } > > - > > - if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0) { > > - return disable_reinstate; > > + struct timespec curr_time; > > + > > + if (pp->disable_reinstate) { > > + /* If we don't know how much time has passed, automatically > > + * reinstate the path, just to be safe. Also, if there are > > + * no other usable paths, reinstate the path */ > > + if (clock_gettime(CLOCK_MONOTONIC, &curr_time) != 0 || > > + pp->mpp->nr_active == 0) { > > + condlog(2, "%s : reinstating path early", pp->dev); > > + goto reinstate_path; > > + } > > + if ((curr_time.tv_sec - pp->dis_reinstate_time ) > > > pp->mpp->san_path_err_recovery_time) { > > + condlog(2,"%s : reinstate the path after err recovery > > time", pp->dev); > > + goto reinstate_path; > > + } > > + return 1; > > } > > > > - if ((start_time.tv_sec - pp->dis_reinstate_time ) > > > pp->mpp->san_path_err_recovery_time) { > > - disable_reinstate =0; > > - pp->path_failures = 0; > > - pp->disable_reinstate = 0; > > - pp->san_path_err_forget_rate = > > pp->mpp->san_path_err_forget_rate; > > - condlog(3,"\npath %s :reinstate the path after err recovery > > time\n",pp->dev); > > + /* forget errors on a working path */ > > + if ((pp->state == PATH_UP || pp->state == PATH_GHOST) && > > + pp->path_failures > 0) { > > + if (pp->san_path_err_forget_rate > 0) > > + pp->san_path_err_forget_rate--; > > + else { > > + /* for every san_path_err_forget_rate number of > > + * successful path checks decrement path_failures by 1 > > + */ > > + pp->path_failures--; > > + pp->san_path_err_forget_rate = > > pp->mpp->san_path_err_forget_rate; > > + } > > + return 0; > > } > > - return disable_reinstate; > > -} > > > > -static int check_path_validity_err (struct path * pp) { > > - struct timespec start_time; > > - int disable_reinstate = 0; > > + /* If the path isn't recovering from a failed state, do nothing */ > > + if (pp->state != PATH_DOWN && pp->state != PATH_SHAKY && > > + pp->state != PATH_TIMEOUT) > > + return 0; > > > > - if (!((pp->mpp->san_path_err_threshold > 0) && > > - (pp->mpp->san_path_err_forget_rate > 0) && > > - (pp->mpp->san_path_err_recovery_time >0))) { > > - return disable_reinstate; > > - } > > + if (pp->path_failures == 0) > > + pp->san_path_err_forget_rate = > > pp->mpp->san_path_err_forget_rate; > > + pp->path_failures++; > > > > - if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0) { > > - return disable_reinstate; > > - } > > - if (!pp->disable_reinstate) { > > - if (pp->path_failures) { > > - /*if the error threshold has hit hit within the > > san_path_err_forget_rate > > - *cycles donot reinstante the path till the > > san_path_err_recovery_time > > - *place the path in failed state till > > san_path_err_recovery_time so that the > > - *cutomer can rectify the issue within this time .Once > > the completion of > > - *san_path_err_recovery_time it should automatically > > reinstantate the path > > - */ > > - if ((pp->path_failures > > > pp->mpp->san_path_err_threshold) && > > - (pp->san_path_err_forget_rate > 0)) { > > - printf("\n%s:%d: %s hit error threshold > > \n",__func__,__LINE__,pp->dev); > > - pp->dis_reinstate_time = start_time.tv_sec ; > > - pp->disable_reinstate = 1; > > - disable_reinstate = 1; > > - } else if ((pp->san_path_err_forget_rate > 0)) { > > - pp->san_path_err_forget_rate--; > > - } else { > > - /*for every san_path_err_forget_rate number > > - *of successful path checks decrement > > path_failures by 1 > > - */ > > - pp->path_failures --; > > - pp->san_path_err_forget_rate = > > pp->mpp->san_path_err_forget_rate; > > - } > > - } > > + /* if we don't know the currently time, we don't know how long to > > + * delay the path, so there's no point in checking if we should */ > > + if (clock_gettime(CLOCK_MONOTONIC, &curr_time) != 0) > > + return 0; > > + /* when path failures has exceeded the san_path_err_threshold > > + * place the path in delayed state till san_path_err_recovery_time > > + * so that the cutomer can rectify the issue within this time. After > > + * the completion of san_path_err_recovery_time it should > > + * automatically reinstate the path */ > > + if (pp->path_failures > pp->mpp->san_path_err_threshold) { > > + condlog(2, "%s : hit error threshold. Delaying path > > reinstatement", pp->dev); > > + pp->dis_reinstate_time = curr_time.tv_sec; > > + pp->disable_reinstate = 1; > > + return 1; > > } > > - return disable_reinstate; > > + return 0; > > +reinstate_path: > > + pp->path_failures = 0; > > + pp->disable_reinstate = 0; > > + return 0; > > } > > + > > /* > > * Returns '1' if the path has been checked, '-1' if it was blacklisted > > * and '0' otherwise > > @@ -1566,7 +1567,7 @@ check_path (struct vectors * vecs, struct path * pp, > > int ticks) > > int oldchkrstate = pp->chkrstate; > > int retrigger_tries, checkint; > > struct config *conf; > > - int ret; > > + int ret; > > > > if ((pp->initialized == INIT_OK || > > pp->initialized == INIT_REQUESTED_UDEV) && !pp->mpp) @@ -1664,16 > > +1665,15 @@ check_path (struct vectors * vecs, struct path * pp, int ticks) > > if (!pp->mpp) > > return 0; > > > > + /* We only need to check if the path should be delayed when the > > + * the path is actually usable and san_path_err is configured */ > > if ((newstate == PATH_UP || newstate == PATH_GHOST) && > > - pp->disable_reinstate) { > > - /* > > - * check if the path is in failed state for more than > > san_path_err_recovery_time > > - * if not place the path in delayed state > > - */ > > - if (check_path_reinstate_state(pp)) { > > - pp->state = PATH_DELAYED; > > - return 1; > > - } > > + pp->mpp->san_path_err_threshold > 0 && > > + pp->mpp->san_path_err_forget_rate > 0 && > > + pp->mpp->san_path_err_recovery_time > 0 && > > + check_path_reinstate_state(pp)) { > > + pp->state = PATH_DELAYED; > > + return 1; > > } > > > > if ((newstate == PATH_UP || newstate == PATH_GHOST) && @@ -1685,31 > > +1685,18 @@ check_path (struct vectors * vecs, struct path * pp, int ticks) > > } else > > pp->wait_checks = 0; > > } > > - if ((newstate == PATH_DOWN || newstate == PATH_GHOST || > > - pp->state == PATH_DOWN)) { > > - /*assigned the path_err_forget_rate when we see the first > > failure on the path*/ > > - if(pp->path_failures == 0){ > > - pp->san_path_err_forget_rate = > > pp->mpp->san_path_err_forget_rate; > > - } > > - pp->path_failures++; > > - } > > + > > /* > > * don't reinstate failed path, if its in stand-by > > * and if target supports only implicit tpgs mode. > > * this will prevent unnecessary i/o by dm on stand-by > > * paths if there are no other active paths in map. > > - * > > - * when path failures has exceeded the san_path_err_threshold > > - * within san_path_err_forget_rate then we don't reinstate > > - * failed path for san_path_err_recovery_time > > */ > > - disable_reinstate = ((newstate == PATH_GHOST && > > + disable_reinstate = (newstate == PATH_GHOST && > > pp->mpp->nr_active == 0 && > > - pp->tpgs == TPGS_IMPLICIT) ? 1 : > > - check_path_validity_err(pp)); > > + pp->tpgs == TPGS_IMPLICIT) ? 1 : 0; > > > > pp->chkrstate = newstate; > > - > > if (newstate != pp->state) { > > int oldstate = pp->state; > > pp->state = newstate; > > -- > > 1.8.3.1 > > > -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel