Re: [dm-devel] [PATCH V2 0/6] allowing path checking to be interrupted.

2022-10-25 Thread Wu Guanghao


On 2022/10/24 15:59, Martin Wilck wrote:
> Hello Wu Guanghao,
> 
> On Mon, 2022-10-24 at 10:22 +0800, Wu Guanghao wrote:
>> Friendly ping ...
>> Is the series of patches for vecs->lock ready to be merged into
>> mainline?
>> Thanks
> 
> Ben's series has been merged in 0.9.1 already. There have been no
> reviewed patches on top of it, unless I am mistaken.

Yes, it has been merged, sorry.

> Btw, I never received an answer from you about my suggestion from 
> Sep. 20:
> 
>>> The version we are currently testing is 0.8.4, so we only merged the
>>> first 3 patches in this series. Then, after the actual test, we found
>>> that the improvement was not very obvious.
>>>
>>
>> Which path checker are you using? 
>> If it's TUR, could you try to simply change the sync wait time?
>>
>> static void tur_timeout(struct timespec *tsp)
>> {
>>  get_monotonic_time(tsp);
>>  tsp->tv_nsec += 1000 * 1000; /* 1 millisecond */
>>  normalize_timespec(tsp);
>> }
>>

We are ready to start the test and will give you feedback on the
test results later.

>> Change the 1 millisecond above to e.g. one microsecond. That should
>> speed up the checker significantly. You will get a higher rate of
>> "pending" path states, but that shouldn't matter much.
>>
> 
> Regards,
> Martin
> 
> 



Re: [dm-devel] [PATCH V2 0/6] allowing path checking to be interrupted.

2022-10-24 Thread Martin Wilck
Hello Wu Guanghao,

On Mon, 2022-10-24 at 10:22 +0800, Wu Guanghao wrote:
> Friendly ping ...
> Is the series of patches for vecs->lock ready to be merged into
> mainline?
> Thanks

Ben's series has been merged in 0.9.1 already. There have been no
reviewed patches on top of it, unless I am mistaken.

Btw, I never received an answer from you about my suggestion from 
Sep. 20:

> > The version we are currently testing is 0.8.4, so we only merged the
> > first 3 patches in this series. Then, after the actual test, we found
> > that the improvement was not very obvious.
> > 
> 
> Which path checker are you using? 
> If it's TUR, could you try to simply change the sync wait time?
> 
> static void tur_timeout(struct timespec *tsp)
> {
>   get_monotonic_time(tsp);
>   tsp->tv_nsec += 1000 * 1000; /* 1 millisecond */
>   normalize_timespec(tsp);
> }
> 
> Change the 1 millisecond above to e.g. one microsecond. That should
> speed up the checker significantly. You will get a higher rate of
> "pending" path states, but that shouldn't matter much.
> 

Regards,
Martin




Re: [dm-devel] [PATCH V2 0/6] allowing path checking to be interrupted.

2022-10-23 Thread Wu Guanghao
Friendly ping ...
Is the series of patches for vecs->lock ready to be merged into mainline?
Thanks

On 2022/10/17 19:31, Wu Guanghao wrote:
> 
> 
> On 2022/9/20 17:20, Wu Guanghao wrote:
>>
>>
>> On 2022/9/16 2:17, Benjamin Marzinski wrote:
>>> On Thu, Sep 15, 2022 at 02:56:36PM +0800, Wu Guanghao wrote:
>>>> Sorry for the late feedback.
>>>>
>>>> The version we are currently testing is 0.8.4, so we only merged the
>>>> first 3 patches in this series. Then, after the actual test, we found
>>>> that the improvement was not very obvious.
>>>>
>>>> Test environment: 1000 multipath devices, 16 paths per device
>>>> Test command: aggregate multipath devices using "multipathd add path"
>>>> Time consumed (time for 16 paths to aggregate into 1 multipath device):
>>>> 1. Original:
>>>>     < 4s:   76.9%;
>>>>     4s~10s: 18.4%;
>>>>     > 10s:   4.7%;
>>>> 2. After applying the patches:
>>>>     < 4s:   79.1%;
>>>>     4s~10s: 18.2%;
>>>>     > 10s:   2.7%;
>>>>
>>>> From the results, the time consumed improved only marginally with the
>>>> patch, so we made a few changes to it and it then worked as expected.
>>>> The modifications to the patch are as follows:
>>>>
>>>> paths_checked is changed to be configurable; current setting n = 2.
>>>> Removed the judgment on diff_time.
>>>> Sleep changed from 10us to 5ms.
>>>
>>> I worry that this is giving too much deference to the uevents. If there
>>> is a uevent storm, checking will stop for 5ms every 2 paths checked. I'm
>>> worried that this will make it take significantly longer for the
>>> path checker to complete a cycle.  Making paths_checked configurable, so
>>> that this doesn't trigger too often does help to avoid the issue that
>>> checking the time from chk_start_time was dealing with, and makes the
>>> mechanics of this simpler.  But 5ms seems like a long time to have to
>>> wait, just to make sure that another process had the time to grab the
>>> vecs lock.  Perhaps it would be better to sleep for a shorter length of
>>> time, but in a loop, where we can check to see if progress has been
>>> made, perhaps by checking some counter of events and user commands
>>> serviced. This way, we aren't sleeping too long for no good reason.
>> I agree with this, we are also going to adjust the sleep time, and then
>> continue the test.
> 
> We have tested delay times of 10us, 100us and 1ms, but the results
> weren't very good: more than 30% of the LUN aggregation times were
> still greater than 4s. Could the delay time also be made a configurable
> item, set according to the situation?
> 
> The following is the patch after our modifications to checker_loop().
> 
> Signed-off-by: wuguanghao 
> ---
>  libmultipath/config.c   |  3 ++
>  libmultipath/config.h   |  3 ++
>  libmultipath/defaults.h |  3 ++
>  libmultipath/dict.c | 58 +++
>  libmultipath/structs.h  |  1 +
>  multipathd/main.c   | 87 +
>  6 files changed, 138 insertions(+), 17 deletions(-)
> 
> diff --git a/libmultipath/config.c b/libmultipath/config.c
> index 7d0d711..7658626 100644
> --- a/libmultipath/config.c
> +++ b/libmultipath/config.c
> @@ -752,6 +752,9 @@ load_config (char * file)
>   conf->remove_retries = 0;
>   conf->ghost_delay = DEFAULT_GHOST_DELAY;
>   conf->all_tg_pt = DEFAULT_ALL_TG_PT;
> + conf->check_delay_time = DEFAULT_CHECK_DELAY_TIME;
> + conf->check_path_num_per_splice = DEFAULT_CHECK_PATH_NUM_PER_SPLICE;
> +
>   /*
>* preload default hwtable
>*/
> diff --git a/libmultipath/config.h b/libmultipath/config.h
> index ceecff2..a26dd99 100644
> --- a/libmultipath/config.h
> +++ b/libmultipath/config.h
> @@ -229,6 +229,9 @@ struct config {
>   vector elist_property;
>   vector elist_protocol;
>   char *enable_foreign;
> +
> + int check_delay_time;
> + int check_path_num_per_splice;
>  };
> 
>  extern struct udev * udev;
> diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h
> index 52fe05b..b3fb1bc 100644
> --- a/libmultipath/defaults.h
> +++ b/libmultipath/defaults.h
> @@ -50,6 +50,9 @@
>  #define DEFAULT_FIND_MULTIPATHS_TIMEOUT -10
>  #define DEFAULT_UNKNOWN_FIND_MULTIPATHS_TIMEOUT 1
>  #define DEFAULT_ALL_TG_PT ALL_TG_PT_OFF
> +#define DEFAULT_CHECK_DELAY_TIME 10
> +#define DEFAULT_CHECK_PATH_NUM_PER_SPLICE 1
> +
>  /* Enable all foreign libraries by default */
>  #define DEFAULT_ENABLE_FOREIGN ""
> 
> diff --git a/libmultipath/dict.c b/libmultipath/dict.c
> index 2c8ea10..2262caa 100644
> --- a/libmultipath/dict.c
> +++ b/libmultipath/dict.c
> @@ -1407,6 +1407,46 @@ def_uxsock_timeout_handler(struct config *conf, vector strvec)
>   free(buff);
>   return 0;
>  }
> +static int
> +def_check_delay_time_handler(struct config *conf, vector strvec)
> +{
> + int local_check_delay_time;
> + char *buff;
> +
> + buff = set_value(strvec);
> + if (!buff)
> + return 1;
> +
> + if (sscanf(buff, 

Re: [dm-devel] [PATCH V2 0/6] allowing path checking to be interrupted.

2022-10-17 Thread Wu Guanghao


On 2022/9/20 17:20, Wu Guanghao wrote:
> 
> 
> On 2022/9/16 2:17, Benjamin Marzinski wrote:
>> On Thu, Sep 15, 2022 at 02:56:36PM +0800, Wu Guanghao wrote:
>>> Sorry for the late feedback.
>>>
>>> The version we are currently testing is 0.8.4, so we only merged the
>>> first 3 patches in this series. Then, after the actual test, we found
>>> that the improvement was not very obvious.
>>>
>>> Test environment: 1000 multipath devices, 16 paths per device
>>> Test command: aggregate multipath devices using "multipathd add path"
>>> Time consumed (time for 16 paths to aggregate into 1 multipath device):
>>> 1. Original:
>>>     < 4s:   76.9%;
>>>     4s~10s: 18.4%;
>>>     > 10s:   4.7%;
>>> 2. After applying the patches:
>>>     < 4s:   79.1%;
>>>     4s~10s: 18.2%;
>>>     > 10s:   2.7%;
>>>
>>> From the results, the time consumed improved only marginally with the
>>> patch, so we made a few changes to it and it then worked as expected.
>>> The modifications to the patch are as follows:
>>>
>>> paths_checked is changed to be configurable; current setting n = 2.
>>> Removed the judgment on diff_time.
>>> Sleep changed from 10us to 5ms.
>>
>> I worry that this is giving too much deference to the uevents. If there
>> is a uevent storm, checking will stop for 5ms every 2 paths checked. I'm
>> worried that this will make it take significantly longer for the
>> path checker to complete a cycle.  Making paths_checked configurable, so
>> that this doesn't trigger too often does help to avoid the issue that
>> checking the time from chk_start_time was dealing with, and makes the
>> mechanics of this simpler.  But 5ms seems like a long time to have to
>> wait, just to make sure that another process had the time to grab the
>> vecs lock.  Perhaps it would be better to sleep for a shorter length of
>> time, but in a loop, where we can check to see if progress has been
>> made, perhaps by checking some counter of events and user commands
>> serviced. This way, we aren't sleeping too long for no good reason.
> I agree with this, we are also going to adjust the sleep time, and then
> continue the test.

We have tested delay times of 10us, 100us and 1ms, but the results
weren't very good: more than 30% of the LUN aggregation times were
still greater than 4s. Could the delay time also be made a configurable
item, set according to the situation?

The following is the patch after our modifications to checker_loop().

Signed-off-by: wuguanghao 
---
 libmultipath/config.c   |  3 ++
 libmultipath/config.h   |  3 ++
 libmultipath/defaults.h |  3 ++
 libmultipath/dict.c | 58 +++
 libmultipath/structs.h  |  1 +
 multipathd/main.c   | 87 +
 6 files changed, 138 insertions(+), 17 deletions(-)

diff --git a/libmultipath/config.c b/libmultipath/config.c
index 7d0d711..7658626 100644
--- a/libmultipath/config.c
+++ b/libmultipath/config.c
@@ -752,6 +752,9 @@ load_config (char * file)
conf->remove_retries = 0;
conf->ghost_delay = DEFAULT_GHOST_DELAY;
conf->all_tg_pt = DEFAULT_ALL_TG_PT;
+   conf->check_delay_time = DEFAULT_CHECK_DELAY_TIME;
+   conf->check_path_num_per_splice = DEFAULT_CHECK_PATH_NUM_PER_SPLICE;
+
/*
 * preload default hwtable
 */
diff --git a/libmultipath/config.h b/libmultipath/config.h
index ceecff2..a26dd99 100644
--- a/libmultipath/config.h
+++ b/libmultipath/config.h
@@ -229,6 +229,9 @@ struct config {
vector elist_property;
vector elist_protocol;
char *enable_foreign;
+
+   int check_delay_time;
+   int check_path_num_per_splice;
 };

 extern struct udev * udev;
diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h
index 52fe05b..b3fb1bc 100644
--- a/libmultipath/defaults.h
+++ b/libmultipath/defaults.h
@@ -50,6 +50,9 @@
 #define DEFAULT_FIND_MULTIPATHS_TIMEOUT -10
 #define DEFAULT_UNKNOWN_FIND_MULTIPATHS_TIMEOUT 1
 #define DEFAULT_ALL_TG_PT ALL_TG_PT_OFF
+#define DEFAULT_CHECK_DELAY_TIME 10
+#define DEFAULT_CHECK_PATH_NUM_PER_SPLICE 1
+
 /* Enable all foreign libraries by default */
 #define DEFAULT_ENABLE_FOREIGN ""

diff --git a/libmultipath/dict.c b/libmultipath/dict.c
index 2c8ea10..2262caa 100644
--- a/libmultipath/dict.c
+++ b/libmultipath/dict.c
@@ -1407,6 +1407,46 @@ def_uxsock_timeout_handler(struct config *conf, vector strvec)
free(buff);
return 0;
 }
+static int
+def_check_delay_time_handler(struct config *conf, vector strvec)
+{
+   int local_check_delay_time;
+   char *buff;
+
+   buff = set_value(strvec);
+   if (!buff)
+   return 1;
+
+   if (sscanf(buff, "%u", &local_check_delay_time) == 1 &&
+   local_check_delay_time > DEFAULT_CHECK_DELAY_TIME)
+   conf->check_delay_time = local_check_delay_time;
+   else
+   conf->check_delay_time = DEFAULT_CHECK_DELAY_TIME;
+
+   free(buff);
+   return 0;
+}
+
+static int
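The archived message is cut off here. By analogy with
def_check_delay_time_handler() above, the truncated companion handler
presumably continues along these lines; this is a reconstruction for
readability, not the original patch text:

def_check_path_num_per_splice_handler(struct config *conf, vector strvec)
{
	int local_check_path_num;
	char *buff;

	buff = set_value(strvec);
	if (!buff)
		return 1;

	/* reconstructed by analogy: accept values above the default,
	 * otherwise fall back to the default */
	if (sscanf(buff, "%d", &local_check_path_num) == 1 &&
	    local_check_path_num > DEFAULT_CHECK_PATH_NUM_PER_SPLICE)
		conf->check_path_num_per_splice = local_check_path_num;
	else
		conf->check_path_num_per_splice = DEFAULT_CHECK_PATH_NUM_PER_SPLICE;

	free(buff);
	return 0;
}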

Re: [dm-devel] [PATCH V2 0/6] allowing path checking to be interrupted.

2022-09-20 Thread Wu Guanghao


On 2022/9/16 2:17, Benjamin Marzinski wrote:
> On Thu, Sep 15, 2022 at 02:56:36PM +0800, Wu Guanghao wrote:
>> Sorry for the late feedback.
>>
>> The version we are currently testing is 0.8.4, so we only merged the
>> first 3 patches in this series. Then, after the actual test, we found
>> that the improvement was not very obvious.
>>
>> Test environment: 1000 multipath devices, 16 paths per device
>> Test command: aggregate multipath devices using "multipathd add path"
>> Time consumed (time for 16 paths to aggregate into 1 multipath device):
>> 1. Original:
>>     < 4s:   76.9%;
>>     4s~10s: 18.4%;
>>     > 10s:   4.7%;
>> 2. After applying the patches:
>>     < 4s:   79.1%;
>>     4s~10s: 18.2%;
>>     > 10s:   2.7%;
>>
>> From the results, the time consumed improved only marginally with the
>> patch, so we made a few changes to it and it then worked as expected.
>> The modifications to the patch are as follows:
>>
>> paths_checked is changed to be configurable; current setting n = 2.
>> Removed the judgment on diff_time.
>> Sleep changed from 10us to 5ms.
> 
> I worry that this is giving too much deference to the uevents. If there
> is a uevent storm, checking will stop for 5ms every 2 paths checked. I'm
> worried that this will make it take significantly longer for the
> path checker to complete a cycle.  Making paths_checked configurable, so
> that this doesn't trigger too often does help to avoid the issue that
> checking the time from chk_start_time was dealing with, and makes the
> mechanics of this simpler.  But 5ms seems like a long time to have to
> wait, just to make sure that another process had the time to grab the
> vecs lock.  Perhaps it would be better to sleep for a shorter length of
> time, but in a loop, where we can check to see if progress has been
> made, perhaps by checking some counter of events and user commands
> serviced. This way, we aren't sleeping too long for no good reason.

I agree with this, we are also going to adjust the sleep time, and then
continue the test.

>> if (++paths_checked % n == 0 &&
>>     lock_has_waiters(&vecs->lock)) {
>>         get_monotonic_time(&end_time);
>>         timespecsub(&end_time, &chk_start_time,
>>                     &diff_time);
>>         if (diff_time.tv_sec > 0) // Delete this check; goto directly
>>                 goto unlock;
>> }
>>
>> if (checker_state != CHECKER_FINISHED) {
>>         /* Yield to waiters */
>>         //struct timespec wait = { .tv_nsec = 1, };
>>         //nanosleep(&wait, NULL);
>>         usleep(5000);  // sleep changed from 10us to 5ms
>> }
>>
>> Time consumed (after making the above changes to the patch):
>> < 4s: 99.5%;
>> = 4s: 0.5%;
>> > 4s: 0%;
> 
> Since I believe that this is likely causing a real impact on the checker
> speed, I'm not sure that we're looking for results like this. Are you
> seeing that "path checkers took longer than ..." message? How long does
> it say the checker is taking (and what do you have max_polling_interval
> set to)?

Yes, I saw this message. In the current test environment, using the original
code, all path checks take at least 50s to complete. We currently set
max_polling_interval=20s, so it must print this message. Adding sleep just
makes this message print more often.
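
For reference, these knobs live in the defaults section of multipath.conf.
A minimal sketch matching the setup described in this thread (the
polling_interval value is an assumption; the two check_* options exist only
with the proposed patch applied):

defaults {
	# assumed value, not stated in this thread
	polling_interval 5
	# from the test setup described above
	max_polling_interval 20
	# options introduced by the proposed patch, not upstream multipath-tools
	check_delay_time 10
	check_path_num_per_splice 2
}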

> -Ben
> 



Re: [dm-devel] [PATCH V2 0/6] allowing path checking to be interrupted.

2022-09-20 Thread Martin Wilck
On Thu, 2022-09-15 at 14:56 +0800, Wu Guanghao wrote:
> Sorry for the late feedback.
> 
> The version we are currently testing is 0.8.4, so we only merged the
> first 3 patches in this series. Then, after the actual test, we found
> that the improvement was not very obvious.
> 

Which path checker are you using? 
If it's TUR, could you try to simply change the sync wait time?

static void tur_timeout(struct timespec *tsp)
{
	get_monotonic_time(tsp);
	tsp->tv_nsec += 1000 * 1000; /* 1 millisecond */
	normalize_timespec(tsp);
}

Change the 1 millisecond above to e.g. one microsecond. That should
speed up the checker significantly. You will get a higher rate of
"pending" path states, but that shouldn't matter much.

Regards
Martin




Re: [dm-devel] [PATCH V2 0/6] allowing path checking to be interrupted.

2022-09-15 Thread Benjamin Marzinski
On Thu, Sep 15, 2022 at 02:56:36PM +0800, Wu Guanghao wrote:
> Sorry for the late feedback.
> 
> The version we are currently testing is 0.8.4, so we only merged the
> first 3 patches in this series. Then, after the actual test, we found
> that the improvement was not very obvious.
>
> Test environment: 1000 multipath devices, 16 paths per device
> Test command: aggregate multipath devices using "multipathd add path"
> Time consumed (time for 16 paths to aggregate into 1 multipath device):
> 1. Original:
>     < 4s:   76.9%;
>     4s~10s: 18.4%;
>     > 10s:   4.7%;
> 2. After applying the patches:
>     < 4s:   79.1%;
>     4s~10s: 18.2%;
>     > 10s:   2.7%;
>
> From the results, the time consumed improved only marginally with the
> patch, so we made a few changes to it and it then worked as expected.
> The modifications to the patch are as follows:
>
> paths_checked is changed to be configurable; current setting n = 2.
> Removed the judgment on diff_time.
> Sleep changed from 10us to 5ms.

I worry that this is giving too much deference to the uevents. If there
is a uevent storm, checking will stop for 5ms every 2 paths checked. I'm
worried that this will make it take significantly longer for the
path checker to complete a cycle.  Making paths_checked configurable, so
that this doesn't trigger too often does help to avoid the issue that
checking the time from chk_start_time was dealing with, and makes the
mechanics of this simpler.  But 5ms seems like a long time to have to
wait, just to make sure that another process had the time to grab the
vecs lock.  Perhaps it would be better to sleep for a shorter length of
time, but in a loop, where we can check to see if progress has been
made, perhaps by checking some counter of events and user commands
serviced. This way, we aren't sleeping too long for no good reason.
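
A minimal sketch of that idea, assuming a hypothetical _Atomic unsigned int
counter served_events (stdatomic.h) that the uevent and uxlsnr handling code
increments each time it services a request:

/* Sketch only: yield in short slices and stop as soon as another
 * vecs->lock user has made progress, instead of one fixed 5 ms sleep.
 * served_events is a hypothetical counter, not existing multipathd code. */
unsigned int seen = atomic_load(&served_events);
for (int i = 0; i < 50; i++) {          /* cap the total wait at ~5 ms */
	struct timespec slice = { .tv_nsec = 100 * 1000 }; /* 100 us */
	nanosleep(&slice, NULL);
	if (atomic_load(&served_events) != seen)
		break;                  /* a waiter got through; resume checking */
}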

> if (++paths_checked % n == 0 &&
>     lock_has_waiters(&vecs->lock)) {
>         get_monotonic_time(&end_time);
>         timespecsub(&end_time, &chk_start_time,
>                     &diff_time);
>         if (diff_time.tv_sec > 0) // Delete this check; goto directly
>                 goto unlock;
> }
>
> if (checker_state != CHECKER_FINISHED) {
>         /* Yield to waiters */
>         //struct timespec wait = { .tv_nsec = 1, };
>         //nanosleep(&wait, NULL);
>         usleep(5000);  // sleep changed from 10us to 5ms
> }
> 
> Time consumed (after making the above changes to the patch):
> < 4s: 99.5%;
> = 4s: 0.5%;
> > 4s: 0%;

Since I believe that this is likely causing a real impact on the checker
speed, I'm not sure that we're looking for results like this. Are you
seeing that "path checkers took longer than ..." message? How long does
it say the checker is taking (and what do you have max_polling_interval
set to)?

-Ben

> On 2022/8/18 4:48, Benjamin Marzinski wrote:
> > When there are a huge number of paths (> 1), the amount of time that
> > the checkerloop can hold the vecs lock while checking the paths can
> > get to be large enough that it starves other vecs lock users.  If path
> > checking takes long enough, it's possible that uxlsnr threads will never
> > run. To deal with this, this patchset makes it possible to drop the vecs
> > lock while checking the paths, and then reacquire it and continue with
> > the next path to check.
> > 
> > My choices of only checking if there are waiters every 128 paths checked
> > and only interrupting if path checking has taken more than a second are
> > arbitrary. I didn't want to slow down path checking in the common case
> > where this isn't an issue, and I wanted to avoid path checking getting
> > starved by other vecs->lock users. Having the checkerloop wait for 1
> > nsec was based on my own testing with a setup using 4K multipath devices
> > with 4 paths each. This was almost always long enough for the uevent or
> > uxlsnr client to grab the vecs lock, but I'm not sure how dependent this
> > is on details of the system. For instance, with my setup it never took
> > more than 20 seconds to check the paths, and usually looping through
> > all the paths took well under 10 seconds, most often under 5. I would
> > only occasionally run into situations where a uxlsnr client would time
> > out.
> > 
> > V2 Changes:
> > 0003: Switched to a simpler method of determining the path to continue
> >   checking at, as suggested by Martin Wilck. Also fixed a bug when
> >   the previous index was larger than the current vector size.
> > 
> > Benjamin Marzinski (6):
> >   multipathd: Use regular pthread_mutex_t for waiter_lock
> >   multipathd: track waiters for mutex_lock
> >   multipathd: Occasionally allow waiters to interrupt checking paths
> >   multipathd: allow uxlsnr clients to interrupt checking paths
> >   multipathd: fix uxlsnr timeout
> >   multipathd: Don't check if timespec.tv_sec is zero
> > 
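The lock_has_waiters() test used in the snippets earlier in this thread
comes from patch 2, "multipathd: track waiters for mutex_lock". A rough
sketch of the waiter-counting idea, assuming a simple atomic counter
around pthread_mutex_lock() (pthread.h, stdatomic.h); an illustration,
not the actual implementation:

/* Sketch only: count threads blocked on the lock so the holder can
 * tell whether anyone is waiting and yield accordingly. */
struct waiter_lock {
	pthread_mutex_t mutex;
	_Atomic int waiters;    /* hypothetical waiter count */
};

static void waiter_lock_lock(struct waiter_lock *l)
{
	atomic_fetch_add(&l->waiters, 1);
	pthread_mutex_lock(&l->mutex);
	atomic_fetch_sub(&l->waiters, 1);
}

static int lock_has_waiters(struct waiter_lock *l)
{
	return atomic_load(&l->waiters) > 0;
}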
> >  libmultipath/lock.h|  16 +
> >  libmultipath/structs.h |   1 +
> >  multipathd/main.c  | 147