Re: [dm-devel] [RFC PATCH] dm service time: measure service time rather than approximate it

2016-04-08 Thread Mike Snitzer
On Fri, Apr 08 2016 at  3:03pm -0400,
Mike Snitzer  wrote:

> On Fri, Apr 08 2016 at  2:58pm -0400,
> Mike Snitzer  wrote:
> 
> > The DM multipath service-time path-selector has historically tracked the
> > amount of outstanding IO per path and used that to approximate the
> > service-time of each path.  In practice this has shown itself to work
> > fairly well but we can do better by measuring the actual service-time
> > during IO completion and using it as the basis for path selection.
> > 
> > Measuring the actual service-time is still prone to inaccuracies given
> > that service-times vary with IO size.  But to counter any potential for
> > drawing incorrect conclusions about the service-times of a given path,
> > the measured service-times are reset periodically.
> > 
> > This approach has provided a 10% increase in the selection of a path
> > that was forcibly made to be less loaded than the alternative path.
> > 
> > Reported-by: Todd Gill 
> > Signed-off-by: Mike Snitzer 
> 
> It should be noted that I have not looked at the implications on actual
> throughput or system load.  But I wanted to get this RFC out to see what
> others thought about making dm-service-time more intuitive in its
> implementation.

I have noticed fio's total and completion latency ('lat' and 'clat') go
up on this simple SAS testbed:

before:

 write: io=345920KB, bw=34379KB/s, iops=537, runt= 10062msec
slat (usec): min=10, max=47, avg=22.51, stdev= 3.71
clat (msec): min=1, max=146, avg=59.50, stdev=11.84
 lat (msec): min=1, max=146, avg=59.52, stdev=11.84

after:

  write: io=347456KB, bw=34545KB/s, iops=539, runt= 10058msec
slat (usec): min=6, max=46, avg=20.50, stdev= 3.68
clat (usec): min=385, max=146556, avg=59219.94, stdev=11580.00
 lat (usec): min=403, max=146573, avg=59240.87, stdev=11580.57

That obviously isn't what we want (and it might speak to why Junichi
decided to approximate the service-time)...



Re: [dm-devel] [RFC PATCH] dm service time: measure service time rather than approximate it

2016-04-08 Thread Mike Snitzer
On Fri, Apr 08 2016 at  2:58pm -0400,
Mike Snitzer  wrote:

> The DM multipath service-time path-selector has historically tracked the
> amount of outstanding IO per path and used that to approximate the
> service-time of each path.  In practice this has shown itself to work
> fairly well but we can do better by measuring the actual service-time
> during IO completion and using it as the basis for path selection.
> 
> Measuring the actual service-time is still prone to inaccuracies given
> that service-times vary with IO size.  But to counter any potential for
> drawing incorrect conclusions about the service-times of a given path,
> the measured service-times are reset periodically.
> 
> This approach has provided a 10% increase in the selection of a path
> that was forcibly made to be less loaded than the alternative path.
> 
> Reported-by: Todd Gill 
> Signed-off-by: Mike Snitzer 

It should be noted that I have not looked at the implications on actual
throughput or system load.  But I wanted to get this RFC out to see what
others thought about making dm-service-time more intuitive in its
implementation.

Junichi, it'd also be great if you could provide historical context for
why you elected to approximate the service-time based on outstanding IO
rather than measure the actual service-time of each path.

Thanks,
Mike



[dm-devel] [RFC PATCH] dm service time: measure service time rather than approximate it

2016-04-08 Thread Mike Snitzer
The DM multipath service-time path-selector has historically tracked the
amount of outstanding IO per path and used that to approximate the
service-time of each path.  In practice this has shown itself to work
fairly well but we can do better by measuring the actual service-time
during IO completion and using it as the basis for path selection.

Measuring the actual service-time is still prone to inaccuracies given
that service-times vary with IO size.  But to counter any potential for
drawing incorrect conclusions about the service-times of a given path,
the measured service-times are reset periodically.

This approach has provided a 10% increase in the selection of a path
that was forcibly made to be less loaded than the alternative path.

Reported-by: Todd Gill 
Signed-off-by: Mike Snitzer 
---
 drivers/md/dm-mpath.c |   6 ++-
 drivers/md/dm-path-selector.h |   5 ++-
 drivers/md/dm-queue-length.c  |   4 +-
 drivers/md/dm-service-time.c  | 101 +++---
 4 files changed, 35 insertions(+), 81 deletions(-)
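
As a rough sketch of the idea (not the actual implementation below):
start_io stashes a timestamp through the new io_start_time pointer and
end_io turns it into a measured service time.  The callback signatures
and the service_time_usecs/select_count fields match the diff; the
accumulation, the reset interval (ST_RESET_INTERVAL is a made-up name),
and the lack of locking are simplifications.

/* Illustrative sketch only -- not the actual dm-service-time changes. */
#include <linux/ktime.h>

#define ST_RESET_INTERVAL	1024	/* hypothetical: forget history every N completions */

static int st_start_io_sketch(struct path_selector *ps, struct dm_path *path,
			      size_t nr_bytes, ktime_t *io_start_time)
{
	/* dm-mpath keeps this per request in struct dm_mpath_io */
	*io_start_time = ktime_get();
	return 0;
}

static int st_end_io_sketch(struct path_selector *ps, struct dm_path *path,
			    size_t nr_bytes, ktime_t *io_start_time)
{
	struct path_info *pi = path->pscontext;

	/* accumulate this IO's measured service time, in microseconds */
	pi->service_time_usecs += ktime_us_delta(ktime_get(), *io_start_time);

	/*
	 * Periodically reset so stale measurements cannot pin a path as
	 * "slow" forever (keyed off completions here for simplicity).
	 */
	if (++pi->select_count >= ST_RESET_INTERVAL) {
		pi->select_count = 0;
		pi->service_time_usecs = 0;
	}
	return 0;
}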

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 52baf8a..fb37dff 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -105,6 +105,7 @@ struct multipath {
 struct dm_mpath_io {
struct pgpath *pgpath;
size_t nr_bytes;
+   ktime_t io_start_time;
 };
 
 typedef int (*action_fn) (struct pgpath *pgpath);
@@ -507,7 +508,7 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
	if (pgpath->pg->ps.type->start_io)
		pgpath->pg->ps.type->start_io(&pgpath->pg->ps,
					      &pgpath->path,
-					      nr_bytes);
+					      nr_bytes, &mpio->io_start_time);
return DM_MAPIO_REMAPPED;
 }
 
@@ -1374,7 +1375,8 @@ static int multipath_end_io(struct dm_target *ti, struct request *clone,
	if (pgpath) {
		ps = &pgpath->pg->ps;
		if (ps->type->end_io)
-			ps->type->end_io(ps, &pgpath->path, mpio->nr_bytes);
+			ps->type->end_io(ps, &pgpath->path, mpio->nr_bytes,
+					 &mpio->io_start_time);
}
clear_request_fn_mpio(m, map_context);
 
diff --git a/drivers/md/dm-path-selector.h b/drivers/md/dm-path-selector.h
index b6eb536..c09d2bb 100644
--- a/drivers/md/dm-path-selector.h
+++ b/drivers/md/dm-path-selector.h
@@ -13,6 +13,7 @@
#define DM_PATH_SELECTOR_H
 
 #include <linux/device-mapper.h>
+#include <linux/ktime.h>
 
 #include "dm-mpath.h"
 
@@ -72,9 +73,9 @@ struct path_selector_type {
   status_type_t type, char *result, unsigned int maxlen);
 
int (*start_io) (struct path_selector *ps, struct dm_path *path,
-size_t nr_bytes);
+size_t nr_bytes, ktime_t *io_start_time);
int (*end_io) (struct path_selector *ps, struct dm_path *path,
-  size_t nr_bytes);
+  size_t nr_bytes, ktime_t *io_start_time);
 };
 
 /* Register a path selector */
diff --git a/drivers/md/dm-queue-length.c b/drivers/md/dm-queue-length.c
index 23f1786..4affb39 100644
--- a/drivers/md/dm-queue-length.c
+++ b/drivers/md/dm-queue-length.c
@@ -217,7 +217,7 @@ out:
 }
 
 static int ql_start_io(struct path_selector *ps, struct dm_path *path,
-  size_t nr_bytes)
+  size_t nr_bytes, ktime_t *io_start_time)
 {
struct path_info *pi = path->pscontext;
 
@@ -227,7 +227,7 @@ static int ql_start_io(struct path_selector *ps, struct dm_path *path,
 }
 
 static int ql_end_io(struct path_selector *ps, struct dm_path *path,
-size_t nr_bytes)
+size_t nr_bytes, ktime_t *io_start_time)
 {
struct path_info *pi = path->pscontext;
 
diff --git a/drivers/md/dm-service-time.c b/drivers/md/dm-service-time.c
index 7b86420..1b3e45a 100644
--- a/drivers/md/dm-service-time.c
+++ b/drivers/md/dm-service-time.c
@@ -19,7 +19,7 @@
 #define ST_MAX_RELATIVE_THROUGHPUT 100
 #define ST_MAX_RELATIVE_THROUGHPUT_SHIFT   7
 #define ST_MAX_INFLIGHT_SIZE   ((size_t)-1 >> ST_MAX_RELATIVE_THROUGHPUT_SHIFT)
-#define ST_VERSION "0.3.0"
+#define ST_VERSION "0.4.0"
 
 struct selector {
struct list_head valid_paths;
@@ -32,7 +32,8 @@ struct path_info {
struct dm_path *path;
unsigned repeat_count;
unsigned relative_throughput;
-	atomic_t	in_flight_size;	/* Total size of in-flight I/Os */
+   s64 service_time_usecs;
+   unsigned select_count;
 };
 
 static struct selector *alloc_selector(void)
@@ -92,7 +93,7 @@ static int st_status(struct path_selector *ps, struct dm_path *path,
 
switch (type) {
case STATUSTYPE_INFO:
-   DMEMIT("%d %u ", atomic_read(>in_flight_size),
+   DMEMIT("%u %u ", (unsigned)pi->service_time_usecs,