In a dm multipath environment, providing end user with an option of selecting preferred path for an I/O in the SAN based on path speed, health status and user preference is found to be useful. This allows a user to select a reliable path over flakey/bad paths thereby achieving higher I/O success rate. The specific scenario in which it is found to be useful is where a user has a need to eliminate the paths experiencing frequent I/O errors due to SAN failures and use the best performing path for I/O whenever it is available. Another scenario where it is found to be useful is in providing option for user to select a high speed path (say 16GB/8GB FC) over alternative low speed paths (4GB/2GB FC).
A new dm path selector kernel loadable module named "dm_pref_path" is introduced to handle preferred path load balance policy (pref-path) operations. The key operations of this policy is to select and return user specified path from the current discovered online/ healthy paths. If the user specified path do not exist in the online/ healthy paths list due to path being currently in failed state or user has mentioned wrong device information, it will fall back to round-robin policy, where all the online/ healthy paths are given equal preference. Functionality provided in this module is verified on wide variety of servers ( with 2 CPU sockets, 4 CPU sockets and 8 CPU sockets). Signed-off-by: Ravikanth Nalla <ravikanth.na...@hpe.com> --- Documentation/device-mapper/dm-pref-path.txt | 52 ++++++ drivers/md/Makefile | 6 +- drivers/md/dm-pref-path.c | 249 +++++++++++++++++++++++++++ 3 files changed, 304 insertions(+), 3 deletions(-) create mode 100644 Documentation/device-mapper/dm-pref-path.txt create mode 100644 drivers/md/dm-pref-path.c diff --git a/Documentation/device-mapper/dm-pref-path.txt b/Documentation/device-mapper/dm-pref-path.txt new file mode 100644 index 0000000..0efb156b --- /dev/null +++ b/Documentation/device-mapper/dm-pref-path.txt @@ -0,0 +1,52 @@ +dm-pref-path +============ + +dm-pref-path is a path selector module for device-mapper targets, which +selects a user specified path for the incoming I/O. + +The key operations of this policy to select and return user specified +path from the current discovered online/ healthy paths. If the user +specified path do not exist in the online/ healthy path list due to +path being currently in failed state or user has mentioned wrong device +information, it will fall back to round-robin policy, where all the +online/ healthy paths are given equal preference. + +The path selector name is 'pref-path'. + +Table parameters for each path: [<repeat_count>] + +Status for each path: <status> <fail-count> + <status>: 'A' if the path is active, 'F' if the path is failed. + <fail-count>: The number of path failures. + +Algorithm +========= +User is provided with an option to specify preferred path in DM +Multipath configuration file (/etc/multipath.conf) under multipath{} +section with a syntax "path_selector "pref-path 1 <device major>:<device minor>"". + + 1. The pref-path selector would search and return the matching user + preferred path from the online/ healthy path list for incoming I/O. + + 2. If the user preferred path do not exist in the online/ healthy + path list due to path being currently in failed state or user + has mentioned wrong device information, it will fall back to + round-robin policy, where all the online/ healthy paths are given + equal preference. + + 3. If the user preferred path comes back online/ healthy, pref-path + selector would find and return this path for incoming I/O. + +Examples +======== +Consider 4 paths sdq, sdam, sdbh and sdcc, if user prefers path sdbh +with major:minor number 67:176 which has throughput of 8GB/s over other +paths of 4GB/s, pref-path policy will chose this sdbh path for all the +incoming I/O's. + +# dmsetup table Test_Lun_2 +0 20971520 multipath 0 0 1 1 pref-path 0 4 1 66:80 10000 67:160 10000 +68:240 10000 8:240 10000 + +# dmsetup status Test_Lun_2 +0 20971520 multipath 2 0 0 0 1 1 A 0 4 0 66:80 A 0 67:160 A 0 68:240 A diff --git a/drivers/md/Makefile b/drivers/md/Makefile index f34979c..5c9f4e9 100644 --- a/drivers/md/Makefile +++ b/drivers/md/Makefile @@ -20,8 +20,8 @@ md-mod-y += md.o bitmap.o raid456-y += raid5.o raid5-cache.o # Note: link order is important. All raid personalities -# and must come before md.o, as they each initialise -# themselves, and md.o may use the personalities when it +# and must come before md.o, as they each initialise +# themselves, and md.o may use the personalities when it # auto-initialised. obj-$(CONFIG_MD_LINEAR) += linear.o @@ -41,7 +41,7 @@ obj-$(CONFIG_DM_BIO_PRISON) += dm-bio-prison.o obj-$(CONFIG_DM_CRYPT) += dm-crypt.o obj-$(CONFIG_DM_DELAY) += dm-delay.o obj-$(CONFIG_DM_FLAKEY) += dm-flakey.o -obj-$(CONFIG_DM_MULTIPATH) += dm-multipath.o dm-round-robin.o +obj-$(CONFIG_DM_MULTIPATH) += dm-multipath.o dm-round-robin.o dm-pref-path.o obj-$(CONFIG_DM_MULTIPATH_QL) += dm-queue-length.o obj-$(CONFIG_DM_MULTIPATH_ST) += dm-service-time.o obj-$(CONFIG_DM_SWITCH) += dm-switch.o diff --git a/drivers/md/dm-pref-path.c b/drivers/md/dm-pref-path.c new file mode 100644 index 0000000..6bf1c76 --- /dev/null +++ b/drivers/md/dm-pref-path.c @@ -0,0 +1,249 @@ +/* + * (C) Copyright 2015 Hewlett Packard Enterprise Development LP. + * + * dm-pref-path.c + * + * Module Author: Ravikanth Nalla + * + * This program is free software; you can redistribute it + * and/or modify it under the terms of the GNU General Public + * License, version 2 as published by the Free Software Foundation; + * either version 2 of the License, or (at your option) any later + * version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * dm-pref-path path selector: + * Handles preferred path load balance policy operations. The key + * operations of this policy is to select and return user specified + * path from the current discovered online/ healthy paths(valid_paths). + * If the user specified path do not exist in the valid_paths list due + * to path being currently in failed state or user has mentioned wrong + * device information, it will fall back to round-robin policy, where + * all the valid-paths are given equal preference. + * + */ + +#include "dm.h" +#include "dm-path-selector.h" + +#include <linux/slab.h> +#include <linux/ctype.h> +#include <linux/errno.h> +#include <linux/module.h> +#include <linux/atomic.h> + +#define DM_MSG_PREFIX "multipath pref-path" +#define PP_MIN_IO 10000 +#define PP_VERSION "1.0.0" +#define BUFF_LEN 16 + +/* Flag for pref_path enablement */ +unsigned pref_path_enabled; + +/* pref_path major:minor number */ +char pref_path[BUFF_LEN]; + +struct selector { + struct list_head valid_paths; + struct list_head failed_paths; +}; + +struct path_info { + struct list_head list; + struct dm_path *path; + unsigned repeat_count; +}; + +static struct selector *alloc_selector(void) +{ + struct selector *s = kmalloc(sizeof(*s), GFP_KERNEL); + + if (s) { + INIT_LIST_HEAD(&s->valid_paths); + INIT_LIST_HEAD(&s->failed_paths); + } + + return s; +} + +static int pf_create(struct path_selector *ps, unsigned argc, char +**argv) { + struct selector *s = alloc_selector(); + + if (!s) + return -ENOMEM; + + if ((argc == 1) && strlen(argv[0]) < BUFF_LEN) { + pref_path_enabled = 1; + snprintf(pref_path, (BUFF_LEN-1), "%s", argv[0]); + } + + ps->context = s; + return 0; +} + +static void pf_free_paths(struct list_head *paths) +{ + struct path_info *pi, *next; + + list_for_each_entry_safe(pi, next, paths, list) { + list_del(&pi->list); + kfree(pi); + } +} + +static void pf_destroy(struct path_selector *ps) +{ + struct selector *s = ps->context; + + pf_free_paths(&s->valid_paths); + pf_free_paths(&s->failed_paths); + kfree(s); + ps->context = NULL; +} + +static int pf_status(struct path_selector *ps, struct dm_path *path, + status_type_t type, char *result, unsigned maxlen) { + unsigned sz = 0; + struct path_info *pi; + + /* When called with NULL path, return selector status/args. */ + if (!path) + DMEMIT("0 "); + else { + pi = path->pscontext; + + if (type == STATUSTYPE_TABLE) + DMEMIT("%u ", pi->repeat_count); + } + + return sz; +} + +static int pf_add_path(struct path_selector *ps, struct dm_path *path, + int argc, char **argv, char **error) { + struct selector *s = ps->context; + struct path_info *pi; + + /* + * Arguments: [<pref-path>] + */ + if (argc > 1) { + *error = "pref-path ps: incorrect number of arguments"; + return -EINVAL; + } + + /* Allocate the path information structure */ + pi = kmalloc(sizeof(*pi), GFP_KERNEL); + if (!pi) { + *error = "pref-path ps: Error allocating path information"; + return -ENOMEM; + } + + pi->path = path; + pi->repeat_count = PP_MIN_IO; + + path->pscontext = pi; + + list_add_tail(&pi->list, &s->valid_paths); + + return 0; +} + +static void pf_fail_path(struct path_selector *ps, struct dm_path +*path) { + struct selector *s = ps->context; + struct path_info *pi = path->pscontext; + + list_move(&pi->list, &s->failed_paths); } + +static int pf_reinstate_path(struct path_selector *ps, struct dm_path +*path) { + struct selector *s = ps->context; + struct path_info *pi = path->pscontext; + + list_move_tail(&pi->list, &s->valid_paths); + + return 0; +} + +/* + * Return user preferred path for an I/O. + */ +static struct dm_path *pf_select_path(struct path_selector *ps, + unsigned *repeat_count, size_t nr_bytes) { + struct selector *s = ps->context; + struct path_info *pi = NULL, *best = NULL; + + if (list_empty(&s->valid_paths)) + return NULL; + + if (pref_path_enabled) { + /* search for preferred path in the + * valid list and then return. + */ + list_for_each_entry(pi, &s->valid_paths, list) { + if (!strcmp(pi->path->dev->name, pref_path)) { + best = pi; + *repeat_count = best->repeat_count; + break; + } + } + } + + /* If preferred path is not enabled/ not available/ + * offline chose the next path in the list. + */ + if (best == NULL && !list_empty(&s->valid_paths)) { + pi = list_entry(s->valid_paths.next, + struct path_info, list); + list_move_tail(&pi->list, &s->valid_paths); + best = pi; + *repeat_count = best->repeat_count; + } + + return best ? best->path : NULL; +} + +static struct path_selector_type pf_ps = { + .name = "pref-path", + .module = THIS_MODULE, + .table_args = 1, + .info_args = 0, + .create = pf_create, + .destroy = pf_destroy, + .status = pf_status, + .add_path = pf_add_path, + .fail_path = pf_fail_path, + .reinstate_path = pf_reinstate_path, + .select_path = pf_select_path, +}; + +static int __init dm_pf_init(void) +{ + int r = dm_register_path_selector(&pf_ps); + + if (r < 0) { + DMERR("register failed %d", r); + return r; + } + + DMINFO("version " PP_VERSION " loaded"); + return r; +} + +static void __exit dm_pf_exit(void) +{ + dm_unregister_path_selector(&pf_ps); +} + +module_init(dm_pf_init); +module_exit(dm_pf_exit); + +MODULE_DESCRIPTION(DM_NAME "pref-path multipath path selector"); +MODULE_AUTHOR("ravikanth.na...@hpe.com"); +MODULE_LICENSE("GPL"); -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/