Re: [dm-devel] multipath-tools: add ANA support for NVMe device
On Mon, 2018-11-12 at 16:53 -0500, Mike Snitzer wrote: > > I (and others) think it makes sense to at least triple check with the > NVMe developers (now cc'd) to see if we could get agreement on the > nvme > driver providing the ANA state via sysfs (when modparam > nvme_core.multipath=N is set), like Hannes proposed here: > http://lists.infradead.org/pipermail/linux-nvme/2018-November/020765.html > > Then the userspace multipath-tools ANA support could just read sysfs > rather than reinvent harvesting the ANA state via ioctl. If some kernels expose the sysfs attributes and some don't, what are we supposed to do? In order to be portable, multipath-tools (and other user space tools, for that matter) need to support both, and maintain multiple code paths. Just like SCSI :-) I'd really like to see this abstracted away with a "libnvme" (you name it), which user space tools could simply call without having to worry how it actually talks to the kernel. Martin -- Dr. Martin Wilck , Tel. +49 (0)911 74053 2107 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH] dm integrity: Document size and format of superblock fields
On 11/9/18 12:21 PM, Andy Grover wrote: As mentioned elsewhere in dm-integrity.txt, creating a new integrity device requires creating a small integrity device on top of the base device that formats the base device, reading the provided data sectors out of the superblock, and then recreating the integrity device with the correct size. For this, userspace must know the offset, length, and endianness of the provided_data_sectors field in the superblock. Document all fields mentioned in the txt to include this, based on struct superblock in dm-integrity.c. Extra fields in struct superblock not already mentioned in the txt remain undocumented. In 4.19 I just noticed provided_data_sectors is now included in dm status. I'm assuming that is now the preferred way for userspace to discover this value? Thus making reading it from the on-disk superblock unnecessary, and thus *documenting* the superblock format unnecessary. Sounds good. So please disregard this patch, although some different documentation changes are probably now needed. Thanks -- Regards -- Andy p.s. I'd just run across an issue where creating an integrity device on a loopback device would result in the superblock still reading as all zeroes. Another reason to do it the new way :) -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] multipath-tools: add ANA support for NVMe device
On Mon, Nov 12 2018 at 11:23am -0500, Martin Wilck wrote: > Hello Lijie, > > On Thu, 2018-11-08 at 14:09 +0800, lijie wrote: > > Add support for Asynchronous Namespace Access as specified in NVMe > > 1.3 > > TP 4004. The states are updated through reading the ANA log page. > > > > By default, the native nvme multipath takes over the nvme device. > > We can pass a false to the parameter 'multipath' of the nvme-core.ko > > module,when we want to use multipath-tools. > > Thank you for the patch. It looks quite good to me. I've tested it with > a Linux target and found no problems so far. > > I have a few questions and comments inline below. > > I suggest you also have a look at detect_prio(); it seems to make sense > to use the ana prioritizer for NVMe paths automatically if ANA is > supported (with your patch, "detect_prio no" and "prio ana" have to be > configured explicitly). But that can be done in a later patch. I (and others) think it makes sense to at least triple check with the NVMe developers (now cc'd) to see if we could get agreement on the nvme driver providing the ANA state via sysfs (when modparam nvme_core.multipath=N is set), like Hannes proposed here: http://lists.infradead.org/pipermail/linux-nvme/2018-November/020765.html Then the userspace multipath-tools ANA support could just read sysfs rather than reinvent harvesting the ANA state via ioctl. But if we continue to hit a wall on that type of sharing of the nvme driver's state then I'm fine with reinventing ANA state inquiry and tracking like has been proposed here. Mike p.s. thanks for your review Martin, we really need to determine the way forward for full multipath-tools support of NVMe with ANA. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH] multipath-tools: add ANA support for NVMe device
Hello Lijie, On Thu, 2018-11-08 at 14:09 +0800, lijie wrote: > Add support for Asynchronous Namespace Access as specified in NVMe > 1.3 > TP 4004. The states are updated through reading the ANA log page. > > By default, the native nvme multipath takes over the nvme device. > We can pass a false to the parameter 'multipath' of the nvme-core.ko > module,when we want to use multipath-tools. Thank you for the patch. It looks quite good to me. I've tested it with a Linux target and found no problems so far. I have a few questions and comments inline below. I suggest you also have a look at detect_prio(); it seems to make sense to use the ana prioritizer for NVMe paths automatically if ANA is supported (with your patch, "detect_prio no" and "prio ana" have to be configured explicitly). But that can be done in a later patch. Martin > --- > libmultipath/prio.h| 1 + > libmultipath/prioritizers/Makefile | 1 + > libmultipath/prioritizers/ana.c| 292 > + > libmultipath/prioritizers/ana.h| 221 > > multipath/multipath.conf.5 | 8 + > 5 files changed, 523 insertions(+) > create mode 100644 libmultipath/prioritizers/ana.c > create mode 100644 libmultipath/prioritizers/ana.h > > diff --git a/libmultipath/prio.h b/libmultipath/prio.h > index aa587cc..599d1d8 100644 > --- a/libmultipath/prio.h > +++ b/libmultipath/prio.h > @@ -30,6 +30,7 @@ struct path; > #define PRIO_WEIGHTED_PATH "weightedpath" > #define PRIO_SYSFS "sysfs" > #define PRIO_PATH_LATENCY"path_latency" > +#define PRIO_ANA "ana" > > /* > * Value used to mark the fact prio was not defined > diff --git a/libmultipath/prioritizers/Makefile > b/libmultipath/prioritizers/Makefile > index ab7bc07..15afaba 100644 > --- a/libmultipath/prioritizers/Makefile > +++ b/libmultipath/prioritizers/Makefile > @@ -19,6 +19,7 @@ LIBS = \ > libpriordac.so \ > libprioweightedpath.so \ > libpriopath_latency.so \ > + libprioana.so \ > libpriosysfs.so > > all: $(LIBS) > diff --git a/libmultipath/prioritizers/ana.c > b/libmultipath/prioritizers/ana.c > new file mode 100644 > index 000..c5aaa5f > --- /dev/null > +++ b/libmultipath/prioritizers/ana.c > @@ -0,0 +1,292 @@ > +/* > + * (C) Copyright HUAWEI Technology Corp. 2017 All Rights Reserved. > + * > + * ana.c > + * Version 1.00 > + * > + * Tool to make use of a NVMe-feature called Asymmetric Namespace > Access. > + * It determines the ANA state of a device and prints a priority > value to stdout. > + * > + * Author(s): Cheng Jike > + *Li Jie > + * > + * This file is released under the GPL version 2, or any later > version. > + */ > +#include > +#include > +#include > +#include > +#include > + > +#include "debug.h" > +#include "prio.h" > +#include "structs.h" > +#include "ana.h" > + > +enum { > + ANA_PRIO_OPTIMIZED = 50, > + ANA_PRIO_NONOPTIMIZED = 10, > + ANA_PRIO_INACCESSIBLE = 5, > + ANA_PRIO_PERSISTENT_LOSS= 1, > + ANA_PRIO_CHANGE = 0, > + ANA_PRIO_RESERVED = 0, > + ANA_PRIO_GETCTRL_FAILED = -1, > + ANA_PRIO_NOT_SUPPORTED = -2, > + ANA_PRIO_GETANAS_FAILED = -3, > + ANA_PRIO_GETANALOG_FAILED = -4, > + ANA_PRIO_GETNSID_FAILED = -5, > + ANA_PRIO_GETNS_FAILED = -6, > + ANA_PRIO_NO_MEMORY = -7, > + ANA_PRIO_NO_INFORMATION = -8, > +}; > + > +static const char * anas_string[] = { > + [NVME_ANA_OPTIMIZED]= "ANA Optimized > State", > + [NVME_ANA_NONOPTIMIZED] = "ANA Non-Optimized > State", > + [NVME_ANA_INACCESSIBLE] = "ANA Inaccessible > State", > + [NVME_ANA_PERSISTENT_LOSS] = "ANA Persistent > Loss State", > + [NVME_ANA_CHANGE] = "ANA Change state", > + [NVME_ANA_RESERVED] = "Invalid namespace > group state!", > +}; > + > +static const char *aas_print_string(int rc) > +{ > + rc &= 0xff; > + > + switch(rc) { > + case NVME_ANA_OPTIMIZED: > + case NVME_ANA_NONOPTIMIZED: > + case NVME_ANA_INACCESSIBLE: > + case NVME_ANA_PERSISTENT_LOSS: > + case NVME_ANA_CHANGE: > + return anas_string[rc]; > + default: > + return anas_string[NVME_ANA_RESERVED]; > + } > + > + return anas_string[NVME_ANA_RESERVED]; > +} nit: the last statement is superfluous. > + > +static int nvme_get_nsid(int fd, unsigned *nsid) > +{ > + static struct stat nvme_stat; > + int err = fstat(fd, &nvme_stat); > + if (err < 0) > + return 1; > + > + if (!S_ISBLK(nvme_stat.st_mode)) { > + condlog(0, "Error: requesting namespace-id from non- > block device\n"); > + return 1; > + } > + > + *nsid = ioctl(fd, NVME_IOCTL_ID); > + retu
[dm-devel] unnecessary "memset(n, 0, block_size); " in dm_btree_empty ?
Hi, Sorry to trouble you. I'm reading the code of persistent data, and noticed: dm_btree_empty() -> new_block() -> dm_tm_new_block() to get a new block allocated. And the new block is zeroed with write lock held. "memset(n, 0, block_size);" in dm_btree_empty() seems unnecessary. Right? Thanks, shenghui -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel