Re: [ndctl PATCH 6/8] ndctl/test: add a new unit test for inject-error
On Thu, 2017-10-05 at 19:54 -0600, Vishal Verma wrote: > +do_tests() > +{ > + # TODO > + return > +} Not sure how this happened, but I managed to include a stale version of this file or something, that only has the boilerplate. I'll fix up for the next rev. ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v7 06/12] xfs: wire up MAP_DIRECT
On Mon, Oct 09, 2017 at 10:08:40AM -0700, Dan Williams wrote: > On Sun, Oct 8, 2017 at 8:40 PM, Dave Chinnerwrote: > >> > >> /* > >> * Clear the specified ranges to zero through either the pagecache or DAX. > >> @@ -1008,6 +1018,26 @@ xfs_file_llseek( > >> return vfs_setpos(file, offset, inode->i_sb->s_maxbytes); > >> } > >> > >> +static int > >> +xfs_vma_checks( > >> + struct vm_area_struct *vma, > >> + struct inode*inode) > > > > Exactly what are we checking for - function name doesn't tell me, > > and there's no comments, either? > > Ok, I'll improve this. > > > > >> +{ > >> + if (!is_xfs_map_direct(vma)) > >> + return 0; > >> + > >> + if (!is_map_direct_valid(vma->vm_private_data)) > >> + return VM_FAULT_SIGBUS; > >> + > >> + if (xfs_is_reflink_inode(XFS_I(inode))) > >> + return VM_FAULT_SIGBUS; > >> + > >> + if (!IS_DAX(inode)) > >> + return VM_FAULT_SIGBUS; > > > > And how do we get is_xfs_map_direct() set to true if we don't have a > > DAX inode or the inode has shared extents? > > So, this was my way of trying to satisfy the request you made here: > > https://lkml.org/lkml/2017/8/11/876 > > i.e. allow MAP_DIRECT on non-dax files to enable a use case of > freezing the block-map to examine which file extents are linked. If > you don't want to use MAP_DIRECT for this, we can move these checks to > mmap time. Ok, but I don't want to use mmap to deal with this, nor do I care whether DAX is in use or not. So I don't think this is really necessary for MAP_DIRECT. > >> +xfs_file_mmap_validate( > >> + struct file *filp, > >> + struct vm_area_struct *vma, > >> + unsigned long map_flags, > >> + int fd) > >> +{ > >> + struct inode*inode = file_inode(filp); > >> + struct xfs_inode*ip = XFS_I(inode); > >> + struct map_direct_state *mds; > >> + > >> + if (map_flags & ~(XFS_MAP_SUPPORTED)) > >> + return -EOPNOTSUPP; > >> + > >> + if ((map_flags & MAP_DIRECT) == 0) > >> + return xfs_file_mmap(filp, vma); > >> + > >> + file_accessed(filp); > >> + vma->vm_ops = _file_vm_direct_ops; > >> + if (IS_DAX(inode)) > >> + vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE; > > > > And if it isn't a DAX inode? what is MAP_DIRECT supposed to do then? > > In the non-DAX case it just takes the FL_LAYOUT file lease... although > we could also just have an fcntl for that purpose. The use case of > just freezing the block map does not need a mapping. RIght, so I think we should just add a fcntl for the non-DAX case I have in mind, and not complicate the MAP_DIRECT implementation right now. We can alsways extend the scope of MAP_DIRECT in future if we actually need to do so. > >> + mds = map_direct_register(fd, vma); > >> + if (IS_ERR(mds)) > >> + return PTR_ERR(mds); > >> + > >> + /* flush in-flight faults */ > >> + xfs_ilock(ip, XFS_MMAPLOCK_EXCL); > >> + xfs_iunlock(ip, XFS_MMAPLOCK_EXCL); > > > > Urk. That's nasty. And why is it even necessary? Please explain why > > this is necessary in the comment, because it's not at all obvious to > > me... > > This is related to your other observation about i_mapdcount and adding > an iomap_can_allocate() helper. I think I can clean both of these up > by using a call to break_layout(inode, false) and bailing in > ->iomap_begin() if it returns EWOULDBLOCK. This would also fix the > current problem that allocating write-faults don't start the lease > break process. OK. Cheers, Dave. -- Dave Chinner da...@fromorbit.com ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
[PATCH v3 2/4] nfit_test: add error injection DSMs
From: Dave JiangAdd nfit_test emulation for the new ACPI 6.2 error injectino DSMs. This will allow unit tests to selectively inject the errors they wish to test for. Signed-off-by: Dave Jiang [vishal: Move injection functions to ND_CMD_CALL] [vishal: Add support for the notification option] [vishal: move an nfit_test private definition into a local header] Signed-off-by: Vishal Verma --- tools/testing/nvdimm/test/nfit.c | 187 +- tools/testing/nvdimm/test/nfit_test.h | 5 + 2 files changed, 168 insertions(+), 24 deletions(-) diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c index 435d298..43f948e 100644 --- a/tools/testing/nvdimm/test/nfit.c +++ b/tools/testing/nvdimm/test/nfit.c @@ -168,8 +168,12 @@ struct nfit_test { spinlock_t lock; } ars_state; struct device *dimm_dev[NUM_DCR]; + struct badrange badrange; + struct work_struct work; }; +static struct workqueue_struct *nfit_wq; + static struct nfit_test *to_nfit_test(struct device *dev) { struct platform_device *pdev = to_platform_device(dev); @@ -234,48 +238,68 @@ static int nfit_test_cmd_set_config_data(struct nd_cmd_set_config_hdr *nd_cmd, return rc; } -#define NFIT_TEST_ARS_RECORDS 4 #define NFIT_TEST_CLEAR_ERR_UNIT 256 static int nfit_test_cmd_ars_cap(struct nd_cmd_ars_cap *nd_cmd, unsigned int buf_len) { + int ars_recs; + if (buf_len < sizeof(*nd_cmd)) return -EINVAL; + /* for testing, only store up to n records fit withint 4k */ + ars_recs = SZ_4K / sizeof(struct nd_ars_record); + nd_cmd->max_ars_out = sizeof(struct nd_cmd_ars_status) - + NFIT_TEST_ARS_RECORDS * sizeof(struct nd_ars_record); + + ars_recs * sizeof(struct nd_ars_record); nd_cmd->status = (ND_ARS_PERSISTENT | ND_ARS_VOLATILE) << 16; nd_cmd->clear_err_unit = NFIT_TEST_CLEAR_ERR_UNIT; return 0; } -/* - * Initialize the ars_state to return an ars_result 1 second in the future with - * a 4K error range in the middle of the requested address range. - */ -static void post_ars_status(struct ars_state *ars_state, u64 addr, u64 len) +static void post_ars_status(struct ars_state *ars_state, + struct badrange *badrange, u64 addr, u64 len) { struct nd_cmd_ars_status *ars_status; struct nd_ars_record *ars_record; + struct badrange_entry *be; + u64 end = addr + len - 1; + int i = 0; ars_state->deadline = jiffies + 1*HZ; ars_status = ars_state->ars_status; ars_status->status = 0; - ars_status->out_length = sizeof(struct nd_cmd_ars_status) - + sizeof(struct nd_ars_record); ars_status->address = addr; ars_status->length = len; ars_status->type = ND_ARS_PERSISTENT; - ars_status->num_records = 1; - ars_record = _status->records[0]; - ars_record->handle = 0; - ars_record->err_address = addr + len / 2; - ars_record->length = SZ_4K; + + spin_lock(>lock); + list_for_each_entry(be, >list, list) { + u64 be_end = be->start + be->length - 1; + u64 rstart, rend; + + /* skip entries outside the range */ + if (be_end < addr || be->start > end) + continue; + + rstart = (be->start < addr) ? addr : be->start; + rend = (be_end < end) ? be_end : end; + ars_record = _status->records[i]; + ars_record->handle = 0; + ars_record->err_address = rstart; + ars_record->length = rend - rstart + 1; + i++; + } + spin_unlock(>lock); + ars_status->num_records = i; + ars_status->out_length = sizeof(struct nd_cmd_ars_status) + + i * sizeof(struct nd_ars_record); } -static int nfit_test_cmd_ars_start(struct ars_state *ars_state, +static int nfit_test_cmd_ars_start(struct nfit_test *t, + struct ars_state *ars_state, struct nd_cmd_ars_start *ars_start, unsigned int buf_len, int *cmd_rc) { @@ -289,7 +313,7 @@ static int nfit_test_cmd_ars_start(struct ars_state *ars_state, } else { ars_start->status = 0; ars_start->scrub_time = 1; - post_ars_status(ars_state, ars_start->address, + post_ars_status(ars_state, >badrange, ars_start->address, ars_start->length); *cmd_rc = 0; } @@ -456,6 +480,93 @@ static int nfit_test_cmd_smart_threshold(struct nd_cmd_smart_threshold *smart_t, return 0; } +static void uc_error_notify(struct work_struct *work) +{ + struct nfit_test *t = container_of(work, typeof(*t), work); + + __acpi_nfit_notify(>pdev.dev,
[PATCH v3 1/4] libnvdimm: move poison list functions to a new 'badrange' file
From: Dave Jiangnfit_test needs to use the poison list manipulation code as well. Make it more generic and in the process rename poison to badrange, and move all the related helpers to a new file. Signed-off-by: Dave Jiang [vishal: Add badrange.o to nfit_test's Kbuild] [vishal: add a missed include in bus.c for the new badrange functions] [vishal: rename all instances of 'be' to 'bre'] Signed-off-by: Vishal Verma --- drivers/acpi/nfit/core.c| 2 +- drivers/acpi/nfit/mce.c | 2 +- drivers/nvdimm/Makefile | 1 + drivers/nvdimm/badrange.c | 294 drivers/nvdimm/bus.c| 24 ++-- drivers/nvdimm/core.c | 260 +-- drivers/nvdimm/nd-core.h| 3 +- drivers/nvdimm/nd.h | 6 - include/linux/libnvdimm.h | 21 +++- tools/testing/nvdimm/Kbuild | 1 + 10 files changed, 332 insertions(+), 282 deletions(-) create mode 100644 drivers/nvdimm/badrange.c diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index a3ecd5e..4b157f8 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -2240,7 +2240,7 @@ static int ars_status_process_records(struct acpi_nfit_desc *acpi_desc, if (ars_status->out_length < 44 + sizeof(struct nd_ars_record) * (i + 1)) break; - rc = nvdimm_bus_add_poison(nvdimm_bus, + rc = nvdimm_bus_add_badrange(nvdimm_bus, ars_status->records[i].err_address, ars_status->records[i].length); if (rc) diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c index feeb95d..b929214 100644 --- a/drivers/acpi/nfit/mce.c +++ b/drivers/acpi/nfit/mce.c @@ -67,7 +67,7 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val, continue; /* If this fails due to an -ENOMEM, there is little we can do */ - nvdimm_bus_add_poison(acpi_desc->nvdimm_bus, + nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus, ALIGN(mce->addr, L1_CACHE_BYTES), L1_CACHE_BYTES); nvdimm_region_notify(nfit_spa->nd_region, diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile index 909554c..ca6d325 100644 --- a/drivers/nvdimm/Makefile +++ b/drivers/nvdimm/Makefile @@ -20,6 +20,7 @@ libnvdimm-y += region_devs.o libnvdimm-y += region.o libnvdimm-y += namespace_devs.o libnvdimm-y += label.o +libnvdimm-y += badrange.o libnvdimm-$(CONFIG_ND_CLAIM) += claim.o libnvdimm-$(CONFIG_BTT) += btt_devs.o libnvdimm-$(CONFIG_NVDIMM_PFN) += pfn_devs.o diff --git a/drivers/nvdimm/badrange.c b/drivers/nvdimm/badrange.c new file mode 100644 index 000..0b67dcf --- /dev/null +++ b/drivers/nvdimm/badrange.c @@ -0,0 +1,294 @@ +/* + * Copyright(c) 2017 Intel Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "nd-core.h" +#include "nd.h" + +void badrange_init(struct badrange *badrange) +{ + INIT_LIST_HEAD(>list); + spin_lock_init(>lock); +} +EXPORT_SYMBOL_GPL(badrange_init); + +static void append_badrange_entry(struct badrange *badrange, + struct badrange_entry *bre, u64 addr, u64 length) +{ + lockdep_assert_held(>lock); + bre->start = addr; + bre->length = length; + list_add_tail(>list, >list); +} + +static int alloc_and_append_badrange_entry(struct badrange *badrange, + u64 addr, u64 length, gfp_t flags) +{ + struct badrange_entry *bre; + + bre = kzalloc(sizeof(*bre), flags); + if (!bre) + return -ENOMEM; + + append_badrange_entry(badrange, bre, addr, length); + return 0; +} + +static int add_badrange(struct badrange *badrange, u64 addr, u64 length) +{ + struct badrange_entry *bre, *bre_new; + + spin_unlock(>lock); + bre_new = kzalloc(sizeof(*bre_new), GFP_KERNEL); + spin_lock(>lock); + + if (list_empty(>list)) { + if (!bre_new) + return -ENOMEM; + append_badrange_entry(badrange, bre_new, addr, length); + return 0; + } + + /* +* There is a chance this is a duplicate, check for those first. +* This
[PATCH v3 4/4] nfit_test: when clearing poison, also remove badrange entries
The injected badrange entries can only be cleared from the kernel's accounting by writing to the affected blocks, so when such a write sends the clear errror DSM to nfit_test, also clear the ranges from nfit_test's badrange list. This lets an 'ARS Inject error status' DSM to return the correct status, omitting the cleared ranges. Cc: Dave JiangCc: Dan Williams Signed-off-by: Vishal Verma Reviewed-by: Dave Jiang --- tools/testing/nvdimm/test/nfit.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c index 43f948e..e400418 100644 --- a/tools/testing/nvdimm/test/nfit.c +++ b/tools/testing/nvdimm/test/nfit.c @@ -344,7 +344,8 @@ static int nfit_test_cmd_ars_status(struct ars_state *ars_state, return 0; } -static int nfit_test_cmd_clear_error(struct nd_cmd_clear_error *clear_err, +static int nfit_test_cmd_clear_error(struct nfit_test *t, + struct nd_cmd_clear_error *clear_err, unsigned int buf_len, int *cmd_rc) { const u64 mask = NFIT_TEST_CLEAR_ERR_UNIT - 1; @@ -354,12 +355,7 @@ static int nfit_test_cmd_clear_error(struct nd_cmd_clear_error *clear_err, if ((clear_err->address & mask) || (clear_err->length & mask)) return -EINVAL; - /* -* Report 'all clear' success for all commands even though a new -* scrub will find errors again. This is enough to have the -* error removed from the 'badblocks' tracking in the pmem -* driver. -*/ + badrange_forget(>badrange, clear_err->address, clear_err->length); clear_err->status = 0; clear_err->cleared = clear_err->length; *cmd_rc = 0; @@ -687,7 +683,7 @@ static int nfit_test_ctl(struct nvdimm_bus_descriptor *nd_desc, cmd_rc); break; case ND_CMD_CLEAR_ERROR: - rc = nfit_test_cmd_clear_error(buf, buf_len, cmd_rc); + rc = nfit_test_cmd_clear_error(t, buf, buf_len, cmd_rc); break; default: return -ENOTTY; -- 2.9.5 ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
[PATCH v3 0/4] add error injection commands to nfit_test
v3: patch 1,2: - move the nfit_test kbuild update for badrange to patch 1 (Dan) v2: patch 1: - change all instances of 'be' to 'bre' to avoid confusion with big endian (Dan) patch 2: - move an injection related define to a local nfit_test header since it is not used outside of nfit_test (Dan) These patches add error injection support to nfit_test by emulating the ACPI6.2 ARS error injection commands. The commands are sent via the ND_CMD_CALL interface, so only nfit_test knows of the various definitions related to this. Note that this patch set will break ndctl unit tests unless the ndctl patches for error injection are also applied. Dave Jiang (2): libnvdimm: move poison list functions to a new 'badrange' file nfit_test: add error injection DSMs Vishal Verma (2): libnvdimm, badrange: remove a WARN for list_empty nfit_test: when clearing poison, also remove badrange entries drivers/acpi/nfit/core.c | 2 +- drivers/acpi/nfit/mce.c | 2 +- drivers/nvdimm/Makefile | 1 + drivers/nvdimm/badrange.c | 293 ++ drivers/nvdimm/bus.c | 24 +-- drivers/nvdimm/core.c | 260 +- drivers/nvdimm/nd-core.h | 3 +- drivers/nvdimm/nd.h | 6 - include/linux/libnvdimm.h | 21 ++- tools/testing/nvdimm/Kbuild | 1 + tools/testing/nvdimm/test/nfit.c | 199 +++ tools/testing/nvdimm/test/nfit_test.h | 5 + 12 files changed, 503 insertions(+), 314 deletions(-) create mode 100644 drivers/nvdimm/badrange.c -- 2.9.5 ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
[PATCH v3 3/4] libnvdimm, badrange: remove a WARN for list_empty
Now that we're reusing the badrange functions for nfit_test, and that exposes badrange injection/clearing to userspace via the DSM paths, it is plausible that a user may call the clear DSM multiple times. Since it is harmless to do so, we can remove the WARN in badrange_forget. Cc: Dave JiangCc: Dan Williams Signed-off-by: Vishal Verma --- drivers/nvdimm/badrange.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/nvdimm/badrange.c b/drivers/nvdimm/badrange.c index 0b67dcf..e068d72 100644 --- a/drivers/nvdimm/badrange.c +++ b/drivers/nvdimm/badrange.c @@ -114,7 +114,6 @@ void badrange_forget(struct badrange *badrange, phys_addr_t start, struct badrange_entry *bre, *next; spin_lock(>lock); - WARN_ON_ONCE(list_empty(badrange_list)); /* * [start, clr_end] is the badrange interval being cleared. -- 2.9.5 ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
[PATCH v2 2/4] nfit_test: add error injection DSMs
From: Dave JiangAdd nfit_test emulation for the new ACPI 6.2 error injectino DSMs. This will allow unit tests to selectively inject the errors they wish to test for. Signed-off-by: Dave Jiang [vishal: Add badrange.o to nfit_test's Kbuild] [vishal: Move injection functions to ND_CMD_CALL] [vishal: Add support for the notification option] [vishal: move an nfit_test private definition into a local header] Signed-off-by: Vishal Verma --- tools/testing/nvdimm/Kbuild | 1 + tools/testing/nvdimm/test/nfit.c | 187 +- tools/testing/nvdimm/test/nfit_test.h | 5 + 3 files changed, 169 insertions(+), 24 deletions(-) diff --git a/tools/testing/nvdimm/Kbuild b/tools/testing/nvdimm/Kbuild index d870520..3272ab5 100644 --- a/tools/testing/nvdimm/Kbuild +++ b/tools/testing/nvdimm/Kbuild @@ -69,6 +69,7 @@ libnvdimm-y += $(NVDIMM_SRC)/region_devs.o libnvdimm-y += $(NVDIMM_SRC)/region.o libnvdimm-y += $(NVDIMM_SRC)/namespace_devs.o libnvdimm-y += $(NVDIMM_SRC)/label.o +libnvdimm-y += $(NVDIMM_SRC)/badrange.o libnvdimm-$(CONFIG_ND_CLAIM) += $(NVDIMM_SRC)/claim.o libnvdimm-$(CONFIG_BTT) += $(NVDIMM_SRC)/btt_devs.o libnvdimm-$(CONFIG_NVDIMM_PFN) += $(NVDIMM_SRC)/pfn_devs.o diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c index 435d298..43f948e 100644 --- a/tools/testing/nvdimm/test/nfit.c +++ b/tools/testing/nvdimm/test/nfit.c @@ -168,8 +168,12 @@ struct nfit_test { spinlock_t lock; } ars_state; struct device *dimm_dev[NUM_DCR]; + struct badrange badrange; + struct work_struct work; }; +static struct workqueue_struct *nfit_wq; + static struct nfit_test *to_nfit_test(struct device *dev) { struct platform_device *pdev = to_platform_device(dev); @@ -234,48 +238,68 @@ static int nfit_test_cmd_set_config_data(struct nd_cmd_set_config_hdr *nd_cmd, return rc; } -#define NFIT_TEST_ARS_RECORDS 4 #define NFIT_TEST_CLEAR_ERR_UNIT 256 static int nfit_test_cmd_ars_cap(struct nd_cmd_ars_cap *nd_cmd, unsigned int buf_len) { + int ars_recs; + if (buf_len < sizeof(*nd_cmd)) return -EINVAL; + /* for testing, only store up to n records fit withint 4k */ + ars_recs = SZ_4K / sizeof(struct nd_ars_record); + nd_cmd->max_ars_out = sizeof(struct nd_cmd_ars_status) - + NFIT_TEST_ARS_RECORDS * sizeof(struct nd_ars_record); + + ars_recs * sizeof(struct nd_ars_record); nd_cmd->status = (ND_ARS_PERSISTENT | ND_ARS_VOLATILE) << 16; nd_cmd->clear_err_unit = NFIT_TEST_CLEAR_ERR_UNIT; return 0; } -/* - * Initialize the ars_state to return an ars_result 1 second in the future with - * a 4K error range in the middle of the requested address range. - */ -static void post_ars_status(struct ars_state *ars_state, u64 addr, u64 len) +static void post_ars_status(struct ars_state *ars_state, + struct badrange *badrange, u64 addr, u64 len) { struct nd_cmd_ars_status *ars_status; struct nd_ars_record *ars_record; + struct badrange_entry *be; + u64 end = addr + len - 1; + int i = 0; ars_state->deadline = jiffies + 1*HZ; ars_status = ars_state->ars_status; ars_status->status = 0; - ars_status->out_length = sizeof(struct nd_cmd_ars_status) - + sizeof(struct nd_ars_record); ars_status->address = addr; ars_status->length = len; ars_status->type = ND_ARS_PERSISTENT; - ars_status->num_records = 1; - ars_record = _status->records[0]; - ars_record->handle = 0; - ars_record->err_address = addr + len / 2; - ars_record->length = SZ_4K; + + spin_lock(>lock); + list_for_each_entry(be, >list, list) { + u64 be_end = be->start + be->length - 1; + u64 rstart, rend; + + /* skip entries outside the range */ + if (be_end < addr || be->start > end) + continue; + + rstart = (be->start < addr) ? addr : be->start; + rend = (be_end < end) ? be_end : end; + ars_record = _status->records[i]; + ars_record->handle = 0; + ars_record->err_address = rstart; + ars_record->length = rend - rstart + 1; + i++; + } + spin_unlock(>lock); + ars_status->num_records = i; + ars_status->out_length = sizeof(struct nd_cmd_ars_status) + + i * sizeof(struct nd_ars_record); } -static int nfit_test_cmd_ars_start(struct ars_state *ars_state, +static int nfit_test_cmd_ars_start(struct nfit_test *t, + struct ars_state *ars_state, struct nd_cmd_ars_start *ars_start, unsigned int buf_len, int *cmd_rc) { @@ -289,7 +313,7 @@ static int
[PATCH v2 0/4] add error injection commands to nfit_test
v2: patch 1: - change all instances of 'be' to 'bre' to avoid confusion with big endian (Dan) patch 2: - move an injection related define to a local nfit_test header since it is not used outside of nfit_test (Dan) These patches add error injection support to nfit_test by emulating the ACPI6.2 ARS error injection commands. The commands are sent via the ND_CMD_CALL interface, so only nfit_test knows of the various definitions related to this. Note that this patch set will break ndctl unit tests unless the ndctl patches for error injection are also applied. Dave Jiang (2): libnvdimm: move poison list functions to a new 'badrange' file nfit_test: add error injection DSMs Vishal Verma (2): libnvdimm, badrange: remove a WARN for list_empty nfit_test: when clearing poison, also remove badrange entries drivers/acpi/nfit/core.c | 2 +- drivers/acpi/nfit/mce.c | 2 +- drivers/nvdimm/Makefile | 1 + drivers/nvdimm/badrange.c | 293 ++ drivers/nvdimm/bus.c | 24 +-- drivers/nvdimm/core.c | 260 +- drivers/nvdimm/nd-core.h | 3 +- drivers/nvdimm/nd.h | 6 - include/linux/libnvdimm.h | 21 ++- tools/testing/nvdimm/Kbuild | 1 + tools/testing/nvdimm/test/nfit.c | 199 +++ tools/testing/nvdimm/test/nfit_test.h | 5 + 12 files changed, 503 insertions(+), 314 deletions(-) create mode 100644 drivers/nvdimm/badrange.c -- 2.9.5 ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
[PATCH v2 1/4] libnvdimm: move poison list functions to a new 'badrange' file
From: Dave Jiangnfit_test needs to use the poison list manipulation code as well. Make it more generic and in the process rename poison to badrange, and move all the related helpers to a new file. Signed-off-by: Dave Jiang [vishal: add a missed include in bus.c for the new badrange functions] [vishal: rename all instances of 'be' to 'bre'] Signed-off-by: Vishal Verma --- drivers/acpi/nfit/core.c | 2 +- drivers/acpi/nfit/mce.c | 2 +- drivers/nvdimm/Makefile | 1 + drivers/nvdimm/badrange.c | 294 ++ drivers/nvdimm/bus.c | 24 ++-- drivers/nvdimm/core.c | 260 +--- drivers/nvdimm/nd-core.h | 3 +- drivers/nvdimm/nd.h | 6 - include/linux/libnvdimm.h | 21 +++- 9 files changed, 331 insertions(+), 282 deletions(-) create mode 100644 drivers/nvdimm/badrange.c diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index a3ecd5e..4b157f8 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -2240,7 +2240,7 @@ static int ars_status_process_records(struct acpi_nfit_desc *acpi_desc, if (ars_status->out_length < 44 + sizeof(struct nd_ars_record) * (i + 1)) break; - rc = nvdimm_bus_add_poison(nvdimm_bus, + rc = nvdimm_bus_add_badrange(nvdimm_bus, ars_status->records[i].err_address, ars_status->records[i].length); if (rc) diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c index feeb95d..b929214 100644 --- a/drivers/acpi/nfit/mce.c +++ b/drivers/acpi/nfit/mce.c @@ -67,7 +67,7 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val, continue; /* If this fails due to an -ENOMEM, there is little we can do */ - nvdimm_bus_add_poison(acpi_desc->nvdimm_bus, + nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus, ALIGN(mce->addr, L1_CACHE_BYTES), L1_CACHE_BYTES); nvdimm_region_notify(nfit_spa->nd_region, diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile index 909554c..ca6d325 100644 --- a/drivers/nvdimm/Makefile +++ b/drivers/nvdimm/Makefile @@ -20,6 +20,7 @@ libnvdimm-y += region_devs.o libnvdimm-y += region.o libnvdimm-y += namespace_devs.o libnvdimm-y += label.o +libnvdimm-y += badrange.o libnvdimm-$(CONFIG_ND_CLAIM) += claim.o libnvdimm-$(CONFIG_BTT) += btt_devs.o libnvdimm-$(CONFIG_NVDIMM_PFN) += pfn_devs.o diff --git a/drivers/nvdimm/badrange.c b/drivers/nvdimm/badrange.c new file mode 100644 index 000..0b67dcf --- /dev/null +++ b/drivers/nvdimm/badrange.c @@ -0,0 +1,294 @@ +/* + * Copyright(c) 2017 Intel Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "nd-core.h" +#include "nd.h" + +void badrange_init(struct badrange *badrange) +{ + INIT_LIST_HEAD(>list); + spin_lock_init(>lock); +} +EXPORT_SYMBOL_GPL(badrange_init); + +static void append_badrange_entry(struct badrange *badrange, + struct badrange_entry *bre, u64 addr, u64 length) +{ + lockdep_assert_held(>lock); + bre->start = addr; + bre->length = length; + list_add_tail(>list, >list); +} + +static int alloc_and_append_badrange_entry(struct badrange *badrange, + u64 addr, u64 length, gfp_t flags) +{ + struct badrange_entry *bre; + + bre = kzalloc(sizeof(*bre), flags); + if (!bre) + return -ENOMEM; + + append_badrange_entry(badrange, bre, addr, length); + return 0; +} + +static int add_badrange(struct badrange *badrange, u64 addr, u64 length) +{ + struct badrange_entry *bre, *bre_new; + + spin_unlock(>lock); + bre_new = kzalloc(sizeof(*bre_new), GFP_KERNEL); + spin_lock(>lock); + + if (list_empty(>list)) { + if (!bre_new) + return -ENOMEM; + append_badrange_entry(badrange, bre_new, addr, length); + return 0; + } + + /* +* There is a chance this is a duplicate, check for those first. +* This will be the common case as ARS_STATUS returns all known +* errors in the SPA space, and we
[PATCH v2 3/4] libnvdimm, badrange: remove a WARN for list_empty
Now that we're reusing the badrange functions for nfit_test, and that exposes badrange injection/clearing to userspace via the DSM paths, it is plausible that a user may call the clear DSM multiple times. Since it is harmless to do so, we can remove the WARN in badrange_forget. Cc: Dave JiangCc: Dan Williams Signed-off-by: Vishal Verma --- drivers/nvdimm/badrange.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/nvdimm/badrange.c b/drivers/nvdimm/badrange.c index 0b67dcf..e068d72 100644 --- a/drivers/nvdimm/badrange.c +++ b/drivers/nvdimm/badrange.c @@ -114,7 +114,6 @@ void badrange_forget(struct badrange *badrange, phys_addr_t start, struct badrange_entry *bre, *next; spin_lock(>lock); - WARN_ON_ONCE(list_empty(badrange_list)); /* * [start, clr_end] is the badrange interval being cleared. -- 2.9.5 ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH 2/4] nfit_test: add error injection DSMs
On Mon, Oct 9, 2017 at 1:25 PM, Verma, Vishal Lwrote: > On Sat, 2017-10-07 at 10:07 -0700, Dan Williams wrote: >> On Thu, Oct 5, 2017 at 6:53 PM, Vishal Verma > > wrote: > > [] > >> > NFIT_ARS_TIMEOUT = 90, >> > + NFIT_ARS_INJECT_INVALID = 2, >> >> If there is no usage of this define outside of nfit_test then it >> should be defined in tools/testing/nvdimm/test/nfit.h. > > Yes moved. > >> >> > }; >> > >> > enum nfit_root_notifiers { >> > diff --git a/tools/testing/nvdimm/Kbuild >> > b/tools/testing/nvdimm/Kbuild >> > index d870520..3272ab5 100644 >> > --- a/tools/testing/nvdimm/Kbuild >> > +++ b/tools/testing/nvdimm/Kbuild >> > @@ -69,6 +69,7 @@ libnvdimm-y += $(NVDIMM_SRC)/region_devs.o >> > libnvdimm-y += $(NVDIMM_SRC)/region.o >> > libnvdimm-y += $(NVDIMM_SRC)/namespace_devs.o >> > libnvdimm-y += $(NVDIMM_SRC)/label.o >> > +libnvdimm-y += $(NVDIMM_SRC)/badrange.o >> >> This hunk should go in the same patch that changes >> drivers/nvdimm/Makefile to add badrange.o so we don't have an >> nfit_test compile breakage gap between commits. > > The kernel's normal usage already has the kbuild change in the right > place (patch 1). This adds the use of badrange functions to nfit_test > for the first time, and hence the kbuild addition here. It's not the nfit_test usages I'm worried about. The internal calls in libnvdimm will break. Try building tools/testing/nvdimm/libnvdimm.ko with only patch1 applied. ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH 1/4] libnvdimm: move poison list functions to a new 'badrange' file
On Fri, 2017-10-06 at 11:52 -0700, Dan Williams wrote: > On Thu, Oct 5, 2017 at 6:53 PM, Vishal Verma> wrote: > > From: Dave Jiang > > > > From: Dave Jiang > > > > nfit_test needs to use the poison list manipulation code as well. > > Make > > it more generic and in the process rename poison to badrange, and > > move > > all the related helpers to a new file. > > > > Signed-off-by: Dave Jiang > > [vishal: add a missed include in bus.c for the new badrange > > functions] > > Signed-off-by: Vishal Verma > > --- > > drivers/acpi/nfit/core.c | 2 +- > > drivers/acpi/nfit/mce.c | 2 +- > > drivers/nvdimm/Makefile | 1 + > > drivers/nvdimm/badrange.c | 294 > > ++ > > drivers/nvdimm/bus.c | 24 ++-- > > drivers/nvdimm/core.c | 260 + > > --- > > drivers/nvdimm/nd-core.h | 3 +- > > drivers/nvdimm/nd.h | 6 - > > include/linux/libnvdimm.h | 21 +++- > > 9 files changed, 331 insertions(+), 282 deletions(-) > > create mode 100644 drivers/nvdimm/badrange.c > > > > diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c > > index a3ecd5e..4b157f8 100644 > > --- a/drivers/acpi/nfit/core.c > > +++ b/drivers/acpi/nfit/core.c > > @@ -2240,7 +2240,7 @@ static int ars_status_process_records(struct > > acpi_nfit_desc *acpi_desc, > > if (ars_status->out_length > > < 44 + sizeof(struct nd_ars_record) > > * (i + 1)) > > break; > > - rc = nvdimm_bus_add_poison(nvdimm_bus, > > + rc = nvdimm_bus_add_badrange(nvdimm_bus, > > ars_status->records[i].err_address, > > ars_status->records[i].length); > > if (rc) > > diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c > > index feeb95d..b929214 100644 > > --- a/drivers/acpi/nfit/mce.c > > +++ b/drivers/acpi/nfit/mce.c > > @@ -67,7 +67,7 @@ static int nfit_handle_mce(struct notifier_block > > *nb, unsigned long val, > > continue; > > > > /* If this fails due to an -ENOMEM, there is little > > we can do */ > > - nvdimm_bus_add_poison(acpi_desc->nvdimm_bus, > > + nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus, > > ALIGN(mce->addr, L1_CACHE_BYTES), > > L1_CACHE_BYTES); > > nvdimm_region_notify(nfit_spa->nd_region, > > diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile > > index 909554c..ca6d325 100644 > > --- a/drivers/nvdimm/Makefile > > +++ b/drivers/nvdimm/Makefile > > @@ -20,6 +20,7 @@ libnvdimm-y += region_devs.o > > libnvdimm-y += region.o > > libnvdimm-y += namespace_devs.o > > libnvdimm-y += label.o > > +libnvdimm-y += badrange.o > > libnvdimm-$(CONFIG_ND_CLAIM) += claim.o > > libnvdimm-$(CONFIG_BTT) += btt_devs.o > > libnvdimm-$(CONFIG_NVDIMM_PFN) += pfn_devs.o > > diff --git a/drivers/nvdimm/badrange.c b/drivers/nvdimm/badrange.c > > new file mode 100644 > > index 000..6ad782f > > --- /dev/null > > +++ b/drivers/nvdimm/badrange.c > > @@ -0,0 +1,294 @@ > > +/* > > + * Copyright(c) 2017 Intel Corporation. All rights reserved. > > + * > > + * This program is free software; you can redistribute it and/or > > modify > > + * it under the terms of version 2 of the GNU General Public > > License as > > + * published by the Free Software Foundation. > > + * > > + * This program is distributed in the hope that it will be useful, > > but > > + * WITHOUT ANY WARRANTY; without even the implied warranty of > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > GNU > > + * General Public License for more details. > > + */ > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include "nd-core.h" > > +#include "nd.h" > > + > > +void badrange_init(struct badrange *badrange) > > +{ > > + INIT_LIST_HEAD(>list); > > + spin_lock_init(>lock); > > +} > > +EXPORT_SYMBOL_GPL(badrange_init); > > + > > +static void append_badrange_entry(struct badrange *badrange, > > + struct badrange_entry *be, u64 addr, u64 length) > > +{ > > + lockdep_assert_held(>lock); > > + be->start = addr; > > + be->length = length; > > + list_add_tail(>list, >list); > > +} > > Small nit, can we rename the instance variable from 'be' to 'bre'? > 'be' triggers all my 'big endian' neurons to fire. Other than that, > looks good. Yep done. ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v7 07/12] dma-mapping: introduce dma_has_iommu()
On Mon, Oct 9, 2017 at 12:18 PM, Jason Gunthorpewrote: > On Mon, Oct 09, 2017 at 12:05:30PM -0700, Dan Williams wrote: >> On Mon, Oct 9, 2017 at 11:58 AM, Jason Gunthorpe >> wrote: >> > On Fri, Oct 06, 2017 at 03:35:54PM -0700, Dan Williams wrote: >> >> otherwise be quiesced. The need for this knowledge is driven by a need >> >> to make RDMA transfers to DAX mappings safe. If the DAX file's block map >> >> changes we need to be to reliably stop accesses to blocks that have been >> >> freed or re-assigned to a new file. >> > >> > If RDMA is driving this need, why not invalidate backing RDMA MRs >> > instead of requiring a IOMMU to do it? RDMA MR are finer grained and >> > do not suffer from the re-use problem David W. brought up with IOVAs.. >> >> Sounds promising. All I want in the end is to be sure that the kernel >> is enabled to stop any in-flight RDMA at will without asking >> userspace. Does this require per-RDMA driver opt-in or is there a >> common call that can be made? > > I don't think this has ever come up in the context of an all-device MR > invalidate requirement. Drivers already have code to invalidate > specifc MRs, but to find all MRs that touch certain pages and then > invalidate them would be new code. > > We also have ODP aware drivers that can retarget a MR to new > physical pages. If the block map changes DAX should synchronously > retarget the ODP MR, not halt DMA. Have a look at the patch [1], I don't touch the ODP path. > Most likely ODP & DAX would need to be used together to get robust > user applications, as having the user QP's go to an error state at > random times (due to DMA failures) during operation is never going to > be acceptable... It's not random. The process that set up the mapping and registered the memory gets SIGIO when someone else tries to modify the file map. That process then gets /proc/sys/fs/lease-break-time seconds to fix the problem before the kernel force revokes the DMA access. It's otherwise not acceptable to allow DMA into random locations when the file map changes. > Perhaps you might want to initially only support ODP MR mappings with > DAX and then the DMA fencing issue goes away? I'd rather try to fix the non-ODP DAX case instead of just turning it off. [1]: https://patchwork.kernel.org/patch/9991681/ ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v7 07/12] dma-mapping: introduce dma_has_iommu()
On Mon, Oct 09, 2017 at 12:05:30PM -0700, Dan Williams wrote: > On Mon, Oct 9, 2017 at 11:58 AM, Jason Gunthorpe >wrote: > > On Fri, Oct 06, 2017 at 03:35:54PM -0700, Dan Williams wrote: > >> otherwise be quiesced. The need for this knowledge is driven by a need > >> to make RDMA transfers to DAX mappings safe. If the DAX file's block map > >> changes we need to be to reliably stop accesses to blocks that have been > >> freed or re-assigned to a new file. > > > > If RDMA is driving this need, why not invalidate backing RDMA MRs > > instead of requiring a IOMMU to do it? RDMA MR are finer grained and > > do not suffer from the re-use problem David W. brought up with IOVAs.. > > Sounds promising. All I want in the end is to be sure that the kernel > is enabled to stop any in-flight RDMA at will without asking > userspace. Does this require per-RDMA driver opt-in or is there a > common call that can be made? I don't think this has ever come up in the context of an all-device MR invalidate requirement. Drivers already have code to invalidate specifc MRs, but to find all MRs that touch certain pages and then invalidate them would be new code. We also have ODP aware drivers that can retarget a MR to new physical pages. If the block map changes DAX should synchronously retarget the ODP MR, not halt DMA. Most likely ODP & DAX would need to be used together to get robust user applications, as having the user QP's go to an error state at random times (due to DMA failures) during operation is never going to be acceptable... Perhaps you might want to initially only support ODP MR mappings with DAX and then the DMA fencing issue goes away? Cheers, Jason ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v7 07/12] dma-mapping: introduce dma_has_iommu()
On Mon, Oct 9, 2017 at 11:58 AM, Jason Gunthorpewrote: > On Fri, Oct 06, 2017 at 03:35:54PM -0700, Dan Williams wrote: >> otherwise be quiesced. The need for this knowledge is driven by a need >> to make RDMA transfers to DAX mappings safe. If the DAX file's block map >> changes we need to be to reliably stop accesses to blocks that have been >> freed or re-assigned to a new file. > > If RDMA is driving this need, why not invalidate backing RDMA MRs > instead of requiring a IOMMU to do it? RDMA MR are finer grained and > do not suffer from the re-use problem David W. brought up with IOVAs.. Sounds promising. All I want in the end is to be sure that the kernel is enabled to stop any in-flight RDMA at will without asking userspace. Does this require per-RDMA driver opt-in or is there a common call that can be made? Outside of that the re-use problem is already solved by just unmapping (iommu_unmap()) the IOVA, but keeping it allocated until the eventual dma_unmap_sg() at memory un-registration time frees it. ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v7 07/12] dma-mapping: introduce dma_has_iommu()
On Fri, Oct 06, 2017 at 03:35:54PM -0700, Dan Williams wrote: > otherwise be quiesced. The need for this knowledge is driven by a need > to make RDMA transfers to DAX mappings safe. If the DAX file's block map > changes we need to be to reliably stop accesses to blocks that have been > freed or re-assigned to a new file. If RDMA is driving this need, why not invalidate backing RDMA MRs instead of requiring a IOMMU to do it? RDMA MR are finer grained and do not suffer from the re-use problem David W. brought up with IOVAs.. Jason ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v8] dma-mapping: introduce dma_get_iommu_domain()
On Mon, Oct 9, 2017 at 3:37 AM, Robin Murphywrote: > Hi Dan, > > On 08/10/17 04:45, Dan Williams wrote: >> Add a dma-mapping api helper to retrieve the generic iommu_domain for a >> device. >> The motivation for this interface is making RDMA transfers to DAX mappings >> safe. If the DAX file's block map changes we need to be to reliably stop >> accesses to blocks that have been freed or re-assigned to a new file. > > ...which is also going to require some way to force the IOMMU drivers > (on x86 at least) to do a fully-synchronous unmap, instead of just > throwing the IOVA onto a flush queue to invalidate the TLBs at some > point in the future. Isn't that the difference between iommu_unmap() and iommu_unmap_fast()? As far as I can tell amd-iommu and intel-iommu both flush iotlbs on iommu_unmap() and don't support fast unmaps. > Assuming of course that there's an IOMMU both > present and performing DMA translation in the first place. That's why I want to call through the dma api to see if the iommu is being used to satisfy dma mappings. >> With the >> iommu_domain and a callback from the DAX filesystem the kernel can safely >> revoke access to a DMA device. The process that performed the RDMA memory >> registration is also notified of this revocation event, but the kernel can >> not >> otherwise be in the position of waiting for userspace to quiesce the device. > > OK, but why reinvent iommu_get_domain_for_dev()? How do I know if the iommu returned from that routine is the one being used for dma mapping operations for the device? Specifically, how would I discover that the result of dma_map_sg() can be passed as an IOVA range to iommu_unmap()? >> Since PMEM+DAX is currently only enabled for x86, we only update the x86 >> iommu drivers. > > Note in particular that those two drivers happen to be the *only* place > this approach could work - everyone else is going to have to fall back > to the generic IOMMU API function anyway. I want to make this functionality generic, but I'm not familiar with the iommu sub-system. How are dma mapping operations routed to the iommu driver in those other imlementations? ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v7 09/12] xfs: wire up ->lease_direct()
On Sun, Oct 8, 2017 at 8:45 PM, Dave Chinnerwrote: > On Fri, Oct 06, 2017 at 03:36:06PM -0700, Dan Williams wrote: >> A 'lease_direct' lease requires that the vma have a valid MAP_DIRECT >> mapping established. For xfs we establish a new lease and then check if >> the MAP_DIRECT mapping has been broken. We want to be sure that the >> process will receive notification that the MAP_DIRECT mapping is being >> torn down so it knows why other code paths are throwing failures. >> >> For example in the RDMA/ibverbs case we want ibv_reg_mr() to fail if the >> MAP_DIRECT mapping is invalid or in the process of being invalidated. >> >> Cc: Jan Kara >> Cc: Jeff Moyer >> Cc: Christoph Hellwig >> Cc: Dave Chinner >> Cc: "Darrick J. Wong" >> Cc: Ross Zwisler >> Cc: Jeff Layton >> Cc: "J. Bruce Fields" >> Signed-off-by: Dan Williams >> --- >> fs/xfs/xfs_file.c | 28 >> 1 file changed, 28 insertions(+) >> >> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c >> index e35518600e28..823b65f17429 100644 >> --- a/fs/xfs/xfs_file.c >> +++ b/fs/xfs/xfs_file.c >> @@ -1166,6 +1166,33 @@ xfs_filemap_direct_close( >> put_map_direct_vma(vma->vm_private_data); >> } >> >> +static struct lease_direct * >> +xfs_filemap_direct_lease( >> + struct vm_area_struct *vma, >> + void(*break_fn)(void *), >> + void*owner) >> +{ >> + struct lease_direct *ld; >> + >> + ld = map_direct_lease(vma, break_fn, owner); >> + >> + if (IS_ERR(ld)) >> + return ld; >> + >> + /* >> + * We now have an established lease while the base MAP_DIRECT >> + * lease was not broken. So, we know that the "lease holder" will >> + * receive a SIGIO notification when the lease is broken and >> + * take any necessary cleanup actions. >> + */ >> + if (!is_map_direct_broken(vma->vm_private_data)) >> + return ld; >> + >> + map_direct_lease_destroy(ld); >> + >> + return ERR_PTR(-ENXIO); > > What's any of this got to do with XFS? Shouldn't it be in generic > code, and called generic_filemap_direct_lease()? True, I can move this to generic code. The filesystem is in charge of where it wants to store the 'struct map_direct_state' context, but for generic_filemap_direct_lease() it can just assume that it is stored in ->vm_private_data. I'll add comments to this effect on the new routine. ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v7 06/12] xfs: wire up MAP_DIRECT
On Sun, Oct 8, 2017 at 8:40 PM, Dave Chinnerwrote: Thanks for the review Dave. > On Fri, Oct 06, 2017 at 03:35:49PM -0700, Dan Williams wrote: >> MAP_DIRECT is an mmap(2) flag with the following semantics: >> >> MAP_DIRECT >> When specified with MAP_SHARED_VALIDATE, sets up a file lease with the >> same lifetime as the mapping. Unlike a typical F_RDLCK lease this lease >> is broken when a "lease breaker" attempts to write(2), change the block >> map (fallocate), or change the size of the file. Otherwise the mechanism >> of a lease break is identical to the typical lease break case where the >> lease needs to be removed (munmap) within the number of seconds >> specified by /proc/sys/fs/lease-break-time. If the lease holder fails to >> remove the lease in time the kernel will invalidate the mapping and >> force all future accesses to the mapping to trigger SIGBUS. >> >> In addition to lease break timeouts causing faults in the mapping to >> result in SIGBUS, other states of the file will trigger SIGBUS at fault >> time: >> >> * The file is not DAX capable >> * The file has reflinked (copy-on-write) blocks >> * The fault would trigger the filesystem to allocate blocks >> * The fault would trigger the filesystem to perform extent conversion >> >> In other words, MAP_DIRECT expects and enforces a fully allocated file >> where faults can be satisfied without modifying block map metadata. >> >> An unprivileged process may establish a MAP_DIRECT mapping on a file >> whose UID (owner) matches the filesystem UID of the process. A process >> with the CAP_LEASE capability may establish a MAP_DIRECT mapping on >> arbitrary files >> >> ERRORS >> EACCES Beyond the typical mmap(2) conditions that trigger EACCES >> MAP_DIRECT also requires the permission to set a file lease. >> >> EOPNOTSUPP The filesystem explicitly does not support the flag >> >> SIGBUS Attempted to write a MAP_DIRECT mapping at a file offset that >> might require block-map updates, or the lease timed out and the >> kernel invalidated the mapping. >> >> Cc: Jan Kara >> Cc: Arnd Bergmann >> Cc: Jeff Moyer >> Cc: Christoph Hellwig >> Cc: Dave Chinner >> Cc: Alexander Viro >> Cc: "Darrick J. Wong" >> Cc: Ross Zwisler >> Cc: Jeff Layton >> Cc: "J. Bruce Fields" >> Signed-off-by: Dan Williams >> --- >> fs/xfs/Kconfig |2 - >> fs/xfs/xfs_file.c | 102 >> +++ >> include/linux/mman.h|3 + >> include/uapi/asm-generic/mman.h |1 >> 4 files changed, 106 insertions(+), 2 deletions(-) >> >> diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig >> index f62fc6629abb..f8765653a438 100644 >> --- a/fs/xfs/Kconfig >> +++ b/fs/xfs/Kconfig >> @@ -112,4 +112,4 @@ config XFS_ASSERT_FATAL >> >> config XFS_LAYOUT >> def_bool y >> - depends on EXPORTFS_BLOCK_OPS >> + depends on EXPORTFS_BLOCK_OPS || FS_DAX >> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c >> index ebdd0bd2b261..e35518600e28 100644 >> --- a/fs/xfs/xfs_file.c >> +++ b/fs/xfs/xfs_file.c >> @@ -40,12 +40,22 @@ >> #include "xfs_iomap.h" >> #include "xfs_reflink.h" >> >> +#include >> #include >> #include >> #include >> +#include >> #include >> >> static const struct vm_operations_struct xfs_file_vm_ops; >> +static const struct vm_operations_struct xfs_file_vm_direct_ops; >> + >> +static inline bool >> +is_xfs_map_direct( >> + struct vm_area_struct *vma) >> +{ >> + return vma->vm_ops == _file_vm_direct_ops; >> +} > > Namespacing (xfs_vma_is_direct) and whitespace damage. Will fix. > >> >> /* >> * Clear the specified ranges to zero through either the pagecache or DAX. >> @@ -1008,6 +1018,26 @@ xfs_file_llseek( >> return vfs_setpos(file, offset, inode->i_sb->s_maxbytes); >> } >> >> +static int >> +xfs_vma_checks( >> + struct vm_area_struct *vma, >> + struct inode*inode) > > Exactly what are we checking for - function name doesn't tell me, > and there's no comments, either? Ok, I'll improve this. > >> +{ >> + if (!is_xfs_map_direct(vma)) >> + return 0; >> + >> + if (!is_map_direct_valid(vma->vm_private_data)) >> + return VM_FAULT_SIGBUS; >> + >> + if (xfs_is_reflink_inode(XFS_I(inode))) >> + return VM_FAULT_SIGBUS; >> + >> + if (!IS_DAX(inode)) >> + return VM_FAULT_SIGBUS; > > And how do we get is_xfs_map_direct() set to true if we don't have a > DAX inode or the inode has shared extents? So, this was my way of trying to satisfy the request you made here: https://lkml.org/lkml/2017/8/11/876 i.e. allow MAP_DIRECT on non-dax files
Re: [ndctl PATCH 5/8] ndctl: add an inject-error command
On Thu, Oct 5, 2017 at 6:54 PM, Vishal Vermawrote: > Add an inject-error command to ndctl. This uses the error injection DSMs > in ACPI6.2 to provide a generic error injection and management > interface. Once can inject errors, and view as well as clear injected > errors using these commands. > > Cc: Dan Williams > Signed-off-by: Vishal Verma > --- > Documentation/ndctl/Makefile.am| 1 + > Documentation/ndctl/ndctl-inject-error.txt | 108 + > Documentation/ndctl/ndctl.txt | 1 + > builtin.h | 1 + > contrib/ndctl | 5 +- > ndctl/Makefile.am | 3 +- > ndctl/inject-error.c | 745 > + > ndctl/libndctl-nfit.h | 8 + > ndctl/ndctl.c | 1 + > util/json.c| 26 + > util/json.h| 3 + > util/size.h| 1 + > 12 files changed, 901 insertions(+), 2 deletions(-) > create mode 100644 Documentation/ndctl/ndctl-inject-error.txt > create mode 100644 ndctl/inject-error.c > > diff --git a/Documentation/ndctl/Makefile.am b/Documentation/ndctl/Makefile.am > index 229d908..615baf0 100644 > --- a/Documentation/ndctl/Makefile.am > +++ b/Documentation/ndctl/Makefile.am > @@ -30,6 +30,7 @@ man1_MANS = \ > ndctl-create-namespace.1 \ > ndctl-destroy-namespace.1 \ > ndctl-check-namespace.1 \ > + ndctl-inject-error.1 \ > ndctl-list.1 > > CLEANFILES = $(man1_MANS) > diff --git a/Documentation/ndctl/ndctl-inject-error.txt > b/Documentation/ndctl/ndctl-inject-error.txt > new file mode 100644 > index 000..bd9e197 > --- /dev/null > +++ b/Documentation/ndctl/ndctl-inject-error.txt > @@ -0,0 +1,108 @@ > +ndctl-inject-error(1) > += > + > +NAME > + > +ndctl-inject-error - inject media errors at a namespace offset > + > +SYNOPSIS > + > +[verse] > +'ndctl inject-error' [] > + > +include::namespace-description.txt[] > + > +ndctl-inject-error can be used to ask the platform to simulate media errors > +in the nvdimm address space to aid debugging and development of features > +related to error handling. > + > +WARNING: These commands are DANGEROUS and can cause data loss. They are > +only provided for testing and debugging purposes. > + > +EXAMPLES > + > + > +Inject errors in namespace0.0 at sector 12 for a 2 sectors (i.e. 12, 13) > +[verse] > +ndctl inject-error --sector=12 --count=2 namespace0.0 > + > +Check status of injected errors on namespace0.0 > +[verse] > +ndctl inject-error --status namespacce0.0 > + > +Clear the injected errors at sector 12 for 2 sectors on namespace0.0 > +[verse] > +ndctl inject-error --clear --sector=12 --count=2 namespacce0.0 > + > +OPTIONS > +--- > +-S:: > +--sector=:: > + Namespace sector offset in 512 byte sized sectors where the error is > + to be injected. Let's use the term "block" instead of "sector" since the --media-error json in ndctl list reports bad 'blocks' and the kernel interfaces use 'block'. > + > + NOTE: The offset is interpreted in different ways based on the "mode" > + of the namespace. For "raw" mode, the offset is the base namespace > + offset. For "memory" mode (i.e. a "pfn" namespace), the offset is > + relative to the user-visible part of the namespace, and the offset > + introduced by the kernel's metadata will be accounted for. For a > + "sector" mode namespace (i.e. a "BTT" namespace), the offset is > + relative to the base namespace, as the BTT translation details are > + internal to the kernel, and can't be accounted for while injecting > + errors. > + > +-c:: > +--count=:: > + Number of sectors to inject as errors. This is also in terms of fixed, > + 512 byte sectors. > + > +-d:: > +--clear:: How about "--uninject"? > + This option will ask the platform to clear any injected errors for the > + specified sector offset, and count. > + > + WARNING: This will not clear the kernel's internal "badrange" and > + "badblock" tracking - those can only be cleared by doing a write to badrange is a kernel internal implementation detail. So we can just say "This will not clear the kernel's internal bad block tracking" > + the affected locations. Hence use the --clear option only if you know > + exactly what you are doing. For normal usage, injected errors should > + only be cleared by doing writes. Do not expect have the original data > + intact after injecting an error, and clearing it using --clear - it > + will be lost, as the only "real" way to clear the error location is > + to write to it or zero it (truncate/hole-punch). > + > +-t:: > +--status:: "--query"?
Re: [PATCH 4/4] nfit_test: when clearing poison, also remove badrange entries
On 10/05/2017 06:53 PM, Vishal Verma wrote: > The injected badrange entries can only be cleared from the kernel's > accounting by writing to the affected blocks, so when such a write sends > the clear error DSM to nfit_test, also clear the ranges from > nfit_test's badrange list. This lets an 'ARS Inject error status' DSM to > return the correct status, omitting the cleared ranges. > > Cc: Dave Jiang> Cc: Dan Williams > Signed-off-by: Vishal Verma Reviewed-by: Dave Jiang > --- > tools/testing/nvdimm/test/nfit.c | 12 > 1 file changed, 4 insertions(+), 8 deletions(-) > > diff --git a/tools/testing/nvdimm/test/nfit.c > b/tools/testing/nvdimm/test/nfit.c > index 43f948e..e400418 100644 > --- a/tools/testing/nvdimm/test/nfit.c > +++ b/tools/testing/nvdimm/test/nfit.c > @@ -344,7 +344,8 @@ static int nfit_test_cmd_ars_status(struct ars_state > *ars_state, > return 0; > } > > -static int nfit_test_cmd_clear_error(struct nd_cmd_clear_error *clear_err, > +static int nfit_test_cmd_clear_error(struct nfit_test *t, > + struct nd_cmd_clear_error *clear_err, > unsigned int buf_len, int *cmd_rc) > { > const u64 mask = NFIT_TEST_CLEAR_ERR_UNIT - 1; > @@ -354,12 +355,7 @@ static int nfit_test_cmd_clear_error(struct > nd_cmd_clear_error *clear_err, > if ((clear_err->address & mask) || (clear_err->length & mask)) > return -EINVAL; > > - /* > - * Report 'all clear' success for all commands even though a new > - * scrub will find errors again. This is enough to have the > - * error removed from the 'badblocks' tracking in the pmem > - * driver. > - */ > + badrange_forget(>badrange, clear_err->address, clear_err->length); > clear_err->status = 0; > clear_err->cleared = clear_err->length; > *cmd_rc = 0; > @@ -687,7 +683,7 @@ static int nfit_test_ctl(struct nvdimm_bus_descriptor > *nd_desc, > cmd_rc); > break; > case ND_CMD_CLEAR_ERROR: > - rc = nfit_test_cmd_clear_error(buf, buf_len, cmd_rc); > + rc = nfit_test_cmd_clear_error(t, buf, buf_len, cmd_rc); > break; > default: > return -ENOTTY; > ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [ndctl PATCH] ndctl, test: rdma vs dax
On Mon, Oct 9, 2017 at 1:07 AM, Johannes Thumshirnwrote: > On Sat, Oct 07, 2017 at 08:14:42AM -0700, Dan Williams wrote: > [...] > >> +rxe_cfg stop >> +rxe_cfg start >> +if ! rxe_cfg status | grep -n rxe0; then >> + rxe_cfg add eth0 >> +fi > > Can we maybe skip the dependency on rxe_cfg? All that is needed is modprobe > and echo. Sure, I'll take a look. > Also hard coding eth0 might be problematic in this case. This works > on your test-setup but surely isn't portable. Yes, which is part of the reason I have this listed under the "destructive" tests. Any advice on how to make it portable would be appreciated. ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [RFC 00/16] NOVA: a new file system for persistent memory
On Thu, Aug 3, 2017 at 9:48 AM, Steven Swansonwrote: > This is an RFC patch series that impements NOVA (NOn-Volatile memory > Accelerated file system), a new file system built for PMEM. Hi, Thanks for posting. I read the paper and the design looks nice. Then I looked at the patches, but could not find a place to start, nor something I could actually try out. So let me suggest some ways to make this more reviewer/tester friendly: 1) try starting with something very simple yet working and supporting the final layout - no optimizations (one big lock, no per-cpu data, rcu, numa, etc support) - no support for optional features (checksumming, NFS export, etc) - missing mandatory features (e.g. just readdir and getattr support) - try and get it down to <5k lines, preferably 2-3k 2) pointer to sources and instructions for trying it out without special hardware 3) build on this minimal working version by - adding mandatory features - then adding optimizations 4) each patch should leave the tree in a compiling and working state but should be small and easily reviewed 5) leave optional features and unimportant optimizations for a later submission; try to make the patchset as small as you meaningfully can (i.e. it should be fully working and demonstrate the capabilities and performance, but nothing more). Thanks, Miklos ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH] Fix mpage_writepage() for pages with buffers
Acked-by: Johannes Thumshirn-- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850 ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v8] dma-mapping: introduce dma_get_iommu_domain()
Hi Dan, On 08/10/17 04:45, Dan Williams wrote: > Add a dma-mapping api helper to retrieve the generic iommu_domain for a > device. > The motivation for this interface is making RDMA transfers to DAX mappings > safe. If the DAX file's block map changes we need to be to reliably stop > accesses to blocks that have been freed or re-assigned to a new file. ...which is also going to require some way to force the IOMMU drivers (on x86 at least) to do a fully-synchronous unmap, instead of just throwing the IOVA onto a flush queue to invalidate the TLBs at some point in the future. Assuming of course that there's an IOMMU both present and performing DMA translation in the first place. > With the > iommu_domain and a callback from the DAX filesystem the kernel can safely > revoke access to a DMA device. The process that performed the RDMA memory > registration is also notified of this revocation event, but the kernel can not > otherwise be in the position of waiting for userspace to quiesce the device. OK, but why reinvent iommu_get_domain_for_dev()? > Since PMEM+DAX is currently only enabled for x86, we only update the x86 > iommu drivers. Note in particular that those two drivers happen to be the *only* place this approach could work - everyone else is going to have to fall back to the generic IOMMU API function anyway. Robin. > Cc: Marek Szyprowski> Cc: Robin Murphy > Cc: Greg Kroah-Hartman > Cc: Joerg Roedel > Cc: David Woodhouse > Cc: Ashok Raj > Cc: Jan Kara > Cc: Jeff Moyer > Cc: Christoph Hellwig > Cc: Dave Chinner > Cc: "Darrick J. Wong" > Cc: Ross Zwisler > Signed-off-by: Dan Williams > --- > Changes since v7: > * retrieve the iommu_domain so that we can later pass the results of > dma_map_* to iommu_unmap() in advance of the actual dma_unmap_*. > > drivers/base/dma-mapping.c | 10 ++ > drivers/iommu/amd_iommu.c | 10 ++ > drivers/iommu/intel-iommu.c | 15 +++ > include/linux/dma-mapping.h |3 +++ > 4 files changed, 38 insertions(+) > > diff --git a/drivers/base/dma-mapping.c b/drivers/base/dma-mapping.c > index e584eddef0a7..fdb9764f95a4 100644 > --- a/drivers/base/dma-mapping.c > +++ b/drivers/base/dma-mapping.c > @@ -369,3 +369,13 @@ void dma_deconfigure(struct device *dev) > of_dma_deconfigure(dev); > acpi_dma_deconfigure(dev); > } > + > +struct iommu_domain *dma_get_iommu_domain(struct device *dev) > +{ > + const struct dma_map_ops *ops = get_dma_ops(dev); > + > + if (ops && ops->get_iommu) > + return ops->get_iommu(dev); > + return NULL; > +} > +EXPORT_SYMBOL(dma_get_iommu_domain); > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c > index 51f8215877f5..c8e1a45af182 100644 > --- a/drivers/iommu/amd_iommu.c > +++ b/drivers/iommu/amd_iommu.c > @@ -2271,6 +2271,15 @@ static struct protection_domain *get_domain(struct > device *dev) > return domain; > } > > +static struct iommu_domain *amd_dma_get_iommu(struct device *dev) > +{ > + struct protection_domain *domain = get_domain(dev); > + > + if (IS_ERR(domain)) > + return NULL; > + return >domain; > +} > + > static void update_device_table(struct protection_domain *domain) > { > struct iommu_dev_data *dev_data; > @@ -2689,6 +2698,7 @@ static const struct dma_map_ops amd_iommu_dma_ops = { > .unmap_sg = unmap_sg, > .dma_supported = amd_iommu_dma_supported, > .mapping_error = amd_iommu_mapping_error, > + .get_iommu = amd_dma_get_iommu, > }; > > static int init_reserved_iova_ranges(void) > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > index 6784a05dd6b2..f3f4939cebad 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -3578,6 +3578,20 @@ static int iommu_no_mapping(struct device *dev) > return 0; > } > > +static struct iommu_domain *intel_dma_get_iommu(struct device *dev) > +{ > + struct dmar_domain *domain; > + > + if (iommu_no_mapping(dev)) > + return NULL; > + > + domain = get_valid_domain_for_dev(dev); > + if (!domain) > + return NULL; > + > + return >domain; > +} > + > static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr, >size_t size, int dir, u64 dma_mask) > { > @@ -3872,6 +3886,7 @@ const struct dma_map_ops intel_dma_ops = { > .map_page = intel_map_page, > .unmap_page = intel_unmap_page, > .mapping_error = intel_mapping_error, > + .get_iommu = intel_dma_get_iommu, > #ifdef CONFIG_X86 > .dma_supported = x86_dma_supported, > #endif > diff --git a/include/linux/dma-mapping.h
Re: [ndctl PATCH] ndctl, test: rdma vs dax
On Sat, Oct 07, 2017 at 08:14:42AM -0700, Dan Williams wrote: [...] > +rxe_cfg stop > +rxe_cfg start > +if ! rxe_cfg status | grep -n rxe0; then > + rxe_cfg add eth0 > +fi Can we maybe skip the dependency on rxe_cfg? All that is needed is modprobe and echo. Also hard coding eth0 might be problematic in this case. This works on your test-setup but surely isn't portable. Byte, Johannes -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850 ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Returned mail: see transcript for details
The original message was received at Mon, 9 Oct 2017 15:19:27 +0800 from lists.01.org [150.243.241.146] - The following addresses had permanent fatal errors -___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm