Re: [ndctl PATCH 6/8] ndctl/test: add a new unit test for inject-error

2017-10-09 Thread Verma, Vishal L
On Thu, 2017-10-05 at 19:54 -0600, Vishal Verma wrote:
> +do_tests()
> +{
> +   # TODO
> +   return
> +}

Not sure how this happened, but I managed to include a stale version of
this file or something, that only has the boilerplate. I'll fix up for
the next rev.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v7 06/12] xfs: wire up MAP_DIRECT

2017-10-09 Thread Dave Chinner
On Mon, Oct 09, 2017 at 10:08:40AM -0700, Dan Williams wrote:
> On Sun, Oct 8, 2017 at 8:40 PM, Dave Chinner  wrote:
> >>
> >>  /*
> >>   * Clear the specified ranges to zero through either the pagecache or DAX.
> >> @@ -1008,6 +1018,26 @@ xfs_file_llseek(
> >>   return vfs_setpos(file, offset, inode->i_sb->s_maxbytes);
> >>  }
> >>
> >> +static int
> >> +xfs_vma_checks(
> >> + struct vm_area_struct   *vma,
> >> + struct inode*inode)
> >
> > Exactly what are we checking for - function name doesn't tell me,
> > and there's no comments, either?
> 
> Ok, I'll improve this.
> 
> >
> >> +{
> >> + if (!is_xfs_map_direct(vma))
> >> + return 0;
> >> +
> >> + if (!is_map_direct_valid(vma->vm_private_data))
> >> + return VM_FAULT_SIGBUS;
> >> +
> >> + if (xfs_is_reflink_inode(XFS_I(inode)))
> >> + return VM_FAULT_SIGBUS;
> >> +
> >> + if (!IS_DAX(inode))
> >> + return VM_FAULT_SIGBUS;
> >
> > And how do we get is_xfs_map_direct() set to true if we don't have a
> > DAX inode or the inode has shared extents?
> 
> So, this was my way of trying to satisfy the request you made here:
> 
> https://lkml.org/lkml/2017/8/11/876
> 
> i.e. allow MAP_DIRECT on non-dax files to enable a use case of
> freezing the block-map to examine which file extents are linked. If
> you don't want to use MAP_DIRECT for this, we can move these checks to
> mmap time.

Ok, but I don't want to use mmap to deal with this, nor do I care
whether DAX is in use or not. So I don't think this is really
necessary for MAP_DIRECT.


> >> +xfs_file_mmap_validate(
> >> + struct file *filp,
> >> + struct vm_area_struct   *vma,
> >> + unsigned long   map_flags,
> >> + int fd)
> >> +{
> >> + struct inode*inode = file_inode(filp);
> >> + struct xfs_inode*ip = XFS_I(inode);
> >> + struct map_direct_state *mds;
> >> +
> >> + if (map_flags & ~(XFS_MAP_SUPPORTED))
> >> + return -EOPNOTSUPP;
> >> +
> >> + if ((map_flags & MAP_DIRECT) == 0)
> >> + return xfs_file_mmap(filp, vma);
> >> +
> >> + file_accessed(filp);
> >> + vma->vm_ops = _file_vm_direct_ops;
> >> + if (IS_DAX(inode))
> >> + vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE;
> >
> > And if it isn't a DAX inode? what is MAP_DIRECT supposed to do then?
> 
> In the non-DAX case it just takes the FL_LAYOUT file lease... although
> we could also just have an fcntl for that purpose. The use case of
> just freezing the block map does not need a mapping.

RIght, so I think we should just add a fcntl for the non-DAX case I
have in mind, and not complicate the MAP_DIRECT implementation right
now.  We can alsways extend the scope of MAP_DIRECT in future if we
actually need to do so.

> >> + mds = map_direct_register(fd, vma);
> >> + if (IS_ERR(mds))
> >> + return PTR_ERR(mds);
> >> +
> >> + /* flush in-flight faults */
> >> + xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
> >> + xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
> >
> > Urk. That's nasty. And why is it even necessary? Please explain why
> > this is necessary in the comment, because it's not at all obvious to
> > me...
> 
> This is related to your other observation about i_mapdcount and adding
> an iomap_can_allocate() helper. I think I can clean both of these up
> by using a call to break_layout(inode, false) and bailing in
> ->iomap_begin() if it returns EWOULDBLOCK. This would also fix the
> current problem that allocating write-faults don't start the lease
> break process.

OK.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v3 2/4] nfit_test: add error injection DSMs

2017-10-09 Thread Vishal Verma
From: Dave Jiang 

Add nfit_test emulation for the new ACPI 6.2 error injectino DSMs.
This will allow unit tests to selectively inject the errors they wish to
test for.

Signed-off-by: Dave Jiang 
[vishal: Move injection functions to ND_CMD_CALL]
[vishal: Add support for the notification option]
[vishal: move an nfit_test private definition into a local header]
Signed-off-by: Vishal Verma 
---
 tools/testing/nvdimm/test/nfit.c  | 187 +-
 tools/testing/nvdimm/test/nfit_test.h |   5 +
 2 files changed, 168 insertions(+), 24 deletions(-)

diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index 435d298..43f948e 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -168,8 +168,12 @@ struct nfit_test {
spinlock_t lock;
} ars_state;
struct device *dimm_dev[NUM_DCR];
+   struct badrange badrange;
+   struct work_struct work;
 };
 
+static struct workqueue_struct *nfit_wq;
+
 static struct nfit_test *to_nfit_test(struct device *dev)
 {
struct platform_device *pdev = to_platform_device(dev);
@@ -234,48 +238,68 @@ static int nfit_test_cmd_set_config_data(struct 
nd_cmd_set_config_hdr *nd_cmd,
return rc;
 }
 
-#define NFIT_TEST_ARS_RECORDS 4
 #define NFIT_TEST_CLEAR_ERR_UNIT 256
 
 static int nfit_test_cmd_ars_cap(struct nd_cmd_ars_cap *nd_cmd,
unsigned int buf_len)
 {
+   int ars_recs;
+
if (buf_len < sizeof(*nd_cmd))
return -EINVAL;
 
+   /* for testing, only store up to n records fit withint 4k */
+   ars_recs = SZ_4K / sizeof(struct nd_ars_record);
+
nd_cmd->max_ars_out = sizeof(struct nd_cmd_ars_status)
-   + NFIT_TEST_ARS_RECORDS * sizeof(struct nd_ars_record);
+   + ars_recs * sizeof(struct nd_ars_record);
nd_cmd->status = (ND_ARS_PERSISTENT | ND_ARS_VOLATILE) << 16;
nd_cmd->clear_err_unit = NFIT_TEST_CLEAR_ERR_UNIT;
 
return 0;
 }
 
-/*
- * Initialize the ars_state to return an ars_result 1 second in the future with
- * a 4K error range in the middle of the requested address range.
- */
-static void post_ars_status(struct ars_state *ars_state, u64 addr, u64 len)
+static void post_ars_status(struct ars_state *ars_state,
+   struct badrange *badrange, u64 addr, u64 len)
 {
struct nd_cmd_ars_status *ars_status;
struct nd_ars_record *ars_record;
+   struct badrange_entry *be;
+   u64 end = addr + len - 1;
+   int i = 0;
 
ars_state->deadline = jiffies + 1*HZ;
ars_status = ars_state->ars_status;
ars_status->status = 0;
-   ars_status->out_length = sizeof(struct nd_cmd_ars_status)
-   + sizeof(struct nd_ars_record);
ars_status->address = addr;
ars_status->length = len;
ars_status->type = ND_ARS_PERSISTENT;
-   ars_status->num_records = 1;
-   ars_record = _status->records[0];
-   ars_record->handle = 0;
-   ars_record->err_address = addr + len / 2;
-   ars_record->length = SZ_4K;
+
+   spin_lock(>lock);
+   list_for_each_entry(be, >list, list) {
+   u64 be_end = be->start + be->length - 1;
+   u64 rstart, rend;
+
+   /* skip entries outside the range */
+   if (be_end < addr || be->start > end)
+   continue;
+
+   rstart = (be->start < addr) ? addr : be->start;
+   rend = (be_end < end) ? be_end : end;
+   ars_record = _status->records[i];
+   ars_record->handle = 0;
+   ars_record->err_address = rstart;
+   ars_record->length = rend - rstart + 1;
+   i++;
+   }
+   spin_unlock(>lock);
+   ars_status->num_records = i;
+   ars_status->out_length = sizeof(struct nd_cmd_ars_status)
+   + i * sizeof(struct nd_ars_record);
 }
 
-static int nfit_test_cmd_ars_start(struct ars_state *ars_state,
+static int nfit_test_cmd_ars_start(struct nfit_test *t,
+   struct ars_state *ars_state,
struct nd_cmd_ars_start *ars_start, unsigned int buf_len,
int *cmd_rc)
 {
@@ -289,7 +313,7 @@ static int nfit_test_cmd_ars_start(struct ars_state 
*ars_state,
} else {
ars_start->status = 0;
ars_start->scrub_time = 1;
-   post_ars_status(ars_state, ars_start->address,
+   post_ars_status(ars_state, >badrange, ars_start->address,
ars_start->length);
*cmd_rc = 0;
}
@@ -456,6 +480,93 @@ static int nfit_test_cmd_smart_threshold(struct 
nd_cmd_smart_threshold *smart_t,
return 0;
 }
 
+static void uc_error_notify(struct work_struct *work)
+{
+   struct nfit_test *t = container_of(work, typeof(*t), work);
+
+   __acpi_nfit_notify(>pdev.dev, 

[PATCH v3 1/4] libnvdimm: move poison list functions to a new 'badrange' file

2017-10-09 Thread Vishal Verma
From: Dave Jiang 

nfit_test needs to use the poison list manipulation code as well. Make
it more generic and in the process rename poison to badrange, and move
all the related helpers to a new file.

Signed-off-by: Dave Jiang 
[vishal: Add badrange.o to nfit_test's Kbuild]
[vishal: add a missed include in bus.c for the new badrange functions]
[vishal: rename all instances of 'be' to 'bre']
Signed-off-by: Vishal Verma 
---
 drivers/acpi/nfit/core.c|   2 +-
 drivers/acpi/nfit/mce.c |   2 +-
 drivers/nvdimm/Makefile |   1 +
 drivers/nvdimm/badrange.c   | 294 
 drivers/nvdimm/bus.c|  24 ++--
 drivers/nvdimm/core.c   | 260 +--
 drivers/nvdimm/nd-core.h|   3 +-
 drivers/nvdimm/nd.h |   6 -
 include/linux/libnvdimm.h   |  21 +++-
 tools/testing/nvdimm/Kbuild |   1 +
 10 files changed, 332 insertions(+), 282 deletions(-)
 create mode 100644 drivers/nvdimm/badrange.c

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index a3ecd5e..4b157f8 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2240,7 +2240,7 @@ static int ars_status_process_records(struct 
acpi_nfit_desc *acpi_desc,
if (ars_status->out_length
< 44 + sizeof(struct nd_ars_record) * (i + 1))
break;
-   rc = nvdimm_bus_add_poison(nvdimm_bus,
+   rc = nvdimm_bus_add_badrange(nvdimm_bus,
ars_status->records[i].err_address,
ars_status->records[i].length);
if (rc)
diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
index feeb95d..b929214 100644
--- a/drivers/acpi/nfit/mce.c
+++ b/drivers/acpi/nfit/mce.c
@@ -67,7 +67,7 @@ static int nfit_handle_mce(struct notifier_block *nb, 
unsigned long val,
continue;
 
/* If this fails due to an -ENOMEM, there is little we can do */
-   nvdimm_bus_add_poison(acpi_desc->nvdimm_bus,
+   nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus,
ALIGN(mce->addr, L1_CACHE_BYTES),
L1_CACHE_BYTES);
nvdimm_region_notify(nfit_spa->nd_region,
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 909554c..ca6d325 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -20,6 +20,7 @@ libnvdimm-y += region_devs.o
 libnvdimm-y += region.o
 libnvdimm-y += namespace_devs.o
 libnvdimm-y += label.o
+libnvdimm-y += badrange.o
 libnvdimm-$(CONFIG_ND_CLAIM) += claim.o
 libnvdimm-$(CONFIG_BTT) += btt_devs.o
 libnvdimm-$(CONFIG_NVDIMM_PFN) += pfn_devs.o
diff --git a/drivers/nvdimm/badrange.c b/drivers/nvdimm/badrange.c
new file mode 100644
index 000..0b67dcf
--- /dev/null
+++ b/drivers/nvdimm/badrange.c
@@ -0,0 +1,294 @@
+/*
+ * Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "nd-core.h"
+#include "nd.h"
+
+void badrange_init(struct badrange *badrange)
+{
+   INIT_LIST_HEAD(>list);
+   spin_lock_init(>lock);
+}
+EXPORT_SYMBOL_GPL(badrange_init);
+
+static void append_badrange_entry(struct badrange *badrange,
+   struct badrange_entry *bre, u64 addr, u64 length)
+{
+   lockdep_assert_held(>lock);
+   bre->start = addr;
+   bre->length = length;
+   list_add_tail(>list, >list);
+}
+
+static int alloc_and_append_badrange_entry(struct badrange *badrange,
+   u64 addr, u64 length, gfp_t flags)
+{
+   struct badrange_entry *bre;
+
+   bre = kzalloc(sizeof(*bre), flags);
+   if (!bre)
+   return -ENOMEM;
+
+   append_badrange_entry(badrange, bre, addr, length);
+   return 0;
+}
+
+static int add_badrange(struct badrange *badrange, u64 addr, u64 length)
+{
+   struct badrange_entry *bre, *bre_new;
+
+   spin_unlock(>lock);
+   bre_new = kzalloc(sizeof(*bre_new), GFP_KERNEL);
+   spin_lock(>lock);
+
+   if (list_empty(>list)) {
+   if (!bre_new)
+   return -ENOMEM;
+   append_badrange_entry(badrange, bre_new, addr, length);
+   return 0;
+   }
+
+   /*
+* There is a chance this is a duplicate, check for those first.
+* This 

[PATCH v3 4/4] nfit_test: when clearing poison, also remove badrange entries

2017-10-09 Thread Vishal Verma
The injected badrange entries can only be cleared from the kernel's
accounting by writing to the affected blocks, so when such a write sends
the clear errror DSM to nfit_test, also clear the ranges from
nfit_test's badrange list. This lets an 'ARS Inject error status' DSM to
return the correct status, omitting the cleared ranges.

Cc: Dave Jiang 
Cc: Dan Williams 
Signed-off-by: Vishal Verma 
Reviewed-by: Dave Jiang 
---
 tools/testing/nvdimm/test/nfit.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index 43f948e..e400418 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -344,7 +344,8 @@ static int nfit_test_cmd_ars_status(struct ars_state 
*ars_state,
return 0;
 }
 
-static int nfit_test_cmd_clear_error(struct nd_cmd_clear_error *clear_err,
+static int nfit_test_cmd_clear_error(struct nfit_test *t,
+   struct nd_cmd_clear_error *clear_err,
unsigned int buf_len, int *cmd_rc)
 {
const u64 mask = NFIT_TEST_CLEAR_ERR_UNIT - 1;
@@ -354,12 +355,7 @@ static int nfit_test_cmd_clear_error(struct 
nd_cmd_clear_error *clear_err,
if ((clear_err->address & mask) || (clear_err->length & mask))
return -EINVAL;
 
-   /*
-* Report 'all clear' success for all commands even though a new
-* scrub will find errors again.  This is enough to have the
-* error removed from the 'badblocks' tracking in the pmem
-* driver.
-*/
+   badrange_forget(>badrange, clear_err->address, clear_err->length);
clear_err->status = 0;
clear_err->cleared = clear_err->length;
*cmd_rc = 0;
@@ -687,7 +683,7 @@ static int nfit_test_ctl(struct nvdimm_bus_descriptor 
*nd_desc,
cmd_rc);
break;
case ND_CMD_CLEAR_ERROR:
-   rc = nfit_test_cmd_clear_error(buf, buf_len, cmd_rc);
+   rc = nfit_test_cmd_clear_error(t, buf, buf_len, cmd_rc);
break;
default:
return -ENOTTY;
-- 
2.9.5

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v3 0/4] add error injection commands to nfit_test

2017-10-09 Thread Vishal Verma
v3:
  patch 1,2:
- move the nfit_test kbuild update for badrange to patch 1 (Dan)

v2:
  patch 1:
- change all instances of 'be' to 'bre' to avoid confusion with
  big endian (Dan)
  patch 2:
- move an injection related define to a local nfit_test header
  since it is not used outside of nfit_test (Dan)

These patches add error injection support to nfit_test by emulating the
ACPI6.2 ARS error injection commands. The commands are sent via the
ND_CMD_CALL interface, so only nfit_test knows of the various
definitions related to this.

Note that this patch set will break ndctl unit tests unless the ndctl
patches for error injection are also applied.

Dave Jiang (2):
  libnvdimm: move poison list functions to a new 'badrange' file
  nfit_test: add error injection DSMs

Vishal Verma (2):
  libnvdimm, badrange: remove a WARN for list_empty
  nfit_test: when clearing poison, also remove badrange entries

 drivers/acpi/nfit/core.c  |   2 +-
 drivers/acpi/nfit/mce.c   |   2 +-
 drivers/nvdimm/Makefile   |   1 +
 drivers/nvdimm/badrange.c | 293 ++
 drivers/nvdimm/bus.c  |  24 +--
 drivers/nvdimm/core.c | 260 +-
 drivers/nvdimm/nd-core.h  |   3 +-
 drivers/nvdimm/nd.h   |   6 -
 include/linux/libnvdimm.h |  21 ++-
 tools/testing/nvdimm/Kbuild   |   1 +
 tools/testing/nvdimm/test/nfit.c  | 199 +++
 tools/testing/nvdimm/test/nfit_test.h |   5 +
 12 files changed, 503 insertions(+), 314 deletions(-)
 create mode 100644 drivers/nvdimm/badrange.c

-- 
2.9.5

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v3 3/4] libnvdimm, badrange: remove a WARN for list_empty

2017-10-09 Thread Vishal Verma
Now that we're reusing the badrange functions for nfit_test, and that
exposes badrange injection/clearing to userspace via the DSM paths, it
is plausible that a user may call the clear DSM multiple times. Since it
is harmless to do so, we can remove the WARN in badrange_forget.

Cc: Dave Jiang 
Cc: Dan Williams 
Signed-off-by: Vishal Verma 
---
 drivers/nvdimm/badrange.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/nvdimm/badrange.c b/drivers/nvdimm/badrange.c
index 0b67dcf..e068d72 100644
--- a/drivers/nvdimm/badrange.c
+++ b/drivers/nvdimm/badrange.c
@@ -114,7 +114,6 @@ void badrange_forget(struct badrange *badrange, phys_addr_t 
start,
struct badrange_entry *bre, *next;
 
spin_lock(>lock);
-   WARN_ON_ONCE(list_empty(badrange_list));
 
/*
 * [start, clr_end] is the badrange interval being cleared.
-- 
2.9.5

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v2 2/4] nfit_test: add error injection DSMs

2017-10-09 Thread Vishal Verma
From: Dave Jiang 

Add nfit_test emulation for the new ACPI 6.2 error injectino DSMs.
This will allow unit tests to selectively inject the errors they wish to
test for.

Signed-off-by: Dave Jiang 
[vishal: Add badrange.o to nfit_test's Kbuild]
[vishal: Move injection functions to ND_CMD_CALL]
[vishal: Add support for the notification option]
[vishal: move an nfit_test private definition into a local header]
Signed-off-by: Vishal Verma 
---
 tools/testing/nvdimm/Kbuild   |   1 +
 tools/testing/nvdimm/test/nfit.c  | 187 +-
 tools/testing/nvdimm/test/nfit_test.h |   5 +
 3 files changed, 169 insertions(+), 24 deletions(-)

diff --git a/tools/testing/nvdimm/Kbuild b/tools/testing/nvdimm/Kbuild
index d870520..3272ab5 100644
--- a/tools/testing/nvdimm/Kbuild
+++ b/tools/testing/nvdimm/Kbuild
@@ -69,6 +69,7 @@ libnvdimm-y += $(NVDIMM_SRC)/region_devs.o
 libnvdimm-y += $(NVDIMM_SRC)/region.o
 libnvdimm-y += $(NVDIMM_SRC)/namespace_devs.o
 libnvdimm-y += $(NVDIMM_SRC)/label.o
+libnvdimm-y += $(NVDIMM_SRC)/badrange.o
 libnvdimm-$(CONFIG_ND_CLAIM) += $(NVDIMM_SRC)/claim.o
 libnvdimm-$(CONFIG_BTT) += $(NVDIMM_SRC)/btt_devs.o
 libnvdimm-$(CONFIG_NVDIMM_PFN) += $(NVDIMM_SRC)/pfn_devs.o
diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index 435d298..43f948e 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -168,8 +168,12 @@ struct nfit_test {
spinlock_t lock;
} ars_state;
struct device *dimm_dev[NUM_DCR];
+   struct badrange badrange;
+   struct work_struct work;
 };
 
+static struct workqueue_struct *nfit_wq;
+
 static struct nfit_test *to_nfit_test(struct device *dev)
 {
struct platform_device *pdev = to_platform_device(dev);
@@ -234,48 +238,68 @@ static int nfit_test_cmd_set_config_data(struct 
nd_cmd_set_config_hdr *nd_cmd,
return rc;
 }
 
-#define NFIT_TEST_ARS_RECORDS 4
 #define NFIT_TEST_CLEAR_ERR_UNIT 256
 
 static int nfit_test_cmd_ars_cap(struct nd_cmd_ars_cap *nd_cmd,
unsigned int buf_len)
 {
+   int ars_recs;
+
if (buf_len < sizeof(*nd_cmd))
return -EINVAL;
 
+   /* for testing, only store up to n records fit withint 4k */
+   ars_recs = SZ_4K / sizeof(struct nd_ars_record);
+
nd_cmd->max_ars_out = sizeof(struct nd_cmd_ars_status)
-   + NFIT_TEST_ARS_RECORDS * sizeof(struct nd_ars_record);
+   + ars_recs * sizeof(struct nd_ars_record);
nd_cmd->status = (ND_ARS_PERSISTENT | ND_ARS_VOLATILE) << 16;
nd_cmd->clear_err_unit = NFIT_TEST_CLEAR_ERR_UNIT;
 
return 0;
 }
 
-/*
- * Initialize the ars_state to return an ars_result 1 second in the future with
- * a 4K error range in the middle of the requested address range.
- */
-static void post_ars_status(struct ars_state *ars_state, u64 addr, u64 len)
+static void post_ars_status(struct ars_state *ars_state,
+   struct badrange *badrange, u64 addr, u64 len)
 {
struct nd_cmd_ars_status *ars_status;
struct nd_ars_record *ars_record;
+   struct badrange_entry *be;
+   u64 end = addr + len - 1;
+   int i = 0;
 
ars_state->deadline = jiffies + 1*HZ;
ars_status = ars_state->ars_status;
ars_status->status = 0;
-   ars_status->out_length = sizeof(struct nd_cmd_ars_status)
-   + sizeof(struct nd_ars_record);
ars_status->address = addr;
ars_status->length = len;
ars_status->type = ND_ARS_PERSISTENT;
-   ars_status->num_records = 1;
-   ars_record = _status->records[0];
-   ars_record->handle = 0;
-   ars_record->err_address = addr + len / 2;
-   ars_record->length = SZ_4K;
+
+   spin_lock(>lock);
+   list_for_each_entry(be, >list, list) {
+   u64 be_end = be->start + be->length - 1;
+   u64 rstart, rend;
+
+   /* skip entries outside the range */
+   if (be_end < addr || be->start > end)
+   continue;
+
+   rstart = (be->start < addr) ? addr : be->start;
+   rend = (be_end < end) ? be_end : end;
+   ars_record = _status->records[i];
+   ars_record->handle = 0;
+   ars_record->err_address = rstart;
+   ars_record->length = rend - rstart + 1;
+   i++;
+   }
+   spin_unlock(>lock);
+   ars_status->num_records = i;
+   ars_status->out_length = sizeof(struct nd_cmd_ars_status)
+   + i * sizeof(struct nd_ars_record);
 }
 
-static int nfit_test_cmd_ars_start(struct ars_state *ars_state,
+static int nfit_test_cmd_ars_start(struct nfit_test *t,
+   struct ars_state *ars_state,
struct nd_cmd_ars_start *ars_start, unsigned int buf_len,
int *cmd_rc)
 {
@@ -289,7 +313,7 @@ static int 

[PATCH v2 0/4] add error injection commands to nfit_test

2017-10-09 Thread Vishal Verma
v2:
  patch 1:
- change all instances of 'be' to 'bre' to avoid confusion with
  big endian (Dan)
  patch 2:
- move an injection related define to a local nfit_test header
  since it is not used outside of nfit_test (Dan)

These patches add error injection support to nfit_test by emulating the
ACPI6.2 ARS error injection commands. The commands are sent via the
ND_CMD_CALL interface, so only nfit_test knows of the various
definitions related to this.

Note that this patch set will break ndctl unit tests unless the ndctl
patches for error injection are also applied.

Dave Jiang (2):
  libnvdimm: move poison list functions to a new 'badrange' file
  nfit_test: add error injection DSMs

Vishal Verma (2):
  libnvdimm, badrange: remove a WARN for list_empty
  nfit_test: when clearing poison, also remove badrange entries

 drivers/acpi/nfit/core.c  |   2 +-
 drivers/acpi/nfit/mce.c   |   2 +-
 drivers/nvdimm/Makefile   |   1 +
 drivers/nvdimm/badrange.c | 293 ++
 drivers/nvdimm/bus.c  |  24 +--
 drivers/nvdimm/core.c | 260 +-
 drivers/nvdimm/nd-core.h  |   3 +-
 drivers/nvdimm/nd.h   |   6 -
 include/linux/libnvdimm.h |  21 ++-
 tools/testing/nvdimm/Kbuild   |   1 +
 tools/testing/nvdimm/test/nfit.c  | 199 +++
 tools/testing/nvdimm/test/nfit_test.h |   5 +
 12 files changed, 503 insertions(+), 314 deletions(-)
 create mode 100644 drivers/nvdimm/badrange.c

-- 
2.9.5

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v2 1/4] libnvdimm: move poison list functions to a new 'badrange' file

2017-10-09 Thread Vishal Verma
From: Dave Jiang 

nfit_test needs to use the poison list manipulation code as well. Make
it more generic and in the process rename poison to badrange, and move
all the related helpers to a new file.

Signed-off-by: Dave Jiang 
[vishal: add a missed include in bus.c for the new badrange functions]
[vishal: rename all instances of 'be' to 'bre']
Signed-off-by: Vishal Verma 
---
 drivers/acpi/nfit/core.c  |   2 +-
 drivers/acpi/nfit/mce.c   |   2 +-
 drivers/nvdimm/Makefile   |   1 +
 drivers/nvdimm/badrange.c | 294 ++
 drivers/nvdimm/bus.c  |  24 ++--
 drivers/nvdimm/core.c | 260 +---
 drivers/nvdimm/nd-core.h  |   3 +-
 drivers/nvdimm/nd.h   |   6 -
 include/linux/libnvdimm.h |  21 +++-
 9 files changed, 331 insertions(+), 282 deletions(-)
 create mode 100644 drivers/nvdimm/badrange.c

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index a3ecd5e..4b157f8 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2240,7 +2240,7 @@ static int ars_status_process_records(struct 
acpi_nfit_desc *acpi_desc,
if (ars_status->out_length
< 44 + sizeof(struct nd_ars_record) * (i + 1))
break;
-   rc = nvdimm_bus_add_poison(nvdimm_bus,
+   rc = nvdimm_bus_add_badrange(nvdimm_bus,
ars_status->records[i].err_address,
ars_status->records[i].length);
if (rc)
diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
index feeb95d..b929214 100644
--- a/drivers/acpi/nfit/mce.c
+++ b/drivers/acpi/nfit/mce.c
@@ -67,7 +67,7 @@ static int nfit_handle_mce(struct notifier_block *nb, 
unsigned long val,
continue;
 
/* If this fails due to an -ENOMEM, there is little we can do */
-   nvdimm_bus_add_poison(acpi_desc->nvdimm_bus,
+   nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus,
ALIGN(mce->addr, L1_CACHE_BYTES),
L1_CACHE_BYTES);
nvdimm_region_notify(nfit_spa->nd_region,
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 909554c..ca6d325 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -20,6 +20,7 @@ libnvdimm-y += region_devs.o
 libnvdimm-y += region.o
 libnvdimm-y += namespace_devs.o
 libnvdimm-y += label.o
+libnvdimm-y += badrange.o
 libnvdimm-$(CONFIG_ND_CLAIM) += claim.o
 libnvdimm-$(CONFIG_BTT) += btt_devs.o
 libnvdimm-$(CONFIG_NVDIMM_PFN) += pfn_devs.o
diff --git a/drivers/nvdimm/badrange.c b/drivers/nvdimm/badrange.c
new file mode 100644
index 000..0b67dcf
--- /dev/null
+++ b/drivers/nvdimm/badrange.c
@@ -0,0 +1,294 @@
+/*
+ * Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "nd-core.h"
+#include "nd.h"
+
+void badrange_init(struct badrange *badrange)
+{
+   INIT_LIST_HEAD(>list);
+   spin_lock_init(>lock);
+}
+EXPORT_SYMBOL_GPL(badrange_init);
+
+static void append_badrange_entry(struct badrange *badrange,
+   struct badrange_entry *bre, u64 addr, u64 length)
+{
+   lockdep_assert_held(>lock);
+   bre->start = addr;
+   bre->length = length;
+   list_add_tail(>list, >list);
+}
+
+static int alloc_and_append_badrange_entry(struct badrange *badrange,
+   u64 addr, u64 length, gfp_t flags)
+{
+   struct badrange_entry *bre;
+
+   bre = kzalloc(sizeof(*bre), flags);
+   if (!bre)
+   return -ENOMEM;
+
+   append_badrange_entry(badrange, bre, addr, length);
+   return 0;
+}
+
+static int add_badrange(struct badrange *badrange, u64 addr, u64 length)
+{
+   struct badrange_entry *bre, *bre_new;
+
+   spin_unlock(>lock);
+   bre_new = kzalloc(sizeof(*bre_new), GFP_KERNEL);
+   spin_lock(>lock);
+
+   if (list_empty(>list)) {
+   if (!bre_new)
+   return -ENOMEM;
+   append_badrange_entry(badrange, bre_new, addr, length);
+   return 0;
+   }
+
+   /*
+* There is a chance this is a duplicate, check for those first.
+* This will be the common case as ARS_STATUS returns all known
+* errors in the SPA space, and we 

[PATCH v2 3/4] libnvdimm, badrange: remove a WARN for list_empty

2017-10-09 Thread Vishal Verma
Now that we're reusing the badrange functions for nfit_test, and that
exposes badrange injection/clearing to userspace via the DSM paths, it
is plausible that a user may call the clear DSM multiple times. Since it
is harmless to do so, we can remove the WARN in badrange_forget.

Cc: Dave Jiang 
Cc: Dan Williams 
Signed-off-by: Vishal Verma 
---
 drivers/nvdimm/badrange.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/nvdimm/badrange.c b/drivers/nvdimm/badrange.c
index 0b67dcf..e068d72 100644
--- a/drivers/nvdimm/badrange.c
+++ b/drivers/nvdimm/badrange.c
@@ -114,7 +114,6 @@ void badrange_forget(struct badrange *badrange, phys_addr_t 
start,
struct badrange_entry *bre, *next;
 
spin_lock(>lock);
-   WARN_ON_ONCE(list_empty(badrange_list));
 
/*
 * [start, clr_end] is the badrange interval being cleared.
-- 
2.9.5

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH 2/4] nfit_test: add error injection DSMs

2017-10-09 Thread Dan Williams
On Mon, Oct 9, 2017 at 1:25 PM, Verma, Vishal L
 wrote:
> On Sat, 2017-10-07 at 10:07 -0700, Dan Williams wrote:
>> On Thu, Oct 5, 2017 at 6:53 PM, Vishal Verma > > wrote:
>
> []
>
>> > NFIT_ARS_TIMEOUT = 90,
>> > +   NFIT_ARS_INJECT_INVALID = 2,
>>
>> If there is no usage of this define outside of nfit_test then it
>> should be defined in tools/testing/nvdimm/test/nfit.h.
>
> Yes moved.
>
>>
>> >  };
>> >
>> >  enum nfit_root_notifiers {
>> > diff --git a/tools/testing/nvdimm/Kbuild
>> > b/tools/testing/nvdimm/Kbuild
>> > index d870520..3272ab5 100644
>> > --- a/tools/testing/nvdimm/Kbuild
>> > +++ b/tools/testing/nvdimm/Kbuild
>> > @@ -69,6 +69,7 @@ libnvdimm-y += $(NVDIMM_SRC)/region_devs.o
>> >  libnvdimm-y += $(NVDIMM_SRC)/region.o
>> >  libnvdimm-y += $(NVDIMM_SRC)/namespace_devs.o
>> >  libnvdimm-y += $(NVDIMM_SRC)/label.o
>> > +libnvdimm-y += $(NVDIMM_SRC)/badrange.o
>>
>> This hunk should go in the same patch that changes
>> drivers/nvdimm/Makefile to add badrange.o so we don't have an
>> nfit_test compile breakage gap between commits.
>
> The kernel's normal usage already has the kbuild change in the right
> place (patch 1). This adds the use of badrange functions to nfit_test
> for the first time, and hence the kbuild addition here.

It's not the nfit_test usages I'm worried about. The internal calls in
libnvdimm will break. Try building tools/testing/nvdimm/libnvdimm.ko
with only patch1 applied.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH 1/4] libnvdimm: move poison list functions to a new 'badrange' file

2017-10-09 Thread Verma, Vishal L
On Fri, 2017-10-06 at 11:52 -0700, Dan Williams wrote:
> On Thu, Oct 5, 2017 at 6:53 PM, Vishal Verma  > wrote:
> > From: Dave Jiang 
> > 
> > From: Dave Jiang 
> > 
> > nfit_test needs to use the poison list manipulation code as well.
> > Make
> > it more generic and in the process rename poison to badrange, and
> > move
> > all the related helpers to a new file.
> > 
> > Signed-off-by: Dave Jiang 
> > [vishal: add a missed include in bus.c for the new badrange
> > functions]
> > Signed-off-by: Vishal Verma 
> > ---
> >  drivers/acpi/nfit/core.c  |   2 +-
> >  drivers/acpi/nfit/mce.c   |   2 +-
> >  drivers/nvdimm/Makefile   |   1 +
> >  drivers/nvdimm/badrange.c | 294
> > ++
> >  drivers/nvdimm/bus.c  |  24 ++--
> >  drivers/nvdimm/core.c | 260 +
> > ---
> >  drivers/nvdimm/nd-core.h  |   3 +-
> >  drivers/nvdimm/nd.h   |   6 -
> >  include/linux/libnvdimm.h |  21 +++-
> >  9 files changed, 331 insertions(+), 282 deletions(-)
> >  create mode 100644 drivers/nvdimm/badrange.c
> > 
> > diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> > index a3ecd5e..4b157f8 100644
> > --- a/drivers/acpi/nfit/core.c
> > +++ b/drivers/acpi/nfit/core.c
> > @@ -2240,7 +2240,7 @@ static int ars_status_process_records(struct
> > acpi_nfit_desc *acpi_desc,
> > if (ars_status->out_length
> > < 44 + sizeof(struct nd_ars_record)
> > * (i + 1))
> > break;
> > -   rc = nvdimm_bus_add_poison(nvdimm_bus,
> > +   rc = nvdimm_bus_add_badrange(nvdimm_bus,
> > ars_status->records[i].err_address,
> > ars_status->records[i].length);
> > if (rc)
> > diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
> > index feeb95d..b929214 100644
> > --- a/drivers/acpi/nfit/mce.c
> > +++ b/drivers/acpi/nfit/mce.c
> > @@ -67,7 +67,7 @@ static int nfit_handle_mce(struct notifier_block
> > *nb, unsigned long val,
> > continue;
> > 
> > /* If this fails due to an -ENOMEM, there is little
> > we can do */
> > -   nvdimm_bus_add_poison(acpi_desc->nvdimm_bus,
> > +   nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus,
> > ALIGN(mce->addr, L1_CACHE_BYTES),
> > L1_CACHE_BYTES);
> > nvdimm_region_notify(nfit_spa->nd_region,
> > diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
> > index 909554c..ca6d325 100644
> > --- a/drivers/nvdimm/Makefile
> > +++ b/drivers/nvdimm/Makefile
> > @@ -20,6 +20,7 @@ libnvdimm-y += region_devs.o
> >  libnvdimm-y += region.o
> >  libnvdimm-y += namespace_devs.o
> >  libnvdimm-y += label.o
> > +libnvdimm-y += badrange.o
> >  libnvdimm-$(CONFIG_ND_CLAIM) += claim.o
> >  libnvdimm-$(CONFIG_BTT) += btt_devs.o
> >  libnvdimm-$(CONFIG_NVDIMM_PFN) += pfn_devs.o
> > diff --git a/drivers/nvdimm/badrange.c b/drivers/nvdimm/badrange.c
> > new file mode 100644
> > index 000..6ad782f
> > --- /dev/null
> > +++ b/drivers/nvdimm/badrange.c
> > @@ -0,0 +1,294 @@
> > +/*
> > + * Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > modify
> > + * it under the terms of version 2 of the GNU General Public
> > License as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > but
> > + * WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > GNU
> > + * General Public License for more details.
> > + */
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include "nd-core.h"
> > +#include "nd.h"
> > +
> > +void badrange_init(struct badrange *badrange)
> > +{
> > +   INIT_LIST_HEAD(>list);
> > +   spin_lock_init(>lock);
> > +}
> > +EXPORT_SYMBOL_GPL(badrange_init);
> > +
> > +static void append_badrange_entry(struct badrange *badrange,
> > +   struct badrange_entry *be, u64 addr, u64 length)
> > +{
> > +   lockdep_assert_held(>lock);
> > +   be->start = addr;
> > +   be->length = length;
> > +   list_add_tail(>list, >list);
> > +}
> 
> Small nit, can we rename the instance variable from 'be' to 'bre'?
> 'be' triggers all my 'big endian' neurons to fire. Other than that,
> looks good.

Yep done.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v7 07/12] dma-mapping: introduce dma_has_iommu()

2017-10-09 Thread Dan Williams
On Mon, Oct 9, 2017 at 12:18 PM, Jason Gunthorpe
 wrote:
> On Mon, Oct 09, 2017 at 12:05:30PM -0700, Dan Williams wrote:
>> On Mon, Oct 9, 2017 at 11:58 AM, Jason Gunthorpe
>>  wrote:
>> > On Fri, Oct 06, 2017 at 03:35:54PM -0700, Dan Williams wrote:
>> >> otherwise be quiesced. The need for this knowledge is driven by a need
>> >> to make RDMA transfers to DAX mappings safe. If the DAX file's block map
>> >> changes we need to be to reliably stop accesses to blocks that have been
>> >> freed or re-assigned to a new file.
>> >
>> > If RDMA is driving this need, why not invalidate backing RDMA MRs
>> > instead of requiring a IOMMU to do it? RDMA MR are finer grained and
>> > do not suffer from the re-use problem David W. brought up with IOVAs..
>>
>> Sounds promising. All I want in the end is to be sure that the kernel
>> is enabled to stop any in-flight RDMA at will without asking
>> userspace. Does this require per-RDMA driver opt-in or is there a
>> common call that can be made?
>
> I don't think this has ever come up in the context of an all-device MR
> invalidate requirement. Drivers already have code to invalidate
> specifc MRs, but to find all MRs that touch certain pages and then
> invalidate them would be new code.
>
> We also have ODP aware drivers that can retarget a MR to new
> physical pages. If the block map changes DAX should synchronously
> retarget the ODP MR, not halt DMA.

Have a look at the patch [1], I don't touch the ODP path.

> Most likely ODP & DAX would need to be used together to get robust
> user applications, as having the user QP's go to an error state at
> random times (due to DMA failures) during operation is never going to
> be acceptable...

It's not random. The process that set up the mapping and registered
the memory gets SIGIO when someone else tries to modify the file map.
That process then gets /proc/sys/fs/lease-break-time seconds to fix
the problem before the kernel force revokes the DMA access.

It's otherwise not acceptable to allow DMA into random locations when
the file map changes.

> Perhaps you might want to initially only support ODP MR mappings with
> DAX and then the DMA fencing issue goes away?

I'd rather try to fix the non-ODP DAX case instead of just turning it off.

[1]: https://patchwork.kernel.org/patch/9991681/
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v7 07/12] dma-mapping: introduce dma_has_iommu()

2017-10-09 Thread Jason Gunthorpe
On Mon, Oct 09, 2017 at 12:05:30PM -0700, Dan Williams wrote:
> On Mon, Oct 9, 2017 at 11:58 AM, Jason Gunthorpe
>  wrote:
> > On Fri, Oct 06, 2017 at 03:35:54PM -0700, Dan Williams wrote:
> >> otherwise be quiesced. The need for this knowledge is driven by a need
> >> to make RDMA transfers to DAX mappings safe. If the DAX file's block map
> >> changes we need to be to reliably stop accesses to blocks that have been
> >> freed or re-assigned to a new file.
> >
> > If RDMA is driving this need, why not invalidate backing RDMA MRs
> > instead of requiring a IOMMU to do it? RDMA MR are finer grained and
> > do not suffer from the re-use problem David W. brought up with IOVAs..
> 
> Sounds promising. All I want in the end is to be sure that the kernel
> is enabled to stop any in-flight RDMA at will without asking
> userspace. Does this require per-RDMA driver opt-in or is there a
> common call that can be made?

I don't think this has ever come up in the context of an all-device MR
invalidate requirement. Drivers already have code to invalidate
specifc MRs, but to find all MRs that touch certain pages and then
invalidate them would be new code.

We also have ODP aware drivers that can retarget a MR to new
physical pages. If the block map changes DAX should synchronously
retarget the ODP MR, not halt DMA.

Most likely ODP & DAX would need to be used together to get robust
user applications, as having the user QP's go to an error state at
random times (due to DMA failures) during operation is never going to
be acceptable...

Perhaps you might want to initially only support ODP MR mappings with
DAX and then the DMA fencing issue goes away?

Cheers,
Jason
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v7 07/12] dma-mapping: introduce dma_has_iommu()

2017-10-09 Thread Dan Williams
On Mon, Oct 9, 2017 at 11:58 AM, Jason Gunthorpe
 wrote:
> On Fri, Oct 06, 2017 at 03:35:54PM -0700, Dan Williams wrote:
>> otherwise be quiesced. The need for this knowledge is driven by a need
>> to make RDMA transfers to DAX mappings safe. If the DAX file's block map
>> changes we need to be to reliably stop accesses to blocks that have been
>> freed or re-assigned to a new file.
>
> If RDMA is driving this need, why not invalidate backing RDMA MRs
> instead of requiring a IOMMU to do it? RDMA MR are finer grained and
> do not suffer from the re-use problem David W. brought up with IOVAs..

Sounds promising. All I want in the end is to be sure that the kernel
is enabled to stop any in-flight RDMA at will without asking
userspace. Does this require per-RDMA driver opt-in or is there a
common call that can be made?

Outside of that the re-use problem is already solved by just unmapping
(iommu_unmap()) the IOVA, but keeping it allocated until the eventual
dma_unmap_sg() at memory un-registration time frees it.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v7 07/12] dma-mapping: introduce dma_has_iommu()

2017-10-09 Thread Jason Gunthorpe
On Fri, Oct 06, 2017 at 03:35:54PM -0700, Dan Williams wrote:
> otherwise be quiesced. The need for this knowledge is driven by a need
> to make RDMA transfers to DAX mappings safe. If the DAX file's block map
> changes we need to be to reliably stop accesses to blocks that have been
> freed or re-assigned to a new file.

If RDMA is driving this need, why not invalidate backing RDMA MRs
instead of requiring a IOMMU to do it? RDMA MR are finer grained and
do not suffer from the re-use problem David W. brought up with IOVAs..

Jason
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v8] dma-mapping: introduce dma_get_iommu_domain()

2017-10-09 Thread Dan Williams
On Mon, Oct 9, 2017 at 3:37 AM, Robin Murphy  wrote:
> Hi Dan,
>
> On 08/10/17 04:45, Dan Williams wrote:
>> Add a dma-mapping api helper to retrieve the generic iommu_domain for a 
>> device.
>> The motivation for this interface is making RDMA transfers to DAX mappings
>> safe. If the DAX file's block map changes we need to be to reliably stop
>> accesses to blocks that have been freed or re-assigned to a new file.
>
> ...which is also going to require some way to force the IOMMU drivers
> (on x86 at least) to do a fully-synchronous unmap, instead of just
> throwing the IOVA onto a flush queue to invalidate the TLBs at some
> point in the future.

Isn't that the difference between iommu_unmap() and
iommu_unmap_fast()? As far as I can tell amd-iommu and intel-iommu
both flush iotlbs on iommu_unmap() and don't support fast unmaps.

> Assuming of course that there's an IOMMU both
> present and performing DMA translation in the first place.

That's why I want to call through the dma api to see if the iommu is
being used to satisfy dma mappings.

>> With the
>> iommu_domain and a callback from the DAX filesystem the kernel can safely
>> revoke access to a DMA device. The process that performed the RDMA memory
>> registration is also notified of this revocation event, but the kernel can 
>> not
>> otherwise be in the position of waiting for userspace to quiesce the device.
>
> OK, but why reinvent iommu_get_domain_for_dev()?

How do I know if the iommu returned from that routine is the one being
used for dma mapping operations for the device? Specifically, how
would I discover that the result of dma_map_sg() can be passed as an
IOVA range to iommu_unmap()?

>> Since PMEM+DAX is currently only enabled for x86, we only update the x86
>> iommu drivers.
>
> Note in particular that those two drivers happen to be the *only* place
> this approach could work - everyone else is going to have to fall back
> to the generic IOMMU API function anyway.

I want to make this functionality generic, but I'm not familiar with
the iommu sub-system. How are dma mapping operations routed to the
iommu driver in those other imlementations?
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v7 09/12] xfs: wire up ->lease_direct()

2017-10-09 Thread Dan Williams
On Sun, Oct 8, 2017 at 8:45 PM, Dave Chinner  wrote:
> On Fri, Oct 06, 2017 at 03:36:06PM -0700, Dan Williams wrote:
>> A 'lease_direct' lease requires that the vma have a valid MAP_DIRECT
>> mapping established. For xfs we establish a new lease and then check if
>> the MAP_DIRECT mapping has been broken. We want to be sure that the
>> process will receive notification that the MAP_DIRECT mapping is being
>> torn down so it knows why other code paths are throwing failures.
>>
>> For example in the RDMA/ibverbs case we want ibv_reg_mr() to fail if the
>> MAP_DIRECT mapping is invalid or in the process of being invalidated.
>>
>> Cc: Jan Kara 
>> Cc: Jeff Moyer 
>> Cc: Christoph Hellwig 
>> Cc: Dave Chinner 
>> Cc: "Darrick J. Wong" 
>> Cc: Ross Zwisler 
>> Cc: Jeff Layton 
>> Cc: "J. Bruce Fields" 
>> Signed-off-by: Dan Williams 
>> ---
>>  fs/xfs/xfs_file.c |   28 
>>  1 file changed, 28 insertions(+)
>>
>> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
>> index e35518600e28..823b65f17429 100644
>> --- a/fs/xfs/xfs_file.c
>> +++ b/fs/xfs/xfs_file.c
>> @@ -1166,6 +1166,33 @@ xfs_filemap_direct_close(
>>   put_map_direct_vma(vma->vm_private_data);
>>  }
>>
>> +static struct lease_direct *
>> +xfs_filemap_direct_lease(
>> + struct vm_area_struct   *vma,
>> + void(*break_fn)(void *),
>> + void*owner)
>> +{
>> + struct lease_direct *ld;
>> +
>> + ld = map_direct_lease(vma, break_fn, owner);
>> +
>> + if (IS_ERR(ld))
>> + return ld;
>> +
>> + /*
>> +  * We now have an established lease while the base MAP_DIRECT
>> +  * lease was not broken. So, we know that the "lease holder" will
>> +  * receive a SIGIO notification when the lease is broken and
>> +  * take any necessary cleanup actions.
>> +  */
>> + if (!is_map_direct_broken(vma->vm_private_data))
>> + return ld;
>> +
>> + map_direct_lease_destroy(ld);
>> +
>> + return ERR_PTR(-ENXIO);
>
> What's any of this got to do with XFS? Shouldn't it be in generic
> code, and called generic_filemap_direct_lease()?

True, I can move this to generic code. The filesystem is in charge of
where it wants to store the 'struct map_direct_state' context, but for
generic_filemap_direct_lease() it can just assume that it is stored in
->vm_private_data. I'll add comments to this effect on the new
routine.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v7 06/12] xfs: wire up MAP_DIRECT

2017-10-09 Thread Dan Williams
On Sun, Oct 8, 2017 at 8:40 PM, Dave Chinner  wrote:

Thanks for the review Dave.

> On Fri, Oct 06, 2017 at 03:35:49PM -0700, Dan Williams wrote:
>> MAP_DIRECT is an mmap(2) flag with the following semantics:
>>
>>   MAP_DIRECT
>>   When specified with MAP_SHARED_VALIDATE, sets up a file lease with the
>>   same lifetime as the mapping. Unlike a typical F_RDLCK lease this lease
>>   is broken when a "lease breaker" attempts to write(2), change the block
>>   map (fallocate), or change the size of the file. Otherwise the mechanism
>>   of a lease break is identical to the typical lease break case where the
>>   lease needs to be removed (munmap) within the number of seconds
>>   specified by /proc/sys/fs/lease-break-time. If the lease holder fails to
>>   remove the lease in time the kernel will invalidate the mapping and
>>   force all future accesses to the mapping to trigger SIGBUS.
>>
>>   In addition to lease break timeouts causing faults in the mapping to
>>   result in SIGBUS, other states of the file will trigger SIGBUS at fault
>>   time:
>>
>>   * The file is not DAX capable
>>   * The file has reflinked (copy-on-write) blocks
>>   * The fault would trigger the filesystem to allocate blocks
>>   * The fault would trigger the filesystem to perform extent conversion
>>
>>   In other words, MAP_DIRECT expects and enforces a fully allocated file
>>   where faults can be satisfied without modifying block map metadata.
>>
>>   An unprivileged process may establish a MAP_DIRECT mapping on a file
>>   whose UID (owner) matches the filesystem UID of the  process. A process
>>   with the CAP_LEASE capability may establish a MAP_DIRECT mapping on
>>   arbitrary files
>>
>>   ERRORS
>>   EACCES Beyond the typical mmap(2) conditions that trigger EACCES
>>   MAP_DIRECT also requires the permission to set a file lease.
>>
>>   EOPNOTSUPP The filesystem explicitly does not support the flag
>>
>>   SIGBUS Attempted to write a MAP_DIRECT mapping at a file offset that
>>  might require block-map updates, or the lease timed out and the
>>  kernel invalidated the mapping.
>>
>> Cc: Jan Kara 
>> Cc: Arnd Bergmann 
>> Cc: Jeff Moyer 
>> Cc: Christoph Hellwig 
>> Cc: Dave Chinner 
>> Cc: Alexander Viro 
>> Cc: "Darrick J. Wong" 
>> Cc: Ross Zwisler 
>> Cc: Jeff Layton 
>> Cc: "J. Bruce Fields" 
>> Signed-off-by: Dan Williams 
>> ---
>>  fs/xfs/Kconfig  |2 -
>>  fs/xfs/xfs_file.c   |  102 
>> +++
>>  include/linux/mman.h|3 +
>>  include/uapi/asm-generic/mman.h |1
>>  4 files changed, 106 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
>> index f62fc6629abb..f8765653a438 100644
>> --- a/fs/xfs/Kconfig
>> +++ b/fs/xfs/Kconfig
>> @@ -112,4 +112,4 @@ config XFS_ASSERT_FATAL
>>
>>  config XFS_LAYOUT
>>   def_bool y
>> - depends on EXPORTFS_BLOCK_OPS
>> + depends on EXPORTFS_BLOCK_OPS || FS_DAX
>> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
>> index ebdd0bd2b261..e35518600e28 100644
>> --- a/fs/xfs/xfs_file.c
>> +++ b/fs/xfs/xfs_file.c
>> @@ -40,12 +40,22 @@
>>  #include "xfs_iomap.h"
>>  #include "xfs_reflink.h"
>>
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>
>>  static const struct vm_operations_struct xfs_file_vm_ops;
>> +static const struct vm_operations_struct xfs_file_vm_direct_ops;
>> +
>> +static inline bool
>> +is_xfs_map_direct(
>> + struct vm_area_struct *vma)
>> +{
>> + return vma->vm_ops == _file_vm_direct_ops;
>> +}
>
> Namespacing (xfs_vma_is_direct) and whitespace damage.

Will fix.

>
>>
>>  /*
>>   * Clear the specified ranges to zero through either the pagecache or DAX.
>> @@ -1008,6 +1018,26 @@ xfs_file_llseek(
>>   return vfs_setpos(file, offset, inode->i_sb->s_maxbytes);
>>  }
>>
>> +static int
>> +xfs_vma_checks(
>> + struct vm_area_struct   *vma,
>> + struct inode*inode)
>
> Exactly what are we checking for - function name doesn't tell me,
> and there's no comments, either?

Ok, I'll improve this.

>
>> +{
>> + if (!is_xfs_map_direct(vma))
>> + return 0;
>> +
>> + if (!is_map_direct_valid(vma->vm_private_data))
>> + return VM_FAULT_SIGBUS;
>> +
>> + if (xfs_is_reflink_inode(XFS_I(inode)))
>> + return VM_FAULT_SIGBUS;
>> +
>> + if (!IS_DAX(inode))
>> + return VM_FAULT_SIGBUS;
>
> And how do we get is_xfs_map_direct() set to true if we don't have a
> DAX inode or the inode has shared extents?

So, this was my way of trying to satisfy the request you made here:

https://lkml.org/lkml/2017/8/11/876

i.e. allow MAP_DIRECT on non-dax files 

Re: [ndctl PATCH 5/8] ndctl: add an inject-error command

2017-10-09 Thread Dan Williams
On Thu, Oct 5, 2017 at 6:54 PM, Vishal Verma  wrote:
> Add an inject-error command to ndctl. This uses the error injection DSMs
> in ACPI6.2 to provide a generic error injection and management
> interface. Once can inject errors, and view as well as clear injected
> errors using these commands.
>
> Cc: Dan Williams 
> Signed-off-by: Vishal Verma 
> ---
>  Documentation/ndctl/Makefile.am|   1 +
>  Documentation/ndctl/ndctl-inject-error.txt | 108 +
>  Documentation/ndctl/ndctl.txt  |   1 +
>  builtin.h  |   1 +
>  contrib/ndctl  |   5 +-
>  ndctl/Makefile.am  |   3 +-
>  ndctl/inject-error.c   | 745 
> +
>  ndctl/libndctl-nfit.h  |   8 +
>  ndctl/ndctl.c  |   1 +
>  util/json.c|  26 +
>  util/json.h|   3 +
>  util/size.h|   1 +
>  12 files changed, 901 insertions(+), 2 deletions(-)
>  create mode 100644 Documentation/ndctl/ndctl-inject-error.txt
>  create mode 100644 ndctl/inject-error.c
>
> diff --git a/Documentation/ndctl/Makefile.am b/Documentation/ndctl/Makefile.am
> index 229d908..615baf0 100644
> --- a/Documentation/ndctl/Makefile.am
> +++ b/Documentation/ndctl/Makefile.am
> @@ -30,6 +30,7 @@ man1_MANS = \
> ndctl-create-namespace.1 \
> ndctl-destroy-namespace.1 \
> ndctl-check-namespace.1 \
> +   ndctl-inject-error.1 \
> ndctl-list.1
>
>  CLEANFILES = $(man1_MANS)
> diff --git a/Documentation/ndctl/ndctl-inject-error.txt 
> b/Documentation/ndctl/ndctl-inject-error.txt
> new file mode 100644
> index 000..bd9e197
> --- /dev/null
> +++ b/Documentation/ndctl/ndctl-inject-error.txt
> @@ -0,0 +1,108 @@
> +ndctl-inject-error(1)
> +=
> +
> +NAME
> +
> +ndctl-inject-error - inject media errors at a namespace offset
> +
> +SYNOPSIS
> +
> +[verse]
> +'ndctl inject-error'  []
> +
> +include::namespace-description.txt[]
> +
> +ndctl-inject-error can be used to ask the platform to simulate media errors
> +in the nvdimm address space to aid debugging and development of features
> +related to error handling.
> +
> +WARNING: These commands are DANGEROUS and can cause data loss. They are
> +only provided for testing and debugging purposes.
> +
> +EXAMPLES
> +
> +
> +Inject errors in namespace0.0 at sector 12 for a 2 sectors (i.e. 12, 13)
> +[verse]
> +ndctl inject-error --sector=12 --count=2 namespace0.0
> +
> +Check status of injected errors on namespace0.0
> +[verse]
> +ndctl inject-error --status namespacce0.0
> +
> +Clear the injected errors at sector 12 for 2 sectors on namespace0.0
> +[verse]
> +ndctl inject-error --clear --sector=12 --count=2 namespacce0.0
> +
> +OPTIONS
> +---
> +-S::
> +--sector=::
> +   Namespace sector offset in 512 byte sized sectors where the error is
> +   to be injected.

Let's use the term "block" instead of "sector" since the --media-error
json in ndctl list reports bad 'blocks' and the kernel interfaces use
'block'.

> +
> +   NOTE: The offset is interpreted in different ways based on the "mode"
> +   of the namespace. For "raw" mode, the offset is the base namespace
> +   offset. For "memory" mode (i.e. a "pfn" namespace), the offset is
> +   relative to the user-visible part of the namespace, and the offset
> +   introduced by the kernel's metadata will be accounted for. For a
> +   "sector" mode namespace (i.e. a "BTT" namespace), the offset is
> +   relative to the base namespace, as the BTT translation details are
> +   internal to the kernel, and can't be accounted for while injecting
> +   errors.
> +
> +-c::
> +--count=::
> +   Number of sectors to inject as errors. This is also in terms of fixed,
> +   512 byte sectors.
> +
> +-d::
> +--clear::

How about "--uninject"?

> +   This option will ask the platform to clear any injected errors for the
> +   specified sector offset, and count.
> +
> +   WARNING: This will not clear the kernel's internal "badrange" and
> +   "badblock" tracking - those can only be cleared by doing a write to

badrange is a kernel internal implementation detail. So we can just
say "This will not clear the kernel's internal bad block tracking"

> +   the affected locations. Hence use the --clear option only if you know
> +   exactly what you are doing. For normal usage, injected errors should
> +   only be cleared by doing writes. Do not expect have the original data
> +   intact after injecting an error, and clearing it using --clear - it
> +   will be lost, as the only "real" way to clear the error location is
> +   to write to it or zero it (truncate/hole-punch).
> +
> +-t::
> +--status::

"--query"?


Re: [PATCH 4/4] nfit_test: when clearing poison, also remove badrange entries

2017-10-09 Thread Dave Jiang


On 10/05/2017 06:53 PM, Vishal Verma wrote:
> The injected badrange entries can only be cleared from the kernel's
> accounting by writing to the affected blocks, so when such a write sends
> the clear error DSM to nfit_test, also clear the ranges from
> nfit_test's badrange list. This lets an 'ARS Inject error status' DSM to
> return the correct status, omitting the cleared ranges.
> 
> Cc: Dave Jiang 
> Cc: Dan Williams 
> Signed-off-by: Vishal Verma 

Reviewed-by: Dave Jiang 

> ---
>  tools/testing/nvdimm/test/nfit.c | 12 
>  1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/testing/nvdimm/test/nfit.c 
> b/tools/testing/nvdimm/test/nfit.c
> index 43f948e..e400418 100644
> --- a/tools/testing/nvdimm/test/nfit.c
> +++ b/tools/testing/nvdimm/test/nfit.c
> @@ -344,7 +344,8 @@ static int nfit_test_cmd_ars_status(struct ars_state 
> *ars_state,
>   return 0;
>  }
>  
> -static int nfit_test_cmd_clear_error(struct nd_cmd_clear_error *clear_err,
> +static int nfit_test_cmd_clear_error(struct nfit_test *t,
> + struct nd_cmd_clear_error *clear_err,
>   unsigned int buf_len, int *cmd_rc)
>  {
>   const u64 mask = NFIT_TEST_CLEAR_ERR_UNIT - 1;
> @@ -354,12 +355,7 @@ static int nfit_test_cmd_clear_error(struct 
> nd_cmd_clear_error *clear_err,
>   if ((clear_err->address & mask) || (clear_err->length & mask))
>   return -EINVAL;
>  
> - /*
> -  * Report 'all clear' success for all commands even though a new
> -  * scrub will find errors again.  This is enough to have the
> -  * error removed from the 'badblocks' tracking in the pmem
> -  * driver.
> -  */
> + badrange_forget(>badrange, clear_err->address, clear_err->length);
>   clear_err->status = 0;
>   clear_err->cleared = clear_err->length;
>   *cmd_rc = 0;
> @@ -687,7 +683,7 @@ static int nfit_test_ctl(struct nvdimm_bus_descriptor 
> *nd_desc,
>   cmd_rc);
>   break;
>   case ND_CMD_CLEAR_ERROR:
> - rc = nfit_test_cmd_clear_error(buf, buf_len, cmd_rc);
> + rc = nfit_test_cmd_clear_error(t, buf, buf_len, cmd_rc);
>   break;
>   default:
>   return -ENOTTY;
> 
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [ndctl PATCH] ndctl, test: rdma vs dax

2017-10-09 Thread Dan Williams
On Mon, Oct 9, 2017 at 1:07 AM, Johannes Thumshirn  wrote:
> On Sat, Oct 07, 2017 at 08:14:42AM -0700, Dan Williams wrote:
> [...]
>
>> +rxe_cfg stop
>> +rxe_cfg start
>> +if ! rxe_cfg status | grep -n rxe0; then
>> + rxe_cfg add eth0
>> +fi
>
> Can we maybe skip the dependency on rxe_cfg? All that is needed is modprobe
> and echo.

Sure, I'll take a look.

> Also hard coding eth0 might be problematic in this case. This works
> on your test-setup but surely isn't portable.

Yes,  which is part of the reason I have this listed under the
"destructive" tests. Any advice on how to make it portable would be
appreciated.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [RFC 00/16] NOVA: a new file system for persistent memory

2017-10-09 Thread Miklos Szeredi
On Thu, Aug 3, 2017 at 9:48 AM, Steven Swanson  wrote:
> This is an RFC patch series that impements NOVA (NOn-Volatile memory
> Accelerated file system), a new file system built for PMEM.

Hi,

Thanks for posting.

I read the paper and the design looks nice.  Then I  looked at the
patches, but could not find a place to start, nor something I could
actually try out.  So let me suggest some ways to make this more
reviewer/tester friendly:

1) try starting with something very simple yet working and supporting
the final layout
   - no optimizations (one big lock, no per-cpu data, rcu, numa, etc support)
   - no support for optional features (checksumming, NFS export, etc)
   - missing mandatory features (e.g. just readdir and getattr support)
   - try and get it down to <5k lines, preferably 2-3k

2) pointer to sources and instructions for trying it out without
special hardware

3) build on this minimal working version by
   - adding mandatory features
   - then adding optimizations

4) each patch should leave the tree in a compiling and working state
but should be small and easily reviewed

5) leave optional features and unimportant optimizations for a later
submission; try to make the patchset as small as you meaningfully can
(i.e. it should be fully working and demonstrate the capabilities and
performance, but nothing more).

Thanks,
Miklos
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH] Fix mpage_writepage() for pages with buffers

2017-10-09 Thread Johannes Thumshirn

Acked-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v8] dma-mapping: introduce dma_get_iommu_domain()

2017-10-09 Thread Robin Murphy
Hi Dan,

On 08/10/17 04:45, Dan Williams wrote:
> Add a dma-mapping api helper to retrieve the generic iommu_domain for a 
> device.
> The motivation for this interface is making RDMA transfers to DAX mappings
> safe. If the DAX file's block map changes we need to be to reliably stop
> accesses to blocks that have been freed or re-assigned to a new file.

...which is also going to require some way to force the IOMMU drivers
(on x86 at least) to do a fully-synchronous unmap, instead of just
throwing the IOVA onto a flush queue to invalidate the TLBs at some
point in the future. Assuming of course that there's an IOMMU both
present and performing DMA translation in the first place.

> With the
> iommu_domain and a callback from the DAX filesystem the kernel can safely
> revoke access to a DMA device. The process that performed the RDMA memory
> registration is also notified of this revocation event, but the kernel can not
> otherwise be in the position of waiting for userspace to quiesce the device.

OK, but why reinvent iommu_get_domain_for_dev()?

> Since PMEM+DAX is currently only enabled for x86, we only update the x86
> iommu drivers.

Note in particular that those two drivers happen to be the *only* place
this approach could work - everyone else is going to have to fall back
to the generic IOMMU API function anyway.

Robin.

> Cc: Marek Szyprowski 
> Cc: Robin Murphy 
> Cc: Greg Kroah-Hartman 
> Cc: Joerg Roedel 
> Cc: David Woodhouse 
> Cc: Ashok Raj 
> Cc: Jan Kara 
> Cc: Jeff Moyer 
> Cc: Christoph Hellwig 
> Cc: Dave Chinner 
> Cc: "Darrick J. Wong" 
> Cc: Ross Zwisler 
> Signed-off-by: Dan Williams 
> ---
> Changes since v7:
> * retrieve the iommu_domain so that we can later pass the results of
>   dma_map_* to iommu_unmap() in advance of the actual dma_unmap_*.
> 
>  drivers/base/dma-mapping.c  |   10 ++
>  drivers/iommu/amd_iommu.c   |   10 ++
>  drivers/iommu/intel-iommu.c |   15 +++
>  include/linux/dma-mapping.h |3 +++
>  4 files changed, 38 insertions(+)
> 
> diff --git a/drivers/base/dma-mapping.c b/drivers/base/dma-mapping.c
> index e584eddef0a7..fdb9764f95a4 100644
> --- a/drivers/base/dma-mapping.c
> +++ b/drivers/base/dma-mapping.c
> @@ -369,3 +369,13 @@ void dma_deconfigure(struct device *dev)
>   of_dma_deconfigure(dev);
>   acpi_dma_deconfigure(dev);
>  }
> +
> +struct iommu_domain *dma_get_iommu_domain(struct device *dev)
> +{
> + const struct dma_map_ops *ops = get_dma_ops(dev);
> +
> + if (ops && ops->get_iommu)
> + return ops->get_iommu(dev);
> + return NULL;
> +}
> +EXPORT_SYMBOL(dma_get_iommu_domain);
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 51f8215877f5..c8e1a45af182 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2271,6 +2271,15 @@ static struct protection_domain *get_domain(struct 
> device *dev)
>   return domain;
>  }
>  
> +static struct iommu_domain *amd_dma_get_iommu(struct device *dev)
> +{
> + struct protection_domain *domain = get_domain(dev);
> +
> + if (IS_ERR(domain))
> + return NULL;
> + return >domain;
> +}
> +
>  static void update_device_table(struct protection_domain *domain)
>  {
>   struct iommu_dev_data *dev_data;
> @@ -2689,6 +2698,7 @@ static const struct dma_map_ops amd_iommu_dma_ops = {
>   .unmap_sg   = unmap_sg,
>   .dma_supported  = amd_iommu_dma_supported,
>   .mapping_error  = amd_iommu_mapping_error,
> + .get_iommu  = amd_dma_get_iommu,
>  };
>  
>  static int init_reserved_iova_ranges(void)
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 6784a05dd6b2..f3f4939cebad 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -3578,6 +3578,20 @@ static int iommu_no_mapping(struct device *dev)
>   return 0;
>  }
>  
> +static struct iommu_domain *intel_dma_get_iommu(struct device *dev)
> +{
> + struct dmar_domain *domain;
> +
> + if (iommu_no_mapping(dev))
> + return NULL;
> +
> + domain = get_valid_domain_for_dev(dev);
> + if (!domain)
> + return NULL;
> +
> + return >domain;
> +}
> +
>  static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr,
>size_t size, int dir, u64 dma_mask)
>  {
> @@ -3872,6 +3886,7 @@ const struct dma_map_ops intel_dma_ops = {
>   .map_page = intel_map_page,
>   .unmap_page = intel_unmap_page,
>   .mapping_error = intel_mapping_error,
> + .get_iommu = intel_dma_get_iommu,
>  #ifdef CONFIG_X86
>   .dma_supported = x86_dma_supported,
>  #endif
> diff --git a/include/linux/dma-mapping.h 

Re: [ndctl PATCH] ndctl, test: rdma vs dax

2017-10-09 Thread Johannes Thumshirn
On Sat, Oct 07, 2017 at 08:14:42AM -0700, Dan Williams wrote:
[...]

> +rxe_cfg stop
> +rxe_cfg start
> +if ! rxe_cfg status | grep -n rxe0; then
> + rxe_cfg add eth0
> +fi

Can we maybe skip the dependency on rxe_cfg? All that is needed is modprobe
and echo. Also hard coding eth0 might be problematic in this case. This works
on your test-setup but surely isn't portable.

Byte,
Johannes

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Returned mail: see transcript for details

2017-10-09 Thread The Post Office
The original message was received at Mon, 9 Oct 2017 15:19:27 +0800
from lists.01.org [150.243.241.146]

- The following addresses had permanent fatal errors -




___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm