[PATCH] acpi, nfit: fix health event notification

2017-11-30 Thread Dan Williams
Integration testing with a BIOS that generates injected health event
notifications fails to communicate those events to userspace. The nfit
driver neglects to link the ACPI DIMM device with the necessary driver
data so acpi_nvdimm_notify() fails this lookup:

nfit_mem = dev_get_drvdata(dev);
if (nfit_mem && nfit_mem->flags_attr)
sysfs_notify_dirent(nfit_mem->flags_attr);

Add the necessary linkage when installing the notification handler and
clean it up when the nfit driver instance is torn down.
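
For context, a minimal userspace sketch of how a monitor might consume this
notification via the standard sysfs poll protocol. The attribute path below
(/sys/bus/nd/devices/nmem0/nfit/flags) and the overall flow are illustrative
assumptions, not part of this patch:

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[128];
	struct pollfd pfd;

	pfd.fd = open("/sys/bus/nd/devices/nmem0/nfit/flags", O_RDONLY);
	if (pfd.fd < 0)
		return 1;
	pfd.events = POLLPRI | POLLERR;

	/* sysfs poll protocol: read once to arm, then wait for a notification */
	read(pfd.fd, buf, sizeof(buf));
	if (poll(&pfd, 1, -1) > 0) {
		ssize_t n;

		/* sysfs_notify_dirent() fired; rewind and re-read the flags */
		lseek(pfd.fd, 0, SEEK_SET);
		n = read(pfd.fd, buf, sizeof(buf) - 1);
		if (n > 0) {
			buf[n] = '\0';
			printf("health event, flags: %s", buf);
		}
	}
	close(pfd.fd);
	return 0;
}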

Cc: 
Cc: Toshi Kani 
Cc: Vishal Verma 
Fixes: ba9c8dd3c222 ("acpi, nfit: add dimm device notification support")
Reported-by: Daniel Osawa 
Signed-off-by: Dan Williams 
---
 drivers/acpi/nfit/core.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index ff2580e7611d..947ea8a92761 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -1670,6 +1670,11 @@ static int acpi_nfit_add_dimm(struct acpi_nfit_desc *acpi_desc,
dev_name(&adev_dimm->dev));
return -ENXIO;
}
+   /*
+* Record nfit_mem for the notification path to track back to
+* the nfit sysfs attributes for this dimm device object.
+*/
+   dev_set_drvdata(&adev_dimm->dev, nfit_mem);
 
/*
 * Until standardization materializes we need to consider 4
@@ -1755,6 +1760,7 @@ static void shutdown_dimm_notify(void *data)
if (adev_dimm)
acpi_remove_notify_handler(adev_dimm->handle,
ACPI_DEVICE_NOTIFY, acpi_nvdimm_notify);
+   dev_set_drvdata(&adev_dimm->dev, NULL);
}
mutex_unlock(&acpi_desc->init_mutex);
 }

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[RFC PATCH 2/4] firmware: dmi: Add function to look up a handle and return DIMM size

2017-11-30 Thread Tony Luck
When we first scan the SMBIOS table, save the size of the DIMM.

Provide a function for other code (EDAC driver) to look up the size
of a DIMM from its SMBIOS handle.
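
For reference, the size decode added here follows the SMBIOS Type 17 "Size"
field rules: 0 means no device installed, 0xFFFF means unknown, bit 15 set
means the value is in KiB rather than MiB, and 0x7FFF means the real size
lives in the 32-bit Extended Size field at offset 0x1C (in MiB). A
standalone sketch of the same rule, for illustration only:

#include <stdint.h>

/* Decode the SMBIOS Type 17 Size/Extended Size pair into bytes;
 * 0 for an empty slot, all-ones for "unknown". */
static uint64_t smbios17_size_bytes(uint16_t size, uint32_t extended_size)
{
	if (size == 0)			/* no memory device installed */
		return 0;
	if (size == 0xffff)		/* size unknown */
		return ~0ull;
	if (size & 0x8000)		/* bit 15 set: value is in KiB */
		return (uint64_t)(size & 0x7fff) << 10;
	if (size != 0x7fff)		/* value is in MiB */
		return (uint64_t)size << 20;
	/* 0x7fff: the real size is in the Extended Size field, in MiB */
	return (uint64_t)extended_size << 20;
}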

Signed-off-by: Tony Luck 
---
 drivers/firmware/dmi_scan.c | 29 +
 include/linux/dmi.h |  2 ++
 2 files changed, 31 insertions(+)

diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c
index 783041964439..946e86fb1ec6 100644
--- a/drivers/firmware/dmi_scan.c
+++ b/drivers/firmware/dmi_scan.c
@@ -37,6 +37,7 @@ static char dmi_ids_string[128] __initdata;
 static struct dmi_memdev_info {
const char *device;
const char *bank;
+   u64 size;
u16 handle;
 } *dmi_memdev;
 static int dmi_memdev_nr;
@@ -395,6 +396,8 @@ static void __init save_mem_devices(const struct dmi_header *dm, void *v)
 {
const char *d = (const char *)dm;
static int nr;
+   u64 bytes;
+   u16 size;
 
if (dm->type != DMI_ENTRY_MEM_DEVICE || dm->length < 0x12)
return;
@@ -405,6 +408,18 @@ static void __init save_mem_devices(const struct dmi_header *dm, void *v)
dmi_memdev[nr].handle = get_unaligned(&dm->handle);
dmi_memdev[nr].device = dmi_string(dm, d[0x10]);
dmi_memdev[nr].bank = dmi_string(dm, d[0x11]);
+   size = get_unaligned((u16 *)&d[0xC]);
+   if (size == 0)
+   bytes = 0;
+   else if (size == 0xffff)
+   bytes = ~0ul;
+   else if (size & 0x8000)
+   bytes = (u64)(size & 0x7fff) << 10;
+   else if (size != 0x7fff)
+   bytes = (u64)size << 20;
+   else
+   bytes = (u64)get_unaligned((u32 *)&d[0x1C]) << 20;
+   dmi_memdev[nr].size = bytes;
nr++;
 }
 
@@ -1073,3 +1088,17 @@ void dmi_memdev_name(u16 handle, const char **bank, const char **device)
}
 }
 EXPORT_SYMBOL_GPL(dmi_memdev_name);
+
+u64 dmi_memdev_size(u16 handle)
+{
+   int n;
+
+   if (dmi_memdev) {
+   for (n = 0; n < dmi_memdev_nr; n++) {
+   if (handle == dmi_memdev[n].handle)
+   return dmi_memdev[n].size;
+   }
+   }
+   return ~0ul;
+}
+EXPORT_SYMBOL_GPL(dmi_memdev_size);
diff --git a/include/linux/dmi.h b/include/linux/dmi.h
index 46e151172d95..7f5929123b69 100644
--- a/include/linux/dmi.h
+++ b/include/linux/dmi.h
@@ -113,6 +113,7 @@ extern int dmi_walk(void (*decode)(const struct dmi_header *, void *),
void *private_data);
 extern bool dmi_match(enum dmi_field f, const char *str);
extern void dmi_memdev_name(u16 handle, const char **bank, const char **device);
+extern u64 dmi_memdev_size(u16 handle);
 
 #else
 
@@ -142,6 +143,7 @@ static inline bool dmi_match(enum dmi_field f, const char *str)
{ return false; }
 static inline void dmi_memdev_name(u16 handle, const char **bank,
const char **device) { }
+static inline u64 dmi_memdev_size(u16 handle) { return ~0ul; }
 static inline const struct dmi_system_id *
dmi_first_match(const struct dmi_system_id *list) { return NULL; }
 
-- 
2.14.1

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[RFC PATCH 3/4] edac: Add new memory type for non-volatile DIMMs

2017-11-30 Thread Tony Luck
There are now non-volatile versions of DIMMs. Add a new entry to
"enum mem_type" and update places that use it with new strings.

Signed-off-by: Tony Luck 
---
 drivers/edac/edac_mc.c   | 1 +
 drivers/edac/edac_mc_sysfs.c | 3 ++-
 include/linux/edac.h | 3 +++
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 480072139b7a..8178e74decbf 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -215,6 +215,7 @@ const char * const edac_mem_types[] = {
[MEM_LRDDR3]= "Load-Reduced DDR3 RAM",
[MEM_DDR4]  = "Unbuffered DDR4 RAM",
[MEM_RDDR4] = "Registered DDR4 RAM",
+   [MEM_NVDIMM]= "Non-volatile RAM",
 };
 EXPORT_SYMBOL_GPL(edac_mem_types);
 
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index e4fcfa84fbd3..53cbb3518efc 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -110,7 +110,8 @@ static const char * const mem_types[] = {
[MEM_DDR3] = "Unbuffered-DDR3",
[MEM_RDDR3] = "Registered-DDR3",
[MEM_DDR4] = "Unbuffered-DDR4",
-   [MEM_RDDR4] = "Registered-DDR4"
+   [MEM_RDDR4] = "Registered-DDR4",
+   [MEM_NVDIMM] = "Non-volatile RAM",
 };
 
 static const char * const dev_types[] = {
diff --git a/include/linux/edac.h b/include/linux/edac.h
index cd75c173fd00..bffb97828ed6 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -186,6 +186,7 @@ static inline char *mc_event_error_type(const unsigned int err_type)
  * @MEM_RDDR4: Registered DDR4 RAM
  * This is a variant of the DDR4 memories.
  * @MEM_LRDDR4:Load-Reduced DDR4 memory.
+ * @MEM_NVDIMM:Non-volatile RAM
  */
 enum mem_type {
MEM_EMPTY = 0,
@@ -209,6 +210,7 @@ enum mem_type {
MEM_DDR4,
MEM_RDDR4,
MEM_LRDDR4,
+   MEM_NVDIMM,
 };
 
 #define MEM_FLAG_EMPTY BIT(MEM_EMPTY)
@@ -231,6 +233,7 @@ enum mem_type {
 #define MEM_FLAG_DDR4   BIT(MEM_DDR4)
 #define MEM_FLAG_RDDR4  BIT(MEM_RDDR4)
 #define MEM_FLAG_LRDDR4 BIT(MEM_LRDDR4)
+#define MEM_FLAG_NVDIMM BIT(MEM_NVDIMM)
 
 /**
  * enum edac-type - Error Detection and Correction capabilities and mode
-- 
2.14.1

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[RFC PATCH 0/4] Teach EDAC driver about NVDIMMs

2017-11-30 Thread Tony Luck
A Skylake server may have some DIMM slots filled with NVDIMMs
instead of normal DDR4 DIMMs. These are enumerated differently
by the memory controller.

Sadly there isn't an easy way to just peek at some memory controller
register to find the size of these DIMMs, so we have to rely on the
NFIT and SMBIOS tables to get that information.

This series only tackles the topology function of the EDAC
driver.  A later series of patches will fix the address translation
parts so that errors in NVDIMMs will be reported correctly.

It's marked "RFC" because it depends on the new ACPICA version 20171110,
which has only just made it into Rafael's tree.

Some of you may only care about some of the parts that touch code you
maintain, but I copied you on all four because you might like to see
the bigger picture.
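
Put together, the lookup chain these patches give an EDAC driver looks
roughly like the sketch below; the wrapper name and includes are only
illustrative, and error handling is trimmed:

#include <linux/acpi.h>		/* ACPI_NFIT_BUILD_DEVICE_HANDLE, ACPI_NFIT_MEM_* flags */
#include <linux/dmi.h>		/* dmi_memdev_size() (patch 2) */
#include <acpi/nfit.h>		/* nfit_get_smbios_id() (patch 1) */

static u64 nvdimm_size_from_topology(int socket, int mc, int chan, int dimmno)
{
	u32 dev_handle;
	int smbios_handle;
	u16 flags;

	/* Encode the slot's position as an ACPI NFIT device handle. */
	dev_handle = ACPI_NFIT_BUILD_DEVICE_HANDLE(dimmno, chan, mc, socket, 0);

	/* Map the device handle to its SMBIOS Type 17 handle plus health flags. */
	smbios_handle = nfit_get_smbios_id(dev_handle, &flags);
	if (smbios_handle < 0 || (flags & ACPI_NFIT_MEM_MAP_FAILED))
		return 0;

	/* Return the size dmi_scan recorded for that handle (~0 if unknown). */
	return dmi_memdev_size(smbios_handle);
}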

Tony Luck (4):
  acpi, nfit: Add function to look up nvdimm device and provide SMBIOS
handle
  firmware: dmi: Add function to look up a handle and return DIMM size
  edac: Add new memory type for non-volatile DIMMs
  EDAC, skx_edac: Detect non-volatile DIMMs

 drivers/acpi/nfit/core.c | 27 +
 drivers/edac/Kconfig |  2 ++
 drivers/edac/edac_mc.c   |  1 +
 drivers/edac/edac_mc_sysfs.c |  3 ++-
 drivers/edac/skx_edac.c  | 56 
 drivers/firmware/dmi_scan.c  | 29 +++
 include/acpi/nfit.h  | 19 +++
 include/linux/dmi.h  |  2 ++
 include/linux/edac.h |  3 +++
 9 files changed, 136 insertions(+), 6 deletions(-)
 create mode 100644 include/acpi/nfit.h


base-commit: 3fc70f8be59950ee2deecefdddb68be19b8cddd1
-- 
2.14.1

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[RFC PATCH 4/4] EDAC, skx_edac: Detect non-volatile DIMMs

2017-11-30 Thread Tony Luck
This just covers the topology function of the EDAC driver.
We locate which DIMM slots are populated with NVDIMMs and
query the NFIT and SMBIOS tables to get the size.
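
For orientation, ACPI_NFIT_BUILD_DEVICE_HANDLE() packs the slot topology
into the NFIT NVDIMM device-handle bit fields defined by ACPI 6.x. The real
macro lives in the ACPICA headers; the sketch below only illustrates the
layout being relied on here:

/*
 * NFIT NVDIMM device-handle layout (ACPI 6.x):
 *   bits  3:0   DIMM number within the channel
 *   bits  7:4   memory channel number
 *   bits 11:8   memory controller ID
 *   bits 15:12  socket ID
 *   bits 19:16  node controller ID
 *   bits 31:20  reserved
 */
static u32 nfit_device_handle(u32 dimm, u32 chan, u32 mc, u32 socket, u32 node)
{
	return (dimm & 0xf) | ((chan & 0xf) << 4) | ((mc & 0xf) << 8) |
	       ((socket & 0xf) << 12) | ((node & 0xf) << 16);
}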

Signed-off-by: Tony Luck 
---
 drivers/edac/Kconfig|  2 ++
 drivers/edac/skx_edac.c | 56 -
 2 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 96afb2aeed18..5c0c4a358f67 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -232,6 +232,8 @@ config EDAC_SBRIDGE
 config EDAC_SKX
tristate "Intel Skylake server Integrated MC"
depends on PCI && X86_64 && X86_MCE_INTEL && PCI_MMCONFIG
+   select DMI
+   select ACPI_NFIT
help
  Support for error detection and correction the Intel
  Skylake server Integrated Memory Controllers.
diff --git a/drivers/edac/skx_edac.c b/drivers/edac/skx_edac.c
index 16dea97568a1..814a5245029c 100644
--- a/drivers/edac/skx_edac.c
+++ b/drivers/edac/skx_edac.c
@@ -14,6 +14,8 @@
 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -24,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -298,6 +301,7 @@ static int get_dimm_attr(u32 reg, int lobit, int hibit, int add, int minval,
 }
 
 #define IS_DIMM_PRESENT(mtr)   GET_BITFIELD((mtr), 15, 15)
+#define IS_NVDIMM_PRESENT(mcddrtcfg, i)	GET_BITFIELD((mcddrtcfg), (i), (i))
 
 #define numrank(reg) get_dimm_attr((reg), 12, 13, 0, 1, 2, "ranks")
 #define numrow(reg) get_dimm_attr((reg), 2, 4, 12, 1, 6, "rows")
@@ -346,8 +350,6 @@ static int get_dimm_info(u32 mtr, u32 amap, struct dimm_info *dimm,
int  banks = 16, ranks, rows, cols, npages;
u64 size;
 
-   if (!IS_DIMM_PRESENT(mtr))
-   return 0;
ranks = numrank(mtr);
rows = numrow(mtr);
cols = numcol(mtr);
@@ -379,6 +381,46 @@ static int get_dimm_info(u32 mtr, u32 amap, struct dimm_info *dimm,
return 1;
 }
 
+static int get_nvdimm_info(struct dimm_info *dimm, struct skx_imc *imc,
+  int chan, int dimmno)
+{
+   int smbios_handle;
+   u32 dev_handle;
+   u16 flags;
+   u64 size;
+
+   dev_handle = ACPI_NFIT_BUILD_DEVICE_HANDLE(dimmno, chan, imc->lmc,
+  imc->src_id, 0);
+
+   smbios_handle = nfit_get_smbios_id(dev_handle, &flags);
+   if (smbios_handle < 0) {
+   skx_printk(KERN_ERR, "Can't find handle for NVDIMM ADR=%x\n", dev_handle);
+   return 0;
+   }
+   if (flags & ACPI_NFIT_MEM_MAP_FAILED) {
+   skx_printk(KERN_ERR, "NVDIMM ADR=%x is not mapped\n", dev_handle);
+   return 0;
+   }
+   size = dmi_memdev_size(smbios_handle);
+   if (size == ~0ul) {
+   skx_printk(KERN_ERR, "Can't find size for NVDIMM ADR=%x/SMBIOS=%x\n",
+  dev_handle, smbios_handle);
+   return 0;
+   }
+   edac_dbg(0, "mc#%d: channel %d, dimm %d, %lld Mb (%lld pages)\n",
+imc->mc, chan, dimmno, size >> 20, size >> PAGE_SHIFT);
+
+   dimm->nr_pages = size >> PAGE_SHIFT;
+   dimm->grain = 32;
+   dimm->dtype = DEV_UNKNOWN;
+   dimm->mtype = MEM_NVDIMM;
+   dimm->edac_mode = EDAC_SECDED; /* likely better than this */
+   snprintf(dimm->label, sizeof(dimm->label), "CPU_SrcID#%u_MC#%u_Chan#%u_DIMM#%u",
+imc->src_id, imc->lmc, chan, dimmno);
+
+   return 1;
+}
+
 #define SKX_GET_MTMTR(dev, reg) \
pci_read_config_dword((dev), 0x87c, &(reg))
 
@@ -395,20 +437,24 @@ static int skx_get_dimm_config(struct mem_ctl_info *mci)
 {
struct skx_pvt *pvt = mci->pvt_info;
struct skx_imc *imc = pvt->imc;
+   u32 mtr, amap, mcddrtcfg;
struct dimm_info *dimm;
int i, j;
-   u32 mtr, amap;
int ndimms;
 
for (i = 0; i < NUM_CHANNELS; i++) {
ndimms = 0;
pci_read_config_dword(imc->chan[i].cdev, 0x8C, &amap);
+   pci_read_config_dword(imc->chan[i].cdev, 0x400, &mcddrtcfg);
for (j = 0; j < NUM_DIMMS; j++) {
dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
 mci->n_layers, i, j, 0);
pci_read_config_dword(imc->chan[i].cdev,
0x80 + 4*j, &mtr);
-   ndimms += get_dimm_info(mtr, amap, dimm, imc, i, j);
+   if (IS_DIMM_PRESENT(mtr))
+   ndimms += get_dimm_info(mtr, amap, dimm, imc, i, j);
+   else if (IS_NVDIMM_PRESENT(mcddrtcfg, j))
+   ndimms += get_nvdimm_info(dimm, imc, i, j);
}
if (ndimms && !skx_check_ecc(imc->chan[0].cdev)) {
skx_printk(KERN_ERR, "ECC is disabled on imc %d\n", 

[RFC PATCH 1/4] acpi, nfit: Add function to look up nvdimm device and provide SMBIOS handle

2017-11-30 Thread Tony Luck
The EDAC driver needs to look up attributes of NVDIMMs provided in SMBIOS.

Provide a function that looks up an acpi_nfit_memory_map from a device
handle (node/socket/mc/channel/dimm) and returns the SMBIOS handle.
Also pass back the "flags" so we can see if the NVDIMM is OK.

Signed-off-by: Tony Luck 
---
 drivers/acpi/nfit/core.c | 27 +++
 include/acpi/nfit.h  | 19 +++
 2 files changed, 46 insertions(+)
 create mode 100644 include/acpi/nfit.h

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 9c2c49b6a240..31c0dc30f88f 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "nfit.h"
 
 /*
@@ -478,6 +479,32 @@ static bool add_memdev(struct acpi_nfit_desc *acpi_desc,
return true;
 }
 
+int nfit_get_smbios_id(u32 device_handle, u16 *flags)
+{
+   struct acpi_nfit_memory_map *memdev;
+   struct acpi_nfit_desc *acpi_desc;
+   struct nfit_mem *nfit_mem;
+
+   mutex_lock(&acpi_desc_lock);
+   list_for_each_entry(acpi_desc, &acpi_descs, list) {
+   mutex_lock(&acpi_desc->init_mutex);
+   list_for_each_entry(nfit_mem, &acpi_desc->dimms, list) {
+   memdev = __to_nfit_memdev(nfit_mem);
+   if (memdev->device_handle == device_handle) {
+   mutex_unlock(&acpi_desc->init_mutex);
+   mutex_unlock(&acpi_desc_lock);
+   *flags = memdev->flags;
+   return memdev->physical_id;
+   }
+   }
+   mutex_unlock(&acpi_desc->init_mutex);
+   }
+   mutex_unlock(&acpi_desc_lock);
+
+   return -ENODEV;
+}
+EXPORT_SYMBOL_GPL(nfit_get_smbios_id);
+
 /*
  * An implementation may provide a truncated control region if no block windows
  * are defined.
diff --git a/include/acpi/nfit.h b/include/acpi/nfit.h
new file mode 100644
index ..1eee1e32e72e
--- /dev/null
+++ b/include/acpi/nfit.h
@@ -0,0 +1,19 @@
+/*
+ * Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#ifndef __ACPI_NFIT_H
+#define __ACPI_NFIT_H
+
+int nfit_get_smbios_id(u32 device_handle, u16 *flags);
+
+#endif /* __ACPI_NFIT_H */
-- 
2.14.1

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v3 1/4] mm: introduce get_user_pages_longterm

2017-11-30 Thread Dan Williams
[ adding linux-rdma ]

On Thu, Nov 30, 2017 at 10:17 AM, Michal Hocko  wrote:
>
> On Thu 30-11-17 10:03:26, Dan Williams wrote:
> > On Thu, Nov 30, 2017 at 9:42 AM, Michal Hocko  wrote:
> > >
> > > On Thu 30-11-17 08:39:51, Dan Williams wrote:
> > > > On Thu, Nov 30, 2017 at 1:53 AM, Michal Hocko  wrote:
> > > > > On Wed 29-11-17 10:05:35, Dan Williams wrote:
> > > > >> Until there is a solution to the dma-to-dax vs truncate problem it is
> > > > >> not safe to allow long standing memory registrations against
> > > > >> filesytem-dax vmas. Device-dax vmas do not have this problem and are
> > > > >> explicitly allowed.
> > > > >>
> > > > >> This is temporary until a "memory registration with layout-lease"
> > > > >> mechanism can be implemented for the affected sub-systems (RDMA and
> > > > >> V4L2).
> > > > >
> > > > > One thing is not clear to me. Who is allowed to pin pages for ever?
> > > > > Is it possible to pin LRU pages that way as well? If yes then there
> > > > > absolutely has to be a limit for that. Sorry I could have studied the
> > > > > code much more but from a quick glance it seems to me that this is not
> > > > > limited to dax (or non-LRU in general) pages.
> > > >
> > > > I would turn this question around. "who can not tolerate a page being
> > > > pinned forever?".
> > >
> > > Any struct page on the movable zone or anything that is living on the
> > > LRU list because such a memory is unreclaimable.
> > >
> > > > In the case of filesytem-dax a page is
> > > > one-in-the-same object as a filesystem-block, and a filesystem expects
> > > > that its operations will not be blocked indefinitely. LRU pages can
> > > > continue to be pinned indefinitely because operations can continue
> > > > around the pinned page, i.e. every agent, save for the dma agent,
> > > > drops their reference to the page and its tolerable that the final
> > > > put_page() never arrives.
> > >
> > > I do not understand. Are you saying that a user triggered IO can pin LRU
> > > pages indefinitely. This would be _really_ wrong. It would be basically
> > > an mlock without any limit. So I must be misreading you here
> >
> > You're not misreading. See ib_umem_get() for example, it pins pages in
> > response to the userspace library call ibv_reg_mr() (memory
> > registration), and will not release those pages unless/until a call to
> > ibv_dereg_mr() is made.
>
> Who and how many LRU pages can pin that way and how do you prevent nasty
> users to DoS systems this way?

I assume this is something the RDMA community has had to contend with?
I'm not an RDMA person, I'm just here to fix dax.

> I remember PeterZ wanted to address a similar issue by vmpin syscall
> that would be a subject of a rlimit control. Sorry but I cannot find a
> reference here

https://lwn.net/Articles/600502/

> but if this is at g-u-p level without any accounting then
> it smells quite broken to me.

It's certainly broken with respect to filesystem-dax and if there is
other breakage we should get it all on the table.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v3 1/4] mm: introduce get_user_pages_longterm

2017-11-30 Thread Michal Hocko
On Thu 30-11-17 10:03:26, Dan Williams wrote:
> On Thu, Nov 30, 2017 at 9:42 AM, Michal Hocko  wrote:
> >
> > On Thu 30-11-17 08:39:51, Dan Williams wrote:
> > > On Thu, Nov 30, 2017 at 1:53 AM, Michal Hocko  wrote:
> > > > On Wed 29-11-17 10:05:35, Dan Williams wrote:
> > > >> Until there is a solution to the dma-to-dax vs truncate problem it is
> > > >> not safe to allow long standing memory registrations against
> > > >> filesytem-dax vmas. Device-dax vmas do not have this problem and are
> > > >> explicitly allowed.
> > > >>
> > > >> This is temporary until a "memory registration with layout-lease"
> > > >> mechanism can be implemented for the affected sub-systems (RDMA and
> > > >> V4L2).
> > > >
> > > > One thing is not clear to me. Who is allowed to pin pages for ever?
> > > > Is it possible to pin LRU pages that way as well? If yes then there
> > > > absolutely has to be a limit for that. Sorry I could have studied the
> > > > code much more but from a quick glance it seems to me that this is not
> > > > limited to dax (or non-LRU in general) pages.
> > >
> > > I would turn this question around. "who can not tolerate a page being
> > > pinned forever?".
> >
> > Any struct page on the movable zone or anything that is living on the
> > LRU list because such a memory is unreclaimable.
> >
> > > In the case of filesytem-dax a page is
> > > one-in-the-same object as a filesystem-block, and a filesystem expects
> > > that its operations will not be blocked indefinitely. LRU pages can
> > > continue to be pinned indefinitely because operations can continue
> > > around the pinned page, i.e. every agent, save for the dma agent,
> > > drops their reference to the page and its tolerable that the final
> > > put_page() never arrives.
> >
> > I do not understand. Are you saying that a user triggered IO can pin LRU
> > pages indefinitely. This would be _really_ wrong. It would be basically
> > an mlock without any limit. So I must be misreading you here
> 
> You're not misreading. See ib_umem_get() for example, it pins pages in
> response to the userspace library call ibv_reg_mr() (memory
> registration), and will not release those pages unless/until a call to
> ibv_dereg_mr() is made.

Who can pin LRU pages that way, how many can be pinned, and how do you
prevent nasty users from DoSing systems this way?

I remember PeterZ wanted to address a similar issue with a vmpin syscall
that would be subject to an rlimit control. Sorry, I cannot find the
reference right now, but if this is at the g-u-p level without any
accounting then it smells quite broken to me.
-- 
Michal Hocko
SUSE Labs
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v3 1/4] mm: introduce get_user_pages_longterm

2017-11-30 Thread Dan Williams
On Thu, Nov 30, 2017 at 9:42 AM, Michal Hocko  wrote:
>
> On Thu 30-11-17 08:39:51, Dan Williams wrote:
> > On Thu, Nov 30, 2017 at 1:53 AM, Michal Hocko  wrote:
> > > On Wed 29-11-17 10:05:35, Dan Williams wrote:
> > >> Until there is a solution to the dma-to-dax vs truncate problem it is
> > >> not safe to allow long standing memory registrations against
> > >> filesytem-dax vmas. Device-dax vmas do not have this problem and are
> > >> explicitly allowed.
> > >>
> > >> This is temporary until a "memory registration with layout-lease"
> > >> mechanism can be implemented for the affected sub-systems (RDMA and
> > >> V4L2).
> > >
> > > One thing is not clear to me. Who is allowed to pin pages for ever?
> > > Is it possible to pin LRU pages that way as well? If yes then there
> > > absolutely has to be a limit for that. Sorry I could have studied the
> > > code much more but from a quick glance it seems to me that this is not
> > > limited to dax (or non-LRU in general) pages.
> >
> > I would turn this question around. "who can not tolerate a page being
> > pinned forever?".
>
> Any struct page on the movable zone or anything that is living on the
> LRU list because such a memory is unreclaimable.
>
> > In the case of filesytem-dax a page is
> > one-in-the-same object as a filesystem-block, and a filesystem expects
> > that its operations will not be blocked indefinitely. LRU pages can
> > continue to be pinned indefinitely because operations can continue
> > around the pinned page, i.e. every agent, save for the dma agent,
> > drops their reference to the page and its tolerable that the final
> > put_page() never arrives.
>
> I do not understand. Are you saying that a user triggered IO can pin LRU
> pages indefinitely. This would be _really_ wrong. It would be basically
> an mlock without any limit. So I must be misreading you here

You're not misreading. See ib_umem_get(), for example: it pins pages in
response to the userspace library call ibv_reg_mr() (memory
registration), and will not release those pages unless/until a call to
ibv_dereg_mr() is made. The current plan to fix this is to create
something like an ibv_reg_mr_lease() call that registers the memory
with F_SETLEASE semantics so that the kernel can notify userspace
when a memory registration is being forcibly revoked. A
previous attempt at something like this was the proposed MAP_DIRECT
mmap flag [1].

[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-October/012815.html
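
For readers outside RDMA, a minimal libibverbs sketch of the userspace
pattern under discussion; the pages backing the registered buffer stay
pinned from ibv_reg_mr() until ibv_dereg_mr() (device selection and error
handling omitted, and the buffer could just as well be an mmap of a
filesystem-dax file):

#include <infiniband/verbs.h>
#include <stdlib.h>

int main(void)
{
	struct ibv_device **devs = ibv_get_device_list(NULL);
	struct ibv_context *ctx = ibv_open_device(devs[0]);
	struct ibv_pd *pd = ibv_alloc_pd(ctx);
	size_t len = 1UL << 20;
	void *buf = malloc(len);

	/* ib_umem_get() pins the pages behind this range here ... */
	struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
				       IBV_ACCESS_LOCAL_WRITE |
				       IBV_ACCESS_REMOTE_READ);

	/* ... and they stay pinned until the registration is torn down. */
	ibv_dereg_mr(mr);
	ibv_dealloc_pd(pd);
	ibv_close_device(ctx);
	ibv_free_device_list(devs);
	free(buf);
	return 0;
}
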
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v3 1/4] mm: introduce get_user_pages_longterm

2017-11-30 Thread Michal Hocko
On Thu 30-11-17 08:39:51, Dan Williams wrote:
> On Thu, Nov 30, 2017 at 1:53 AM, Michal Hocko  wrote:
> > On Wed 29-11-17 10:05:35, Dan Williams wrote:
> >> Until there is a solution to the dma-to-dax vs truncate problem it is
> >> not safe to allow long standing memory registrations against
> >> filesytem-dax vmas. Device-dax vmas do not have this problem and are
> >> explicitly allowed.
> >>
> >> This is temporary until a "memory registration with layout-lease"
> >> mechanism can be implemented for the affected sub-systems (RDMA and
> >> V4L2).
> >
> > One thing is not clear to me. Who is allowed to pin pages for ever?
> > Is it possible to pin LRU pages that way as well? If yes then there
> > absolutely has to be a limit for that. Sorry I could have studied the
> > code much more but from a quick glance it seems to me that this is not
> > limited to dax (or non-LRU in general) pages.
> 
> I would turn this question around. "who can not tolerate a page being
> pinned forever?".

Any struct page in the movable zone, or anything living on the LRU
list, because such memory is unreclaimable.

> In the case of filesytem-dax a page is
> one-in-the-same object as a filesystem-block, and a filesystem expects
> that its operations will not be blocked indefinitely. LRU pages can
> continue to be pinned indefinitely because operations can continue
> around the pinned page, i.e. every agent, save for the dma agent,
> drops their reference to the page and its tolerable that the final
> put_page() never arrives.

I do not understand. Are you saying that user-triggered IO can pin LRU
pages indefinitely? That would be _really_ wrong. It would basically be
an mlock without any limit. So I must be misreading you here.

> As far as I can tell it's only filesystems
> and dax that have this collision of wanting to revoke dma access to a
> page combined with not being able to wait indefinitely for dma to
> quiesce.

-- 
Michal Hocko
SUSE Labs
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v3 1/4] mm: introduce get_user_pages_longterm

2017-11-30 Thread Dan Williams
On Thu, Nov 30, 2017 at 1:53 AM, Michal Hocko  wrote:
> On Wed 29-11-17 10:05:35, Dan Williams wrote:
>> Until there is a solution to the dma-to-dax vs truncate problem it is
>> not safe to allow long standing memory registrations against
>> filesytem-dax vmas. Device-dax vmas do not have this problem and are
>> explicitly allowed.
>>
>> This is temporary until a "memory registration with layout-lease"
>> mechanism can be implemented for the affected sub-systems (RDMA and
>> V4L2).
>
> One thing is not clear to me. Who is allowed to pin pages for ever?
> Is it possible to pin LRU pages that way as well? If yes then there
> absolutely has to be a limit for that. Sorry I could have studied the
> code much more but from a quick glance it seems to me that this is not
> limited to dax (or non-LRU in general) pages.

I would turn this question around: "who cannot tolerate a page being
pinned forever?". In the case of filesystem-dax a page is
one and the same object as a filesystem block, and a filesystem expects
that its operations will not be blocked indefinitely. LRU pages can
continue to be pinned indefinitely because operations can continue
around the pinned page, i.e. every agent, save for the dma agent,
drops its reference to the page and it's tolerable that the final
put_page() never arrives. As far as I can tell it's only filesystems
and dax that have this collision of wanting to revoke dma access to a
page combined with not being able to wait indefinitely for dma to
quiesce.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v3 1/4] mm: introduce get_user_pages_longterm

2017-11-30 Thread Michal Hocko
On Wed 29-11-17 10:05:35, Dan Williams wrote:
> Until there is a solution to the dma-to-dax vs truncate problem it is
> not safe to allow long standing memory registrations against
> filesytem-dax vmas. Device-dax vmas do not have this problem and are
> explicitly allowed.
> 
> This is temporary until a "memory registration with layout-lease"
> mechanism can be implemented for the affected sub-systems (RDMA and
> V4L2).

One thing is not clear to me. Who is allowed to pin pages forever?
Is it possible to pin LRU pages that way as well? If yes, then there
absolutely has to be a limit for that. Sorry, I could have studied the
code more closely, but from a quick glance it seems to me that this is not
limited to dax (or non-LRU in general) pages.
-- 
Michal Hocko
SUSE Labs
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm