[PATCH AUTOSEL 4.4 04/14] libnvdimm/btt: Fix a kmemdup failure check

2019-05-06 Thread Sasha Levin
From: Aditya Pakki 

[ Upstream commit 486fa92df4707b5df58d6508728bdb9321a59766 ]

If kmemdup() fails, release the resources acquired so far and return
NULL to avoid a NULL pointer dereference.
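
For reference, the shape of the fix is the usual kernel unwind pattern:
one goto label per acquired resource, released in reverse order. A
minimal sketch with hypothetical foo_create()/foo_ida names (not the
patch itself):

	struct foo *foo_create(const void *src, size_t len)
	{
		struct foo *foo = kzalloc(sizeof(*foo), GFP_KERNEL);

		if (!foo)
			return NULL;

		foo->id = ida_simple_get(&foo_ida, 0, 0, GFP_KERNEL);
		if (foo->id < 0)
			goto out_free;

		/* any later failure must undo everything acquired above */
		foo->copy = kmemdup(src, len, GFP_KERNEL);
		if (!foo->copy)
			goto out_put_id;

		return foo;

	out_put_id:
		ida_simple_remove(&foo_ida, foo->id);
	out_free:
		kfree(foo);
		return NULL;
	}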

Signed-off-by: Aditya Pakki 
Signed-off-by: Dan Williams 
Signed-off-by: Sasha Levin 
---
 drivers/nvdimm/btt_devs.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/nvdimm/btt_devs.c b/drivers/nvdimm/btt_devs.c
index cb477518dd0e..4c129450495d 100644
--- a/drivers/nvdimm/btt_devs.c
+++ b/drivers/nvdimm/btt_devs.c
@@ -170,14 +170,15 @@ static struct device *__nd_btt_create(struct nd_region *nd_region,
return NULL;
 
nd_btt->id = ida_simple_get(&nd_region->btt_ida, 0, 0, GFP_KERNEL);
-   if (nd_btt->id < 0) {
-   kfree(nd_btt);
-   return NULL;
-   }
+   if (nd_btt->id < 0)
+   goto out_nd_btt;
 
nd_btt->lbasize = lbasize;
-   if (uuid)
+   if (uuid) {
uuid = kmemdup(uuid, 16, GFP_KERNEL);
+   if (!uuid)
+   goto out_put_id;
+   }
nd_btt->uuid = uuid;
dev = &nd_btt->dev;
dev_set_name(dev, "btt%d.%d", nd_region->id, nd_btt->id);
@@ -192,6 +193,13 @@ static struct device *__nd_btt_create(struct nd_region *nd_region,
return NULL;
}
return dev;
+
+out_put_id:
+   ida_simple_remove(&nd_region->btt_ida, nd_btt->id);
+
+out_nd_btt:
+   kfree(nd_btt);
+   return NULL;
 }
 
 struct device *nd_btt_create(struct nd_region *nd_region)
-- 
2.20.1



[PATCH AUTOSEL 4.9 02/25] libnvdimm/namespace: Fix a potential NULL pointer dereference

2019-05-06 Thread Sasha Levin
From: Kangjie Lu 

[ Upstream commit 55c1fc0af29a6c1b92f217b7eb7581a882e0c07c ]

If kmemdup() fails, jump to blk_err to avoid a NULL pointer
dereference.

Signed-off-by: Kangjie Lu 
Signed-off-by: Dan Williams 
Signed-off-by: Sasha Levin 
---
 drivers/nvdimm/namespace_devs.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 9bc5f555ee68..cf4a90b50f8b 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -2028,9 +2028,12 @@ struct device *create_namespace_blk(struct nd_region *nd_region,
if (!nsblk->uuid)
goto blk_err;
memcpy(name, nd_label->name, NSLABEL_NAME_LEN);
-   if (name[0])
+   if (name[0]) {
nsblk->alt_name = kmemdup(name, NSLABEL_NAME_LEN,
GFP_KERNEL);
+   if (!nsblk->alt_name)
+   goto blk_err;
+   }
res = nsblk_add_resource(nd_region, ndd, nsblk,
__le64_to_cpu(nd_label->dpa));
if (!res)
-- 
2.20.1



[PATCH AUTOSEL 4.9 06/25] libnvdimm/btt: Fix a kmemdup failure check

2019-05-06 Thread Sasha Levin
From: Aditya Pakki 

[ Upstream commit 486fa92df4707b5df58d6508728bdb9321a59766 ]

If kmemdup() fails, release the resources acquired so far and return
NULL to avoid a NULL pointer dereference.

Signed-off-by: Aditya Pakki 
Signed-off-by: Dan Williams 
Signed-off-by: Sasha Levin 
---
 drivers/nvdimm/btt_devs.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/nvdimm/btt_devs.c b/drivers/nvdimm/btt_devs.c
index 97dd2925ed6e..5d2c76682848 100644
--- a/drivers/nvdimm/btt_devs.c
+++ b/drivers/nvdimm/btt_devs.c
@@ -190,14 +190,15 @@ static struct device *__nd_btt_create(struct nd_region *nd_region,
return NULL;
 
nd_btt->id = ida_simple_get(&nd_region->btt_ida, 0, 0, GFP_KERNEL);
-   if (nd_btt->id < 0) {
-   kfree(nd_btt);
-   return NULL;
-   }
+   if (nd_btt->id < 0)
+   goto out_nd_btt;
 
nd_btt->lbasize = lbasize;
-   if (uuid)
+   if (uuid) {
uuid = kmemdup(uuid, 16, GFP_KERNEL);
+   if (!uuid)
+   goto out_put_id;
+   }
nd_btt->uuid = uuid;
dev = &nd_btt->dev;
dev_set_name(dev, "btt%d.%d", nd_region->id, nd_btt->id);
@@ -212,6 +213,13 @@ static struct device *__nd_btt_create(struct nd_region *nd_region,
return NULL;
}
return dev;
+
+out_put_id:
+   ida_simple_remove(&nd_region->btt_ida, nd_btt->id);
+
+out_nd_btt:
+   kfree(nd_btt);
+   return NULL;
 }
 
 struct device *nd_btt_create(struct nd_region *nd_region)
-- 
2.20.1



[PATCH AUTOSEL 4.14 06/95] libnvdimm/btt: Fix a kmemdup failure check

2019-05-06 Thread Sasha Levin
From: Aditya Pakki 

[ Upstream commit 486fa92df4707b5df58d6508728bdb9321a59766 ]

If kmemdup() fails, release the resources acquired so far and return
NULL to avoid a NULL pointer dereference.

Signed-off-by: Aditya Pakki 
Signed-off-by: Dan Williams 
Signed-off-by: Sasha Levin 
---
 drivers/nvdimm/btt_devs.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/nvdimm/btt_devs.c b/drivers/nvdimm/btt_devs.c
index d58925295aa7..e610dd890263 100644
--- a/drivers/nvdimm/btt_devs.c
+++ b/drivers/nvdimm/btt_devs.c
@@ -190,14 +190,15 @@ static struct device *__nd_btt_create(struct nd_region *nd_region,
return NULL;
 
nd_btt->id = ida_simple_get(&nd_region->btt_ida, 0, 0, GFP_KERNEL);
-   if (nd_btt->id < 0) {
-   kfree(nd_btt);
-   return NULL;
-   }
+   if (nd_btt->id < 0)
+   goto out_nd_btt;
 
nd_btt->lbasize = lbasize;
-   if (uuid)
+   if (uuid) {
uuid = kmemdup(uuid, 16, GFP_KERNEL);
+   if (!uuid)
+   goto out_put_id;
+   }
nd_btt->uuid = uuid;
dev = &nd_btt->dev;
dev_set_name(dev, "btt%d.%d", nd_region->id, nd_btt->id);
@@ -212,6 +213,13 @@ static struct device *__nd_btt_create(struct nd_region *nd_region,
return NULL;
}
return dev;
+
+out_put_id:
+   ida_simple_remove(&nd_region->btt_ida, nd_btt->id);
+
+out_nd_btt:
+   kfree(nd_btt);
+   return NULL;
 }
 
 struct device *nd_btt_create(struct nd_region *nd_region)
-- 
2.20.1



[PATCH AUTOSEL 4.14 02/95] libnvdimm/namespace: Fix a potential NULL pointer dereference

2019-05-06 Thread Sasha Levin
From: Kangjie Lu 

[ Upstream commit 55c1fc0af29a6c1b92f217b7eb7581a882e0c07c ]

If kmemdup() fails, jump to blk_err to avoid a NULL pointer
dereference.

Signed-off-by: Kangjie Lu 
Signed-off-by: Dan Williams 
Signed-off-by: Sasha Levin 
---
 drivers/nvdimm/namespace_devs.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 50b01d3eadd9..e3f228af59d1 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -2234,9 +2234,12 @@ struct device *create_namespace_blk(struct nd_region *nd_region,
if (!nsblk->uuid)
goto blk_err;
memcpy(name, nd_label->name, NSLABEL_NAME_LEN);
-   if (name[0])
+   if (name[0]) {
nsblk->alt_name = kmemdup(name, NSLABEL_NAME_LEN,
GFP_KERNEL);
+   if (!nsblk->alt_name)
+   goto blk_err;
+   }
res = nsblk_add_resource(nd_region, ndd, nsblk,
__le64_to_cpu(nd_label->dpa));
if (!res)
-- 
2.20.1



[PATCH AUTOSEL 4.19 04/81] acpi/nfit: Always dump _DSM output payload

2019-05-06 Thread Sasha Levin
From: Dan Williams 

[ Upstream commit 351f339faa308c1c1461314a18c832239a841ca0 ]

The dynamic-debug statements for command payload output only get emitted
when the command is not ND_CMD_CALL. Move the output payload dumping
ahead of the early return path for ND_CMD_CALL.

Fixes: 31eca76ba2fc9 ("...whitelisted dimm command marshaling mechanism")
Reported-by: Vishal Verma 
Signed-off-by: Dan Williams 
Signed-off-by: Sasha Levin 
---
 drivers/acpi/nfit/core.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 925dbc751322..8340c81b258b 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -542,6 +542,12 @@ int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm,
goto out;
}
 
+   dev_dbg(dev, "%s cmd: %s output length: %d\n", dimm_name,
+   cmd_name, out_obj->buffer.length);
+   print_hex_dump_debug(cmd_name, DUMP_PREFIX_OFFSET, 4, 4,
+   out_obj->buffer.pointer,
+   min_t(u32, 128, out_obj->buffer.length), true);
+
if (call_pkg) {
call_pkg->nd_fw_size = out_obj->buffer.length;
memcpy(call_pkg->nd_payload + call_pkg->nd_size_in,
@@ -560,12 +566,6 @@ int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm,
return 0;
}
 
-   dev_dbg(dev, "%s cmd: %s output length: %d\n", dimm_name,
-   cmd_name, out_obj->buffer.length);
-   print_hex_dump_debug(cmd_name, DUMP_PREFIX_OFFSET, 4, 4,
-   out_obj->buffer.pointer,
-   min_t(u32, 128, out_obj->buffer.length), true);
-
for (i = 0, offset = 0; i < desc->out_num; i++) {
u32 out_size = nd_cmd_out_size(nvdimm, cmd, desc, i, buf,
(u32 *) out_obj->buffer.pointer,
-- 
2.20.1



[PATCH AUTOSEL 4.19 17/81] libnvdimm/pmem: fix a possible OOB access when read and write pmem

2019-05-06 Thread Sasha Levin
From: Li RongQing 

[ Upstream commit 9dc6488e84b0f64df17672271664752488cd6a25 ]

If the offset is not zero and the length is larger than PAGE_SIZE, the
per-page copy loop runs past the end of the mapped page, causing an
out-of-bounds access to the page memory.
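
A userspace model makes the loop bounds concrete (a sketch assuming
off = 512 and len = 8192, i.e. an I/O that starts mid-page; not the
driver code itself):

	#include <stdio.h>

	#define PAGE_SIZE 4096u

	int main(void)
	{
		unsigned int off = 512, len = 8192, pmem_off = 0;

		while (len) {
			/* the old code used min(len, PAGE_SIZE) = 4096 on the
			 * first pass, reading [512, 4608) of a 4096-byte page:
			 * 512 bytes out of bounds */
			unsigned int chunk = len < PAGE_SIZE - off ?
					len : PAGE_SIZE - off;

			printf("copy %u bytes at page offset %u -> pmem+%u\n",
					chunk, off, pmem_off);
			pmem_off += chunk; /* old code advanced by PAGE_SIZE */
			len -= chunk;
			off = 0;
		}
		return 0; /* chunks are 3584, 4096, 512 = 8192 total */
	}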

Fixes: 98cc093cba1e ("block, THP: make block_device_operations.rw_page support 
THP")
Co-developed-by: Liang ZhiCheng 
Signed-off-by: Liang ZhiCheng 
Signed-off-by: Li RongQing 
Reviewed-by: Ira Weiny 
Reviewed-by: Jeff Moyer 
Signed-off-by: Dan Williams 
Signed-off-by: Sasha Levin 
---
 drivers/nvdimm/pmem.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 1d432c5ed275..cff027fc2676 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -113,13 +113,13 @@ static void write_pmem(void *pmem_addr, struct page *page,
 
while (len) {
mem = kmap_atomic(page);
-   chunk = min_t(unsigned int, len, PAGE_SIZE);
+   chunk = min_t(unsigned int, len, PAGE_SIZE - off);
memcpy_flushcache(pmem_addr, mem + off, chunk);
kunmap_atomic(mem);
len -= chunk;
off = 0;
page++;
-   pmem_addr += PAGE_SIZE;
+   pmem_addr += chunk;
}
 }
 
@@ -132,7 +132,7 @@ static blk_status_t read_pmem(struct page *page, unsigned int off,
 
while (len) {
mem = kmap_atomic(page);
-   chunk = min_t(unsigned int, len, PAGE_SIZE);
+   chunk = min_t(unsigned int, len, PAGE_SIZE - off);
rem = memcpy_mcsafe(mem + off, pmem_addr, chunk);
kunmap_atomic(mem);
if (rem)
@@ -140,7 +140,7 @@ static blk_status_t read_pmem(struct page *page, unsigned int off,
len -= chunk;
off = 0;
page++;
-   pmem_addr += PAGE_SIZE;
+   pmem_addr += chunk;
}
return BLK_STS_OK;
 }
-- 
2.20.1



[PATCH AUTOSEL 4.19 05/81] libnvdimm/namespace: Fix a potential NULL pointer dereference

2019-05-06 Thread Sasha Levin
From: Kangjie Lu 

[ Upstream commit 55c1fc0af29a6c1b92f217b7eb7581a882e0c07c ]

If kmemdup() fails, jump to blk_err to avoid a NULL pointer
dereference.

Signed-off-by: Kangjie Lu 
Signed-off-by: Dan Williams 
Signed-off-by: Sasha Levin 
---
 drivers/nvdimm/namespace_devs.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 54d79837f7c6..73a444c41cde 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -2251,9 +2251,12 @@ static struct device *create_namespace_blk(struct nd_region *nd_region,
if (!nsblk->uuid)
goto blk_err;
memcpy(name, nd_label->name, NSLABEL_NAME_LEN);
-   if (name[0])
+   if (name[0]) {
nsblk->alt_name = kmemdup(name, NSLABEL_NAME_LEN,
GFP_KERNEL);
+   if (!nsblk->alt_name)
+   goto blk_err;
+   }
res = nsblk_add_resource(nd_region, ndd, nsblk,
__le64_to_cpu(nd_label->dpa));
if (!res)
-- 
2.20.1



[PATCH AUTOSEL 4.19 09/81] libnvdimm/btt: Fix a kmemdup failure check

2019-05-06 Thread Sasha Levin
From: Aditya Pakki 

[ Upstream commit 486fa92df4707b5df58d6508728bdb9321a59766 ]

If kmemdup() fails, release the resources acquired so far and return
NULL to avoid a NULL pointer dereference.

Signed-off-by: Aditya Pakki 
Signed-off-by: Dan Williams 
Signed-off-by: Sasha Levin 
---
 drivers/nvdimm/btt_devs.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/nvdimm/btt_devs.c b/drivers/nvdimm/btt_devs.c
index 795ad4ff35ca..e341498876ca 100644
--- a/drivers/nvdimm/btt_devs.c
+++ b/drivers/nvdimm/btt_devs.c
@@ -190,14 +190,15 @@ static struct device *__nd_btt_create(struct nd_region *nd_region,
return NULL;
 
nd_btt->id = ida_simple_get(&nd_region->btt_ida, 0, 0, GFP_KERNEL);
-   if (nd_btt->id < 0) {
-   kfree(nd_btt);
-   return NULL;
-   }
+   if (nd_btt->id < 0)
+   goto out_nd_btt;
 
nd_btt->lbasize = lbasize;
-   if (uuid)
+   if (uuid) {
uuid = kmemdup(uuid, 16, GFP_KERNEL);
+   if (!uuid)
+   goto out_put_id;
+   }
nd_btt->uuid = uuid;
dev = &nd_btt->dev;
dev_set_name(dev, "btt%d.%d", nd_region->id, nd_btt->id);
@@ -212,6 +213,13 @@ static struct device *__nd_btt_create(struct nd_region *nd_region,
return NULL;
}
return dev;
+
+out_put_id:
+   ida_simple_remove(&nd_region->btt_ida, nd_btt->id);
+
+out_nd_btt:
+   kfree(nd_btt);
+   return NULL;
 }
 
 struct device *nd_btt_create(struct nd_region *nd_region)
-- 
2.20.1



[PATCH AUTOSEL 5.0 18/99] libnvdimm/pmem: fix a possible OOB access when read and write pmem

2019-05-06 Thread Sasha Levin
From: Li RongQing 

[ Upstream commit 9dc6488e84b0f64df17672271664752488cd6a25 ]

If the offset is not zero and the length is larger than PAGE_SIZE, the
per-page copy loop runs past the end of the mapped page, causing an
out-of-bounds access to the page memory.

Fixes: 98cc093cba1e ("block, THP: make block_device_operations.rw_page support 
THP")
Co-developed-by: Liang ZhiCheng 
Signed-off-by: Liang ZhiCheng 
Signed-off-by: Li RongQing 
Reviewed-by: Ira Weiny 
Reviewed-by: Jeff Moyer 
Signed-off-by: Dan Williams 
Signed-off-by: Sasha Levin 
---
 drivers/nvdimm/pmem.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index bc2f700feef8..0279eb1da3ef 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -113,13 +113,13 @@ static void write_pmem(void *pmem_addr, struct page *page,
 
while (len) {
mem = kmap_atomic(page);
-   chunk = min_t(unsigned int, len, PAGE_SIZE);
+   chunk = min_t(unsigned int, len, PAGE_SIZE - off);
memcpy_flushcache(pmem_addr, mem + off, chunk);
kunmap_atomic(mem);
len -= chunk;
off = 0;
page++;
-   pmem_addr += PAGE_SIZE;
+   pmem_addr += chunk;
}
 }
 
@@ -132,7 +132,7 @@ static blk_status_t read_pmem(struct page *page, unsigned int off,
 
while (len) {
mem = kmap_atomic(page);
-   chunk = min_t(unsigned int, len, PAGE_SIZE);
+   chunk = min_t(unsigned int, len, PAGE_SIZE - off);
rem = memcpy_mcsafe(mem + off, pmem_addr, chunk);
kunmap_atomic(mem);
if (rem)
@@ -140,7 +140,7 @@ static blk_status_t read_pmem(struct page *page, unsigned int off,
len -= chunk;
off = 0;
page++;
-   pmem_addr += PAGE_SIZE;
+   pmem_addr += chunk;
}
return BLK_STS_OK;
 }
-- 
2.20.1



[PATCH AUTOSEL 5.0 09/99] libnvdimm/btt: Fix a kmemdup failure check

2019-05-06 Thread Sasha Levin
From: Aditya Pakki 

[ Upstream commit 486fa92df4707b5df58d6508728bdb9321a59766 ]

If kmemdup() fails, release the resources acquired so far and return
NULL to avoid a NULL pointer dereference.

Signed-off-by: Aditya Pakki 
Signed-off-by: Dan Williams 
Signed-off-by: Sasha Levin 
---
 drivers/nvdimm/btt_devs.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/nvdimm/btt_devs.c b/drivers/nvdimm/btt_devs.c
index 795ad4ff35ca..e341498876ca 100644
--- a/drivers/nvdimm/btt_devs.c
+++ b/drivers/nvdimm/btt_devs.c
@@ -190,14 +190,15 @@ static struct device *__nd_btt_create(struct nd_region *nd_region,
return NULL;
 
nd_btt->id = ida_simple_get(&nd_region->btt_ida, 0, 0, GFP_KERNEL);
-   if (nd_btt->id < 0) {
-   kfree(nd_btt);
-   return NULL;
-   }
+   if (nd_btt->id < 0)
+   goto out_nd_btt;
 
nd_btt->lbasize = lbasize;
-   if (uuid)
+   if (uuid) {
uuid = kmemdup(uuid, 16, GFP_KERNEL);
+   if (!uuid)
+   goto out_put_id;
+   }
nd_btt->uuid = uuid;
dev = &nd_btt->dev;
dev_set_name(dev, "btt%d.%d", nd_region->id, nd_btt->id);
@@ -212,6 +213,13 @@ static struct device *__nd_btt_create(struct nd_region *nd_region,
return NULL;
}
return dev;
+
+out_put_id:
+   ida_simple_remove(&nd_region->btt_ida, nd_btt->id);
+
+out_nd_btt:
+   kfree(nd_btt);
+   return NULL;
 }
 
 struct device *nd_btt_create(struct nd_region *nd_region)
-- 
2.20.1



[PATCH AUTOSEL 5.0 04/99] acpi/nfit: Always dump _DSM output payload

2019-05-06 Thread Sasha Levin
From: Dan Williams 

[ Upstream commit 351f339faa308c1c1461314a18c832239a841ca0 ]

The dynamic-debug statements for command payload output only get emitted
when the command is not ND_CMD_CALL. Move the output payload dumping
ahead of the early return path for ND_CMD_CALL.

Fixes: 31eca76ba2fc9 ("...whitelisted dimm command marshaling mechanism")
Reported-by: Vishal Verma 
Signed-off-by: Dan Williams 
Signed-off-by: Sasha Levin 
---
 drivers/acpi/nfit/core.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 4be4dc3e8aa6..38ec79bb3edd 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -563,6 +563,12 @@ int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm,
goto out;
}
 
+   dev_dbg(dev, "%s cmd: %s output length: %d\n", dimm_name,
+   cmd_name, out_obj->buffer.length);
+   print_hex_dump_debug(cmd_name, DUMP_PREFIX_OFFSET, 4, 4,
+   out_obj->buffer.pointer,
+   min_t(u32, 128, out_obj->buffer.length), true);
+
if (call_pkg) {
call_pkg->nd_fw_size = out_obj->buffer.length;
memcpy(call_pkg->nd_payload + call_pkg->nd_size_in,
@@ -581,12 +587,6 @@ int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm,
return 0;
}
 
-   dev_dbg(dev, "%s cmd: %s output length: %d\n", dimm_name,
-   cmd_name, out_obj->buffer.length);
-   print_hex_dump_debug(cmd_name, DUMP_PREFIX_OFFSET, 4, 4,
-   out_obj->buffer.pointer,
-   min_t(u32, 128, out_obj->buffer.length), true);
-
for (i = 0, offset = 0; i < desc->out_num; i++) {
u32 out_size = nd_cmd_out_size(nvdimm, cmd, desc, i, buf,
(u32 *) out_obj->buffer.pointer,
-- 
2.20.1



[PATCH AUTOSEL 5.0 05/99] libnvdimm/namespace: Fix a potential NULL pointer dereference

2019-05-06 Thread Sasha Levin
From: Kangjie Lu 

[ Upstream commit 55c1fc0af29a6c1b92f217b7eb7581a882e0c07c ]

If kmemdup() fails, jump to blk_err to avoid a NULL pointer
dereference.

Signed-off-by: Kangjie Lu 
Signed-off-by: Dan Williams 
Signed-off-by: Sasha Levin 
---
 drivers/nvdimm/namespace_devs.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 33a3b23b3db7..e761b29f7160 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -2249,9 +2249,12 @@ static struct device *create_namespace_blk(struct nd_region *nd_region,
if (!nsblk->uuid)
goto blk_err;
memcpy(name, nd_label->name, NSLABEL_NAME_LEN);
-   if (name[0])
+   if (name[0]) {
nsblk->alt_name = kmemdup(name, NSLABEL_NAME_LEN,
GFP_KERNEL);
+   if (!nsblk->alt_name)
+   goto blk_err;
+   }
res = nsblk_add_resource(nd_region, ndd, nsblk,
__le64_to_cpu(nd_label->dpa));
if (!res)
-- 
2.20.1



[PATCH AUTOSEL 5.0 16/99] libnvdimm/security: provide fix for secure-erase to use zero-key

2019-05-06 Thread Sasha Levin
From: Dave Jiang 

[ Upstream commit 037c8489ade669e0f09ad40d5b91e5e1159a14b1 ]

Add a zero key in order to standardize hardware that wants a key of 0's
to be passed. Some platforms default to a zero-key with security enabled
rather than allowing the OS to enable the security. The zero key allows
us to manage those platforms as well. This also fixes secure erase so it
can use the zero key to do crypto erase. Some other security commands
already use zero keys. This introduces a standard zero-key to allow
unification of semantics across nvdimm security commands.

Signed-off-by: Dave Jiang 
Signed-off-by: Dan Williams 
Signed-off-by: Sasha Levin 
---
 drivers/nvdimm/security.c        | 17 ++++++++++++-----
 tools/testing/nvdimm/test/nfit.c | 11 +++++++++--
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/nvdimm/security.c b/drivers/nvdimm/security.c
index f8bb746a549f..6bea6852bf27 100644
--- a/drivers/nvdimm/security.c
+++ b/drivers/nvdimm/security.c
@@ -22,6 +22,8 @@ static bool key_revalidate = true;
 module_param(key_revalidate, bool, 0444);
 MODULE_PARM_DESC(key_revalidate, "Require key validation at init.");
 
+static const char zero_key[NVDIMM_PASSPHRASE_LEN];
+
 static void *key_data(struct key *key)
 {
struct encrypted_key_payload *epayload = dereference_key_locked(key);
@@ -286,8 +288,9 @@ int nvdimm_security_erase(struct nvdimm *nvdimm, unsigned int keyid,
 {
struct device *dev = &nvdimm->dev;
struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
-   struct key *key;
+   struct key *key = NULL;
int rc;
+   const void *data;
 
/* The bus lock should be held at the top level of the call stack */
lockdep_assert_held(&nvdimm_bus->reconfig_mutex);
@@ -319,11 +322,15 @@ int nvdimm_security_erase(struct nvdimm *nvdimm, unsigned int keyid,
return -EOPNOTSUPP;
}
 
-   key = nvdimm_lookup_user_key(nvdimm, keyid, NVDIMM_BASE_KEY);
-   if (!key)
-   return -ENOKEY;
+   if (keyid != 0) {
+   key = nvdimm_lookup_user_key(nvdimm, keyid, NVDIMM_BASE_KEY);
+   if (!key)
+   return -ENOKEY;
+   data = key_data(key);
+   } else
+   data = zero_key;
 
-   rc = nvdimm->sec.ops->erase(nvdimm, key_data(key), pass_type);
+   rc = nvdimm->sec.ops->erase(nvdimm, data, pass_type);
dev_dbg(dev, "key: %d erase%s: %s\n", key_serial(key),
pass_type == NVDIMM_MASTER ? "(master)" : "(user)",
rc == 0 ? "success" : "fail");
diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index b579f962451d..cad719876ef4 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -225,6 +225,8 @@ static struct workqueue_struct *nfit_wq;
 
 static struct gen_pool *nfit_pool;
 
+static const char zero_key[NVDIMM_PASSPHRASE_LEN];
+
 static struct nfit_test *to_nfit_test(struct device *dev)
 {
struct platform_device *pdev = to_platform_device(dev);
@@ -1059,8 +1061,7 @@ static int nd_intel_test_cmd_secure_erase(struct nfit_test *t,
struct device *dev = &t->pdev.dev;
struct nfit_test_sec *sec = &dimm_sec_info[dimm];
 
-   if (!(sec->state & ND_INTEL_SEC_STATE_ENABLED) ||
-   (sec->state & ND_INTEL_SEC_STATE_FROZEN)) {
+   if (sec->state & ND_INTEL_SEC_STATE_FROZEN) {
nd_cmd->status = ND_INTEL_STATUS_INVALID_STATE;
dev_dbg(dev, "secure erase: wrong security state\n");
} else if (memcmp(nd_cmd->passphrase, sec->passphrase,
@@ -1068,6 +1069,12 @@ static int nd_intel_test_cmd_secure_erase(struct nfit_test *t,
nd_cmd->status = ND_INTEL_STATUS_INVALID_PASS;
dev_dbg(dev, "secure erase: wrong passphrase\n");
} else {
+   if (!(sec->state & ND_INTEL_SEC_STATE_ENABLED)
+   && (memcmp(nd_cmd->passphrase, zero_key,
+   ND_INTEL_PASSPHRASE_SIZE) != 0)) {
+   dev_dbg(dev, "invalid zero key\n");
+   return 0;
+   }
memset(sec->passphrase, 0, ND_INTEL_PASSPHRASE_SIZE);
memset(sec->master_passphrase, 0, ND_INTEL_PASSPHRASE_SIZE);
sec->state = 0;
-- 
2.20.1



Re: [PATCH v2 00/17] kunit: introduce KUnit, the Linux kernel unit testing framework

2019-05-06 Thread Frank Rowand
On 5/1/19 4:01 PM, Brendan Higgins wrote:
> ## TLDR
> 
> I rebased the last patchset on 5.1-rc7 in hopes that we can get this in
> 5.2.
> 
> Shuah, I think you, Greg KH, and myself talked off thread, and we agreed
> we would merge through your tree when the time came? Am I remembering
> correctly?
> 
> ## Background
> 
> This patch set proposes KUnit, a lightweight unit testing and mocking
> framework for the Linux kernel.
> 
> Unlike Autotest and kselftest, KUnit is a true unit testing framework;
> it does not require installing the kernel on a test machine or in a VM
> and does not require tests to be written in userspace running on a host
> kernel. Additionally, KUnit is fast: From invocation to completion KUnit
> can run several dozen tests in under a second. Currently, the entire
> KUnit test suite for KUnit runs in under a second from the initial
> invocation (build time excluded).
> 
> KUnit is heavily inspired by JUnit, Python's unittest.mock, and
> Googletest/Googlemock for C++. KUnit provides facilities for defining
> unit test cases, grouping related test cases into test suites, providing
> common infrastructure for running tests, mocking, spying, and much more.

As a result of the emails replying to this patch thread, I am now
starting to look at kselftest.  My level of understanding is based
on some slide presentations, an LWN article, https://kselftest.wiki.kernel.org/
and a _tiny_ bit of looking at kselftest code.

tl;dr; I don't really understand kselftest yet.


(1) why KUnit exists

> ## What's so special about unit testing?
> 
> A unit test is supposed to test a single unit of code in isolation,
> hence the name. There should be no dependencies outside the control of
> the test; this means no external dependencies, which makes tests orders
> of magnitudes faster. Likewise, since there are no external dependencies,
> there are no hoops to jump through to run the tests. Additionally, this
> makes unit tests deterministic: a failing unit test always indicates a
> problem. Finally, because unit tests necessarily have finer granularity,
> they are able to test all code paths easily solving the classic problem
> of difficulty in exercising error handling code.
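
(For concreteness, a test case in the proposed API looks roughly like
this -- a sketch pieced together from the patchset's documentation, so
the exact macro and struct names are assumptions on my part:)

	#include <kunit/test.h>

	/* hypothetical function under test */
	static int add(int a, int b)
	{
		return a + b;
	}

	static void add_test_basic(struct kunit *test)
	{
		/* runs in-kernel (e.g. under UML); no test machine needed */
		KUNIT_EXPECT_EQ(test, 3, add(1, 2));
		KUNIT_EXPECT_EQ(test, 0, add(-1, 1));
	}

	static struct kunit_case add_test_cases[] = {
		KUNIT_CASE(add_test_basic),
		{}
	};

	static struct kunit_suite add_test_suite = {
		.name = "add-test",
		.test_cases = add_test_cases,
	};
	kunit_test_suite(add_test_suite);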

(2) KUnit is not meant to replace kselftest

> ## Is KUnit trying to replace other testing frameworks for the kernel?
> 
> No. Most existing tests for the Linux kernel are end-to-end tests, which
> have their place. A well tested system has lots of unit tests, a
> reasonable number of integration tests, and some end-to-end tests. KUnit
> is just trying to address the unit test space which is currently not
> being addressed.

My understanding is that the intent of KUnit is to avoid booting a kernel on
real hardware or in a virtual machine.  That seems to be a matter of semantics
to me because isn't invoking a UML Linux just running the Linux kernel in
a different form of virtualization?

So I do not understand why KUnit is an improvement over kselftest.

It seems to me that KUnit is just another piece of infrastructure that I
am going to have to be familiar with as a kernel developer.  More overhead,
more information to stuff into my tiny little brain.

I would guess that some developers will focus on just one of the two test
environments (and some will focus on both), splitting the development
resources instead of pooling them on a common infrastructure.

What am I missing?

-Frank


> 
> ## More information on KUnit
> 
> There is a bunch of documentation near the end of this patch set that
> describes how to use KUnit and best practices for writing unit tests.
> For convenience I am hosting the compiled docs here:
> https://google.github.io/kunit-docs/third_party/kernel/docs/
> Additionally for convenience, I have applied these patches to a branch:
> https://kunit.googlesource.com/linux/+/kunit/rfc/v5.1-rc7/v1
> The repo may be cloned with:
> git clone https://kunit.googlesource.com/linux
> This patchset is on the kunit/rfc/v5.1-rc7/v1 branch.
> 
> ## Changes Since Last Version
> 
> None. I just rebased the last patchset on v5.1-rc7.
> 



[PATCH v8 09/12] mm/sparsemem: Support sub-section hotplug

2019-05-06 Thread Dan Williams
The libnvdimm sub-system has suffered a series of hacks and broken
workarounds for the memory-hotplug implementation's awkward
section-aligned (128MB) granularity. For example the following backtrace
is emitted when attempting arch_add_memory() with physical address
ranges that intersect 'System RAM' (RAM) with 'Persistent Memory' (PMEM)
within a given section:

 WARNING: CPU: 0 PID: 558 at kernel/memremap.c:300 devm_memremap_pages+0x3b5/0x4c0
 devm_memremap_pages attempted on mixed region [mem 0x2-0x2fbff flags 0x200]
 [..]
 Call Trace:
   dump_stack+0x86/0xc3
   __warn+0xcb/0xf0
   warn_slowpath_fmt+0x5f/0x80
   devm_memremap_pages+0x3b5/0x4c0
   __wrap_devm_memremap_pages+0x58/0x70 [nfit_test_iomap]
   pmem_attach_disk+0x19a/0x440 [nd_pmem]

Recently it was discovered that the problem goes beyond RAM vs PMEM
collisions: some platforms produce PMEM vs PMEM collisions within a
given section. The libnvdimm workaround for that case revealed that the
libnvdimm section-alignment-padding implementation has been broken for a
long while. A fix for that long-standing breakage introduces as many
problems as it solves as it would require a backward-incompatible change
to the namespace metadata interpretation. Instead of that dubious route
[1], address the root problem in the memory-hotplug implementation.

[1]: 
https://lore.kernel.org/r/155000671719.348031.2347363160141119237.st...@dwillia2-desk3.amr.corp.intel.com
Cc: Michal Hocko 
Cc: Vlastimil Babka 
Cc: Logan Gunthorpe 
Cc: Oscar Salvador 
Cc: Pavel Tatashin 
Signed-off-by: Dan Williams 
---
 mm/sparse.c | 255 +++++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 175 insertions(+), 80 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 8867f8901ee2..34f322d14e62 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -83,8 +83,15 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
unsigned long root = SECTION_NR_TO_ROOT(section_nr);
struct mem_section *section;
 
+   /*
+* An existing section is possible in the sub-section hotplug
+* case. First hot-add instantiates, follow-on hot-add reuses
+* the existing section.
+*
+* The mem_hotplug_lock resolves the apparent race below.
+*/
if (mem_section[root])
-   return -EEXIST;
+   return 0;
 
section = sparse_index_alloc(nid);
if (!section)
@@ -210,6 +217,15 @@ static inline unsigned long first_present_section_nr(void)
return next_present_section_nr(-1);
 }
 
+void subsection_mask_set(unsigned long *map, unsigned long pfn,
+   unsigned long nr_pages)
+{
+   int idx = subsection_map_index(pfn);
+   int end = subsection_map_index(pfn + nr_pages - 1);
+
+   bitmap_set(map, idx, end - idx + 1);
+}
+
 void subsection_map_init(unsigned long pfn, unsigned long nr_pages)
 {
int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
@@ -219,20 +235,17 @@ void subsection_map_init(unsigned long pfn, unsigned long nr_pages)
return;
 
for (i = start_sec; i <= end_sec; i++) {
-   int idx, end;
-   unsigned long pfns;
struct mem_section *ms;
+   unsigned long pfns;
 
-   idx = subsection_map_index(pfn);
pfns = min(nr_pages, PAGES_PER_SECTION
- (pfn & ~PAGE_SECTION_MASK));
-   end = subsection_map_index(pfn + pfns - 1);
-
ms = __nr_to_section(i);
-   bitmap_set(ms->usage->subsection_map, idx, end - idx + 1);
+   subsection_mask_set(ms->usage->subsection_map, pfn, pfns);
 
pr_debug("%s: sec: %d pfns: %ld set(%d, %d)\n", __func__, i,
-   pfns, idx, end - idx + 1);
+   pfns, subsection_map_index(pfn),
+   subsection_map_index(pfn + pfns - 1));
 
pfn += pfns;
nr_pages -= pfns;
@@ -319,6 +332,15 @@ static void __meminit sparse_init_one_section(struct mem_section *ms,
unsigned long pnum, struct page *mem_map,
struct mem_section_usage *usage)
 {
+   /*
+* Given that SPARSEMEM_VMEMMAP=y supports sub-section hotplug,
+* ->section_mem_map can not be guaranteed to point to a full
+*  section's worth of memory.  The field is only valid / used
+*  in the SPARSEMEM_VMEMMAP=n case.
+*/
+   if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
+   mem_map = NULL;
+
ms->section_mem_map &= ~SECTION_MAP_MASK;
ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum) |
SECTION_HAS_MEM_MAP;
@@ -724,10 +746,142 @@ static void free_map_bootmem(struct page *memmap)
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
+#ifndef CONFIG_MEMORY_HOTREMOVE

[PATCH v8 10/12] mm/devm_memremap_pages: Enable sub-section remap

2019-05-06 Thread Dan Williams
Teach devm_memremap_pages() about the new sub-section capabilities of
arch_{add,remove}_memory(). Effectively, just replace all usage of
align_start, align_end, and align_size with res->start, res->end, and
resource_size(res). The existing sanity check will still make sure that
the two separate remap attempts do not collide within a sub-section (2MB
on x86).

Cc: Michal Hocko 
Cc: Toshi Kani 
Cc: Jérôme Glisse 
Cc: Logan Gunthorpe 
Cc: Oscar Salvador 
Cc: Pavel Tatashin 
Signed-off-by: Dan Williams 
---
 kernel/memremap.c | 61 ++++++++++++++++++++++++-------------------------------------
 1 file changed, 24 insertions(+), 37 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index f355586ea54a..425904858d97 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -59,7 +59,7 @@ static unsigned long pfn_first(struct dev_pagemap *pgmap)
struct vmem_altmap *altmap = &pgmap->altmap;
unsigned long pfn;
 
-   pfn = res->start >> PAGE_SHIFT;
+   pfn = PHYS_PFN(res->start);
if (pgmap->altmap_valid)
pfn += vmem_altmap_offset(altmap);
return pfn;
@@ -87,7 +87,6 @@ static void devm_memremap_pages_release(void *data)
struct dev_pagemap *pgmap = data;
struct device *dev = pgmap->dev;
struct resource *res = &pgmap->res;
-   resource_size_t align_start, align_size;
unsigned long pfn;
int nid;
 
@@ -96,25 +95,21 @@ static void devm_memremap_pages_release(void *data)
put_page(pfn_to_page(pfn));
 
/* pages are dead and unused, undo the arch mapping */
-   align_start = res->start & ~(PA_SECTION_SIZE - 1);
-   align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE)
-   - align_start;
-
-   nid = page_to_nid(pfn_to_page(align_start >> PAGE_SHIFT));
+   nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
 
mem_hotplug_begin();
if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
-   pfn = align_start >> PAGE_SHIFT;
+   pfn = PHYS_PFN(res->start);
__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
-   align_size >> PAGE_SHIFT, NULL);
+   PHYS_PFN(resource_size(res)), NULL);
} else {
-   arch_remove_memory(nid, align_start, align_size,
+   arch_remove_memory(nid, res->start, resource_size(res),
pgmap->altmap_valid ? &pgmap->altmap : NULL);
-   kasan_remove_zero_shadow(__va(align_start), align_size);
+   kasan_remove_zero_shadow(__va(res->start), resource_size(res));
}
mem_hotplug_done();
 
-   untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
+   untrack_pfn(NULL, PHYS_PFN(res->start), resource_size(res));
pgmap_array_delete(res);
dev_WARN_ONCE(dev, pgmap->altmap.alloc,
  "%s: failed to free all reserved pages\n", __func__);
@@ -141,16 +136,13 @@ static void devm_memremap_pages_release(void *data)
  */
 void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 {
-   resource_size_t align_start, align_size, align_end;
-   struct vmem_altmap *altmap = pgmap->altmap_valid ?
-   &pgmap->altmap : NULL;
struct resource *res = &pgmap->res;
struct dev_pagemap *conflict_pgmap;
struct mhp_restrictions restrictions = {
/*
 * We do not want any optional features only our own memmap
*/
-   .altmap = altmap,
+   .altmap = pgmap->altmap_valid ? &pgmap->altmap : NULL,
};
pgprot_t pgprot = PAGE_KERNEL;
int error, nid, is_ram;
@@ -158,26 +150,21 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
if (!pgmap->ref || !pgmap->kill)
return ERR_PTR(-EINVAL);
 
-   align_start = res->start & ~(PA_SECTION_SIZE - 1);
-   align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE)
-   - align_start;
-   align_end = align_start + align_size - 1;
-
-   conflict_pgmap = get_dev_pagemap(PHYS_PFN(align_start), NULL);
+   conflict_pgmap = get_dev_pagemap(PHYS_PFN(res->start), NULL);
if (conflict_pgmap) {
dev_WARN(dev, "Conflicting mapping in same section\n");
put_dev_pagemap(conflict_pgmap);
return ERR_PTR(-ENOMEM);
}
 
-   conflict_pgmap = get_dev_pagemap(PHYS_PFN(align_end), NULL);
+   conflict_pgmap = get_dev_pagemap(PHYS_PFN(res->end), NULL);
if (conflict_pgmap) {
dev_WARN(dev, "Conflicting mapping in same section\n");
put_dev_pagemap(conflict_pgmap);
return ERR_PTR(-ENOMEM);
}
 
-   is_ram = region_intersects(align_start, align_size,
+   is_ram = region_intersects(res->start, resource_size(res),
IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE);
 
if (is_ram != 

[PATCH v8 12/12] libnvdimm/pfn: Stop padding pmem namespaces to section alignment

2019-05-06 Thread Dan Williams
Now that the mm core supports section-unaligned hotplug of ZONE_DEVICE
memory, we no longer need to add padding at pfn/dax device creation
time. The kernel will still honor padding established by older kernels.

Reported-by: Jeff Moyer 
Signed-off-by: Dan Williams 
---
 drivers/nvdimm/pfn.h      | 14 --------------
 drivers/nvdimm/pfn_devs.c | 77 +++++++++++++----------------------------------------------------------------
 include/linux/mmzone.h    |  3 +++
 3 files changed, 16 insertions(+), 78 deletions(-)

diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
index e901e3a3b04c..cc042a98758f 100644
--- a/drivers/nvdimm/pfn.h
+++ b/drivers/nvdimm/pfn.h
@@ -41,18 +41,4 @@ struct nd_pfn_sb {
__le64 checksum;
 };
 
-#ifdef CONFIG_SPARSEMEM
-#define PFN_SECTION_ALIGN_DOWN(x) SECTION_ALIGN_DOWN(x)
-#define PFN_SECTION_ALIGN_UP(x) SECTION_ALIGN_UP(x)
-#else
-/*
- * In this case ZONE_DEVICE=n and we will disable 'pfn' device support,
- * but we still want pmem to compile.
- */
-#define PFN_SECTION_ALIGN_DOWN(x) (x)
-#define PFN_SECTION_ALIGN_UP(x) (x)
-#endif
-
-#define PHYS_SECTION_ALIGN_DOWN(x) PFN_PHYS(PFN_SECTION_ALIGN_DOWN(PHYS_PFN(x)))
-#define PHYS_SECTION_ALIGN_UP(x) PFN_PHYS(PFN_SECTION_ALIGN_UP(PHYS_PFN(x)))
 #endif /* __NVDIMM_PFN_H */
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index a2406253eb70..7f54374b082f 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -595,14 +595,14 @@ static u32 info_block_reserve(void)
 }
 
 /*
- * We hotplug memory at section granularity, pad the reserved area from
- * the previous section base to the namespace base address.
+ * We hotplug memory at sub-section granularity, pad the reserved area
+ * from the previous section base to the namespace base address.
  */
 static unsigned long init_altmap_base(resource_size_t base)
 {
unsigned long base_pfn = PHYS_PFN(base);
 
-   return PFN_SECTION_ALIGN_DOWN(base_pfn);
+   return SUBSECTION_ALIGN_DOWN(base_pfn);
 }
 
 static unsigned long init_altmap_reserve(resource_size_t base)
@@ -610,7 +610,7 @@ static unsigned long init_altmap_reserve(resource_size_t base)
unsigned long reserve = info_block_reserve() >> PAGE_SHIFT;
unsigned long base_pfn = PHYS_PFN(base);
 
-   reserve += base_pfn - PFN_SECTION_ALIGN_DOWN(base_pfn);
+   reserve += base_pfn - SUBSECTION_ALIGN_DOWN(base_pfn);
return reserve;
 }
 
@@ -641,8 +641,7 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
nd_pfn->npfns = le64_to_cpu(pfn_sb->npfns);
pgmap->altmap_valid = false;
} else if (nd_pfn->mode == PFN_MODE_PMEM) {
-   nd_pfn->npfns = PFN_SECTION_ALIGN_UP((resource_size(res)
-   - offset) / PAGE_SIZE);
+   nd_pfn->npfns = PHYS_PFN((resource_size(res) - offset));
if (le64_to_cpu(nd_pfn->pfn_sb->npfns) > nd_pfn->npfns)
dev_info(&nd_pfn->dev,
"number of pfns truncated from %lld to %ld\n",
@@ -658,54 +657,14 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
return 0;
 }
 
-static u64 phys_pmem_align_down(struct nd_pfn *nd_pfn, u64 phys)
-{
-   return min_t(u64, PHYS_SECTION_ALIGN_DOWN(phys),
-   ALIGN_DOWN(phys, nd_pfn->align));
-}
-
-/*
- * Check if pmem collides with 'System RAM', or other regions when
- * section aligned.  Trim it accordingly.
- */
-static void trim_pfn_device(struct nd_pfn *nd_pfn, u32 *start_pad, u32 *end_trunc)
-{
-   struct nd_namespace_common *ndns = nd_pfn->ndns;
-   struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
-   struct nd_region *nd_region = to_nd_region(nd_pfn->dev.parent);
-   const resource_size_t start = nsio->res.start;
-   const resource_size_t end = start + resource_size(&nsio->res);
-   resource_size_t adjust, size;
-
-   *start_pad = 0;
-   *end_trunc = 0;
-
-   adjust = start - PHYS_SECTION_ALIGN_DOWN(start);
-   size = resource_size(&nsio->res) + adjust;
-   if (region_intersects(start - adjust, size, IORESOURCE_SYSTEM_RAM,
-   IORES_DESC_NONE) == REGION_MIXED
-   || nd_region_conflict(nd_region, start - adjust, size))
-   *start_pad = PHYS_SECTION_ALIGN_UP(start) - start;
-
-   /* Now check that end of the range does not collide. */
-   adjust = PHYS_SECTION_ALIGN_UP(end) - end;
-   size = resource_size(&nsio->res) + adjust;
-   if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
-   IORES_DESC_NONE) == REGION_MIXED
-   || !IS_ALIGNED(end, nd_pfn->align)
-   || nd_region_conflict(nd_region, start, size))
-   *end_trunc = end - phys_pmem_align_down(nd_pfn, end);
-}
-
 static int nd_pfn_init(struct nd_pfn *nd_pfn)
 {
struct nd_namespace_common *ndns = nd_pfn->ndns;

[PATCH v8 11/12] libnvdimm/pfn: Fix fsdax-mode namespace info-block zero-fields

2019-05-06 Thread Dan Williams
At namespace creation time there is the potential for the "expected to
be zero" fields of a 'pfn' info-block to be filled with indeterminate
data. While the kernel buffer is zeroed on allocation it is immediately
overwritten by nd_pfn_validate() filling it with the current contents of
the on-media info-block location. For fields like, 'flags' and the
'padding' it potentially means that future implementations can not rely
on those fields being zero.

In preparation to stop using the 'start_pad' and 'end_trunc' fields for
section alignment, arrange for fields that are not explicitly
initialized to be guaranteed zero. Bump the minor version to indicate it
is safe to assume the 'padding' and 'flags' are zero. Otherwise, this
corruption is expected to be benign since all other critical fields are
explicitly initialized.
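
Condensed, the init path after this change behaves as follows (a sketch
of the flow, not the literal diff):

	pfn_sb = devm_kmalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
	nd_pfn->pfn_sb = pfn_sb;
	rc = nd_pfn_validate(nd_pfn, sig);  /* overwrites pfn_sb from media */
	if (rc != -ENODEV)
		return rc;	/* found an info block, or hard error */

	/* no info block: zero the buffer before field-by-field init */
	memset(pfn_sb, 0, sizeof(*pfn_sb));
	...
	pfn_sb->version_minor = cpu_to_le16(3); /* padding/flags now known-zero */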

Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem")
Cc: 
Signed-off-by: Dan Williams 
---
 drivers/nvdimm/dax_devs.c |  2 +-
 drivers/nvdimm/pfn.h      |  1 +
 drivers/nvdimm/pfn_devs.c | 18 +++++++++++++++---
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/dax_devs.c b/drivers/nvdimm/dax_devs.c
index 0453f49dc708..326f02ffca81 100644
--- a/drivers/nvdimm/dax_devs.c
+++ b/drivers/nvdimm/dax_devs.c
@@ -126,7 +126,7 @@ int nd_dax_probe(struct device *dev, struct nd_namespace_common *ndns)
nvdimm_bus_unlock(&ndns->dev);
if (!dax_dev)
return -ENOMEM;
-   pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+   pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
nd_pfn->pfn_sb = pfn_sb;
rc = nd_pfn_validate(nd_pfn, DAX_SIG);
dev_dbg(dev, "dax: %s\n", rc == 0 ? dev_name(dax_dev) : "");
diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
index dde9853453d3..e901e3a3b04c 100644
--- a/drivers/nvdimm/pfn.h
+++ b/drivers/nvdimm/pfn.h
@@ -36,6 +36,7 @@ struct nd_pfn_sb {
__le32 end_trunc;
/* minor-version-2 record the base alignment of the mapping */
__le32 align;
+   /* minor-version-3 guarantee the padding and flags are zero */
u8 padding[4000];
__le64 checksum;
 };
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 01f40672507f..a2406253eb70 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -420,6 +420,15 @@ static int nd_pfn_clear_memmap_errors(struct nd_pfn *nd_pfn)
return 0;
 }
 
+/**
+ * nd_pfn_validate - read and validate info-block
+ * @nd_pfn: fsdax namespace runtime state / properties
+ * @sig: 'devdax' or 'fsdax' signature
+ *
+ * Upon return the info-block buffer contents (->pfn_sb) are
+ * indeterminate when validation fails, and a coherent info-block
+ * otherwise.
+ */
 int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 {
u64 checksum, offset;
@@ -565,7 +574,7 @@ int nd_pfn_probe(struct device *dev, struct nd_namespace_common *ndns)
nvdimm_bus_unlock(&ndns->dev);
if (!pfn_dev)
return -ENOMEM;
-   pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+   pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
nd_pfn = to_nd_pfn(pfn_dev);
nd_pfn->pfn_sb = pfn_sb;
rc = nd_pfn_validate(nd_pfn, PFN_SIG);
@@ -702,7 +711,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
u64 checksum;
int rc;
 
-   pfn_sb = devm_kzalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
+   pfn_sb = devm_kmalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
if (!pfn_sb)
return -ENOMEM;
 
@@ -711,11 +720,14 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
sig = DAX_SIG;
else
sig = PFN_SIG;
+
rc = nd_pfn_validate(nd_pfn, sig);
if (rc != -ENODEV)
return rc;
 
/* no info block, do init */;
+   memset(pfn_sb, 0, sizeof(*pfn_sb));
+
nd_region = to_nd_region(nd_pfn->dev.parent);
if (nd_region->ro) {
dev_info(_pfn->dev,
@@ -768,7 +780,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
memcpy(pfn_sb->uuid, nd_pfn->uuid, 16);
memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
pfn_sb->version_major = cpu_to_le16(1);
-   pfn_sb->version_minor = cpu_to_le16(2);
+   pfn_sb->version_minor = cpu_to_le16(3);
pfn_sb->start_pad = cpu_to_le32(start_pad);
pfn_sb->end_trunc = cpu_to_le32(end_trunc);
pfn_sb->align = cpu_to_le32(nd_pfn->align);



[PATCH v8 07/12] mm: Kill is_dev_zone() helper

2019-05-06 Thread Dan Williams
Given there are no more usages of is_dev_zone() outside of 'ifdef
CONFIG_ZONE_DEVICE' protection, kill off the compilation helper.

Cc: Michal Hocko 
Cc: Logan Gunthorpe 
Acked-by: David Hildenbrand 
Reviewed-by: Oscar Salvador 
Reviewed-by: Pavel Tatashin 
Signed-off-by: Dan Williams 
---
 include/linux/mmzone.h | 12 ------------
 mm/page_alloc.c        |  2 +-
 2 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6dd52d544857..49e7fb452dfd 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -855,18 +855,6 @@ static inline int local_memory_node(int node_id) { return node_id; };
  */
 #define zone_idx(zone) ((zone) - (zone)->zone_pgdat->node_zones)
 
-#ifdef CONFIG_ZONE_DEVICE
-static inline bool is_dev_zone(const struct zone *zone)
-{
-   return zone_idx(zone) == ZONE_DEVICE;
-}
-#else
-static inline bool is_dev_zone(const struct zone *zone)
-{
-   return false;
-}
-#endif
-
 /*
  * Returns true if a zone has pages managed by the buddy allocator.
  * All the reclaim decisions have to use this function rather than
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 13816c5a51eb..2a5c5cbfb5fc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5864,7 +5864,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
unsigned long start = jiffies;
int nid = pgdat->node_id;
 
-   if (WARN_ON_ONCE(!pgmap || !is_dev_zone(zone)))
+   if (WARN_ON_ONCE(!pgmap || zone_idx(zone) != ZONE_DEVICE))
return;
 
/*



[PATCH v8 03/12] mm/sparsemem: Add helpers track active portions of a section at boot

2019-05-06 Thread Dan Williams
Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
sub-section active bitmask, each bit representing a PMD_SIZE span of the
architecture's memory hotplug section size.

The implication of a partially populated section is that pfn_valid()
needs to go beyond a valid_section() check and read the sub-section
active ranges from the bitmask. The expectation is that the bitmask
(subsection_map) fits in the same cacheline as the valid_section() data,
so the incremental performance overhead to pfn_valid() should be
negligible.
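
On x86_64 the arithmetic works out as follows (a sketch using values
implied above: 4K pages, 128MB sections, 2MB sub-sections; the constant
names mirror the patch):

	#define PAGES_PER_SECTION	(1UL << (27 - 12))	/* 128M / 4K = 32768 */
	#define PAGES_PER_SUBSECTION	(1UL << (21 - 12))	/* 2M / 4K = 512 */
	#define SUBSECTIONS_PER_SECTION	64			/* 32768 / 512 */
	#define PAGE_SECTION_MASK	(~(PAGES_PER_SECTION - 1))

	/* which of the 64 subsection bits covers this pfn */
	static inline int subsection_map_index(unsigned long pfn)
	{
		return (pfn & ~PAGE_SECTION_MASK) / PAGES_PER_SUBSECTION;
	}

	/* e.g. pfn 0x8200 sits 0x200 pages into its section -> bit 1;
	 * the 64-bit map fits in one unsigned long, hence one cacheline */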

Cc: Michal Hocko 
Cc: Vlastimil Babka 
Cc: Logan Gunthorpe 
Cc: Oscar Salvador 
Cc: Pavel Tatashin 
Tested-by: Jane Chu 
Signed-off-by: Dan Williams 
---
 include/linux/mmzone.h | 29 ++++++++++++++++++++++++++++-
 mm/page_alloc.c        |  4 +++-
 mm/sparse.c            | 29 +++++++++++++++++++++++++++++
 3 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ac163f2f274f..6dd52d544857 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1199,6 +1199,8 @@ struct mem_section_usage {
unsigned long pageblock_flags[0];
 };
 
+void subsection_map_init(unsigned long pfn, unsigned long nr_pages);
+
 struct page;
 struct page_ext;
 struct mem_section {
@@ -1336,12 +1338,36 @@ static inline struct mem_section *__pfn_to_section(unsigned long pfn)
 
 extern int __highest_present_section_nr;
 
+static inline int subsection_map_index(unsigned long pfn)
+{
+   return (pfn & ~(PAGE_SECTION_MASK)) / PAGES_PER_SUBSECTION;
+}
+
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
+{
+   int idx = subsection_map_index(pfn);
+
+   return test_bit(idx, ms->usage->subsection_map);
+}
+#else
+static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
+{
+   return 1;
+}
+#endif
+
 #ifndef CONFIG_HAVE_ARCH_PFN_VALID
 static inline int pfn_valid(unsigned long pfn)
 {
+   struct mem_section *ms;
+
if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
return 0;
-   return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
+   ms = __nr_to_section(pfn_to_section_nr(pfn));
+   if (!valid_section(ms))
+   return 0;
+   return pfn_section_valid(ms, pfn);
 }
 #endif
 
@@ -1373,6 +1399,7 @@ void sparse_init(void);
 #define sparse_init()  do {} while (0)
 #define sparse_index_init(_sec, _nid)  do {} while (0)
 #define pfn_present pfn_valid
+#define subsection_map_init(_pfn, _nr_pages) do {} while (0)
 #endif /* CONFIG_SPARSEMEM */
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 61c2b54a5b61..13816c5a51eb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7291,10 +7291,12 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 
/* Print out the early node map */
pr_info("Early memory node ranges\n");
-   for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
+   for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
pr_info("  node %3d: [mem %#018Lx-%#018Lx]\n", nid,
(u64)start_pfn << PAGE_SHIFT,
((u64)end_pfn << PAGE_SHIFT) - 1);
+   subsection_map_init(start_pfn, end_pfn - start_pfn);
+   }
 
/* Initialise every node */
mminit_verify_pageflags_layout();
diff --git a/mm/sparse.c b/mm/sparse.c
index f87de7ad32c8..ac47a48050c7 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -210,6 +210,35 @@ static inline unsigned long first_present_section_nr(void)
return next_present_section_nr(-1);
 }
 
+void subsection_map_init(unsigned long pfn, unsigned long nr_pages)
+{
+   int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
+   int i, start_sec = pfn_to_section_nr(pfn);
+
+   if (!nr_pages)
+   return;
+
+   for (i = start_sec; i <= end_sec; i++) {
+   int idx, end;
+   unsigned long pfns;
+   struct mem_section *ms;
+
+   idx = subsection_map_index(pfn);
+   pfns = min(nr_pages, PAGES_PER_SECTION
+   - (pfn & ~PAGE_SECTION_MASK));
+   end = subsection_map_index(pfn + pfns - 1);
+
+   ms = __nr_to_section(i);
+   bitmap_set(ms->usage->subsection_map, idx, end - idx + 1);
+
+   pr_debug("%s: sec: %d pfns: %ld set(%d, %d)\n", __func__, i,
+   pfns, idx, end - idx + 1);
+
+   pfn += pfns;
+   nr_pages -= pfns;
+   }
+}
+
 /* Record a memory area against a node. */
 void __init memory_present(int nid, unsigned long start, unsigned long end)
 {



[PATCH v8 00/12] mm: Sub-section memory hotplug support

2019-05-06 Thread Dan Williams
Changes since v7 [1]:

- Make subsection helpers pfn based rather than physical-address based
  (Oscar and Pavel)

- Make subsection bitmap definition scalable for different section and
  sub-section sizes across architectures. As a result:

  unsigned long map_active

  ...is converted to:

  DECLARE_BITMAP(subsection_map, SUBSECTIONS_PER_SECTION)

  ...and the helpers are renamed with a 'subsection' prefix. (Pavel)

- New in this version is a touch of arch/powerpc/include/asm/sparsemem.h
  in "[PATCH v8 01/12] mm/sparsemem: Introduce struct mem_section_usage"
  to define ARCH_SUBSECTION_SHIFT.

- Drop "mm/sparsemem: Introduce common definitions for the size and mask
  of a section" in favor of Robin's "mm/memremap: Rename and consolidate
  SECTION_SIZE" (Pavel)

- Collect some more Reviewed-by tags. Patches that still lack review
  tags: 1, 3, 9 - 12

[1]: 
https://lore.kernel.org/lkml/155677652226.2336373.8700273400832001094.st...@dwillia2-desk3.amr.corp.intel.com/

---
[merge logistics]

Hi Andrew,

These are too late for v5.2; I'm posting this v8 during the merge window
to maintain the review momentum.

---
[cover letter]

The memory hotplug section is an arbitrary / convenient unit for memory
hotplug. 'Section-size' units have bled into the user interface
('memblock' sysfs) and can not be changed without breaking existing
userspace. The section-size constraint, while mostly benign for typical
memory hotplug, has and continues to wreak havoc with 'device-memory'
use cases, persistent memory (pmem) in particular. Recall that pmem uses
devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a
'struct page' memmap for pmem. However, it does not use the 'bottom
half' of memory hotplug, i.e. never marks pmem pages online and never
exposes the userspace memblock interface for pmem. This leaves an
opening to redress the section-size constraint.

To date, the libnvdimm subsystem has attempted to inject padding to
satisfy the internal constraints of arch_add_memory(). Beyond
complicating the code, leading to bugs [2], wasting memory, and limiting
configuration flexibility, the padding hack is broken when the platform
changes this physical memory alignment of pmem from one boot to the
next. Device failure (intermittent or permanent) and physical
reconfiguration are events that can cause the platform firmware to
change the physical placement of pmem on a subsequent boot, and device
failure is an everyday event in a data-center.

It turns out that sections are only a hard requirement of the
user-facing interface for memory hotplug and with a bit more
infrastructure sub-section arch_add_memory() support can be added for
kernel internal usages like devm_memremap_pages(). Here is an analysis
of the current design assumptions in the current code and how they are
addressed in the new implementation:

Current design assumptions:

- Sections that describe boot memory (early sections) are never
  unplugged / removed.

- pfn_valid(), in the CONFIG_SPARSEMEM_VMEMMAP=y, case devolves to a
  valid_section() check

- __add_pages() and helper routines assume all operations occur in
  PAGES_PER_SECTION units.

- The memblock sysfs interface only comprehends full sections

New design assumptions:

- Sections are instrumented with a sub-section bitmask to track (on x86)
  individual 2MB sub-divisions of a 128MB section.

- Partially populated early sections can be extended with additional
  sub-sections, and those sub-sections can be removed with
  arch_remove_memory(). With this in place we no longer lose usable memory
  capacity to padding.

- pfn_valid() is updated to look deeper than valid_section() to also check the
  active-sub-section mask. This indication is in the same cacheline as
  the valid_section() so the performance impact is expected to be
  negligible. So far the lkp robot has not reported any regressions.

- Outside of the core vmemmap population routines which are replaced,
  other helper routines like shrink_{zone,pgdat}_span() are updated to
  handle the smaller granularity. Core memory hotplug routines that deal
  with online memory are not touched.

- The existing memblock sysfs user api guarantees / assumptions are
  not touched since this capability is limited to !online
  !memblock-sysfs-accessible sections.

Meanwhile the issue reports continue to roll in from users that do not
understand when and how the 128MB constraint will bite them. The current
implementation relied on being able to support at least one misaligned
namespace, but that immediately falls over on any moderately complex
namespace creation attempt. Beyond the initial problem of 'System RAM'
colliding with pmem, and the unsolvable problem of physical alignment
changes, Linux is now being exposed to platforms that collide pmem
ranges with other pmem ranges by default [3]. In short,
devm_memremap_pages() has pushed the venerable section-size constraint
past the breaking point, and the simplicity of section-aligned
arch_add_memory() is no longer tenable.

[PATCH v8 06/12] mm/hotplug: Kill is_dev_zone() usage in __remove_pages()

2019-05-06 Thread Dan Williams
The zone type check was a leftover from the cleanup that plumbed altmap
through the memory hotplug path, i.e. commit da024512a1fa "mm: pass the
vmem_altmap to arch_remove_memory and __remove_pages".

Cc: Michal Hocko 
Cc: Logan Gunthorpe 
Cc: Pavel Tatashin 
Reviewed-by: David Hildenbrand 
Reviewed-by: Oscar Salvador 
Signed-off-by: Dan Williams 
---
 mm/memory_hotplug.c |7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 393ab2b9c3f7..cb9e68729ea3 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -544,11 +544,8 @@ void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
unsigned long map_offset = 0;
int sections_to_remove;
 
-   /* In the ZONE_DEVICE case device driver owns the memory region */
-   if (is_dev_zone(zone)) {
-   if (altmap)
-   map_offset = vmem_altmap_offset(altmap);
-   }
+   if (altmap)
+   map_offset = vmem_altmap_offset(altmap);
 
clear_zone_contiguous(zone);
 

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v8 08/12] mm/sparsemem: Prepare for sub-section ranges

2019-05-06 Thread Dan Williams
Prepare the memory hot-{add,remove} paths for handling sub-section
ranges by plumbing the starting page frame and number of pages being
handled through arch_{add,remove}_memory() to
sparse_{add,remove}_one_section().

This is simply plumbing, small cleanups, and some identifier renames. No
intended functional changes.
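
The subsection_check() helper added below centralizes the policy that
sub-section ranges are only legal on the !memblock,
CONFIG_SPARSEMEM_VMEMMAP=y path. A userspace model of the accept/reject
logic (vmemmap=y assumed; the pfn values are hypothetical):

    #include <stdbool.h>
    #include <stdio.h>

    #define PAGES_PER_SECTION 32768UL  /* 128MB of 4KB pages on x86_64 */
    #define PAGE_SECTION_MASK (~(PAGES_PER_SECTION - 1))
    #define MHP_MEMBLOCK_API  (1UL << 0)

    /* model of subsection_check(), assuming CONFIG_SPARSEMEM_VMEMMAP=y */
    static bool subsection_ok(unsigned long pfn, unsigned long nr_pages,
                              unsigned long flags)
    {
        bool misaligned = (pfn & ~PAGE_SECTION_MASK) ||
                          (nr_pages & ~PAGE_SECTION_MASK);

        /* memblock-registered memory must stay section aligned */
        return !(misaligned && (flags & MHP_MEMBLOCK_API));
    }

    int main(void)
    {
        /* a 2MB-aligned range is fine for the !memblock (pmem) path... */
        printf("%d\n", subsection_ok(512, 512, 0));                /* 1 */
        /* ...but is rejected when routed through the memblock api */
        printf("%d\n", subsection_ok(512, 512, MHP_MEMBLOCK_API)); /* 0 */
        return 0;
    }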

Cc: Michal Hocko 
Cc: Vlastimil Babka 
Cc: Logan Gunthorpe 
Cc: Oscar Salvador 
Reviewed-by: Pavel Tatashin 
Signed-off-by: Dan Williams 
---
 include/linux/memory_hotplug.h |7 +-
 mm/memory_hotplug.c|  118 +---
 mm/sparse.c|7 +-
 3 files changed, 83 insertions(+), 49 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index ae892eef8b82..835a94650ee3 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -354,9 +354,10 @@ extern int add_memory_resource(int nid, struct resource *resource);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages, struct vmem_altmap *altmap);
 extern bool is_memblock_offlined(struct memory_block *mem);
-extern int sparse_add_one_section(int nid, unsigned long start_pfn,
- struct vmem_altmap *altmap);
-extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
+extern int sparse_add_section(int nid, unsigned long pfn,
+   unsigned long nr_pages, struct vmem_altmap *altmap);
+extern void sparse_remove_section(struct zone *zone, struct mem_section *ms,
+   unsigned long pfn, unsigned long nr_pages,
unsigned long map_offset, struct vmem_altmap *altmap);
 extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
  unsigned long pnum);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index cb9e68729ea3..41b544f63816 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -251,22 +251,44 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
 }
 #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
 
-static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
-   struct vmem_altmap *altmap, bool want_memblock)
+static int __meminit __add_section(int nid, unsigned long pfn,
+   unsigned long nr_pages, struct vmem_altmap *altmap,
+   bool want_memblock)
 {
int ret;
 
-   if (pfn_valid(phys_start_pfn))
+   if (pfn_valid(pfn))
return -EEXIST;
 
-   ret = sparse_add_one_section(nid, phys_start_pfn, altmap);
+   ret = sparse_add_section(nid, pfn, nr_pages, altmap);
if (ret < 0)
return ret;
 
if (!want_memblock)
return 0;
 
-   return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
+   return hotplug_memory_register(nid, __pfn_to_section(pfn));
+}
+
+static int subsection_check(unsigned long pfn, unsigned long nr_pages,
+   unsigned long flags, const char *reason)
+{
+   /*
+* Only allow partial section hotplug for !memblock ranges,
+* since register_new_memory() requires section alignment, and
+* CONFIG_SPARSEMEM_VMEMMAP=n requires sections to be fully
+* populated.
+*/
+   if ((!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP)
+   || (flags & MHP_MEMBLOCK_API))
+   && ((pfn & ~PAGE_SECTION_MASK)
+   || (nr_pages & ~PAGE_SECTION_MASK))) {
+   WARN(1, "Sub-section hot-%s incompatible with %s\n", reason,
+   (flags & MHP_MEMBLOCK_API)
+   ? "memblock api" : "!CONFIG_SPARSEMEM_VMEMMAP");
+   return -EINVAL;
+   }
+   return 0;
 }
 
 /*
@@ -275,34 +297,40 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
  * call this function after deciding the zone to which to
  * add the new pages.
  */
-int __ref __add_pages(int nid, unsigned long phys_start_pfn,
-   unsigned long nr_pages, struct mhp_restrictions *restrictions)
+int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
+   struct mhp_restrictions *restrictions)
 {
unsigned long i;
-   int err = 0;
-   int start_sec, end_sec;
+   int start_sec, end_sec, err;
struct vmem_altmap *altmap = restrictions->altmap;
 
-   /* during initialize mem_map, align hot-added range to section */
-   start_sec = pfn_to_section_nr(phys_start_pfn);
-   end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
-
if (altmap) {
/*
 * Validate altmap is within bounds of the total request
 */
-   if (altmap->base_pfn != phys_start_pfn
+   if (altmap->base_pfn != pfn
|| 

[PATCH v8 01/12] mm/sparsemem: Introduce struct mem_section_usage

2019-05-06 Thread Dan Williams
Towards enabling memory hotplug to track partial population of a
section, introduce 'struct mem_section_usage'.

A pointer to a 'struct mem_section_usage' instance replaces the existing
pointer to a 'pageblock_flags' bitmap. Effectively it adds one more
'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to
house a new 'subsection_map' bitmap.  The new bitmap enables the memory
hot{plug,remove} implementation to act on incremental sub-divisions of a
section.

The default SUBSECTION_SHIFT is chosen to keep the 'subsection_map' no
larger than a single 'unsigned long' on the major architectures.
Alternatively, an architecture can define ARCH_SUBSECTION_SHIFT to
override the default PMD_SHIFT; PowerPC needs ARCH_SUBSECTION_SHIFT
to work around PMD_SHIFT being a non-constant expression there.

The primary motivation for this functionality is to support platforms
that mix "System RAM" and "Persistent Memory" within a single section,
or multiple PMEM ranges with different mapping lifetimes within a single
section. The section restriction for hotplug has caused an ongoing saga
of hacks and bugs for devm_memremap_pages() users.

Beyond the fixups to teach existing paths how to retrieve the 'usemap'
from a section, and updates to the usemap allocation path, there are no
expected behavior changes.
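
For illustration, here is a minimal userspace model of the new layout
and of how a pfn range marks its sub-sections present. The constants
assume x86_64, and subsection_index()/subsection_mask_set() are
simplified stand-ins for the kernel helpers, not the patch's exact code:

    #include <stdio.h>

    #define PAGE_SHIFT           12
    #define SECTION_SIZE_BITS    27
    #define SUBSECTION_SHIFT     21
    #define PAGES_PER_SECTION    (1UL << (SECTION_SIZE_BITS - PAGE_SHIFT))
    #define PAGES_PER_SUBSECTION (1UL << (SUBSECTION_SHIFT - PAGE_SHIFT))

    struct mem_section_usage {
        unsigned long subsection_map[1];  /* 64 bits: one long on x86_64 */
        unsigned long pageblock_flags[];  /* usemap still trails the struct */
    };

    /* bit index of the sub-section covering @pfn within its section */
    static unsigned long subsection_index(unsigned long pfn)
    {
        return (pfn & (PAGES_PER_SECTION - 1)) / PAGES_PER_SUBSECTION;
    }

    /* mark the sub-sections backing [pfn, pfn + nr_pages) as present */
    static void subsection_mask_set(unsigned long *map, unsigned long pfn,
                                    unsigned long nr_pages)
    {
        unsigned long idx = subsection_index(pfn);
        unsigned long end = subsection_index(pfn + nr_pages - 1);

        while (idx <= end)
            *map |= 1UL << idx++;
    }

    int main(void)
    {
        struct mem_section_usage usage = { { 0 } };

        /* e.g. 64MB of pmem filling the back half of a 128MB section */
        subsection_mask_set(usage.subsection_map, PAGES_PER_SECTION / 2,
                            PAGES_PER_SECTION / 2);
        printf("subsection_map: %016lx\n", usage.subsection_map[0]);
        return 0;
    }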

Cc: Michal Hocko 
Cc: Vlastimil Babka 
Cc: Logan Gunthorpe 
Cc: Oscar Salvador 
Cc: Pavel Tatashin 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Signed-off-by: Dan Williams 
---
 arch/powerpc/include/asm/sparsemem.h |3 +
 include/linux/mmzone.h   |   48 +++-
 mm/memory_hotplug.c  |   18 
 mm/page_alloc.c  |2 -
 mm/sparse.c  |   81 +-
 5 files changed, 99 insertions(+), 53 deletions(-)

diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
index 3192d454a733..1aa3c9303bf8 100644
--- a/arch/powerpc/include/asm/sparsemem.h
+++ b/arch/powerpc/include/asm/sparsemem.h
@@ -10,6 +10,9 @@
  */
 #define SECTION_SIZE_BITS   24
 
+/* Reflect the largest possible PMD-size as the subsection-size constant */
+#define ARCH_SUBSECTION_SHIFT 24
+
 #endif /* CONFIG_SPARSEMEM */
 
 #ifdef CONFIG_MEMORY_HOTPLUG
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 70394cabaf4e..ef8d878079f9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1160,6 +1160,44 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
 #define SECTION_ALIGN_UP(pfn)  (((pfn) + PAGES_PER_SECTION - 1) & PAGE_SECTION_MASK)
 #define SECTION_ALIGN_DOWN(pfn)((pfn) & PAGE_SECTION_MASK)
 
+/*
+ * SUBSECTION_SHIFT must be constant since it is used to declare
+ * subsection_map and related bitmaps without triggering the generation
+ * of variable-length arrays. The most natural size for a subsection is
+ * a PMD-page. For architectures that do not have a constant PMD-size
+ * ARCH_SUBSECTION_SHIFT can be set to a constant max size, or otherwise
+ * fallback to 2MB.
+ */
+#if defined(ARCH_SUBSECTION_SHIFT)
+#define SUBSECTION_SHIFT (ARCH_SUBSECTION_SHIFT)
+#elif defined(PMD_SHIFT)
+#define SUBSECTION_SHIFT (PMD_SHIFT)
+#else
+/*
+ * Memory hotplug enabled platforms avoid this default because they
+ * either define ARCH_SUBSECTION_SHIFT, or PMD_SHIFT is a constant, but
+ * this is kept as a backstop to allow compilation on
+ * !ARCH_ENABLE_MEMORY_HOTPLUG archs.
+ */
+#define SUBSECTION_SHIFT 21
+#endif
+
+#define PFN_SUBSECTION_SHIFT (SUBSECTION_SHIFT - PAGE_SHIFT)
+#define PAGES_PER_SUBSECTION (1UL << PFN_SUBSECTION_SHIFT)
+#define PAGE_SUBSECTION_MASK ((~(PAGES_PER_SUBSECTION-1)))
+
+#if SUBSECTION_SHIFT > SECTION_SIZE_BITS
+#error Subsection size exceeds section size
+#else
+#define SUBSECTIONS_PER_SECTION (1UL << (SECTION_SIZE_BITS - SUBSECTION_SHIFT))
+#endif
+
+struct mem_section_usage {
+   DECLARE_BITMAP(subsection_map, SUBSECTIONS_PER_SECTION);
+   /* See declaration of similar field in struct zone */
+   unsigned long pageblock_flags[0];
+};
+
 struct page;
 struct page_ext;
 struct mem_section {
@@ -1177,8 +1215,7 @@ struct mem_section {
 */
unsigned long section_mem_map;
 
-   /* See declaration of similar field in struct zone */
-   unsigned long *pageblock_flags;
+   struct mem_section_usage *usage;
 #ifdef CONFIG_PAGE_EXTENSION
/*
 * If SPARSEMEM, pgdat doesn't have page_ext pointer. We use
@@ -1209,6 +1246,11 @@ extern struct mem_section **mem_section;
 extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT];
 #endif
 
+static inline unsigned long *section_to_usemap(struct mem_section *ms)
+{
+   return ms->usage->pageblock_flags;
+}
+
 static inline struct mem_section *__nr_to_section(unsigned long nr)
 {
 #ifdef CONFIG_SPARSEMEM_EXTREME
@@ -1220,7 +1262,7 @@ static inline struct mem_section 

[PATCH v8 04/12] mm/hotplug: Prepare shrink_{zone, pgdat}_span for sub-section removal

2019-05-06 Thread Dan Williams
Sub-section hotplug support reduces the unit of operation of hotplug
from section-sized-units (PAGES_PER_SECTION) to sub-section-sized units
(PAGES_PER_SUBSECTION). Teach shrink_{zone,pgdat}_span() to consider
PAGES_PER_SUBSECTION boundaries as the points where pfn_valid(), not
valid_section(), can toggle.
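
In effect the walk now probes every 2MB instead of every 128MB (x86
constants). A toy model of the new loop shape, with a fake pfn_valid()
standing in for the kernel's subsection-aware version:

    #include <stdbool.h>
    #include <stdio.h>

    #define PAGES_PER_SUBSECTION 512UL  /* 2MB of 4KB pages on x86_64 */

    /* stand-in for pfn_valid(); pretend pfns [1024, 2048) are a hole */
    static bool pfn_valid(unsigned long pfn)
    {
        return pfn < 1024 || pfn >= 2048;
    }

    /* first backed pfn in [start, end), like find_smallest_section_pfn() */
    static unsigned long find_smallest_valid_pfn(unsigned long start,
                                                 unsigned long end)
    {
        for (; start < end; start += PAGES_PER_SUBSECTION)
            if (pfn_valid(start))
                return start;
        return end;  /* the whole range is a hole */
    }

    int main(void)
    {
        printf("first valid pfn: %lu\n",
               find_smallest_valid_pfn(1024, 4096));  /* prints 2048 */
        return 0;
    }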

Cc: Michal Hocko 
Cc: Vlastimil Babka 
Cc: Logan Gunthorpe 
Reviewed-by: Pavel Tatashin 
Reviewed-by: Oscar Salvador 
Signed-off-by: Dan Williams 
---
 mm/memory_hotplug.c |   29 -
 1 file changed, 8 insertions(+), 21 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a76fc6a6e9fe..393ab2b9c3f7 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -325,12 +325,8 @@ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
 unsigned long start_pfn,
 unsigned long end_pfn)
 {
-   struct mem_section *ms;
-
-   for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) {
-   ms = __pfn_to_section(start_pfn);
-
-   if (unlikely(!valid_section(ms)))
+   for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SUBSECTION) {
+   if (unlikely(!pfn_valid(start_pfn)))
continue;
 
if (unlikely(pfn_to_nid(start_pfn) != nid))
@@ -350,15 +346,12 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
unsigned long start_pfn,
unsigned long end_pfn)
 {
-   struct mem_section *ms;
unsigned long pfn;
 
/* pfn is the end pfn of a memory section. */
pfn = end_pfn - 1;
-   for (; pfn >= start_pfn; pfn -= PAGES_PER_SECTION) {
-   ms = __pfn_to_section(pfn);
-
-   if (unlikely(!valid_section(ms)))
+   for (; pfn >= start_pfn; pfn -= PAGES_PER_SUBSECTION) {
+   if (unlikely(!pfn_valid(pfn)))
continue;
 
if (unlikely(pfn_to_nid(pfn) != nid))
@@ -380,7 +373,6 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
unsigned long z = zone_end_pfn(zone); /* zone_end_pfn namespace clash */
unsigned long zone_end_pfn = z;
unsigned long pfn;
-   struct mem_section *ms;
int nid = zone_to_nid(zone);
 
zone_span_writelock(zone);
@@ -417,10 +409,8 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 * it check the zone has only hole or not.
 */
pfn = zone_start_pfn;
-   for (; pfn < zone_end_pfn; pfn += PAGES_PER_SECTION) {
-   ms = __pfn_to_section(pfn);
-
-   if (unlikely(!valid_section(ms)))
+   for (; pfn < zone_end_pfn; pfn += PAGES_PER_SUBSECTION) {
+   if (unlikely(!pfn_valid(pfn)))
continue;
 
if (page_zone(pfn_to_page(pfn)) != zone)
@@ -448,7 +438,6 @@ static void shrink_pgdat_span(struct pglist_data *pgdat,
unsigned long p = pgdat_end_pfn(pgdat); /* pgdat_end_pfn namespace clash */
unsigned long pgdat_end_pfn = p;
unsigned long pfn;
-   struct mem_section *ms;
int nid = pgdat->node_id;
 
if (pgdat_start_pfn == start_pfn) {
@@ -485,10 +474,8 @@ static void shrink_pgdat_span(struct pglist_data *pgdat,
 * has only hole or not.
 */
pfn = pgdat_start_pfn;
-   for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SECTION) {
-   ms = __pfn_to_section(pfn);
-
-   if (unlikely(!valid_section(ms)))
+   for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SUBSECTION) {
+   if (unlikely(!pfn_valid(pfn)))
continue;
 
if (pfn_to_nid(pfn) != nid)

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v8 02/12] mm/memremap: Rename and consolidate SECTION_SIZE

2019-05-06 Thread Dan Williams
From: Robin Murphy 

Trying to activate ZONE_DEVICE for arm64 reveals that memremap's
internal helpers for sparsemem sections conflict with arm64's
definitions for hugepages, which inherit the name of "sections" from
earlier versions of the ARM architecture.

Disambiguate memremap (and now HMM too) by propagating sparsemem's PA_
prefix, to clarify that these values are in terms of addresses rather
than PFNs (and because it's a heck of a lot easier than changing all the
arch code). SECTION_MASK is unused, so it can just go.

[anshuman: Consolidated mm/hmm.c instance and updated the commit message]
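
For reference, the PA_ prefix makes the units explicit: PA_SECTION_SIZE
is in bytes of physical address space, while PFN_SECTION_SHIFT deals in
pages. A small sketch of the alignment math devm_memremap_pages() does
with it (hypothetical resource values; x86_64 section size assumed):

    #include <stdio.h>

    #define PA_SECTION_SHIFT 27  /* SECTION_SIZE_BITS on x86_64 */
    #define PA_SECTION_SIZE  (1UL << PA_SECTION_SHIFT)
    #define ALIGN(x, a)      (((x) + (a) - 1) & ~((a) - 1))

    int main(void)
    {
        unsigned long start = 0x148200000UL;  /* hypothetical pmem resource */
        unsigned long size  = 0x4000000UL;    /* 64MB */

        unsigned long align_start = start & ~(PA_SECTION_SIZE - 1);
        unsigned long align_size  =
            ALIGN(start + size, PA_SECTION_SIZE) - align_start;

        /* the full 128MB section around the resource gets mapped */
        printf("mapped [%#lx, %#lx)\n", align_start,
               align_start + align_size);
        return 0;
    }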

Acked-by: Michal Hocko 
Reviewed-by: David Hildenbrand 
Cc: Oscar Salvador 
Cc: Pavel Tatashin 
Signed-off-by: Robin Murphy 
Signed-off-by: Anshuman Khandual 
Signed-off-by: Dan Williams 
---
 include/linux/mmzone.h |1 +
 kernel/memremap.c  |   10 --
 mm/hmm.c   |2 --
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ef8d878079f9..ac163f2f274f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1134,6 +1134,7 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
  * PFN_SECTION_SHIFT   pfn to/from section number
  */
 #define PA_SECTION_SHIFT   (SECTION_SIZE_BITS)
+#define PA_SECTION_SIZE(1UL << PA_SECTION_SHIFT)
 #define PFN_SECTION_SHIFT  (SECTION_SIZE_BITS - PAGE_SHIFT)
 
 #define NR_MEM_SECTIONS(1UL << SECTIONS_SHIFT)
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 4e59d29245f4..f355586ea54a 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -14,8 +14,6 @@
 #include 
 
 static DEFINE_XARRAY(pgmap_array);
-#define SECTION_MASK ~((1UL << PA_SECTION_SHIFT) - 1)
-#define SECTION_SIZE (1UL << PA_SECTION_SHIFT)
 
 #if IS_ENABLED(CONFIG_DEVICE_PRIVATE)
 vm_fault_t device_private_entry_fault(struct vm_area_struct *vma,
@@ -98,8 +96,8 @@ static void devm_memremap_pages_release(void *data)
put_page(pfn_to_page(pfn));
 
/* pages are dead and unused, undo the arch mapping */
-   align_start = res->start & ~(SECTION_SIZE - 1);
-   align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
+   align_start = res->start & ~(PA_SECTION_SIZE - 1);
+   align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE)
- align_start;
 
nid = page_to_nid(pfn_to_page(align_start >> PAGE_SHIFT));
@@ -160,8 +158,8 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
if (!pgmap->ref || !pgmap->kill)
return ERR_PTR(-EINVAL);
 
-   align_start = res->start & ~(SECTION_SIZE - 1);
-   align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
+   align_start = res->start & ~(PA_SECTION_SIZE - 1);
+   align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE)
- align_start;
align_end = align_start + align_size - 1;
 
diff --git a/mm/hmm.c b/mm/hmm.c
index 0db8491090b8..a7e7f8e33c5f 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -34,8 +34,6 @@
 #include 
 #include 
 
-#define PA_SECTION_SIZE (1UL << PA_SECTION_SHIFT)
-
 #if IS_ENABLED(CONFIG_HMM_MIRROR)
 static const struct mmu_notifier_ops hmm_mmu_notifier_ops;
 

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v8 05/12] mm/sparsemem: Convert kmalloc_section_memmap() to populate_section_memmap()

2019-05-06 Thread Dan Williams
Allow sub-section sized ranges to be added to the memmap.
populate_section_memmap() takes an explicit pfn range rather than
assuming a full section, and those parameters are plumbed all the way
through to vmemmap_populate(). There should be no sub-section usage in
current deployments. New warnings are added to clarify which memmap
allocation paths are sub-section capable.
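
A small model of the rounding that __populate_section_memmap() applies
to a requested pfn range (x86 constants; the pfn values are
hypothetical):

    #include <stdio.h>

    #define PAGES_PER_SUBSECTION 512UL  /* 2MB of 4KB pages on x86_64 */
    #define PAGE_SUBSECTION_MASK (~(PAGES_PER_SUBSECTION - 1))
    #define ALIGN(x, a)          (((x) + (a) - 1) & ~((a) - 1))

    int main(void)
    {
        /* a request that straddles sub-section boundaries */
        unsigned long pfn = 700, nr_pages = 100, end;

        /* round out to whole sub-sections before populating the memmap */
        end = ALIGN(pfn + nr_pages, PAGES_PER_SUBSECTION);
        pfn &= PAGE_SUBSECTION_MASK;
        nr_pages = end - pfn;

        printf("populate memmap for pfns [%lu, %lu) (%lu pages)\n",
               pfn, end, nr_pages);  /* [512, 1024) (512 pages) */
        return 0;
    }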

Cc: Michal Hocko 
Cc: David Hildenbrand 
Cc: Logan Gunthorpe 
Cc: Oscar Salvador 
Reviewed-by: Pavel Tatashin 
Signed-off-by: Dan Williams 
---
 arch/x86/mm/init_64.c |4 ++-
 include/linux/mm.h|4 ++-
 mm/sparse-vmemmap.c   |   21 +++--
 mm/sparse.c   |   61 +++--
 4 files changed, 57 insertions(+), 33 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 20d14254b686..bb018d09d2dc 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1457,7 +1457,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 {
int err;
 
-   if (boot_cpu_has(X86_FEATURE_PSE))
+   if (end - start < PAGES_PER_SECTION * sizeof(struct page))
+   err = vmemmap_populate_basepages(start, end, node);
+   else if (boot_cpu_has(X86_FEATURE_PSE))
err = vmemmap_populate_hugepages(start, end, node, altmap);
else if (altmap) {
pr_err_once("%s: no cpu support for altmap allocations\n",
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e8834ac32b7..5360a0e4051d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2748,8 +2748,8 @@ const char * arch_vma_name(struct vm_area_struct *vma);
 void print_vma_addr(char *prefix, unsigned long rip);
 
 void *sparse_buffer_alloc(unsigned long size);
-struct page *sparse_mem_map_populate(unsigned long pnum, int nid,
-   struct vmem_altmap *altmap);
+struct page * __populate_section_memmap(unsigned long pfn,
+   unsigned long nr_pages, int nid, struct vmem_altmap *altmap);
 pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
 p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node);
 pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 7fec05796796..200aef686722 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -245,19 +245,26 @@ int __meminit vmemmap_populate_basepages(unsigned long start,
return 0;
 }
 
-struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid,
-   struct vmem_altmap *altmap)
+struct page * __meminit __populate_section_memmap(unsigned long pfn,
+   unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
 {
unsigned long start;
unsigned long end;
-   struct page *map;
 
-   map = pfn_to_page(pnum * PAGES_PER_SECTION);
-   start = (unsigned long)map;
-   end = (unsigned long)(map + PAGES_PER_SECTION);
+   /*
+* The minimum granularity of memmap extensions is
+* PAGES_PER_SUBSECTION as allocations are tracked in the
+* 'subsection_map' bitmap of the section.
+*/
+   end = ALIGN(pfn + nr_pages, PAGES_PER_SUBSECTION);
+   pfn &= PAGE_SUBSECTION_MASK;
+   nr_pages = end - pfn;
+
+   start = (unsigned long) pfn_to_page(pfn);
+   end = start + nr_pages * sizeof(struct page);
 
if (vmemmap_populate(start, end, nid, altmap))
return NULL;
 
-   return map;
+   return pfn_to_page(pfn);
 }
diff --git a/mm/sparse.c b/mm/sparse.c
index ac47a48050c7..d613f108cf34 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -433,8 +433,8 @@ static unsigned long __init section_map_size(void)
return PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
 }
 
-struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid,
-   struct vmem_altmap *altmap)
+struct page __init *__populate_section_memmap(unsigned long pfn,
+   unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
 {
unsigned long size = section_map_size();
struct page *map = sparse_buffer_alloc(size);
@@ -515,10 +515,13 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
}
sparse_buffer_init(map_count * section_map_size(), nid);
for_each_present_section_nr(pnum_begin, pnum) {
+   unsigned long pfn = section_nr_to_pfn(pnum);
+
if (pnum >= pnum_end)
break;
 
-   map = sparse_mem_map_populate(pnum, nid, NULL);
+   map = __populate_section_memmap(pfn, PAGES_PER_SECTION,
+   nid, NULL);
if (!map) {
pr_err("%s: node[%d] memory map backing failed. Some 
memory will not be available.",
   __func__, nid);
@@ -618,17 +621,17 @@ void offline_mem_sections(unsigned long start_pfn, unsigned

Re: [ndctl PATCH 0/8] daxctl: add a new reconfigure-device command

2019-05-06 Thread Verma, Vishal L
On Mon, 2019-05-06 at 14:50 -0700, Dave Hansen wrote:
> This all looks quite nice to me.  Thanks, Vishal!
> 
> One minor nit: for those of us new to daxctl and friends, they can be
> a
> bit hard to get started with.  Could you maybe add a few example
> invocations to the Documentation, or even this cover letter to help us
> newbies get started?

Yes, good idea, I'll add an examples section to the Documentation page
(other commands do this, and this should too), and add those to the
cover letter as well for v2.

Thanks!
-Vishal
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [ndctl PATCH 0/8] daxctl: add a new reconfigure-device command

2019-05-06 Thread Dave Hansen
This all looks quite nice to me.  Thanks, Vishal!

One minor nit: for those of us new to daxctl and friends, they can be a
bit hard to get started with.  Could you maybe add a few example
invocations to the Documentation, or even this cover letter to help us
newbies get started?
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v2 12/17] kunit: tool: add Python wrappers for running KUnit tests

2019-05-06 Thread Brendan Higgins
> On Sun, May 5, 2019 at 5:19 PM Frank Rowand  wrote:
> > You can see the full version 14 document in the submitter's repo:
> >
> >   $ git clone https://github.com/isaacs/testanything.github.io.git
> >   $ cd testanything.github.io
> >   $ git checkout tap14
> >   $ ls tap-version-14-specification.md
> >
> > My understanding is that the version 14 specification is not trying to
> > add new features, but instead capture what is already implemented in
> > the wild.
>
> Oh! I didn't know about the work on TAP 14. I'll go read through this.
>
> > > ## Here is what I propose for this patchset:
> > >
> > >  - Print out test number range at the beginning of each test suite.
> > >  - Print out log lines as soon as they happen as diagnostics.
> > >  - Print out the lines that state whether a test passes or fails as an
> > > ok/not ok line.
> > >
> > > This would be technically conforming with TAP13 and is consistent with
> > > what some kselftests have done.
>
> This is what I fixed kselftest to actually do (it wasn't doing correct
> TAP13), and Shuah is testing the series now:
> https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git/log/?h=ksft-tap-refactor

Oh, cool! I guess this is an okay approach then.

Thanks!
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v2 12/17] kunit: tool: add Python wrappers for running KUnit tests

2019-05-06 Thread Brendan Higgins
> On 5/3/19 4:14 PM, Brendan Higgins wrote:
> >> On 5/2/19 10:36 PM, Brendan Higgins wrote:
> >>> On Thu, May 2, 2019 at 6:45 PM Frank Rowand  
> >>> wrote:
> 
>  On 5/2/19 4:45 PM, Brendan Higgins wrote:
> > On Thu, May 2, 2019 at 2:16 PM Frank Rowand  
> > wrote:
> >>
> >> On 5/2/19 11:07 AM, Brendan Higgins wrote:
> >>> On Thu, May 2, 2019 at 4:02 AM Greg KH  
> >>> wrote:
> 
>  On Wed, May 01, 2019 at 04:01:21PM -0700, Brendan Higgins wrote:
> > From: Felix Guo 
> >
> > The ultimate goal is to create minimal isolated test binaries; in 
> > the
> > meantime we are using UML to provide the infrastructure to run 
> > tests, so
> > define an abstract way to configure and run tests that allow us to
> > change the context in which tests are built without affecting the 
> > user.
> > This also makes pretty and dynamic error reporting, and a lot of 
> > other
> > nice features easier.
> >
> > kunit_config.py:
> >   - parse .config and Kconfig files.
> >
> > kunit_kernel.py: provides helper functions to:
> >   - configure the kernel using kunitconfig.
> >   - build the kernel with the appropriate configuration.
> >   - provide function to invoke the kernel and stream the output 
> > back.
> >
> > Signed-off-by: Felix Guo 
> > Signed-off-by: Brendan Higgins 
> 
>  Ah, here's probably my answer to my previous logging format question,
>  right?  What's the chance that these wrappers output stuff in a standard
>  format that test-framework-tools can already parse?  :)
> >
> > To be clear, the test-framework-tools format we are talking about is
> > TAP13[1], correct?
> 
>  I'm not sure what the test community prefers for a format.  I'll let them
>  jump in and debate that question.
> 
> 
> >
> > My understanding is that is what kselftest is being converted to use.
> >
> >>>
> >>> It should be pretty easy to do. I had some patches that pack up the
> >>> results into a serialized format for a presubmit service; it should be
> >>> pretty straightforward to take the same logic and just change the
> >>> output format.
> >>
> >> When examining and trying out the previous versions of the patch I 
> >> found
> >> the wrappers useful to provide information about how to control and use
> >> the tests, but I had no interest in using the scripts as they do not
> >> fit in with my personal environment and workflow.
> >>
> >> In the previous versions of the patch, these helper scripts are 
> >> optional,
> >> which is good for my use case.  If the helper scripts are required to
> >
> > They are still optional.
> >
> >> get the data into the proper format then the scripts are not quite so
> >> optional, they become the expected environment.  I think the proper
> >> format should exist without the helper scripts.
> >
> > That's a good point. A couple things,
> >
> > First off, supporting TAP13, either in the kernel or the wrapper
> > script is not hard, but I don't think that is the real issue that you
> > raise.
> >
> > If your only concern is that you will always be able to have human
> > readable KUnit results printed to the kernel log, that is a guarantee
> > I feel comfortable making. Beyond that, I think it is going to take a
> > long while before I would feel comfortable guaranteeing anything about
> > how will KUnit work, what kind of data it will want to expose, and how
> > it will be organized. I think the wrapper script provides a nice
> > facade that I can maintain, can mediate between the implementation
> > details and the user, and can mediate between the implementation
> > details and other pieces of software that might want to consume
> > results.
> >
> > [1] https://testanything.org/tap-version-13-specification.html
> 
>  My concern is based on a focus on my little part of the world
>  (which in _previous_ versions of the patch series was the devicetree
>  unittest.c tests being converted to use the kunit infrastructure).
>  If I step back and think of the entire kernel globally I may end
>  up with a different conclusion - but I'm going to remain myopic
>  for this email.
> 
>  I want the test results to be usable by me and my fellow
>  developers.  I prefer that the test results be easily accessible
>  (current printk() implementation means that kunit messages are
>  just as accessible as the current unittest.c printk() output).
>  If the printk() output needs to be filtered through a script
>  to generate the actual test results then that is sub-optimal
>  to me.  It is one 

Re: [v5 2/3] mm/hotplug: make remove_memory() interface useable

2019-05-06 Thread Pavel Tatashin
On Mon, May 6, 2019 at 1:57 PM Dave Hansen  wrote:
>
> > -static inline void remove_memory(int nid, u64 start, u64 size) {}
> > +static inline bool remove_memory(int nid, u64 start, u64 size)
> > +{
> > + return -EBUSY;
> > +}
>
> This seems like an appropriate place for a WARN_ONCE(), if someone
> manages to call remove_memory() with hotplug disabled.
>
> BTW, I looked and can't think of a better errno, but -EBUSY probably
> isn't the best error code, right?

Same here, I looked and did not find anything better than -EBUSY. Also, it
is close to check_cpu_on_node() in the same file.

>
> > -void remove_memory(int nid, u64 start, u64 size)
> > +/**
> > + * remove_memory
> > + * @nid: the node ID
> > + * @start: physical address of the region to remove
> > + * @size: size of the region to remove
> > + *
> > + * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
> > + * and online/offline operations before this call, as required by
> > + * try_offline_node().
> > + */
> > +void __remove_memory(int nid, u64 start, u64 size)
> >  {
> > +
> > + /*
> > +  * trigger BUG() if some memory is not offlined prior to calling this
> > +  * function
> > +  */
> > + if (try_remove_memory(nid, start, size))
> > + BUG();
> > +}
>
> Could we call this remove_offline_memory()?  That way, it makes _some_
> sense why we would BUG() if the memory isn't offline.

Sure, I will rename this function.

Thank you,
Pasha
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [v5 2/3] mm/hotplug: make remove_memory() interface useable

2019-05-06 Thread Pavel Tatashin
On Mon, May 6, 2019 at 2:04 PM Dave Hansen  wrote:
>
> On 5/6/19 11:01 AM, Dan Williams wrote:
> >>> +void __remove_memory(int nid, u64 start, u64 size)
> >>>  {
> >>> +
> >>> + /*
> >>> +  * trigger BUG() if some memory is not offlined prior to calling this
> >>> +  * function
> >>> +  */
> >>> + if (try_remove_memory(nid, start, size))
> >>> + BUG();
> >>> +}
> >> Could we call this remove_offline_memory()?  That way, it makes _some_
> >> sense why we would BUG() if the memory isn't offline.
> > Please WARN() instead of BUG() because failing to remove memory should
> > not be system fatal.
>
> That is my preference as well.  But, the existing code BUG()s, so I'm
> OK-ish with this staying for the moment until we have a better handle on
> what all the callers do if this fails.

Yes, this is the reason why I BUG() here. The current code does this,
and I was not sure what would happen if we simply continued executing.
Of course, I would prefer to return failure so the callers can act
appropriately, but let's do one thing at a time; this should not be
part of this series.

Thank you,
Pasha
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [v5 2/3] mm/hotplug: make remove_memory() interface useable

2019-05-06 Thread Dan Williams
On Mon, May 6, 2019 at 10:57 AM Dave Hansen  wrote:
>
> > -static inline void remove_memory(int nid, u64 start, u64 size) {}
> > +static inline bool remove_memory(int nid, u64 start, u64 size)
> > +{
> > + return -EBUSY;
> > +}
>
> This seems like an appropriate place for a WARN_ONCE(), if someone
> manages to call remove_memory() with hotplug disabled.
>
> BTW, I looked and can't think of a better errno, but -EBUSY probably
> isn't the best error code, right?
>
> > -void remove_memory(int nid, u64 start, u64 size)
> > +/**
> > + * remove_memory
> > + * @nid: the node ID
> > + * @start: physical address of the region to remove
> > + * @size: size of the region to remove
> > + *
> > + * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
> > + * and online/offline operations before this call, as required by
> > + * try_offline_node().
> > + */
> > +void __remove_memory(int nid, u64 start, u64 size)
> >  {
> > +
> > + /*
> > +  * trigger BUG() if some memory is not offlined prior to calling this
> > +  * function
> > +  */
> > + if (try_remove_memory(nid, start, size))
> > + BUG();
> > +}
>
> Could we call this remove_offline_memory()?  That way, it makes _some_
> sense why we would BUG() if the memory isn't offline.

Please WARN() instead of BUG() because failing to remove memory should
not be system fatal.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [v5 2/3] mm/hotplug: make remove_memory() interface useable

2019-05-06 Thread Dave Hansen
On 5/6/19 11:01 AM, Dan Williams wrote:
>>> +void __remove_memory(int nid, u64 start, u64 size)
>>>  {
>>> +
>>> + /*
>>> +  * trigger BUG() if some memory is not offlined prior to calling this
>>> +  * function
>>> +  */
>>> + if (try_remove_memory(nid, start, size))
>>> + BUG();
>>> +}
>> Could we call this remove_offline_memory()?  That way, it makes _some_
>> sense why we would BUG() if the memory isn't offline.
> Please WARN() instead of BUG() because failing to remove memory should
> not be system fatal.

That is my preference as well.  But, the existing code BUG()s, so I'm
OK-ish with this staying for the moment until we have a better handle on
what all the callers do if this fails.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [v5 2/3] mm/hotplug: make remove_memory() interface useable

2019-05-06 Thread Dave Hansen
> -static inline void remove_memory(int nid, u64 start, u64 size) {}
> +static inline bool remove_memory(int nid, u64 start, u64 size)
> +{
> + return -EBUSY;
> +}

This seems like an appropriate place for a WARN_ONCE(), if someone
manages to call remove_memory() with hotplug disabled.

BTW, I looked and can't think of a better errno, but -EBUSY probably
isn't the best error code, right?
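
Something like this minimal sketch, perhaps; note the stub would also
want an int return type rather than bool so that -EBUSY survives the
implicit conversion:

	static inline int remove_memory(int nid, u64 start, u64 size)
	{
		/* hot-remove is compiled out; nothing should be calling this */
		WARN_ONCE(1, "%s() called without CONFIG_MEMORY_HOTREMOVE\n",
				__func__);
		return -EBUSY;
	}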

> -void remove_memory(int nid, u64 start, u64 size)
> +/**
> + * remove_memory
> + * @nid: the node ID
> + * @start: physical address of the region to remove
> + * @size: size of the region to remove
> + *
> + * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
> + * and online/offline operations before this call, as required by
> + * try_offline_node().
> + */
> +void __remove_memory(int nid, u64 start, u64 size)
>  {
> +
> + /*
> +  * trigger BUG() if some memory is not offlined prior to calling this
> +  * function
> +  */
> + if (try_remove_memory(nid, start, size))
> + BUG();
> +}

Could we call this remove_offline_memory()?  That way, it makes _some_
sense why we would BUG() if the memory isn't offline.

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v2 12/17] kunit: tool: add Python wrappers for running KUnit tests

2019-05-06 Thread Kees Cook
On Sun, May 5, 2019 at 5:19 PM Frank Rowand  wrote:
> You can see the full version 14 document in the submitter's repo:
>
>   $ git clone https://github.com/isaacs/testanything.github.io.git
>   $ cd testanything.github.io
>   $ git checkout tap14
>   $ ls tap-version-14-specification.md
>
> My understanding is that the version 14 specification is not trying to
> add new features, but instead capture what is already implemented in
> the wild.

Oh! I didn't know about the work on TAP 14. I'll go read through this.

> > ## Here is what I propose for this patchset:
> >
> >  - Print out test number range at the beginning of each test suite.
> >  - Print out log lines as soon as they happen as diagnostics.
> >  - Print out the lines that state whether a test passes or fails as an
> > ok/not ok line.
> >
> > This would be technically conforming with TAP13 and is consistent with
> > what some kselftests have done.
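
For concreteness, a stream following that proposal would look roughly
like this (the suite and test names here are made up):

    TAP version 13
    1..2
    # kunit-example-test initialized
    ok 1 - example_simple_test
    # example_failing_test: EXPECTATION FAILED: 2 == 3
    not ok 2 - example_failing_test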

This is what I fixed kselftest to actually do (it wasn't doing correct
TAP13), and Shuah is testing the series now:
https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git/log/?h=ksft-tap-refactor

I'll go read TAP 14 now...

-- 
Kees Cook
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


李总: How to multiply business performance; business development is no longer hard

2019-05-06 Thread 李总



 Original message -
From: 李总
To: linux-nvdimm 
Sent: 2019-5-6  19:16:29
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v2 15/17] MAINTAINERS: add entry for KUnit the unit testing framework

2019-05-06 Thread Brendan Higgins
On Fri, May 3, 2019 at 7:38 AM shuah  wrote:
>
> On 5/1/19 5:01 PM, Brendan Higgins wrote:
> > Add myself as maintainer of KUnit, the Linux kernel's unit testing
> > framework.
> >
> > Signed-off-by: Brendan Higgins 
> > ---
> >   MAINTAINERS | 10 ++
> >   1 file changed, 10 insertions(+)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 5c38f21aee787..c78ae95c56b80 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -8448,6 +8448,16 @@ S: Maintained
> >   F:  tools/testing/selftests/
> >   F:  Documentation/dev-tools/kselftest*
> >
> > +KERNEL UNIT TESTING FRAMEWORK (KUnit)
> > +M:   Brendan Higgins 
> > +L:   kunit-...@googlegroups.com
> > +W:   https://google.github.io/kunit-docs/third_party/kernel/docs/
> > +S:   Maintained
> > +F:   Documentation/kunit/
> > +F:   include/kunit/
> > +F:   kunit/
> > +F:   tools/testing/kunit/
> > +
>
> Please add kselftest mailing list to this entry, based on our
> conversation on taking these patches through kselftest tree.

Will do.

Thanks!
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v2 11/17] kunit: test: add test managed resource tests

2019-05-06 Thread Brendan Higgins
On Fri, May 3, 2019 at 7:34 AM shuah  wrote:
>
> On 5/1/19 5:01 PM, Brendan Higgins wrote:
> > From: Avinash Kondareddy 
> >
> > Tests how tests interact with test managed resources in their lifetime.
> >
> > Signed-off-by: Avinash Kondareddy 
> > Signed-off-by: Brendan Higgins 
> > ---
>
> I think this change log could use more details. It is vague on what it
> does.

Agreed. Will fix in next revision.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v2 08/17] kunit: test: add support for test abort

2019-05-06 Thread Brendan Higgins
On Fri, May 3, 2019 at 5:33 AM Logan Gunthorpe  wrote:
>
>
>
> On 2019-05-03 12:48 a.m., Brendan Higgins wrote:
> > On Thu, May 2, 2019 at 8:15 PM Logan Gunthorpe  wrote:
> >> On 2019-05-01 5:01 p.m., Brendan Higgins wrote:
> >>> +/*
> >>> + * struct kunit_try_catch - provides a generic way to run code which might fail.
> >>> + * @context: used to pass user data to the try and catch functions.
> >>> + *
> >>> + * kunit_try_catch provides a generic, architecture independent way to execute
> >>> + * an arbitrary function of type kunit_try_catch_func_t which may bail out by
> >>> + * calling kunit_try_catch_throw(). If kunit_try_catch_throw() is called, @try
> >>> + * is stopped at the site of invocation and @catch is called.
> >>
> >> I found some of the C++ comparisons in this series a bit distasteful but
> >> wasn't going to say anything until I saw the try catch. But looking
> >> into the implementation, it's just a thread that can exit early, which
> >> seems fine to me. Just a poor choice of name I guess...
> >
> > Guilty as charged (I have a long history with C++, sorry). Would you
> > prefer I changed the name? I just figured that try-catch is a commonly
> > understood pattern that describes exactly what I am doing.
>
> It is a commonly understood pattern, but I don't think it's what the
> code is doing. Try-catch cleans up an entire stack and allows each level
> of the stack to apply local cleanup. This implementation simply exits a
> thread and has none of that complexity. To me, it seems like an odd
> abstraction here as it's really just a test runner that can exit early
> (though I haven't seen the follow-up UML implementation).

Yeah, that is closer to what the UML specific version does, but that's
a conversation for another time.

>
> I would prefer to see this cleaned up such that the abstraction matches
> more what's going on but I don't feel that strongly about it so I'll
> leave it up to you to figure out what's best unless other reviewers have
> stronger opinions.

Cool. Let's revisit this with the follow-up patchset.

>
> >>
> >> [snip]
> >>
> >>> +static void __noreturn kunit_abort(struct kunit *test)
> >>> +{
> >>> + kunit_set_death_test(test, true);
> >>> +
> >>> + kunit_try_catch_throw(>try_catch);
> >>> +
> >>> + /*
> >>> +  * Throw could not abort from test.
> >>> +  *
> >>> +  * XXX: we should never reach this line! As kunit_try_catch_throw is
> >>> +  * marked __noreturn.
> >>> +  */
> >>> + WARN_ONCE(true, "Throw could not abort from test!\n");
> >>> +}
> >>> +
> >>>  int kunit_init_test(struct kunit *test, const char *name)
> >>>  {
> >>>   spin_lock_init(>lock);
> >>> @@ -77,6 +103,7 @@ int kunit_init_test(struct kunit *test, const char 
> >>> *name)
> >>>   test->name = name;
> >>>   test->vprintk = kunit_vprintk;
> >>>   test->fail = kunit_fail;
> >>> + test->abort = kunit_abort;
> >>
> >> There are a number of these function pointers which seem to be pointless
> >> to me as you only ever set them to one function. Just call the function
> >> directly. As it is, it is an unnecessary indirection for someone reading
> >> the code. If and when you have multiple implementations of the function
> >> then add the pointer. Don't assume you're going to need it later on and
> >> add all this maintenance burden if you never use it..
> >
> > Ah, yes, Frank (and probably others) previously asked me to remove
> > unnecessary method pointers; I removed all the totally unused ones. As
> > for these, I don't use them in this patchset, but I use them in my
> > patchsets that will follow up this one. These in particular are
> > present so that they can be mocked out for testing.
>
> Adding indirection and function pointers solely for the purpose of
> mocking out while testing doesn't sit well with me and I don't think it
> should be a pattern that's encouraged. Adding extra complexity like this
> to a design to make it unit-testable doesn't seem like something that
> makes sense in kernel code. Especially given that indirect calls are
> more expensive in the age of spectre.

Indirection is a pretty common method to make something mockable or
fakeable. Nevertheless, probably an easier discussion to have once we
have some examples to discuss.

>
> Also, mocking these particular functions seems like it's an artifact of
> how you've designed the try/catch abstraction. If the abstraction was
> more around an abort-able test runner then it doesn't make sense to need
> to mock out the abort/fail functions as you will be testing overly
> generic features of something that don't seem necessary to the
> implementation.
>
> >>
> >> [snip]
> >>
> >>> +void kunit_generic_try_catch_init(struct kunit_try_catch *try_catch)
> >>> +{
> >>> + try_catch->run = kunit_generic_run_try_catch;
> >>> + try_catch->throw = kunit_generic_throw;
> >>> +}
> >>
> >> Same here. There's only one implementation of try_catch and