Hi, Aneesh,
"Aneesh Kumar K.V" <[email protected]> writes:
> On architectures that have different page size values used for kernel
> direct mapping and userspace mappings, the user can end up creating zero-sized
> namespaces as shown below
>
> :/sys/bus/nd/devices/region1# cat align
> 0x1000000
> /sys/bus/nd/devices/region1# echo 0x200000 > align
> /sys/bus/nd/devices/region1/dax1.0# cat supported_alignments
> 65536 16777216
> $ ndctl create-namespace -r region1 -m devdax -s 18M --align 64K
> {
> "dev":"namespace1.0",
> "mode":"devdax",
> "map":"dev",
> "size":0,
> "uuid":"3094329a-0c66-4905-847e-357223e56ab0",
> "daxregion":{
> "id":1,
> "size":0,
> "align":65536
> },
> "align":65536
> }
> similarily for fsdax
>
> $ ndctl create-namespace -r region1 -m fsdax -s 18M --align 64K
> {
> "dev":"namespace1.0",
> "mode":"fsdax",
> "map":"dev",
> "size":0,
> "uuid":"45538a6f-dec7-405d-b1da-2a4075e06232",
> "sector_size":512,
> "align":65536,
> "blockdev":"pmem1"
> }
Just curious, but have you seen this in practice? It seems like an odd
thing to do.
> In commit 9ffc1d19fc4a ("mm/memremap_pages: Introduce
> memremap_compat_align()")
> memremap_compat_align was added to make sure the kernel always creates
> namespaces with 16MB alignment. But the user can still override the
> region alignment and no input validation is done in the region alignment
> values to retain the flexibility user had before. However, the kernel
> ensures that only part of the namespace that can be mapped via kernel
> direct mapping page size is enabled. This is achieved by tracking the
> unmapped part of the namespace in pfn_sb->end_trunc. The kernel also
> ensures that the start address of the namespace is also aligned to the
> kernel direct mapping page size.
>
> Depending on the user request, the kernel implements userspace mapping
> alignment by updating pfn device alignment attribute and this value is
> used to adjust the start address for userspace mappings. This is tracked
> in pfn_sb->dataoff. Hence the available size for userspace mapping is:
>
> usermapping_size = size of the namespace - pfn_sb->end_trun - pfn_sb->dataoff
>
> If the kernel finds the user mapping size zero then don't allow the
> creation of namespace.
>
> After fix:
> $ ndctl create-namespace -f -r region1 -m devdax -s 18M --align 64K
> libndctl: ndctl_dax_enable: dax1.1: failed to enable
> Error: namespace1.2: failed to enable
>
> failed to create namespace: No such device or address
>
> And existing zero sized namespace will be marked disabled.
> root@ltczz75-lp2:/home/kvaneesh# ndctl list -N -r region1 -i
> [
> {
> "dev":"namespace1.0",
> "mode":"raw",
> "size":18874368,
> "uuid":"94a90fb0-8e78-4fb6-a759-ffc62f9fa181",
> "sector_size":512,
> "state":"disabled"
> },
Thank you for providing examples of the command output before and after
the change. I appreciate that.
>
> Signed-off-by: Aneesh Kumar K.V <[email protected]>
> ---
> drivers/nvdimm/pfn_devs.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> index af7d9301520c..36b904a129b9 100644
> --- a/drivers/nvdimm/pfn_devs.c
> +++ b/drivers/nvdimm/pfn_devs.c
> @@ -453,7 +453,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char
> *sig)
> struct resource *res;
> enum nd_pfn_mode mode;
> struct nd_namespace_io *nsio;
> - unsigned long align, start_pad;
> + unsigned long align, start_pad, end_trunc;
> struct nd_pfn_sb *pfn_sb = nd_pfn->pfn_sb;
> struct nd_namespace_common *ndns = nd_pfn->ndns;
> const uuid_t *parent_uuid = nd_dev_to_uuid(&ndns->dev);
> @@ -503,6 +503,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char
> *sig)
> align = le32_to_cpu(pfn_sb->align);
> offset = le64_to_cpu(pfn_sb->dataoff);
> start_pad = le32_to_cpu(pfn_sb->start_pad);
> + end_trunc = le32_to_cpu(pfn_sb->end_trunc);
> if (align == 0)
> align = 1UL << ilog2(offset);
> mode = le32_to_cpu(pfn_sb->mode);
> @@ -610,6 +611,10 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char
> *sig)
> return -EOPNOTSUPP;
> }
>
> + if (offset >= (res->end - res->start + 1 - start_pad - end_trunc)) {
^^^^^^^^^^^^^^^^^^^^^^^^^ That's what
resource_size(res) does. It might be better to create a local variable
'size' to hold that, as there are now two instances of that in the
function.
> + dev_err(&nd_pfn->dev, "bad offset with small namespace\n");
> + return -EOPNOTSUPP;
> + }
> return 0;
> }
> EXPORT_SYMBOL(nd_pfn_validate);
> @@ -810,7 +815,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
> else
> return -ENXIO;
>
> - if (offset >= size) {
> + if (offset >= (size - end_trunc)) {
> + /* This implies we result in zero size devices */
> dev_err(&nd_pfn->dev, "%s unable to satisfy requested
> alignment\n",
> dev_name(&ndns->dev));
> return -ENXIO;
Functionally, this looks good to me.
Cheers,
Jeff