Hi, Aneesh,

"Aneesh Kumar K.V" <[email protected]> writes:

> On architectures that have different page size values used for kernel
> direct mapping and userspace mappings, the user can end up creating zero-sized
> namespaces as shown below
>
> :/sys/bus/nd/devices/region1# cat align
> 0x1000000
> /sys/bus/nd/devices/region1# echo 0x200000 > align
> /sys/bus/nd/devices/region1/dax1.0# cat supported_alignments
> 65536 16777216
>  $ ndctl create-namespace -r region1 -m devdax -s 18M --align 64K
> {
>   "dev":"namespace1.0",
>   "mode":"devdax",
>   "map":"dev",
>   "size":0,
>   "uuid":"3094329a-0c66-4905-847e-357223e56ab0",
>   "daxregion":{
>     "id":1,
>     "size":0,
>     "align":65536
>   },
>   "align":65536
> }
> similarily for fsdax
>
>  $ ndctl create-namespace -r region1 -m fsdax  -s 18M --align 64K
> {
>   "dev":"namespace1.0",
>   "mode":"fsdax",
>   "map":"dev",
>   "size":0,
>   "uuid":"45538a6f-dec7-405d-b1da-2a4075e06232",
>   "sector_size":512,
>   "align":65536,
>   "blockdev":"pmem1"
> }

Just curious, but have you seen this in practice?  It seems like an odd
thing to do.

> In commit 9ffc1d19fc4a ("mm/memremap_pages: Introduce 
> memremap_compat_align()")
> memremap_compat_align was added to make sure the kernel always creates
> namespaces with 16MB alignment. But the user can still override the
> region alignment and no input validation is done in the region alignment
> values to retain the flexibility user had before. However, the kernel
> ensures that only part of the namespace that can be mapped via kernel
> direct mapping page size is enabled. This is achieved by tracking the
> unmapped part of the namespace in pfn_sb->end_trunc. The kernel also
> ensures that the start address of the namespace is also aligned to the
> kernel direct mapping page size.
>
> Depending on the user request, the kernel implements userspace mapping
> alignment by updating pfn device alignment attribute and this value is
> used to adjust the start address for userspace mappings. This is tracked
> in pfn_sb->dataoff. Hence the available size for userspace mapping is:
>
> usermapping_size = size of the namespace - pfn_sb->end_trun - pfn_sb->dataoff
>
> If the kernel finds the user mapping size zero then don't allow the
> creation of namespace.
>
> After fix:
> $ ndctl create-namespace -f  -r region1 -m devdax  -s 18M --align 64K
> libndctl: ndctl_dax_enable: dax1.1: failed to enable
>   Error: namespace1.2: failed to enable
>
> failed to create namespace: No such device or address
>
> And existing zero sized namespace will be marked disabled.
> root@ltczz75-lp2:/home/kvaneesh# ndctl  list -N -r region1 -i
> [
>   {
>     "dev":"namespace1.0",
>     "mode":"raw",
>     "size":18874368,
>     "uuid":"94a90fb0-8e78-4fb6-a759-ffc62f9fa181",
>     "sector_size":512,
>     "state":"disabled"
>   },

Thank you for providing examples of the command output before and after
the change.  I appreciate that.

>
> Signed-off-by: Aneesh Kumar K.V <[email protected]>
> ---
>  drivers/nvdimm/pfn_devs.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> index af7d9301520c..36b904a129b9 100644
> --- a/drivers/nvdimm/pfn_devs.c
> +++ b/drivers/nvdimm/pfn_devs.c
> @@ -453,7 +453,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char 
> *sig)
>       struct resource *res;
>       enum nd_pfn_mode mode;
>       struct nd_namespace_io *nsio;
> -     unsigned long align, start_pad;
> +     unsigned long align, start_pad, end_trunc;
>       struct nd_pfn_sb *pfn_sb = nd_pfn->pfn_sb;
>       struct nd_namespace_common *ndns = nd_pfn->ndns;
>       const uuid_t *parent_uuid = nd_dev_to_uuid(&ndns->dev);
> @@ -503,6 +503,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char 
> *sig)
>       align = le32_to_cpu(pfn_sb->align);
>       offset = le64_to_cpu(pfn_sb->dataoff);
>       start_pad = le32_to_cpu(pfn_sb->start_pad);
> +     end_trunc = le32_to_cpu(pfn_sb->end_trunc);
>       if (align == 0)
>               align = 1UL << ilog2(offset);
>       mode = le32_to_cpu(pfn_sb->mode);
> @@ -610,6 +611,10 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char 
> *sig)
>               return -EOPNOTSUPP;
>       }
>  
> +     if (offset >= (res->end - res->start + 1 - start_pad - end_trunc)) {
                       ^^^^^^^^^^^^^^^^^^^^^^^^^ That's what
resource_size(res) does.  It might be better to create a local variable
'size' to hold that, as there are now two instances of that in the
function.

> +             dev_err(&nd_pfn->dev, "bad offset with small namespace\n");
> +             return -EOPNOTSUPP;
> +     }
>       return 0;
>  }
>  EXPORT_SYMBOL(nd_pfn_validate);
> @@ -810,7 +815,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
>       else
>               return -ENXIO;
>  
> -     if (offset >= size) {
> +     if (offset >= (size - end_trunc)) {
> +             /* This implies we result in zero size devices */
>               dev_err(&nd_pfn->dev, "%s unable to satisfy requested 
> alignment\n",
>                               dev_name(&ndns->dev));
>               return -ENXIO;

Functionally, this looks good to me.

Cheers,
Jeff


Reply via email to