On 5/18/2021 10:27 AM, Joao Martins wrote:

On 5/5/21 11:36 PM, Joao Martins wrote:
On 5/5/21 11:20 PM, Dan Williams wrote:
On Wed, May 5, 2021 at 12:50 PM Joao Martins <joao.m.mart...@oracle.com> wrote:
On 5/5/21 7:44 PM, Dan Williams wrote:
On Thu, Mar 25, 2021 at 4:10 PM Joao Martins <joao.m.mart...@oracle.com> wrote:
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index b46f63dcaed3..bb28d82dda5e 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -114,6 +114,7 @@ struct dev_pagemap {
         struct completion done;
         enum memory_type type;
         unsigned int flags;
+       unsigned long align;
I think this wants some kernel-doc above to indicate that non-zero
means "use compound pages with tail-page dedup" and zero / PAGE_SIZE
means "use non-compound base pages".
[...]

The non-zero value must be
PAGE_SIZE, PMD_PAGE_SIZE or PUD_PAGE_SIZE.
Hmm, maybe it should be an
enum:

enum devmap_geometry {
     DEVMAP_PTE,
     DEVMAP_PMD,
     DEVMAP_PUD,
}

I suppose a converter between devmap_geometry and page_size would be needed 
too? And maybe
the whole dax/nvdimm align values change meanwhile (as a followup improvement)?
I think it is ok for dax/nvdimm to continue to maintain their align
value because it should be ok to have 4MB align if the device really
wanted. However, when it goes to map that alignment with
memremap_pages() it can pick a mode. For example, it's already the
case that dax->align == 1GB is mapped with DEVMAP_PTE today, so
they're already separate concepts that can stay separate.

Gotcha.
I am reconsidering part of the above. In general, yes, the meaning of devmap 
@align
represents a slightly different variation of the device @align i.e. how the 
metadata is
laid out **but** regardless of what kind of page table entries we use vmemmap.

By using DEVMAP_PTE/PMD/PUD we might end up 1) duplicating what nvdimm/dax 
already
validates in terms of allowed device @align values (i.e. PAGE_SIZE, PMD_SIZE 
and PUD_SIZE)
2) the geometry of metadata is very much tied to the value we pick to @align at 
namespace
provisioning -- not the "align" we might use at mmap() perhaps that's what you 
referred
above? -- and 3) the value of geometry actually derives from dax device @align 
because we
will need to create compound pages representing a page size of @align value.

Using your example above: you're saying that dax->align == 1G is mapped with 
DEVMAP_PTEs,
in reality the vmemmap is populated with PMDs/PUDs page tables (depending on 
what archs
decide to do at vmemmap_populate()) and uses base pages as its metadata 
regardless of what
device @align. In reality what we want to convey in @geometry is not page table 
sizes, but
just the page size used for the vmemmap of the dax device. Additionally, 
limiting its
value might not be desirable... if tomorrow Linux for some arch supports 
dax/nvdimm
devices with 4M align or 64K align, the value of @geometry will have to reflect 
the 4M to
create compound pages of order 10 for the said vmemmap.

I am going to wait until you finish reviewing the remaining four patches of 
this series,
but maybe this is a simple misnomer (s/align/geometry/) with a comment but 
without
DEVMAP_{PTE,PMD,PUD} enum part? Or perhaps its own struct with a value and enum 
a
setter/getter to audit its value? Thoughts?

                Joao

Good points there.

My understanding is that  dax->align  conveys granularity of size while carving out a namespace,

it's a geometry attribute loosely akin to sector size of a spindle disk.  I tend to think that

device pagesize  has almost no relation to "align" in that, it's possible to have 1G "align" and

4K pagesize, or verse versa.  That is, with the advent of compound page support, it is possible

to totally separate the two concepts.

How about adding a new option to "ndctl create-namespace" that describes device creator's desired

pagesize, and another parameter to describe whether the pagesize shall be fixed or allowed to be split up,

such that, if the intention is to never split up 2M pagesize, then it would be possible to save a lot metadata

space on the device?

thanks,

-jane
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org

Reply via email to