Re: [PATCH 11/15] mm, dax, pmem: introduce __pfn_t

2015-09-23 Thread Williams, Dan J
[ adding k...@vger.kernel.org ]

On Wed, 2015-09-23 at 09:02 -0700, Dave Hansen wrote:
> On 09/22/2015 09:42 PM, Dan Williams wrote:
> >  /*
> > + * __pfn_t: encapsulates a page-frame number that is optionally backed
> > + * by memmap (struct page).  Whether a __pfn_t has a 'struct page'
> > + * backing is indicated by flags in the low bits of the value;
> > + */
> > +typedef struct {
> > +   unsigned long val;
> > +} __pfn_t;
> > +
> > +/*
> > + * PFN_SG_CHAIN - pfn is a pointer to the next scatterlist entry
> > + * PFN_SG_LAST - pfn references a page and is the last scatterlist entry
> > + * PFN_DEV - pfn is not covered by system memmap by default
> > + * PFN_MAP - pfn has a dynamic page mapping established by a device driver
> > + */
> > +enum {
> > +   PFN_SHIFT = 4,
> > +   PFN_MASK = (1UL << PFN_SHIFT) - 1,
> > +   PFN_SG_CHAIN = (1UL << 0),
> > +   PFN_SG_LAST = (1UL << 1),
> > +   PFN_DEV = (1UL << 2),
> > +   PFN_MAP = (1UL << 3),
> > +};
> 
> Please forgive a little bikeshedding here...
> 
> Why __pfn_t?  Because the KVM code has a pfn_t?  If so, I think we
> should rescue pfn_t from KVM and give them a kvm_pfn_t.

Sounds good; once 0day has a chance to chew on the conversion, I'll send
it out.

> I think you should do one of two things:  Make PFN_SHIFT 12 so that a
> physical addr can be stored in a __pfn_t with no work.  Or, use the
> *high* 12 bits of __pfn_t.val.
> 

pfn to pfn_t conversions are more likely, so the high bits sound
better.

> If you use the high bits, *and* make it store a plain pfn when all the
> bits are 0, then you get a zero-cost pfn<->__pfn_t conversion which will
> hopefully generate the exact same code which is there today.
> 
> The one disadvantage here is that it makes it more likely that somebody
> that's just setting __pfn_t.val=foo will get things subtly wrong
> somehow, but that it will work most of the time.
> 
> Also, about naming...  PFN_SHIFT is a pretty awful name for this.  It
> probably needs to be __PFN_T_SOMETHING.  We don't want folks doing
> craziness like:
> 
>   unsigned long phys_addr = pfn << PFN_SHIFT.
> 
> Which *looks* OK.

Heh, true.  It's now PFN_FLAGS_MASK in the reworked patch...
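
For reference, here is a minimal sketch of what a high-bit encoding with a
PFN_FLAGS_MASK could look like.  All names, bit positions, and helpers below
are illustrative assumptions for discussion, not necessarily what the
reworked patch uses:

/*
 * Sketch only: flag bits live in the high bits of pfn_t.val, so a
 * flag-less pfn_t is numerically identical to the raw pfn.
 */
#include <limits.h>
#include <stdbool.h>

#define PFN_T_BITS      (sizeof(unsigned long) * CHAR_BIT)
#define PAGE_SHIFT      12      /* assume 4K pages for this sketch */

typedef struct {
        unsigned long val;
} pfn_t;

#define PFN_SG_CHAIN    (1UL << (PFN_T_BITS - 1))
#define PFN_SG_LAST     (1UL << (PFN_T_BITS - 2))
#define PFN_DEV         (1UL << (PFN_T_BITS - 3))
#define PFN_MAP         (1UL << (PFN_T_BITS - 4))
#define PFN_FLAGS_MASK  (PFN_SG_CHAIN | PFN_SG_LAST | PFN_DEV | PFN_MAP)

/* pfn -> pfn_t is free when no flags are set: no shift, no mask */
static inline pfn_t pfn_to_pfn_t(unsigned long pfn)
{
        pfn_t ret = { .val = pfn };

        return ret;
}

static inline pfn_t phys_to_pfn_t(unsigned long phys, unsigned long flags)
{
        pfn_t ret = { .val = (phys >> PAGE_SHIFT) | (flags & PFN_FLAGS_MASK) };

        return ret;
}

/* pfn_t -> pfn only has to clear the (high) flag bits */
static inline unsigned long pfn_t_to_pfn(pfn_t pfn)
{
        return pfn.val & ~PFN_FLAGS_MASK;
}

/* page-backed if it is ordinary ram, or device memory with PFN_MAP set */
static inline bool pfn_t_has_page(pfn_t pfn)
{
        return (pfn.val & PFN_MAP) == PFN_MAP || (pfn.val & PFN_DEV) == 0;
}

With this layout a plain pfn round-trips through pfn_t at zero cost, which is
the property discussed above.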


8<---
Subject: mm, dax, pmem: introduce pfn_t

From: Dan Williams 

In preparation for enabling get_user_pages() operations on dax mappings,
introduce a type that encapsulates a page-frame number and can also be
used to encode other information.  This other information is the
historical "page_link" encoding in a scatterlist, but it can also denote
"device memory", i.e. a set of pfns that are not part of the kernel's
linear mapping by default but are accessed via the same memory
controller as RAM.  The motivation for this new type is large-capacity
persistent memory that optionally has struct page entries in the
'memmap'.

When a driver, like pmem, has established a devm_memremap_pages()
mapping, it needs to communicate to upper layers that the pfn has a page
backing.  This property will be leveraged in a later patch to enable
dax-gup.  For now, update all the ->direct_access() implementations to
communicate whether the returned pfn range is mapped.

Cc: Christoph Hellwig 
Cc: Dave Hansen 
Cc: Andrew Morton 
Cc: Ross Zwisler 
Signed-off-by: Dan Williams 
---
 arch/powerpc/sysdev/axonram.c |8 ++---
 drivers/block/brd.c   |4 +--
 drivers/nvdimm/pmem.c |   27 +
 drivers/s390/block/dcssblk.c  |   10 +++---
 fs/block_dev.c|2 +
 fs/dax.c  |   23 ---
 include/linux/blkdev.h|4 +--
 include/linux/mm.h|   65 +
 include/linux/pfn.h   |9 ++
 9 files changed, 112 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index 24ffab2572e8..ec3b072d20cf 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -141,15 +141,13 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
  */
 static long
 axon_ram_direct_access(struct block_device *device, sector_t sector,
-  void __pmem **kaddr, unsigned long *pfn)
+  void __pmem **kaddr, pfn_t *pfn)
 {
struct axon_ram_bank *bank = device->bd_disk->private_data;
loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
-   void *addr = (void *)(bank->ph_addr + offset);
-
-   *kaddr = (void __pmem *)addr;
-   *pfn = virt_to_phys(addr) >> PAGE_SHIFT;
 
+   *kaddr = (void __pmem __force *) bank->io_addr + offset;
+   *pfn = phys_to_pfn_t(bank->ph_addr + offset, PFN_DEV);
return bank->size - offset;
 }
 
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index f645a71ae827..796112e9174b 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -374,7 +374,7 @@ static int brd_rw_page(struct block_device *bdev, sector_t sector,
 
 #ifdef CONFIG_BLK_DEV_RAM_DAX
 static long 

Re: [PATCH 11/15] mm, dax, pmem: introduce __pfn_t

2015-09-23 Thread Dave Hansen
On 09/22/2015 09:42 PM, Dan Williams wrote:
>  /*
> + * __pfn_t: encapsulates a page-frame number that is optionally backed
> + * by memmap (struct page).  Whether a __pfn_t has a 'struct page'
> + * backing is indicated by flags in the low bits of the value;
> + */
> +typedef struct {
> + unsigned long val;
> +} __pfn_t;
> +
> +/*
> + * PFN_SG_CHAIN - pfn is a pointer to the next scatterlist entry
> + * PFN_SG_LAST - pfn references a page and is the last scatterlist entry
> + * PFN_DEV - pfn is not covered by system memmap by default
> + * PFN_MAP - pfn has a dynamic page mapping established by a device driver
> + */
> +enum {
> + PFN_SHIFT = 4,
> + PFN_MASK = (1UL << PFN_SHIFT) - 1,
> + PFN_SG_CHAIN = (1UL << 0),
> + PFN_SG_LAST = (1UL << 1),
> + PFN_DEV = (1UL << 2),
> + PFN_MAP = (1UL << 3),
> +};

Please forgive a little bikeshedding here...

Why __pfn_t?  Because the KVM code has a pfn_t?  If so, I think we
should rescue pfn_t from KVM and give them a kvm_pfn_t.

I think you should do one of two things:  Make PFN_SHIFT 12 so that a
physical addr can be stored in a __pfn_t with no work.  Or, use the
*high* 12 bits of __pfn_t.val.

If you use the high bits, *and* make it store a plain pfn when all the
bits are 0, then you get a zero-cost pfn<->__pfn_t conversion which will
hopefully generate the exact same code which is there today.

The one disadvantage here is that it makes it more likely that somebody
that's just setting __pfn_t.val=foo will get things subtly wrong
somehow, but that it will work most of the time.

Also, about naming...  PFN_SHIFT is a pretty awful name for this.  It
probably needs to be __PFN_T_SOMETHING.  We don't want folks doing
craziness like:

unsigned long phys_addr = pfn << PFN_SHIFT.

Which *looks* OK.
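
To make the trade-off concrete, here is a small illustrative sketch (not
taken from the patch) of how the pfn is recovered under each layout, plus
the naming hazard above:

/* Illustrative only -- neither helper is from the patch under review. */

/* Low-bit flags (PFN_SHIFT = 4): every lookup pays a shift. */
static inline unsigned long low_bits_to_pfn(unsigned long val)
{
        return val >> 4;
}

/*
 * High-bit flags: clearing the flag mask (a no-op value-wise when no
 * flags are set) already yields the pfn.
 */
static inline unsigned long high_bits_to_pfn(unsigned long val,
                                             unsigned long flags_mask)
{
        return val & ~flags_mask;
}

/*
 * The naming hazard: with a 4-bit PFN_SHIFT the line below compiles and
 * *looks* like a pfn-to-physical-address conversion, but it is not --
 * that conversion needs PAGE_SHIFT, not PFN_SHIFT:
 *
 *      unsigned long phys_addr = pfn << PFN_SHIFT;
 */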


[PATCH 11/15] mm, dax, pmem: introduce __pfn_t

2015-09-22 Thread Dan Williams
In preparation for enabling get_user_pages() operations on dax mappings,
introduce a type that encapsulates a page-frame number and can also be
used to encode other information.  This other information is the
historical "page_link" encoding in a scatterlist, but it can also denote
"device memory", i.e. a set of pfns that are not part of the kernel's
linear mapping by default but are accessed via the same memory
controller as RAM.  The motivation for this new type is large-capacity
persistent memory that optionally has struct page entries in the
'memmap'.

When a driver, like pmem, has established a devm_memremap_pages()
mapping, it needs to communicate to upper layers that the pfn has a page
backing.  This property will be leveraged in a later patch to enable
dax-gup.  For now, update all the ->direct_access() implementations to
communicate whether the returned pfn range is mapped.

Cc: Christoph Hellwig 
Cc: Dave Hansen 
Cc: Andrew Morton 
Cc: Ross Zwisler 
Signed-off-by: Dan Williams 
---
 arch/powerpc/sysdev/axonram.c |8 ++---
 drivers/block/brd.c   |4 +-
 drivers/nvdimm/pmem.c |   27 ---
 drivers/s390/block/dcssblk.c  |   10 ++
 fs/block_dev.c|2 +
 fs/dax.c  |   23 +++--
 include/linux/blkdev.h|4 +-
 include/linux/mm.h|   72 +
 8 files changed, 110 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index 24ffab2572e8..35eff52c0a38 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -141,15 +141,13 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
  */
 static long
 axon_ram_direct_access(struct block_device *device, sector_t sector,
-  void __pmem **kaddr, unsigned long *pfn)
+  void __pmem **kaddr, __pfn_t *pfn)
 {
struct axon_ram_bank *bank = device->bd_disk->private_data;
loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
-   void *addr = (void *)(bank->ph_addr + offset);
-
-   *kaddr = (void __pmem *)addr;
-   *pfn = virt_to_phys(addr) >> PAGE_SHIFT;
 
+   *kaddr = (void __pmem __force *) bank->io_addr + offset;
+   *pfn = phys_to_pfn_t(bank->ph_addr + offset, PFN_DEV);
return bank->size - offset;
 }
 
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index f645a71ae827..50e78b1ea26c 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -374,7 +374,7 @@ static int brd_rw_page(struct block_device *bdev, sector_t sector,
 
 #ifdef CONFIG_BLK_DEV_RAM_DAX
 static long brd_direct_access(struct block_device *bdev, sector_t sector,
-   void __pmem **kaddr, unsigned long *pfn)
+   void __pmem **kaddr, __pfn_t *pfn)
 {
struct brd_device *brd = bdev->bd_disk->private_data;
struct page *page;
@@ -385,7 +385,7 @@ static long brd_direct_access(struct block_device *bdev, sector_t sector,
if (!page)
return -ENOSPC;
*kaddr = (void __pmem *)page_address(page);
-   *pfn = page_to_pfn(page);
+   *pfn = page_to_pfn_t(page);
 
return PAGE_SIZE;
 }
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 3ee02af73ad0..1c670775129b 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -39,6 +39,7 @@ struct pmem_device {
phys_addr_t phys_addr;
/* when non-zero this device is hosting a 'pfn' instance */
phys_addr_t data_offset;
+   unsigned long   pfn_flags;
void __pmem *virt_addr;
size_t  size;
 };
@@ -108,25 +109,22 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 }
 
 static long pmem_direct_access(struct block_device *bdev, sector_t sector,
- void __pmem **kaddr, unsigned long *pfn)
+ void __pmem **kaddr, __pfn_t *pfn)
 {
struct pmem_device *pmem = bdev->bd_disk->private_data;
resource_size_t offset = sector * 512 + pmem->data_offset;
-   resource_size_t size;
+   resource_size_t size = pmem->size - offset;
 
-   if (pmem->data_offset) {
+   *kaddr = pmem->virt_addr + offset;
+   *pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
+
+   if (__pfn_t_has_page(*pfn)) {
/*
 * Limit the direct_access() size to what is covered by
 * the memmap
 */
-   size = (pmem->size - offset) & ~ND_PFN_MASK;
-   } else
-   size = pmem->size - offset;
-
-   /* FIXME convert DAX to comprehend that this mapping has a lifetime */
-   *kaddr = pmem->virt_addr + offset;
-   *pfn = (pmem->phys_addr + offset) >> PAGE_SHIFT;
-
+   size &= ~ND_PFN_MASK;
+   }
return size;
 }
 
@@ -158,9 +156,11 @@ static 
