On Thu, Jun 01, 2017 at 12:04:02PM +1000, Balbir Singh wrote:
> On Thu, May 25, 2017 at 3:53 AM, Jerome Glisse <jgli...@redhat.com> wrote:
> > On Wed, May 24, 2017 at 11:55:12AM +1000, Balbir Singh wrote:
> >> On Tue, May 23, 2017 at 2:51 AM, Jérôme Glisse <jgli...@redhat.com> wrote:
> >> > Patchset is on top of mmotm mmotm-2017-05-18, git branch:
> >> >
> >> > https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-v22
> >> >
> >> > Change since v21 is adding back special refcounting in put_page() to
> >> > catch when a ZONE_DEVICE page is freed (refcount going from 2 to 1,
> >> > unlike a regular page where a refcount of 0 means the page is free).
> >> > See patch 8 of this series for this refcounting. I did not use static
> >> > keys because it kind of scares me to do that for an inline function.
> >> > If people feel strongly about this i can try to make static keys work
> >> > here. Kirill will most likely want to review this.
> >> >
> >> >
> >> > Everything else is the same. Below is the long description of what HMM
> >> > is about and why. At the end of this email i describe briefly each patch
> >> > and suggest reviewers for each of them.
> >> >
> >> >
> >> > Heterogeneous Memory Management (HMM) (description and justification)
> >> >
> >>
> >> Thanks for the patches! These patches are very helpful. There are a
> >> few additional things we would need on top of this (once HMM the base
> >> is merged)
> >>
> >> 1. Support for other architectures, we'd like to make sure we can get
> >> this working for powerpc for example. As a first step we have
> >> ZONE_DEVICE enablement patches, but I think we need some additional
> >> patches for iomem space searching and memory hotplug, IIRC
> >> 2. HMM-CDM and physical address based migration bits. In a recent RFC
> >> we decided to try and use the HMM CDM route as a route to implementing
> >> coherent device memory as a starting point. It would be nice to have
> >> those patches on top of these once these make it to mm -
> >> https://lwn.net/Articles/720380/
> >>
> >
> > I intend to post the updated HMM CDM patchset early next week. I am
> > tied up in a couple of internal backports but i should be able to resume
> > work on that this week.
> >
> 
> Thanks, I am looking at the HMM CDM branch and trying to forward port
> and see what the results look like on top of HMM-v23. Do we have a timeline
> for the v23 merge?
> 

So i am moving to a new office and it has taken me more time than i thought
to pack stuff. Attached is the first step of CDM on top of the latest HMM. I
hope to have more time tomorrow or next week to finish rebasing patches and
to run some tests with stolen RAM as CDM memory.

Jérôme
From 0ca0ebe4aecedfe69ae029c529045d609352b921 Mon Sep 17 00:00:00 2001
From: Jérôme Glisse <jgli...@redhat.com>
Date: Thu, 1 Jun 2017 11:25:59 -0400
Subject: [PATCH] mm/device-public-memory: device memory cache coherent with
 CPU
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Platforms with an advanced system bus (like CAPI or CCIX) allow device
memory to be accessed from the CPU in a cache coherent fashion. Add
a new type of ZONE_DEVICE to represent such memory. The use cases
are the same as for un-addressable device memory but without all the
corner cases.

Signed-off-by: Jérôme Glisse <jgli...@redhat.com>
---
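A minimal userspace sketch of the page classification this patch relies on,
for readers who want to try the logic outside the kernel. It models only the
ZONE_DEVICE branch of migrate_vma_check_page(): private and public device
pages carry one extra reference, any other ZONE_DEVICE type is rejected. The
model_* names and struct layouts are hypothetical stand-ins, not the kernel
definitions, and MEMORY_DEVICE_PERSISTENT is assumed to be the pre-existing
first enumerator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Enumerator names follow the patch; the order is an assumption. */
enum memory_type {
	MEMORY_DEVICE_PERSISTENT = 0,
	MEMORY_DEVICE_PRIVATE,
	MEMORY_DEVICE_PUBLIC,
};

/* Hypothetical stand-in for struct dev_pagemap: only the type is modeled. */
struct model_pgmap {
	enum memory_type type;
};

/* Hypothetical stand-in for struct page. */
struct model_page {
	bool zone_device;          /* models is_zone_device_page() */
	struct model_pgmap *pgmap; /* must be set when zone_device is true */
};

/* Models is_device_private_page(). */
static bool model_is_device_private(const struct model_page *page)
{
	assert(!page->zone_device || page->pgmap != NULL);
	return page->zone_device &&
		page->pgmap->type == MEMORY_DEVICE_PRIVATE;
}

/* Models is_device_public_page(), the helper added by this patch. */
static bool model_is_device_public(const struct model_page *page)
{
	assert(!page->zone_device || page->pgmap != NULL);
	return page->zone_device &&
		page->pgmap->type == MEMORY_DEVICE_PUBLIC;
}

/*
 * Models the ZONE_DEVICE branch of migrate_vma_check_page().
 * Returns the number of extra references contributed by ZONE_DEVICE
 * (0 for a regular page, 1 for private or public device memory), or
 * -1 when the ZONE_DEVICE type is not supported for migration.
 */
static int model_extra_refs(const struct model_page *page)
{
	if (!page->zone_device)
		return 0;
	if (model_is_device_private(page) || model_is_device_public(page))
		return 1;
	return -1;
}
```

With a page whose pgmap type is MEMORY_DEVICE_PUBLIC, model_extra_refs()
returns 1, the same as for MEMORY_DEVICE_PRIVATE. For the persistent type it
returns -1, which corresponds to the "not supported" path in the real check.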
 include/linux/ioport.h   |  1 +
 include/linux/memremap.h | 21 +++++++++++++++++++++
 mm/Kconfig               | 13 +++++++++++++
 mm/memory.c              | 13 +++++++++++++
 mm/migrate.c             | 23 ++++++++++++++---------
 5 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 3a4f691..f5cf32e 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -131,6 +131,7 @@ enum {
        IORES_DESC_PERSISTENT_MEMORY            = 4,
        IORES_DESC_PERSISTENT_MEMORY_LEGACY     = 5,
        IORES_DESC_DEVICE_PRIVATE_MEMORY        = 6,
+       IORES_DESC_DEVICE_PUBLIC_MEMORY         = 7,
 };
 
 /* helpers to define resources */
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 0e0d2e6..b9f460a 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -56,10 +56,18 @@ static inline struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
  * page must be treated as an opaque object, rather than a "normal" struct page.
  * A more complete discussion of unaddressable memory may be found in
  * include/linux/hmm.h and Documentation/vm/hmm.txt.
+ *
+ * MEMORY_DEVICE_PUBLIC:
+ * Device memory that is cache coherent from device and CPU point of view. This
+ * is used on platforms that have an advanced system bus (like CAPI or CCIX). A
+ * driver can hotplug the device memory using ZONE_DEVICE and with that memory
+ * type. Any page of a process can be migrated to such memory. However no one
+ * should be allowed to pin such memory so that it can always be evicted.
  */
 enum memory_type {
        MEMORY_DEVICE_PERSISTENT = 0,
        MEMORY_DEVICE_PRIVATE,
+       MEMORY_DEVICE_PUBLIC,
 };
 
 /*
@@ -91,6 +99,8 @@ enum memory_type {
  * The page_free() callback is called once the page refcount reaches 1
  * (ZONE_DEVICE pages never reach 0 refcount unless there is a refcount bug.
  * This allows the device driver to implement its own memory management.)
+ *
+ * For MEMORY_DEVICE_PUBLIC only the page_free() callback matters.
  */
 typedef int (*dev_page_fault_t)(struct vm_area_struct *vma,
                                unsigned long addr,
@@ -133,6 +143,12 @@ static inline bool is_device_private_page(const struct page *page)
        return is_zone_device_page(page) &&
                page->pgmap->type == MEMORY_DEVICE_PRIVATE;
 }
+
+static inline bool is_device_public_page(const struct page *page)
+{
+       return is_zone_device_page(page) &&
+               page->pgmap->type == MEMORY_DEVICE_PUBLIC;
+}
 #else
 static inline void *devm_memremap_pages(struct device *dev,
                struct resource *res, struct percpu_ref *ref,
@@ -156,6 +172,11 @@ static inline bool is_device_private_page(const struct page *page)
 {
        return false;
 }
+
+static inline bool is_device_public_page(const struct page *page)
+{
+       return false;
+}
 #endif
 
 /**
diff --git a/mm/Kconfig b/mm/Kconfig
index 46296d5d7..bacb193 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -758,6 +758,19 @@ config DEVICE_PRIVATE
          memory; i.e., memory that is only accessible from the device (or
          group of devices).
 
+config DEVICE_PUBLIC
+       bool "Addressable device memory (like GPU memory)"
+       depends on X86_64
+       depends on ZONE_DEVICE
+       depends on MEMORY_HOTPLUG
+       depends on MEMORY_HOTREMOVE
+       depends on SPARSEMEM_VMEMMAP
+
+       help
+         Allows creation of struct pages to represent addressable device
+         memory; i.e., memory that is accessible from both the device and
+         the CPU.
+
 config FRAME_VECTOR
        bool
 
diff --git a/mm/memory.c b/mm/memory.c
index eba61dd..d192f3d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -983,6 +983,19 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
                get_page(page);
                page_dup_rmap(page, false);
                rss[mm_counter(page)]++;
+       } else if (pte_devmap(pte)) {
+               page = pte_page(pte);
+
+               /*
+                * Cache coherent device memory behaves like a regular page and
+                * not like a persistent memory page. For more information see
+                * MEMORY_DEVICE_PUBLIC in include/linux/memremap.h
+                */
+               if (is_device_public_page(page)) {
+                       get_page(page);
+                       page_dup_rmap(page, false);
+                       rss[mm_counter(page)]++;
+               }
        }
 
 out_set_pte:
diff --git a/mm/migrate.c b/mm/migrate.c
index d7c4db6..a0115b8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -229,12 +229,16 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma,
                if (is_write_migration_entry(entry))
                        pte = maybe_mkwrite(pte, vma);
 
-               if (unlikely(is_zone_device_page(new)) &&
-                   is_device_private_page(new)) {
-                       entry = make_device_private_entry(new, pte_write(pte));
-                       pte = swp_entry_to_pte(entry);
-                       if (pte_swp_soft_dirty(*pvmw.pte))
-                               pte = pte_mksoft_dirty(pte);
+               if (unlikely(is_zone_device_page(new))) {
+                       if (is_device_private_page(new)) {
+                               entry = make_device_private_entry(new, pte_write(pte));
+                               pte = swp_entry_to_pte(entry);
+                               if (pte_swp_soft_dirty(*pvmw.pte))
+                                       pte = pte_mksoft_dirty(pte);
+                       } else if (is_device_public_page(new)) {
+                               pte = pte_mkdevmap(pte);
+                               flush_dcache_page(new);
+                       }
                } else
                        flush_dcache_page(new);
 
@@ -2300,9 +2304,10 @@ static bool migrate_vma_check_page(struct page *page)
 
        /* Page from ZONE_DEVICE have one extra reference */
        if (is_zone_device_page(page)) {
-               if (is_device_private_page(page)) {
+               if (is_device_private_page(page) ||
+                   is_device_public_page(page))
                        extra++;
-               } else
+               else
                        /* Other ZONE_DEVICE memory type are not supported */
                        return false;
        }
@@ -2621,7 +2626,7 @@ static void migrate_vma_pages(struct migrate_vma *migrate)
                                        migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
                                        continue;
                                }
-                       } else {
+                       } else if (!is_device_public_page(newpage)) {
                                /*
                                 * Other types of ZONE_DEVICE page are not
                                 * supported.
-- 
2.4.11
