Re: [Intel-gfx] [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices

2023-11-30 Thread zhuweixi
Thanks! I am planning to present GMEM at the Linux MM Alignment Sessions so I can 
collect more input from the mm developers.

@Christian @Oak I will also send you invitations once a presentation is 
scheduled. :)

-Weixi

-Original Message-
From: David Hildenbrand  
Sent: Thursday, November 30, 2023 10:55 PM
To: zhuweixi ; Dave Airlie ; Christian 
König 
Cc: linux...@kvack.org; linux-ker...@vger.kernel.org; 
a...@linux-foundation.org; weixi@openeuler.sh; mgor...@suse.de; 
jgli...@redhat.com; rcampb...@nvidia.com; jhubb...@nvidia.com; 
apop...@nvidia.com; mhairgr...@nvidia.com; z...@nvidia.com; 
alexander.deuc...@amd.com; xinhui@amd.com; amd-...@lists.freedesktop.org; 
felix.kuehl...@amd.com; ogab...@kernel.org; dri-de...@lists.freedesktop.org; 
j...@nvidia.com; leo...@nvidia.com; zhen...@linux.intel.com; 
zhi.a.w...@intel.com; intel-gvt-...@lists.freedesktop.org; 
intel-gfx@lists.freedesktop.org; jani.nik...@linux.intel.com; 
joonas.lahti...@linux.intel.com; rodrigo.v...@intel.com; 
tvrtko.ursu...@linux.intel.com
Subject: Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) 
for external memory devices

On 29.11.23 09:27, zhuweixi wrote:
> Glad to hear that more sharable code is desirable.
> IMHO, for a common MM subsystem, it is more beneficial for GMEM to 
> extend core MM instead of building a separate one.

More core-mm complexity, awesome, we all love that! ;)

--
Cheers,

David / dhildenb



Re: [Intel-gfx] [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices

2023-11-30 Thread zhuweixi
From your argument on KVM I can see that the biggest miscommunication between 
us is that you believed GMEM wanted to share the whole address space. No, that 
is not the case. GMEM only provides coordination via certain mmap() calls. So 
the case you raise supports GMEM again -- passing through part of the CPU 
address space instead of the whole CPU address space is exactly what GMEM can 
do. On the other side, the IOMMU SVA feature binds the whole address space, 
since the hardware feature is to directly share the whole CPU page table.

"We really should never ever encourage people to bind their device address 
space to the CPU address space. This is a very special use case and limits the 
driver design to only this use case.
We have exercised this approach to a rather extreme degree with KFD and I can 
clearly say that doing this was a really big mistake.
As far as I can see you are about to repeat that mistake and even encourage 
others to do so as well."

-- The behavior of internally "attaching a device context to mm_struct" in GMEM 
is ultimately a different approach to coordinating the CPU and devices. I want 
to replace MMU notifiers with this approach because I want to protect core MM 
from random interactions with external driver MMs. Both GMEM and MMU notifiers 
bind device contexts to the CPU context; neither puts them in the same address 
space. If someone is against GMEM's approach of binding CPU and device 
contexts, then they should be against MMU notifiers as well.

Currently, from our discussion I think I have received two messages:
1. The original AMDKFD design was rejected because it inserted 
vendor-specific stuff into the generic core MM.
2. The rejection in #1 led to your opinion that device MM and core MM 
must not be mixed together.

I think #1 actually encourages me that GMEM could help the AMDKFD driver. 
However, I am also confused about why GMEM must be compared with a 
vendor-specific driver. AMDKFD only considered a very special use case: AMD 
GPUs using the AMD IOMMU. 
GMEM, by contrast, tries to consider all generalized cases of memory devices. 
The device can be an Nvidia GPU or a Huawei NPU that uses its own MMU, an 
AMD/Intel GPU that uses an IOMMU, or one of hundreds of new accelerator vendors.

-Weixi

-Original Message-
From: Christian König  
Sent: Thursday, November 30, 2023 9:05 PM
To: zhuweixi ; Dave Airlie 
Cc: linux...@kvack.org; linux-ker...@vger.kernel.org; 
a...@linux-foundation.org; weixi@openeuler.sh; mgor...@suse.de; 
jgli...@redhat.com; rcampb...@nvidia.com; jhubb...@nvidia.com; 
apop...@nvidia.com; mhairgr...@nvidia.com; z...@nvidia.com; 
alexander.deuc...@amd.com; xinhui@amd.com; amd-...@lists.freedesktop.org; 
felix.kuehl...@amd.com; ogab...@kernel.org; dri-de...@lists.freedesktop.org; 
j...@nvidia.com; leo...@nvidia.com; zhen...@linux.intel.com; 
zhi.a.w...@intel.com; intel-gvt-...@lists.freedesktop.org; 
intel-gfx@lists.freedesktop.org; jani.nik...@linux.intel.com; 
joonas.lahti...@linux.intel.com; rodrigo.v...@intel.com; 
tvrtko.ursu...@linux.intel.com; Danilo Krummrich ; Daniel 
Vetter ; Zeng, Oak 
Subject: Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) 
for external memory devices

Am 30.11.23 um 08:22 schrieb zhuweixi:
> Add @Oak to the KFD discussion. I will reply separately elaborating your 
> questions on GMEM's difference from HMM/MMU notifiers.
>
> Christian, thanks for pointing me to that AMDKFD discussion. I have read the 
> discussion around the AMDKFD skeleton patch and found the previous discussion 
> in the following URLs:
> https://lore.kernel.org/dri-devel/1405028848-5660-1-git-send-email-ode
> d.gab...@amd.com/#r 
> https://lore.kernel.org/dri-devel/20140711154231.gb1...@gmail.com/
>
> I believe AMDKFD's original patch was rejected mostly because it inserted 
> vendor-specific stuff into the generic core MM. Jérôme has clearly stated this 
> issue in the second URL. If the code is vendor-specific then it has no place 
> in core MM, period.
>
> But why should that vendor-specific solution be compared with a generalized 
> solution like GMEM? The initial AMDKFD patch doesn't work for Nvidia or Intel.

KFD was meant to be a vendor agnostic framework, very similar to what you 
propose here.

It's just that it was seen as vendor specific because nobody else actually 
wanted to design their drivers this way.

>
> In fact I think the rejection of the initial AMDKFD patch supports GMEM's 
> idea -- there could have been a simpler AMDKFD implementation if the core MM 
> had been extended by GMEM. Also, in the 9 years since then, many more 
> companies have been building their own accelerators, especially now that the 
> GPT family has achieved much bigger success. Don't we want to advance Linux's 
> core MM toward friendlier and more generalized support for the upcoming new 
> vendors?

Re: [Intel-gfx] [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices

2023-11-30 Thread zhuweixi
Glad to know that there is a common demand for a new syscall like hmadvise(). I 
expect it would also be useful for homogeneous NUMA cases. Credit to the 
cudaMemAdvise() API, which brought this idea to GMEM's design.
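
For illustration only, here is a minimal user-space sketch of what such an 
hmadvise() call might look like, assuming a madvise()-like shape plus an 
explicit heterogeneous-node id. The prototype, advice names and stub below are 
hypothetical -- nothing here is the RFC's actual ABI.

#include <stddef.h>

/* Hypothetical advice values, loosely mirroring cudaMemAdvise() policies. */
#define HMADV_PREFETCH_DEVICE   1   /* migrate the range to device memory  */
#define HMADV_PREFETCH_HOST     2   /* migrate the range back to host DRAM */
#define HMADV_PIN               3   /* discourage migrating the range      */

/*
 * Stand-in for the proposed syscall: advise placement of [addr, addr + len)
 * with respect to heterogeneous node @hnid.  A real kernel would expose this
 * as a syscall; this stub only documents the intended shape.
 */
static long hmadvise(int hnid, void *addr, size_t len, int advice)
{
        (void)hnid; (void)addr; (void)len; (void)advice;
        return 0;
}

static void prefetch_to_accelerator(void *buf, size_t len, int npu_node)
{
        /* Hint that the accelerator will touch this range soon. */
        hmadvise(npu_node, buf, len, HMADV_PREFETCH_DEVICE);
}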

To answer @Oak's questions about GMEM vs. HMM,

Here is the major difference:
  GMEM's main target is to stop drivers from reinventing MM code, while HMM/MMU 
notifiers provide a struct-page-compatible solution and a coordination 
mechanism for existing device-driver MMs, which requires adding extra code to 
interact with the CPU MM.

A straightforward qualitative result for the main target: after integrating 
Huawei's Ascend NPU driver with GMEM's interface, 30,000 lines of MM code were 
cut, leaving fewer than 100 lines invoking the GMEM interface and 3,700 lines 
implementing vendor-specific functions. Some code from those 3,700 lines should 
be further moved into GMEM as a generalized feature, such as device memory 
oversubscription, but that is not included in this RFC patch yet. 

A list of high-level differences: 
  1. With HMM/MMU notifiers, drivers need to first implement a full MM 
subsystem. With GMEM, drivers can reuse Linux's core MM.

  2. HMM encodes device mapping information in the CPU arch-dependent PTEs, 
while GMEM proposes an abstraction layer in vm_object. Since GMEM's approach 
further decouples the arch-related details, drivers do not need to implement 
separate code for x86, ARM, etc.

  3. MMU notifiers register hooks at certain core MM events, while GMEM 
declares basic functions and internally invokes them (see the sketch after this 
list). GMEM requires less from the driver side -- there is no need to 
understand how core MM behaves at certain MMU events. GMEM also expects fewer 
bugs than MMU notifiers: implementing basic operations against standard 
declarations versus implementing whatever random device MM logic in MMU 
notifiers.

  4. GMEM plans to support more lightweight physical memory management. The 
discussion about this part can be found in my cover letter. The question is 
whether struct page should stay compatible (directly reusing HMM's ZONE_DEVICE 
solution), or whether a trimmed, smaller struct page that satisfies the 
generalized demands of accelerators is preferable.

  5. GMEM has been demonstrated to allow device memory oversubscription (a 
GMEM-based 32GB NPU card can run a GPT model oversubscribing 500GB of host 
DDR), while drivers using HMM/MMU notifiers must implement this logic one by 
one. I will submit this part in a future RFC patch.
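
To make the contrast in point 3 above concrete, here is a rough sketch of the 
kind of small operation table a driver would hand to GMEM, with core MM 
deciding when each callback fires. All names below are invented for 
illustration; they are not the RFC's actual struct gm_mmu declarations.

/*
 * Illustrative only -- not the RFC's real structures.  The point is the
 * shape: the driver provides a handful of hardware-dependent primitives and
 * never has to reason about core MM events itself.
 */
struct demo_dev_mmu_ops {
        /* Create/destroy a device page table for one virtual address space. */
        int  (*pgtable_create)(void *dev_priv);
        void (*pgtable_destroy)(void *dev_priv);

        /* Install or remove one device mapping; core MM decides when. */
        int  (*map)(void *dev_priv, unsigned long va, unsigned long pfn,
                    unsigned long size, int prot);
        int  (*unmap)(void *dev_priv, unsigned long va, unsigned long size);

        /* Flush the device TLB after mappings in [va, va + size) change. */
        void (*tlb_invalidate)(void *dev_priv, unsigned long va,
                               unsigned long size);
};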

I want to reiterate that GMEM's shared address space support is a bonus result, 
not the main contribution... It was done because it was not difficult to 
implement an internal CPU-device coordination mechanism once core MM is 
extended by GMEM to support devices.

-Weixi

-Original Message-
From: Christian König  
Sent: Thursday, November 30, 2023 4:28 PM
To: Zeng, Oak ; Christian König ; 
zhuweixi ; linux...@kvack.org; 
linux-ker...@vger.kernel.org; a...@linux-foundation.org; Danilo Krummrich 
; Dave Airlie ; Daniel Vetter 

Cc: intel-gvt-...@lists.freedesktop.org; rcampb...@nvidia.com; 
mhairgr...@nvidia.com; j...@nvidia.com; weixi@openeuler.sh; 
jhubb...@nvidia.com; intel-gfx@lists.freedesktop.org; apop...@nvidia.com; 
xinhui@amd.com; amd-...@lists.freedesktop.org; 
tvrtko.ursu...@linux.intel.com; ogab...@kernel.org; jgli...@redhat.com; 
dri-de...@lists.freedesktop.org; z...@nvidia.com; Vivi, Rodrigo 
; alexander.deuc...@amd.com; leo...@nvidia.com; 
felix.kuehl...@amd.com; Wang, Zhi A ; mgor...@suse.de
Subject: Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) 
for external memory devices

Hi Oak,

yeah, #4 is indeed a really good point and I think Felix will agree to that as 
well.

HMM is basically still missing a way to advise device attributes for the CPU 
address space. Both the migration strategy and device-specific information 
(like cache preferences) fall into this category.

Since there is a device-specific component in those attributes as well, I think 
device-specific IOCTLs still make sense for updating them, but HMM should offer 
the functionality to manage and store that information.

Split and merge of VMAs only becomes a problem if you attach that information 
to VMAs; if you keep it completely separate, it doesn't become an issue either. 
The downside of this approach is that you don't get automatically extending 
attribute ranges for growing VMAs, for example.
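
Purely as an illustration of the "keep them completely separate" option (the 
names below are made up, and a real implementation would more likely use a 
maple tree or interval tree per mm): the advice ranges live in their own 
structure keyed only by address, so VMA split/merge never has to touch them.

#include <stddef.h>

struct demo_range_attr {
        unsigned long start;            /* inclusive */
        unsigned long end;              /* exclusive */
        int preferred_node;             /* e.g. device node to migrate toward */
        unsigned int flags;             /* e.g. cache preference bits */
        struct demo_range_attr *next;
};

/* Look up the advice covering @addr, independent of the current VMA layout. */
static struct demo_range_attr *
demo_range_attr_lookup(struct demo_range_attr *head, unsigned long addr)
{
        for (; head; head = head->next)
                if (addr >= head->start && addr < head->end)
                        return head;
        return NULL;    /* no advice recorded: fall back to defaults */
}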

Regards,
Christian.

Am 29.11.23 um 23:23 schrieb Zeng, Oak:
> Hi Weixi,
>
> Even though Christian has listed reasons for rejecting this proposal (yes, they 
> are very reasonable to me), I would like to keep an open mind and further 
> explore the possibility here. Since the current GPU drivers use an HMM-based 
> implementation (AMD and NV have done this; at Intel we are catching up), I 
> want to explore how much we can benefit from the proposed approach and how 
> your approach can solve some pain points of our development. So basically 
> what I am questioning here is:

Re: [Intel-gfx] [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices

2023-11-29 Thread zhuweixi
Add @Oak to the KFD discussion. I will reply separately elaborating your 
questions on GMEM's difference from HMM/MMU notifiers.

Christian, thanks for pointing me to that AMDKFD discussion. I have read the 
discussion around the AMDKFD skeleton patch and found the previous discussion 
in the following URLs:
https://lore.kernel.org/dri-devel/1405028848-5660-1-git-send-email-oded.gab...@amd.com/#r
https://lore.kernel.org/dri-devel/20140711154231.gb1...@gmail.com/

I believe AMDKFD's original patch was rejected mostly because it inserted 
vendor-specific stuff into the generic core MM. Jérôme has clearly stated this 
issue in the second URL. If the code is vendor-specific then it has no place in 
core MM, period. 

But why should that vendor-specific solution be compared with a generalized 
solution like GMEM? The initial AMDKFD patch doesn't work for Nvidia or Intel.

In fact I think the rejection of the initial AMDKFD patch supports GMEM's idea 
-- there could have been a simpler AMDKFD implementation if the core MM had 
been extended by GMEM. Also, in the 9 years since then, many more companies 
have been building their own accelerators, especially now that the GPT family 
has achieved much bigger success. Don't we want to advance Linux's core MM 
toward friendlier and more generalized support for the upcoming new vendors? 

Now answering Christian's design concerns:

1. "There are cases that do not want to share CPU address space"
Maybe, but I am not fully convinced. The current case we can find is when a NIC 
utilizes IOMMU for security. For this case, GMEM implemented a generalized VMA 
support and tested it with NICs using both Intel-IOMMU/Arm-SMMU. This cut 600 
LoC of IOVA management code from the IOMMU driver, but it is still not included 
in this RFC patch -- I cannot find other cases demanding this isolation. The 
isolation is also unnecessary -- the NIC can enable the IOMMU SVM feature to 
share the CPU address space. As of KVM, it is essentially a host process that 
utilizes two different MMUs within the same address space, so it fits GMEM's 
design... 

2. "This does not integrate well with the filesystem layer in Linux..."
To be honest, not using a logical page table for anonymous memory is why Linux 
THP fails compared with FreeBSD's superpage, but I am not going to elaborate it 
here. But yes, and I am looking for merging struct 
vm_object->logical_page_table with struct address_space->i_pages. This will 
make a natural support for devices oversubscribing both host DRAM and disks. As 
explained in my cover letter, struct vm_object borrows FreeBSD's VM design -- 
it provides a unified abstraction layer for anonymous, file-backed memory and 
etc. 
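
As a sketch of the shape this unified abstraction could take (the type and 
helper names are illustrative, not the RFC's code): a vm_object-style logical 
page table is simply an xarray indexed by virtual page number, the same data 
structure that already backs address_space->i_pages.

#include <linux/xarray.h>
#include <linux/mm.h>

/* Illustrative shape of a vm_object's logical page table. */
struct demo_vm_object {
        struct xarray logical_page_table;       /* index: va >> PAGE_SHIFT */
};

/*
 * Return the tracked mapping entry for @va, or NULL if none is recorded.
 * In the RFC the stored entry would be a struct gm_mapping describing
 * whether the page currently lives on the CPU or a device.
 */
static void *demo_vm_object_lookup(struct demo_vm_object *obj, unsigned long va)
{
        return xa_load(&obj->logical_page_table, va >> PAGE_SHIFT);
}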

3. "Requirements to CPU address space management and device address space 
management are just massively different. For example huge and giant pages are a 
must have for modern devices..."
I think you are asking two questions. First, is VA space a problem? GMEM 
assumes that device VA space should be covered by CPU VA space (sorry i386), 
should we consider devices using more VA bits than the CPU (64-bit)? Second, 
yes, modern accelerators definitely demand large pages. From my experience, 
both Nvidia GPUs and Huawei Ascend NPUs suffer from performance issues using 
page sizes smaller than 2MB. However, GMEM does not stop a device to use a 
different page size. A device can choose a 64KB page size running on an X86 
host, and GMEM will still work -- whether the CPU page fault goes to 2MB-THP or 
4KB paths, GMEM looks up stuct vm_object to examine whether a 
virtual-to-physical mapping exist on the device page table. If the faulted VA 
is covered by a 64KB device mapping, a 4KB sub-page must at least be migrated 
and the 64KB device mapping must be invoked. The device can either keep the 
rest 15 4KB physical pages and create 15 "contiguous" (with a hole) 4KB 
mappings or simply wait for the next device page fault to migrate one 4KB page 
and install a 64KB mapping. The policy is left for device to choose, but the 
mechanisms are provided by GMEM. So, the current assumption of GMEM is just 
that your device page sizes must be multiples of CPU base page size.
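
Here is a small worked example of that arithmetic, assuming a 64KB device page 
over a 4KB host base page; the constants and helpers are illustrative only, not 
GMEM API.

#define HOST_PAGE_SIZE          (4UL << 10)                      /* 4KB  */
#define DEV_PAGE_SIZE           (64UL << 10)                     /* 64KB */
#define SUBPAGES_PER_DEV_PAGE   (DEV_PAGE_SIZE / HOST_PAGE_SIZE) /* 16   */

/* Start of the 64KB device mapping that covers the faulting address. */
static unsigned long dev_page_base(unsigned long va)
{
        return va & ~(DEV_PAGE_SIZE - 1);
}

/*
 * Which of the 16 4KB sub-pages faulted (0..15): only this sub-page must be
 * migrated to the host, after which the whole 64KB device mapping is
 * invalidated and the device decides how to re-map the other 15 sub-pages.
 */
static unsigned long faulting_subpage(unsigned long va)
{
        return (va & (DEV_PAGE_SIZE - 1)) / HOST_PAGE_SIZE;
}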

4. "The argument that a shared memory management leads to less bugs has also 
absolutely not be proven true. Instead we literally spend month if not years 
hunting down bugs which resulted from interaction between CPU and devices."
This is another case supporting GMEM. Don't developers want to let GMEM handle 
the CPU-device interaction so that they can save months of debugging cost?

PS: hmadvise() is based on the idea of Nvidia's cudaMemAdvise(), which provides 
abundant and useful memory policies. HMM extended mbind() instead.

-Weixi

-Original Message-
From: Christian König  
Sent: Wednesday, November 29, 2023 11:22 PM
To: zhuweixi ; Dave Airlie 
Cc: linux...@kvack.org; linux-ker...@vger.kernel.org; 
a...@linux-foundation.org; weix

Re: [Intel-gfx] [RFC PATCH 2/6] mm/gmem: add arch-independent abstraction to track address mapping status

2023-11-29 Thread zhuweixi
Oops, that should be changed to the following:

/* SPDX-License-Identifier: GPL-2.0-only */
/*
 * Generalized Memory Management.
 *
 * Copyright (C) 2023- Huawei, Inc.
 * Author: Weixi Zhu
 *
 */

Thanks for pointing it out.
-Weixi

-Original Message-
From: emily  
Sent: Wednesday, November 29, 2023 4:33 PM
To: zhuweixi ; linux...@kvack.org; 
linux-ker...@vger.kernel.org; a...@linux-foundation.org
Cc: leo...@nvidia.com; apop...@nvidia.com; amd-...@lists.freedesktop.org; 
mgor...@suse.de; z...@nvidia.com; zhi.a.w...@intel.com; rcampb...@nvidia.com; 
j...@nvidia.com; weixi@openeuler.sh; jhubb...@nvidia.com; 
intel-gfx@lists.freedesktop.org; mhairgr...@nvidia.com; jgli...@redhat.com; 
rodrigo.v...@intel.com; intel-gvt-...@lists.freedesktop.org; 
tvrtko.ursu...@linux.intel.com; felix.kuehl...@amd.com; xinhui@amd.com; 
christian.koe...@amd.com; alexander.deuc...@amd.com; ogab...@kernel.org
Subject: Re: [RFC PATCH 2/6] mm/gmem: add arch-independent abstraction to track 
address mapping status


On 11/28/23 07:50, Weixi Zhu wrote:
> This patch adds an abstraction layer, struct vm_object, that maintains 
> per-process virtual-to-physical mapping status stored in struct gm_mapping.
> For example, a virtual page may be mapped to a CPU physical page or to 
> a device physical page. Struct vm_object effectively maintains an 
> arch-independent page table, which is defined as a "logical page table", 
> while the arch-dependent page table used by a real MMU is called a 
> "physical page table". The logical page table is useful if Linux core 
> MM is extended to handle a unified virtual address space with external 
> accelerators using customized MMUs.
>
> In this patch, struct vm_object utilizes a radix tree (xarray) to 
> track where a virtual page is mapped to. This adds extra memory 
> consumption for the xarray, but provides a nice abstraction to isolate 
> mapping status from the machine-dependent layer (PTEs). Besides 
> supporting accelerators with external MMUs, struct vm_object is 
> planned to be further unified with i_pages in struct address_space for 
> file-backed memory.
>
> The idea of struct vm_object originates from the FreeBSD VM design, 
> which provides a unified abstraction for anonymous memory, file-backed 
> memory, the page cache, etc.[1]
>
> Currently, Linux utilizes a set of hierarchical page-walk functions to 
> abstract page table manipulation across CPU architectures. The 
> problem happens when a device wants to reuse Linux MM code to manage 
> its page table -- the device page table may not be accessible to the CPU.
> Existing solutions like Linux HMM utilize the MMU notifier mechanism 
> to invoke device-specific MMU functions, but rely on encoding the 
> mapping status in the CPU page table entries. This entangles 
> machine-independent code with machine-dependent code, and also brings 
> unnecessary restrictions.
> The PTE size and format vary arch by arch, which harms extensibility.
>
> [1] https://docs.freebsd.org/en/articles/vm-design/
>
> Signed-off-by: Weixi Zhu 
> ---
>   include/linux/gmem.h | 120 +
>   include/linux/mm_types.h |   4 +
>   mm/Makefile  |   2 +-
>   mm/vm_object.c   | 184 +++
>   4 files changed, 309 insertions(+), 1 deletion(-)
>   create mode 100644 mm/vm_object.c
>
> diff --git a/include/linux/gmem.h b/include/linux/gmem.h index 
> fff877873557..529ff6755a99 100644
> --- a/include/linux/gmem.h
> +++ b/include/linux/gmem.h
> @@ -9,11 +9,131 @@
>   #ifndef _GMEM_H
>   #define _GMEM_H
>   
> +#include 
> +
>   #ifdef CONFIG_GMEM
> +
> +#define GM_PAGE_CPU  0x10 /* Determines whether page is a pointer or a pfn 
> number. */
> +#define GM_PAGE_DEVICE   0x20
> +#define GM_PAGE_NOMAP0x40
> +#define GM_PAGE_WILLNEED 0x80
> +
> +#define GM_PAGE_TYPE_MASK(GM_PAGE_CPU | GM_PAGE_DEVICE | GM_PAGE_NOMAP)
> +
> +struct gm_mapping {
> + unsigned int flag;
> +
> + union {
> + struct page *page;  /* CPU node */
> + struct gm_dev *dev; /* hetero-node. TODO: support multiple 
> devices */
> + unsigned long pfn;
> + };
> +
> + struct mutex lock;
> +};
> +
> +static inline void gm_mapping_flags_set(struct gm_mapping *gm_mapping,
> +					 int flags)
> +{
> +	if (flags & GM_PAGE_TYPE_MASK)
> +		gm_mapping->flag &= ~GM_PAGE_TYPE_MASK;
> +
> +	gm_mapping->flag |= flags;
> +}
> +
> +static inline void gm_mapping_flags_clear(struct gm_mapping *gm_mapping,
> +					  int flags)
> +{
> +	gm_mapping->flag &= ~flags;
> +}
> +
> +static inline bool gm_mapping_cpu(struct gm_mapping *gm_mapping) 

Re: [Intel-gfx] [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices

2023-11-29 Thread zhuweixi
Glad to hear that more sharable code is desirable. 
IMHO, for a common MM subsystem, it is more beneficial for 
GMEM to extend core MM instead of building a separate one.

As stated in the beginning of my RFC letter, MM systems are 
large and similar. Even a sophisticated one like Linux MM, 
which has evolved over decades, still suffers from an increasing 
number of bugs[1]. So, directly extending core MM to support 
devices not only avoids opening a new box of bugs, but also 
allows the community to concentrate on maintaining one single 
MM system. On the other side, GMEM does no harm to core MM 
if a CPU process is not attached to any device context.

@Christian, could you provide more information on what AMD
proposed with KFD and why it was rejected?

[1] Huang, Jian, Moinuddin K. Qureshi, and Karsten Schwan. "An evolutionary 
study of linux memory management for fun and profit." 2016 USENIX Annual 
Technical Conference (USENIX ATC 16). 2016.

Thanks,
Weixi

-Original Message-
From: Dave Airlie  
Sent: Wednesday, November 29, 2023 1:15 PM
To: Christian König 
Cc: zhuweixi ; linux...@kvack.org; 
linux-ker...@vger.kernel.org; a...@linux-foundation.org; 
weixi@openeuler.sh; mgor...@suse.de; jgli...@redhat.com; 
rcampb...@nvidia.com; jhubb...@nvidia.com; apop...@nvidia.com; 
mhairgr...@nvidia.com; z...@nvidia.com; alexander.deuc...@amd.com; 
xinhui@amd.com; amd-...@lists.freedesktop.org; felix.kuehl...@amd.com; 
ogab...@kernel.org; dri-de...@lists.freedesktop.org; j...@nvidia.com; 
leo...@nvidia.com; zhen...@linux.intel.com; zhi.a.w...@intel.com; 
intel-gvt-...@lists.freedesktop.org; intel-gfx@lists.freedesktop.org; 
jani.nik...@linux.intel.com; joonas.lahti...@linux.intel.com; 
rodrigo.v...@intel.com; tvrtko.ursu...@linux.intel.com
Subject: Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) 
for external memory devices

On Tue, 28 Nov 2023 at 23:07, Christian König  wrote:
>
> Am 28.11.23 um 13:50 schrieb Weixi Zhu:
> > The problem:
> >
> > Accelerator driver developers are forced to reinvent external MM subsystems
> > case by case, because Linux core MM only considers host memory resources.
> > These reinvented MM subsystems have similar orders of magnitude of LoC as
> > Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has
> > 30K. Meanwhile, more and more vendors are implementing their own
> > accelerators, e.g. Microsoft's Maia 100. At the same time,
> > application-level developers suffer from poor programmability -- they must
> > consider parallel address spaces and be careful about the limited device
> > DRAM capacity. This can be alleviated if a malloc()-ed virtual address can
> > be shared by the accelerator, or if the abundant host DRAM can further
> > transparently back up the device's local memory.
> >
> > These external MM systems share similar mechanisms except for the
> > hardware-dependent part, so reinventing them is effectively introducing
> > redundant code (14K~70K for each case). Such development and maintenance is not
> > cheap. Furthermore, to share a malloc()-ed virtual address, device drivers
> > need to deeply interact with Linux MM via low-level MM APIs, e.g. MMU
> > notifiers/HMM. This raises the bar for driver development, since developers
> > must understand how Linux MM works. Further, it creates code maintenance
> > problems -- any changes to Linux MM potentially require coordinated changes
> > to accelerator drivers using low-level MM APIs.
> >
> > Putting a cache-coherent bus between host and device will not make these
> > external MM subsystems disappear. For example, a throughput-oriented
> > accelerator will not tolerate executing heavy memory access workload with
> > a host MMU/IOMMU via a remote bus. Therefore, devices will still have
> > their own MMU and pick a simpler page table format for lower address
> > translation overhead, requiring external MM subsystems.
> >
> > 
> >
> > What GMEM (Generalized Memory Management [1]) does:
> >
> > GMEM extends Linux MM to share its machine-independent MM code. Only a
> > high-level interface is provided to device drivers. This prevents
> > accelerator drivers from reinventing the wheel, but relies on drivers to
> > implement their hardware-dependent functions declared by GMEM. GMEM's key
> > interfaces include gm_dev_create(), gm_as_create(), gm_as_attach() and
> > gm_dev_register_physmem(). Here is a brief description of how a device driver
> > utilizes them:
> > 1. At boot time, call gm_dev_create() and register the implementation of
> > hardware-dependent functions as declared in struct gm_mmu.
> >   - If the device has local DRAM, call gm_dev_register_physmem() to
> > re