Re: [PATCH 4/6] HMM: add HMM page table.
On Mon, Jan 05, 2015 at 05:44:47PM -0500, j.gli...@gmail.com wrote:
> From: Jérôme Glisse
>
> Heterogeneous memory management's main purpose is to mirror a process
> address space. To do so it must maintain a secondary page table that is
> used by the device driver to program the device or to build a device
> specific page table.
>
> A radix tree cannot be used to create this secondary page table because
> HMM needs more flags than RADIX_TREE_MAX_TAGS (while this can be
> increased, we believe HMM will require so many flags that the cost would
> become prohibitive to other users of the radix tree).
>
> Moreover the radix tree is built around long, but for HMM we need to
> store dma addresses and on some platforms sizeof(dma_addr_t) >
> sizeof(long). Thus the radix tree is unsuitable to fulfill HMM's
> requirements, hence why we introduce this code, which allows creating a
> page table that can grow and shrink dynamically.
>
> The design is very close to the CPU page table as it reuses some of its
> features, such as the spinlock embedded in struct page.

Hi Linus,

I was hoping that after LCA, or maybe on a plane back from it, you could
take a look at this version of the patchset and share your view on it,
especially the page table one as it seemed to be the contentious point of
the previous version.

I would really like to know where we stand on this. Hardware using this
feature is coming fast and I would rather have Linux kernel support early.

Hope you will be mildly happier with this version.
Cheers,
Jérôme

> Signed-off-by: Jérôme Glisse
> Signed-off-by: Sherry Cheung
> Signed-off-by: Subhash Gutti
> Signed-off-by: Mark Hairgrove
> Signed-off-by: John Hubbard
> Signed-off-by: Jatin Kumar
> ---
>  MAINTAINERS            |   2 +
>  include/linux/hmm_pt.h | 261 ++
>  mm/Makefile            |   2 +-
>  mm/hmm_pt.c            | 425 +
>  4 files changed, 689 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/hmm_pt.h
>  create mode 100644 mm/hmm_pt.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3ec87c4..4090e86 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -4539,6 +4539,8 @@ L:	linux...@kvack.org
>  S:	Maintained
>  F:	mm/hmm.c
>  F:	include/linux/hmm.h
> +F:	mm/hmm_pt.c
> +F:	include/linux/hmm_pt.h
>
>  HOST AP DRIVER
>  M:	Jouni Malinen
>
> diff --git a/include/linux/hmm_pt.h b/include/linux/hmm_pt.h
> new file mode 100644
> index 000..88fc519
> --- /dev/null
> +++ b/include/linux/hmm_pt.h
> @@ -0,0 +1,261 @@
> +/*
> + * Copyright 2014 Red Hat Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * Authors: Jérôme Glisse
> + */
> +/*
> + * This provides a set of helpers for the HMM page table. See
> + * include/linux/hmm.h for a description of what HMM is.
> + *
> + * The HMM page table relies on a locking mechanism similar to the CPU
> + * page table for page table updates. It uses the spinlock embedded
> + * inside struct page to protect changes to page table directories,
> + * which should minimize lock contention for concurrent updates.
> + *
> + * It also provides a directory tree protection mechanism.
> + * Unlike the CPU page table, there is no mmap semaphore protecting the
> + * directory tree from removal, and this is intentional so that
> + * concurrent removal/insertion of directories inside the tree can
> + * happen.
> + *
> + * So anyone walking down the page table must protect the directories
> + * it traverses so they are not freed by some other thread. This is
> + * done by using a reference counter on each directory. Before
> + * traversing a directory a reference is taken, and once traversal is
> + * done the reference is dropped.
> + *
> + * A directory entry dereference and refcount increment of the
> + * sub-directory page must happen in a critical RCU section so that
> + * directory page removal can gracefully wait for all possible other
> + * threads that might have dereferenced the directory.
> + */
> +#ifndef _HMM_PT_H
> +#define _HMM_PT_H
> +
> +/*
> + * The HMM page table entry does not reflect any specific hardware. It
> + * is just a common entry format used by HMM internally and exposed to
> + * HMM users so they can extract information out of the HMM page table.
> + */
> +#define HMM_PTE_VALID		(1 << 0)
> +#define HMM_PTE_WRITE		(1 << 1)
> +#define HMM_PTE_DIRTY		(1 << 2)
> +#define HMM_PFN_SHIFT		4
> +#define HMM_PFN_MASK		(~((dma_addr_t)((1 << HMM_PFN_SHIFT) - 1)))
> +
[PATCH 4/6] HMM: add HMM page table.
From: Jérôme Glisse

Heterogeneous memory management's main purpose is to mirror a process
address space. To do so it must maintain a secondary page table that is
used by the device driver to program the device or to build a device
specific page table.

A radix tree cannot be used to create this secondary page table because
HMM needs more flags than RADIX_TREE_MAX_TAGS (while this can be
increased, we believe HMM will require so many flags that the cost would
become prohibitive to other users of the radix tree).

Moreover the radix tree is built around long, but for HMM we need to
store dma addresses and on some platforms sizeof(dma_addr_t) >
sizeof(long). Thus the radix tree is unsuitable to fulfill HMM's
requirements, hence why we introduce this code, which allows creating a
page table that can grow and shrink dynamically.

The design is very close to the CPU page table as it reuses some of its
features, such as the spinlock embedded in struct page.

Signed-off-by: Jérôme Glisse
Signed-off-by: Sherry Cheung
Signed-off-by: Subhash Gutti
Signed-off-by: Mark Hairgrove
Signed-off-by: John Hubbard
Signed-off-by: Jatin Kumar
---
 MAINTAINERS            |   2 +
 include/linux/hmm_pt.h | 261 ++
 mm/Makefile            |   2 +-
 mm/hmm_pt.c            | 425 +
 4 files changed, 689 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/hmm_pt.h
 create mode 100644 mm/hmm_pt.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 3ec87c4..4090e86 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4539,6 +4539,8 @@ L:	linux...@kvack.org
 S:	Maintained
 F:	mm/hmm.c
 F:	include/linux/hmm.h
+F:	mm/hmm_pt.c
+F:	include/linux/hmm_pt.h

 HOST AP DRIVER
 M:	Jouni Malinen

diff --git a/include/linux/hmm_pt.h b/include/linux/hmm_pt.h
new file mode 100644
index 000..88fc519
--- /dev/null
+++ b/include/linux/hmm_pt.h
@@ -0,0 +1,261 @@
+/*
+ * Copyright 2014 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Authors: Jérôme Glisse
+ */
+/*
+ * This provides a set of helpers for the HMM page table. See
+ * include/linux/hmm.h for a description of what HMM is.
+ *
+ * The HMM page table relies on a locking mechanism similar to the CPU
+ * page table for page table updates. It uses the spinlock embedded
+ * inside struct page to protect changes to page table directories,
+ * which should minimize lock contention for concurrent updates.
+ *
+ * It also provides a directory tree protection mechanism. Unlike the
+ * CPU page table, there is no mmap semaphore protecting the directory
+ * tree from removal, and this is intentional so that concurrent
+ * removal/insertion of directories inside the tree can happen.
+ *
+ * So anyone walking down the page table must protect the directories it
+ * traverses so they are not freed by some other thread. This is done by
+ * using a reference counter on each directory. Before traversing a
+ * directory a reference is taken, and once traversal is done the
+ * reference is dropped.
+ *
+ * A directory entry dereference and refcount increment of the
+ * sub-directory page must happen in a critical RCU section so that
+ * directory page removal can gracefully wait for all possible other
+ * threads that might have dereferenced the directory.
+ */
+#ifndef _HMM_PT_H
+#define _HMM_PT_H
+
+/*
+ * The HMM page table entry does not reflect any specific hardware.
+ * It is just a common entry format used by HMM internally and exposed
+ * to HMM users so they can extract information out of the HMM page
+ * table.
+ */
+#define HMM_PTE_VALID		(1 << 0)
+#define HMM_PTE_WRITE		(1 << 1)
+#define HMM_PTE_DIRTY		(1 << 2)
+#define HMM_PFN_SHIFT		4
+#define HMM_PFN_MASK		(~((dma_addr_t)((1 << HMM_PFN_SHIFT) - 1)))
+
+static inline dma_addr_t hmm_pte_from_pfn(dma_addr_t pfn)
+{
+	return (pfn << HMM_PFN_SHIFT) | HMM_PTE_VALID;
+}
+
+static inline unsigned long hmm_pte_pfn(dma_addr_t pte)
+{
+	return pte >> HMM_PFN_SHIFT;
+}
+
+#define HMM_PT_MAX_LEVEL	6
+
+/* struct hmm_pt - HMM page table structure.
+ *
+ * @mask: Array of address mask values for each level.
+ * @directory_mask: Mask for the directory index (see below).
+ * @last: Last valid address (inclusive).
+ * @pgd: Page global directory (the top first level of the directory tree).
+ * @lock: Shared lock if spinlock_t does not fit in struct page.
+ * @shift: Array of address shift values for each level.
+ * @llevel: Last level.
+ *
+ * The index into each directory for a given address and level is:
+ * (address >>