Re: [PATCH 4/6] HMM: add HMM page table.

2015-01-19 Thread Jerome Glisse
On Mon, Jan 05, 2015 at 05:44:47PM -0500, j.gli...@gmail.com wrote:
> From: Jérôme Glisse 
> 
> The main purpose of heterogeneous memory management (HMM) is to mirror a
> process address space. To do so it must maintain a secondary page table that
> is used by the device driver to program the device or to build a device
> specific page table.
> 
> A radix tree cannot be used to create this secondary page table because HMM
> needs more flags than RADIX_TREE_MAX_TAGS (while this limit can be increased,
> we believe HMM will require so many flags that the cost would become
> prohibitive to other users of the radix tree).
> 
> Moreover the radix tree is built around long, but HMM needs to store DMA
> addresses and on some platforms sizeof(dma_addr_t) > sizeof(long). Thus the
> radix tree is unsuitable to fulfill HMM requirements, which is why we
> introduce this code, which allows creating a page table that can grow and
> shrink dynamically.
> 
> The design is very close to the CPU page table as it reuses some of its
> features, such as the spinlock embedded in struct page.

Hi Linus,

I was hoping that after LCA, or maybe on a plane back from it, you could take
a look at this version of the patchset and share your views on it, especially
the page table patch, as it seemed to be the contentious point of the previous
version.

I would really like to know where we stand on this. Hardware using this feature
is coming fast and I would rather have Linux kernel support early.

Hope you will be mildly happier with this version.

Cheers,
Jérôme

> 
> Signed-off-by: Jérôme Glisse 
> Signed-off-by: Sherry Cheung 
> Signed-off-by: Subhash Gutti 
> Signed-off-by: Mark Hairgrove 
> Signed-off-by: John Hubbard 
> Signed-off-by: Jatin Kumar 
> ---
>  MAINTAINERS            |   2 +
>  include/linux/hmm_pt.h | 261 ++
>  mm/Makefile            |   2 +-
>  mm/hmm_pt.c            | 425 +
>  4 files changed, 689 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/hmm_pt.h
>  create mode 100644 mm/hmm_pt.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3ec87c4..4090e86 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -4539,6 +4539,8 @@ L:  linux...@kvack.org
>  S:   Maintained
>  F:   mm/hmm.c
>  F:   include/linux/hmm.h
> +F:   mm/hmm_pt.c
> +F:   include/linux/hmm_pt.h
>  
>  HOST AP DRIVER
>  M:   Jouni Malinen 
> diff --git a/include/linux/hmm_pt.h b/include/linux/hmm_pt.h
> new file mode 100644
> index 000..88fc519
> --- /dev/null
> +++ b/include/linux/hmm_pt.h
> @@ -0,0 +1,261 @@
> +/*
> + * Copyright 2014 Red Hat Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * Authors: Jérôme Glisse 
> + */
> +/*
> + * This provides a set of helpers for HMM page table. See include/linux/hmm.h
> + * for a description of what HMM is.
> + *
> + * HMM page table relies on a locking mechanism similar to CPU page table for
> + * page table update. It uses the spinlock embedded inside the struct page to
> + * protect changes to page table directory, which should minimize lock
> + * contention for concurrent updates.
> + *
> + * It also provides a directory tree protection mechanism. Unlike CPU page
> + * table there is no mmap semaphore to protect directory tree from removal and
> + * this is done intentionally so that concurrent removal/insertion of directory
> + * inside the tree can happen.
> + *
> + * So anyone walking down the page table must protect the directories it
> + * traverses so they are not freed by some other thread. This is done by using
> + * a reference counter for each directory. Before traversing a directory a
> + * reference is taken and once traversal is done the reference is dropped.
> + *
> + * A directory entry dereference and refcount increment of sub-directory page
> + * must happen in an RCU read-side critical section so that directory page
> + * removal can gracefully wait for all possible other threads that might have
> + * dereferenced the directory.
> + */
> +#ifndef _HMM_PT_H
> +#define _HMM_PT_H
> +
> +/*
> + * The HMM page table entry does not reflect any specific hardware. It is just
> + * a common entry format used by HMM internally and exposed to HMM users so
> + * they can extract information out of the HMM page table.
> + */
> +#define HMM_PTE_VALID  (1 << 0)
> +#define HMM_PTE_WRITE  (1 << 1)
> +#define HMM_PTE_DIRTY  (1 << 2)
> +#define HMM_PFN_SHIFT  4
> +#define HMM_PFN_MASK   (~((dma_addr_t)((1 << HMM_PFN_SHIFT) - 1)))
> +
> 

[PATCH 4/6] HMM: add HMM page table.

2015-01-05 Thread j . glisse
From: Jérôme Glisse 

The main purpose of heterogeneous memory management (HMM) is to mirror a
process address space. To do so it must maintain a secondary page table that
is used by the device driver to program the device or to build a device
specific page table.

A radix tree cannot be used to create this secondary page table because HMM
needs more flags than RADIX_TREE_MAX_TAGS (while this limit can be increased,
we believe HMM will require so many flags that the cost would become
prohibitive to other users of the radix tree).

Moreover the radix tree is built around long, but HMM needs to store DMA
addresses and on some platforms sizeof(dma_addr_t) > sizeof(long). Thus the
radix tree is unsuitable to fulfill HMM requirements, which is why we
introduce this code, which allows creating a page table that can grow and
shrink dynamically.

The design is very close to the CPU page table as it reuses some of its
features, such as the spinlock embedded in struct page.

Signed-off-by: Jérôme Glisse 
Signed-off-by: Sherry Cheung 
Signed-off-by: Subhash Gutti 
Signed-off-by: Mark Hairgrove 
Signed-off-by: John Hubbard 
Signed-off-by: Jatin Kumar 
---
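
Note, not meant for the commit log: the entry format introduced by this patch
can be tried out in userspace with a minimal sketch like the one here. The flag
values, HMM_PFN_SHIFT, HMM_PFN_MASK and the two helpers are copied from
include/linux/hmm_pt.h in this patch. The uint64_t typedef standing in for
dma_addr_t and the hmm_pte_selfcheck() function are assumptions made only for
illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 64-bit stand-in for the kernel's dma_addr_t. */
typedef uint64_t dma_addr_t;

/* Values copied from include/linux/hmm_pt.h in this patch. */
#define HMM_PTE_VALID	(1 << 0)
#define HMM_PTE_WRITE	(1 << 1)
#define HMM_PTE_DIRTY	(1 << 2)
#define HMM_PFN_SHIFT	4
#define HMM_PFN_MASK	(~((dma_addr_t)((1 << HMM_PFN_SHIFT) - 1)))

/* Build a valid entry: pfn in the high bits, flags in the low 4 bits. */
static inline dma_addr_t hmm_pte_from_pfn(dma_addr_t pfn)
{
	return (pfn << HMM_PFN_SHIFT) | HMM_PTE_VALID;
}

/* Recover the pfn by shifting the flag bits away. */
static inline unsigned long hmm_pte_pfn(dma_addr_t pte)
{
	return pte >> HMM_PFN_SHIFT;
}

/* Hypothetical helper: round-trip one pfn and check the flag bits. */
static int hmm_pte_selfcheck(void)
{
	dma_addr_t pfn = 0x12345;
	dma_addr_t pte = hmm_pte_from_pfn(pfn) | HMM_PTE_WRITE;

	assert(pte & HMM_PTE_VALID);
	assert(!(pte & HMM_PTE_DIRTY));
	assert((pte & HMM_PFN_MASK) == (pfn << HMM_PFN_SHIFT));
	assert(hmm_pte_pfn(pte) == pfn);
	return 0;
}
```

To run it, compile the block together with a main() that calls
hmm_pte_selfcheck(). Since the flags live in the low HMM_PFN_SHIFT bits, a
driver can test them without disturbing the pfn stored above them.
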
 MAINTAINERS            |   2 +
 include/linux/hmm_pt.h | 261 ++
 mm/Makefile            |   2 +-
 mm/hmm_pt.c            | 425 +
 4 files changed, 689 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/hmm_pt.h
 create mode 100644 mm/hmm_pt.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 3ec87c4..4090e86 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4539,6 +4539,8 @@ L:linux...@kvack.org
 S: Maintained
 F: mm/hmm.c
 F: include/linux/hmm.h
+F: mm/hmm_pt.c
+F: include/linux/hmm_pt.h
 
 HOST AP DRIVER
 M: Jouni Malinen 
diff --git a/include/linux/hmm_pt.h b/include/linux/hmm_pt.h
new file mode 100644
index 000..88fc519
--- /dev/null
+++ b/include/linux/hmm_pt.h
@@ -0,0 +1,261 @@
+/*
+ * Copyright 2014 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Authors: Jérôme Glisse 
+ */
+/*
+ * This provides a set of helpers for HMM page table. See include/linux/hmm.h
+ * for a description of what HMM is.
+ *
+ * HMM page table relies on a locking mechanism similar to CPU page table for
+ * page table update. It uses the spinlock embedded inside the struct page to
+ * protect changes to page table directory, which should minimize lock
+ * contention for concurrent updates.
+ *
+ * It also provides a directory tree protection mechanism. Unlike CPU page
+ * table there is no mmap semaphore to protect directory tree from removal and
+ * this is done intentionally so that concurrent removal/insertion of directory
+ * inside the tree can happen.
+ *
+ * So anyone walking down the page table must protect the directories it
+ * traverses so they are not freed by some other thread. This is done by using
+ * a reference counter for each directory. Before traversing a directory a
+ * reference is taken and once traversal is done the reference is dropped.
+ *
+ * A directory entry dereference and refcount increment of sub-directory page
+ * must happen in an RCU read-side critical section so that directory page
+ * removal can gracefully wait for all possible other threads that might have
+ * dereferenced the directory.
+ */
+#ifndef _HMM_PT_H
+#define _HMM_PT_H
+
+/*
+ * The HMM page table entry does not reflect any specific hardware. It is just
+ * a common entry format used by HMM internally and exposed to HMM users so
+ * they can extract information out of the HMM page table.
+ */
+#define HMM_PTE_VALID  (1 << 0)
+#define HMM_PTE_WRITE  (1 << 1)
+#define HMM_PTE_DIRTY  (1 << 2)
+#define HMM_PFN_SHIFT  4
+#define HMM_PFN_MASK   (~((dma_addr_t)((1 << HMM_PFN_SHIFT) - 1)))
+
+static inline dma_addr_t hmm_pte_from_pfn(dma_addr_t pfn)
+{
+   return (pfn << HMM_PFN_SHIFT) | HMM_PTE_VALID;
+}
+
+static inline unsigned long hmm_pte_pfn(dma_addr_t pte)
+{
+   return pte >> HMM_PFN_SHIFT;
+}
+
+#define HMM_PT_MAX_LEVEL   6
+
+/* struct hmm_pt - HMM page table structure.
+ *
+ * @mask: Array of address mask value of each level.
+ * @directory_mask: Mask for directory index (see below).
+ * @last: Last valid address (inclusive).
+ * @pgd: Page global directory (top level of the directory tree).
+ * @lock: Shared lock if spinlock_t does not fit in struct page.
+ * @shift: Array of address shift value of each level.
+ * @llevel: Last level.
+ *
+ * The index into each directory for a given address and level is :
+ *   (address >> 
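
The directory protection rule from the header comment of hmm_pt.h (take a
reference on a directory before traversing it, drop it once done) can be
sketched in userspace with C11 atomics. This is only an illustration under
stated assumptions: struct dir, dir_tryget(), dir_put() and dir_walk_down()
are hypothetical names, not the patch's API, and the RCU read-side critical
section the comment requires around the dereference is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one directory page; not a type from the patch. */
struct dir {
	atomic_int refcount;	/* 0 means the directory is being freed */
	struct dir *child;	/* single child, enough to show one walk step */
};

/* Take a reference unless the count has already dropped to zero. */
static bool dir_tryget(struct dir *d)
{
	int old = atomic_load(&d->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&d->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool dir_put(struct dir *d)
{
	int old = atomic_fetch_sub(&d->refcount, 1);

	assert(old > 0);	/* dropping a reference never taken is a bug */
	return old == 1;
}

/*
 * Walk one level down: pin the child before using it. In the kernel the
 * load of parent->child and the refcount increment would both sit inside
 * an RCU read-side critical section; that part is left out here.
 */
static struct dir *dir_walk_down(struct dir *parent)
{
	struct dir *child = parent->child;

	if (child == NULL || !dir_tryget(child))
		return NULL;
	return child;
}
```

A walker that gets NULL back from dir_walk_down() has to treat the
sub-directory as gone; the caller that sees dir_put() return true is the one
responsible for freeing the directory.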
