Re: [PATCH v3 02/14] mm/mmu_notifier: add an interval tree notifier

2019-11-25 Thread Jason Gunthorpe
On Fri, Nov 22, 2019 at 04:54:08PM -0800, Ralph Campbell wrote:

> Actually, I think you can remove the "need_wake" variable since it is
> unconditionally set to "true".

Oh, yes, thank you. An earlier revision had a different control flow.
 
> Also, the comment in __mmu_interval_notifier_insert() says
> "mni->mr_invalidate_seq" and I think that should be
> "mni->invalidate_seq".

Got it.

I squashed this in:

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index b3a064b3b31807..30abbfdc25be55 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -129,7 +129,6 @@ static void mn_itree_inv_end(struct mmu_notifier_mm *mmn_mm)
 {
struct mmu_interval_notifier *mni;
struct hlist_node *next;
-   bool need_wake = false;
 
spin_lock(&mmn_mm->lock);
if (--mmn_mm->active_invalidate_ranges ||
@@ -140,7 +139,6 @@ static void mn_itree_inv_end(struct mmu_notifier_mm *mmn_mm)
 
/* Make invalidate_seq even */
mmn_mm->invalidate_seq++;
-   need_wake = true;
 
/*
 * The inv_end incorporates a deferred mechanism like rtnl_unlock().
@@ -160,8 +158,7 @@ static void mn_itree_inv_end(struct mmu_notifier_mm *mmn_mm)
}
spin_unlock(&mmn_mm->lock);
 
-   if (need_wake)
-   wake_up_all(&mmn_mm->wq);
+   wake_up_all(&mmn_mm->wq);
 }
 
 /**
@@ -884,7 +881,7 @@ static int __mmu_interval_notifier_insert(
 * possibility for live lock, instead defer the add to
 * mn_itree_inv_end() so this algorithm is deterministic.
 *
-* In all cases the value for the mni->mr_invalidate_seq should be
+* In all cases the value for the mni->invalidate_seq should be
 * odd, see mmu_interval_read_begin()
 */
spin_lock(&mmn_mm->lock);
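
The squash is safe because, once the early return has been taken, the only
path that reaches the bottom of mn_itree_inv_end() is the one that just made
invalidate_seq even - so need_wake could only ever be true. A minimal
user-space model of that control flow (the struct and names here are
illustrative stand-ins, not the kernel's types):

```c
/*
 * Toy model of mn_itree_inv_end() after the squash. Not the kernel
 * code: fake_mmn_mm and fake_inv_end are made-up names.
 */
struct fake_mmn_mm {
	unsigned long invalidate_seq;	/* odd while invalidation in progress */
	unsigned int active_invalidate_ranges;
	int wakeups;			/* stands in for wake_up_all(&mmn_mm->wq) */
};

static void fake_inv_end(struct fake_mmn_mm *m)
{
	/* spin_lock(&m->lock) in the real code */
	if (--m->active_invalidate_ranges)
		return;			/* not the final inv_end: no wake-up */

	m->invalidate_seq++;		/* make invalidate_seq even */
	/* spin_unlock(&m->lock) in the real code */
	m->wakeups++;			/* unconditional wake_up_all() */
}
```

An inner inv_end returns early with invalidate_seq still odd; the final one
always moves the count to even and wakes waiters, which is exactly why the
need_wake flag was redundant.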

Jason
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH v3 02/14] mm/mmu_notifier: add an interval tree notifier

2019-11-22 Thread Ralph Campbell


On 11/13/19 8:46 AM, Jason Gunthorpe wrote:

On Wed, Nov 13, 2019 at 05:59:52AM -0800, Christoph Hellwig wrote:

+int mmu_interval_notifier_insert(struct mmu_interval_notifier *mni,
+ struct mm_struct *mm, unsigned long start,
+ unsigned long length,
+ const struct mmu_interval_notifier_ops *ops);
+int mmu_interval_notifier_insert_locked(
+   struct mmu_interval_notifier *mni, struct mm_struct *mm,
+   unsigned long start, unsigned long length,
+   const struct mmu_interval_notifier_ops *ops);


Very inconsistent indentation between these two related functions.


clang-format. The kernel config is set to prefer lining up under the
'(' if all the arguments fit within 80 columns; otherwise it does a
1-tab continuation indent.


+   /*
+* The inv_end incorporates a deferred mechanism like
+* rtnl_unlock(). Adds and removes are queued until the final inv_end
+* happens then they are progressed. This arrangement for tree updates
+* is used to avoid using a blocking lock during
+* invalidate_range_start.


Nitpick:  That comment can be condensed into one less line:


The rtnl_unlock can move up a line too. My editor is failing me on
this.


+   /*
+* TODO: Since we already have a spinlock above, this would be faster
+* as wake_up_q
+*/
+   if (need_wake)
+   wake_up_all(&mmn_mm->wq);


So why is this important enough for a TODO comment, but not important
enough to do right away?


Let's drop the comment; I'm not sure wake_up_q is even a function this
layer should be calling.


Actually, I think you can remove the "need_wake" variable since it is
unconditionally set to "true".

Also, the comment in __mmu_interval_notifier_insert() says
"mni->mr_invalidate_seq" and I think that should be
"mni->invalidate_seq".

Re: [PATCH v3 02/14] mm/mmu_notifier: add an interval tree notifier

2019-11-13 Thread Jason Gunthorpe
On Wed, Nov 13, 2019 at 05:59:52AM -0800, Christoph Hellwig wrote:
> > +int mmu_interval_notifier_insert(struct mmu_interval_notifier *mni,
> > + struct mm_struct *mm, unsigned long start,
> > + unsigned long length,
> > + const struct mmu_interval_notifier_ops *ops);
> > +int mmu_interval_notifier_insert_locked(
> > +   struct mmu_interval_notifier *mni, struct mm_struct *mm,
> > +   unsigned long start, unsigned long length,
> > +   const struct mmu_interval_notifier_ops *ops);
> 
> Very inconsistent indentation between these two related functions.

clang-format. The kernel config is set to prefer lining up under the
'(' if all the arguments fit within 80 columns; otherwise it does a
1-tab continuation indent.

> > +   /*
> > +* The inv_end incorporates a deferred mechanism like
> > +* rtnl_unlock(). Adds and removes are queued until the final inv_end
> > +* happens then they are progressed. This arrangement for tree updates
> > +* is used to avoid using a blocking lock during
> > +* invalidate_range_start.
> 
> Nitpick:  That comment can be condensed into one less line:

The rtnl_unlock can move up a line too. My editor is failing me on
this.

> > +   /*
> > +* TODO: Since we already have a spinlock above, this would be faster
> > +* as wake_up_q
> > +*/
> > +   if (need_wake)
> > +   wake_up_all(&mmn_mm->wq);
> 
> So why is this important enough for a TODO comment, but not important
> enough to do right away?

Let's drop the comment; I'm not sure wake_up_q is even a function this
layer should be calling.
 
> > +* release semantics on the initialization of the mmu_notifier_mm's
> > + * contents are provided for unlocked readers.  acquire can only be
> > + * used while holding the mmgrab or mmget, and is safe because once
> > + * created the mmu_notififer_mm is not freed until the mm is
> > + * destroyed.  As above, users holding the mmap_sem or one of the
> > + * mm_take_all_locks() do not need to use acquire semantics.
> >  */
> 
> Some spaces instead of tabs here.

Got it

> > +static int __mmu_interval_notifier_insert(
> > +   struct mmu_interval_notifier *mni, struct mm_struct *mm,
> > +   struct mmu_notifier_mm *mmn_mm, unsigned long start,
> > +   unsigned long length, const struct mmu_interval_notifier_ops *ops)
> 
> Odd indentation - we usually do two tabs (my preference) or align after
> the opening brace.

This is one tab. I don't think one tab is odd; it seems pretty popular
even just in mm/.

But two tabs is considered usual? I didn't even know that.

> > + * This function must be paired with mmu_interval_notifier_insert(). It cannot be
> 
> line > 80 chars.

got it, was missed during the rename

> Otherwise this looks good and very similar to what I reviewed earlier:
> 
> Reviewed-by: Christoph Hellwig 

Thanks a bunch, your comments have been very helpful on this series!

Jason

Re: [PATCH v3 02/14] mm/mmu_notifier: add an interval tree notifier

2019-11-13 Thread Christoph Hellwig
> +int mmu_interval_notifier_insert(struct mmu_interval_notifier *mni,
> +  struct mm_struct *mm, unsigned long start,
> +  unsigned long length,
> +  const struct mmu_interval_notifier_ops *ops);
> +int mmu_interval_notifier_insert_locked(
> + struct mmu_interval_notifier *mni, struct mm_struct *mm,
> + unsigned long start, unsigned long length,
> + const struct mmu_interval_notifier_ops *ops);

Very inconsistent indentation between these two related functions.

> + /*
> +  * The inv_end incorporates a deferred mechanism like
> +  * rtnl_unlock(). Adds and removes are queued until the final inv_end
> +  * happens then they are progressed. This arrangement for tree updates
> +  * is used to avoid using a blocking lock during
> +  * invalidate_range_start.

Nitpick:  That comment can be condensed into one less line:

/*
 * The inv_end incorporates a deferred mechanism like rtnl_unlock().
 * Adds and removes are queued until the final inv_end happens then
 * they are progressed. This arrangement for tree updates is used to
 * avoid using a blocking lock during invalidate_range_start.
 */

> + /*
> +  * TODO: Since we already have a spinlock above, this would be faster
> +  * as wake_up_q
> +  */
> + if (need_wake)
> + wake_up_all(&mmn_mm->wq);

So why is this important enough for a TODO comment, but not important
enough to do right away?

> +  * release semantics on the initialization of the mmu_notifier_mm's
> + * contents are provided for unlocked readers.  acquire can only be
> + * used while holding the mmgrab or mmget, and is safe because once
> + * created the mmu_notififer_mm is not freed until the mm is
> + * destroyed.  As above, users holding the mmap_sem or one of the
> + * mm_take_all_locks() do not need to use acquire semantics.
>*/

Some spaces instead of tabs here.

> +static int __mmu_interval_notifier_insert(
> + struct mmu_interval_notifier *mni, struct mm_struct *mm,
> + struct mmu_notifier_mm *mmn_mm, unsigned long start,
> + unsigned long length, const struct mmu_interval_notifier_ops *ops)

> Odd indentation - we usually do two tabs (my preference) or align after
the opening brace.

> + * This function must be paired with mmu_interval_notifier_insert(). It cannot be

line > 80 chars.

Otherwise this looks good and very similar to what I reviewed earlier:

Reviewed-by: Christoph Hellwig 

[PATCH v3 02/14] mm/mmu_notifier: add an interval tree notifier

2019-11-12 Thread Jason Gunthorpe
From: Jason Gunthorpe 

Of the 13 users of mmu_notifiers, 8 use only
invalidate_range_start/end() and immediately intersect the
mmu_notifier_range with some kind of internal list of VAs. 4 use an
interval tree (i915_gem, radeon_mn, umem_odp, hfi1). 4 use a linked list
of some kind (scif_dma, vhost, gntdev, hmm).

And the remaining 5 either don't use invalidate_range_start() or do some
special thing with it.

It turns out that building a correct scheme with an interval tree is
pretty complicated, particularly if the use case is synchronizing against
another thread doing get_user_pages().  Many of these implementations have
various subtle and difficult to fix races.

This approach puts the interval tree as common code at the top of the mmu
notifier call tree and implements a shareable locking scheme.

It includes:
 - An interval tree tracking VA ranges, with per-range callbacks
 - A read/write locking scheme for the interval tree that avoids
   sleeping in the notifier path (for OOM killer)
 - A sequence counter based collision-retry locking scheme to tell
   device page fault that a VA range is being concurrently invalidated.
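
The sequence-counter scheme in the last bullet can be modeled in a few lines
of user-space C. This is a sketch with hypothetical names; in the real API,
mmu_interval_read_begin() also waits while the count is odd rather than
returning immediately:

```c
/*
 * Toy model of the collision-retry read side. In the kernel the reader
 * is a device page-fault handler; fake_* names here are made up.
 */
struct fake_interval {
	unsigned long invalidate_seq;	/* odd: invalidation in progress */
};

/* Sample the sequence before translating (real code sleeps while odd) */
static unsigned long fake_read_begin(const struct fake_interval *i)
{
	return i->invalidate_seq;
}

/* Nonzero means an invalidation collided: discard the work and retry */
static int fake_read_retry(const struct fake_interval *i, unsigned long seq)
{
	return i->invalidate_seq != seq;
}
```

The fault path loops: begin, set up translations, then check retry; if an
invalidation ran concurrently the sequence has moved and the work is redone.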

This is based on various ideas:
- hmm accumulates invalidated VA ranges and releases them when all
  invalidates are done, via active_invalidate_ranges count.
  This approach avoids having to intersect the interval tree twice (as
  umem_odp does) at the potential cost of a longer device page fault.

- kvm/umem_odp use a sequence counter to drive the collision retry,
  via invalidate_seq

- a deferred work todo list on unlock scheme like RTNL, via deferred_list.
  This makes adding/removing interval tree members more deterministic

- seqlock, except this version makes the seqlock idea multi-holder on the
  write side by protecting it with active_invalidate_ranges and a spinlock
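
The RTNL-like deferred list in the third bullet can be sketched the same way:
while any invalidation is active, tree inserts and removes are queued, and
only the final inv_end applies them. Again a user-space model with invented
names, not the kernel implementation:

```c
/*
 * Toy model of deferring interval-tree updates until the last inv_end.
 * fake_tree and its fields are illustrative only.
 */
#define MAX_DEFERRED 16

struct fake_tree {
	unsigned int active_invalidate_ranges;
	int members;			/* stands in for the interval tree */
	int deferred[MAX_DEFERRED];	/* +1 = deferred add, -1 = deferred remove */
	int n_deferred;
};

static void fake_tree_update(struct fake_tree *t, int delta)
{
	if (t->active_invalidate_ranges)	/* invalidation active: defer */
		t->deferred[t->n_deferred++] = delta;
	else
		t->members += delta;
}

static void fake_inv_end(struct fake_tree *t)
{
	if (--t->active_invalidate_ranges)
		return;
	for (int i = 0; i < t->n_deferred; i++)	/* progress deferred updates */
		t->members += t->deferred[i];
	t->n_deferred = 0;
}
```

Because updates never touch the tree while an invalidation walks it, the
invalidate_range_start path needs no blocking lock, and add/remove ordering
becomes deterministic.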

To minimize MM overhead when only the interval tree is being used, the
entire SRCU and hlist overheads are dropped using some simple
branches. Similarly the interval tree overhead is dropped when in hlist
mode.

The overhead from the mandatory spinlock is broadly the same as most of
existing users which already had a lock (or two) of some sort on the
invalidation path.

Acked-by: Christian König 
Tested-by: Philip Yang 
Tested-by: Ralph Campbell 
Reviewed-by: John Hubbard 
Signed-off-by: Jason Gunthorpe 
---
 include/linux/mmu_notifier.h | 101 +++
 mm/Kconfig   |   1 +
 mm/mmu_notifier.c| 552 +--
 3 files changed, 628 insertions(+), 26 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 12bd603d318ce7..9e6caa8ecd1938 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -6,10 +6,12 @@
 #include 
 #include 
 #include 
+#include 
 
 struct mmu_notifier_mm;
 struct mmu_notifier;
 struct mmu_notifier_range;
+struct mmu_interval_notifier;
 
 /**
  * enum mmu_notifier_event - reason for the mmu notifier callback
@@ -32,6 +34,9 @@ struct mmu_notifier_range;
  * access flags). User should soft dirty the page in the end callback to make
  * sure that anyone relying on soft dirtyness catch pages that might be written
  * through non CPU mappings.
+ *
+ * @MMU_NOTIFY_RELEASE: used during mmu_interval_notifier invalidate to signal
+ * that the mm refcount is zero and the range is no longer accessible.
  */
 enum mmu_notifier_event {
MMU_NOTIFY_UNMAP = 0,
@@ -39,6 +44,7 @@ enum mmu_notifier_event {
MMU_NOTIFY_PROTECTION_VMA,
MMU_NOTIFY_PROTECTION_PAGE,
MMU_NOTIFY_SOFT_DIRTY,
+   MMU_NOTIFY_RELEASE,
 };
 
 #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
@@ -222,6 +228,26 @@ struct mmu_notifier {
unsigned int users;
 };
 
+/**
+ * struct mmu_interval_notifier_ops
+ * @invalidate: Upon return the caller must stop using any SPTEs within this
+ *  range. This function can sleep. Return false only if sleeping
+ *  was required but mmu_notifier_range_blockable(range) is false.
+ */
+struct mmu_interval_notifier_ops {
+   bool (*invalidate)(struct mmu_interval_notifier *mni,
+  const struct mmu_notifier_range *range,
+  unsigned long cur_seq);
+};
+
+struct mmu_interval_notifier {
+   struct interval_tree_node interval_tree;
+   const struct mmu_interval_notifier_ops *ops;
+   struct mm_struct *mm;
+   struct hlist_node deferred_item;
+   unsigned long invalidate_seq;
+};
+
 #ifdef CONFIG_MMU_NOTIFIER
 
 #ifdef CONFIG_LOCKDEP
@@ -263,6 +289,81 @@ extern int __mmu_notifier_register(struct mmu_notifier *mn,
   struct mm_struct *mm);
 extern void mmu_notifier_unregister(struct mmu_notifier *mn,
struct mm_struct *mm);
+
+unsigned long mmu_interval_read_begin(struct mmu_interval_notifier *mni);
+int mmu_interval_notifier_insert(struct mmu_interval_notifier *mni,
+