Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-29 Thread Andrea Arcangeli
On Thu, Feb 28, 2008 at 05:03:01PM -0800, Christoph Lameter wrote:
 I thought you wanted to get rid of the sync via pte lock?

Sure. _notify is happening inside the pt lock by coincidence, to
reduce the changes to mm/* as long as the mmu notifiers aren't
sleep capable.

 What changes to do_wp_page do you envision?

Converting it to invalidate_range_begin/end.

 What is the trouble with the current do_wp_page modifications? There is 
 no need for invalidate_page() there so far. invalidate_range() does the 
 trick there.

No trouble, it's just that I didn't want to mangle over the logic of
do_wp_page unless it was strictly required, the patch has to be
obviously safe. You need to keep that bit of your patch to make the
mmu notifiers sleepable.

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-29 Thread Christoph Lameter
On Fri, 29 Feb 2008, Andrea Arcangeli wrote:

 On Thu, Feb 28, 2008 at 05:03:01PM -0800, Christoph Lameter wrote:
  I thought you wanted to get rid of the sync via pte lock?
 
 Sure. _notify is happening inside the pt lock by coincidence, to
 reduce the changes to mm/* as long as the mmu notifiers aren't
 sleep capable.

Ok if this is a coincidence then it would be better to separate the 
notifier callouts from the pte macro calls.

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-28 Thread Christoph Lameter
On Wed, 27 Feb 2008, Andrea Arcangeli wrote:

 What Christoph need to do when he's back from vacations to support
 sleepable mmu notifiers is to add a CONFIG_XPMEM config option that
 will switch the i_mmap_lock from a semaphore to a mutex (any other
 change to this patch will be minor compared to that) so XPMEM hardware
 will have kernels compiled that way. I don't see other sane ways to
 remove the atomic parameter from the API (apparently required by
 Andrew for merging something not restricted to the xpmem current usage
 with only anonymous memory) and I don't want to have such a
 locking-change intrusive dependency for all other non-blocking users
 that are fine without having to alter how the VM works (for example
 KVM and GRU). Very minor changes will be required to this patch to
 make it work after the VM locking will be altered (for example the
 CONFIG_XPMEM should also switch the mmu_register/unregister locking
 from RCU to mutex as well). XPMEM then will only compile if
 CONFIG_XPMEM=y and in turn the invalidate_range_* will support
 scheduling inside.

This is not going to work even if the mutex would work as easily as you 
think since the patch here still does an rcu_lock/unlock around a callback.

 I don't think pretending to merge all in one block (I mean including
 xpmem support that requires blocking methods) is good idea anymore as
 long as we agree the atomic parameter shouldn't be merged. But we
 can quite easily agree on the below to be optimal for GRU/KVM and
 trivially extendible once a CONFIG_XPMEM will be added. So this first
 part can go in now I think.

Changing the locking for the callouts for users of the mmu notivier that 
f.e. require a response via the network (RDMA, XPMEM etc) is not trivial 
at all. RCU lock cannot be used. So we are looking at totally disjunct 
methods for those users who have to sleep.

 +struct mmu_notifier_ops {
 + /*
 +  * Called when nobody can register any more notifier in the mm
 +  * and after the mn notifier has been disarmed already.
 +  */
 + void (*release)(struct mmu_notifier *mn,
 + struct mm_struct *mm);

Who disarms the notifier? Why is the method not called to disarm the 
notifier on exit?

 +obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c
 --- a/mm/filemap_xip.c
 +++ b/mm/filemap_xip.c
 @@ -194,7 +194,7 @@ __xip_unmap (struct address_space * mapp
   if (pte) {
   /* Nuke the page table entry. */
   flush_cache_page(vma, address, pte_pfn(*pte));
 - pteval = ptep_clear_flush(vma, address, pte);
 + pteval = ptep_clear_flush_notify(vma, address, pte);
   page_remove_rmap(page, vma);
   dec_mm_counter(mm, file_rss);
   BUG_ON(pte_dirty(pteval));

Well a bit better but now we have to modify both the macro and the code 
in teh VM. It would be easier to put the notify call in here.

 @@ -2048,6 +2050,7 @@ void exit_mmap(struct mm_struct *mm)
   vm_unacct_memory(nr_accounted);
   free_pgtables(tlb, vma, FIRST_USER_ADDRESS, 0);
   tlb_finish_mmu(tlb, 0, end);
 + mmu_notifier_release(mm);

The release should be called much earlier to allow the driver to release 
all resources in one go. This way each vma must be processed individually. 
For our gobs of memory this method may create a scaling problem on exit().


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-28 Thread Andrea Arcangeli
On Thu, Feb 28, 2008 at 11:48:10AM -0800, Christoph Lameter wrote:
  make it work after the VM locking will be altered (for example the
   ^^^
  CONFIG_XPMEM should also switch the mmu_register/unregister locking
^^^
  from RCU to mutex as well). XPMEM then will only compile if
^
  CONFIG_XPMEM=y and in turn the invalidate_range_* will support
  scheduling inside.
 
 This is not going to work even if the mutex would work as easily as you 
 think since the patch here still does an rcu_lock/unlock around a callback.

See underlined.

  +struct mmu_notifier_ops {
  +   /*
  +* Called when nobody can register any more notifier in the mm
  +* and after the mn notifier has been disarmed already.
  +*/
  +   void (*release)(struct mmu_notifier *mn,
  +   struct mm_struct *mm);
 
 Who disarms the notifier? Why is the method not called to disarm the 
 notifier on exit?

The notifier is auto-disarmed by mmu_notifier_release, your patch
works the same way. -release is further called just in case anybody
wants to know the notifier was disarmed.

  @@ -2048,6 +2050,7 @@ void exit_mmap(struct mm_struct *mm)
  vm_unacct_memory(nr_accounted);
  free_pgtables(tlb, vma, FIRST_USER_ADDRESS, 0);
  tlb_finish_mmu(tlb, 0, end);
  +   mmu_notifier_release(mm);
 
 The release should be called much earlier to allow the driver to release 
 all resources in one go. This way each vma must be processed individually. 
 For our gobs of memory this method may create a scaling problem on exit().

Good point, it has to be called earlier for GRU, but it's not a
performance issue. GRU doesn't pin the pages so it should make the
global invalidate in -release _before_ unmap_vmas. Linux can't fault
in the ptes anymore because mm_users is zero so there's no need of a
-release_begin/end, the _begin is enough.

In #v6 I was invalidating inside unmap_vmas so it was ok. The
performance issues you're talking about refers to #v6 I guess, for #v7
there's a single call.

Thanks!

diff --git a/mm/mmap.c b/mm/mmap.c
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2039,6 +2039,7 @@ void exit_mmap(struct mm_struct *mm)
unsigned long end;
 
/* mm's last user has gone, and its about to be pulled down */
+   mmu_notifier_release(mm);
arch_exit_mmap(mm);
 
lru_add_drain();
@@ -2050,7 +2051,6 @@ void exit_mmap(struct mm_struct *mm)
vm_unacct_memory(nr_accounted);
free_pgtables(tlb, vma, FIRST_USER_ADDRESS, 0);
tlb_finish_mmu(tlb, 0, end);
-   mmu_notifier_release(mm);
 
/*
 * Walk the list again, actually closing and freeing it,

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-28 Thread Christoph Lameter
On Thu, 28 Feb 2008, Andrea Arcangeli wrote:

  This is not going to work even if the mutex would work as easily as you 
  think since the patch here still does an rcu_lock/unlock around a callback.
 
 See underlined.

Mutex is not acceptable for performance reasons. I think we can just drop 
the RCU lock if we simply unregister the mmu notifier in release and 
forbid the drivers from removing themselves from the notification 
chain. They can simply do nothing until release. At that time there is no 
concurrency and thus its safe to remove even without rcu locking.

 Good point, it has to be called earlier for GRU, but it's not a
 performance issue. GRU doesn't pin the pages so it should make the
 global invalidate in -release _before_ unmap_vmas. Linux can't fault
 in the ptes anymore because mm_users is zero so there's no need of a
 -release_begin/end, the _begin is enough.

I do not follow you about the _begin without end but the following fix 
seems okay.

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-28 Thread Christoph Lameter
On Wed, 27 Feb 2008, Andrea Arcangeli wrote:

 +struct mmu_notifier_head {
 + struct hlist_head head;
 + spinlock_t lock;
 +};

Still think that the lock here is not of too much use and can be easily 
replaced by mmap_sem.

 +#define mmu_notifier(function, mm, args...)  \
 + do {\
 + struct mmu_notifier *__mn;  \
 + struct hlist_node *__n; \
 + \
 + if (unlikely(!hlist_empty((mm)-mmu_notifier.head))) { \
 + rcu_read_lock();\
 + hlist_for_each_entry_rcu(__mn, __n, \
 +  (mm)-mmu_notifier.head, \
 +  hlist) \
 + if (__mn-ops-function)\
 + __mn-ops-function(__mn,   \
 + mm, \
 + args);  \
 + rcu_read_unlock();  \
 + }   \
 + } while (0)

Andrew recomended local variables for parameters used multile times. This 
means the mm parameter here.

 +/*
 + * Notifiers that use the parameters that they were passed so that the
 + * compiler does not complain about unused variables but does proper
 + * parameter checks even if !CONFIG_MMU_NOTIFIER.
 + * Macros generate no code.
 + */
 +#define mmu_notifier(function, mm, args...) \
 + do {   \
 + if (0) {   \
 + struct mmu_notifier *__mn; \
 +\
 + __mn = (struct mmu_notifier *)(0x00ff);\
 + __mn-ops-function(__mn, mm, args);   \
 + }; \
 + } while (0)

Note also Andrew's comments on the use of 0x00ff...

 +/*
 + * No synchronization. This function can only be called when only a single
 + * process remains that performs teardown.
 + */
 +void mmu_notifier_release(struct mm_struct *mm)
 +{
 + struct mmu_notifier *mn;
 + struct hlist_node *n, *tmp;
 +
 + if (unlikely(!hlist_empty(mm-mmu_notifier.head))) {
 + hlist_for_each_entry_safe(mn, n, tmp,
 +   mm-mmu_notifier.head, hlist) {
 + hlist_del(mn-hlist);
 + if (mn-ops-release)
 + mn-ops-release(mn, mm);
 + }
 + }
 +}

One could avoid a hlist_for_each_entry_safe here by simply always deleting 
the first object. 

Also re the _notify variants: The binding to pte_clear_flush_young etc 
will become a problem for notifiers that want to sleep because 
pte_clear_flush is usually called with the pte lock held. See f.e. 
try_to_unmap_one, page_mkclean_one etc.

It would be better if the notifier calls could be moved outside of the 
pte lock.




-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-28 Thread Jack Steiner
  The release should be called much earlier to allow the driver to release 
  all resources in one go. This way each vma must be processed individually. 
  For our gobs of memory this method may create a scaling problem on exit().
 
 Good point, it has to be called earlier for GRU, but it's not a
 performance issue. GRU doesn't pin the pages so it should make the
 global invalidate in -release _before_ unmap_vmas. Linux can't fault
 in the ptes anymore because mm_users is zero so there's no need of a
 -release_begin/end, the _begin is enough.
 

I disagree. The location of the callout IS a performance issue. In simple
comparisons of the 2 patches (Christoph's vs. Andrea's), Andrea's has a 7X
increase in the number of TLB purges being issued to the GRU. TLB flushing
is slow and can impact the performance of of tasks using the GRU.

--- jack

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-28 Thread Andrea Arcangeli
On Thu, Feb 28, 2008 at 05:17:33PM -0600, Jack Steiner wrote:
 I disagree. The location of the callout IS a performance issue. In simple
 comparisons of the 2 patches (Christoph's vs. Andrea's), Andrea's has a 7X
 increase in the number of TLB purges being issued to the GRU. TLB flushing

Are you sure that you're referring to #v7?

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-28 Thread Andrea Arcangeli
On Thu, Feb 28, 2008 at 03:05:30PM -0800, Christoph Lameter wrote:
 Still think that the lock here is not of too much use and can be easily 
 replaced by mmap_sem.

I can use the mmap_sem.

  +#define mmu_notifier(function, mm, args...)
  \
  +   do {\
  +   struct mmu_notifier *__mn;  \
  +   struct hlist_node *__n; \
  +   \
  +   if (unlikely(!hlist_empty((mm)-mmu_notifier.head))) { \
  +   rcu_read_lock();\
  +   hlist_for_each_entry_rcu(__mn, __n, \
  +(mm)-mmu_notifier.head, \
  +hlist) \
  +   if (__mn-ops-function)\
  +   __mn-ops-function(__mn,   \
  +   mm, \
  +   args);  \
  +   rcu_read_unlock();  \
  +   }   \
  +   } while (0)
 
 Andrew recomended local variables for parameters used multile times. This 
 means the mm parameter here.

I don't exactly see what buggy macro meant? I already use
parenthesis as needed to avoid the need of local variables to be
safe. Not really sure what's buggy, sorry!

 Note also Andrew's comments on the use of 0x00ff...

I thought I tried the (void) but it didn't work and your solution
worked, but perhaps I did something wrong, I'll try again with (void)
nevertheless.

  +/*
  + * No synchronization. This function can only be called when only a single
  + * process remains that performs teardown.
  + */
  +void mmu_notifier_release(struct mm_struct *mm)
  +{
  +   struct mmu_notifier *mn;
  +   struct hlist_node *n, *tmp;
  +
  +   if (unlikely(!hlist_empty(mm-mmu_notifier.head))) {
  +   hlist_for_each_entry_safe(mn, n, tmp,
  + mm-mmu_notifier.head, hlist) {
  +   hlist_del(mn-hlist);
  +   if (mn-ops-release)
  +   mn-ops-release(mn, mm);
  +   }
  +   }
  +}
 
 One could avoid a hlist_for_each_entry_safe here by simply always deleting 
 the first object. 

Agreed, the current construct come from the fact we previously didn't
assume nobody could ever call mmu_notifier_unregister by the time
mm_users is 0.

 Also re the _notify variants: The binding to pte_clear_flush_young etc 
 will become a problem for notifiers that want to sleep because 
 pte_clear_flush is usually called with the pte lock held. See f.e. 
 try_to_unmap_one, page_mkclean_one etc.

Calling __free_page out of the PT lock is much bigger
change. do_wp_page will require changes anyway when the sleepable
notifiers are merged.

 It would be better if the notifier calls could be moved outside of the 
 pte lock.

The point is that it can't make a difference right now, and my
objective was to avoid unnecessary source code duplication (later it
will be necessary, right now it isn't). By the time you rework
do_wp_page, removing _notify will be a very minor detail compared to
the rest of the changes to do_wp_page IMHO. Expanding it now won't
provide a real advantage later.

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-28 Thread Andrew Morton
On Fri, 29 Feb 2008 01:40:01 +0100 Andrea Arcangeli [EMAIL PROTECTED] wrote:

   +#define mmu_notifier(function, mm, args...)  
   \
   + do {\
   + struct mmu_notifier *__mn;  \
   + struct hlist_node *__n; \
   + \
   + if (unlikely(!hlist_empty((mm)-mmu_notifier.head))) { \
   + rcu_read_lock();\
   + hlist_for_each_entry_rcu(__mn, __n, \
   +  (mm)-mmu_notifier.head, \
   +  hlist) \
   + if (__mn-ops-function)\
   + __mn-ops-function(__mn,   \
   + mm, \
   + args);  \
   + rcu_read_unlock();  \
   + }   \
   + } while (0)
  
  Andrew recomended local variables for parameters used multile times. This 
  means the mm parameter here.
 
 I don't exactly see what buggy macro meant?

multiple refernces to the argument, so

mmu_notifier(foo, bar(), zot);

will call bar() either once or twice.

Unlikely in this case, but bad practice.  Easily fixable by using another
temporary.


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-28 Thread Christoph Lameter
On Fri, 29 Feb 2008, Andrea Arcangeli wrote:

  Also re the _notify variants: The binding to pte_clear_flush_young etc 
  will become a problem for notifiers that want to sleep because 
  pte_clear_flush is usually called with the pte lock held. See f.e. 
  try_to_unmap_one, page_mkclean_one etc.
 
 Calling __free_page out of the PT lock is much bigger
 change. do_wp_page will require changes anyway when the sleepable
 notifiers are merged.

I thought you wanted to get rid of the sync via pte lock?
What changes to do_wp_page do you envision?

  It would be better if the notifier calls could be moved outside of the 
  pte lock.
 
 The point is that it can't make a difference right now, and my
 objective was to avoid unnecessary source code duplication (later it
 will be necessary, right now it isn't). By the time you rework
 do_wp_page, removing _notify will be a very minor detail compared to
 the rest of the changes to do_wp_page IMHO. Expanding it now won't
 provide a real advantage later.

What is the trouble with the current do_wp_page modifications? There is 
no need for invalidate_page() there so far. invalidate_range() does the 
trick there.


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-28 Thread Christoph Lameter
On Fri, 29 Feb 2008, Andrea Arcangeli wrote:

 On Thu, Feb 28, 2008 at 05:17:33PM -0600, Jack Steiner wrote:
  I disagree. The location of the callout IS a performance issue. In simple
  comparisons of the 2 patches (Christoph's vs. Andrea's), Andrea's has a 7X
  increase in the number of TLB purges being issued to the GRU. TLB flushing
 
 Are you sure that you're referring to #v7?

Jack: AFAICT Andrea moved the release callout and things will be 
fine in the next release.

 

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-27 Thread Andrea Arcangeli
Hi Izik  kvm-devel,

Just wanted to remind that if we'll converge on #v7, the ksm code
in replace_page will have to call ptep_clear_flush_notify too (just
like do_wp_page).

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-27 Thread Christoph Lameter
On Wed, 27 Feb 2008, Andrea Arcangeli wrote:

 I hope this will can be considered final for .25 and be merged. Risk
 is zero, the only discussion here is to make an API that will last
 forever, functionality-wise all these patches provides zero risk and
 zero overhead when MMU_NOTIFIER=n. This last patch covers KVM and GRU
 and hopefully all other non-blocking users optimally, and the below

Ok so it somehow works slowly with GRU and you are happy with it. What 
about the RDMA folks etc etc?

 API will hopefully last forever (but even if it lasts just for .25 and
 .26 is changed that's fine with us, it's a kernel _internal_ API
 anyway, there's absolutely nothing visible to userland).

Would it not be better to have a solution that fits all instead of hacking 
something in now and then having to modify it later?

  What Christoph need to do when he's back from vacations to support
 sleepable mmu notifiers is to add a CONFIG_XPMEM config option that
 will switch the i_mmap_lock from a semaphore to a mutex (any other
 change to this patch will be minor compared to that) so XPMEM hardware
 will have kernels compiled that way. I don't see other sane ways to
 remove the atomic parameter from the API (apparently required by
 Andrew for merging something not restricted to the xpmem current usage
 with only anonymous memory) and I don't want to have such a
 locking-change intrusive dependency for all other non-blocking users
 that are fine without having to alter how the VM works (for example
 KVM and GRU). Very minor changes will be required to this patch to
 make it work after the VM locking will be altered (for example the
 CONFIG_XPMEM should also switch the mmu_register/unregister locking
 from RCU to mutex as well). XPMEM then will only compile if
 CONFIG_XPMEM=y and in turn the invalidate_range_* will support
 scheduling inside.

Hmmm.. There were earlier discussions of changing the anon vma lock to a 
rw lock because of contention issues in large systems. Maybe we can just 
generally switch the locks taken while walking rmaps to semaphores? That 
would still require to put the invalidate outside of the pte lock.


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-27 Thread Andrea Arcangeli
On Wed, Feb 27, 2008 at 03:06:10PM -0800, Christoph Lameter wrote:
 Ok so it somehow works slowly with GRU and you are happy with it. What 

As far as GRU is concerned, performance is the same as with your patch
(Jack can confirm).

 about the RDMA folks etc etc?

If RDMA/IB folks needed to block in invalidate_range, I guess they
need to do so on top of tmpfs too, and that never worked with your
patch anyway.

 Would it not be better to have a solution that fits all instead of hacking 
 something in now and then having to modify it later?

The whole point is that your solution fits only GRU and KVM too.

XPMEM in your patch works in a hacked mode limited to anonymous memory
only, Robin already received incoming mail asking to allow xpmem to
work on more than anonymous memory, so your solution-that-fits-all
doesn't actually fit some of Robin's customer needs. So if it doesn't
even entirely satisfy xpmem users, imagine the other potential
blocking-users of this code.

 Hmmm.. There were earlier discussions of changing the anon vma lock to a 
 rw lock because of contention issues in large systems. Maybe we can just 
 generally switch the locks taken while walking rmaps to semaphores? That 
 would still require to put the invalidate outside of the pte lock.

anon_vma lock can remain a spinlock unless you also want to schedule
inside try_to_unmap.

If converting the i_mmap_lock to a mutex is a big trouble, another way
that might work to allow invalidate_range to block, would be to try to
boost the mm_users to prevent the mmu_notifier_release to run in
another cpu the moment after i_mmap_lock spinlock is unlocked. But
even if that works, it'll run slower and the mmu notifiers RCU locking
should be switched to a mutex, so it'd be nice to have it as a
separate option.

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-27 Thread Christoph Lameter
On Thu, 28 Feb 2008, Andrea Arcangeli wrote:

 If RDMA/IB folks needed to block in invalidate_range, I guess they
 need to do so on top of tmpfs too, and that never worked with your
 patch anyway.

How about blocking in invalidate_page()? It can be made to work...

  Would it not be better to have a solution that fits all instead of hacking 
  something in now and then having to modify it later?
 
 The whole point is that your solution fits only GRU and KVM too.

Well so we do not address the issues?
 
 XPMEM in your patch works in a hacked mode limited to anonymous memory
 only, Robin already received incoming mail asking to allow xpmem to
 work on more than anonymous memory, so your solution-that-fits-all
 doesn't actually fit some of Robin's customer needs. So if it doesn't
 even entirely satisfy xpmem users, imagine the other potential
 blocking-users of this code.

The solutions have been mentioned...

 anon_vma lock can remain a spinlock unless you also want to schedule
 inside try_to_unmap.

Either that or a separate rmap as also mentioned before.



-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-27 Thread Andrea Arcangeli
On Wed, Feb 27, 2008 at 04:08:07PM -0800, Christoph Lameter wrote:
 On Thu, 28 Feb 2008, Andrea Arcangeli wrote:
 
  If RDMA/IB folks needed to block in invalidate_range, I guess they
  need to do so on top of tmpfs too, and that never worked with your
  patch anyway.
 
 How about blocking in invalidate_page()? It can be made to work...

Yes, it can be made to work with even more extended VM changes than to
only allow invalidate_range to schedule. Those core VM changes should
only be done by default (w/o CONFIG_XPMEM=y), if they're doing good
to the VM regardless of xpmem requirements. And I'm not really sure of
that. I think they don't do any good or they would be a mutex
already...

 Well so we do not address the issues?

I'm not suggesting not to address the issues, just that those issues
requires VM core changes, and likely those changes should be
switchable under a CONFIG_XPMEM, so I see no reason to delay the mmu
notifier until those changes are done and merged too. It's kind of a
separate problem.

 Either that or a separate rmap as also mentioned before.

DRI also wants invalidate_page by (mm,addr).

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-27 Thread Christoph Lameter
On Thu, 28 Feb 2008, Andrea Arcangeli wrote:

 I'm not suggesting not to address the issues, just that those issues
 requires VM core changes, and likely those changes should be
 switchable under a CONFIG_XPMEM, so I see no reason to delay the mmu
 notifier until those changes are done and merged too. It's kind of a
 separate problem.

No its the core problem of the mmu notifier. It needs to be usable for a 
lot of scenarios.

 

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel