Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-09 Thread Peter Zijlstra
On Fri, 2008-05-09 at 20:55 +0200, Andrea Arcangeli wrote:
> On Fri, May 09, 2008 at 08:37:29PM +0200, Peter Zijlstra wrote:
> > Another possibility, would something like this work?
> > 
> >  
> >  /*
> >   * null out the begin function, no new begin calls can be made
> >   */
> >  rcu_assing_pointer(my_notifier.invalidate_start_begin, NULL); 
> > 
> >  /*
> >   * lock/unlock all rmap locks in any order - this ensures that any
> >   * pending start() will have its end() function called.
> >   */
> >  mm_barrier(mm);
> > 
> >  /*
> >   * now that no new start() call can be made and all start()/end() pairs
> >   * are complete we can remove the notifier.
> >   */
> >  mmu_notifier_remove(mm, my_notifier);
> > 
> > 
> > This requires a mmu_notifier instance per attached mm and that
> > __mmu_notifier_invalidate_range_start() uses rcu_dereference() to obtain
> > the function.
> > 
> > But I think its enough to ensure that:
> > 
> >   for each start an end will be called
> 
> We don't need that, it's perfectly ok if start is called but end is
> not, it's ok to unregister in the middle as I guarantee ->release is
> called before mmu_notifier_unregister returns (if ->release is needed
> at all, not the case for KVM/GRU).
> 
> Unregister is already solved with srcu/rcu without any additional
> complication as we don't need the guarantee that for each start an end
> will be called.
> 
> > It can however happen that end is called without start - but we could
> > handle that I think.
> 
> The only reason mm_lock() was introduced is to solve "register", to
> guarantee that for each end there was a start. We can't handle end
> called without start in the driver.
> 
> The reason the driver must be prevented to register in the middle of
> start/end, if that if it ever happens the driver has no way to know it
> must stop the secondary mmu page faults to call get_user_pages and
> instantiate sptes/secondarytlbs on pages that will be freed as soon as
> zap_page_range starts.

Right - then I got it backwards. Never mind me then..


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-09 Thread Peter Zijlstra
On Thu, 2008-05-08 at 09:11 -0700, Linus Torvalds wrote:
> 
> On Thu, 8 May 2008, Linus Torvalds wrote:
> > 
> > Also, we'd need to make it 
> > 
> > unsigned short flag:1;
> > 
> > _and_ change spinlock_types.h to make the spinlock size actually match the 
> > required size (right now we make it an "unsigned int slock" even when we 
> > actually only use 16 bits).
> 
> Btw, this is an issue only on 32-bit x86, because on 64-bit one we already 
> have the padding due to the alignment of the 64-bit pointers in the 
> list_head (so there's already empty space there).
> 
> On 32-bit, the alignment of list-head is obviously just 32 bits, so right 
> now the structure is "perfectly packed" and doesn't have any empty space. 
> But that's just because the spinlock is unnecessarily big.
> 
> (Of course, if anybody really uses NR_CPUS >= 256 on 32-bit x86, then the 
> structure really will grow. That's a very odd configuration, though, and 
> not one I feel we really need to care about).

Another possibility, would something like this work?

 
 /*
  * null out the begin function, no new begin calls can be made
  */
 rcu_assing_pointer(my_notifier.invalidate_start_begin, NULL); 

 /*
  * lock/unlock all rmap locks in any order - this ensures that any
  * pending start() will have its end() function called.
  */
 mm_barrier(mm);

 /*
  * now that no new start() call can be made and all start()/end() pairs
  * are complete we can remove the notifier.
  */
 mmu_notifier_remove(mm, my_notifier);


This requires a mmu_notifier instance per attached mm and that
__mmu_notifier_invalidate_range_start() uses rcu_dereference() to obtain
the function.

But I think its enough to ensure that:

  for each start an end will be called

It can however happen that end is called without start - but we could
handle that I think.




-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [patch 01/10] emm: mm_lock: Lock a process against reclaim

2008-04-07 Thread Peter Zijlstra
On Sat, 2008-04-05 at 02:41 +0200, Andrea Arcangeli wrote:
> On Fri, Apr 04, 2008 at 04:12:42PM -0700, Jeremy Fitzhardinge wrote:
> > I think you can break this if() down a bit:
> >
> > if (!(vma->vm_file && vma->vm_file->f_mapping))
> > continue;
> 
> It makes no difference at runtime, coding style preferences are quite
> subjective.

I'll have to concurr with Jeremy here, please break that monstrous if
stmt down. It might not matter to the compiler, but it sure as hell
helps for anyone trying to understand/maintain the thing.



-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] EMM: disable other notifiers before register and unregister

2008-04-03 Thread Peter Zijlstra
On Wed, 2008-04-02 at 18:24 -0700, Christoph Lameter wrote:
> Ok lets forget about the single theaded thing to solve the registration 
> races. As Andrea pointed out this still has ssues with other subscribed 
> subsystems (and also try_to_unmap). We could do something like what 
> stop_machine_run does: First disable all running subsystems before 
> registering a new one.
> 
> Maybe this is a possible solution.
> 
> 
> Subject: EMM: disable other notifiers before register and unregister
> 
> As Andrea has pointed out: There are races during registration if other
> subsystem notifiers are active while we register a callback.
> 
> Solve that issue by adding two new notifiers:
> 
> emm_stop
>   Stops the notifier operations. Notifier must block on
>   invalidate_start and emm_referenced from this point on.
>   If an invalidate_start has not been completed by a call
>   to invalidate_end then the driver must wait until the
>   operation is complete before returning.
> 
> emm_start
>   Restart notifier operations.

Please use pause and resume or something like that. stop-start is an
unnatural order; we usually start before we stop, whereas we pause first
and resume later.

> Before registration all other subscribed subsystems are stopped.
> Then the new subsystem is subscribed and things can get running
> without consistency issues.
> 
> Subsystems are restarted after the lists have been updated.
> 
> This also works for unregistering. If we can get all subsystems
> to stop then we can also reliably unregister a subsystem. So
> provide that callback.
> 
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
> 
> ---
>  include/linux/rmap.h |   10 +++---
>  mm/rmap.c|   30 ++
>  2 files changed, 37 insertions(+), 3 deletions(-)
> 
> Index: linux-2.6/include/linux/rmap.h
> ===
> --- linux-2.6.orig/include/linux/rmap.h   2008-04-02 18:16:07.906032549 
> -0700
> +++ linux-2.6/include/linux/rmap.h2008-04-02 18:17:10.291070009 -0700
> @@ -94,7 +94,9 @@ enum emm_operation {
>   emm_release,/* Process exiting, */
>   emm_invalidate_start,   /* Before the VM unmaps pages */
>   emm_invalidate_end, /* After the VM unmapped pages */
> - emm_referenced  /* Check if a range was referenced */
> + emm_referenced, /* Check if a range was referenced */
> + emm_stop,   /* Halt all faults/invalidate_starts */
> + emm_start,  /* Restart operations */
>  };
>  
>  struct emm_notifier {
> @@ -126,13 +128,15 @@ static inline int emm_notify(struct mm_s
>  
>  /*
>   * Register a notifier with an mm struct. Release occurs when the process
> - * terminates by calling the notifier function with emm_release.
> + * terminates by calling the notifier function with emm_release or when
> + * emm_notifier_unregister is called.
>   *
>   * Must hold the mmap_sem for write.
>   */
>  extern void emm_notifier_register(struct emm_notifier *e,
>   struct mm_struct *mm);
> -
> +extern void emm_notifier_unregister(struct emm_notifier *e,
> + struct mm_struct *mm);
>  
>  /*
>   * Called from mm/vmscan.c to handle paging out
> Index: linux-2.6/mm/rmap.c
> ===
> --- linux-2.6.orig/mm/rmap.c  2008-04-02 18:16:09.378057062 -0700
> +++ linux-2.6/mm/rmap.c   2008-04-02 18:16:10.710079201 -0700
> @@ -289,16 +289,46 @@ void emm_notifier_release(struct mm_stru
>  /* Register a notifier */
>  void emm_notifier_register(struct emm_notifier *e, struct mm_struct *mm)
>  {
> + /* Bring all other notifiers into a quiescent state */
> + emm_notify(mm, emm_stop, 0, TASK_SIZE);
> +
>   e->next = mm->emm_notifier;
> +
>   /*
>* The update to emm_notifier (e->next) must be visible
>* before the pointer becomes visible.
>* rcu_assign_pointer() does exactly what we need.
>*/
>   rcu_assign_pointer(mm->emm_notifier, e);
> +
> + /* Continue notifiers */
> + emm_notify(mm, emm_start, 0, TASK_SIZE);
>  }
>  EXPORT_SYMBOL_GPL(emm_notifier_register);
>  
> +/* Unregister a notifier */
> +void emm_notifier_unregister(struct emm_notifier *e, struct mm_struct *mm)
> +{
> + struct emm_notifier *p;
> +
> + emm_notify(mm, emm_stop, 0, TASK_SIZE);
> +
> + p = mm->emm_notifier;
> + if (e == p)
> + mm->emm_notifier = e->next;
> + else {
> + while (p->next != e)
> + p = p->next;
> +
> + p->next = e->next;
> + }
> + e->next = mm->emm_notifier;
> +
> + emm_notify(mm, emm_start, 0, TASK_SIZE);
> + e->callback(e, mm, emm_release, 0, TASK_SIZE);
> +}
> +EXPORT_SYMBOL_GPL(emm_notifier_unregister);
> +
>  /*
>   * Perform a callback
>   *
> 



Re: [kvm-devel] EMM: Fixup return value handling of emm_notify()

2008-04-03 Thread Peter Zijlstra
On Wed, 2008-04-02 at 14:33 -0700, Christoph Lameter wrote:
> On Wed, 2 Apr 2008, Andrea Arcangeli wrote:
> 
> > but anyway it's silly to be hardwired to such an interface that worst
> > of all requires switch statements instead of proper pointer to
> > functions and a fixed set of parameters and retval semantics for all
> > methods.
> 
> The EMM API with a single callback is the simplest approach at this point. 
> A common callback for all operations allows the driver to implement common 
> entry and exit code as seen in XPMem.

It seems to me that common code can be shared using functions? No need
to stuff everything into a single function. We have method vectors all
over the kernel, we could do a_ops as a single callback too, but we
dont.

FWIW I prefer separate methods.

> I guess we can complicate this more by switching to a different API or 
> adding additional emm_xxx() callback if need be but I really want to have 
> a strong case for why this would be needed. There is the danger of 
> adding frills with special callbacks in this and that situation that could 
> make the notifier complicated and specific to a certain usage scenario. 
> 
> Having this generic simple interface will hopefully avoid such things.
> 
> 


-
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [RFC] Notifier for Externally Mapped Memory (EMM)

2008-03-04 Thread Peter Zijlstra

FWIW, I'll cut the kvm and openfabrics lists from any future posts.
I'm getting tired of the bounces.


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] Can Linux kernel handle unsynced TSC?

2008-02-29 Thread Peter Zijlstra

On Fri, 2008-02-29 at 22:20 +0800, Zhao Forrest wrote:
> On 2/29/08, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> >
> > On Fri, 2008-02-29 at 16:55 +0800, Zhao Forrest wrote:
> > > Sorry for reposting it.
> > >
> > > For example,
> > > 1 rdtsc() is invoked on CPU0
> > > 2 process is migrated to CPU1, and rdtsc() is invoked on CPU1
> > > 3 if TSC on CPU1 is slower than TSC on CPU0, can kernel guarantee
> > > that the second rdtsc() doesn't return a value smaller than the one
> > > returned by the first rdtsc()?
> >
> > No, rdtsc() goes directly to the hardware. You need a (preferably cheap)
> > clock abstraction layer on top if you need this.
> 
> Thank you for the clarification. I think gettimeofday() is such kind
> of clock abstraction layer, am I right?

Yes, gtod is one such a layer, however it fails the 'cheap' test for
many definitions of cheap.


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] Can Linux kernel handle unsynced TSC?

2008-02-29 Thread Peter Zijlstra

On Fri, 2008-02-29 at 16:55 +0800, Zhao Forrest wrote:
> Sorry for reposting it.
> 
> For example,
> 1 rdtsc() is invoked on CPU0
> 2 process is migrated to CPU1, and rdtsc() is invoked on CPU1
> 3 if TSC on CPU1 is slower than TSC on CPU0, can kernel guarantee
> that the second rdtsc() doesn't return a value smaller than the one
> returned by the first rdtsc()?

No, rdtsc() goes directly to the hardware. You need a (preferably cheap)
clock abstraction layer on top if you need this.


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel