Re: [patch] my mmu notifiers

2008-02-20 Thread Nick Piggin
On Wed, Feb 20, 2008 at 02:09:41AM +0100, Andrea Arcangeli wrote:
> On Wed, Feb 20, 2008 at 12:11:57AM +0100, Nick Piggin wrote:
> > Sorry, I realise I still didn't get this through my head yet (and also
> > have not seen your patch recently). So I don't know exactly what you
> > are doing...
> 
> The last version was posted here:
> 
> http://marc.info/?l=kvm-devel=120321732521533=2
> 
> > But why does _anybody_ (why does Christoph's patches) need to invalidate
> > when they are going to be more permissive? This should be done lazily by
> > the driver, I would have thought.
> 
> This can be done lazily by the driver yes. The place where I've an
> invalidate_pages in mprotect however can also become less permissive.

That's OK, because we have to flush tlbs there too.


> It's simpler to invalidate always and it's not guaranteed the
> secondary mmu page fault is capable of refreshing the spte across a
> writeprotect fault.

I think we just have to make sure that it _can_ do writeprotect
faults. AFAIKS, that will be possible if the driver registers a
.page_mkwrite handler (actually not quite -- page_mkwrite is fairly
crap, so I have a patch to merge it together with .fault so we get
address information as well). Anyway, I really think we should do
it that way.

> In the future this can be changed to
> mprotect_pages though, so no page fault will happen in the secondary
> mmu.

Possibly, but hopefully not needed for performance. Let's wait and
see.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-20 Thread Nick Piggin
On Tue, Feb 19, 2008 at 05:40:50PM -0600, Jack Steiner wrote:
> On Wed, Feb 20, 2008 at 12:11:57AM +0100, Nick Piggin wrote:
> > On Tue, Feb 19, 2008 at 02:58:51PM +0100, Andrea Arcangeli wrote:
> > > On Tue, Feb 19, 2008 at 09:43:57AM +0100, Nick Piggin wrote:
> > > > anything when changing the pte to be _more_ permissive, and I don't
> > > 
> > > Note that in my patch the invalidate_pages in mprotect can be
> > > trivially switched to a mprotect_pages with proper params. This will
> > > prevent page faults completely in the secondary MMU (there will only
> > > be tlb misses after the tlb flush just like for the core linux pte),
> > > and it'll allow all the secondary MMU pte blocks (512/1024 at time
> > > with my PT lock design) to be updated to have proper permissions
> > > matching the core linux pte.
> > 
> > Sorry, I realise I still didn't get this through my head yet (and also
> > have not seen your patch recently). So I don't know exactly what you
> > are doing...
> > 
> > But why does _anybody_ (why does Christoph's patches) need to invalidate
> > when they are going to be more permissive? This should be done lazily by
> > the driver, I would have thought.
> 
> 
> Agree. Although for most real applications, the performance difference
> is probably negligible.

But importantly, doing it that way means you share test coverage with
the CPU TLB flushing code, and you don't introduce a new concept to the
VM.

So, it _has_ to be lazy flushing, IMO (as there doesn't seem to be a
good reason otherwise). mprotect shouldn't really be a special case,
because it still has to flush the CPU tlbs as well when restricting
access.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-20 Thread Nick Piggin
On Tue, Feb 19, 2008 at 05:40:50PM -0600, Jack Steiner wrote:
 On Wed, Feb 20, 2008 at 12:11:57AM +0100, Nick Piggin wrote:
  On Tue, Feb 19, 2008 at 02:58:51PM +0100, Andrea Arcangeli wrote:
   On Tue, Feb 19, 2008 at 09:43:57AM +0100, Nick Piggin wrote:
anything when changing the pte to be _more_ permissive, and I don't
   
   Note that in my patch the invalidate_pages in mprotect can be
   trivially switched to a mprotect_pages with proper params. This will
   prevent page faults completely in the secondary MMU (there will only
   be tlb misses after the tlb flush just like for the core linux pte),
   and it'll allow all the secondary MMU pte blocks (512/1024 at time
   with my PT lock design) to be updated to have proper permissions
   matching the core linux pte.
  
  Sorry, I realise I still didn't get this through my head yet (and also
  have not seen your patch recently). So I don't know exactly what you
  are doing...
  
  But why does _anybody_ (why does Christoph's patches) need to invalidate
  when they are going to be more permissive? This should be done lazily by
  the driver, I would have thought.
 
 
 Agree. Although for most real applications, the performance difference
 is probably negligible.

But importantly, doing it that way means you share test coverage with
the CPU TLB flushing code, and you don't introduce a new concept to the
VM.

So, it _has_ to be lazy flushing, IMO (as there doesn't seem to be a
good reason otherwise). mprotect shouldn't really be a special case,
because it still has to flush the CPU tlbs as well when restricting
access.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-20 Thread Nick Piggin
On Wed, Feb 20, 2008 at 02:09:41AM +0100, Andrea Arcangeli wrote:
 On Wed, Feb 20, 2008 at 12:11:57AM +0100, Nick Piggin wrote:
  Sorry, I realise I still didn't get this through my head yet (and also
  have not seen your patch recently). So I don't know exactly what you
  are doing...
 
 The last version was posted here:
 
 http://marc.info/?l=kvm-develm=120321732521533w=2
 
  But why does _anybody_ (why does Christoph's patches) need to invalidate
  when they are going to be more permissive? This should be done lazily by
  the driver, I would have thought.
 
 This can be done lazily by the driver yes. The place where I've an
 invalidate_pages in mprotect however can also become less permissive.

That's OK, because we have to flush tlbs there too.


 It's simpler to invalidate always and it's not guaranteed the
 secondary mmu page fault is capable of refreshing the spte across a
 writeprotect fault.

I think we just have to make sure that it _can_ do writeprotect
faults. AFAIKS, that will be possible if the driver registers a
.page_mkwrite handler (actually not quite -- page_mkwrite is fairly
crap, so I have a patch to merge it together with .fault so we get
address information as well). Anyway, I really think we should do
it that way.

 In the future this can be changed to
 mprotect_pages though, so no page fault will happen in the secondary
 mmu.

Possibly, but hopefully not needed for performance. Let's wait and
see.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Robin Holt
On Wed, Feb 20, 2008 at 12:11:57AM +0100, Nick Piggin wrote:
> On Tue, Feb 19, 2008 at 02:58:51PM +0100, Andrea Arcangeli wrote:
> > On Tue, Feb 19, 2008 at 09:43:57AM +0100, Nick Piggin wrote:
> > > anything when changing the pte to be _more_ permissive, and I don't
> > 
> > Note that in my patch the invalidate_pages in mprotect can be
> > trivially switched to a mprotect_pages with proper params. This will
> > prevent page faults completely in the secondary MMU (there will only
> > be tlb misses after the tlb flush just like for the core linux pte),
> > and it'll allow all the secondary MMU pte blocks (512/1024 at time
> > with my PT lock design) to be updated to have proper permissions
> > matching the core linux pte.
> 
> Sorry, I realise I still didn't get this through my head yet (and also
> have not seen your patch recently). So I don't know exactly what you
> are doing...
> 
> But why does _anybody_ (why does Christoph's patches) need to invalidate
> when they are going to be more permissive? This should be done lazily by
> the driver, I would have thought.

I don't believe it should, but it probably does right now.  I do know
the case where a write fault where there is no need for a COW does
not call out on the PTE change.  I see no reason the others should not
handle this as well.  Just off the top of my head, I can only think of
the mprotect case needing to special case the more permissive state and
I don't think that changes PTEs at all, merely updates the VMA.

Thanks,
Robin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Robin Holt
On Wed, Feb 20, 2008 at 01:52:06AM +0100, Andrea Arcangeli wrote:
> On Wed, Feb 20, 2008 at 12:04:27AM +0100, Nick Piggin wrote:
> > OK (thanks to Robin as well). Now I understand why you are using it,
> > but I don't understand why you don't defer new TLBs after the point
> > where the linux pte changes. If you can do that, then you look and
> > act much more like a TLB from the point of view of the Linux vm.
> 
> Christoph was forced to put the invalidate_range callback _after_
> dropping the PT lock because xpmem has to wait I/O there. But
> invalidate_range is called after freeing the VM reference on the pages
> so then GRU needed a _range_begin too because GRU has to flush the tlb
> before the VM reference on the page is released (xpmem and KVM pin the
> pages mapped by the secondary mmu, GRU doesn't). So then
> invalidate_range was renamed to invalidate_range_end.

Currently, xpmem blocks faults for the range specified at the _begin
callout, then shoots down remote TLBs and does the put_page for all the
pages in the specified range.  The _end callout merely removes the block.
We do not do any wait for I/O.  By the time we return from the _begin
callout, all activity by the remotes is stopped, pages are dereferenced,
and future faults are blocked until released by the _end callout.

Thanks,
Robin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Andrea Arcangeli
On Wed, Feb 20, 2008 at 12:11:57AM +0100, Nick Piggin wrote:
> Sorry, I realise I still didn't get this through my head yet (and also
> have not seen your patch recently). So I don't know exactly what you
> are doing...

The last version was posted here:

http://marc.info/?l=kvm-devel=120321732521533=2

> But why does _anybody_ (why does Christoph's patches) need to invalidate
> when they are going to be more permissive? This should be done lazily by
> the driver, I would have thought.

This can be done lazily by the driver yes. The place where I've an
invalidate_pages in mprotect however can also become less permissive.
It's simpler to invalidate always and it's not guaranteed the
secondary mmu page fault is capable of refreshing the spte across a
writeprotect fault. In the future this can be changed to
mprotect_pages though, so no page fault will happen in the secondary
mmu.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Andrea Arcangeli
On Wed, Feb 20, 2008 at 12:04:27AM +0100, Nick Piggin wrote:
> On Tue, Feb 19, 2008 at 08:27:25AM -0600, Jack Steiner wrote:
> > > On Tue, Feb 19, 2008 at 02:58:51PM +0100, Andrea Arcangeli wrote:
> > > > understand the need for invalidate_begin/invalidate_end pairs at all.
> > > 
> > > The need of the pairs is crystal clear to me: range_begin is needed
> > > for GRU _but_only_if_ range_end is called after releasing the
> > > reference that the VM holds on the page. _begin will flush the GRU tlb
> > > and at the same time it will take a mutex that will block further GRU
> > > tlb-miss-interrupts (no idea how they manange those nightmare locking,
> > > I didn't even try to add more locking to KVM and I get away with the
> > > fact KVM takes the pin on the page itself).
> > 
> > As it turns out, no actual mutex is required. _begin_ simply increments a
> > count of active range invalidates, _end_ decrements the count. New TLB
> > dropins are deferred while range callouts are active.
> > 
> > This would appear to be racy but the GRU has special hardware that
> > simplifies locking. When the GRU sees a TLB invalidate, all outstanding
> > misses & potentially inflight TLB dropins are marked by the GRU with a
> > "kill" bit. When the dropin finally occurs, the dropin is ignored & the
> > instruction is simply restarted. The instruction will fault again & the TLB
> > dropin will be repeated.  This is optimized for the case where invalidates
> > are rare - true for users of the GRU.
> 
> OK (thanks to Robin as well). Now I understand why you are using it,
> but I don't understand why you don't defer new TLBs after the point
> where the linux pte changes. If you can do that, then you look and
> act much more like a TLB from the point of view of the Linux vm.

Christoph was forced to put the invalidate_range callback _after_
dropping the PT lock because xpmem has to wait I/O there. But
invalidate_range is called after freeing the VM reference on the pages
so then GRU needed a _range_begin too because GRU has to flush the tlb
before the VM reference on the page is released (xpmem and KVM pin the
pages mapped by the secondary mmu, GRU doesn't). So then
invalidate_range was renamed to invalidate_range_end.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Andrea Arcangeli
On Tue, Feb 19, 2008 at 11:59:23PM +0100, Nick Piggin wrote:
> That's why I don't understand the need for the pairs: it should be
> done like this.

Yes, except it can't be done like this for xpmem.

> OK, I didn't see the invalidate_pages call...

See the last patch I posted to Andrew, you've probably looked at the
old patches, the old patches didn't work for GRU and didn't work for
xpmem and they weren't optimized to cluster the invalidates for each
4k-large-pte.

> I thought that could be used by a non-sleeping user (not intending
> to try supporting sleeping users). If it is useless then it should
> go away (BTW. I didn't see your recent patch, some of my confusion
> I think stems from Christoph's novel way of merging and splitting
> patches).

I kept improving my patch in case the VM maintainers would consider
xpmem requirements not workable from a linux-VM point of view, and
they preferred to have something obviously safe, strightforward and
non intrusive, despite it doesn't support the only sleeping user out
there I know of (xpmem). My patch supports KVM and GRU (and any other
not sleeping user).

> > No idea why xpmem needs range_begin, I perfectly understand why GRU
> > needs _begin with Chrisotph's patch (gru lacks the page pin) but I
> > dunno why xpmem needs range_begin (xpmem has the page pin so I also
> > think it could avoid using range_begin). Still to support GRU you need
> > both to call invalidate_range in places that can sleep and you need
> > the external rmap notifier. The moment you add xpmem into the equation
> > your and my clean patches become Christoph's one...
> 
> Sorry, I kind of didn't have time to follow the conversation so well
> before; are there patches posted for gru and/or xpmem?

There's some xpmem code posted but the posted one isn't using the mmu
notifiers yet. GRU code may be available from Jack. I only know for
sure their requirements in terms of mmu notifiers.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Jack Steiner
On Wed, Feb 20, 2008 at 12:11:57AM +0100, Nick Piggin wrote:
> On Tue, Feb 19, 2008 at 02:58:51PM +0100, Andrea Arcangeli wrote:
> > On Tue, Feb 19, 2008 at 09:43:57AM +0100, Nick Piggin wrote:
> > > anything when changing the pte to be _more_ permissive, and I don't
> > 
> > Note that in my patch the invalidate_pages in mprotect can be
> > trivially switched to a mprotect_pages with proper params. This will
> > prevent page faults completely in the secondary MMU (there will only
> > be tlb misses after the tlb flush just like for the core linux pte),
> > and it'll allow all the secondary MMU pte blocks (512/1024 at time
> > with my PT lock design) to be updated to have proper permissions
> > matching the core linux pte.
> 
> Sorry, I realise I still didn't get this through my head yet (and also
> have not seen your patch recently). So I don't know exactly what you
> are doing...
> 
> But why does _anybody_ (why does Christoph's patches) need to invalidate
> when they are going to be more permissive? This should be done lazily by
> the driver, I would have thought.


Agree. Although for most real applications, the performance difference
is probably negligible.

--- jack
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Nick Piggin
On Tue, Feb 19, 2008 at 02:58:51PM +0100, Andrea Arcangeli wrote:
> On Tue, Feb 19, 2008 at 09:43:57AM +0100, Nick Piggin wrote:
> > anything when changing the pte to be _more_ permissive, and I don't
> 
> Note that in my patch the invalidate_pages in mprotect can be
> trivially switched to a mprotect_pages with proper params. This will
> prevent page faults completely in the secondary MMU (there will only
> be tlb misses after the tlb flush just like for the core linux pte),
> and it'll allow all the secondary MMU pte blocks (512/1024 at time
> with my PT lock design) to be updated to have proper permissions
> matching the core linux pte.

Sorry, I realise I still didn't get this through my head yet (and also
have not seen your patch recently). So I don't know exactly what you
are doing...

But why does _anybody_ (why does Christoph's patches) need to invalidate
when they are going to be more permissive? This should be done lazily by
the driver, I would have thought.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Nick Piggin
On Tue, Feb 19, 2008 at 08:27:25AM -0600, Jack Steiner wrote:
> > On Tue, Feb 19, 2008 at 02:58:51PM +0100, Andrea Arcangeli wrote:
> > > understand the need for invalidate_begin/invalidate_end pairs at all.
> > 
> > The need of the pairs is crystal clear to me: range_begin is needed
> > for GRU _but_only_if_ range_end is called after releasing the
> > reference that the VM holds on the page. _begin will flush the GRU tlb
> > and at the same time it will take a mutex that will block further GRU
> > tlb-miss-interrupts (no idea how they manange those nightmare locking,
> > I didn't even try to add more locking to KVM and I get away with the
> > fact KVM takes the pin on the page itself).
> 
> As it turns out, no actual mutex is required. _begin_ simply increments a
> count of active range invalidates, _end_ decrements the count. New TLB
> dropins are deferred while range callouts are active.
> 
> This would appear to be racy but the GRU has special hardware that
> simplifies locking. When the GRU sees a TLB invalidate, all outstanding
> misses & potentially inflight TLB dropins are marked by the GRU with a
> "kill" bit. When the dropin finally occurs, the dropin is ignored & the
> instruction is simply restarted. The instruction will fault again & the TLB
> dropin will be repeated.  This is optimized for the case where invalidates
> are rare - true for users of the GRU.

OK (thanks to Robin as well). Now I understand why you are using it,
but I don't understand why you don't defer new TLBs after the point
where the linux pte changes. If you can do that, then you look and
act much more like a TLB from the point of view of the Linux vm.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Nick Piggin
On Tue, Feb 19, 2008 at 02:58:51PM +0100, Andrea Arcangeli wrote:
> On Tue, Feb 19, 2008 at 09:43:57AM +0100, Nick Piggin wrote:
> > are rather similar. However I have tried to make a point of minimising the
> > impact the the core mm/. I don't see why we need to invalidate or flush
> 
> I also tried hard to minimise the impact of the core mm/, I also
> argued with Christoph that cluttering mm/ wasn't a good idea for
> things like age_page that could be a 1 liner change instead of a
> multiple-liner change, without any loss of flexibility or readability.
> 
> > anything when changing the pte to be _more_ permissive, and I don't
> 
> Note that in my patch the invalidate_pages in mprotect can be
> trivially switched to a mprotect_pages with proper params. This will
> prevent page faults completely in the secondary MMU (there will only
> be tlb misses after the tlb flush just like for the core linux pte),
> and it'll allow all the secondary MMU pte blocks (512/1024 at time
> with my PT lock design) to be updated to have proper permissions
> matching the core linux pte.
> 
> > understand the need for invalidate_begin/invalidate_end pairs at all.
> 
> The need of the pairs is crystal clear to me: range_begin is needed
> for GRU _but_only_if_ range_end is called after releasing the
> reference that the VM holds on the page. _begin will flush the GRU tlb
> and at the same time it will take a mutex that will block further GRU
> tlb-miss-interrupts (no idea how they manange those nightmare locking,
> I didn't even try to add more locking to KVM and I get away with the
> fact KVM takes the pin on the page itself).
> 
> My patch calls invalidate_page/pages before the reference is released
> on the page, so GRU will work fine despite lack of
> range_begin. Furthermore with my patch GRU will be auto-serialized by
> the PT lock w/o the need of any additional locking.

That's why I don't understand the need for the pairs: it should be
done like this.


> > What I have done is basically create it so that the notifiers get called
> > basically in the same place as the normal TLB flushing is done, and nowhere
> > else.
> 
> That was one of my objectives too.
> 
> > I also wanted to avoid calling notifier code from inside eg. hardware TLB
> > or pte manipulation primitives. These things are already pretty well
> > spaghetti, so I'd like to just place them right where needed first... I
> > think eventually it will need a bit of a rethink to make it more consistent
> > and more general. But I prefer to do put them in the caller for the moment.
> 
> Your patch should also work for KVM but it's suboptimal, my patch can
> be orders of magnitude more efficient for GRU thanks to the
> invalidate_pages optimization. Christoph complained about having to
> call one method per pte.

OK, I didn't see the invalidate_pages call...

 
> And adding invalidate_range is useless unless you fully support
> xpmem. You're calling invalidate_range in places that can't sleep...

I thought that could be used by a non-sleeping user (not intending
to try supporting sleeping users). If it is useless then it should
go away (BTW. I didn't see your recent patch, some of my confusion
I think stems from Christoph's novel way of merging and splitting
patches).


> No idea why xpmem needs range_begin, I perfectly understand why GRU
> needs _begin with Chrisotph's patch (gru lacks the page pin) but I
> dunno why xpmem needs range_begin (xpmem has the page pin so I also
> think it could avoid using range_begin). Still to support GRU you need
> both to call invalidate_range in places that can sleep and you need
> the external rmap notifier. The moment you add xpmem into the equation
> your and my clean patches become Christoph's one...

Sorry, I kind of didn't have time to follow the conversation so well
before; are there patches posted for gru and/or xpmem?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Jack Steiner
> On Tue, Feb 19, 2008 at 02:58:51PM +0100, Andrea Arcangeli wrote:
> > understand the need for invalidate_begin/invalidate_end pairs at all.
> 
> The need of the pairs is crystal clear to me: range_begin is needed
> for GRU _but_only_if_ range_end is called after releasing the
> reference that the VM holds on the page. _begin will flush the GRU tlb
> and at the same time it will take a mutex that will block further GRU
> tlb-miss-interrupts (no idea how they manange those nightmare locking,
> I didn't even try to add more locking to KVM and I get away with the
> fact KVM takes the pin on the page itself).

As it turns out, no actual mutex is required. _begin_ simply increments a
count of active range invalidates, _end_ decrements the count. New TLB
dropins are deferred while range callouts are active.

This would appear to be racy but the GRU has special hardware that
simplifies locking. When the GRU sees a TLB invalidate, all outstanding
misses & potentially inflight TLB dropins are marked by the GRU with a
"kill" bit. When the dropin finally occurs, the dropin is ignored & the
instruction is simply restarted. The instruction will fault again & the TLB
dropin will be repeated.  This is optimized for the case where invalidates
are rare - true for users of the GRU.


In general, though, I agree. Most users of mmu_notifiers would likely
required a mutex or something equivalent.


--- jack



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Andrea Arcangeli
On Tue, Feb 19, 2008 at 09:43:57AM +0100, Nick Piggin wrote:
> are rather similar. However I have tried to make a point of minimising the
> impact the the core mm/. I don't see why we need to invalidate or flush

I also tried hard to minimise the impact of the core mm/, I also
argued with Christoph that cluttering mm/ wasn't a good idea for
things like age_page that could be a 1 liner change instead of a
multiple-liner change, without any loss of flexibility or readability.

> anything when changing the pte to be _more_ permissive, and I don't

Note that in my patch the invalidate_pages in mprotect can be
trivially switched to a mprotect_pages with proper params. This will
prevent page faults completely in the secondary MMU (there will only
be tlb misses after the tlb flush just like for the core linux pte),
and it'll allow all the secondary MMU pte blocks (512/1024 at time
with my PT lock design) to be updated to have proper permissions
matching the core linux pte.

> understand the need for invalidate_begin/invalidate_end pairs at all.

The need of the pairs is crystal clear to me: range_begin is needed
for GRU _but_only_if_ range_end is called after releasing the
reference that the VM holds on the page. _begin will flush the GRU tlb
and at the same time it will take a mutex that will block further GRU
tlb-miss-interrupts (no idea how they manange those nightmare locking,
I didn't even try to add more locking to KVM and I get away with the
fact KVM takes the pin on the page itself).

My patch calls invalidate_page/pages before the reference is released
on the page, so GRU will work fine despite lack of
range_begin. Furthermore with my patch GRU will be auto-serialized by
the PT lock w/o the need of any additional locking.

> What I have done is basically create it so that the notifiers get called
> basically in the same place as the normal TLB flushing is done, and nowhere
> else.

That was one of my objectives too.

> I also wanted to avoid calling notifier code from inside eg. hardware TLB
> or pte manipulation primitives. These things are already pretty well
> spaghetti, so I'd like to just place them right where needed first... I
> think eventually it will need a bit of a rethink to make it more consistent
> and more general. But I prefer to do put them in the caller for the moment.

Your patch should also work for KVM but it's suboptimal, my patch can
be orders of magnitude more efficient for GRU thanks to the
invalidate_pages optimization. Christoph complained about having to
call one method per pte.

And adding invalidate_range is useless unless you fully support
xpmem. You're calling invalidate_range in places that can't sleep...

No idea why xpmem needs range_begin, I perfectly understand why GRU
needs _begin with Chrisotph's patch (gru lacks the page pin) but I
dunno why xpmem needs range_begin (xpmem has the page pin so I also
think it could avoid using range_begin). Still to support GRU you need
both to call invalidate_range in places that can sleep and you need
the external rmap notifier. The moment you add xpmem into the equation
your and my clean patches become Christoph's one...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Robin Holt
On Tue, Feb 19, 2008 at 09:43:57AM +0100, Nick Piggin wrote:
> So I implemented mmu notifiers slightly differently. Andrea's mmu notifiers
> are rather similar. However I have tried to make a point of minimising the
> impact the the core mm/. I don't see why we need to invalidate or flush
> anything when changing the pte to be _more_ permissive, and I don't
> understand the need for invalidate_begin/invalidate_end pairs at all.
> What I have done is basically create it so that the notifiers get called
> basically in the same place as the normal TLB flushing is done, and nowhere
> else.

Because XPMEM needs to be able to sleep during its callout.  For that,
we need to move this outside of the page table lock and suddenly we
need the begin/end pair again.  There was considerable discussion about
this exact point numerous times.  We tried to develop the most inclusive
design possible.  Our design would even be extendable to IB, assuming they
made some very disruptive changes to their MPI and communication libraries.
IB would suffer the same problems XPMEM does in that the TLB entries
need to be removed on a remote host which is operating completely
independently.

Thanks,
Robin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Robin Holt
On Tue, Feb 19, 2008 at 09:43:57AM +0100, Nick Piggin wrote:
 So I implemented mmu notifiers slightly differently. Andrea's mmu notifiers
 are rather similar. However I have tried to make a point of minimising the
 impact the the core mm/. I don't see why we need to invalidate or flush
 anything when changing the pte to be _more_ permissive, and I don't
 understand the need for invalidate_begin/invalidate_end pairs at all.
 What I have done is basically create it so that the notifiers get called
 basically in the same place as the normal TLB flushing is done, and nowhere
 else.

Because XPMEM needs to be able to sleep during its callout.  For that,
we need to move this outside of the page table lock and suddenly we
need the begin/end pair again.  There was considerable discussion about
this exact point numerous times.  We tried to develop the most inclusive
design possible.  Our design would even be extendable to IB, assuming they
made some very disruptive changes to their MPI and communication libraries.
IB would suffer the same problems XPMEM does in that the TLB entries
need to be removed on a remote host which is operating completely
independently.

Thanks,
Robin
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Andrea Arcangeli
On Tue, Feb 19, 2008 at 09:43:57AM +0100, Nick Piggin wrote:
 are rather similar. However I have tried to make a point of minimising the
 impact the the core mm/. I don't see why we need to invalidate or flush

I also tried hard to minimise the impact of the core mm/, I also
argued with Christoph that cluttering mm/ wasn't a good idea for
things like age_page that could be a 1 liner change instead of a
multiple-liner change, without any loss of flexibility or readability.

 anything when changing the pte to be _more_ permissive, and I don't

Note that in my patch the invalidate_pages in mprotect can be
trivially switched to a mprotect_pages with proper params. This will
prevent page faults completely in the secondary MMU (there will only
be tlb misses after the tlb flush just like for the core linux pte),
and it'll allow all the secondary MMU pte blocks (512/1024 at time
with my PT lock design) to be updated to have proper permissions
matching the core linux pte.

 understand the need for invalidate_begin/invalidate_end pairs at all.

The need of the pairs is crystal clear to me: range_begin is needed
for GRU _but_only_if_ range_end is called after releasing the
reference that the VM holds on the page. _begin will flush the GRU tlb
and at the same time it will take a mutex that will block further GRU
tlb-miss-interrupts (no idea how they manange those nightmare locking,
I didn't even try to add more locking to KVM and I get away with the
fact KVM takes the pin on the page itself).

My patch calls invalidate_page/pages before the reference is released
on the page, so GRU will work fine despite lack of
range_begin. Furthermore with my patch GRU will be auto-serialized by
the PT lock w/o the need of any additional locking.

 What I have done is basically create it so that the notifiers get called
 basically in the same place as the normal TLB flushing is done, and nowhere
 else.

That was one of my objectives too.

 I also wanted to avoid calling notifier code from inside eg. hardware TLB
 or pte manipulation primitives. These things are already pretty well
 spaghetti, so I'd like to just place them right where needed first... I
 think eventually it will need a bit of a rethink to make it more consistent
 and more general. But I prefer to do put them in the caller for the moment.

Your patch should also work for KVM but it's suboptimal, my patch can
be orders of magnitude more efficient for GRU thanks to the
invalidate_pages optimization. Christoph complained about having to
call one method per pte.

And adding invalidate_range is useless unless you fully support
xpmem. You're calling invalidate_range in places that can't sleep...

No idea why xpmem needs range_begin, I perfectly understand why GRU
needs _begin with Chrisotph's patch (gru lacks the page pin) but I
dunno why xpmem needs range_begin (xpmem has the page pin so I also
think it could avoid using range_begin). Still to support GRU you need
both to call invalidate_range in places that can sleep and you need
the external rmap notifier. The moment you add xpmem into the equation
your and my clean patches become Christoph's one...
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Jack Steiner
 On Tue, Feb 19, 2008 at 02:58:51PM +0100, Andrea Arcangeli wrote:
  understand the need for invalidate_begin/invalidate_end pairs at all.
 
 The need of the pairs is crystal clear to me: range_begin is needed
 for GRU _but_only_if_ range_end is called after releasing the
 reference that the VM holds on the page. _begin will flush the GRU tlb
 and at the same time it will take a mutex that will block further GRU
 tlb-miss-interrupts (no idea how they manange those nightmare locking,
 I didn't even try to add more locking to KVM and I get away with the
 fact KVM takes the pin on the page itself).

As it turns out, no actual mutex is required. _begin_ simply increments a
count of active range invalidates, _end_ decrements the count. New TLB
dropins are deferred while range callouts are active.

This would appear to be racy but the GRU has special hardware that
simplifies locking. When the GRU sees a TLB invalidate, all outstanding
misses  potentially inflight TLB dropins are marked by the GRU with a
kill bit. When the dropin finally occurs, the dropin is ignored  the
instruction is simply restarted. The instruction will fault again  the TLB
dropin will be repeated.  This is optimized for the case where invalidates
are rare - true for users of the GRU.


In general, though, I agree. Most users of mmu_notifiers would likely
required a mutex or something equivalent.


--- jack



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Nick Piggin
On Tue, Feb 19, 2008 at 02:58:51PM +0100, Andrea Arcangeli wrote:
 On Tue, Feb 19, 2008 at 09:43:57AM +0100, Nick Piggin wrote:
  are rather similar. However I have tried to make a point of minimising the
  impact the the core mm/. I don't see why we need to invalidate or flush
 
 I also tried hard to minimise the impact of the core mm/, I also
 argued with Christoph that cluttering mm/ wasn't a good idea for
 things like age_page that could be a 1 liner change instead of a
 multiple-liner change, without any loss of flexibility or readability.
 
  anything when changing the pte to be _more_ permissive, and I don't
 
 Note that in my patch the invalidate_pages in mprotect can be
 trivially switched to a mprotect_pages with proper params. This will
 prevent page faults completely in the secondary MMU (there will only
 be tlb misses after the tlb flush just like for the core linux pte),
 and it'll allow all the secondary MMU pte blocks (512/1024 at time
 with my PT lock design) to be updated to have proper permissions
 matching the core linux pte.
 
  understand the need for invalidate_begin/invalidate_end pairs at all.
 
 The need of the pairs is crystal clear to me: range_begin is needed
 for GRU _but_only_if_ range_end is called after releasing the
 reference that the VM holds on the page. _begin will flush the GRU tlb
 and at the same time it will take a mutex that will block further GRU
 tlb-miss-interrupts (no idea how they manange those nightmare locking,
 I didn't even try to add more locking to KVM and I get away with the
 fact KVM takes the pin on the page itself).
 
 My patch calls invalidate_page/pages before the reference is released
 on the page, so GRU will work fine despite lack of
 range_begin. Furthermore with my patch GRU will be auto-serialized by
 the PT lock w/o the need of any additional locking.

That's why I don't understand the need for the pairs: it should be
done like this.


  What I have done is basically create it so that the notifiers get called
  basically in the same place as the normal TLB flushing is done, and nowhere
  else.
 
 That was one of my objectives too.
 
  I also wanted to avoid calling notifier code from inside eg. hardware TLB
  or pte manipulation primitives. These things are already pretty well
  spaghetti, so I'd like to just place them right where needed first... I
  think eventually it will need a bit of a rethink to make it more consistent
  and more general. But I prefer to do put them in the caller for the moment.
 
 Your patch should also work for KVM but it's suboptimal, my patch can
 be orders of magnitude more efficient for GRU thanks to the
 invalidate_pages optimization. Christoph complained about having to
 call one method per pte.

OK, I didn't see the invalidate_pages call...

 
 And adding invalidate_range is useless unless you fully support
 xpmem. You're calling invalidate_range in places that can't sleep...

I thought that could be used by a non-sleeping user (not intending
to try supporting sleeping users). If it is useless then it should
go away (BTW. I didn't see your recent patch, some of my confusion
I think stems from Christoph's novel way of merging and splitting
patches).


 No idea why xpmem needs range_begin, I perfectly understand why GRU
 needs _begin with Chrisotph's patch (gru lacks the page pin) but I
 dunno why xpmem needs range_begin (xpmem has the page pin so I also
 think it could avoid using range_begin). Still to support GRU you need
 both to call invalidate_range in places that can sleep and you need
 the external rmap notifier. The moment you add xpmem into the equation
 your and my clean patches become Christoph's one...

Sorry, I kind of didn't have time to follow the conversation so well
before; are there patches posted for gru and/or xpmem?

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Nick Piggin
On Tue, Feb 19, 2008 at 08:27:25AM -0600, Jack Steiner wrote:
  On Tue, Feb 19, 2008 at 02:58:51PM +0100, Andrea Arcangeli wrote:
   understand the need for invalidate_begin/invalidate_end pairs at all.
  
  The need of the pairs is crystal clear to me: range_begin is needed
  for GRU _but_only_if_ range_end is called after releasing the
  reference that the VM holds on the page. _begin will flush the GRU tlb
  and at the same time it will take a mutex that will block further GRU
  tlb-miss-interrupts (no idea how they manange those nightmare locking,
  I didn't even try to add more locking to KVM and I get away with the
  fact KVM takes the pin on the page itself).
 
 As it turns out, no actual mutex is required. _begin_ simply increments a
 count of active range invalidates, _end_ decrements the count. New TLB
 dropins are deferred while range callouts are active.
 
 This would appear to be racy but the GRU has special hardware that
 simplifies locking. When the GRU sees a TLB invalidate, all outstanding
 misses  potentially inflight TLB dropins are marked by the GRU with a
 kill bit. When the dropin finally occurs, the dropin is ignored  the
 instruction is simply restarted. The instruction will fault again  the TLB
 dropin will be repeated.  This is optimized for the case where invalidates
 are rare - true for users of the GRU.

OK (thanks to Robin as well). Now I understand why you are using it,
but I don't understand why you don't defer new TLBs after the point
where the linux pte changes. If you can do that, then you look and
act much more like a TLB from the point of view of the Linux vm.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Nick Piggin
On Tue, Feb 19, 2008 at 02:58:51PM +0100, Andrea Arcangeli wrote:
 On Tue, Feb 19, 2008 at 09:43:57AM +0100, Nick Piggin wrote:
  anything when changing the pte to be _more_ permissive, and I don't
 
 Note that in my patch the invalidate_pages in mprotect can be
 trivially switched to a mprotect_pages with proper params. This will
 prevent page faults completely in the secondary MMU (there will only
 be tlb misses after the tlb flush just like for the core linux pte),
 and it'll allow all the secondary MMU pte blocks (512/1024 at time
 with my PT lock design) to be updated to have proper permissions
 matching the core linux pte.

Sorry, I realise I still didn't get this through my head yet (and also
have not seen your patch recently). So I don't know exactly what you
are doing...

But why does _anybody_ (why does Christoph's patches) need to invalidate
when they are going to be more permissive? This should be done lazily by
the driver, I would have thought.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Jack Steiner
On Wed, Feb 20, 2008 at 12:11:57AM +0100, Nick Piggin wrote:
 On Tue, Feb 19, 2008 at 02:58:51PM +0100, Andrea Arcangeli wrote:
  On Tue, Feb 19, 2008 at 09:43:57AM +0100, Nick Piggin wrote:
   anything when changing the pte to be _more_ permissive, and I don't
  
  Note that in my patch the invalidate_pages in mprotect can be
  trivially switched to a mprotect_pages with proper params. This will
  prevent page faults completely in the secondary MMU (there will only
  be tlb misses after the tlb flush just like for the core linux pte),
  and it'll allow all the secondary MMU pte blocks (512/1024 at time
  with my PT lock design) to be updated to have proper permissions
  matching the core linux pte.
 
 Sorry, I realise I still didn't get this through my head yet (and also
 have not seen your patch recently). So I don't know exactly what you
 are doing...
 
 But why does _anybody_ (why does Christoph's patches) need to invalidate
 when they are going to be more permissive? This should be done lazily by
 the driver, I would have thought.


Agree. Although for most real applications, the performance difference
is probably negligible.

--- jack
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Andrea Arcangeli
On Tue, Feb 19, 2008 at 11:59:23PM +0100, Nick Piggin wrote:
 That's why I don't understand the need for the pairs: it should be
 done like this.

Yes, except it can't be done like this for xpmem.

 OK, I didn't see the invalidate_pages call...

See the last patch I posted to Andrew, you've probably looked at the
old patches, the old patches didn't work for GRU and didn't work for
xpmem and they weren't optimized to cluster the invalidates for each
4k-large-pte.

 I thought that could be used by a non-sleeping user (not intending
 to try supporting sleeping users). If it is useless then it should
 go away (BTW. I didn't see your recent patch, some of my confusion
 I think stems from Christoph's novel way of merging and splitting
 patches).

I kept improving my patch in case the VM maintainers would consider
xpmem requirements not workable from a linux-VM point of view, and
they preferred to have something obviously safe, strightforward and
non intrusive, despite it doesn't support the only sleeping user out
there I know of (xpmem). My patch supports KVM and GRU (and any other
not sleeping user).

  No idea why xpmem needs range_begin, I perfectly understand why GRU
  needs _begin with Chrisotph's patch (gru lacks the page pin) but I
  dunno why xpmem needs range_begin (xpmem has the page pin so I also
  think it could avoid using range_begin). Still to support GRU you need
  both to call invalidate_range in places that can sleep and you need
  the external rmap notifier. The moment you add xpmem into the equation
  your and my clean patches become Christoph's one...
 
 Sorry, I kind of didn't have time to follow the conversation so well
 before; are there patches posted for gru and/or xpmem?

There's some xpmem code posted but the posted one isn't using the mmu
notifiers yet. GRU code may be available from Jack. I only know for
sure their requirements in terms of mmu notifiers.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Andrea Arcangeli
On Wed, Feb 20, 2008 at 12:04:27AM +0100, Nick Piggin wrote:
 On Tue, Feb 19, 2008 at 08:27:25AM -0600, Jack Steiner wrote:
   On Tue, Feb 19, 2008 at 02:58:51PM +0100, Andrea Arcangeli wrote:
understand the need for invalidate_begin/invalidate_end pairs at all.
   
   The need of the pairs is crystal clear to me: range_begin is needed
   for GRU _but_only_if_ range_end is called after releasing the
   reference that the VM holds on the page. _begin will flush the GRU tlb
   and at the same time it will take a mutex that will block further GRU
   tlb-miss-interrupts (no idea how they manange those nightmare locking,
   I didn't even try to add more locking to KVM and I get away with the
   fact KVM takes the pin on the page itself).
  
  As it turns out, no actual mutex is required. _begin_ simply increments a
  count of active range invalidates, _end_ decrements the count. New TLB
  dropins are deferred while range callouts are active.
  
  This would appear to be racy but the GRU has special hardware that
  simplifies locking. When the GRU sees a TLB invalidate, all outstanding
  misses  potentially inflight TLB dropins are marked by the GRU with a
  kill bit. When the dropin finally occurs, the dropin is ignored  the
  instruction is simply restarted. The instruction will fault again  the TLB
  dropin will be repeated.  This is optimized for the case where invalidates
  are rare - true for users of the GRU.
 
 OK (thanks to Robin as well). Now I understand why you are using it,
 but I don't understand why you don't defer new TLBs after the point
 where the linux pte changes. If you can do that, then you look and
 act much more like a TLB from the point of view of the Linux vm.

Christoph was forced to put the invalidate_range callback _after_
dropping the PT lock because xpmem has to wait I/O there. But
invalidate_range is called after freeing the VM reference on the pages
so then GRU needed a _range_begin too because GRU has to flush the tlb
before the VM reference on the page is released (xpmem and KVM pin the
pages mapped by the secondary mmu, GRU doesn't). So then
invalidate_range was renamed to invalidate_range_end.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Andrea Arcangeli
On Wed, Feb 20, 2008 at 12:11:57AM +0100, Nick Piggin wrote:
 Sorry, I realise I still didn't get this through my head yet (and also
 have not seen your patch recently). So I don't know exactly what you
 are doing...

The last version was posted here:

http://marc.info/?l=kvm-develm=120321732521533w=2

 But why does _anybody_ (why does Christoph's patches) need to invalidate
 when they are going to be more permissive? This should be done lazily by
 the driver, I would have thought.

This can be done lazily by the driver yes. The place where I've an
invalidate_pages in mprotect however can also become less permissive.
It's simpler to invalidate always and it's not guaranteed the
secondary mmu page fault is capable of refreshing the spte across a
writeprotect fault. In the future this can be changed to
mprotect_pages though, so no page fault will happen in the secondary
mmu.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Robin Holt
On Wed, Feb 20, 2008 at 01:52:06AM +0100, Andrea Arcangeli wrote:
 On Wed, Feb 20, 2008 at 12:04:27AM +0100, Nick Piggin wrote:
  OK (thanks to Robin as well). Now I understand why you are using it,
  but I don't understand why you don't defer new TLBs after the point
  where the linux pte changes. If you can do that, then you look and
  act much more like a TLB from the point of view of the Linux vm.
 
 Christoph was forced to put the invalidate_range callback _after_
 dropping the PT lock because xpmem has to wait I/O there. But
 invalidate_range is called after freeing the VM reference on the pages
 so then GRU needed a _range_begin too because GRU has to flush the tlb
 before the VM reference on the page is released (xpmem and KVM pin the
 pages mapped by the secondary mmu, GRU doesn't). So then
 invalidate_range was renamed to invalidate_range_end.

Currently, xpmem blocks faults for the range specified at the _begin
callout, then shoots down remote TLBs and does the put_page for all the
pages in the specified range.  The _end callout merely removes the block.
We do not do any wait for I/O.  By the time we return from the _begin
callout, all activity by the remotes is stopped, pages are dereferenced,
and future faults are blocked until released by the _end callout.

Thanks,
Robin
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] my mmu notifiers

2008-02-19 Thread Robin Holt
On Wed, Feb 20, 2008 at 12:11:57AM +0100, Nick Piggin wrote:
 On Tue, Feb 19, 2008 at 02:58:51PM +0100, Andrea Arcangeli wrote:
  On Tue, Feb 19, 2008 at 09:43:57AM +0100, Nick Piggin wrote:
   anything when changing the pte to be _more_ permissive, and I don't
  
  Note that in my patch the invalidate_pages in mprotect can be
  trivially switched to a mprotect_pages with proper params. This will
  prevent page faults completely in the secondary MMU (there will only
  be tlb misses after the tlb flush just like for the core linux pte),
  and it'll allow all the secondary MMU pte blocks (512/1024 at time
  with my PT lock design) to be updated to have proper permissions
  matching the core linux pte.
 
 Sorry, I realise I still didn't get this through my head yet (and also
 have not seen your patch recently). So I don't know exactly what you
 are doing...
 
 But why does _anybody_ (why does Christoph's patches) need to invalidate
 when they are going to be more permissive? This should be done lazily by
 the driver, I would have thought.

I don't believe it should, but it probably does right now.  I do know
the case where a write fault where there is no need for a COW does
not call out on the PTE change.  I see no reason the others should not
handle this as well.  Just off the top of my head, I can only think of
the mprotect case needing to special case the more permissive state and
I don't think that changes PTEs at all, merely updates the VMA.

Thanks,
Robin
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/