Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-23 Thread Michael S. Tsirkin
On Wed, Aug 22, 2012 at 11:19:04PM -0300, Rafael Aquini wrote:
 On Wed, Aug 22, 2012 at 12:33:17PM +0300, Michael S. Tsirkin wrote:
  Hmm, so this will busy wait which is unelegant.
  We need some event IMO.
 
 No, it does not busy wait. leak_balloon() is mutual exclusive with migration
 steps, so for the case we have one racing against the other, we really want
 leak_balloon() dropping the mutex temporarily to allow migration complete its
 work of refilling vb-pages list. Also, leak_balloon() calls tell_host(), 
 which
 will potentially make it to schedule for each round of vb-pfns leak_balloon()
 will release.

tell_host might not even cause an exit to host. Even if it does
it does not involve guest scheduler.

 So, when remove_common() calls leak_balloon() looping on
 vb-num_pages, that won't become a tight loop. 
 The scheme was apparently working before this series, and it will remain 
 working
 after it.

It seems that before we would always leak all requested memory
in one go. I can't tell why we have a while loop there at all.
Rusty, could you clarify please?

 
  Also, reading num_pages without a lock here
  which seems wrong.
 
 I'll protect it with vb-balloon_lock mutex. That will be consistent with the
 lock protection scheme this patch is introducing for struct virtio_balloon
 elements.
 
 
  A similar concern applies to normal leaking
  of the balloon: here we might leak less than
  required, then wait for the next config change
  event.
 
 Just as before, same thing here. If you leaked less than required, balloon()
 will keep calling leak_balloon() until the balloon target is reached. This
 scheme was working before, and it will keep working after this patch.


IIUC we never hit this path before.
 
  How about we signal config_change
  event when pages are back to pages_list?
 
 I really don't know what to tell you here, but, to me, it seems like an
 overcomplication that isn't directly entangled with this patch purposes.
 Besides, you cannot expect compation / migration happening and racing against
 leak_balloon() all the time to make them signal events to the later, so we 
 might
 just be creating a wait-forever condition for leak_balloon(), IMHO.

So use wait_event or similar, check for existance of isolated pages.

 Cheers!
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-23 Thread Rafael Aquini
On Thu, Aug 23, 2012 at 01:01:07PM +0300, Michael S. Tsirkin wrote:
  So, when remove_common() calls leak_balloon() looping on
  vb-num_pages, that won't become a tight loop. 
  The scheme was apparently working before this series, and it will remain 
  working
  after it.
 
 It seems that before we would always leak all requested memory
 in one go. I can't tell why we have a while loop there at all.
 Rusty, could you clarify please?


It seems that your claim isn't right. leak_balloon() cannot do it all at once,
as for each round it only releases 256 pages, at most; and the 'one go' would
require a couple of loop rounds at remove_common().
So, nothing has changed here.

 
  Just as before, same thing here. If you leaked less than required, balloon()
  will keep calling leak_balloon() until the balloon target is reached. This
  scheme was working before, and it will keep working after this patch.
 
 
 IIUC we never hit this path before.
  
So, how does balloon() works then?


   How about we signal config_change
   event when pages are back to pages_list?
  
  I really don't know what to tell you here, but, to me, it seems like an
  overcomplication that isn't directly entangled with this patch purposes.
  Besides, you cannot expect compation / migration happening and racing 
  against
  leak_balloon() all the time to make them signal events to the later, so we 
  might
  just be creating a wait-forever condition for leak_balloon(), IMHO.
 
 So use wait_event or similar, check for existance of isolated pages.
 

The thing here is expecting compaction as being an external event to signal
actions to the balloon driver won't work as you desire. Also, as far as the
balloon driver is concerned, it's only a matter of time to accomplish a total,
or partial, balloon leak, even when we have some pages isolated from balloon's
page list.

IMHO, you're attempting to complicate a simple thing that is already working
well. As said before, there are no guarantees you'll have isolated pages 
by the time you're leaking the balloon, so you might leave it waiting forever
on something that will not happen. And if there are isolated pages while balloon
is leaking, they'll have their chance to get back to the list before the device
finishes its leaking job.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-23 Thread Michael S. Tsirkin
On Thu, Aug 23, 2012 at 09:13:39AM -0300, Rafael Aquini wrote:
 On Thu, Aug 23, 2012 at 01:01:07PM +0300, Michael S. Tsirkin wrote:
   So, when remove_common() calls leak_balloon() looping on
   vb-num_pages, that won't become a tight loop. 
   The scheme was apparently working before this series, and it will remain 
   working
   after it.
  
  It seems that before we would always leak all requested memory
  in one go. I can't tell why we have a while loop there at all.
  Rusty, could you clarify please?
 
 
 It seems that your claim isn't right. leak_balloon() cannot do it all at once,
 as for each round it only releases 256 pages, at most; and the 'one go' would
 require a couple of loop rounds at remove_common().

You are right in this respect.

 So, nothing has changed here.

Yes, your patch does change things:
leak_balloon now might return without freeing any pages.
In that case we will not be making any progress, and just
spin, pinning CPU.

  
   Just as before, same thing here. If you leaked less than required, 
   balloon()
   will keep calling leak_balloon() until the balloon target is reached. This
   scheme was working before, and it will keep working after this patch.
  
  
  IIUC we never hit this path before.
   
 So, how does balloon() works then?
 

It gets a request to leak a given number of pages
and executes it, then tells host that it is done.
It never needs to spin busy-waiting on a CPU for this.

How about we signal config_change
event when pages are back to pages_list?
   
   I really don't know what to tell you here, but, to me, it seems like an
   overcomplication that isn't directly entangled with this patch purposes.
   Besides, you cannot expect compation / migration happening and racing 
   against
   leak_balloon() all the time to make them signal events to the later, so 
   we might
   just be creating a wait-forever condition for leak_balloon(), IMHO.
  
  So use wait_event or similar, check for existance of isolated pages.
  
 
 The thing here is expecting compaction as being an external event to signal
 actions to the balloon driver won't work as you desire. Also, as far as the
 balloon driver is concerned, it's only a matter of time to accomplish a total,
 or partial, balloon leak, even when we have some pages isolated from balloon's
 page list.
 
 IMHO, you're attempting to complicate a simple thing that is already working
 well. As said before, there are no guarantees you'll have isolated pages 
 by the time you're leaking the balloon, so you might leave it waiting forever
 on something that will not happen. And if there are isolated pages while 
 balloon
 is leaking, they'll have their chance to get back to the list before the 
 device
 finishes its leaking job.

Well busy wait pinning CPU is ugly.  Instead we should block thread and
wake it up when done.  I don't mind how we fix it specifically.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-23 Thread Rafael Aquini
On Thu, Aug 23, 2012 at 03:34:32PM +0300, Michael S. Tsirkin wrote:
  So, nothing has changed here.
 
 Yes, your patch does change things:
 leak_balloon now might return without freeing any pages.
 In that case we will not be making any progress, and just
 spin, pinning CPU.

That's a transitory condition, that migh happen if leak_balloon() takes place
when compaction, or migration are under their way and it might only affects the
module unload case. Also it won't pin CPU because it keeps releasing the locks
it grabs, as it loops. So, we are locubrating about rarities, IMHO. 

 
   
Just as before, same thing here. If you leaked less than required, 
balloon()
will keep calling leak_balloon() until the balloon target is reached. 
This
scheme was working before, and it will keep working after this patch.
   
   
   IIUC we never hit this path before.

  So, how does balloon() works then?
  
 
 It gets a request to leak a given number of pages
 and executes it, then tells host that it is done.
 It never needs to spin busy-waiting on a CPU for this.


So, what this patch changes for the ordinary leak_balloon() case?

 
 Well busy wait pinning CPU is ugly.  Instead we should block thread and
 wake it up when done.  I don't mind how we fix it specifically.


I already told you that we do not do that by any mean introduced by this patch.
You're just being stubborn here. If those bits are broken, they were already
broken before I did come up with this proposal.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-23 Thread Michael S. Tsirkin
On Thu, Aug 23, 2012 at 10:06:07AM -0300, Rafael Aquini wrote:
 On Thu, Aug 23, 2012 at 03:34:32PM +0300, Michael S. Tsirkin wrote:
   So, nothing has changed here.
  
  Yes, your patch does change things:
  leak_balloon now might return without freeing any pages.
  In that case we will not be making any progress, and just
  spin, pinning CPU.
 
 That's a transitory condition, that migh happen if leak_balloon() takes place
 when compaction, or migration are under their way and it might only affects 
 the
 module unload case.

Regular operation seems even more broken: host might ask
you to leak memory but because it is under compaction
you might leak nothing. No?

 Also it won't pin CPU because it keeps releasing the locks
 it grabs, as it loops.

What has releazing locks have to do with it?

 So, we are locubrating about rarities, IMHO. 
  

 Just as before, same thing here. If you leaked less than required, 
 balloon()
 will keep calling leak_balloon() until the balloon target is reached. 
 This
 scheme was working before, and it will keep working after this patch.


IIUC we never hit this path before.
 
   So, how does balloon() works then?
   
  
  It gets a request to leak a given number of pages
  and executes it, then tells host that it is done.
  It never needs to spin busy-waiting on a CPU for this.
 
 
 So, what this patch changes for the ordinary leak_balloon() case?
 

That not all pages used by balloon are on vb-pages list.

  Well busy wait pinning CPU is ugly.  Instead we should block thread and
  wake it up when done.  I don't mind how we fix it specifically.
 
 
 I already told you that we do not do that by any mean introduced by this 
 patch.
 You're just being stubborn here. If those bits are broken, they were already
 broken before I did come up with this proposal.

Sorry you don't address the points I am making.  Maybe there are no
bugs. But it looks like there are.  And assuming I am just seeing things
this just means patch needs more comments, in commit log and in
code to explain the design so that it stops looking like that.

Basically it was very simple: we assumed page-lru was never
touched for an allocated page, so it's safe to use it for
internal book-keeping by the driver.

Now, this is not the case anymore, you add some logic in mm/ that might
or might not touch page-lru depending on things like reference count.
And you are asking why things break even though you change very little
in balloon itself?  Because the interface between balloon and mm is now
big, fragile and largely undocumented.

Another strangeness I just noticed: if we ever do an extra get_page in
balloon, compaction logic in mm will break, yes?  But one expects to be
able to do get_page after alloc_page without ill effects
as long as one does put_page before free.

Just a thought: maybe it is cleaner to move all balloon page tracking
into mm/?  Implement alloc_balloon/free_balloon with methods to fill and
leak pages, and callbacks to invoke when done.  This should be good for
other hypervisors too. If you like this idea, I can even try to help out
by refactoring current code in this way, so that you can build on it.
But this is just a thought, not a must.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-23 Thread Rafael Aquini
On Thu, Aug 23, 2012 at 04:53:29PM +0300, Michael S. Tsirkin wrote:
 On Thu, Aug 23, 2012 at 10:06:07AM -0300, Rafael Aquini wrote:
  On Thu, Aug 23, 2012 at 03:34:32PM +0300, Michael S. Tsirkin wrote:
So, nothing has changed here.
   
   Yes, your patch does change things:
   leak_balloon now might return without freeing any pages.
   In that case we will not be making any progress, and just
   spin, pinning CPU.
  
  That's a transitory condition, that migh happen if leak_balloon() takes 
  place
  when compaction, or migration are under their way and it might only affects 
  the
  module unload case.
 
 Regular operation seems even more broken: host might ask
 you to leak memory but because it is under compaction
 you might leak nothing. No?


And that is exactely what it wants to do. If there is (temporarily) nothing to 
leak,
then not leaking is the only sane thing to do. Having balloon pages being 
migrated
does not break the leak at all, despite it can last a little longer.

 
 
  
  I already told you that we do not do that by any mean introduced by this 
  patch.
  You're just being stubborn here. If those bits are broken, they were already
  broken before I did come up with this proposal.
 
 Sorry you don't address the points I am making.  Maybe there are no
 bugs. But it looks like there are.  And assuming I am just seeing things
 this just means patch needs more comments, in commit log and in
 code to explain the design so that it stops looking like that.


Yes, I belive you're biased here.


 Basically it was very simple: we assumed page-lru was never
 touched for an allocated page, so it's safe to use it for
 internal book-keeping by the driver.

 Now, this is not the case anymore, you add some logic in mm/ that might
 or might not touch page-lru depending on things like reference count.
 And you are asking why things break even though you change very little
 in balloon itself?  Because the interface between balloon and mm is now
 big, fragile and largely undocumented.
 

The driver don't use page-lru as its bookeeping at all, it uses
vb-num_pages instead. 


 Another strangeness I just noticed: if we ever do an extra get_page in
 balloon, compaction logic in mm will break, yes?  But one expects to be
 able to do get_page after alloc_page without ill effects
 as long as one does put_page before free.


You can do it (bump up the balloon page refcount), and it will only prevent
balloon pages from being isolated and migrated, thus reducing the effectiveness 
of
defragmenting memory when balloon pages are present, just like it happens today.

It really doesn't seems the case of virtio_balloon driver, or any other driver,
which allocates pages directly from buddy to keep raising the page refcount,
though.
 

 Just a thought: maybe it is cleaner to move all balloon page tracking
 into mm/?  Implement alloc_balloon/free_balloon with methods to fill and
 leak pages, and callbacks to invoke when done.  This should be good for
 other hypervisors too. If you like this idea, I can even try to help out
 by refactoring current code in this way, so that you can build on it.
 But this is just a thought, not a must.


That seems to be a good thought to be on a future enhancements wish-list, for 
sure.
We can start thinking of it, and I surely would be more than happy on be doing
it along with you. But I don't think not having it right away is a dealbreaker
for this proposal, as is.

I'm not against your thoughts, and I'm really glad that you're providing such
good dicussion over this subject, but, now I'll wait for Rusty thoughts on 
this one question.

Cheers!
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-23 Thread Michael S. Tsirkin
On Thu, Aug 23, 2012 at 12:21:29PM -0300, Rafael Aquini wrote:
 On Thu, Aug 23, 2012 at 04:53:29PM +0300, Michael S. Tsirkin wrote:
  On Thu, Aug 23, 2012 at 10:06:07AM -0300, Rafael Aquini wrote:
   On Thu, Aug 23, 2012 at 03:34:32PM +0300, Michael S. Tsirkin wrote:
 So, nothing has changed here.

Yes, your patch does change things:
leak_balloon now might return without freeing any pages.
In that case we will not be making any progress, and just
spin, pinning CPU.
   
   That's a transitory condition, that migh happen if leak_balloon() takes 
   place
   when compaction, or migration are under their way and it might only 
   affects the
   module unload case.
  
  Regular operation seems even more broken: host might ask
  you to leak memory but because it is under compaction
  you might leak nothing. No?
 
 
 And that is exactely what it wants to do. If there is (temporarily) nothing 
 to leak,
 then not leaking is the only sane thing to do.

It's an internal issue between balloon and mm. User does not care.

 Having balloon pages being migrated
 does not break the leak at all, despite it can last a little longer.
 

Not longer - apparently forever unless user resend the leak command.
It's wrong - it should
1. not tell host if nothing was done
2. after migration finished leak and tell host

   
   I already told you that we do not do that by any mean introduced by this 
   patch.
   You're just being stubborn here. If those bits are broken, they were 
   already
   broken before I did come up with this proposal.
  
  Sorry you don't address the points I am making.  Maybe there are no
  bugs. But it looks like there are.  And assuming I am just seeing things
  this just means patch needs more comments, in commit log and in
  code to explain the design so that it stops looking like that.
 
 
 Yes, I belive you're biased here.
 
 
  Basically it was very simple: we assumed page-lru was never
  touched for an allocated page, so it's safe to use it for
  internal book-keeping by the driver.
 
  Now, this is not the case anymore, you add some logic in mm/ that might
  or might not touch page-lru depending on things like reference count.
  And you are asking why things break even though you change very little
  in balloon itself?  Because the interface between balloon and mm is now
  big, fragile and largely undocumented.
  
 
 The driver don't use page-lru as its bookeeping at all, it uses
 vb-num_pages instead. 

$ grep lru drivers/virtio/virtio_balloon.c
list_add(page-lru, vb-pages);
page = list_first_entry(vb-pages, struct page, lru);
list_del(page-lru);


 
  Another strangeness I just noticed: if we ever do an extra get_page in
  balloon, compaction logic in mm will break, yes?  But one expects to be
  able to do get_page after alloc_page without ill effects
  as long as one does put_page before free.
 
 
 You can do it (bump up the balloon page refcount), and it will only prevent
 balloon pages from being isolated and migrated, thus reducing the 
 effectiveness of
 defragmenting memory when balloon pages are present, just like it happens 
 today.
 
 It really doesn't seems the case of virtio_balloon driver, or any other 
 driver,
 which allocates pages directly from buddy to keep raising the page refcount,
 though.
  

E.g. network devices routinely play with pages they get from buddy,
this is used for sharing memory between skbs.

  Just a thought: maybe it is cleaner to move all balloon page tracking
  into mm/?  Implement alloc_balloon/free_balloon with methods to fill and
  leak pages, and callbacks to invoke when done.  This should be good for
  other hypervisors too. If you like this idea, I can even try to help out
  by refactoring current code in this way, so that you can build on it.
  But this is just a thought, not a must.
 
 
 That seems to be a good thought to be on a future enhancements wish-list, for 
 sure.
 We can start thinking of it, and I surely would be more than happy on be doing
 it along with you. But I don't think not having it right away is a dealbreaker
 for this proposal, as is.

I grant busywait on module unloading isn't a huge deal breaker.

Poking in mm internals is not a dealbreaker?
Not leaking as much as
you are asked to isn't?

 I'm not against your thoughts, and I'm really glad that you're providing such
 good dicussion over this subject, but, now I'll wait for Rusty thoughts on 
 this one question.
 
 Cheers!
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-23 Thread Rik van Riel

On 08/23/2012 11:54 AM, Michael S. Tsirkin wrote:

On Thu, Aug 23, 2012 at 12:21:29PM -0300, Rafael Aquini wrote:

On Thu, Aug 23, 2012 at 04:53:29PM +0300, Michael S. Tsirkin wrote:

On Thu, Aug 23, 2012 at 10:06:07AM -0300, Rafael Aquini wrote:

On Thu, Aug 23, 2012 at 03:34:32PM +0300, Michael S. Tsirkin wrote:

So, nothing has changed here.


Yes, your patch does change things:
leak_balloon now might return without freeing any pages.
In that case we will not be making any progress, and just
spin, pinning CPU.


That's a transitory condition, that migh happen if leak_balloon() takes place
when compaction, or migration are under their way and it might only affects the
module unload case.


Regular operation seems even more broken: host might ask
you to leak memory but because it is under compaction
you might leak nothing. No?



And that is exactely what it wants to do. If there is (temporarily) nothing to 
leak,
then not leaking is the only sane thing to do.


It's an internal issue between balloon and mm. User does not care.


Having balloon pages being migrated
does not break the leak at all, despite it can last a little longer.



Not longer - apparently forever unless user resend the leak command.
It's wrong - it should
1. not tell host if nothing was done
2. after migration finished leak and tell host


Agreed.  If the balloon is told to leak N pages, and could
not do so because those pages were locked, the balloon driver
needs to retry (maybe waiting on a page lock?) and not signal
completion until after the job has been completed.

Having the balloon driver wait on the page lock should be
fine, because compaction does not hold the page lock for
long.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-23 Thread Rafael Aquini
On Thu, Aug 23, 2012 at 12:03:15PM -0400, Rik van Riel wrote:
 
 Not longer - apparently forever unless user resend the leak command.
 It's wrong - it should
 1. not tell host if nothing was done
 2. after migration finished leak and tell host
 
 Agreed.  If the balloon is told to leak N pages, and could
 not do so because those pages were locked, the balloon driver
 needs to retry (maybe waiting on a page lock?) and not signal
 completion until after the job has been completed.
 
 Having the balloon driver wait on the page lock should be
 fine, because compaction does not hold the page lock for
 long.

And that is precisely what leak_balloon is doing. When it stumbles across a
locked page it gets rid of that leak round to give a shot for compaction to
finish its task.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-23 Thread Michael S. Tsirkin
On Thu, Aug 23, 2012 at 01:06:48PM -0300, Rafael Aquini wrote:
 On Thu, Aug 23, 2012 at 12:03:15PM -0400, Rik van Riel wrote:
  
  Not longer - apparently forever unless user resend the leak command.
  It's wrong - it should
  1. not tell host if nothing was done
  2. after migration finished leak and tell host
  
  Agreed.  If the balloon is told to leak N pages, and could
  not do so because those pages were locked, the balloon driver
  needs to retry (maybe waiting on a page lock?) and not signal
  completion until after the job has been completed.
  
  Having the balloon driver wait on the page lock should be
  fine, because compaction does not hold the page lock for
  long.
 
 And that is precisely what leak_balloon is doing. When it stumbles across a
 locked page it gets rid of that leak round to give a shot for compaction to
 finish its task.
 

Yes but I do not see where it will retry.
If it does, please add a comment pointing out where it happens.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-23 Thread Michael S. Tsirkin
On Thu, Aug 23, 2012 at 04:53:28PM +0300, Michael S. Tsirkin wrote:
 Basically it was very simple: we assumed page-lru was never
 touched for an allocated page, so it's safe to use it for
 internal book-keeping by the driver.
 
 Now, this is not the case anymore, you add some logic in mm/ that might
 or might not touch page-lru depending on things like reference count.

Another thought: would the issue go away if balloon used
page-private to link pages instead of LRU?
mm core could keep a reference on page to avoid it
being used while mm handles it (maybe it does already?).

If we do this, will not the only change to balloon be to tell mm that it
can use compaction for these pages when it allocates the page: using
some GPF flag or a new API?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-23 Thread Rafael Aquini
On Thu, Aug 23, 2012 at 07:25:05PM +0300, Michael S. Tsirkin wrote:
 On Thu, Aug 23, 2012 at 04:53:28PM +0300, Michael S. Tsirkin wrote:
  Basically it was very simple: we assumed page-lru was never
  touched for an allocated page, so it's safe to use it for
  internal book-keeping by the driver.
  
  Now, this is not the case anymore, you add some logic in mm/ that might
  or might not touch page-lru depending on things like reference count.
 
 Another thought: would the issue go away if balloon used
 page-private to link pages instead of LRU?
 mm core could keep a reference on page to avoid it
 being used while mm handles it (maybe it does already?).

I don't think so. That would be a lot more trikier and complex, IMHO.
 
 If we do this, will not the only change to balloon be to tell mm that it
 can use compaction for these pages when it allocates the page: using
 some GPF flag or a new API?
 

What about keep a conter at virtio_balloon structure on how much pages are
isolated from balloon's list and check it at leak time?
if the counter gets  0 than we can safely put leak_balloon() to wait until
balloon page list gets completely refilled. I guess that is simple to get
accomplished and potentially addresses all your concerns on this issue.

Cheers!

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-23 Thread Rik van Riel

On 08/23/2012 01:28 PM, Rafael Aquini wrote:


What about keep a conter at virtio_balloon structure on how much pages are
isolated from balloon's list and check it at leak time?
if the counter gets  0 than we can safely put leak_balloon() to wait until
balloon page list gets completely refilled.


We only have to wait if we failed to leak enough
pages, and then only for as many additional pages
as we require.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-23 Thread Michael S. Tsirkin
On Thu, Aug 23, 2012 at 02:28:45PM -0300, Rafael Aquini wrote:
 On Thu, Aug 23, 2012 at 07:25:05PM +0300, Michael S. Tsirkin wrote:
  On Thu, Aug 23, 2012 at 04:53:28PM +0300, Michael S. Tsirkin wrote:
   Basically it was very simple: we assumed page-lru was never
   touched for an allocated page, so it's safe to use it for
   internal book-keeping by the driver.
   
   Now, this is not the case anymore, you add some logic in mm/ that might
   or might not touch page-lru depending on things like reference count.
  
  Another thought: would the issue go away if balloon used
  page-private to link pages instead of LRU?
  mm core could keep a reference on page to avoid it
  being used while mm handles it (maybe it does already?).
 
 I don't think so. That would be a lot more trikier and complex, IMHO.

What's tricky? Linking pages through a void * orivate pointer?
I can code it up in a couple of minutes.
It's middle of the night so too tired to test but still:

  If we do this, will not the only change to balloon be to tell mm that it
  can use compaction for these pages when it allocates the page: using
  some GPF flag or a new API?
  
 
 What about keep a conter at virtio_balloon structure on how much pages are
 isolated from balloon's list and check it at leak time?
 if the counter gets  0 than we can safely put leak_balloon() to wait until
 balloon page list gets completely refilled. I guess that is simple to get
 accomplished and potentially addresses all your concerns on this issue.
 
 Cheers!

I would wake it each time after adding a page, then it
can stop waiting when it leaks enough.
But again, it's cleaner to just keep tracking all
pages, let mm hang on to them by keeping a reference.

---

virtio-balloon: replace page-lru list with page-private.

The point is to free up page-lru for use by compaction.
Warning: completely untested, will provide tested version
if we agree on this direction.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

---

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 0908e60..b38f57ce 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -56,7 +56,7 @@ struct virtio_balloon
 * Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
 * to num_pages above.
 */
-   struct list_head pages;
+   void *pages;
 
/* The array of pfns we tell the Host about. */
unsigned int num_pfns;
@@ -141,7 +141,9 @@ static void fill_balloon(struct virtio_balloon *vb, size_t 
num)
set_page_pfns(vb-pfns + vb-num_pfns, page);
vb-num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
totalram_pages--;
-   list_add(page-lru, vb-pages);
+   /* Add to list of pages */
+   page-private = vb-pages;
+   vb-pages = page-private;
}
 
/* Didn't get any?  Oh well. */
@@ -171,8 +173,9 @@ static void leak_balloon(struct virtio_balloon *vb, size_t 
num)
 
for (vb-num_pfns = 0; vb-num_pfns  num;
 vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-   page = list_first_entry(vb-pages, struct page, lru);
-   list_del(page-lru);
+   /* Delete from list of pages */
+   page = vb-pages;
+   vb-pages = page-private;
set_page_pfns(vb-pfns + vb-num_pfns, page);
vb-num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
}
@@ -350,7 +353,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
goto out;
}
 
-   INIT_LIST_HEAD(vb-pages);
+   vb-pages = NULL;
vb-num_pages = 0;
init_waitqueue_head(vb-config_change);
init_waitqueue_head(vb-acked);
-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-23 Thread Rafael Aquini
On Fri, Aug 24, 2012 at 02:36:16AM +0300, Michael S. Tsirkin wrote:
 On Thu, Aug 23, 2012 at 02:28:45PM -0300, Rafael Aquini wrote:
  On Thu, Aug 23, 2012 at 07:25:05PM +0300, Michael S. Tsirkin wrote:
   On Thu, Aug 23, 2012 at 04:53:28PM +0300, Michael S. Tsirkin wrote:
Basically it was very simple: we assumed page-lru was never
touched for an allocated page, so it's safe to use it for
internal book-keeping by the driver.

Now, this is not the case anymore, you add some logic in mm/ that might
or might not touch page-lru depending on things like reference count.
   
   Another thought: would the issue go away if balloon used
   page-private to link pages instead of LRU?
   mm core could keep a reference on page to avoid it
   being used while mm handles it (maybe it does already?).
  
  I don't think so. That would be a lot more trikier and complex, IMHO.
 
 What's tricky? Linking pages through a void * orivate pointer?
 I can code it up in a couple of minutes.
 It's middle of the night so too tired to test but still:
 
   If we do this, will not the only change to balloon be to tell mm that it
   can use compaction for these pages when it allocates the page: using
   some GPF flag or a new API?
   
  
  What about keep a conter at virtio_balloon structure on how much pages are
  isolated from balloon's list and check it at leak time?
  if the counter gets  0 than we can safely put leak_balloon() to wait until
  balloon page list gets completely refilled. I guess that is simple to get
  accomplished and potentially addresses all your concerns on this issue.
  
  Cheers!
 
 I would wake it each time after adding a page, then it
 can stop waiting when it leaks enough.
 But again, it's cleaner to just keep tracking all
 pages, let mm hang on to them by keeping a reference.
 
 ---
 
 virtio-balloon: replace page-lru list with page-private.
 
 The point is to free up page-lru for use by compaction.
 Warning: completely untested, will provide tested version
 if we agree on this direction.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com


This way balloon driver will potentially release pages that were already
migrated and doesn't belong to it anymore, since the page under migration never
gets isolated from balloon's page list. It's a lot more dangerous than it was
before. 

I'm working on having leak_balloon on the right way, as you correctly has
pointed. I was blind and biased. So, thank you for pointing me the way.


 ---
 
 diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
 index 0908e60..b38f57ce 100644
 --- a/drivers/virtio/virtio_balloon.c
 +++ b/drivers/virtio/virtio_balloon.c
 @@ -56,7 +56,7 @@ struct virtio_balloon
* Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
* to num_pages above.
*/
 - struct list_head pages;
 + void *pages;
  
   /* The array of pfns we tell the Host about. */
   unsigned int num_pfns;
 @@ -141,7 +141,9 @@ static void fill_balloon(struct virtio_balloon *vb, 
 size_t num)
   set_page_pfns(vb-pfns + vb-num_pfns, page);
   vb-num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
   totalram_pages--;
 - list_add(page-lru, vb-pages);
 + /* Add to list of pages */
 + page-private = vb-pages;
 + vb-pages = page-private;
   }
  
   /* Didn't get any?  Oh well. */
 @@ -171,8 +173,9 @@ static void leak_balloon(struct virtio_balloon *vb, 
 size_t num)
  
   for (vb-num_pfns = 0; vb-num_pfns  num;
vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
 - page = list_first_entry(vb-pages, struct page, lru);
 - list_del(page-lru);
 + /* Delete from list of pages */
 + page = vb-pages;
 + vb-pages = page-private;
   set_page_pfns(vb-pfns + vb-num_pfns, page);
   vb-num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
   }
 @@ -350,7 +353,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
   goto out;
   }
  
 - INIT_LIST_HEAD(vb-pages);
 + vb-pages = NULL;
   vb-num_pages = 0;
   init_waitqueue_head(vb-config_change);
   init_waitqueue_head(vb-acked);
 -- 
 MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-23 Thread Rafael Aquini
On Fri, Aug 24, 2012 at 02:36:16AM +0300, Michael S. Tsirkin wrote:
 I would wake it each time after adding a page, then it
 can stop waiting when it leaks enough.
 But again, it's cleaner to just keep tracking all
 pages, let mm hang on to them by keeping a reference.
 
Here is a rough idea on how it's getting:

Basically, I'm have introducing an atomic counter to track isolated pages, I
also have changed vb-num_pages into an atomic conter. All inc/dec operations
take place under pages_lock spinlock, and we only perform work under page lock.

It's still missing the wait-part (I'll write it during the weekend) and your
concerns (and mine) will be addressed, IMHO.

---8---
+/*
+ *
+ */
+static inline void __wait_on_isolated_pages(struct virtio_balloon *vb,
+   size_t num)
+{
+   /* There are no isolated pages for this balloon device */
+   if (!atomic_read(vb-num_isolated_pages))
+   return;
+
+   /* the leak target is smaller than # of pages on vb-pages list */
+   if (num  (atomic_read(vb-num_pages) -
+   atomic_read(vb-num_isolated_pages)))
+   return;
+   else {
+   spin_unlock(vb-pages_lock);
+   /* wait stuff goes here */
+   spin_lock(vb-pages_lock);
+   }
+}
+
 static void leak_balloon(struct virtio_balloon *vb, size_t num)
 {
-   struct page *page;
+   /* The array of pfns we tell the Host about. */
+   unsigned int num_pfns;
+   u32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];

/* We can only do one array worth at a time. */
-   num = min(num, ARRAY_SIZE(vb-pfns));
+   num = min(num, ARRAY_SIZE(pfns));

-   for (vb-num_pfns = 0; vb-num_pfns  num;
-vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-   page = list_first_entry(vb-pages, struct page, lru);
-   list_del(page-lru);
-   set_page_pfns(vb-pfns + vb-num_pfns, page);
-   vb-num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
+   for (num_pfns = 0; num_pfns  num;
+num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
+   struct page *page = NULL;
+   spin_lock(vb-pages_lock);
+   __wait_on_isolated_pages(vb, num);
+
+   if (!list_empty(vb-pages))
+   page = list_first_entry(vb-pages, struct page, lru);
+   /*
+* Grab the page lock to avoid racing against threads isolating
+* pages from, or migrating pages back to vb-pages list.
+* (both tasks are done under page lock protection)
+*
+* Failing to grab the page lock here means this page is being
+* isolated already, or its migration has not finished yet.
+*/
+   if (page  trylock_page(page)) {
+   clear_balloon_mapping(page);
+   list_del(page-lru);
+   set_page_pfns(pfns + num_pfns, page);
+   atomic_sub(VIRTIO_BALLOON_PAGES_PER_PAGE,
+  vb-num_pages);
+   unlock_page(page);
+   }
+   spin_unlock(vb-pages_lock);
}

/*
@@ -182,8 +251,10 @@ static void leak_balloon(struct virtio_balloon *vb, size_t
num)
 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
 * is true, we *have* to do it in this order
 */
+   mutex_lock(vb-balloon_lock);
tell_host(vb, vb-deflate_vq);
-   release_pages_by_pfn(vb-pfns, vb-num_pfns);
+   mutex_unlock(vb-balloon_lock);
+   release_pages_by_pfn(pfns, num_pfns);
 }
---8---
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-23 Thread Rafael Aquini
On Thu, Aug 23, 2012 at 09:33:53PM -0300, Rafael Aquini wrote:
 On Fri, Aug 24, 2012 at 02:36:16AM +0300, Michael S. Tsirkin wrote:
  I would wake it each time after adding a page, then it
  can stop waiting when it leaks enough.
  But again, it's cleaner to just keep tracking all
  pages, let mm hang on to them by keeping a reference.
  

Btw, it's also late here, and there still some work to be done around those
bits, but I guess that has potential to get this issue nailed.

 Here is a rough idea on how it's getting:
 
 Basically, I'm have introducing an atomic counter to track isolated pages, I
 also have changed vb-num_pages into an atomic conter. All inc/dec operations
 take place under pages_lock spinlock, and we only perform work under page 
 lock.
 
 It's still missing the wait-part (I'll write it during the weekend) and your
 concerns (and mine) will be addressed, IMHO.
 
 ---8---
 +/*
 + *
 + */
 +static inline void __wait_on_isolated_pages(struct virtio_balloon *vb,
 +   size_t num)
 +{
 +   /* There are no isolated pages for this balloon device */
 +   if (!atomic_read(vb-num_isolated_pages))
 +   return;
 +
 +   /* the leak target is smaller than # of pages on vb-pages list */
 +   if (num  (atomic_read(vb-num_pages) -
 +   atomic_read(vb-num_isolated_pages)))
 +   return;
 +   else {
 +   spin_unlock(vb-pages_lock);
 +   /* wait stuff goes here */
 +   spin_lock(vb-pages_lock);
 +   }
 +}
 +
  static void leak_balloon(struct virtio_balloon *vb, size_t num)
  {
 -   struct page *page;
 +   /* The array of pfns we tell the Host about. */
 +   unsigned int num_pfns;
 +   u32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
 
 /* We can only do one array worth at a time. */
 -   num = min(num, ARRAY_SIZE(vb-pfns));
 +   num = min(num, ARRAY_SIZE(pfns));
 
 -   for (vb-num_pfns = 0; vb-num_pfns  num;
 -vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
 -   page = list_first_entry(vb-pages, struct page, lru);
 -   list_del(page-lru);
 -   set_page_pfns(vb-pfns + vb-num_pfns, page);
 -   vb-num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
 +   for (num_pfns = 0; num_pfns  num;
 +num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
 +   struct page *page = NULL;
 +   spin_lock(vb-pages_lock);
 +   __wait_on_isolated_pages(vb, num);
 +
 +   if (!list_empty(vb-pages))
 +   page = list_first_entry(vb-pages, struct page, lru);
 +   /*
 +* Grab the page lock to avoid racing against threads 
 isolating
 +* pages from, or migrating pages back to vb-pages list.
 +* (both tasks are done under page lock protection)
 +*
 +* Failing to grab the page lock here means this page is being
 +* isolated already, or its migration has not finished yet.
 +*/
 +   if (page  trylock_page(page)) {
 +   clear_balloon_mapping(page);
 +   list_del(page-lru);
 +   set_page_pfns(pfns + num_pfns, page);
 +   atomic_sub(VIRTIO_BALLOON_PAGES_PER_PAGE,
 +  vb-num_pages);
 +   unlock_page(page);
 +   }
 +   spin_unlock(vb-pages_lock);
 }
 
 /*
 @@ -182,8 +251,10 @@ static void leak_balloon(struct virtio_balloon *vb, 
 size_t
 num)
  * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
  * is true, we *have* to do it in this order
  */
 +   mutex_lock(vb-balloon_lock);
 tell_host(vb, vb-deflate_vq);
 -   release_pages_by_pfn(vb-pfns, vb-num_pfns);
 +   mutex_unlock(vb-balloon_lock);
 +   release_pages_by_pfn(pfns, num_pfns);
  }
 ---8---
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-23 Thread Rafael Aquini
On Thu, Aug 23, 2012 at 09:38:48PM -0300, Rafael Aquini wrote:
 On Thu, Aug 23, 2012 at 09:33:53PM -0300, Rafael Aquini wrote:
  On Fri, Aug 24, 2012 at 02:36:16AM +0300, Michael S. Tsirkin wrote:
   I would wake it each time after adding a page, then it
   can stop waiting when it leaks enough.
   But again, it's cleaner to just keep tracking all
   pages, let mm hang on to them by keeping a reference.
   
 
 Btw, it's also late here, and there still some work to be done around those
 bits, but I guess that has potential to get this issue nailed.
 
  Here is a rough idea on how it's getting:
  
  Basically, I'm have introducing an atomic counter to track isolated pages, I
  also have changed vb-num_pages into an atomic conter. All inc/dec 
  operations
  take place under pages_lock spinlock, and we only perform work under page 
  lock.
  
  It's still missing the wait-part (I'll write it during the weekend) and your
  concerns (and mine) will be addressed, IMHO.
  
  ---8---
  +/*
  + *
  + */
  +static inline void __wait_on_isolated_pages(struct virtio_balloon *vb,
  +   size_t num)
  +{
  +   /* There are no isolated pages for this balloon device */
  +   if (!atomic_read(vb-num_isolated_pages))
  +   return;
  +
  +   /* the leak target is smaller than # of pages on vb-pages list */
  +   if (num  (atomic_read(vb-num_pages) -
  +   atomic_read(vb-num_isolated_pages)))
  +   return;
  +   else {
  +   spin_unlock(vb-pages_lock);
  +   /* wait stuff goes here */
  +   spin_lock(vb-pages_lock);
  +   }
  +}
  +
   static void leak_balloon(struct virtio_balloon *vb, size_t num)
   {
  -   struct page *page;
  +   /* The array of pfns we tell the Host about. */
  +   unsigned int num_pfns;
  +   u32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
  
  /* We can only do one array worth at a time. */
  -   num = min(num, ARRAY_SIZE(vb-pfns));
  +   num = min(num, ARRAY_SIZE(pfns));
  
  -   for (vb-num_pfns = 0; vb-num_pfns  num;
  -vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
  -   page = list_first_entry(vb-pages, struct page, lru);
  -   list_del(page-lru);
  -   set_page_pfns(vb-pfns + vb-num_pfns, page);
  -   vb-num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
  +   for (num_pfns = 0; num_pfns  num;
  +num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {

Doh, here I need a while(num_pfns  num;)


  +   struct page *page = NULL;
  +   spin_lock(vb-pages_lock);
  +   __wait_on_isolated_pages(vb, num);
  +
  +   if (!list_empty(vb-pages))
  +   page = list_first_entry(vb-pages, struct page, 
  lru);
  +   /*
  +* Grab the page lock to avoid racing against threads 
  isolating
  +* pages from, or migrating pages back to vb-pages list.
  +* (both tasks are done under page lock protection)
  +*
  +* Failing to grab the page lock here means this page is 
  being
  +* isolated already, or its migration has not finished yet.
  +*/
  +   if (page  trylock_page(page)) {
  +   clear_balloon_mapping(page);
  +   list_del(page-lru);
  +   set_page_pfns(pfns + num_pfns, page);
  +   atomic_sub(VIRTIO_BALLOON_PAGES_PER_PAGE,
  +  vb-num_pages);

and here: num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE;

  +   unlock_page(page);
  +   }
  +   spin_unlock(vb-pages_lock);
  }
  
  /*
  @@ -182,8 +251,10 @@ static void leak_balloon(struct virtio_balloon *vb, 
  size_t
  num)
   * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
   * is true, we *have* to do it in this order
   */
  +   mutex_lock(vb-balloon_lock);
  tell_host(vb, vb-deflate_vq);
  -   release_pages_by_pfn(vb-pfns, vb-num_pfns);
  +   mutex_unlock(vb-balloon_lock);
  +   release_pages_by_pfn(pfns, num_pfns);
   }
  ---8---

Well, I'll assume that's because I'm working for 14h already and running out of
caffeine.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-23 Thread Rik van Riel

On 08/23/2012 07:36 PM, Michael S. Tsirkin wrote:


---

virtio-balloon: replace page-lru list with page-private.

The point is to free up page-lru for use by compaction.
Warning: completely untested, will provide tested version
if we agree on this direction.


A singly linked list is not going to work for page migration,
which needs to get pages that might be in the middle of the
balloon list.

--
All rights reversed
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-22 Thread Michael S. Tsirkin
On Tue, Aug 21, 2012 at 10:19:31PM -0300, Rafael Aquini wrote:
 On Wed, Aug 22, 2012 at 03:07:41AM +0300, Michael S. Tsirkin wrote:
  On Tue, Aug 21, 2012 at 05:45:56PM -0300, Rafael Aquini wrote:
   On Tue, Aug 21, 2012 at 10:30:31PM +0300, Michael S. Tsirkin wrote:
On Tue, Aug 21, 2012 at 04:23:58PM -0300, Rafael Aquini wrote:
 On Tue, Aug 21, 2012 at 10:13:30PM +0300, Michael S. Tsirkin wrote:
   
   I believe rcu_dereference_protected() is what I want/need here, 
   since this code
   is always called for pages which we hold locked (PG_locked bit).
  
  It would only help if we locked the page while updating the mapping,
  as far as I can see we don't.
 
 
 But we can do it. In fact, by doing it (locking the page) we can 
 easily avoid
 the nasty race balloon_isolate_page / leak_balloon, in a much simpler 
 way, IMHO.

Absolutely. Further, we should look hard at whether most RCU uses
in this patchset can be replaced with page lock.
   
   
   Yeah, In fact, by testing/grabbing the page lock at leak_balloon() even 
   the
   module unload X migration / putback race seems to fade away, since 
   migration
   code holds the page locked all the way.
   And that seems a quite easy task to be accomplished:
   
   
   @@ -169,21 +197,61 @@ static void leak_balloon(struct virtio_balloon *vb, 
   size_t
   num)
   /* We can only do one array worth at a time. */
   num = min(num, ARRAY_SIZE(vb-pfns));
   
   +   mutex_lock(vb-balloon_lock);
   for (vb-num_pfns = 0; vb-num_pfns  num;
vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
   +   spin_lock(vb-pages_lock);
   +   /*
   +* 'virtballoon_isolatepage()' can drain vb-pages list
   +* making us to stumble across a _temporarily_ empty list.
  
  This still worries me. If this happens we do not
  lock the page so module can go away?
  if not need to document why.
 
 The module won't unload unless it leaks all its pages. If we hit that test 
 that
 worries you, leak_balloon() will get back to its caller -- remove_common(), 
 and
 it will kept looping at:
 
 /* There might be pages left in the balloon: free them. */
 while (vb-num_pages)
 leak_balloon(vb, vb-num_pages);
 
 This is true because we do not mess with vb-num_pages while 
 isolating/migrating
 balloon pages, so the module will only unload when all isolated pages get back
 to vb-pages_list and leak_balloon() reap them appropriatelly. As we will be
 doing isolation/migration/putback steps under 'page lock' that race is gone.


Hmm, so this will busy wait which is unelegant.
We need some event IMO.
Also, reading num_pages without a lock here
which seems wrong.

A similar concern applies to normal leaking
of the balloon: here we might leak less than
required, then wait for the next config change
event.

How about we signal config_change
event when pages are back to pages_list?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-22 Thread Rafael Aquini
On Wed, Aug 22, 2012 at 12:33:17PM +0300, Michael S. Tsirkin wrote:
 Hmm, so this will busy wait which is unelegant.
 We need some event IMO.

No, it does not busy wait. leak_balloon() is mutual exclusive with migration
steps, so for the case we have one racing against the other, we really want
leak_balloon() dropping the mutex temporarily to allow migration complete its
work of refilling vb-pages list. Also, leak_balloon() calls tell_host(), which
will potentially make it to schedule for each round of vb-pfns leak_balloon()
will release. So, when remove_common() calls leak_balloon() looping on
vb-num_pages, that won't become a tight loop. 
The scheme was apparently working before this series, and it will remain working
after it.


 Also, reading num_pages without a lock here
 which seems wrong.

I'll protect it with vb-balloon_lock mutex. That will be consistent with the
lock protection scheme this patch is introducing for struct virtio_balloon
elements.


 A similar concern applies to normal leaking
 of the balloon: here we might leak less than
 required, then wait for the next config change
 event.

Just as before, same thing here. If you leaked less than required, balloon()
will keep calling leak_balloon() until the balloon target is reached. This
scheme was working before, and it will keep working after this patch.


 How about we signal config_change
 event when pages are back to pages_list?

I really don't know what to tell you here, but, to me, it seems like an
overcomplication that isn't directly entangled with this patch purposes.
Besides, you cannot expect compation / migration happening and racing against
leak_balloon() all the time to make them signal events to the later, so we might
just be creating a wait-forever condition for leak_balloon(), IMHO.

Cheers!

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Rafael Aquini
Memory fragmentation introduced by ballooning might reduce significantly
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch introduces a common interface to help a balloon driver on
making its page set movable to compaction, and thus allowing the system
to better leverage the compation efforts on memory defragmentation.

Signed-off-by: Rafael Aquini aqu...@redhat.com
---
 include/linux/balloon_compaction.h | 113 +
 include/linux/pagemap.h|  18 +
 mm/Kconfig |  15 
 mm/Makefile|   2 +-
 mm/balloon_compaction.c| 145 +
 5 files changed, 292 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/balloon_compaction.h
 create mode 100644 mm/balloon_compaction.c

diff --git a/include/linux/balloon_compaction.h 
b/include/linux/balloon_compaction.h
new file mode 100644
index 000..5fbf036
--- /dev/null
+++ b/include/linux/balloon_compaction.h
@@ -0,0 +1,113 @@
+/*
+ * include/linux/balloon_compaction.h
+ *
+ * Common interface definitions for making balloon pages movable to compaction.
+ *
+ * Copyright (C) 2012, Red Hat, Inc.  Rafael Aquini aqu...@redhat.com
+ */
+#ifndef _LINUX_BALLOON_COMPACTION_H
+#define _LINUX_BALLOON_COMPACTION_H
+#ifdef __KERNEL__
+
+#include linux/rcupdate.h
+#include linux/pagemap.h
+#include linux/gfp.h
+
+#if defined(CONFIG_BALLOON_COMPACTION)
+extern bool isolate_balloon_page(struct page *);
+extern void putback_balloon_page(struct page *);
+
+static inline bool movable_balloon_page(struct page *page)
+{
+   struct address_space *mapping = NULL;
+   bool ret = false;
+   /*
+* Before dereferencing and testing mapping-flags, lets make sure
+* this is not a page that uses -mapping in a different way
+*/
+   if (!PageSlab(page)  !PageSwapCache(page) 
+   !PageAnon(page)  !page_mapped(page)) {
+   rcu_read_lock();
+   mapping = rcu_dereference(page-mapping);
+   if (mapping_balloon(mapping))
+   ret = true;
+   rcu_read_unlock();
+   }
+   return ret;
+}
+
+static inline gfp_t balloon_mapping_gfp_mask(void)
+{
+   return GFP_HIGHUSER_MOVABLE;
+}
+
+/*
+ * __page_balloon_device - return the balloon device owing the page.
+ *
+ * This shall only be used at driver callbacks under proper page_lock,
+ * to get access to the balloon device structure that owns @page.
+ */
+static inline void *__page_balloon_device(struct page *page)
+{
+   struct address_space *mapping;
+   mapping = rcu_access_pointer(page-mapping);
+   if (mapping)
+   mapping = mapping-assoc_mapping;
+   return (void *)mapping;
+}
+
+#define count_balloon_event(e) count_vm_event(e)
+/*
+ * DEFINE_BALLOON_MAPPING_AOPS - declare and instantiate a callback descriptor
+ *  to be used as balloon page-mapping-a_ops.
+ *
+ * @label : declaration identifier (var name)
+ * @migratepg : callback symbol name for performing the page migration step
+ * @isolatepg : callback symbol name for performing the page isolation step
+ * @putbackpg : callback symbol name for performing the page putback step
+ *
+ * address_space_operations utilized methods for ballooned pages:
+ *   .migratepage- used to perform balloon's page migration (as is)
+ *   .launder_page   - used to isolate a page from balloon's page list
+ *   .freepage   - used to reinsert an isolated page to balloon's page list
+ */
+#define DEFINE_BALLOON_MAPPING_AOPS(label, migratepg, isolatepg, putbackpg) \
+   const struct address_space_operations (label) = {   \
+   .migratepage  = (migratepg),\
+   .launder_page = (isolatepg),\
+   .freepage = (putbackpg),\
+   }
+
+#else
+static inline bool isolate_balloon_page(struct page *page) { return false; }
+static inline bool movable_balloon_page(struct page *page) { return false; }
+static inline void putback_balloon_page(struct page *page) { return; }
+
+static inline gfp_t balloon_mapping_gfp_mask(void)
+{
+   return GFP_HIGHUSER;
+}
+
+#define count_balloon_event(e) {}
+#define DEFINE_BALLOON_MAPPING_AOPS(label, migratepg, isolatepg, putbackpg) \
+   const struct address_space_operations *(label) = NULL
+
+#endif /* CONFIG_BALLOON_COMPACTION */
+
+extern struct address_space *alloc_balloon_mapping(void *balloon_device,
+   const struct address_space_operations *a_ops);
+extern void *page_balloon_device(struct page *page);
+
+static inline void assign_balloon_mapping(struct page *page,
+   

Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Michael S. Tsirkin
On Tue, Aug 21, 2012 at 09:47:44AM -0300, Rafael Aquini wrote:
 Memory fragmentation introduced by ballooning might reduce significantly
 the number of 2MB contiguous memory blocks that can be used within a guest,
 thus imposing performance penalties associated with the reduced number of
 transparent huge pages that could be used by the guest workload.
 
 This patch introduces a common interface to help a balloon driver on
 making its page set movable to compaction, and thus allowing the system
 to better leverage the compation efforts on memory defragmentation.
 
 Signed-off-by: Rafael Aquini aqu...@redhat.com
 ---
  include/linux/balloon_compaction.h | 113 +
  include/linux/pagemap.h|  18 +
  mm/Kconfig |  15 
  mm/Makefile|   2 +-
  mm/balloon_compaction.c| 145 
 +
  5 files changed, 292 insertions(+), 1 deletion(-)
  create mode 100644 include/linux/balloon_compaction.h
  create mode 100644 mm/balloon_compaction.c
 
 diff --git a/include/linux/balloon_compaction.h 
 b/include/linux/balloon_compaction.h
 new file mode 100644
 index 000..5fbf036
 --- /dev/null
 +++ b/include/linux/balloon_compaction.h
 @@ -0,0 +1,113 @@
 +/*
 + * include/linux/balloon_compaction.h
 + *
 + * Common interface definitions for making balloon pages movable to 
 compaction.
 + *
 + * Copyright (C) 2012, Red Hat, Inc.  Rafael Aquini aqu...@redhat.com
 + */
 +#ifndef _LINUX_BALLOON_COMPACTION_H
 +#define _LINUX_BALLOON_COMPACTION_H
 +#ifdef __KERNEL__
 +
 +#include linux/rcupdate.h
 +#include linux/pagemap.h
 +#include linux/gfp.h
 +
 +#if defined(CONFIG_BALLOON_COMPACTION)
 +extern bool isolate_balloon_page(struct page *);
 +extern void putback_balloon_page(struct page *);
 +
 +static inline bool movable_balloon_page(struct page *page)
 +{
 + struct address_space *mapping = NULL;
 + bool ret = false;
 + /*
 +  * Before dereferencing and testing mapping-flags, lets make sure
 +  * this is not a page that uses -mapping in a different way
 +  */
 + if (!PageSlab(page)  !PageSwapCache(page) 
 + !PageAnon(page)  !page_mapped(page)) {
 + rcu_read_lock();
 + mapping = rcu_dereference(page-mapping);
 + if (mapping_balloon(mapping))
 + ret = true;
 + rcu_read_unlock();

This looks suspicious: you drop rcu_read_unlock
so can't page switch from balloon to non balloon?

Even if correct, it's probably cleaner to just make caller
invoke this within an rcu critical section.
You will notice that all callers immediately re-enter
rcu critical section anyway.

Alternatively, I noted that accesses to page-mapping
seem protected by page lock bit.
If true we can rely on that instead of RCU, just need
assign_balloon_mapping to lock_page/unlock_page.

 + }
 + return ret;
 +}
 +
 +static inline gfp_t balloon_mapping_gfp_mask(void)
 +{
 + return GFP_HIGHUSER_MOVABLE;
 +}
 +
 +/*
 + * __page_balloon_device - return the balloon device owing the page.
 + *
 + * This shall only be used at driver callbacks under proper page_lock,
 + * to get access to the balloon device structure that owns @page.
 + */
 +static inline void *__page_balloon_device(struct page *page)
 +{
 + struct address_space *mapping;
 + mapping = rcu_access_pointer(page-mapping);
 + if (mapping)
 + mapping = mapping-assoc_mapping;
 + return (void *)mapping;
 +}
 +
 +#define count_balloon_event(e) count_vm_event(e)
 +/*
 + * DEFINE_BALLOON_MAPPING_AOPS - declare and instantiate a callback 
 descriptor
 + *to be used as balloon page-mapping-a_ops.
 + *
 + * @label : declaration identifier (var name)
 + * @migratepg : callback symbol name for performing the page migration step
 + * @isolatepg : callback symbol name for performing the page isolation step
 + * @putbackpg : callback symbol name for performing the page putback step
 + *
 + * address_space_operations utilized methods for ballooned pages:
 + *   .migratepage- used to perform balloon's page migration (as is)
 + *   .launder_page   - used to isolate a page from balloon's page list
 + *   .freepage   - used to reinsert an isolated page to balloon's page 
 list
 + */

It would be a good idea to document the assumptions here.
Looks like .launder_page and .freepage are called in rcu critical
section.
But migratepage isn't - why is that safe?

 +#define DEFINE_BALLOON_MAPPING_AOPS(label, migratepg, isolatepg, putbackpg) \
 + const struct address_space_operations (label) = {   \
 + .migratepage  = (migratepg),\
 + .launder_page = (isolatepg),\
 + .freepage = (putbackpg),\
 + }
 +
 +#else
 +static inline bool isolate_balloon_page(struct page *page) { return 

Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Michael S. Tsirkin
On Tue, Aug 21, 2012 at 04:52:23PM +0300, Michael S. Tsirkin wrote:
 On Tue, Aug 21, 2012 at 09:47:44AM -0300, Rafael Aquini wrote:
  Memory fragmentation introduced by ballooning might reduce significantly
  the number of 2MB contiguous memory blocks that can be used within a guest,
  thus imposing performance penalties associated with the reduced number of
  transparent huge pages that could be used by the guest workload.
  
  This patch introduces a common interface to help a balloon driver on
  making its page set movable to compaction, and thus allowing the system
  to better leverage the compation efforts on memory defragmentation.
  
  Signed-off-by: Rafael Aquini aqu...@redhat.com
  ---
   include/linux/balloon_compaction.h | 113 +
   include/linux/pagemap.h|  18 +
   mm/Kconfig |  15 
   mm/Makefile|   2 +-
   mm/balloon_compaction.c| 145 
  +
   5 files changed, 292 insertions(+), 1 deletion(-)
   create mode 100644 include/linux/balloon_compaction.h
   create mode 100644 mm/balloon_compaction.c
  
  diff --git a/include/linux/balloon_compaction.h 
  b/include/linux/balloon_compaction.h
  new file mode 100644
  index 000..5fbf036
  --- /dev/null
  +++ b/include/linux/balloon_compaction.h
  @@ -0,0 +1,113 @@
  +/*
  + * include/linux/balloon_compaction.h
  + *
  + * Common interface definitions for making balloon pages movable to 
  compaction.
  + *
  + * Copyright (C) 2012, Red Hat, Inc.  Rafael Aquini aqu...@redhat.com
  + */
  +#ifndef _LINUX_BALLOON_COMPACTION_H
  +#define _LINUX_BALLOON_COMPACTION_H
  +#ifdef __KERNEL__
  +
  +#include linux/rcupdate.h
  +#include linux/pagemap.h
  +#include linux/gfp.h
  +
  +#if defined(CONFIG_BALLOON_COMPACTION)
  +extern bool isolate_balloon_page(struct page *);
  +extern void putback_balloon_page(struct page *);
  +
  +static inline bool movable_balloon_page(struct page *page)
  +{
  +   struct address_space *mapping = NULL;
  +   bool ret = false;
  +   /*
  +* Before dereferencing and testing mapping-flags, lets make sure
  +* this is not a page that uses -mapping in a different way
  +*/
  +   if (!PageSlab(page)  !PageSwapCache(page) 
  +   !PageAnon(page)  !page_mapped(page)) {
  +   rcu_read_lock();
  +   mapping = rcu_dereference(page-mapping);
  +   if (mapping_balloon(mapping))
  +   ret = true;
  +   rcu_read_unlock();
 
 This looks suspicious: you drop rcu_read_unlock
 so can't page switch from balloon to non balloon?
 
 Even if correct, it's probably cleaner to just make caller
 invoke this within an rcu critical section.
 You will notice that all callers immediately re-enter
 rcu critical section anyway.
 
 Alternatively, I noted that accesses to page-mapping
 seem protected by page lock bit.
 If true we can rely on that instead of RCU, just need
 assign_balloon_mapping to lock_page/unlock_page.

Actually not true in all cases: not in case
of putback.  Not sure whether code can be changed
to make it true.


  +   }
  +   return ret;
  +}
  +
  +static inline gfp_t balloon_mapping_gfp_mask(void)
  +{
  +   return GFP_HIGHUSER_MOVABLE;
  +}
  +
  +/*
  + * __page_balloon_device - return the balloon device owing the page.
  + *
  + * This shall only be used at driver callbacks under proper page_lock,
  + * to get access to the balloon device structure that owns @page.
  + */
  +static inline void *__page_balloon_device(struct page *page)
  +{
  +   struct address_space *mapping;
  +   mapping = rcu_access_pointer(page-mapping);
  +   if (mapping)
  +   mapping = mapping-assoc_mapping;
  +   return (void *)mapping;
  +}
  +
  +#define count_balloon_event(e) count_vm_event(e)
  +/*
  + * DEFINE_BALLOON_MAPPING_AOPS - declare and instantiate a callback 
  descriptor
  + *  to be used as balloon page-mapping-a_ops.
  + *
  + * @label : declaration identifier (var name)
  + * @migratepg : callback symbol name for performing the page migration step
  + * @isolatepg : callback symbol name for performing the page isolation step
  + * @putbackpg : callback symbol name for performing the page putback step
  + *
  + * address_space_operations utilized methods for ballooned pages:
  + *   .migratepage- used to perform balloon's page migration (as is)
  + *   .launder_page   - used to isolate a page from balloon's page list
  + *   .freepage   - used to reinsert an isolated page to balloon's page 
  list
  + */
 
 It would be a good idea to document the assumptions here.
 Looks like .launder_page and .freepage are called in rcu critical
 section.
 But migratepage isn't - why is that safe?
 
  +#define DEFINE_BALLOON_MAPPING_AOPS(label, migratepg, isolatepg, 
  putbackpg) \
  +   const struct address_space_operations (label) = {   \
  +   .migratepage  = (migratepg),

Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Michael S. Tsirkin
On Tue, Aug 21, 2012 at 05:16:06PM +0200, Peter Zijlstra wrote:
 On Tue, 2012-08-21 at 16:52 +0300, Michael S. Tsirkin wrote:
   + rcu_read_lock();
   + mapping = rcu_dereference(page-mapping);
   + if (mapping_balloon(mapping))
   + ret = true;
   + rcu_read_unlock();
  
  This looks suspicious: you drop rcu_read_unlock
  so can't page switch from balloon to non balloon? 
 
 RCU read lock is a non-exclusive lock, it cannot avoid anything like
 that.

You are right, of course. So even keeping rcu_read_lock across both test
and operation won't be enough - you need to make this function return
the mapping and pass it to isolate_page/putback_page so that it is only
dereferenced once.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Peter Zijlstra
On Tue, 2012-08-21 at 16:52 +0300, Michael S. Tsirkin wrote:
  + rcu_read_lock();
  + mapping = rcu_dereference(page-mapping);
  + if (mapping_balloon(mapping))
  + ret = true;
  + rcu_read_unlock();
 
 This looks suspicious: you drop rcu_read_unlock
 so can't page switch from balloon to non balloon? 

RCU read lock is a non-exclusive lock, it cannot avoid anything like
that.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Peter Zijlstra
On Tue, 2012-08-21 at 09:47 -0300, Rafael Aquini wrote:
 +   mapping = rcu_access_pointer(page-mapping);
 +   if (mapping)
 +   mapping = mapping-assoc_mapping; 

The comment near rcu_access_pointer() explicitly says:

 * Return the value of the specified RCU-protected pointer, but omit the
 * smp_read_barrier_depends() and keep the ACCESS_ONCE().  This is useful
 * when the value of this pointer is accessed, but the pointer is not
 * dereferenced,

Yet you dereference the pointer... smells like fail to me.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Paul E. McKenney
On Tue, Aug 21, 2012 at 05:20:11PM +0200, Peter Zijlstra wrote:
 On Tue, 2012-08-21 at 09:47 -0300, Rafael Aquini wrote:
  +   mapping = rcu_access_pointer(page-mapping);
  +   if (mapping)
  +   mapping = mapping-assoc_mapping; 
 
 The comment near rcu_access_pointer() explicitly says:
 
  * Return the value of the specified RCU-protected pointer, but omit the
  * smp_read_barrier_depends() and keep the ACCESS_ONCE().  This is useful
  * when the value of this pointer is accessed, but the pointer is not
  * dereferenced,
 
 Yet you dereference the pointer... smells like fail to me.

Indeed!

This will break DEC Alpha.  In addition, if -mapping can transition
from non-NULL to NULL, and if you used rcu_access_pointer() rather
than rcu_dereference() to avoid lockdep-RCU from yelling at you about
not either being in an RCU read-side critical section or holding an
update-side lock, you can see failures as follows:

1.  CPU 0 runs the above code, picks up mapping, and finds it non-NULL.

2.  CPU 0 is preempted or otherwise delayed.  (Keep in mind that
even disabling interrupts in a guest OS does not prevent the
host hypervisor from preempting!)

3.  Some other CPU NULLs page-mapping.  Because CPU 0 isn't doing
anything to prevent it, this other CPU frees the memory.

4.  CPU 0 resumes, and then accesses what is now the freelist.
Arbitrarily bad things start happening.

If you are in a read-side critical section, use rcu_dereference() instead
of rcu_access_pointer().  If you are holding an update-side lock, use
rcu_dereference_protected() and say what lock you are holding.  If you
are doing something else, please say what it is.

Thanx, Paul

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Rafael Aquini
On Tue, Aug 21, 2012 at 06:41:42PM +0300, Michael S. Tsirkin wrote:
 On Tue, Aug 21, 2012 at 05:16:06PM +0200, Peter Zijlstra wrote:
  On Tue, 2012-08-21 at 16:52 +0300, Michael S. Tsirkin wrote:
+ rcu_read_lock();
+ mapping = rcu_dereference(page-mapping);
+ if (mapping_balloon(mapping))
+ ret = true;
+ rcu_read_unlock();
   
   This looks suspicious: you drop rcu_read_unlock
   so can't page switch from balloon to non balloon? 
  
  RCU read lock is a non-exclusive lock, it cannot avoid anything like
  that.
 
 You are right, of course. So even keeping rcu_read_lock across both test
 and operation won't be enough - you need to make this function return
 the mapping and pass it to isolate_page/putback_page so that it is only
 dereferenced once.

No, I need to dereference page-mapping to check -mapping flags here, before
returning. Remember this function is used at MM's compaction/migration inner
circles to identify ballooned pages and decide what's the next step. This
function is doing the right thing, IMHO.

Also, looking at how compaction/migration work, we verify the only critical path
for this function is the page isolation step. The other steps (migration and
putback) perform their work on private lists previouly isolated from a given
source.

So, we just need to make sure that the isolation part does not screw things up
by isolating pages that balloon driver is about to release. That's why there are
so many checkpoints down the page isolation path assuring we really are
isolating a balloon page. 


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Rafael Aquini
On Tue, Aug 21, 2012 at 04:52:23PM +0300, Michael S. Tsirkin wrote:
  + * address_space_operations utilized methods for ballooned pages:
  + *   .migratepage- used to perform balloon's page migration (as is)
  + *   .launder_page   - used to isolate a page from balloon's page list
  + *   .freepage   - used to reinsert an isolated page to balloon's page 
  list
  + */
 
 It would be a good idea to document the assumptions here.
 Looks like .launder_page and .freepage are called in rcu critical
 section.
 But migratepage isn't - why is that safe?
 

The migratepage callback for virtio_balloon can sleep, and IIUC we cannot sleep
within a RCU critical section. 

Also, The migratepage callback is called at inner migration's circle function
move_to_new_page(), and I don't think embedding it in a RCU critical section
would be a good idea, for the same understanding aforementioned.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Rafael Aquini
On Tue, Aug 21, 2012 at 09:24:32AM -0700, Paul E. McKenney wrote:
 On Tue, Aug 21, 2012 at 05:20:11PM +0200, Peter Zijlstra wrote:
  On Tue, 2012-08-21 at 09:47 -0300, Rafael Aquini wrote:
   +   mapping = rcu_access_pointer(page-mapping);
   +   if (mapping)
   +   mapping = mapping-assoc_mapping; 
  
  The comment near rcu_access_pointer() explicitly says:
  
   * Return the value of the specified RCU-protected pointer, but omit the
   * smp_read_barrier_depends() and keep the ACCESS_ONCE().  This is useful
   * when the value of this pointer is accessed, but the pointer is not
   * dereferenced,
  
  Yet you dereference the pointer... smells like fail to me.
 
 Indeed!
 
 This will break DEC Alpha.  In addition, if -mapping can transition
 from non-NULL to NULL, and if you used rcu_access_pointer() rather
 than rcu_dereference() to avoid lockdep-RCU from yelling at you about
 not either being in an RCU read-side critical section or holding an
 update-side lock, you can see failures as follows:
 
 1.CPU 0 runs the above code, picks up mapping, and finds it non-NULL.
 
 2.CPU 0 is preempted or otherwise delayed.  (Keep in mind that
   even disabling interrupts in a guest OS does not prevent the
   host hypervisor from preempting!)
 
 3.Some other CPU NULLs page-mapping.  Because CPU 0 isn't doing
   anything to prevent it, this other CPU frees the memory.
 
 4.CPU 0 resumes, and then accesses what is now the freelist.
   Arbitrarily bad things start happening.
 
 If you are in a read-side critical section, use rcu_dereference() instead
 of rcu_access_pointer().  If you are holding an update-side lock, use
 rcu_dereference_protected() and say what lock you are holding.  If you
 are doing something else, please say what it is.
 
   Thanx, Paul

Paul  Peter,

Thanks for looking into this stuff and providing me such valuable feedback, and
RCU usage crashcourse.

I believe rcu_dereference_protected() is what I want/need here, since this code
is always called for pages which we hold locked (PG_locked bit). So, it brings 
me
to ask you if the following usage looks sane enough to fix the well pointed 
issue,
or if it's another misuse of RCU API:

+   mapping = rcu_dereference_protecetd(page-mapping, PageLocked(page));
+   if (mapping)
+   mapping = mapping-assoc_mapping; 


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Michael S. Tsirkin
On Tue, Aug 21, 2012 at 02:28:20PM -0300, Rafael Aquini wrote:
 On Tue, Aug 21, 2012 at 09:24:32AM -0700, Paul E. McKenney wrote:
  On Tue, Aug 21, 2012 at 05:20:11PM +0200, Peter Zijlstra wrote:
   On Tue, 2012-08-21 at 09:47 -0300, Rafael Aquini wrote:
+   mapping = rcu_access_pointer(page-mapping);
+   if (mapping)
+   mapping = mapping-assoc_mapping; 
   
   The comment near rcu_access_pointer() explicitly says:
   
* Return the value of the specified RCU-protected pointer, but omit the
* smp_read_barrier_depends() and keep the ACCESS_ONCE().  This is useful
* when the value of this pointer is accessed, but the pointer is not
* dereferenced,
   
   Yet you dereference the pointer... smells like fail to me.
  
  Indeed!
  
  This will break DEC Alpha.  In addition, if -mapping can transition
  from non-NULL to NULL, and if you used rcu_access_pointer() rather
  than rcu_dereference() to avoid lockdep-RCU from yelling at you about
  not either being in an RCU read-side critical section or holding an
  update-side lock, you can see failures as follows:
  
  1.  CPU 0 runs the above code, picks up mapping, and finds it non-NULL.
  
  2.  CPU 0 is preempted or otherwise delayed.  (Keep in mind that
  even disabling interrupts in a guest OS does not prevent the
  host hypervisor from preempting!)
  
  3.  Some other CPU NULLs page-mapping.  Because CPU 0 isn't doing
  anything to prevent it, this other CPU frees the memory.
  
  4.  CPU 0 resumes, and then accesses what is now the freelist.
  Arbitrarily bad things start happening.
  
  If you are in a read-side critical section, use rcu_dereference() instead
  of rcu_access_pointer().  If you are holding an update-side lock, use
  rcu_dereference_protected() and say what lock you are holding.  If you
  are doing something else, please say what it is.
  
  Thanx, Paul
 
 Paul  Peter,
 
 Thanks for looking into this stuff and providing me such valuable feedback, 
 and
 RCU usage crashcourse.
 
 I believe rcu_dereference_protected() is what I want/need here, since this 
 code
 is always called for pages which we hold locked (PG_locked bit).

It would only help if we locked the page while updating the mapping,
as far as I can see we don't.

 So, it brings me
 to ask you if the following usage looks sane enough to fix the well pointed 
 issue,
 or if it's another misuse of RCU API:
 
 +   mapping = rcu_dereference_protecetd(page-mapping, PageLocked(page));
 +   if (mapping)
 +   mapping = mapping-assoc_mapping; 
 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Michael S. Tsirkin
On Tue, Aug 21, 2012 at 02:55:03PM -0300, Rafael Aquini wrote:
 On Tue, Aug 21, 2012 at 04:52:23PM +0300, Michael S. Tsirkin wrote:
   + * address_space_operations utilized methods for ballooned pages:
   + *   .migratepage- used to perform balloon's page migration (as is)
   + *   .launder_page   - used to isolate a page from balloon's page list
   + *   .freepage   - used to reinsert an isolated page to balloon's 
   page list
   + */
  
  It would be a good idea to document the assumptions here.
  Looks like .launder_page and .freepage are called in rcu critical
  section.
  But migratepage isn't - why is that safe?
  
 
 The migratepage callback for virtio_balloon can sleep, and IIUC we cannot 
 sleep
 within a RCU critical section. 
 
 Also, The migratepage callback is called at inner migration's circle function
 move_to_new_page(), and I don't think embedding it in a RCU critical section
 would be a good idea, for the same understanding aforementioned.

Yes but this means it is still exposed to the module unloading
races that RCU was supposed to fix.
So need to either rework that code so it won't sleep
or switch to some other synchronization.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Rafael Aquini
On Tue, Aug 21, 2012 at 10:13:30PM +0300, Michael S. Tsirkin wrote:
  
  I believe rcu_dereference_protected() is what I want/need here, since this 
  code
  is always called for pages which we hold locked (PG_locked bit).
 
 It would only help if we locked the page while updating the mapping,
 as far as I can see we don't.


But we can do it. In fact, by doing it (locking the page) we can easily avoid
the nasty race balloon_isolate_page / leak_balloon, in a much simpler way, IMHO.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Michael S. Tsirkin
On Tue, Aug 21, 2012 at 02:42:52PM -0300, Rafael Aquini wrote:
 On Tue, Aug 21, 2012 at 06:41:42PM +0300, Michael S. Tsirkin wrote:
  On Tue, Aug 21, 2012 at 05:16:06PM +0200, Peter Zijlstra wrote:
   On Tue, 2012-08-21 at 16:52 +0300, Michael S. Tsirkin wrote:
 + rcu_read_lock();
 + mapping = rcu_dereference(page-mapping);
 + if (mapping_balloon(mapping))
 + ret = true;
 + rcu_read_unlock();

This looks suspicious: you drop rcu_read_unlock
so can't page switch from balloon to non balloon? 
   
   RCU read lock is a non-exclusive lock, it cannot avoid anything like
   that.
  
  You are right, of course. So even keeping rcu_read_lock across both test
  and operation won't be enough - you need to make this function return
  the mapping and pass it to isolate_page/putback_page so that it is only
  dereferenced once.
 
 No, I need to dereference page-mapping to check -mapping flags here, before
 returning. Remember this function is used at MM's compaction/migration inner
 circles to identify ballooned pages and decide what's the next step. This
 function is doing the right thing, IMHO.

Yes but the calling code is not doing the right thing.

What Peter pointed out here is that two calls to rcu dereference pointer
can return different values: rcu critical section is not a lock.
So the test for balloon page is not effective: it can change
after the fact.

To fix, get the pointer once and then pass the mapping
around.


 Also, looking at how compaction/migration work, we verify the only critical 
 path
 for this function is the page isolation step. The other steps (migration and
 putback) perform their work on private lists previouly isolated from a given
 source.

I vaguely understand but it would be nice to document this properly.
The interaction between page-lru handling in balloon and in mm
is especially confusing.

 So, we just need to make sure that the isolation part does not screw things up
 by isolating pages that balloon driver is about to release. That's why there 
 are
 so many checkpoints down the page isolation path assuring we really are
 isolating a balloon page. 

Well, testing same thing multiple times is just confusing.  It is very
hard to make sure there are no races with so much complexity,
and the requirements from the balloon driver are unclear to me -
it very much looks like it is poking in mm internals.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Michael S. Tsirkin
On Tue, Aug 21, 2012 at 04:23:58PM -0300, Rafael Aquini wrote:
 On Tue, Aug 21, 2012 at 10:13:30PM +0300, Michael S. Tsirkin wrote:
   
   I believe rcu_dereference_protected() is what I want/need here, since 
   this code
   is always called for pages which we hold locked (PG_locked bit).
  
  It would only help if we locked the page while updating the mapping,
  as far as I can see we don't.
 
 
 But we can do it. In fact, by doing it (locking the page) we can easily avoid
 the nasty race balloon_isolate_page / leak_balloon, in a much simpler way, 
 IMHO.

Absolutely. Further, we should look hard at whether most RCU uses
in this patchset can be replaced with page lock.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Rafael Aquini
On Tue, Aug 21, 2012 at 10:16:12PM +0300, Michael S. Tsirkin wrote:
 On Tue, Aug 21, 2012 at 02:55:03PM -0300, Rafael Aquini wrote:
  On Tue, Aug 21, 2012 at 04:52:23PM +0300, Michael S. Tsirkin wrote:
+ * address_space_operations utilized methods for ballooned pages:
+ *   .migratepage- used to perform balloon's page migration (as is)
+ *   .launder_page   - used to isolate a page from balloon's page list
+ *   .freepage   - used to reinsert an isolated page to balloon's 
page list
+ */
   
   It would be a good idea to document the assumptions here.
   Looks like .launder_page and .freepage are called in rcu critical
   section.
   But migratepage isn't - why is that safe?
   
  
  The migratepage callback for virtio_balloon can sleep, and IIUC we cannot 
  sleep
  within a RCU critical section. 
  
  Also, The migratepage callback is called at inner migration's circle 
  function
  move_to_new_page(), and I don't think embedding it in a RCU critical section
  would be a good idea, for the same understanding aforementioned.
 
 Yes but this means it is still exposed to the module unloading
 races that RCU was supposed to fix.
 So need to either rework that code so it won't sleep
 or switch to some other synchronization.

Can you refactor tell_host() to not sleep? Or, can I get rid of calling it at
virtballoon_migratepage()? If 'no' is the answer for both questions, that's the
way that code has to remain, even if we find a way around to hack the
migratepage callback and have it embedded into a RCU crit section.

That's why I believe once the balloon driver is commanded to unload, we must
flag virtballoon_migratepage to skip it's work. By doing this, the thread
performing memory compaction will have to recur to the 'putback' path which is
RCU protected. (IMHO).

As the module will not uload utill it leaks all pages on its list, that unload
race you pointed before will be covered.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Rafael Aquini
On Tue, Aug 21, 2012 at 10:30:31PM +0300, Michael S. Tsirkin wrote:
 On Tue, Aug 21, 2012 at 04:23:58PM -0300, Rafael Aquini wrote:
  On Tue, Aug 21, 2012 at 10:13:30PM +0300, Michael S. Tsirkin wrote:

I believe rcu_dereference_protected() is what I want/need here, since 
this code
is always called for pages which we hold locked (PG_locked bit).
   
   It would only help if we locked the page while updating the mapping,
   as far as I can see we don't.
  
  
  But we can do it. In fact, by doing it (locking the page) we can easily 
  avoid
  the nasty race balloon_isolate_page / leak_balloon, in a much simpler way, 
  IMHO.
 
 Absolutely. Further, we should look hard at whether most RCU uses
 in this patchset can be replaced with page lock.


Yeah, In fact, by testing/grabbing the page lock at leak_balloon() even the
module unload X migration / putback race seems to fade away, since migration
code holds the page locked all the way.

And that seems a quite easy task to be accomplished:


@@ -169,21 +197,61 @@ static void leak_balloon(struct virtio_balloon *vb, size_t
num)
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb-pfns));

+   mutex_lock(vb-balloon_lock);
for (vb-num_pfns = 0; vb-num_pfns  num;
 vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
+   spin_lock(vb-pages_lock);
+   /*
+* 'virtballoon_isolatepage()' can drain vb-pages list
+* making us to stumble across a _temporarily_ empty list.
+*
+* Release the spinlock and resume from here in order to
+* give page migration a shot to refill vb-pages list.
+*/
+   if (unlikely(list_empty(vb-pages))) {
+   spin_unlock(vb-pages_lock);
+   break;
+   }
+
page = list_first_entry(vb-pages, struct page, lru);
+
+   /*
+* Grab the page lock to avoid racing against threads isolating
+* pages from vb-pages list (it's done under page lock).
+*
+* Failing to grab the page lock here means this page has been
+* selected for isolation already.
+*/
+   if (!trylock_page(page)) {
+   spin_unlock(vb-pages_lock);
+   break;
+   }
+
+   clear_balloon_mapping(page);
list_del(page-lru);
set_page_pfns(vb-pfns + vb-num_pfns, page);
vb-num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
+   unlock_page(page);
+   spin_unlock(vb-pages_lock);
}

.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Michael S. Tsirkin
On Tue, Aug 21, 2012 at 04:34:39PM -0300, Rafael Aquini wrote:
 On Tue, Aug 21, 2012 at 10:16:12PM +0300, Michael S. Tsirkin wrote:
  On Tue, Aug 21, 2012 at 02:55:03PM -0300, Rafael Aquini wrote:
   On Tue, Aug 21, 2012 at 04:52:23PM +0300, Michael S. Tsirkin wrote:
 + * address_space_operations utilized methods for ballooned pages:
 + *   .migratepage- used to perform balloon's page migration (as 
 is)
 + *   .launder_page   - used to isolate a page from balloon's page 
 list
 + *   .freepage   - used to reinsert an isolated page to 
 balloon's page list
 + */

It would be a good idea to document the assumptions here.
Looks like .launder_page and .freepage are called in rcu critical
section.
But migratepage isn't - why is that safe?

   
   The migratepage callback for virtio_balloon can sleep, and IIUC we cannot 
   sleep
   within a RCU critical section. 
   
   Also, The migratepage callback is called at inner migration's circle 
   function
   move_to_new_page(), and I don't think embedding it in a RCU critical 
   section
   would be a good idea, for the same understanding aforementioned.
  
  Yes but this means it is still exposed to the module unloading
  races that RCU was supposed to fix.
  So need to either rework that code so it won't sleep
  or switch to some other synchronization.
 
 Can you refactor tell_host() to not sleep? Or, can I get rid of calling it at
 virtballoon_migratepage()? If 'no' is the answer for both questions, that's 
 the
 way that code has to remain, even if we find a way around to hack the
 migratepage callback and have it embedded into a RCU crit section.
 
 That's why I believe once the balloon driver is commanded to unload, we must
 flag virtballoon_migratepage to skip it's work. By doing this, the thread
 performing memory compaction will have to recur to the 'putback' path which is
 RCU protected. (IMHO).
 
 As the module will not uload utill it leaks all pages on its list, that unload
 race you pointed before will be covered.


It can not be: nothing callback does can prevent it from
running after module unload: you must have some synchronization
in the calling code.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Rafael Aquini
On Wed, Aug 22, 2012 at 03:07:41AM +0300, Michael S. Tsirkin wrote:
 On Tue, Aug 21, 2012 at 05:45:56PM -0300, Rafael Aquini wrote:
  On Tue, Aug 21, 2012 at 10:30:31PM +0300, Michael S. Tsirkin wrote:
   On Tue, Aug 21, 2012 at 04:23:58PM -0300, Rafael Aquini wrote:
On Tue, Aug 21, 2012 at 10:13:30PM +0300, Michael S. Tsirkin wrote:
  
  I believe rcu_dereference_protected() is what I want/need here, 
  since this code
  is always called for pages which we hold locked (PG_locked bit).
 
 It would only help if we locked the page while updating the mapping,
 as far as I can see we don't.


But we can do it. In fact, by doing it (locking the page) we can easily 
avoid
the nasty race balloon_isolate_page / leak_balloon, in a much simpler 
way, IMHO.
   
   Absolutely. Further, we should look hard at whether most RCU uses
   in this patchset can be replaced with page lock.
  
  
  Yeah, In fact, by testing/grabbing the page lock at leak_balloon() even the
  module unload X migration / putback race seems to fade away, since migration
  code holds the page locked all the way.
  And that seems a quite easy task to be accomplished:
  
  
  @@ -169,21 +197,61 @@ static void leak_balloon(struct virtio_balloon *vb, 
  size_t
  num)
  /* We can only do one array worth at a time. */
  num = min(num, ARRAY_SIZE(vb-pfns));
  
  +   mutex_lock(vb-balloon_lock);
  for (vb-num_pfns = 0; vb-num_pfns  num;
   vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
  +   spin_lock(vb-pages_lock);
  +   /*
  +* 'virtballoon_isolatepage()' can drain vb-pages list
  +* making us to stumble across a _temporarily_ empty list.
 
 This still worries me. If this happens we do not
 lock the page so module can go away?
 if not need to document why.

The module won't unload unless it leaks all its pages. If we hit that test that
worries you, leak_balloon() will get back to its caller -- remove_common(), and
it will kept looping at:

/* There might be pages left in the balloon: free them. */
while (vb-num_pages)
leak_balloon(vb, vb-num_pages);

This is true because we do not mess with vb-num_pages while isolating/migrating
balloon pages, so the module will only unload when all isolated pages get back
to vb-pages_list and leak_balloon() reap them appropriatelly. As we will be
doing isolation/migration/putback steps under 'page lock' that race is gone.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization