Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-21 Thread Michael S. Tsirkin
On Tue, Aug 21, 2012 at 03:01:37PM +0930, Rusty Russell wrote:
> On Wed, 15 Aug 2012 14:28:51 +0300, "Michael S. Tsirkin"  
> wrote:
> > On Wed, Aug 15, 2012 at 12:16:51PM +0100, Mel Gorman wrote:
> > > I was thinking of exactly that page->mapping == balloon_mapping check. As 
> > > I
> > > do not know how many active balloon drivers there might be I cannot guess
> > > in advance how much of a scalability problem it will be.
> > 
> > Not at all sure multiple drivers are worth supporting, but multiple
> > *devices* is I think worth supporting, if for no other reason than that
> > they can work today. For that, we need a device pointer which Rafael
> > wants to put into the mapping, this means multiple balloon mappings.
> 
> Rafael, please make sure that the balloon driver fails on the second and
> subsequent balloon devices.
> 
> Michael, we only allow multiple balloon devices because it fell out of
> the implementation.  If it causes us even the slightest issue, we should
> not support it.  It's not a sensible setup.
> 
> Cheers,
> Rusty.

Looks like latest revision does it using a flag which seems cleaner,
so I think the point is moot.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-21 Thread Rusty Russell
On Wed, 15 Aug 2012 14:28:51 +0300, "Michael S. Tsirkin"  
wrote:
> On Wed, Aug 15, 2012 at 12:16:51PM +0100, Mel Gorman wrote:
> > I was thinking of exactly that page->mapping == balloon_mapping check. As I
> > do not know how many active balloon drivers there might be I cannot guess
> > in advance how much of a scalability problem it will be.
> 
> Not at all sure multiple drivers are worth supporting, but multiple
> *devices* is I think worth supporting, if for no other reason than that
> they can work today. For that, we need a device pointer which Rafael
> wants to put into the mapping, this means multiple balloon mappings.

Rafael, please make sure that the balloon driver fails on the second and
subsequent balloon devices.

Michael, we only allow multiple balloon devices because it fell out of
the implementation.  If it causes us even the slightest issue, we should
not support it.  It's not a sensible setup.

Cheers,
Rusty.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-21 Thread Rusty Russell
On Wed, 15 Aug 2012 14:28:51 +0300, Michael S. Tsirkin m...@redhat.com 
wrote:
 On Wed, Aug 15, 2012 at 12:16:51PM +0100, Mel Gorman wrote:
  I was thinking of exactly that page-mapping == balloon_mapping check. As I
  do not know how many active balloon drivers there might be I cannot guess
  in advance how much of a scalability problem it will be.
 
 Not at all sure multiple drivers are worth supporting, but multiple
 *devices* is I think worth supporting, if for no other reason than that
 they can work today. For that, we need a device pointer which Rafael
 wants to put into the mapping, this means multiple balloon mappings.

Rafael, please make sure that the balloon driver fails on the second and
subsequent balloon devices.

Michael, we only allow multiple balloon devices because it fell out of
the implementation.  If it causes us even the slightest issue, we should
not support it.  It's not a sensible setup.

Cheers,
Rusty.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-21 Thread Michael S. Tsirkin
On Tue, Aug 21, 2012 at 03:01:37PM +0930, Rusty Russell wrote:
 On Wed, 15 Aug 2012 14:28:51 +0300, Michael S. Tsirkin m...@redhat.com 
 wrote:
  On Wed, Aug 15, 2012 at 12:16:51PM +0100, Mel Gorman wrote:
   I was thinking of exactly that page-mapping == balloon_mapping check. As 
   I
   do not know how many active balloon drivers there might be I cannot guess
   in advance how much of a scalability problem it will be.
  
  Not at all sure multiple drivers are worth supporting, but multiple
  *devices* is I think worth supporting, if for no other reason than that
  they can work today. For that, we need a device pointer which Rafael
  wants to put into the mapping, this means multiple balloon mappings.
 
 Rafael, please make sure that the balloon driver fails on the second and
 subsequent balloon devices.
 
 Michael, we only allow multiple balloon devices because it fell out of
 the implementation.  If it causes us even the slightest issue, we should
 not support it.  It's not a sensible setup.
 
 Cheers,
 Rusty.

Looks like latest revision does it using a flag which seems cleaner,
so I think the point is moot.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-19 Thread Michael S. Tsirkin
On Mon, Aug 20, 2012 at 11:59:11AM +0930, Rusty Russell wrote:
> On Wed, 15 Aug 2012 17:40:19 +0300, "Michael S. Tsirkin"  
> wrote:
> > On Wed, Aug 15, 2012 at 09:34:58AM -0300, Rafael Aquini wrote:
> > > On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote:
> > > > > > now CPU1 executes the next instruction:
> > > > > > 
> > > > > > }
> > > > > > 
> > > > > > which would normally return to function's caller,
> > > > > > but it has been overwritten by CPU2 so we get corruption.
> > > > > > 
> > > > > > No?
> > > > > 
> > > > > At the point CPU2 is unloading the module, it will be kept looping at 
> > > > > the
> > > > > snippet Rusty pointed out because the isolation / migration steps do 
> > > > > not mess
> > > > > with 'vb->num_pages'. The driver will only unload after leaking the 
> > > > > total amount
> > > > > of balloon's inflated pages, which means (for this hypothetical case) 
> > > > > CPU2 will
> > > > > wait until CPU1 finishes the putaback procedure.
> > > > > 
> > > > 
> > > > Yes but only until unlock finishes. The last return from function
> > > > is not guarded and can be overwritten.
> > > 
> > > CPU1 will be returning to putback_balloon_page() which code is located at 
> > > core
> > > mm/compaction.c, outside the driver.
> > 
> > Sorry, I don't seem to be able to articulate this clearly.
> > But this is a correctness issue so I am compelled to try again.
> 
> But if there are 0 balloon pages, how is it migrating a page?

It could be we just finished migrating a page
dropped page lock and are 1 instruction away from
returning from callback.

> > In the end the rule is simple: you can not
> > prevent module unloading from within module
> > itself. It always must be the caller of your
> > module that uses some lock to do this.
> 
> Not quite.  If you clean up everything in your cleanup function, it also
> works,

No, we also need a way to make sure we returned
to caller, this is missing here.

> which is what this does, right?
> 
> Cheers,
> Rusty.


This makes sure callback was invoked but not that it returned
to caller.

All will be well if callbacks are done in rcu critical section
and we synchronise it before unload.


-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-19 Thread Rusty Russell
On Wed, 15 Aug 2012 17:40:19 +0300, "Michael S. Tsirkin"  
wrote:
> On Wed, Aug 15, 2012 at 09:34:58AM -0300, Rafael Aquini wrote:
> > On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote:
> > > > > now CPU1 executes the next instruction:
> > > > > 
> > > > > }
> > > > > 
> > > > > which would normally return to function's caller,
> > > > > but it has been overwritten by CPU2 so we get corruption.
> > > > > 
> > > > > No?
> > > > 
> > > > At the point CPU2 is unloading the module, it will be kept looping at 
> > > > the
> > > > snippet Rusty pointed out because the isolation / migration steps do 
> > > > not mess
> > > > with 'vb->num_pages'. The driver will only unload after leaking the 
> > > > total amount
> > > > of balloon's inflated pages, which means (for this hypothetical case) 
> > > > CPU2 will
> > > > wait until CPU1 finishes the putaback procedure.
> > > > 
> > > 
> > > Yes but only until unlock finishes. The last return from function
> > > is not guarded and can be overwritten.
> > 
> > CPU1 will be returning to putback_balloon_page() which code is located at 
> > core
> > mm/compaction.c, outside the driver.
> 
> Sorry, I don't seem to be able to articulate this clearly.
> But this is a correctness issue so I am compelled to try again.

But if there are 0 balloon pages, how is it migrating a page?

> In the end the rule is simple: you can not
> prevent module unloading from within module
> itself. It always must be the caller of your
> module that uses some lock to do this.

Not quite.  If you clean up everything in your cleanup function, it also
works, which is what this does, right?

Cheers,
Rusty.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-19 Thread Rusty Russell
On Wed, 15 Aug 2012 17:40:19 +0300, Michael S. Tsirkin m...@redhat.com 
wrote:
 On Wed, Aug 15, 2012 at 09:34:58AM -0300, Rafael Aquini wrote:
  On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote:
 now CPU1 executes the next instruction:
 
 }
 
 which would normally return to function's caller,
 but it has been overwritten by CPU2 so we get corruption.
 
 No?

At the point CPU2 is unloading the module, it will be kept looping at 
the
snippet Rusty pointed out because the isolation / migration steps do 
not mess
with 'vb-num_pages'. The driver will only unload after leaking the 
total amount
of balloon's inflated pages, which means (for this hypothetical case) 
CPU2 will
wait until CPU1 finishes the putaback procedure.

   
   Yes but only until unlock finishes. The last return from function
   is not guarded and can be overwritten.
  
  CPU1 will be returning to putback_balloon_page() which code is located at 
  core
  mm/compaction.c, outside the driver.
 
 Sorry, I don't seem to be able to articulate this clearly.
 But this is a correctness issue so I am compelled to try again.

But if there are 0 balloon pages, how is it migrating a page?

 In the end the rule is simple: you can not
 prevent module unloading from within module
 itself. It always must be the caller of your
 module that uses some lock to do this.

Not quite.  If you clean up everything in your cleanup function, it also
works, which is what this does, right?

Cheers,
Rusty.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-19 Thread Michael S. Tsirkin
On Mon, Aug 20, 2012 at 11:59:11AM +0930, Rusty Russell wrote:
 On Wed, 15 Aug 2012 17:40:19 +0300, Michael S. Tsirkin m...@redhat.com 
 wrote:
  On Wed, Aug 15, 2012 at 09:34:58AM -0300, Rafael Aquini wrote:
   On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote:
  now CPU1 executes the next instruction:
  
  }
  
  which would normally return to function's caller,
  but it has been overwritten by CPU2 so we get corruption.
  
  No?
 
 At the point CPU2 is unloading the module, it will be kept looping at 
 the
 snippet Rusty pointed out because the isolation / migration steps do 
 not mess
 with 'vb-num_pages'. The driver will only unload after leaking the 
 total amount
 of balloon's inflated pages, which means (for this hypothetical case) 
 CPU2 will
 wait until CPU1 finishes the putaback procedure.
 

Yes but only until unlock finishes. The last return from function
is not guarded and can be overwritten.
   
   CPU1 will be returning to putback_balloon_page() which code is located at 
   core
   mm/compaction.c, outside the driver.
  
  Sorry, I don't seem to be able to articulate this clearly.
  But this is a correctness issue so I am compelled to try again.
 
 But if there are 0 balloon pages, how is it migrating a page?

It could be we just finished migrating a page
dropped page lock and are 1 instruction away from
returning from callback.

  In the end the rule is simple: you can not
  prevent module unloading from within module
  itself. It always must be the caller of your
  module that uses some lock to do this.
 
 Not quite.  If you clean up everything in your cleanup function, it also
 works,

No, we also need a way to make sure we returned
to caller, this is missing here.

 which is what this does, right?
 
 Cheers,
 Rusty.


This makes sure callback was invoked but not that it returned
to caller.

All will be well if callbacks are done in rcu critical section
and we synchronise it before unload.


-- 
MST
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Rik van Riel

On 08/14/2012 05:38 PM, Michael S. Tsirkin wrote:


And even ignoring that, global pointer to a device
is an ugly hack and ugly hacks tend to explode.

And even ignoring estetics, and if we decide we are fine
with a single balloon, it needs to fail gracefully not
crash like it does now.


Fair enough.  That certainly seems easy enough to fix.

Each balloon driver can have its own struct address_space,
and simply point mapping->host (or any of the others) at
a global balloon thing somewhere.

if (page->mapping && page->mapping->host == balloon_magic)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Michael S. Tsirkin
On Wed, Aug 15, 2012 at 09:34:58AM -0300, Rafael Aquini wrote:
> On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote:
> > > > now CPU1 executes the next instruction:
> > > > 
> > > > }
> > > > 
> > > > which would normally return to function's caller,
> > > > but it has been overwritten by CPU2 so we get corruption.
> > > > 
> > > > No?
> > > 
> > > At the point CPU2 is unloading the module, it will be kept looping at the
> > > snippet Rusty pointed out because the isolation / migration steps do not 
> > > mess
> > > with 'vb->num_pages'. The driver will only unload after leaking the total 
> > > amount
> > > of balloon's inflated pages, which means (for this hypothetical case) 
> > > CPU2 will
> > > wait until CPU1 finishes the putaback procedure.
> > > 
> > 
> > Yes but only until unlock finishes. The last return from function
> > is not guarded and can be overwritten.
> 
> CPU1 will be returning to putback_balloon_page() which code is located at core
> mm/compaction.c, outside the driver.

Sorry, I don't seem to be able to articulate this clearly.
But this is a correctness issue so I am compelled to try again.

Here is some pseudo code:

int pages_lock;

void virtballoon_isolatepage(void *page, unsigned long mode)
{
   pages_lock = 0;
}

assignment of 0 emulates spin unlock.
I removed all other content. Now look at disassembly:

080483d0 :
virtballoon_isolatepage():
 80483d0:   c7 05 88 96 04 08 00movl   $0x0,0x8049688
 80483d7:   00 00 00 

<--- Above is "spin unlock"

 80483da:   c3  ret


Here we are still executing module code (one instruction of it!)
after we have dropped the lock.


So if module goes away at the point marked by <
above (and nothing seems to prevent that,
since pages_lock is unlocked), the last instruction can get overwritten
and then random code will get executed instead.

In the end the rule is simple: you can not
prevent module unloading from within module
itself. It always must be the caller of your
module that uses some lock to do this.

My proposal is to use rcu for this since it is lightweight
and also does not require us to pass extra
state around.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Rafael Aquini
On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote:
> > > now CPU1 executes the next instruction:
> > > 
> > > }
> > > 
> > > which would normally return to function's caller,
> > > but it has been overwritten by CPU2 so we get corruption.
> > > 
> > > No?
> > 
> > At the point CPU2 is unloading the module, it will be kept looping at the
> > snippet Rusty pointed out because the isolation / migration steps do not 
> > mess
> > with 'vb->num_pages'. The driver will only unload after leaking the total 
> > amount
> > of balloon's inflated pages, which means (for this hypothetical case) CPU2 
> > will
> > wait until CPU1 finishes the putaback procedure.
> > 
> 
> Yes but only until unlock finishes. The last return from function
> is not guarded and can be overwritten.

CPU1 will be returning to putback_balloon_page() which code is located at core
mm/compaction.c, outside the driver.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Michael S. Tsirkin
On Wed, Aug 15, 2012 at 12:16:51PM +0100, Mel Gorman wrote:
> On Wed, Aug 15, 2012 at 01:01:08PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Aug 15, 2012 at 10:48:39AM +0100, Mel Gorman wrote:
> > > On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
> > > > > On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> > > > > > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > > > > > > What I think you should do is use rcu for access.
> > > > > > > And here sync rcu before freeing.
> > > > > > > Maybe an overkill but at least a documented synchronization
> > > > > > > primitive, and it is very light weight.
> > > > > > > 
> > > > > > 
> > > > > > I liked your suggestion on barriers, as well.
> > > > > > 
> > > > > 
> > > > > I have not thought about this as deeply as I shouold but is simply 
> > > > > rechecking
> > > > > the mapping under the pages_lock to make sure the page is still a 
> > > > > balloon
> > > > > page an option? i.e. use pages_lock to stabilise page->mapping.
> > > > 
> > > > To clarify, are you concerned about cost of rcu_read_lock
> > > > for non balloon pages?
> > > > 
> > > 
> > > Not as such, but given the choice between introducing RCU locking and
> > > rechecking page->mapping under a spinlock I would choose the latter as it
> > > is more straight-forward.
> > 
> > OK but checking it how? page->mapping == balloon_mapping does not scale to
> > multiple balloons,
> 
> I was thinking of exactly that page->mapping == balloon_mapping check. As I
> do not know how many active balloon drivers there might be I cannot guess
> in advance how much of a scalability problem it will be.

Not at all sure multiple drivers are worth supporting, but multiple
*devices* is I think worth supporting, if for no other reason than that
they can work today. For that, we need a device pointer which Rafael
wants to put into the mapping, this means multiple balloon mappings.


> > so I hoped we can switch to
> > page->mapping->flags & BALLOON_MAPPING or some such,
> > but this means we dereference it outside the lock ...
> > 
> 
> That also sounded like future stuff to me that would be justified with
> profiling if necessary. Personally I would have started with the spinlock
> and a simple check and moved to RCU later when either scalability was a
> problem or it was found there was a need to stabilise whether a page was
> a balloon page or not outside a spinlock.
> 
> This is not a NAK to the idea and I'm not objecting to RCU being used now
> if that is what is really desired. I just suspect it's making the series
> more complex than it needs to be right now.
> 
> -- 
> Mel Gorman
> SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Mel Gorman
On Wed, Aug 15, 2012 at 01:01:08PM +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 15, 2012 at 10:48:39AM +0100, Mel Gorman wrote:
> > On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
> > > > On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> > > > > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > > > > > What I think you should do is use rcu for access.
> > > > > > And here sync rcu before freeing.
> > > > > > Maybe an overkill but at least a documented synchronization
> > > > > > primitive, and it is very light weight.
> > > > > > 
> > > > > 
> > > > > I liked your suggestion on barriers, as well.
> > > > > 
> > > > 
> > > > I have not thought about this as deeply as I shouold but is simply 
> > > > rechecking
> > > > the mapping under the pages_lock to make sure the page is still a 
> > > > balloon
> > > > page an option? i.e. use pages_lock to stabilise page->mapping.
> > > 
> > > To clarify, are you concerned about cost of rcu_read_lock
> > > for non balloon pages?
> > > 
> > 
> > Not as such, but given the choice between introducing RCU locking and
> > rechecking page->mapping under a spinlock I would choose the latter as it
> > is more straight-forward.
> 
> OK but checking it how? page->mapping == balloon_mapping does not scale to
> multiple balloons,

I was thinking of exactly that page->mapping == balloon_mapping check. As I
do not know how many active balloon drivers there might be I cannot guess
in advance how much of a scalability problem it will be.

> so I hoped we can switch to
> page->mapping->flags & BALLOON_MAPPING or some such,
> but this means we dereference it outside the lock ...
> 

That also sounded like future stuff to me that would be justified with
profiling if necessary. Personally I would have started with the spinlock
and a simple check and moved to RCU later when either scalability was a
problem or it was found there was a need to stabilise whether a page was
a balloon page or not outside a spinlock.

This is not a NAK to the idea and I'm not objecting to RCU being used now
if that is what is really desired. I just suspect it's making the series
more complex than it needs to be right now.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Michael S. Tsirkin
On Wed, Aug 15, 2012 at 10:48:39AM +0100, Mel Gorman wrote:
> On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
> > > On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> > > > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > > > > What I think you should do is use rcu for access.
> > > > > And here sync rcu before freeing.
> > > > > Maybe an overkill but at least a documented synchronization
> > > > > primitive, and it is very light weight.
> > > > > 
> > > > 
> > > > I liked your suggestion on barriers, as well.
> > > > 
> > > 
> > > I have not thought about this as deeply as I shouold but is simply 
> > > rechecking
> > > the mapping under the pages_lock to make sure the page is still a balloon
> > > page an option? i.e. use pages_lock to stabilise page->mapping.
> > 
> > To clarify, are you concerned about cost of rcu_read_lock
> > for non balloon pages?
> > 
> 
> Not as such, but given the choice between introducing RCU locking and
> rechecking page->mapping under a spinlock I would choose the latter as it
> is more straight-forward.

OK but checking it how? page->mapping == balloon_mapping does not scale to
multiple balloons, so I hoped we can switch to
page->mapping->flags & BALLOON_MAPPING or some such,
but this means we dereference it outside the lock ...

We will also need to add some API to set/clear mapping so that driver
does not need to poke in mm internals, but that's easy.

> -- 
> Mel Gorman
> SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Mel Gorman
On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
> > On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> > > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > > > What I think you should do is use rcu for access.
> > > > And here sync rcu before freeing.
> > > > Maybe an overkill but at least a documented synchronization
> > > > primitive, and it is very light weight.
> > > > 
> > > 
> > > I liked your suggestion on barriers, as well.
> > > 
> > 
> > I have not thought about this as deeply as I shouold but is simply 
> > rechecking
> > the mapping under the pages_lock to make sure the page is still a balloon
> > page an option? i.e. use pages_lock to stabilise page->mapping.
> 
> To clarify, are you concerned about cost of rcu_read_lock
> for non balloon pages?
> 

Not as such, but given the choice between introducing RCU locking and
rechecking page->mapping under a spinlock I would choose the latter as it
is more straight-forward.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Michael S. Tsirkin
On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
> On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > > What I think you should do is use rcu for access.
> > > And here sync rcu before freeing.
> > > Maybe an overkill but at least a documented synchronization
> > > primitive, and it is very light weight.
> > > 
> > 
> > I liked your suggestion on barriers, as well.
> > 
> 
> I have not thought about this as deeply as I shouold but is simply rechecking
> the mapping under the pages_lock to make sure the page is still a balloon
> page an option? i.e. use pages_lock to stabilise page->mapping.

To clarify, are you concerned about cost of rcu_read_lock
for non balloon pages?

> -- 
> Mel Gorman
> SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Mel Gorman
On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > What I think you should do is use rcu for access.
> > And here sync rcu before freeing.
> > Maybe an overkill but at least a documented synchronization
> > primitive, and it is very light weight.
> > 
> 
> I liked your suggestion on barriers, as well.
> 

I have not thought about this as deeply as I shouold but is simply rechecking
the mapping under the pages_lock to make sure the page is still a balloon
page an option? i.e. use pages_lock to stabilise page->mapping.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Mel Gorman
On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
 On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
  What I think you should do is use rcu for access.
  And here sync rcu before freeing.
  Maybe an overkill but at least a documented synchronization
  primitive, and it is very light weight.
  
 
 I liked your suggestion on barriers, as well.
 

I have not thought about this as deeply as I shouold but is simply rechecking
the mapping under the pages_lock to make sure the page is still a balloon
page an option? i.e. use pages_lock to stabilise page-mapping.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Michael S. Tsirkin
On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
 On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
  On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
   What I think you should do is use rcu for access.
   And here sync rcu before freeing.
   Maybe an overkill but at least a documented synchronization
   primitive, and it is very light weight.
   
  
  I liked your suggestion on barriers, as well.
  
 
 I have not thought about this as deeply as I shouold but is simply rechecking
 the mapping under the pages_lock to make sure the page is still a balloon
 page an option? i.e. use pages_lock to stabilise page-mapping.

To clarify, are you concerned about cost of rcu_read_lock
for non balloon pages?

 -- 
 Mel Gorman
 SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Mel Gorman
On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote:
 On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
  On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
   On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
What I think you should do is use rcu for access.
And here sync rcu before freeing.
Maybe an overkill but at least a documented synchronization
primitive, and it is very light weight.

   
   I liked your suggestion on barriers, as well.
   
  
  I have not thought about this as deeply as I shouold but is simply 
  rechecking
  the mapping under the pages_lock to make sure the page is still a balloon
  page an option? i.e. use pages_lock to stabilise page-mapping.
 
 To clarify, are you concerned about cost of rcu_read_lock
 for non balloon pages?
 

Not as such, but given the choice between introducing RCU locking and
rechecking page-mapping under a spinlock I would choose the latter as it
is more straight-forward.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Michael S. Tsirkin
On Wed, Aug 15, 2012 at 10:48:39AM +0100, Mel Gorman wrote:
 On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote:
  On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
   On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
 What I think you should do is use rcu for access.
 And here sync rcu before freeing.
 Maybe an overkill but at least a documented synchronization
 primitive, and it is very light weight.
 

I liked your suggestion on barriers, as well.

   
   I have not thought about this as deeply as I shouold but is simply 
   rechecking
   the mapping under the pages_lock to make sure the page is still a balloon
   page an option? i.e. use pages_lock to stabilise page-mapping.
  
  To clarify, are you concerned about cost of rcu_read_lock
  for non balloon pages?
  
 
 Not as such, but given the choice between introducing RCU locking and
 rechecking page-mapping under a spinlock I would choose the latter as it
 is more straight-forward.

OK but checking it how? page-mapping == balloon_mapping does not scale to
multiple balloons, so I hoped we can switch to
page-mapping-flags  BALLOON_MAPPING or some such,
but this means we dereference it outside the lock ...

We will also need to add some API to set/clear mapping so that driver
does not need to poke in mm internals, but that's easy.

 -- 
 Mel Gorman
 SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Mel Gorman
On Wed, Aug 15, 2012 at 01:01:08PM +0300, Michael S. Tsirkin wrote:
 On Wed, Aug 15, 2012 at 10:48:39AM +0100, Mel Gorman wrote:
  On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote:
   On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
 On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
  What I think you should do is use rcu for access.
  And here sync rcu before freeing.
  Maybe an overkill but at least a documented synchronization
  primitive, and it is very light weight.
  
 
 I liked your suggestion on barriers, as well.
 

I have not thought about this as deeply as I shouold but is simply 
rechecking
the mapping under the pages_lock to make sure the page is still a 
balloon
page an option? i.e. use pages_lock to stabilise page-mapping.
   
   To clarify, are you concerned about cost of rcu_read_lock
   for non balloon pages?
   
  
  Not as such, but given the choice between introducing RCU locking and
  rechecking page-mapping under a spinlock I would choose the latter as it
  is more straight-forward.
 
 OK but checking it how? page-mapping == balloon_mapping does not scale to
 multiple balloons,

I was thinking of exactly that page-mapping == balloon_mapping check. As I
do not know how many active balloon drivers there might be I cannot guess
in advance how much of a scalability problem it will be.

 so I hoped we can switch to
 page-mapping-flags  BALLOON_MAPPING or some such,
 but this means we dereference it outside the lock ...
 

That also sounded like future stuff to me that would be justified with
profiling if necessary. Personally I would have started with the spinlock
and a simple check and moved to RCU later when either scalability was a
problem or it was found there was a need to stabilise whether a page was
a balloon page or not outside a spinlock.

This is not a NAK to the idea and I'm not objecting to RCU being used now
if that is what is really desired. I just suspect it's making the series
more complex than it needs to be right now.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Michael S. Tsirkin
On Wed, Aug 15, 2012 at 12:16:51PM +0100, Mel Gorman wrote:
 On Wed, Aug 15, 2012 at 01:01:08PM +0300, Michael S. Tsirkin wrote:
  On Wed, Aug 15, 2012 at 10:48:39AM +0100, Mel Gorman wrote:
   On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote:
On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
 On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
  On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
   What I think you should do is use rcu for access.
   And here sync rcu before freeing.
   Maybe an overkill but at least a documented synchronization
   primitive, and it is very light weight.
   
  
  I liked your suggestion on barriers, as well.
  
 
 I have not thought about this as deeply as I shouold but is simply 
 rechecking
 the mapping under the pages_lock to make sure the page is still a 
 balloon
 page an option? i.e. use pages_lock to stabilise page-mapping.

To clarify, are you concerned about cost of rcu_read_lock
for non balloon pages?

   
   Not as such, but given the choice between introducing RCU locking and
   rechecking page-mapping under a spinlock I would choose the latter as it
   is more straight-forward.
  
  OK but checking it how? page-mapping == balloon_mapping does not scale to
  multiple balloons,
 
 I was thinking of exactly that page-mapping == balloon_mapping check. As I
 do not know how many active balloon drivers there might be I cannot guess
 in advance how much of a scalability problem it will be.

Not at all sure multiple drivers are worth supporting, but multiple
*devices* is I think worth supporting, if for no other reason than that
they can work today. For that, we need a device pointer which Rafael
wants to put into the mapping, this means multiple balloon mappings.


  so I hoped we can switch to
  page-mapping-flags  BALLOON_MAPPING or some such,
  but this means we dereference it outside the lock ...
  
 
 That also sounded like future stuff to me that would be justified with
 profiling if necessary. Personally I would have started with the spinlock
 and a simple check and moved to RCU later when either scalability was a
 problem or it was found there was a need to stabilise whether a page was
 a balloon page or not outside a spinlock.
 
 This is not a NAK to the idea and I'm not objecting to RCU being used now
 if that is what is really desired. I just suspect it's making the series
 more complex than it needs to be right now.
 
 -- 
 Mel Gorman
 SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Rafael Aquini
On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote:
   now CPU1 executes the next instruction:
   
   }
   
   which would normally return to function's caller,
   but it has been overwritten by CPU2 so we get corruption.
   
   No?
  
  At the point CPU2 is unloading the module, it will be kept looping at the
  snippet Rusty pointed out because the isolation / migration steps do not 
  mess
  with 'vb-num_pages'. The driver will only unload after leaking the total 
  amount
  of balloon's inflated pages, which means (for this hypothetical case) CPU2 
  will
  wait until CPU1 finishes the putaback procedure.
  
 
 Yes but only until unlock finishes. The last return from function
 is not guarded and can be overwritten.

CPU1 will be returning to putback_balloon_page() which code is located at core
mm/compaction.c, outside the driver.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Michael S. Tsirkin
On Wed, Aug 15, 2012 at 09:34:58AM -0300, Rafael Aquini wrote:
 On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote:
now CPU1 executes the next instruction:

}

which would normally return to function's caller,
but it has been overwritten by CPU2 so we get corruption.

No?
   
   At the point CPU2 is unloading the module, it will be kept looping at the
   snippet Rusty pointed out because the isolation / migration steps do not 
   mess
   with 'vb-num_pages'. The driver will only unload after leaking the total 
   amount
   of balloon's inflated pages, which means (for this hypothetical case) 
   CPU2 will
   wait until CPU1 finishes the putaback procedure.
   
  
  Yes but only until unlock finishes. The last return from function
  is not guarded and can be overwritten.
 
 CPU1 will be returning to putback_balloon_page() which code is located at core
 mm/compaction.c, outside the driver.

Sorry, I don't seem to be able to articulate this clearly.
But this is a correctness issue so I am compelled to try again.

Here is some pseudo code:

int pages_lock;

void virtballoon_isolatepage(void *page, unsigned long mode)
{
   pages_lock = 0;
}

assignment of 0 emulates spin unlock.
I removed all other content. Now look at disassembly:

080483d0 virtballoon_isolatepage:
virtballoon_isolatepage():
 80483d0:   c7 05 88 96 04 08 00movl   $0x0,0x8049688
 80483d7:   00 00 00 

--- Above is spin unlock

 80483da:   c3  ret


Here we are still executing module code (one instruction of it!)
after we have dropped the lock.


So if module goes away at the point marked by 
above (and nothing seems to prevent that,
since pages_lock is unlocked), the last instruction can get overwritten
and then random code will get executed instead.

In the end the rule is simple: you can not
prevent module unloading from within module
itself. It always must be the caller of your
module that uses some lock to do this.

My proposal is to use rcu for this since it is lightweight
and also does not require us to pass extra
state around.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Rik van Riel

On 08/14/2012 05:38 PM, Michael S. Tsirkin wrote:


And even ignoring that, global pointer to a device
is an ugly hack and ugly hacks tend to explode.

And even ignoring estetics, and if we decide we are fine
with a single balloon, it needs to fail gracefully not
crash like it does now.


Fair enough.  That certainly seems easy enough to fix.

Each balloon driver can have its own struct address_space,
and simply point mapping-host (or any of the others) at
a global balloon thing somewhere.

if (page-mapping  page-mapping-host == balloon_magic)


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Rusty Russell
On Tue, 14 Aug 2012 11:33:20 +0300, "Michael S. Tsirkin"  
wrote:
> On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote:
> > On Mon, 13 Aug 2012 11:41:23 +0300, "Michael S. Tsirkin"  
> > wrote:
> > > On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote:
> > > > +/*
> > > > + * Populate balloon_mapping->a_ops->freepage method to help compaction 
> > > > on
> > > > + * re-inserting an isolated page into the balloon page list.
> > > > + */
> > > > +void virtballoon_putbackpage(struct page *page)
> > > > +{
> > > > +   spin_lock(_lock);
> > > > +   list_add(>lru, _ptr->pages);
> > > > +   spin_unlock(_lock);
> > > 
> > > Could the following race trigger:
> > > migration happens while module unloading is in progress,
> > > module goes away between here and when the function
> > > returns, then code for this function gets overwritten?
> > > If yes we need locking external to module to prevent this.
> > > Maybe add a spinlock to struct address_space?
> > 
> > The balloon module cannot be unloaded until it has leaked all its pages,
> > so I think this is safe:
> > 
> > static void remove_common(struct virtio_balloon *vb)
> > {
> > /* There might be pages left in the balloon: free them. */
> > while (vb->num_pages)
> > leak_balloon(vb, vb->num_pages);
> > 
> > Cheers,
> > Rusty.
> 
> I know I meant something else.
> Let me lay this out:
> 
> CPU1 executes:
> void virtballoon_putbackpage(struct page *page)
> {
>   spin_lock(_lock);
>   list_add(>lru, _ptr->pages);
>   spin_unlock(_lock);
> 
> 
>   at this point CPU2 unloads module:
>   leak_balloon
>   ..
> 
>   next CPU2 loads another module so code memory gets overwritten
> 
> now CPU1 executes the next instruction:
> 
> }
> 
> which would normally return to function's caller,
> but it has been overwritten by CPU2 so we get corruption.

Actually, I have no idea.

Where does virtballoon_putbackpage get called from?  It's some weird mm
thing, and I stay out of that mess.

The vb thread is stopped before we spin checking vb->num_pages, so it's
not touching pages; who would be calling this?

Confused,
Rusty.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 06:34:13PM -0300, Rafael Aquini wrote:
> On Tue, Aug 14, 2012 at 11:49:06PM +0300, Michael S. Tsirkin wrote:
> > On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote:
> > > On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote:
> > > > On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote:
> > > > > On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > > What if there is more than one balloon device?
> > > > > > > > 
> > > > > > > > Is it possible to load this driver twice, or are you foreseeing 
> > > > > > > > a future case
> > > > > > > > where this driver will be able to manage several distinct 
> > > > > > > > memory balloons for
> > > > > > > > the same guest?
> > > > > > > > 
> > > > > > > 
> > > > > > > Second.
> > > > > > > It is easy to create several balloons they are just
> > > > > > > pci devices.
> > > > > >  
> > > > > > 
> > > > > > 
> > > > > > and it might not be too important to make it work but
> > > > > > at least would be nice not to have a crash in this
> > > > > > setup.
> > > > > >
> > > > > Fair enough. For now, as I believe it's safe to assume we are only 
> > > > > inflating one
> > > > > balloon per guest, I'd like to propose this as a future enhancement. 
> > > > > Sounds
> > > > > good?
> > > > >  
> > > > 
> > > > Since guest crashes when it's not the case, no it doesn't, sorry :(.
> > > >
> > > Ok, but right now this driver only takes care of 1 balloon per guest,
> > 
> > It does? Are you sure? There is no global state as far as I can see. So
> > I can create 2 devices and driver will happily create two instances,
> > each one can be inflated/deflated independently.
> > 
> > > so how
> > > could this approach crash it? 
> > 
> > Add device. inflate. Add another device. inflate. deflate. unplug.
> > Now you have pointer to freed memory and when mm touches
> > page from first device, you ge use after free.
> > 
> > > Your point is a good thing to be on a to-do list for future enhancements, 
> > > but
> > > it's not a dealbreaker for the present balloon driver implementation, 
> > > IMHO.
> > > 
> > 
> > Yes it looks like a dealbreaker to me.
> 
> Sorry. You're right, I'm wrong.
> 
> I'll get back to the scracthpad to overcome this constraint. I believe the way
> this patch was at its v4 revision (wrt this particular case) could possibly
> address this concern of yours.

Almost. We still have a global balloon_mapping. The only reason for
it to exist seems solely to detect balloon mappings, so it
can just be replaced by a flag in the mapping, or in mapping
ops, or elsewhere. Also, please add APIs to mm so we can
avoid doing internal mm stuff like

INIT_RADIX_TREE(_mapping->page_tree, GFP_ATOMIC | __GFP_NOWARN);

in the driver. It should be
alloc_address_mapping(_balloon_aops);
free_address_mapping

Make page->mapping use rcu, and sync rcu in
free_address_mapping.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 04:56:59PM -0400, Rik van Riel wrote:
> On 08/14/2012 04:54 PM, Michael S. Tsirkin wrote:
> 
> >To clarify, the global state that this patch adds, is ugly
> >even if we didn't support multiple balloons yet.
> >So I don't think I can accept such a patch.
> >Rusty has a final word here, maybe he thinks differently.
> 
> Before deciding that "does not support multiple balloon drivers
> at once" is an issue, is there any use case at all for having
> multiple balloon drivers active at a time?
> 
> I do not see any.

For example, we had a proposal for a page-cache backed
device. So it could be useful to have two, a regular balloon
and a pagecache backed one.
There could be other uses - it certainly looks like it
works so how can you be sure it's useless?

And even ignoring that, global pointer to a device
is an ugly hack and ugly hacks tend to explode.

And even ignoring estetics, and if we decide we are fine
with a single balloon, it needs to fail gracefully not
crash like it does now.

> -- 
> All rights reversed
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Rafael Aquini
On Tue, Aug 14, 2012 at 11:49:06PM +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote:
> > On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote:
> > > On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote:
> > > > On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > What if there is more than one balloon device?
> > > > > > > 
> > > > > > > Is it possible to load this driver twice, or are you foreseeing a 
> > > > > > > future case
> > > > > > > where this driver will be able to manage several distinct memory 
> > > > > > > balloons for
> > > > > > > the same guest?
> > > > > > > 
> > > > > > 
> > > > > > Second.
> > > > > > It is easy to create several balloons they are just
> > > > > > pci devices.
> > > > >  
> > > > > 
> > > > > 
> > > > > and it might not be too important to make it work but
> > > > > at least would be nice not to have a crash in this
> > > > > setup.
> > > > >
> > > > Fair enough. For now, as I believe it's safe to assume we are only 
> > > > inflating one
> > > > balloon per guest, I'd like to propose this as a future enhancement. 
> > > > Sounds
> > > > good?
> > > >  
> > > 
> > > Since guest crashes when it's not the case, no it doesn't, sorry :(.
> > >
> > Ok, but right now this driver only takes care of 1 balloon per guest,
> 
> It does? Are you sure? There is no global state as far as I can see. So
> I can create 2 devices and driver will happily create two instances,
> each one can be inflated/deflated independently.
> 
> > so how
> > could this approach crash it? 
> 
> Add device. inflate. Add another device. inflate. deflate. unplug.
> Now you have pointer to freed memory and when mm touches
> page from first device, you ge use after free.
> 
> > Your point is a good thing to be on a to-do list for future enhancements, 
> > but
> > it's not a dealbreaker for the present balloon driver implementation, IMHO.
> > 
> 
> Yes it looks like a dealbreaker to me.

Sorry. You're right, I'm wrong.

I'll get back to the scracthpad to overcome this constraint. I believe the way
this patch was at its v4 revision (wrt this particular case) could possibly
address this concern of yours.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Rik van Riel

On 08/14/2012 04:54 PM, Michael S. Tsirkin wrote:


To clarify, the global state that this patch adds, is ugly
even if we didn't support multiple balloons yet.
So I don't think I can accept such a patch.
Rusty has a final word here, maybe he thinks differently.


Before deciding that "does not support multiple balloon drivers
at once" is an issue, is there any use case at all for having
multiple balloon drivers active at a time?

I do not see any.

--
All rights reversed
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 11:49:06PM +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote:
> > On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote:
> > > On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote:
> > > > On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > What if there is more than one balloon device?
> > > > > > > 
> > > > > > > Is it possible to load this driver twice, or are you foreseeing a 
> > > > > > > future case
> > > > > > > where this driver will be able to manage several distinct memory 
> > > > > > > balloons for
> > > > > > > the same guest?
> > > > > > > 
> > > > > > 
> > > > > > Second.
> > > > > > It is easy to create several balloons they are just
> > > > > > pci devices.
> > > > >  
> > > > > 
> > > > > 
> > > > > and it might not be too important to make it work but
> > > > > at least would be nice not to have a crash in this
> > > > > setup.
> > > > >
> > > > Fair enough. For now, as I believe it's safe to assume we are only 
> > > > inflating one
> > > > balloon per guest, I'd like to propose this as a future enhancement. 
> > > > Sounds
> > > > good?
> > > >  
> > > 
> > > Since guest crashes when it's not the case, no it doesn't, sorry :(.
> > >
> > Ok, but right now this driver only takes care of 1 balloon per guest,
> 
> It does? Are you sure? There is no global state as far as I can see. So
> I can create 2 devices and driver will happily create two instances,
> each one can be inflated/deflated independently.
> 
> > so how
> > could this approach crash it? 
> 
> Add device. inflate. Add another device. inflate. deflate. unplug.
> Now you have pointer to freed memory and when mm touches
> page from first device, you ge use after free.
> 
> > Your point is a good thing to be on a to-do list for future enhancements, 
> > but
> > it's not a dealbreaker for the present balloon driver implementation, IMHO.
> > 
> 
> Yes it looks like a dealbreaker to me.

To clarify, the global state that this patch adds, is ugly
even if we didn't support multiple balloons yet.
So I don't think I can accept such a patch.
Rusty has a final word here, maybe he thinks differently.

> -- 
> MST 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote:
> On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote:
> > On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote:
> > > On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
> > > > > > > What if there is more than one balloon device?
> > > > > > 
> > > > > > Is it possible to load this driver twice, or are you foreseeing a 
> > > > > > future case
> > > > > > where this driver will be able to manage several distinct memory 
> > > > > > balloons for
> > > > > > the same guest?
> > > > > > 
> > > > > 
> > > > > Second.
> > > > > It is easy to create several balloons they are just
> > > > > pci devices.
> > > >  
> > > > 
> > > > 
> > > > and it might not be too important to make it work but
> > > > at least would be nice not to have a crash in this
> > > > setup.
> > > >
> > > Fair enough. For now, as I believe it's safe to assume we are only 
> > > inflating one
> > > balloon per guest, I'd like to propose this as a future enhancement. 
> > > Sounds
> > > good?
> > >  
> > 
> > Since guest crashes when it's not the case, no it doesn't, sorry :(.
> >
> Ok, but right now this driver only takes care of 1 balloon per guest,

It does? Are you sure? There is no global state as far as I can see. So
I can create 2 devices and driver will happily create two instances,
each one can be inflated/deflated independently.

> so how
> could this approach crash it? 

Add device. inflate. Add another device. inflate. deflate. unplug.
Now you have pointer to freed memory and when mm touches
page from first device, you ge use after free.

> Your point is a good thing to be on a to-do list for future enhancements, but
> it's not a dealbreaker for the present balloon driver implementation, IMHO.
> 

Yes it looks like a dealbreaker to me.

-- 
MST 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > What I think you should do is use rcu for access.
> > And here sync rcu before freeing.
> > Maybe an overkill but at least a documented synchronization
> > primitive, and it is very light weight.
> > 
> 
> I liked your suggestion on barriers, as well.
> 
> Rik, Mel ?

Further instead of simple assignment I would add an api
in mm to set/clear the balloon mapping, with proper locking.

This could fail if already set, and thus fix crash with
many ballons.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Rafael Aquini
On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote:
> > On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
> > > > > > What if there is more than one balloon device?
> > > > > 
> > > > > Is it possible to load this driver twice, or are you foreseeing a 
> > > > > future case
> > > > > where this driver will be able to manage several distinct memory 
> > > > > balloons for
> > > > > the same guest?
> > > > > 
> > > > 
> > > > Second.
> > > > It is easy to create several balloons they are just
> > > > pci devices.
> > >  
> > > 
> > > 
> > > and it might not be too important to make it work but
> > > at least would be nice not to have a crash in this
> > > setup.
> > >
> > Fair enough. For now, as I believe it's safe to assume we are only 
> > inflating one
> > balloon per guest, I'd like to propose this as a future enhancement. Sounds
> > good?
> >  
> 
> Since guest crashes when it's not the case, no it doesn't, sorry :(.
>
Ok, but right now this driver only takes care of 1 balloon per guest, so how
could this approach crash it? 

Your point is a good thing to be on a to-do list for future enhancements, but
it's not a dealbreaker for the present balloon driver implementation, IMHO.
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote:
> On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
> > > > > What if there is more than one balloon device?
> > > > 
> > > > Is it possible to load this driver twice, or are you foreseeing a 
> > > > future case
> > > > where this driver will be able to manage several distinct memory 
> > > > balloons for
> > > > the same guest?
> > > > 
> > > 
> > > Second.
> > > It is easy to create several balloons they are just
> > > pci devices.
> >  
> > 
> > 
> > and it might not be too important to make it work but
> > at least would be nice not to have a crash in this
> > setup.
> >
> Fair enough. For now, as I believe it's safe to assume we are only inflating 
> one
> balloon per guest, I'd like to propose this as a future enhancement. Sounds
> good?
>  

Since guest crashes when it's not the case, no it doesn't, sorry :(.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Rafael Aquini
On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> What I think you should do is use rcu for access.
> And here sync rcu before freeing.
> Maybe an overkill but at least a documented synchronization
> primitive, and it is very light weight.
> 

I liked your suggestion on barriers, as well.

Rik, Mel ?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Rafael Aquini
On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
> > > > What if there is more than one balloon device?
> > > 
> > > Is it possible to load this driver twice, or are you foreseeing a future 
> > > case
> > > where this driver will be able to manage several distinct memory balloons 
> > > for
> > > the same guest?
> > > 
> > 
> > Second.
> > It is easy to create several balloons they are just
> > pci devices.
>  
> 
> 
> and it might not be too important to make it work but
> at least would be nice not to have a crash in this
> setup.
>
Fair enough. For now, as I believe it's safe to assume we are only inflating one
balloon per guest, I'd like to propose this as a future enhancement. Sounds
good?
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 14, 2012 at 03:22:45PM -0300, Rafael Aquini wrote:
> > On Mon, Aug 13, 2012 at 11:41:23AM +0300, Michael S. Tsirkin wrote:
> > > > @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon 
> > > > *vb, size_t num)
> > > > set_page_pfns(vb->pfns + vb->num_pfns, page);
> > > > vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
> > > > totalram_pages--;
> > > > +   spin_lock(_lock);
> > > > list_add(>lru, >pages);
> > > 
> > > If list_add above is reordered with mapping assignment below,
> > > then nothing bad happens because balloon_mapping takes
> > > pages_lock.
> > > 
> > > > +   page->mapping = balloon_mapping;
> > > > +   spin_unlock(_lock);
> > > > }
> > > >  
> > > > /* Didn't get any?  Oh well. */
> > > > @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, 
> > > > size_t num)
> > > > return;
> > > >  
> > > > tell_host(vb, vb->inflate_vq);
> > > > +   mutex_unlock(_lock);
> > > >  }
> > > >  
> > > >  static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
> > > > @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon 
> > > > *vb, size_t num)
> > > > /* We can only do one array worth at a time. */
> > > > num = min(num, ARRAY_SIZE(vb->pfns));
> > > >  
> > > > +   mutex_lock(_lock);
> > > > for (vb->num_pfns = 0; vb->num_pfns < num;
> > > >  vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> > > > +   /*
> > > > +* We can race against virtballoon_isolatepage() and 
> > > > end up
> > > > +* stumbling across a _temporarily_ empty 'pages' list.
> > > > +*/
> > > > +   spin_lock(_lock);
> > > > +   if (unlikely(list_empty(>pages))) {
> > > > +   spin_unlock(_lock);
> > > > +   break;
> > > > +   }
> > > > page = list_first_entry(>pages, struct page, lru);
> > > > +   page->mapping = NULL;
> > > 
> > > Unlike the case above, here
> > > if = NULL write above is reordered with list_del below,
> > > then isolate_page can run on a page that is not
> > > on lru.
> > > 
> > > So I think this needs a wmb().
> > > And maybe a comment above explaining why it is safe?
> > 
> > Good point. Presumably, this nit has potential to explain your guessing on 
> > the
> > read barrier stuff at movable_balloon_page() on the other patch.
> > 
> 
> 
> What I think you should do is use rcu for access.
> And here sync rcu before freeing.
> Maybe an overkill but at least a documented synchronization
> primitive, and it is very light weight.
> 
> > > > list_del(>lru);
> > > 
> > > I wonder why changing page->lru here is safe against
> > > races with unmap_and_move in the previous patch.
> > 
> > leak_balloon() doesn't race against unmap_and_move() because the later 
> > works on
> > an isolated page set. So, theoretically, pages being dequeued from balloon 
> > page
> > list here are either migrated (already) or they were not isolated yet.
> > 
> 
> So add a comment explaining why it is safe pls.
> 
> > > > +/*
> > > > + * '*vb_ptr' allows virtballoon_migratepage() & 
> > > > virtballoon_putbackpage() to
> > > > + * access pertinent elements from struct virtio_balloon
> > > > + */
> > > 
> > > What if there is more than one balloon device?
> > 
> > Is it possible to load this driver twice, or are you foreseeing a future 
> > case
> > where this driver will be able to manage several distinct memory balloons 
> > for
> > the same guest?
> > 
> 
> Second.
> It is easy to create several balloons they are just
> pci devices.
 


and it might not be too important to make it work but
at least would be nice not to have a crash in this
setup.


> > > > +   /* Init the ballooned page->mapping special balloon_mapping */
> > > > +   balloon_mapping = kmalloc(sizeof(*balloon_mapping), GFP_KERNEL);
> > > > +   if (!balloon_mapping) {
> > > > +   err = -ENOMEM;
> > > > +   goto out_free_vb;
> > > > +   }
> > > 
> > > Can balloon_mapping be dereferenced at this point?
> > > Then what happens?
> > 
> > Since there's no balloon page enqueued for this balloon driver yet, there's 
> > no
> > chance of balloon_mapping being dereferenced at this point.
> > 
> >  
> > > > +
> > > > +   INIT_RADIX_TREE(_mapping->page_tree, GFP_ATOMIC | 
> > > > __GFP_NOWARN);
> > > > +   INIT_LIST_HEAD(_mapping->i_mmap_nonlinear);
> > > > +   spin_lock_init(_mapping->tree_lock);
> > > > +   balloon_mapping->a_ops = _balloon_aops;
> > > >  
> > > > err = init_vqs(vb);
> > > > if (err)
> > > > @@ -373,6 +493,7 @@ out_del_vqs:
> > > > vdev->config->del_vqs(vdev);
> > > >  out_free_vb:
> > > > 

Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 03:22:45PM -0300, Rafael Aquini wrote:
> On Mon, Aug 13, 2012 at 11:41:23AM +0300, Michael S. Tsirkin wrote:
> > > @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, 
> > > size_t num)
> > >   set_page_pfns(vb->pfns + vb->num_pfns, page);
> > >   vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
> > >   totalram_pages--;
> > > + spin_lock(_lock);
> > >   list_add(>lru, >pages);
> > 
> > If list_add above is reordered with mapping assignment below,
> > then nothing bad happens because balloon_mapping takes
> > pages_lock.
> > 
> > > + page->mapping = balloon_mapping;
> > > + spin_unlock(_lock);
> > >   }
> > >  
> > >   /* Didn't get any?  Oh well. */
> > > @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, 
> > > size_t num)
> > >   return;
> > >  
> > >   tell_host(vb, vb->inflate_vq);
> > > + mutex_unlock(_lock);
> > >  }
> > >  
> > >  static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
> > > @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, 
> > > size_t num)
> > >   /* We can only do one array worth at a time. */
> > >   num = min(num, ARRAY_SIZE(vb->pfns));
> > >  
> > > + mutex_lock(_lock);
> > >   for (vb->num_pfns = 0; vb->num_pfns < num;
> > >vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> > > + /*
> > > +  * We can race against virtballoon_isolatepage() and end up
> > > +  * stumbling across a _temporarily_ empty 'pages' list.
> > > +  */
> > > + spin_lock(_lock);
> > > + if (unlikely(list_empty(>pages))) {
> > > + spin_unlock(_lock);
> > > + break;
> > > + }
> > >   page = list_first_entry(>pages, struct page, lru);
> > > + page->mapping = NULL;
> > 
> > Unlike the case above, here
> > if = NULL write above is reordered with list_del below,
> > then isolate_page can run on a page that is not
> > on lru.
> > 
> > So I think this needs a wmb().
> > And maybe a comment above explaining why it is safe?
> 
> Good point. Presumably, this nit has potential to explain your guessing on the
> read barrier stuff at movable_balloon_page() on the other patch.
> 


What I think you should do is use rcu for access.
And here sync rcu before freeing.
Maybe an overkill but at least a documented synchronization
primitive, and it is very light weight.

> > >   list_del(>lru);
> > 
> > I wonder why changing page->lru here is safe against
> > races with unmap_and_move in the previous patch.
> 
> leak_balloon() doesn't race against unmap_and_move() because the later works 
> on
> an isolated page set. So, theoretically, pages being dequeued from balloon 
> page
> list here are either migrated (already) or they were not isolated yet.
> 

So add a comment explaining why it is safe pls.

> > > +/*
> > > + * '*vb_ptr' allows virtballoon_migratepage() & 
> > > virtballoon_putbackpage() to
> > > + * access pertinent elements from struct virtio_balloon
> > > + */
> > 
> > What if there is more than one balloon device?
> 
> Is it possible to load this driver twice, or are you foreseeing a future case
> where this driver will be able to manage several distinct memory balloons for
> the same guest?
> 

Second.
It is easy to create several balloons they are just
pci devices.

> > > + /* Init the ballooned page->mapping special balloon_mapping */
> > > + balloon_mapping = kmalloc(sizeof(*balloon_mapping), GFP_KERNEL);
> > > + if (!balloon_mapping) {
> > > + err = -ENOMEM;
> > > + goto out_free_vb;
> > > + }
> > 
> > Can balloon_mapping be dereferenced at this point?
> > Then what happens?
> 
> Since there's no balloon page enqueued for this balloon driver yet, there's no
> chance of balloon_mapping being dereferenced at this point.
> 
>  
> > > +
> > > + INIT_RADIX_TREE(_mapping->page_tree, GFP_ATOMIC | __GFP_NOWARN);
> > > + INIT_LIST_HEAD(_mapping->i_mmap_nonlinear);
> > > + spin_lock_init(_mapping->tree_lock);
> > > + balloon_mapping->a_ops = _balloon_aops;
> > >  
> > >   err = init_vqs(vb);
> > >   if (err)
> > > @@ -373,6 +493,7 @@ out_del_vqs:
> > >   vdev->config->del_vqs(vdev);
> > >  out_free_vb:
> > >   kfree(vb);
> > > + kfree(balloon_mapping);
> > 
> > No need to set it to NULL? It seems if someone else allocates a mapping
> > and gets this chunk of memory by chance, the logic in mm will get
> > confused.
> 
> Good point. It surely doesn't hurt be asured of this sort of safety.
> 
> > >  out:
> > >   return err;
> > >  }
> > > @@ -397,6 +518,7 @@ static void __devexit virtballoon_remove(struct 
> > > virtio_device *vdev)
> > >   kthread_stop(vb->thread);
> > >   remove_common(vb);
> > >   kfree(vb);
> > > + kfree(balloon_mapping);
> > 
> > Neither here?
> 
> ditto.
> 
>  
> > >  }
> > >  
> > >  #ifdef CONFIG_PM
> > > diff --git a/include/linux/virtio_balloon.h 
> > > b/include/linux/virtio_balloon.h
> > > index 

Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 03:44:09PM -0300, Rafael Aquini wrote:
> On Tue, Aug 14, 2012 at 11:33:20AM +0300, Michael S. Tsirkin wrote:
> > On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote:
> > > On Mon, 13 Aug 2012 11:41:23 +0300, "Michael S. Tsirkin" 
> > >  wrote:
> > > > On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote:
> > > > > +/*
> > > > > + * Populate balloon_mapping->a_ops->freepage method to help 
> > > > > compaction on
> > > > > + * re-inserting an isolated page into the balloon page list.
> > > > > + */
> > > > > +void virtballoon_putbackpage(struct page *page)
> > > > > +{
> > > > > + spin_lock(_lock);
> > > > > + list_add(>lru, _ptr->pages);
> > > > > + spin_unlock(_lock);
> > > > 
> > > > Could the following race trigger:
> > > > migration happens while module unloading is in progress,
> > > > module goes away between here and when the function
> > > > returns, then code for this function gets overwritten?
> > > > If yes we need locking external to module to prevent this.
> > > > Maybe add a spinlock to struct address_space?
> > > 
> > > The balloon module cannot be unloaded until it has leaked all its pages,
> > > so I think this is safe:
> > > 
> > > static void remove_common(struct virtio_balloon *vb)
> > > {
> > >   /* There might be pages left in the balloon: free them. */
> > >   while (vb->num_pages)
> > >   leak_balloon(vb, vb->num_pages);
> > > 
> > > Cheers,
> > > Rusty.
> > 
> > I know I meant something else.
> > Let me lay this out:
> > 
> > CPU1 executes:
> > void virtballoon_putbackpage(struct page *page)
> > {
> > spin_lock(_lock);
> > list_add(>lru, _ptr->pages);
> > spin_unlock(_lock);
> > 
> > 
> > at this point CPU2 unloads module:
> > leak_balloon
> > ..
> > 
> > next CPU2 loads another module so code memory gets overwritten
> > 
> > now CPU1 executes the next instruction:
> > 
> > }
> > 
> > which would normally return to function's caller,
> > but it has been overwritten by CPU2 so we get corruption.
> > 
> > No?
> 
> At the point CPU2 is unloading the module, it will be kept looping at the
> snippet Rusty pointed out because the isolation / migration steps do not mess
> with 'vb->num_pages'. The driver will only unload after leaking the total 
> amount
> of balloon's inflated pages, which means (for this hypothetical case) CPU2 
> will
> wait until CPU1 finishes the putaback procedure.
> 

Yes but only until unlock finishes. The last return from function
is not guarded and can be overwritten.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Rafael Aquini
On Tue, Aug 14, 2012 at 11:33:20AM +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote:
> > On Mon, 13 Aug 2012 11:41:23 +0300, "Michael S. Tsirkin"  
> > wrote:
> > > On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote:
> > > > +/*
> > > > + * Populate balloon_mapping->a_ops->freepage method to help compaction 
> > > > on
> > > > + * re-inserting an isolated page into the balloon page list.
> > > > + */
> > > > +void virtballoon_putbackpage(struct page *page)
> > > > +{
> > > > +   spin_lock(_lock);
> > > > +   list_add(>lru, _ptr->pages);
> > > > +   spin_unlock(_lock);
> > > 
> > > Could the following race trigger:
> > > migration happens while module unloading is in progress,
> > > module goes away between here and when the function
> > > returns, then code for this function gets overwritten?
> > > If yes we need locking external to module to prevent this.
> > > Maybe add a spinlock to struct address_space?
> > 
> > The balloon module cannot be unloaded until it has leaked all its pages,
> > so I think this is safe:
> > 
> > static void remove_common(struct virtio_balloon *vb)
> > {
> > /* There might be pages left in the balloon: free them. */
> > while (vb->num_pages)
> > leak_balloon(vb, vb->num_pages);
> > 
> > Cheers,
> > Rusty.
> 
> I know I meant something else.
> Let me lay this out:
> 
> CPU1 executes:
> void virtballoon_putbackpage(struct page *page)
> {
>   spin_lock(_lock);
>   list_add(>lru, _ptr->pages);
>   spin_unlock(_lock);
> 
> 
>   at this point CPU2 unloads module:
>   leak_balloon
>   ..
> 
>   next CPU2 loads another module so code memory gets overwritten
> 
> now CPU1 executes the next instruction:
> 
> }
> 
> which would normally return to function's caller,
> but it has been overwritten by CPU2 so we get corruption.
> 
> No?

At the point CPU2 is unloading the module, it will be kept looping at the
snippet Rusty pointed out because the isolation / migration steps do not mess
with 'vb->num_pages'. The driver will only unload after leaking the total amount
of balloon's inflated pages, which means (for this hypothetical case) CPU2 will
wait until CPU1 finishes the putaback procedure.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Rafael Aquini
On Mon, Aug 13, 2012 at 11:41:23AM +0300, Michael S. Tsirkin wrote:
> > @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, 
> > size_t num)
> > set_page_pfns(vb->pfns + vb->num_pfns, page);
> > vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
> > totalram_pages--;
> > +   spin_lock(_lock);
> > list_add(>lru, >pages);
> 
> If list_add above is reordered with mapping assignment below,
> then nothing bad happens because balloon_mapping takes
> pages_lock.
> 
> > +   page->mapping = balloon_mapping;
> > +   spin_unlock(_lock);
> > }
> >  
> > /* Didn't get any?  Oh well. */
> > @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, 
> > size_t num)
> > return;
> >  
> > tell_host(vb, vb->inflate_vq);
> > +   mutex_unlock(_lock);
> >  }
> >  
> >  static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
> > @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, 
> > size_t num)
> > /* We can only do one array worth at a time. */
> > num = min(num, ARRAY_SIZE(vb->pfns));
> >  
> > +   mutex_lock(_lock);
> > for (vb->num_pfns = 0; vb->num_pfns < num;
> >  vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> > +   /*
> > +* We can race against virtballoon_isolatepage() and end up
> > +* stumbling across a _temporarily_ empty 'pages' list.
> > +*/
> > +   spin_lock(_lock);
> > +   if (unlikely(list_empty(>pages))) {
> > +   spin_unlock(_lock);
> > +   break;
> > +   }
> > page = list_first_entry(>pages, struct page, lru);
> > +   page->mapping = NULL;
> 
> Unlike the case above, here
> if = NULL write above is reordered with list_del below,
> then isolate_page can run on a page that is not
> on lru.
> 
> So I think this needs a wmb().
> And maybe a comment above explaining why it is safe?

Good point. Presumably, this nit has potential to explain your guessing on the
read barrier stuff at movable_balloon_page() on the other patch.

 
> > list_del(>lru);
> 
> I wonder why changing page->lru here is safe against
> races with unmap_and_move in the previous patch.

leak_balloon() doesn't race against unmap_and_move() because the later works on
an isolated page set. So, theoretically, pages being dequeued from balloon page
list here are either migrated (already) or they were not isolated yet.


> > +/*
> > + * '*vb_ptr' allows virtballoon_migratepage() & virtballoon_putbackpage() 
> > to
> > + * access pertinent elements from struct virtio_balloon
> > + */
> 
> What if there is more than one balloon device?

Is it possible to load this driver twice, or are you foreseeing a future case
where this driver will be able to manage several distinct memory balloons for
the same guest?


> > +   /* Init the ballooned page->mapping special balloon_mapping */
> > +   balloon_mapping = kmalloc(sizeof(*balloon_mapping), GFP_KERNEL);
> > +   if (!balloon_mapping) {
> > +   err = -ENOMEM;
> > +   goto out_free_vb;
> > +   }
> 
> Can balloon_mapping be dereferenced at this point?
> Then what happens?

Since there's no balloon page enqueued for this balloon driver yet, there's no
chance of balloon_mapping being dereferenced at this point.

 
> > +
> > +   INIT_RADIX_TREE(_mapping->page_tree, GFP_ATOMIC | __GFP_NOWARN);
> > +   INIT_LIST_HEAD(_mapping->i_mmap_nonlinear);
> > +   spin_lock_init(_mapping->tree_lock);
> > +   balloon_mapping->a_ops = _balloon_aops;
> >  
> > err = init_vqs(vb);
> > if (err)
> > @@ -373,6 +493,7 @@ out_del_vqs:
> > vdev->config->del_vqs(vdev);
> >  out_free_vb:
> > kfree(vb);
> > +   kfree(balloon_mapping);
> 
> No need to set it to NULL? It seems if someone else allocates a mapping
> and gets this chunk of memory by chance, the logic in mm will get
> confused.

Good point. It surely doesn't hurt be asured of this sort of safety.

> >  out:
> > return err;
> >  }
> > @@ -397,6 +518,7 @@ static void __devexit virtballoon_remove(struct 
> > virtio_device *vdev)
> > kthread_stop(vb->thread);
> > remove_common(vb);
> > kfree(vb);
> > +   kfree(balloon_mapping);
> 
> Neither here?

ditto.

 
> >  }
> >  
> >  #ifdef CONFIG_PM
> > diff --git a/include/linux/virtio_balloon.h b/include/linux/virtio_balloon.h
> > index 652dc8b..930f1b7 100644
> > --- a/include/linux/virtio_balloon.h
> > +++ b/include/linux/virtio_balloon.h
> > @@ -56,4 +56,8 @@ struct virtio_balloon_stat {
> > u64 val;
> >  } __attribute__((packed));
> >  
> > +#if !defined(CONFIG_COMPACTION)
> > +struct address_space *balloon_mapping;
> > +#endif
> > +
> 
> Anyone including this header will get a different copy of
> balloon_mapping. Besides, need to be ifdef KERNEL.

Good point. I'll move this hunk to the balloon driver itself.


--
To unsubscribe from this list: send the line 

Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote:
> On Mon, 13 Aug 2012 11:41:23 +0300, "Michael S. Tsirkin"  
> wrote:
> > On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote:
> > > +/*
> > > + * Populate balloon_mapping->a_ops->freepage method to help compaction on
> > > + * re-inserting an isolated page into the balloon page list.
> > > + */
> > > +void virtballoon_putbackpage(struct page *page)
> > > +{
> > > + spin_lock(_lock);
> > > + list_add(>lru, _ptr->pages);
> > > + spin_unlock(_lock);
> > 
> > Could the following race trigger:
> > migration happens while module unloading is in progress,
> > module goes away between here and when the function
> > returns, then code for this function gets overwritten?
> > If yes we need locking external to module to prevent this.
> > Maybe add a spinlock to struct address_space?
> 
> The balloon module cannot be unloaded until it has leaked all its pages,
> so I think this is safe:
> 
> static void remove_common(struct virtio_balloon *vb)
> {
>   /* There might be pages left in the balloon: free them. */
>   while (vb->num_pages)
>   leak_balloon(vb, vb->num_pages);
> 
> Cheers,
> Rusty.

I know I meant something else.
Let me lay this out:

CPU1 executes:
void virtballoon_putbackpage(struct page *page)
{
spin_lock(_lock);
list_add(>lru, _ptr->pages);
spin_unlock(_lock);


at this point CPU2 unloads module:
leak_balloon
..

next CPU2 loads another module so code memory gets overwritten

now CPU1 executes the next instruction:

}

which would normally return to function's caller,
but it has been overwritten by CPU2 so we get corruption.

No?

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote:
 On Mon, 13 Aug 2012 11:41:23 +0300, Michael S. Tsirkin m...@redhat.com 
 wrote:
  On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote:
   +/*
   + * Populate balloon_mapping-a_ops-freepage method to help compaction on
   + * re-inserting an isolated page into the balloon page list.
   + */
   +void virtballoon_putbackpage(struct page *page)
   +{
   + spin_lock(pages_lock);
   + list_add(page-lru, vb_ptr-pages);
   + spin_unlock(pages_lock);
  
  Could the following race trigger:
  migration happens while module unloading is in progress,
  module goes away between here and when the function
  returns, then code for this function gets overwritten?
  If yes we need locking external to module to prevent this.
  Maybe add a spinlock to struct address_space?
 
 The balloon module cannot be unloaded until it has leaked all its pages,
 so I think this is safe:
 
 static void remove_common(struct virtio_balloon *vb)
 {
   /* There might be pages left in the balloon: free them. */
   while (vb-num_pages)
   leak_balloon(vb, vb-num_pages);
 
 Cheers,
 Rusty.

I know I meant something else.
Let me lay this out:

CPU1 executes:
void virtballoon_putbackpage(struct page *page)
{
spin_lock(pages_lock);
list_add(page-lru, vb_ptr-pages);
spin_unlock(pages_lock);


at this point CPU2 unloads module:
leak_balloon
..

next CPU2 loads another module so code memory gets overwritten

now CPU1 executes the next instruction:

}

which would normally return to function's caller,
but it has been overwritten by CPU2 so we get corruption.

No?

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Rafael Aquini
On Mon, Aug 13, 2012 at 11:41:23AM +0300, Michael S. Tsirkin wrote:
  @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, 
  size_t num)
  set_page_pfns(vb-pfns + vb-num_pfns, page);
  vb-num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
  totalram_pages--;
  +   spin_lock(pages_lock);
  list_add(page-lru, vb-pages);
 
 If list_add above is reordered with mapping assignment below,
 then nothing bad happens because balloon_mapping takes
 pages_lock.
 
  +   page-mapping = balloon_mapping;
  +   spin_unlock(pages_lock);
  }
   
  /* Didn't get any?  Oh well. */
  @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, 
  size_t num)
  return;
   
  tell_host(vb, vb-inflate_vq);
  +   mutex_unlock(balloon_lock);
   }
   
   static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
  @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, 
  size_t num)
  /* We can only do one array worth at a time. */
  num = min(num, ARRAY_SIZE(vb-pfns));
   
  +   mutex_lock(balloon_lock);
  for (vb-num_pfns = 0; vb-num_pfns  num;
   vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
  +   /*
  +* We can race against virtballoon_isolatepage() and end up
  +* stumbling across a _temporarily_ empty 'pages' list.
  +*/
  +   spin_lock(pages_lock);
  +   if (unlikely(list_empty(vb-pages))) {
  +   spin_unlock(pages_lock);
  +   break;
  +   }
  page = list_first_entry(vb-pages, struct page, lru);
  +   page-mapping = NULL;
 
 Unlike the case above, here
 if = NULL write above is reordered with list_del below,
 then isolate_page can run on a page that is not
 on lru.
 
 So I think this needs a wmb().
 And maybe a comment above explaining why it is safe?

Good point. Presumably, this nit has potential to explain your guessing on the
read barrier stuff at movable_balloon_page() on the other patch.

 
  list_del(page-lru);
 
 I wonder why changing page-lru here is safe against
 races with unmap_and_move in the previous patch.

leak_balloon() doesn't race against unmap_and_move() because the later works on
an isolated page set. So, theoretically, pages being dequeued from balloon page
list here are either migrated (already) or they were not isolated yet.


  +/*
  + * '*vb_ptr' allows virtballoon_migratepage()  virtballoon_putbackpage() 
  to
  + * access pertinent elements from struct virtio_balloon
  + */
 
 What if there is more than one balloon device?

Is it possible to load this driver twice, or are you foreseeing a future case
where this driver will be able to manage several distinct memory balloons for
the same guest?


  +   /* Init the ballooned page-mapping special balloon_mapping */
  +   balloon_mapping = kmalloc(sizeof(*balloon_mapping), GFP_KERNEL);
  +   if (!balloon_mapping) {
  +   err = -ENOMEM;
  +   goto out_free_vb;
  +   }
 
 Can balloon_mapping be dereferenced at this point?
 Then what happens?

Since there's no balloon page enqueued for this balloon driver yet, there's no
chance of balloon_mapping being dereferenced at this point.

 
  +
  +   INIT_RADIX_TREE(balloon_mapping-page_tree, GFP_ATOMIC | __GFP_NOWARN);
  +   INIT_LIST_HEAD(balloon_mapping-i_mmap_nonlinear);
  +   spin_lock_init(balloon_mapping-tree_lock);
  +   balloon_mapping-a_ops = virtio_balloon_aops;
   
  err = init_vqs(vb);
  if (err)
  @@ -373,6 +493,7 @@ out_del_vqs:
  vdev-config-del_vqs(vdev);
   out_free_vb:
  kfree(vb);
  +   kfree(balloon_mapping);
 
 No need to set it to NULL? It seems if someone else allocates a mapping
 and gets this chunk of memory by chance, the logic in mm will get
 confused.

Good point. It surely doesn't hurt be asured of this sort of safety.

   out:
  return err;
   }
  @@ -397,6 +518,7 @@ static void __devexit virtballoon_remove(struct 
  virtio_device *vdev)
  kthread_stop(vb-thread);
  remove_common(vb);
  kfree(vb);
  +   kfree(balloon_mapping);
 
 Neither here?

ditto.

 
   }
   
   #ifdef CONFIG_PM
  diff --git a/include/linux/virtio_balloon.h b/include/linux/virtio_balloon.h
  index 652dc8b..930f1b7 100644
  --- a/include/linux/virtio_balloon.h
  +++ b/include/linux/virtio_balloon.h
  @@ -56,4 +56,8 @@ struct virtio_balloon_stat {
  u64 val;
   } __attribute__((packed));
   
  +#if !defined(CONFIG_COMPACTION)
  +struct address_space *balloon_mapping;
  +#endif
  +
 
 Anyone including this header will get a different copy of
 balloon_mapping. Besides, need to be ifdef KERNEL.

Good point. I'll move this hunk to the balloon driver itself.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ 

Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Rafael Aquini
On Tue, Aug 14, 2012 at 11:33:20AM +0300, Michael S. Tsirkin wrote:
 On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote:
  On Mon, 13 Aug 2012 11:41:23 +0300, Michael S. Tsirkin m...@redhat.com 
  wrote:
   On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote:
+/*
+ * Populate balloon_mapping-a_ops-freepage method to help compaction 
on
+ * re-inserting an isolated page into the balloon page list.
+ */
+void virtballoon_putbackpage(struct page *page)
+{
+   spin_lock(pages_lock);
+   list_add(page-lru, vb_ptr-pages);
+   spin_unlock(pages_lock);
   
   Could the following race trigger:
   migration happens while module unloading is in progress,
   module goes away between here and when the function
   returns, then code for this function gets overwritten?
   If yes we need locking external to module to prevent this.
   Maybe add a spinlock to struct address_space?
  
  The balloon module cannot be unloaded until it has leaked all its pages,
  so I think this is safe:
  
  static void remove_common(struct virtio_balloon *vb)
  {
  /* There might be pages left in the balloon: free them. */
  while (vb-num_pages)
  leak_balloon(vb, vb-num_pages);
  
  Cheers,
  Rusty.
 
 I know I meant something else.
 Let me lay this out:
 
 CPU1 executes:
 void virtballoon_putbackpage(struct page *page)
 {
   spin_lock(pages_lock);
   list_add(page-lru, vb_ptr-pages);
   spin_unlock(pages_lock);
 
 
   at this point CPU2 unloads module:
   leak_balloon
   ..
 
   next CPU2 loads another module so code memory gets overwritten
 
 now CPU1 executes the next instruction:
 
 }
 
 which would normally return to function's caller,
 but it has been overwritten by CPU2 so we get corruption.
 
 No?

At the point CPU2 is unloading the module, it will be kept looping at the
snippet Rusty pointed out because the isolation / migration steps do not mess
with 'vb-num_pages'. The driver will only unload after leaking the total amount
of balloon's inflated pages, which means (for this hypothetical case) CPU2 will
wait until CPU1 finishes the putaback procedure.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 03:44:09PM -0300, Rafael Aquini wrote:
 On Tue, Aug 14, 2012 at 11:33:20AM +0300, Michael S. Tsirkin wrote:
  On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote:
   On Mon, 13 Aug 2012 11:41:23 +0300, Michael S. Tsirkin 
   m...@redhat.com wrote:
On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote:
 +/*
 + * Populate balloon_mapping-a_ops-freepage method to help 
 compaction on
 + * re-inserting an isolated page into the balloon page list.
 + */
 +void virtballoon_putbackpage(struct page *page)
 +{
 + spin_lock(pages_lock);
 + list_add(page-lru, vb_ptr-pages);
 + spin_unlock(pages_lock);

Could the following race trigger:
migration happens while module unloading is in progress,
module goes away between here and when the function
returns, then code for this function gets overwritten?
If yes we need locking external to module to prevent this.
Maybe add a spinlock to struct address_space?
   
   The balloon module cannot be unloaded until it has leaked all its pages,
   so I think this is safe:
   
   static void remove_common(struct virtio_balloon *vb)
   {
 /* There might be pages left in the balloon: free them. */
 while (vb-num_pages)
 leak_balloon(vb, vb-num_pages);
   
   Cheers,
   Rusty.
  
  I know I meant something else.
  Let me lay this out:
  
  CPU1 executes:
  void virtballoon_putbackpage(struct page *page)
  {
  spin_lock(pages_lock);
  list_add(page-lru, vb_ptr-pages);
  spin_unlock(pages_lock);
  
  
  at this point CPU2 unloads module:
  leak_balloon
  ..
  
  next CPU2 loads another module so code memory gets overwritten
  
  now CPU1 executes the next instruction:
  
  }
  
  which would normally return to function's caller,
  but it has been overwritten by CPU2 so we get corruption.
  
  No?
 
 At the point CPU2 is unloading the module, it will be kept looping at the
 snippet Rusty pointed out because the isolation / migration steps do not mess
 with 'vb-num_pages'. The driver will only unload after leaking the total 
 amount
 of balloon's inflated pages, which means (for this hypothetical case) CPU2 
 will
 wait until CPU1 finishes the putaback procedure.
 

Yes but only until unlock finishes. The last return from function
is not guarded and can be overwritten.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 03:22:45PM -0300, Rafael Aquini wrote:
 On Mon, Aug 13, 2012 at 11:41:23AM +0300, Michael S. Tsirkin wrote:
   @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, 
   size_t num)
 set_page_pfns(vb-pfns + vb-num_pfns, page);
 vb-num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
 totalram_pages--;
   + spin_lock(pages_lock);
 list_add(page-lru, vb-pages);
  
  If list_add above is reordered with mapping assignment below,
  then nothing bad happens because balloon_mapping takes
  pages_lock.
  
   + page-mapping = balloon_mapping;
   + spin_unlock(pages_lock);
 }

 /* Didn't get any?  Oh well. */
   @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, 
   size_t num)
 return;

 tell_host(vb, vb-inflate_vq);
   + mutex_unlock(balloon_lock);
}

static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
   @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, 
   size_t num)
 /* We can only do one array worth at a time. */
 num = min(num, ARRAY_SIZE(vb-pfns));

   + mutex_lock(balloon_lock);
 for (vb-num_pfns = 0; vb-num_pfns  num;
  vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
   + /*
   +  * We can race against virtballoon_isolatepage() and end up
   +  * stumbling across a _temporarily_ empty 'pages' list.
   +  */
   + spin_lock(pages_lock);
   + if (unlikely(list_empty(vb-pages))) {
   + spin_unlock(pages_lock);
   + break;
   + }
 page = list_first_entry(vb-pages, struct page, lru);
   + page-mapping = NULL;
  
  Unlike the case above, here
  if = NULL write above is reordered with list_del below,
  then isolate_page can run on a page that is not
  on lru.
  
  So I think this needs a wmb().
  And maybe a comment above explaining why it is safe?
 
 Good point. Presumably, this nit has potential to explain your guessing on the
 read barrier stuff at movable_balloon_page() on the other patch.
 


What I think you should do is use rcu for access.
And here sync rcu before freeing.
Maybe an overkill but at least a documented synchronization
primitive, and it is very light weight.

 list_del(page-lru);
  
  I wonder why changing page-lru here is safe against
  races with unmap_and_move in the previous patch.
 
 leak_balloon() doesn't race against unmap_and_move() because the later works 
 on
 an isolated page set. So, theoretically, pages being dequeued from balloon 
 page
 list here are either migrated (already) or they were not isolated yet.
 

So add a comment explaining why it is safe pls.

   +/*
   + * '*vb_ptr' allows virtballoon_migratepage()  
   virtballoon_putbackpage() to
   + * access pertinent elements from struct virtio_balloon
   + */
  
  What if there is more than one balloon device?
 
 Is it possible to load this driver twice, or are you foreseeing a future case
 where this driver will be able to manage several distinct memory balloons for
 the same guest?
 

Second.
It is easy to create several balloons they are just
pci devices.

   + /* Init the ballooned page-mapping special balloon_mapping */
   + balloon_mapping = kmalloc(sizeof(*balloon_mapping), GFP_KERNEL);
   + if (!balloon_mapping) {
   + err = -ENOMEM;
   + goto out_free_vb;
   + }
  
  Can balloon_mapping be dereferenced at this point?
  Then what happens?
 
 Since there's no balloon page enqueued for this balloon driver yet, there's no
 chance of balloon_mapping being dereferenced at this point.
 
  
   +
   + INIT_RADIX_TREE(balloon_mapping-page_tree, GFP_ATOMIC | __GFP_NOWARN);
   + INIT_LIST_HEAD(balloon_mapping-i_mmap_nonlinear);
   + spin_lock_init(balloon_mapping-tree_lock);
   + balloon_mapping-a_ops = virtio_balloon_aops;

 err = init_vqs(vb);
 if (err)
   @@ -373,6 +493,7 @@ out_del_vqs:
 vdev-config-del_vqs(vdev);
out_free_vb:
 kfree(vb);
   + kfree(balloon_mapping);
  
  No need to set it to NULL? It seems if someone else allocates a mapping
  and gets this chunk of memory by chance, the logic in mm will get
  confused.
 
 Good point. It surely doesn't hurt be asured of this sort of safety.
 
out:
 return err;
}
   @@ -397,6 +518,7 @@ static void __devexit virtballoon_remove(struct 
   virtio_device *vdev)
 kthread_stop(vb-thread);
 remove_common(vb);
 kfree(vb);
   + kfree(balloon_mapping);
  
  Neither here?
 
 ditto.
 
  
}

#ifdef CONFIG_PM
   diff --git a/include/linux/virtio_balloon.h 
   b/include/linux/virtio_balloon.h
   index 652dc8b..930f1b7 100644
   --- a/include/linux/virtio_balloon.h
   +++ b/include/linux/virtio_balloon.h
   @@ -56,4 +56,8 @@ struct virtio_balloon_stat {
 u64 val;
} __attribute__((packed));

   +#if !defined(CONFIG_COMPACTION)
   +struct address_space 

Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
 On Tue, Aug 14, 2012 at 03:22:45PM -0300, Rafael Aquini wrote:
  On Mon, Aug 13, 2012 at 11:41:23AM +0300, Michael S. Tsirkin wrote:
@@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon 
*vb, size_t num)
set_page_pfns(vb-pfns + vb-num_pfns, page);
vb-num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
totalram_pages--;
+   spin_lock(pages_lock);
list_add(page-lru, vb-pages);
   
   If list_add above is reordered with mapping assignment below,
   then nothing bad happens because balloon_mapping takes
   pages_lock.
   
+   page-mapping = balloon_mapping;
+   spin_unlock(pages_lock);
}
 
/* Didn't get any?  Oh well. */
@@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, 
size_t num)
return;
 
tell_host(vb, vb-inflate_vq);
+   mutex_unlock(balloon_lock);
 }
 
 static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon 
*vb, size_t num)
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb-pfns));
 
+   mutex_lock(balloon_lock);
for (vb-num_pfns = 0; vb-num_pfns  num;
 vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
+   /*
+* We can race against virtballoon_isolatepage() and 
end up
+* stumbling across a _temporarily_ empty 'pages' list.
+*/
+   spin_lock(pages_lock);
+   if (unlikely(list_empty(vb-pages))) {
+   spin_unlock(pages_lock);
+   break;
+   }
page = list_first_entry(vb-pages, struct page, lru);
+   page-mapping = NULL;
   
   Unlike the case above, here
   if = NULL write above is reordered with list_del below,
   then isolate_page can run on a page that is not
   on lru.
   
   So I think this needs a wmb().
   And maybe a comment above explaining why it is safe?
  
  Good point. Presumably, this nit has potential to explain your guessing on 
  the
  read barrier stuff at movable_balloon_page() on the other patch.
  
 
 
 What I think you should do is use rcu for access.
 And here sync rcu before freeing.
 Maybe an overkill but at least a documented synchronization
 primitive, and it is very light weight.
 
list_del(page-lru);
   
   I wonder why changing page-lru here is safe against
   races with unmap_and_move in the previous patch.
  
  leak_balloon() doesn't race against unmap_and_move() because the later 
  works on
  an isolated page set. So, theoretically, pages being dequeued from balloon 
  page
  list here are either migrated (already) or they were not isolated yet.
  
 
 So add a comment explaining why it is safe pls.
 
+/*
+ * '*vb_ptr' allows virtballoon_migratepage()  
virtballoon_putbackpage() to
+ * access pertinent elements from struct virtio_balloon
+ */
   
   What if there is more than one balloon device?
  
  Is it possible to load this driver twice, or are you foreseeing a future 
  case
  where this driver will be able to manage several distinct memory balloons 
  for
  the same guest?
  
 
 Second.
 It is easy to create several balloons they are just
 pci devices.
 


and it might not be too important to make it work but
at least would be nice not to have a crash in this
setup.


+   /* Init the ballooned page-mapping special balloon_mapping */
+   balloon_mapping = kmalloc(sizeof(*balloon_mapping), GFP_KERNEL);
+   if (!balloon_mapping) {
+   err = -ENOMEM;
+   goto out_free_vb;
+   }
   
   Can balloon_mapping be dereferenced at this point?
   Then what happens?
  
  Since there's no balloon page enqueued for this balloon driver yet, there's 
  no
  chance of balloon_mapping being dereferenced at this point.
  
   
+
+   INIT_RADIX_TREE(balloon_mapping-page_tree, GFP_ATOMIC | 
__GFP_NOWARN);
+   INIT_LIST_HEAD(balloon_mapping-i_mmap_nonlinear);
+   spin_lock_init(balloon_mapping-tree_lock);
+   balloon_mapping-a_ops = virtio_balloon_aops;
 
err = init_vqs(vb);
if (err)
@@ -373,6 +493,7 @@ out_del_vqs:
vdev-config-del_vqs(vdev);
 out_free_vb:
kfree(vb);
+   kfree(balloon_mapping);
   
   No need to set it to NULL? It seems if someone else allocates a mapping
   and gets this chunk of memory by chance, the logic in mm will get
   confused.
  
  Good point. It surely doesn't hurt be asured of this sort of safety.
  
 out:
return err;
 }
@@ 

Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Rafael Aquini
On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
What if there is more than one balloon device?
   
   Is it possible to load this driver twice, or are you foreseeing a future 
   case
   where this driver will be able to manage several distinct memory balloons 
   for
   the same guest?
   
  
  Second.
  It is easy to create several balloons they are just
  pci devices.
  
 
 
 and it might not be too important to make it work but
 at least would be nice not to have a crash in this
 setup.

Fair enough. For now, as I believe it's safe to assume we are only inflating one
balloon per guest, I'd like to propose this as a future enhancement. Sounds
good?
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Rafael Aquini
On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
 What I think you should do is use rcu for access.
 And here sync rcu before freeing.
 Maybe an overkill but at least a documented synchronization
 primitive, and it is very light weight.
 

I liked your suggestion on barriers, as well.

Rik, Mel ?

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote:
 On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
 What if there is more than one balloon device?

Is it possible to load this driver twice, or are you foreseeing a 
future case
where this driver will be able to manage several distinct memory 
balloons for
the same guest?

   
   Second.
   It is easy to create several balloons they are just
   pci devices.
   
  
  
  and it might not be too important to make it work but
  at least would be nice not to have a crash in this
  setup.
 
 Fair enough. For now, as I believe it's safe to assume we are only inflating 
 one
 balloon per guest, I'd like to propose this as a future enhancement. Sounds
 good?
  

Since guest crashes when it's not the case, no it doesn't, sorry :(.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Rafael Aquini
On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote:
 On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote:
  On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
  What if there is more than one balloon device?
 
 Is it possible to load this driver twice, or are you foreseeing a 
 future case
 where this driver will be able to manage several distinct memory 
 balloons for
 the same guest?
 

Second.
It is easy to create several balloons they are just
pci devices.

   
   
   and it might not be too important to make it work but
   at least would be nice not to have a crash in this
   setup.
  
  Fair enough. For now, as I believe it's safe to assume we are only 
  inflating one
  balloon per guest, I'd like to propose this as a future enhancement. Sounds
  good?
   
 
 Since guest crashes when it's not the case, no it doesn't, sorry :(.

Ok, but right now this driver only takes care of 1 balloon per guest, so how
could this approach crash it? 

Your point is a good thing to be on a to-do list for future enhancements, but
it's not a dealbreaker for the present balloon driver implementation, IMHO.
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
 On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
  What I think you should do is use rcu for access.
  And here sync rcu before freeing.
  Maybe an overkill but at least a documented synchronization
  primitive, and it is very light weight.
  
 
 I liked your suggestion on barriers, as well.
 
 Rik, Mel ?

Further instead of simple assignment I would add an api
in mm to set/clear the balloon mapping, with proper locking.

This could fail if already set, and thus fix crash with
many ballons.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote:
 On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote:
  On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote:
   On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
   What if there is more than one balloon device?
  
  Is it possible to load this driver twice, or are you foreseeing a 
  future case
  where this driver will be able to manage several distinct memory 
  balloons for
  the same guest?
  
 
 Second.
 It is easy to create several balloons they are just
 pci devices.
 


and it might not be too important to make it work but
at least would be nice not to have a crash in this
setup.
   
   Fair enough. For now, as I believe it's safe to assume we are only 
   inflating one
   balloon per guest, I'd like to propose this as a future enhancement. 
   Sounds
   good?

  
  Since guest crashes when it's not the case, no it doesn't, sorry :(.
 
 Ok, but right now this driver only takes care of 1 balloon per guest,

It does? Are you sure? There is no global state as far as I can see. So
I can create 2 devices and driver will happily create two instances,
each one can be inflated/deflated independently.

 so how
 could this approach crash it? 

Add device. inflate. Add another device. inflate. deflate. unplug.
Now you have pointer to freed memory and when mm touches
page from first device, you ge use after free.

 Your point is a good thing to be on a to-do list for future enhancements, but
 it's not a dealbreaker for the present balloon driver implementation, IMHO.
 

Yes it looks like a dealbreaker to me.

-- 
MST 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 11:49:06PM +0300, Michael S. Tsirkin wrote:
 On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote:
  On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote:
   On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote:
On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
What if there is more than one balloon device?
   
   Is it possible to load this driver twice, or are you foreseeing a 
   future case
   where this driver will be able to manage several distinct memory 
   balloons for
   the same guest?
   
  
  Second.
  It is easy to create several balloons they are just
  pci devices.
  
 
 
 and it might not be too important to make it work but
 at least would be nice not to have a crash in this
 setup.

Fair enough. For now, as I believe it's safe to assume we are only 
inflating one
balloon per guest, I'd like to propose this as a future enhancement. 
Sounds
good?
 
   
   Since guest crashes when it's not the case, no it doesn't, sorry :(.
  
  Ok, but right now this driver only takes care of 1 balloon per guest,
 
 It does? Are you sure? There is no global state as far as I can see. So
 I can create 2 devices and driver will happily create two instances,
 each one can be inflated/deflated independently.
 
  so how
  could this approach crash it? 
 
 Add device. inflate. Add another device. inflate. deflate. unplug.
 Now you have pointer to freed memory and when mm touches
 page from first device, you ge use after free.
 
  Your point is a good thing to be on a to-do list for future enhancements, 
  but
  it's not a dealbreaker for the present balloon driver implementation, IMHO.
  
 
 Yes it looks like a dealbreaker to me.

To clarify, the global state that this patch adds, is ugly
even if we didn't support multiple balloons yet.
So I don't think I can accept such a patch.
Rusty has a final word here, maybe he thinks differently.

 -- 
 MST 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Rik van Riel

On 08/14/2012 04:54 PM, Michael S. Tsirkin wrote:


To clarify, the global state that this patch adds, is ugly
even if we didn't support multiple balloons yet.
So I don't think I can accept such a patch.
Rusty has a final word here, maybe he thinks differently.


Before deciding that does not support multiple balloon drivers
at once is an issue, is there any use case at all for having
multiple balloon drivers active at a time?

I do not see any.

--
All rights reversed
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Rafael Aquini
On Tue, Aug 14, 2012 at 11:49:06PM +0300, Michael S. Tsirkin wrote:
 On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote:
  On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote:
   On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote:
On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
What if there is more than one balloon device?
   
   Is it possible to load this driver twice, or are you foreseeing a 
   future case
   where this driver will be able to manage several distinct memory 
   balloons for
   the same guest?
   
  
  Second.
  It is easy to create several balloons they are just
  pci devices.
  
 
 
 and it might not be too important to make it work but
 at least would be nice not to have a crash in this
 setup.

Fair enough. For now, as I believe it's safe to assume we are only 
inflating one
balloon per guest, I'd like to propose this as a future enhancement. 
Sounds
good?
 
   
   Since guest crashes when it's not the case, no it doesn't, sorry :(.
  
  Ok, but right now this driver only takes care of 1 balloon per guest,
 
 It does? Are you sure? There is no global state as far as I can see. So
 I can create 2 devices and driver will happily create two instances,
 each one can be inflated/deflated independently.
 
  so how
  could this approach crash it? 
 
 Add device. inflate. Add another device. inflate. deflate. unplug.
 Now you have pointer to freed memory and when mm touches
 page from first device, you ge use after free.
 
  Your point is a good thing to be on a to-do list for future enhancements, 
  but
  it's not a dealbreaker for the present balloon driver implementation, IMHO.
  
 
 Yes it looks like a dealbreaker to me.

Sorry. You're right, I'm wrong.

I'll get back to the scracthpad to overcome this constraint. I believe the way
this patch was at its v4 revision (wrt this particular case) could possibly
address this concern of yours.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 04:56:59PM -0400, Rik van Riel wrote:
 On 08/14/2012 04:54 PM, Michael S. Tsirkin wrote:
 
 To clarify, the global state that this patch adds, is ugly
 even if we didn't support multiple balloons yet.
 So I don't think I can accept such a patch.
 Rusty has a final word here, maybe he thinks differently.
 
 Before deciding that does not support multiple balloon drivers
 at once is an issue, is there any use case at all for having
 multiple balloon drivers active at a time?
 
 I do not see any.

For example, we had a proposal for a page-cache backed
device. So it could be useful to have two, a regular balloon
and a pagecache backed one.
There could be other uses - it certainly looks like it
works so how can you be sure it's useless?

And even ignoring that, global pointer to a device
is an ugly hack and ugly hacks tend to explode.

And even ignoring estetics, and if we decide we are fine
with a single balloon, it needs to fail gracefully not
crash like it does now.

 -- 
 All rights reversed
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 06:34:13PM -0300, Rafael Aquini wrote:
 On Tue, Aug 14, 2012 at 11:49:06PM +0300, Michael S. Tsirkin wrote:
  On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote:
   On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote:
On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote:
 On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
 What if there is more than one balloon device?

Is it possible to load this driver twice, or are you foreseeing 
a future case
where this driver will be able to manage several distinct 
memory balloons for
the same guest?

   
   Second.
   It is easy to create several balloons they are just
   pci devices.
   
  
  
  and it might not be too important to make it work but
  at least would be nice not to have a crash in this
  setup.
 
 Fair enough. For now, as I believe it's safe to assume we are only 
 inflating one
 balloon per guest, I'd like to propose this as a future enhancement. 
 Sounds
 good?
  

Since guest crashes when it's not the case, no it doesn't, sorry :(.
   
   Ok, but right now this driver only takes care of 1 balloon per guest,
  
  It does? Are you sure? There is no global state as far as I can see. So
  I can create 2 devices and driver will happily create two instances,
  each one can be inflated/deflated independently.
  
   so how
   could this approach crash it? 
  
  Add device. inflate. Add another device. inflate. deflate. unplug.
  Now you have pointer to freed memory and when mm touches
  page from first device, you ge use after free.
  
   Your point is a good thing to be on a to-do list for future enhancements, 
   but
   it's not a dealbreaker for the present balloon driver implementation, 
   IMHO.
   
  
  Yes it looks like a dealbreaker to me.
 
 Sorry. You're right, I'm wrong.
 
 I'll get back to the scracthpad to overcome this constraint. I believe the way
 this patch was at its v4 revision (wrt this particular case) could possibly
 address this concern of yours.

Almost. We still have a global balloon_mapping. The only reason for
it to exist seems solely to detect balloon mappings, so it
can just be replaced by a flag in the mapping, or in mapping
ops, or elsewhere. Also, please add APIs to mm so we can
avoid doing internal mm stuff like

INIT_RADIX_TREE(balloon_mapping-page_tree, GFP_ATOMIC | __GFP_NOWARN);

in the driver. It should be
alloc_address_mapping(virtio_balloon_aops);
free_address_mapping

Make page-mapping use rcu, and sync rcu in
free_address_mapping.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Rusty Russell
On Tue, 14 Aug 2012 11:33:20 +0300, Michael S. Tsirkin m...@redhat.com 
wrote:
 On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote:
  On Mon, 13 Aug 2012 11:41:23 +0300, Michael S. Tsirkin m...@redhat.com 
  wrote:
   On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote:
+/*
+ * Populate balloon_mapping-a_ops-freepage method to help compaction 
on
+ * re-inserting an isolated page into the balloon page list.
+ */
+void virtballoon_putbackpage(struct page *page)
+{
+   spin_lock(pages_lock);
+   list_add(page-lru, vb_ptr-pages);
+   spin_unlock(pages_lock);
   
   Could the following race trigger:
   migration happens while module unloading is in progress,
   module goes away between here and when the function
   returns, then code for this function gets overwritten?
   If yes we need locking external to module to prevent this.
   Maybe add a spinlock to struct address_space?
  
  The balloon module cannot be unloaded until it has leaked all its pages,
  so I think this is safe:
  
  static void remove_common(struct virtio_balloon *vb)
  {
  /* There might be pages left in the balloon: free them. */
  while (vb-num_pages)
  leak_balloon(vb, vb-num_pages);
  
  Cheers,
  Rusty.
 
 I know I meant something else.
 Let me lay this out:
 
 CPU1 executes:
 void virtballoon_putbackpage(struct page *page)
 {
   spin_lock(pages_lock);
   list_add(page-lru, vb_ptr-pages);
   spin_unlock(pages_lock);
 
 
   at this point CPU2 unloads module:
   leak_balloon
   ..
 
   next CPU2 loads another module so code memory gets overwritten
 
 now CPU1 executes the next instruction:
 
 }
 
 which would normally return to function's caller,
 but it has been overwritten by CPU2 so we get corruption.

Actually, I have no idea.

Where does virtballoon_putbackpage get called from?  It's some weird mm
thing, and I stay out of that mess.

The vb thread is stopped before we spin checking vb-num_pages, so it's
not touching pages; who would be calling this?

Confused,
Rusty.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-13 Thread Rusty Russell
On Mon, 13 Aug 2012 11:41:23 +0300, "Michael S. Tsirkin"  
wrote:
> On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote:
> > +/*
> > + * Populate balloon_mapping->a_ops->freepage method to help compaction on
> > + * re-inserting an isolated page into the balloon page list.
> > + */
> > +void virtballoon_putbackpage(struct page *page)
> > +{
> > +   spin_lock(_lock);
> > +   list_add(>lru, _ptr->pages);
> > +   spin_unlock(_lock);
> 
> Could the following race trigger:
> migration happens while module unloading is in progress,
> module goes away between here and when the function
> returns, then code for this function gets overwritten?
> If yes we need locking external to module to prevent this.
> Maybe add a spinlock to struct address_space?

The balloon module cannot be unloaded until it has leaked all its pages,
so I think this is safe:

static void remove_common(struct virtio_balloon *vb)
{
/* There might be pages left in the balloon: free them. */
while (vb->num_pages)
leak_balloon(vb, vb->num_pages);

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-13 Thread Michael S. Tsirkin
On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote:
> Memory fragmentation introduced by ballooning might reduce significantly
> the number of 2MB contiguous memory blocks that can be used within a guest,
> thus imposing performance penalties associated with the reduced number of
> transparent huge pages that could be used by the guest workload.
> 
> Besides making balloon pages movable at allocation time and introducing
> the necessary primitives to perform balloon page migration/compaction,
> this patch also introduces the following locking scheme to provide the
> proper synchronization and protection for struct virtio_balloon elements
> against concurrent accesses due to parallel operations introduced by
> memory compaction / page migration.
>  - balloon_lock (mutex) : synchronizes the access demand to elements of
> struct virtio_balloon and its queue operations;
>  - pages_lock (spinlock): special protection to balloon pages list against
> concurrent list handling operations;
> 
> Signed-off-by: Rafael Aquini 
> ---
>  drivers/virtio/virtio_balloon.c | 138 
> +---
>  include/linux/virtio_balloon.h  |   4 ++
>  2 files changed, 134 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 0908e60..7c937a0 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /*
>   * Balloon device works in 4K page units.  So each page is pointed to by
> @@ -35,6 +36,12 @@
>   */
>  #define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
>  
> +/* Synchronizes accesses/updates to the struct virtio_balloon elements */
> +DEFINE_MUTEX(balloon_lock);
> +
> +/* Protects 'virtio_balloon->pages' list against concurrent handling */
> +DEFINE_SPINLOCK(pages_lock);
> +
>  struct virtio_balloon
>  {
>   struct virtio_device *vdev;
> @@ -51,6 +58,7 @@ struct virtio_balloon
>  
>   /* Number of balloon pages we've told the Host we're not using. */
>   unsigned int num_pages;
> +
>   /*
>* The pages we've told the Host we're not using.
>* Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
> @@ -125,10 +133,12 @@ static void fill_balloon(struct virtio_balloon *vb, 
> size_t num)
>   /* We can only do one array worth at a time. */
>   num = min(num, ARRAY_SIZE(vb->pfns));
>  
> + mutex_lock(_lock);
>   for (vb->num_pfns = 0; vb->num_pfns < num;
>vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> - struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
> - __GFP_NOMEMALLOC | __GFP_NOWARN);
> + struct page *page = alloc_page(GFP_HIGHUSER_MOVABLE |
> + __GFP_NORETRY | __GFP_NOWARN |
> + __GFP_NOMEMALLOC);
>   if (!page) {
>   if (printk_ratelimit())
>   dev_printk(KERN_INFO, >vdev->dev,
> @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, 
> size_t num)
>   set_page_pfns(vb->pfns + vb->num_pfns, page);
>   vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
>   totalram_pages--;
> + spin_lock(_lock);
>   list_add(>lru, >pages);

If list_add above is reordered with mapping assignment below,
then nothing bad happens because balloon_mapping takes
pages_lock.

> + page->mapping = balloon_mapping;
> + spin_unlock(_lock);
>   }
>  
>   /* Didn't get any?  Oh well. */
> @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, 
> size_t num)
>   return;
>  
>   tell_host(vb, vb->inflate_vq);
> + mutex_unlock(_lock);
>  }
>  
>  static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
> @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, 
> size_t num)
>   /* We can only do one array worth at a time. */
>   num = min(num, ARRAY_SIZE(vb->pfns));
>  
> + mutex_lock(_lock);
>   for (vb->num_pfns = 0; vb->num_pfns < num;
>vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> + /*
> +  * We can race against virtballoon_isolatepage() and end up
> +  * stumbling across a _temporarily_ empty 'pages' list.
> +  */
> + spin_lock(_lock);
> + if (unlikely(list_empty(>pages))) {
> + spin_unlock(_lock);
> + break;
> + }
>   page = list_first_entry(>pages, struct page, lru);
> + page->mapping = NULL;

Unlike the case above, here
if = NULL write above is reordered with list_del below,
then isolate_page can run on a page that is not
on lru.

So I think 

Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-13 Thread Michael S. Tsirkin
On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote:
 Memory fragmentation introduced by ballooning might reduce significantly
 the number of 2MB contiguous memory blocks that can be used within a guest,
 thus imposing performance penalties associated with the reduced number of
 transparent huge pages that could be used by the guest workload.
 
 Besides making balloon pages movable at allocation time and introducing
 the necessary primitives to perform balloon page migration/compaction,
 this patch also introduces the following locking scheme to provide the
 proper synchronization and protection for struct virtio_balloon elements
 against concurrent accesses due to parallel operations introduced by
 memory compaction / page migration.
  - balloon_lock (mutex) : synchronizes the access demand to elements of
 struct virtio_balloon and its queue operations;
  - pages_lock (spinlock): special protection to balloon pages list against
 concurrent list handling operations;
 
 Signed-off-by: Rafael Aquini aqu...@redhat.com
 ---
  drivers/virtio/virtio_balloon.c | 138 
 +---
  include/linux/virtio_balloon.h  |   4 ++
  2 files changed, 134 insertions(+), 8 deletions(-)
 
 diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
 index 0908e60..7c937a0 100644
 --- a/drivers/virtio/virtio_balloon.c
 +++ b/drivers/virtio/virtio_balloon.c
 @@ -27,6 +27,7 @@
  #include linux/delay.h
  #include linux/slab.h
  #include linux/module.h
 +#include linux/fs.h
  
  /*
   * Balloon device works in 4K page units.  So each page is pointed to by
 @@ -35,6 +36,12 @@
   */
  #define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE  VIRTIO_BALLOON_PFN_SHIFT)
  
 +/* Synchronizes accesses/updates to the struct virtio_balloon elements */
 +DEFINE_MUTEX(balloon_lock);
 +
 +/* Protects 'virtio_balloon-pages' list against concurrent handling */
 +DEFINE_SPINLOCK(pages_lock);
 +
  struct virtio_balloon
  {
   struct virtio_device *vdev;
 @@ -51,6 +58,7 @@ struct virtio_balloon
  
   /* Number of balloon pages we've told the Host we're not using. */
   unsigned int num_pages;
 +
   /*
* The pages we've told the Host we're not using.
* Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
 @@ -125,10 +133,12 @@ static void fill_balloon(struct virtio_balloon *vb, 
 size_t num)
   /* We can only do one array worth at a time. */
   num = min(num, ARRAY_SIZE(vb-pfns));
  
 + mutex_lock(balloon_lock);
   for (vb-num_pfns = 0; vb-num_pfns  num;
vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
 - struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
 - __GFP_NOMEMALLOC | __GFP_NOWARN);
 + struct page *page = alloc_page(GFP_HIGHUSER_MOVABLE |
 + __GFP_NORETRY | __GFP_NOWARN |
 + __GFP_NOMEMALLOC);
   if (!page) {
   if (printk_ratelimit())
   dev_printk(KERN_INFO, vb-vdev-dev,
 @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, 
 size_t num)
   set_page_pfns(vb-pfns + vb-num_pfns, page);
   vb-num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
   totalram_pages--;
 + spin_lock(pages_lock);
   list_add(page-lru, vb-pages);

If list_add above is reordered with mapping assignment below,
then nothing bad happens because balloon_mapping takes
pages_lock.

 + page-mapping = balloon_mapping;
 + spin_unlock(pages_lock);
   }
  
   /* Didn't get any?  Oh well. */
 @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, 
 size_t num)
   return;
  
   tell_host(vb, vb-inflate_vq);
 + mutex_unlock(balloon_lock);
  }
  
  static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
 @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, 
 size_t num)
   /* We can only do one array worth at a time. */
   num = min(num, ARRAY_SIZE(vb-pfns));
  
 + mutex_lock(balloon_lock);
   for (vb-num_pfns = 0; vb-num_pfns  num;
vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
 + /*
 +  * We can race against virtballoon_isolatepage() and end up
 +  * stumbling across a _temporarily_ empty 'pages' list.
 +  */
 + spin_lock(pages_lock);
 + if (unlikely(list_empty(vb-pages))) {
 + spin_unlock(pages_lock);
 + break;
 + }
   page = list_first_entry(vb-pages, struct page, lru);
 + page-mapping = NULL;

Unlike the case above, here
if = NULL write above is reordered with list_del below,
then isolate_page can run on a page that is not
on lru.

So I think this needs a 

Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-13 Thread Rusty Russell
On Mon, 13 Aug 2012 11:41:23 +0300, Michael S. Tsirkin m...@redhat.com 
wrote:
 On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote:
  +/*
  + * Populate balloon_mapping-a_ops-freepage method to help compaction on
  + * re-inserting an isolated page into the balloon page list.
  + */
  +void virtballoon_putbackpage(struct page *page)
  +{
  +   spin_lock(pages_lock);
  +   list_add(page-lru, vb_ptr-pages);
  +   spin_unlock(pages_lock);
 
 Could the following race trigger:
 migration happens while module unloading is in progress,
 module goes away between here and when the function
 returns, then code for this function gets overwritten?
 If yes we need locking external to module to prevent this.
 Maybe add a spinlock to struct address_space?

The balloon module cannot be unloaded until it has leaked all its pages,
so I think this is safe:

static void remove_common(struct virtio_balloon *vb)
{
/* There might be pages left in the balloon: free them. */
while (vb-num_pages)
leak_balloon(vb, vb-num_pages);

Cheers,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-10 Thread Rafael Aquini
Memory fragmentation introduced by ballooning might reduce significantly
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

Besides making balloon pages movable at allocation time and introducing
the necessary primitives to perform balloon page migration/compaction,
this patch also introduces the following locking scheme to provide the
proper synchronization and protection for struct virtio_balloon elements
against concurrent accesses due to parallel operations introduced by
memory compaction / page migration.
 - balloon_lock (mutex) : synchronizes the access demand to elements of
  struct virtio_balloon and its queue operations;
 - pages_lock (spinlock): special protection to balloon pages list against
  concurrent list handling operations;

Signed-off-by: Rafael Aquini 
---
 drivers/virtio/virtio_balloon.c | 138 +---
 include/linux/virtio_balloon.h  |   4 ++
 2 files changed, 134 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 0908e60..7c937a0 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -35,6 +36,12 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
 
+/* Synchronizes accesses/updates to the struct virtio_balloon elements */
+DEFINE_MUTEX(balloon_lock);
+
+/* Protects 'virtio_balloon->pages' list against concurrent handling */
+DEFINE_SPINLOCK(pages_lock);
+
 struct virtio_balloon
 {
struct virtio_device *vdev;
@@ -51,6 +58,7 @@ struct virtio_balloon
 
/* Number of balloon pages we've told the Host we're not using. */
unsigned int num_pages;
+
/*
 * The pages we've told the Host we're not using.
 * Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
@@ -125,10 +133,12 @@ static void fill_balloon(struct virtio_balloon *vb, 
size_t num)
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));
 
+   mutex_lock(_lock);
for (vb->num_pfns = 0; vb->num_pfns < num;
 vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-   struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
-   __GFP_NOMEMALLOC | __GFP_NOWARN);
+   struct page *page = alloc_page(GFP_HIGHUSER_MOVABLE |
+   __GFP_NORETRY | __GFP_NOWARN |
+   __GFP_NOMEMALLOC);
if (!page) {
if (printk_ratelimit())
dev_printk(KERN_INFO, >vdev->dev,
@@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, size_t 
num)
set_page_pfns(vb->pfns + vb->num_pfns, page);
vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
totalram_pages--;
+   spin_lock(_lock);
list_add(>lru, >pages);
+   page->mapping = balloon_mapping;
+   spin_unlock(_lock);
}
 
/* Didn't get any?  Oh well. */
@@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t 
num)
return;
 
tell_host(vb, vb->inflate_vq);
+   mutex_unlock(_lock);
 }
 
 static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, 
size_t num)
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));
 
+   mutex_lock(_lock);
for (vb->num_pfns = 0; vb->num_pfns < num;
 vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
+   /*
+* We can race against virtballoon_isolatepage() and end up
+* stumbling across a _temporarily_ empty 'pages' list.
+*/
+   spin_lock(_lock);
+   if (unlikely(list_empty(>pages))) {
+   spin_unlock(_lock);
+   break;
+   }
page = list_first_entry(>pages, struct page, lru);
+   page->mapping = NULL;
list_del(>lru);
+   spin_unlock(_lock);
set_page_pfns(vb->pfns + vb->num_pfns, page);
vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
}
@@ -182,8 +208,11 @@ static void leak_balloon(struct virtio_balloon *vb, size_t 
num)
 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
 * is true, we *have* to do it in this order
 */
-   tell_host(vb, vb->deflate_vq);
-   

[PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-10 Thread Rafael Aquini
Memory fragmentation introduced by ballooning might reduce significantly
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

Besides making balloon pages movable at allocation time and introducing
the necessary primitives to perform balloon page migration/compaction,
this patch also introduces the following locking scheme to provide the
proper synchronization and protection for struct virtio_balloon elements
against concurrent accesses due to parallel operations introduced by
memory compaction / page migration.
 - balloon_lock (mutex) : synchronizes the access demand to elements of
  struct virtio_balloon and its queue operations;
 - pages_lock (spinlock): special protection to balloon pages list against
  concurrent list handling operations;

Signed-off-by: Rafael Aquini aqu...@redhat.com
---
 drivers/virtio/virtio_balloon.c | 138 +---
 include/linux/virtio_balloon.h  |   4 ++
 2 files changed, 134 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 0908e60..7c937a0 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,6 +27,7 @@
 #include linux/delay.h
 #include linux/slab.h
 #include linux/module.h
+#include linux/fs.h
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -35,6 +36,12 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE  VIRTIO_BALLOON_PFN_SHIFT)
 
+/* Synchronizes accesses/updates to the struct virtio_balloon elements */
+DEFINE_MUTEX(balloon_lock);
+
+/* Protects 'virtio_balloon-pages' list against concurrent handling */
+DEFINE_SPINLOCK(pages_lock);
+
 struct virtio_balloon
 {
struct virtio_device *vdev;
@@ -51,6 +58,7 @@ struct virtio_balloon
 
/* Number of balloon pages we've told the Host we're not using. */
unsigned int num_pages;
+
/*
 * The pages we've told the Host we're not using.
 * Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
@@ -125,10 +133,12 @@ static void fill_balloon(struct virtio_balloon *vb, 
size_t num)
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb-pfns));
 
+   mutex_lock(balloon_lock);
for (vb-num_pfns = 0; vb-num_pfns  num;
 vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-   struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
-   __GFP_NOMEMALLOC | __GFP_NOWARN);
+   struct page *page = alloc_page(GFP_HIGHUSER_MOVABLE |
+   __GFP_NORETRY | __GFP_NOWARN |
+   __GFP_NOMEMALLOC);
if (!page) {
if (printk_ratelimit())
dev_printk(KERN_INFO, vb-vdev-dev,
@@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, size_t 
num)
set_page_pfns(vb-pfns + vb-num_pfns, page);
vb-num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
totalram_pages--;
+   spin_lock(pages_lock);
list_add(page-lru, vb-pages);
+   page-mapping = balloon_mapping;
+   spin_unlock(pages_lock);
}
 
/* Didn't get any?  Oh well. */
@@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t 
num)
return;
 
tell_host(vb, vb-inflate_vq);
+   mutex_unlock(balloon_lock);
 }
 
 static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, 
size_t num)
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb-pfns));
 
+   mutex_lock(balloon_lock);
for (vb-num_pfns = 0; vb-num_pfns  num;
 vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
+   /*
+* We can race against virtballoon_isolatepage() and end up
+* stumbling across a _temporarily_ empty 'pages' list.
+*/
+   spin_lock(pages_lock);
+   if (unlikely(list_empty(vb-pages))) {
+   spin_unlock(pages_lock);
+   break;
+   }
page = list_first_entry(vb-pages, struct page, lru);
+   page-mapping = NULL;
list_del(page-lru);
+   spin_unlock(pages_lock);
set_page_pfns(vb-pfns + vb-num_pfns, page);
vb-num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
}
@@ -182,8 +208,11 @@ static void leak_balloon(struct virtio_balloon *vb, size_t 
num)
 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);