Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 21, 2012 at 03:01:37PM +0930, Rusty Russell wrote: > On Wed, 15 Aug 2012 14:28:51 +0300, "Michael S. Tsirkin" > wrote: > > On Wed, Aug 15, 2012 at 12:16:51PM +0100, Mel Gorman wrote: > > > I was thinking of exactly that page->mapping == balloon_mapping check. As > > > I > > > do not know how many active balloon drivers there might be I cannot guess > > > in advance how much of a scalability problem it will be. > > > > Not at all sure multiple drivers are worth supporting, but multiple > > *devices* is I think worth supporting, if for no other reason than that > > they can work today. For that, we need a device pointer which Rafael > > wants to put into the mapping, this means multiple balloon mappings. > > Rafael, please make sure that the balloon driver fails on the second and > subsequent balloon devices. > > Michael, we only allow multiple balloon devices because it fell out of > the implementation. If it causes us even the slightest issue, we should > not support it. It's not a sensible setup. > > Cheers, > Rusty. Looks like latest revision does it using a flag which seems cleaner, so I think the point is moot. -- MST -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Wed, 15 Aug 2012 14:28:51 +0300, "Michael S. Tsirkin" wrote: > On Wed, Aug 15, 2012 at 12:16:51PM +0100, Mel Gorman wrote: > > I was thinking of exactly that page->mapping == balloon_mapping check. As I > > do not know how many active balloon drivers there might be I cannot guess > > in advance how much of a scalability problem it will be. > > Not at all sure multiple drivers are worth supporting, but multiple > *devices* is I think worth supporting, if for no other reason than that > they can work today. For that, we need a device pointer which Rafael > wants to put into the mapping, this means multiple balloon mappings. Rafael, please make sure that the balloon driver fails on the second and subsequent balloon devices. Michael, we only allow multiple balloon devices because it fell out of the implementation. If it causes us even the slightest issue, we should not support it. It's not a sensible setup. Cheers, Rusty. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Wed, 15 Aug 2012 14:28:51 +0300, Michael S. Tsirkin m...@redhat.com wrote: On Wed, Aug 15, 2012 at 12:16:51PM +0100, Mel Gorman wrote: I was thinking of exactly that page-mapping == balloon_mapping check. As I do not know how many active balloon drivers there might be I cannot guess in advance how much of a scalability problem it will be. Not at all sure multiple drivers are worth supporting, but multiple *devices* is I think worth supporting, if for no other reason than that they can work today. For that, we need a device pointer which Rafael wants to put into the mapping, this means multiple balloon mappings. Rafael, please make sure that the balloon driver fails on the second and subsequent balloon devices. Michael, we only allow multiple balloon devices because it fell out of the implementation. If it causes us even the slightest issue, we should not support it. It's not a sensible setup. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 21, 2012 at 03:01:37PM +0930, Rusty Russell wrote: On Wed, 15 Aug 2012 14:28:51 +0300, Michael S. Tsirkin m...@redhat.com wrote: On Wed, Aug 15, 2012 at 12:16:51PM +0100, Mel Gorman wrote: I was thinking of exactly that page-mapping == balloon_mapping check. As I do not know how many active balloon drivers there might be I cannot guess in advance how much of a scalability problem it will be. Not at all sure multiple drivers are worth supporting, but multiple *devices* is I think worth supporting, if for no other reason than that they can work today. For that, we need a device pointer which Rafael wants to put into the mapping, this means multiple balloon mappings. Rafael, please make sure that the balloon driver fails on the second and subsequent balloon devices. Michael, we only allow multiple balloon devices because it fell out of the implementation. If it causes us even the slightest issue, we should not support it. It's not a sensible setup. Cheers, Rusty. Looks like latest revision does it using a flag which seems cleaner, so I think the point is moot. -- MST -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Mon, Aug 20, 2012 at 11:59:11AM +0930, Rusty Russell wrote: > On Wed, 15 Aug 2012 17:40:19 +0300, "Michael S. Tsirkin" > wrote: > > On Wed, Aug 15, 2012 at 09:34:58AM -0300, Rafael Aquini wrote: > > > On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote: > > > > > > now CPU1 executes the next instruction: > > > > > > > > > > > > } > > > > > > > > > > > > which would normally return to function's caller, > > > > > > but it has been overwritten by CPU2 so we get corruption. > > > > > > > > > > > > No? > > > > > > > > > > At the point CPU2 is unloading the module, it will be kept looping at > > > > > the > > > > > snippet Rusty pointed out because the isolation / migration steps do > > > > > not mess > > > > > with 'vb->num_pages'. The driver will only unload after leaking the > > > > > total amount > > > > > of balloon's inflated pages, which means (for this hypothetical case) > > > > > CPU2 will > > > > > wait until CPU1 finishes the putaback procedure. > > > > > > > > > > > > > Yes but only until unlock finishes. The last return from function > > > > is not guarded and can be overwritten. > > > > > > CPU1 will be returning to putback_balloon_page() which code is located at > > > core > > > mm/compaction.c, outside the driver. > > > > Sorry, I don't seem to be able to articulate this clearly. > > But this is a correctness issue so I am compelled to try again. > > But if there are 0 balloon pages, how is it migrating a page? It could be we just finished migrating a page dropped page lock and are 1 instruction away from returning from callback. > > In the end the rule is simple: you can not > > prevent module unloading from within module > > itself. It always must be the caller of your > > module that uses some lock to do this. > > Not quite. If you clean up everything in your cleanup function, it also > works, No, we also need a way to make sure we returned to caller, this is missing here. > which is what this does, right? > > Cheers, > Rusty. This makes sure callback was invoked but not that it returned to caller. All will be well if callbacks are done in rcu critical section and we synchronise it before unload. -- MST -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Wed, 15 Aug 2012 17:40:19 +0300, "Michael S. Tsirkin" wrote: > On Wed, Aug 15, 2012 at 09:34:58AM -0300, Rafael Aquini wrote: > > On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote: > > > > > now CPU1 executes the next instruction: > > > > > > > > > > } > > > > > > > > > > which would normally return to function's caller, > > > > > but it has been overwritten by CPU2 so we get corruption. > > > > > > > > > > No? > > > > > > > > At the point CPU2 is unloading the module, it will be kept looping at > > > > the > > > > snippet Rusty pointed out because the isolation / migration steps do > > > > not mess > > > > with 'vb->num_pages'. The driver will only unload after leaking the > > > > total amount > > > > of balloon's inflated pages, which means (for this hypothetical case) > > > > CPU2 will > > > > wait until CPU1 finishes the putaback procedure. > > > > > > > > > > Yes but only until unlock finishes. The last return from function > > > is not guarded and can be overwritten. > > > > CPU1 will be returning to putback_balloon_page() which code is located at > > core > > mm/compaction.c, outside the driver. > > Sorry, I don't seem to be able to articulate this clearly. > But this is a correctness issue so I am compelled to try again. But if there are 0 balloon pages, how is it migrating a page? > In the end the rule is simple: you can not > prevent module unloading from within module > itself. It always must be the caller of your > module that uses some lock to do this. Not quite. If you clean up everything in your cleanup function, it also works, which is what this does, right? Cheers, Rusty. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Wed, 15 Aug 2012 17:40:19 +0300, Michael S. Tsirkin m...@redhat.com wrote: On Wed, Aug 15, 2012 at 09:34:58AM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote: now CPU1 executes the next instruction: } which would normally return to function's caller, but it has been overwritten by CPU2 so we get corruption. No? At the point CPU2 is unloading the module, it will be kept looping at the snippet Rusty pointed out because the isolation / migration steps do not mess with 'vb-num_pages'. The driver will only unload after leaking the total amount of balloon's inflated pages, which means (for this hypothetical case) CPU2 will wait until CPU1 finishes the putaback procedure. Yes but only until unlock finishes. The last return from function is not guarded and can be overwritten. CPU1 will be returning to putback_balloon_page() which code is located at core mm/compaction.c, outside the driver. Sorry, I don't seem to be able to articulate this clearly. But this is a correctness issue so I am compelled to try again. But if there are 0 balloon pages, how is it migrating a page? In the end the rule is simple: you can not prevent module unloading from within module itself. It always must be the caller of your module that uses some lock to do this. Not quite. If you clean up everything in your cleanup function, it also works, which is what this does, right? Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Mon, Aug 20, 2012 at 11:59:11AM +0930, Rusty Russell wrote: On Wed, 15 Aug 2012 17:40:19 +0300, Michael S. Tsirkin m...@redhat.com wrote: On Wed, Aug 15, 2012 at 09:34:58AM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote: now CPU1 executes the next instruction: } which would normally return to function's caller, but it has been overwritten by CPU2 so we get corruption. No? At the point CPU2 is unloading the module, it will be kept looping at the snippet Rusty pointed out because the isolation / migration steps do not mess with 'vb-num_pages'. The driver will only unload after leaking the total amount of balloon's inflated pages, which means (for this hypothetical case) CPU2 will wait until CPU1 finishes the putaback procedure. Yes but only until unlock finishes. The last return from function is not guarded and can be overwritten. CPU1 will be returning to putback_balloon_page() which code is located at core mm/compaction.c, outside the driver. Sorry, I don't seem to be able to articulate this clearly. But this is a correctness issue so I am compelled to try again. But if there are 0 balloon pages, how is it migrating a page? It could be we just finished migrating a page dropped page lock and are 1 instruction away from returning from callback. In the end the rule is simple: you can not prevent module unloading from within module itself. It always must be the caller of your module that uses some lock to do this. Not quite. If you clean up everything in your cleanup function, it also works, No, we also need a way to make sure we returned to caller, this is missing here. which is what this does, right? Cheers, Rusty. This makes sure callback was invoked but not that it returned to caller. All will be well if callbacks are done in rcu critical section and we synchronise it before unload. -- MST -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On 08/14/2012 05:38 PM, Michael S. Tsirkin wrote: And even ignoring that, global pointer to a device is an ugly hack and ugly hacks tend to explode. And even ignoring estetics, and if we decide we are fine with a single balloon, it needs to fail gracefully not crash like it does now. Fair enough. That certainly seems easy enough to fix. Each balloon driver can have its own struct address_space, and simply point mapping->host (or any of the others) at a global balloon thing somewhere. if (page->mapping && page->mapping->host == balloon_magic) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Wed, Aug 15, 2012 at 09:34:58AM -0300, Rafael Aquini wrote: > On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote: > > > > now CPU1 executes the next instruction: > > > > > > > > } > > > > > > > > which would normally return to function's caller, > > > > but it has been overwritten by CPU2 so we get corruption. > > > > > > > > No? > > > > > > At the point CPU2 is unloading the module, it will be kept looping at the > > > snippet Rusty pointed out because the isolation / migration steps do not > > > mess > > > with 'vb->num_pages'. The driver will only unload after leaking the total > > > amount > > > of balloon's inflated pages, which means (for this hypothetical case) > > > CPU2 will > > > wait until CPU1 finishes the putaback procedure. > > > > > > > Yes but only until unlock finishes. The last return from function > > is not guarded and can be overwritten. > > CPU1 will be returning to putback_balloon_page() which code is located at core > mm/compaction.c, outside the driver. Sorry, I don't seem to be able to articulate this clearly. But this is a correctness issue so I am compelled to try again. Here is some pseudo code: int pages_lock; void virtballoon_isolatepage(void *page, unsigned long mode) { pages_lock = 0; } assignment of 0 emulates spin unlock. I removed all other content. Now look at disassembly: 080483d0 : virtballoon_isolatepage(): 80483d0: c7 05 88 96 04 08 00movl $0x0,0x8049688 80483d7: 00 00 00 <--- Above is "spin unlock" 80483da: c3 ret Here we are still executing module code (one instruction of it!) after we have dropped the lock. So if module goes away at the point marked by < above (and nothing seems to prevent that, since pages_lock is unlocked), the last instruction can get overwritten and then random code will get executed instead. In the end the rule is simple: you can not prevent module unloading from within module itself. It always must be the caller of your module that uses some lock to do this. My proposal is to use rcu for this since it is lightweight and also does not require us to pass extra state around. -- MST -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote: > > > now CPU1 executes the next instruction: > > > > > > } > > > > > > which would normally return to function's caller, > > > but it has been overwritten by CPU2 so we get corruption. > > > > > > No? > > > > At the point CPU2 is unloading the module, it will be kept looping at the > > snippet Rusty pointed out because the isolation / migration steps do not > > mess > > with 'vb->num_pages'. The driver will only unload after leaking the total > > amount > > of balloon's inflated pages, which means (for this hypothetical case) CPU2 > > will > > wait until CPU1 finishes the putaback procedure. > > > > Yes but only until unlock finishes. The last return from function > is not guarded and can be overwritten. CPU1 will be returning to putback_balloon_page() which code is located at core mm/compaction.c, outside the driver. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Wed, Aug 15, 2012 at 12:16:51PM +0100, Mel Gorman wrote: > On Wed, Aug 15, 2012 at 01:01:08PM +0300, Michael S. Tsirkin wrote: > > On Wed, Aug 15, 2012 at 10:48:39AM +0100, Mel Gorman wrote: > > > On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote: > > > > On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote: > > > > > On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote: > > > > > > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote: > > > > > > > What I think you should do is use rcu for access. > > > > > > > And here sync rcu before freeing. > > > > > > > Maybe an overkill but at least a documented synchronization > > > > > > > primitive, and it is very light weight. > > > > > > > > > > > > > > > > > > > I liked your suggestion on barriers, as well. > > > > > > > > > > > > > > > > I have not thought about this as deeply as I shouold but is simply > > > > > rechecking > > > > > the mapping under the pages_lock to make sure the page is still a > > > > > balloon > > > > > page an option? i.e. use pages_lock to stabilise page->mapping. > > > > > > > > To clarify, are you concerned about cost of rcu_read_lock > > > > for non balloon pages? > > > > > > > > > > Not as such, but given the choice between introducing RCU locking and > > > rechecking page->mapping under a spinlock I would choose the latter as it > > > is more straight-forward. > > > > OK but checking it how? page->mapping == balloon_mapping does not scale to > > multiple balloons, > > I was thinking of exactly that page->mapping == balloon_mapping check. As I > do not know how many active balloon drivers there might be I cannot guess > in advance how much of a scalability problem it will be. Not at all sure multiple drivers are worth supporting, but multiple *devices* is I think worth supporting, if for no other reason than that they can work today. For that, we need a device pointer which Rafael wants to put into the mapping, this means multiple balloon mappings. > > so I hoped we can switch to > > page->mapping->flags & BALLOON_MAPPING or some such, > > but this means we dereference it outside the lock ... > > > > That also sounded like future stuff to me that would be justified with > profiling if necessary. Personally I would have started with the spinlock > and a simple check and moved to RCU later when either scalability was a > problem or it was found there was a need to stabilise whether a page was > a balloon page or not outside a spinlock. > > This is not a NAK to the idea and I'm not objecting to RCU being used now > if that is what is really desired. I just suspect it's making the series > more complex than it needs to be right now. > > -- > Mel Gorman > SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Wed, Aug 15, 2012 at 01:01:08PM +0300, Michael S. Tsirkin wrote: > On Wed, Aug 15, 2012 at 10:48:39AM +0100, Mel Gorman wrote: > > On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote: > > > On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote: > > > > On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote: > > > > > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote: > > > > > > What I think you should do is use rcu for access. > > > > > > And here sync rcu before freeing. > > > > > > Maybe an overkill but at least a documented synchronization > > > > > > primitive, and it is very light weight. > > > > > > > > > > > > > > > > I liked your suggestion on barriers, as well. > > > > > > > > > > > > > I have not thought about this as deeply as I shouold but is simply > > > > rechecking > > > > the mapping under the pages_lock to make sure the page is still a > > > > balloon > > > > page an option? i.e. use pages_lock to stabilise page->mapping. > > > > > > To clarify, are you concerned about cost of rcu_read_lock > > > for non balloon pages? > > > > > > > Not as such, but given the choice between introducing RCU locking and > > rechecking page->mapping under a spinlock I would choose the latter as it > > is more straight-forward. > > OK but checking it how? page->mapping == balloon_mapping does not scale to > multiple balloons, I was thinking of exactly that page->mapping == balloon_mapping check. As I do not know how many active balloon drivers there might be I cannot guess in advance how much of a scalability problem it will be. > so I hoped we can switch to > page->mapping->flags & BALLOON_MAPPING or some such, > but this means we dereference it outside the lock ... > That also sounded like future stuff to me that would be justified with profiling if necessary. Personally I would have started with the spinlock and a simple check and moved to RCU later when either scalability was a problem or it was found there was a need to stabilise whether a page was a balloon page or not outside a spinlock. This is not a NAK to the idea and I'm not objecting to RCU being used now if that is what is really desired. I just suspect it's making the series more complex than it needs to be right now. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Wed, Aug 15, 2012 at 10:48:39AM +0100, Mel Gorman wrote: > On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote: > > On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote: > > > On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote: > > > > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote: > > > > > What I think you should do is use rcu for access. > > > > > And here sync rcu before freeing. > > > > > Maybe an overkill but at least a documented synchronization > > > > > primitive, and it is very light weight. > > > > > > > > > > > > > I liked your suggestion on barriers, as well. > > > > > > > > > > I have not thought about this as deeply as I shouold but is simply > > > rechecking > > > the mapping under the pages_lock to make sure the page is still a balloon > > > page an option? i.e. use pages_lock to stabilise page->mapping. > > > > To clarify, are you concerned about cost of rcu_read_lock > > for non balloon pages? > > > > Not as such, but given the choice between introducing RCU locking and > rechecking page->mapping under a spinlock I would choose the latter as it > is more straight-forward. OK but checking it how? page->mapping == balloon_mapping does not scale to multiple balloons, so I hoped we can switch to page->mapping->flags & BALLOON_MAPPING or some such, but this means we dereference it outside the lock ... We will also need to add some API to set/clear mapping so that driver does not need to poke in mm internals, but that's easy. > -- > Mel Gorman > SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote: > On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote: > > On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote: > > > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote: > > > > What I think you should do is use rcu for access. > > > > And here sync rcu before freeing. > > > > Maybe an overkill but at least a documented synchronization > > > > primitive, and it is very light weight. > > > > > > > > > > I liked your suggestion on barriers, as well. > > > > > > > I have not thought about this as deeply as I shouold but is simply > > rechecking > > the mapping under the pages_lock to make sure the page is still a balloon > > page an option? i.e. use pages_lock to stabilise page->mapping. > > To clarify, are you concerned about cost of rcu_read_lock > for non balloon pages? > Not as such, but given the choice between introducing RCU locking and rechecking page->mapping under a spinlock I would choose the latter as it is more straight-forward. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote: > On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote: > > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote: > > > What I think you should do is use rcu for access. > > > And here sync rcu before freeing. > > > Maybe an overkill but at least a documented synchronization > > > primitive, and it is very light weight. > > > > > > > I liked your suggestion on barriers, as well. > > > > I have not thought about this as deeply as I shouold but is simply rechecking > the mapping under the pages_lock to make sure the page is still a balloon > page an option? i.e. use pages_lock to stabilise page->mapping. To clarify, are you concerned about cost of rcu_read_lock for non balloon pages? > -- > Mel Gorman > SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote: > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote: > > What I think you should do is use rcu for access. > > And here sync rcu before freeing. > > Maybe an overkill but at least a documented synchronization > > primitive, and it is very light weight. > > > > I liked your suggestion on barriers, as well. > I have not thought about this as deeply as I shouold but is simply rechecking the mapping under the pages_lock to make sure the page is still a balloon page an option? i.e. use pages_lock to stabilise page->mapping. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote: What I think you should do is use rcu for access. And here sync rcu before freeing. Maybe an overkill but at least a documented synchronization primitive, and it is very light weight. I liked your suggestion on barriers, as well. I have not thought about this as deeply as I shouold but is simply rechecking the mapping under the pages_lock to make sure the page is still a balloon page an option? i.e. use pages_lock to stabilise page-mapping. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote: On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote: What I think you should do is use rcu for access. And here sync rcu before freeing. Maybe an overkill but at least a documented synchronization primitive, and it is very light weight. I liked your suggestion on barriers, as well. I have not thought about this as deeply as I shouold but is simply rechecking the mapping under the pages_lock to make sure the page is still a balloon page an option? i.e. use pages_lock to stabilise page-mapping. To clarify, are you concerned about cost of rcu_read_lock for non balloon pages? -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote: On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote: On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote: What I think you should do is use rcu for access. And here sync rcu before freeing. Maybe an overkill but at least a documented synchronization primitive, and it is very light weight. I liked your suggestion on barriers, as well. I have not thought about this as deeply as I shouold but is simply rechecking the mapping under the pages_lock to make sure the page is still a balloon page an option? i.e. use pages_lock to stabilise page-mapping. To clarify, are you concerned about cost of rcu_read_lock for non balloon pages? Not as such, but given the choice between introducing RCU locking and rechecking page-mapping under a spinlock I would choose the latter as it is more straight-forward. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Wed, Aug 15, 2012 at 10:48:39AM +0100, Mel Gorman wrote: On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote: On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote: On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote: What I think you should do is use rcu for access. And here sync rcu before freeing. Maybe an overkill but at least a documented synchronization primitive, and it is very light weight. I liked your suggestion on barriers, as well. I have not thought about this as deeply as I shouold but is simply rechecking the mapping under the pages_lock to make sure the page is still a balloon page an option? i.e. use pages_lock to stabilise page-mapping. To clarify, are you concerned about cost of rcu_read_lock for non balloon pages? Not as such, but given the choice between introducing RCU locking and rechecking page-mapping under a spinlock I would choose the latter as it is more straight-forward. OK but checking it how? page-mapping == balloon_mapping does not scale to multiple balloons, so I hoped we can switch to page-mapping-flags BALLOON_MAPPING or some such, but this means we dereference it outside the lock ... We will also need to add some API to set/clear mapping so that driver does not need to poke in mm internals, but that's easy. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Wed, Aug 15, 2012 at 01:01:08PM +0300, Michael S. Tsirkin wrote: On Wed, Aug 15, 2012 at 10:48:39AM +0100, Mel Gorman wrote: On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote: On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote: On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote: What I think you should do is use rcu for access. And here sync rcu before freeing. Maybe an overkill but at least a documented synchronization primitive, and it is very light weight. I liked your suggestion on barriers, as well. I have not thought about this as deeply as I shouold but is simply rechecking the mapping under the pages_lock to make sure the page is still a balloon page an option? i.e. use pages_lock to stabilise page-mapping. To clarify, are you concerned about cost of rcu_read_lock for non balloon pages? Not as such, but given the choice between introducing RCU locking and rechecking page-mapping under a spinlock I would choose the latter as it is more straight-forward. OK but checking it how? page-mapping == balloon_mapping does not scale to multiple balloons, I was thinking of exactly that page-mapping == balloon_mapping check. As I do not know how many active balloon drivers there might be I cannot guess in advance how much of a scalability problem it will be. so I hoped we can switch to page-mapping-flags BALLOON_MAPPING or some such, but this means we dereference it outside the lock ... That also sounded like future stuff to me that would be justified with profiling if necessary. Personally I would have started with the spinlock and a simple check and moved to RCU later when either scalability was a problem or it was found there was a need to stabilise whether a page was a balloon page or not outside a spinlock. This is not a NAK to the idea and I'm not objecting to RCU being used now if that is what is really desired. I just suspect it's making the series more complex than it needs to be right now. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Wed, Aug 15, 2012 at 12:16:51PM +0100, Mel Gorman wrote: On Wed, Aug 15, 2012 at 01:01:08PM +0300, Michael S. Tsirkin wrote: On Wed, Aug 15, 2012 at 10:48:39AM +0100, Mel Gorman wrote: On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote: On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote: On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote: What I think you should do is use rcu for access. And here sync rcu before freeing. Maybe an overkill but at least a documented synchronization primitive, and it is very light weight. I liked your suggestion on barriers, as well. I have not thought about this as deeply as I shouold but is simply rechecking the mapping under the pages_lock to make sure the page is still a balloon page an option? i.e. use pages_lock to stabilise page-mapping. To clarify, are you concerned about cost of rcu_read_lock for non balloon pages? Not as such, but given the choice between introducing RCU locking and rechecking page-mapping under a spinlock I would choose the latter as it is more straight-forward. OK but checking it how? page-mapping == balloon_mapping does not scale to multiple balloons, I was thinking of exactly that page-mapping == balloon_mapping check. As I do not know how many active balloon drivers there might be I cannot guess in advance how much of a scalability problem it will be. Not at all sure multiple drivers are worth supporting, but multiple *devices* is I think worth supporting, if for no other reason than that they can work today. For that, we need a device pointer which Rafael wants to put into the mapping, this means multiple balloon mappings. so I hoped we can switch to page-mapping-flags BALLOON_MAPPING or some such, but this means we dereference it outside the lock ... That also sounded like future stuff to me that would be justified with profiling if necessary. Personally I would have started with the spinlock and a simple check and moved to RCU later when either scalability was a problem or it was found there was a need to stabilise whether a page was a balloon page or not outside a spinlock. This is not a NAK to the idea and I'm not objecting to RCU being used now if that is what is really desired. I just suspect it's making the series more complex than it needs to be right now. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote: now CPU1 executes the next instruction: } which would normally return to function's caller, but it has been overwritten by CPU2 so we get corruption. No? At the point CPU2 is unloading the module, it will be kept looping at the snippet Rusty pointed out because the isolation / migration steps do not mess with 'vb-num_pages'. The driver will only unload after leaking the total amount of balloon's inflated pages, which means (for this hypothetical case) CPU2 will wait until CPU1 finishes the putaback procedure. Yes but only until unlock finishes. The last return from function is not guarded and can be overwritten. CPU1 will be returning to putback_balloon_page() which code is located at core mm/compaction.c, outside the driver. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Wed, Aug 15, 2012 at 09:34:58AM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote: now CPU1 executes the next instruction: } which would normally return to function's caller, but it has been overwritten by CPU2 so we get corruption. No? At the point CPU2 is unloading the module, it will be kept looping at the snippet Rusty pointed out because the isolation / migration steps do not mess with 'vb-num_pages'. The driver will only unload after leaking the total amount of balloon's inflated pages, which means (for this hypothetical case) CPU2 will wait until CPU1 finishes the putaback procedure. Yes but only until unlock finishes. The last return from function is not guarded and can be overwritten. CPU1 will be returning to putback_balloon_page() which code is located at core mm/compaction.c, outside the driver. Sorry, I don't seem to be able to articulate this clearly. But this is a correctness issue so I am compelled to try again. Here is some pseudo code: int pages_lock; void virtballoon_isolatepage(void *page, unsigned long mode) { pages_lock = 0; } assignment of 0 emulates spin unlock. I removed all other content. Now look at disassembly: 080483d0 virtballoon_isolatepage: virtballoon_isolatepage(): 80483d0: c7 05 88 96 04 08 00movl $0x0,0x8049688 80483d7: 00 00 00 --- Above is spin unlock 80483da: c3 ret Here we are still executing module code (one instruction of it!) after we have dropped the lock. So if module goes away at the point marked by above (and nothing seems to prevent that, since pages_lock is unlocked), the last instruction can get overwritten and then random code will get executed instead. In the end the rule is simple: you can not prevent module unloading from within module itself. It always must be the caller of your module that uses some lock to do this. My proposal is to use rcu for this since it is lightweight and also does not require us to pass extra state around. -- MST -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On 08/14/2012 05:38 PM, Michael S. Tsirkin wrote: And even ignoring that, global pointer to a device is an ugly hack and ugly hacks tend to explode. And even ignoring estetics, and if we decide we are fine with a single balloon, it needs to fail gracefully not crash like it does now. Fair enough. That certainly seems easy enough to fix. Each balloon driver can have its own struct address_space, and simply point mapping-host (or any of the others) at a global balloon thing somewhere. if (page-mapping page-mapping-host == balloon_magic) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, 14 Aug 2012 11:33:20 +0300, "Michael S. Tsirkin" wrote: > On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote: > > On Mon, 13 Aug 2012 11:41:23 +0300, "Michael S. Tsirkin" > > wrote: > > > On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote: > > > > +/* > > > > + * Populate balloon_mapping->a_ops->freepage method to help compaction > > > > on > > > > + * re-inserting an isolated page into the balloon page list. > > > > + */ > > > > +void virtballoon_putbackpage(struct page *page) > > > > +{ > > > > + spin_lock(_lock); > > > > + list_add(>lru, _ptr->pages); > > > > + spin_unlock(_lock); > > > > > > Could the following race trigger: > > > migration happens while module unloading is in progress, > > > module goes away between here and when the function > > > returns, then code for this function gets overwritten? > > > If yes we need locking external to module to prevent this. > > > Maybe add a spinlock to struct address_space? > > > > The balloon module cannot be unloaded until it has leaked all its pages, > > so I think this is safe: > > > > static void remove_common(struct virtio_balloon *vb) > > { > > /* There might be pages left in the balloon: free them. */ > > while (vb->num_pages) > > leak_balloon(vb, vb->num_pages); > > > > Cheers, > > Rusty. > > I know I meant something else. > Let me lay this out: > > CPU1 executes: > void virtballoon_putbackpage(struct page *page) > { > spin_lock(_lock); > list_add(>lru, _ptr->pages); > spin_unlock(_lock); > > > at this point CPU2 unloads module: > leak_balloon > .. > > next CPU2 loads another module so code memory gets overwritten > > now CPU1 executes the next instruction: > > } > > which would normally return to function's caller, > but it has been overwritten by CPU2 so we get corruption. Actually, I have no idea. Where does virtballoon_putbackpage get called from? It's some weird mm thing, and I stay out of that mess. The vb thread is stopped before we spin checking vb->num_pages, so it's not touching pages; who would be calling this? Confused, Rusty. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 06:34:13PM -0300, Rafael Aquini wrote: > On Tue, Aug 14, 2012 at 11:49:06PM +0300, Michael S. Tsirkin wrote: > > On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote: > > > On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote: > > > > On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote: > > > > > On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote: > > > > > > > > > What if there is more than one balloon device? > > > > > > > > > > > > > > > > Is it possible to load this driver twice, or are you foreseeing > > > > > > > > a future case > > > > > > > > where this driver will be able to manage several distinct > > > > > > > > memory balloons for > > > > > > > > the same guest? > > > > > > > > > > > > > > > > > > > > > > Second. > > > > > > > It is easy to create several balloons they are just > > > > > > > pci devices. > > > > > > > > > > > > > > > > > > > > > > > > and it might not be too important to make it work but > > > > > > at least would be nice not to have a crash in this > > > > > > setup. > > > > > > > > > > > Fair enough. For now, as I believe it's safe to assume we are only > > > > > inflating one > > > > > balloon per guest, I'd like to propose this as a future enhancement. > > > > > Sounds > > > > > good? > > > > > > > > > > > > > Since guest crashes when it's not the case, no it doesn't, sorry :(. > > > > > > > Ok, but right now this driver only takes care of 1 balloon per guest, > > > > It does? Are you sure? There is no global state as far as I can see. So > > I can create 2 devices and driver will happily create two instances, > > each one can be inflated/deflated independently. > > > > > so how > > > could this approach crash it? > > > > Add device. inflate. Add another device. inflate. deflate. unplug. > > Now you have pointer to freed memory and when mm touches > > page from first device, you ge use after free. > > > > > Your point is a good thing to be on a to-do list for future enhancements, > > > but > > > it's not a dealbreaker for the present balloon driver implementation, > > > IMHO. > > > > > > > Yes it looks like a dealbreaker to me. > > Sorry. You're right, I'm wrong. > > I'll get back to the scracthpad to overcome this constraint. I believe the way > this patch was at its v4 revision (wrt this particular case) could possibly > address this concern of yours. Almost. We still have a global balloon_mapping. The only reason for it to exist seems solely to detect balloon mappings, so it can just be replaced by a flag in the mapping, or in mapping ops, or elsewhere. Also, please add APIs to mm so we can avoid doing internal mm stuff like INIT_RADIX_TREE(_mapping->page_tree, GFP_ATOMIC | __GFP_NOWARN); in the driver. It should be alloc_address_mapping(_balloon_aops); free_address_mapping Make page->mapping use rcu, and sync rcu in free_address_mapping. -- MST -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 04:56:59PM -0400, Rik van Riel wrote: > On 08/14/2012 04:54 PM, Michael S. Tsirkin wrote: > > >To clarify, the global state that this patch adds, is ugly > >even if we didn't support multiple balloons yet. > >So I don't think I can accept such a patch. > >Rusty has a final word here, maybe he thinks differently. > > Before deciding that "does not support multiple balloon drivers > at once" is an issue, is there any use case at all for having > multiple balloon drivers active at a time? > > I do not see any. For example, we had a proposal for a page-cache backed device. So it could be useful to have two, a regular balloon and a pagecache backed one. There could be other uses - it certainly looks like it works so how can you be sure it's useless? And even ignoring that, global pointer to a device is an ugly hack and ugly hacks tend to explode. And even ignoring estetics, and if we decide we are fine with a single balloon, it needs to fail gracefully not crash like it does now. > -- > All rights reversed -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 11:49:06PM +0300, Michael S. Tsirkin wrote: > On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote: > > On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote: > > > On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote: > > > > On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote: > > > > > > > > What if there is more than one balloon device? > > > > > > > > > > > > > > Is it possible to load this driver twice, or are you foreseeing a > > > > > > > future case > > > > > > > where this driver will be able to manage several distinct memory > > > > > > > balloons for > > > > > > > the same guest? > > > > > > > > > > > > > > > > > > > Second. > > > > > > It is easy to create several balloons they are just > > > > > > pci devices. > > > > > > > > > > > > > > > > > > > > and it might not be too important to make it work but > > > > > at least would be nice not to have a crash in this > > > > > setup. > > > > > > > > > Fair enough. For now, as I believe it's safe to assume we are only > > > > inflating one > > > > balloon per guest, I'd like to propose this as a future enhancement. > > > > Sounds > > > > good? > > > > > > > > > > Since guest crashes when it's not the case, no it doesn't, sorry :(. > > > > > Ok, but right now this driver only takes care of 1 balloon per guest, > > It does? Are you sure? There is no global state as far as I can see. So > I can create 2 devices and driver will happily create two instances, > each one can be inflated/deflated independently. > > > so how > > could this approach crash it? > > Add device. inflate. Add another device. inflate. deflate. unplug. > Now you have pointer to freed memory and when mm touches > page from first device, you ge use after free. > > > Your point is a good thing to be on a to-do list for future enhancements, > > but > > it's not a dealbreaker for the present balloon driver implementation, IMHO. > > > > Yes it looks like a dealbreaker to me. Sorry. You're right, I'm wrong. I'll get back to the scracthpad to overcome this constraint. I believe the way this patch was at its v4 revision (wrt this particular case) could possibly address this concern of yours. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On 08/14/2012 04:54 PM, Michael S. Tsirkin wrote: To clarify, the global state that this patch adds, is ugly even if we didn't support multiple balloons yet. So I don't think I can accept such a patch. Rusty has a final word here, maybe he thinks differently. Before deciding that "does not support multiple balloon drivers at once" is an issue, is there any use case at all for having multiple balloon drivers active at a time? I do not see any. -- All rights reversed -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 11:49:06PM +0300, Michael S. Tsirkin wrote: > On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote: > > On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote: > > > On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote: > > > > On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote: > > > > > > > > What if there is more than one balloon device? > > > > > > > > > > > > > > Is it possible to load this driver twice, or are you foreseeing a > > > > > > > future case > > > > > > > where this driver will be able to manage several distinct memory > > > > > > > balloons for > > > > > > > the same guest? > > > > > > > > > > > > > > > > > > > Second. > > > > > > It is easy to create several balloons they are just > > > > > > pci devices. > > > > > > > > > > > > > > > > > > > > and it might not be too important to make it work but > > > > > at least would be nice not to have a crash in this > > > > > setup. > > > > > > > > > Fair enough. For now, as I believe it's safe to assume we are only > > > > inflating one > > > > balloon per guest, I'd like to propose this as a future enhancement. > > > > Sounds > > > > good? > > > > > > > > > > Since guest crashes when it's not the case, no it doesn't, sorry :(. > > > > > Ok, but right now this driver only takes care of 1 balloon per guest, > > It does? Are you sure? There is no global state as far as I can see. So > I can create 2 devices and driver will happily create two instances, > each one can be inflated/deflated independently. > > > so how > > could this approach crash it? > > Add device. inflate. Add another device. inflate. deflate. unplug. > Now you have pointer to freed memory and when mm touches > page from first device, you ge use after free. > > > Your point is a good thing to be on a to-do list for future enhancements, > > but > > it's not a dealbreaker for the present balloon driver implementation, IMHO. > > > > Yes it looks like a dealbreaker to me. To clarify, the global state that this patch adds, is ugly even if we didn't support multiple balloons yet. So I don't think I can accept such a patch. Rusty has a final word here, maybe he thinks differently. > -- > MST -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote: > On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote: > > On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote: > > > On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote: > > > > > > > What if there is more than one balloon device? > > > > > > > > > > > > Is it possible to load this driver twice, or are you foreseeing a > > > > > > future case > > > > > > where this driver will be able to manage several distinct memory > > > > > > balloons for > > > > > > the same guest? > > > > > > > > > > > > > > > > Second. > > > > > It is easy to create several balloons they are just > > > > > pci devices. > > > > > > > > > > > > > > > > and it might not be too important to make it work but > > > > at least would be nice not to have a crash in this > > > > setup. > > > > > > > Fair enough. For now, as I believe it's safe to assume we are only > > > inflating one > > > balloon per guest, I'd like to propose this as a future enhancement. > > > Sounds > > > good? > > > > > > > Since guest crashes when it's not the case, no it doesn't, sorry :(. > > > Ok, but right now this driver only takes care of 1 balloon per guest, It does? Are you sure? There is no global state as far as I can see. So I can create 2 devices and driver will happily create two instances, each one can be inflated/deflated independently. > so how > could this approach crash it? Add device. inflate. Add another device. inflate. deflate. unplug. Now you have pointer to freed memory and when mm touches page from first device, you ge use after free. > Your point is a good thing to be on a to-do list for future enhancements, but > it's not a dealbreaker for the present balloon driver implementation, IMHO. > Yes it looks like a dealbreaker to me. -- MST -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote: > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote: > > What I think you should do is use rcu for access. > > And here sync rcu before freeing. > > Maybe an overkill but at least a documented synchronization > > primitive, and it is very light weight. > > > > I liked your suggestion on barriers, as well. > > Rik, Mel ? Further instead of simple assignment I would add an api in mm to set/clear the balloon mapping, with proper locking. This could fail if already set, and thus fix crash with many ballons. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote: > On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote: > > On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote: > > > > > > What if there is more than one balloon device? > > > > > > > > > > Is it possible to load this driver twice, or are you foreseeing a > > > > > future case > > > > > where this driver will be able to manage several distinct memory > > > > > balloons for > > > > > the same guest? > > > > > > > > > > > > > Second. > > > > It is easy to create several balloons they are just > > > > pci devices. > > > > > > > > > > > > and it might not be too important to make it work but > > > at least would be nice not to have a crash in this > > > setup. > > > > > Fair enough. For now, as I believe it's safe to assume we are only > > inflating one > > balloon per guest, I'd like to propose this as a future enhancement. Sounds > > good? > > > > Since guest crashes when it's not the case, no it doesn't, sorry :(. > Ok, but right now this driver only takes care of 1 balloon per guest, so how could this approach crash it? Your point is a good thing to be on a to-do list for future enhancements, but it's not a dealbreaker for the present balloon driver implementation, IMHO. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote: > On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote: > > > > > What if there is more than one balloon device? > > > > > > > > Is it possible to load this driver twice, or are you foreseeing a > > > > future case > > > > where this driver will be able to manage several distinct memory > > > > balloons for > > > > the same guest? > > > > > > > > > > Second. > > > It is easy to create several balloons they are just > > > pci devices. > > > > > > > > and it might not be too important to make it work but > > at least would be nice not to have a crash in this > > setup. > > > Fair enough. For now, as I believe it's safe to assume we are only inflating > one > balloon per guest, I'd like to propose this as a future enhancement. Sounds > good? > Since guest crashes when it's not the case, no it doesn't, sorry :(. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote: > What I think you should do is use rcu for access. > And here sync rcu before freeing. > Maybe an overkill but at least a documented synchronization > primitive, and it is very light weight. > I liked your suggestion on barriers, as well. Rik, Mel ? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote: > > > > What if there is more than one balloon device? > > > > > > Is it possible to load this driver twice, or are you foreseeing a future > > > case > > > where this driver will be able to manage several distinct memory balloons > > > for > > > the same guest? > > > > > > > Second. > > It is easy to create several balloons they are just > > pci devices. > > > > and it might not be too important to make it work but > at least would be nice not to have a crash in this > setup. > Fair enough. For now, as I believe it's safe to assume we are only inflating one balloon per guest, I'd like to propose this as a future enhancement. Sounds good? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote: > On Tue, Aug 14, 2012 at 03:22:45PM -0300, Rafael Aquini wrote: > > On Mon, Aug 13, 2012 at 11:41:23AM +0300, Michael S. Tsirkin wrote: > > > > @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon > > > > *vb, size_t num) > > > > set_page_pfns(vb->pfns + vb->num_pfns, page); > > > > vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE; > > > > totalram_pages--; > > > > + spin_lock(_lock); > > > > list_add(>lru, >pages); > > > > > > If list_add above is reordered with mapping assignment below, > > > then nothing bad happens because balloon_mapping takes > > > pages_lock. > > > > > > > + page->mapping = balloon_mapping; > > > > + spin_unlock(_lock); > > > > } > > > > > > > > /* Didn't get any? Oh well. */ > > > > @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, > > > > size_t num) > > > > return; > > > > > > > > tell_host(vb, vb->inflate_vq); > > > > + mutex_unlock(_lock); > > > > } > > > > > > > > static void release_pages_by_pfn(const u32 pfns[], unsigned int num) > > > > @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon > > > > *vb, size_t num) > > > > /* We can only do one array worth at a time. */ > > > > num = min(num, ARRAY_SIZE(vb->pfns)); > > > > > > > > + mutex_lock(_lock); > > > > for (vb->num_pfns = 0; vb->num_pfns < num; > > > > vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) { > > > > + /* > > > > +* We can race against virtballoon_isolatepage() and > > > > end up > > > > +* stumbling across a _temporarily_ empty 'pages' list. > > > > +*/ > > > > + spin_lock(_lock); > > > > + if (unlikely(list_empty(>pages))) { > > > > + spin_unlock(_lock); > > > > + break; > > > > + } > > > > page = list_first_entry(>pages, struct page, lru); > > > > + page->mapping = NULL; > > > > > > Unlike the case above, here > > > if = NULL write above is reordered with list_del below, > > > then isolate_page can run on a page that is not > > > on lru. > > > > > > So I think this needs a wmb(). > > > And maybe a comment above explaining why it is safe? > > > > Good point. Presumably, this nit has potential to explain your guessing on > > the > > read barrier stuff at movable_balloon_page() on the other patch. > > > > > What I think you should do is use rcu for access. > And here sync rcu before freeing. > Maybe an overkill but at least a documented synchronization > primitive, and it is very light weight. > > > > > list_del(>lru); > > > > > > I wonder why changing page->lru here is safe against > > > races with unmap_and_move in the previous patch. > > > > leak_balloon() doesn't race against unmap_and_move() because the later > > works on > > an isolated page set. So, theoretically, pages being dequeued from balloon > > page > > list here are either migrated (already) or they were not isolated yet. > > > > So add a comment explaining why it is safe pls. > > > > > +/* > > > > + * '*vb_ptr' allows virtballoon_migratepage() & > > > > virtballoon_putbackpage() to > > > > + * access pertinent elements from struct virtio_balloon > > > > + */ > > > > > > What if there is more than one balloon device? > > > > Is it possible to load this driver twice, or are you foreseeing a future > > case > > where this driver will be able to manage several distinct memory balloons > > for > > the same guest? > > > > Second. > It is easy to create several balloons they are just > pci devices. and it might not be too important to make it work but at least would be nice not to have a crash in this setup. > > > > + /* Init the ballooned page->mapping special balloon_mapping */ > > > > + balloon_mapping = kmalloc(sizeof(*balloon_mapping), GFP_KERNEL); > > > > + if (!balloon_mapping) { > > > > + err = -ENOMEM; > > > > + goto out_free_vb; > > > > + } > > > > > > Can balloon_mapping be dereferenced at this point? > > > Then what happens? > > > > Since there's no balloon page enqueued for this balloon driver yet, there's > > no > > chance of balloon_mapping being dereferenced at this point. > > > > > > > > + > > > > + INIT_RADIX_TREE(_mapping->page_tree, GFP_ATOMIC | > > > > __GFP_NOWARN); > > > > + INIT_LIST_HEAD(_mapping->i_mmap_nonlinear); > > > > + spin_lock_init(_mapping->tree_lock); > > > > + balloon_mapping->a_ops = _balloon_aops; > > > > > > > > err = init_vqs(vb); > > > > if (err) > > > > @@ -373,6 +493,7 @@ out_del_vqs: > > > > vdev->config->del_vqs(vdev); > > > > out_free_vb: > > > >
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 03:22:45PM -0300, Rafael Aquini wrote: > On Mon, Aug 13, 2012 at 11:41:23AM +0300, Michael S. Tsirkin wrote: > > > @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, > > > size_t num) > > > set_page_pfns(vb->pfns + vb->num_pfns, page); > > > vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE; > > > totalram_pages--; > > > + spin_lock(_lock); > > > list_add(>lru, >pages); > > > > If list_add above is reordered with mapping assignment below, > > then nothing bad happens because balloon_mapping takes > > pages_lock. > > > > > + page->mapping = balloon_mapping; > > > + spin_unlock(_lock); > > > } > > > > > > /* Didn't get any? Oh well. */ > > > @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, > > > size_t num) > > > return; > > > > > > tell_host(vb, vb->inflate_vq); > > > + mutex_unlock(_lock); > > > } > > > > > > static void release_pages_by_pfn(const u32 pfns[], unsigned int num) > > > @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, > > > size_t num) > > > /* We can only do one array worth at a time. */ > > > num = min(num, ARRAY_SIZE(vb->pfns)); > > > > > > + mutex_lock(_lock); > > > for (vb->num_pfns = 0; vb->num_pfns < num; > > >vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) { > > > + /* > > > + * We can race against virtballoon_isolatepage() and end up > > > + * stumbling across a _temporarily_ empty 'pages' list. > > > + */ > > > + spin_lock(_lock); > > > + if (unlikely(list_empty(>pages))) { > > > + spin_unlock(_lock); > > > + break; > > > + } > > > page = list_first_entry(>pages, struct page, lru); > > > + page->mapping = NULL; > > > > Unlike the case above, here > > if = NULL write above is reordered with list_del below, > > then isolate_page can run on a page that is not > > on lru. > > > > So I think this needs a wmb(). > > And maybe a comment above explaining why it is safe? > > Good point. Presumably, this nit has potential to explain your guessing on the > read barrier stuff at movable_balloon_page() on the other patch. > What I think you should do is use rcu for access. And here sync rcu before freeing. Maybe an overkill but at least a documented synchronization primitive, and it is very light weight. > > > list_del(>lru); > > > > I wonder why changing page->lru here is safe against > > races with unmap_and_move in the previous patch. > > leak_balloon() doesn't race against unmap_and_move() because the later works > on > an isolated page set. So, theoretically, pages being dequeued from balloon > page > list here are either migrated (already) or they were not isolated yet. > So add a comment explaining why it is safe pls. > > > +/* > > > + * '*vb_ptr' allows virtballoon_migratepage() & > > > virtballoon_putbackpage() to > > > + * access pertinent elements from struct virtio_balloon > > > + */ > > > > What if there is more than one balloon device? > > Is it possible to load this driver twice, or are you foreseeing a future case > where this driver will be able to manage several distinct memory balloons for > the same guest? > Second. It is easy to create several balloons they are just pci devices. > > > + /* Init the ballooned page->mapping special balloon_mapping */ > > > + balloon_mapping = kmalloc(sizeof(*balloon_mapping), GFP_KERNEL); > > > + if (!balloon_mapping) { > > > + err = -ENOMEM; > > > + goto out_free_vb; > > > + } > > > > Can balloon_mapping be dereferenced at this point? > > Then what happens? > > Since there's no balloon page enqueued for this balloon driver yet, there's no > chance of balloon_mapping being dereferenced at this point. > > > > > + > > > + INIT_RADIX_TREE(_mapping->page_tree, GFP_ATOMIC | __GFP_NOWARN); > > > + INIT_LIST_HEAD(_mapping->i_mmap_nonlinear); > > > + spin_lock_init(_mapping->tree_lock); > > > + balloon_mapping->a_ops = _balloon_aops; > > > > > > err = init_vqs(vb); > > > if (err) > > > @@ -373,6 +493,7 @@ out_del_vqs: > > > vdev->config->del_vqs(vdev); > > > out_free_vb: > > > kfree(vb); > > > + kfree(balloon_mapping); > > > > No need to set it to NULL? It seems if someone else allocates a mapping > > and gets this chunk of memory by chance, the logic in mm will get > > confused. > > Good point. It surely doesn't hurt be asured of this sort of safety. > > > > out: > > > return err; > > > } > > > @@ -397,6 +518,7 @@ static void __devexit virtballoon_remove(struct > > > virtio_device *vdev) > > > kthread_stop(vb->thread); > > > remove_common(vb); > > > kfree(vb); > > > + kfree(balloon_mapping); > > > > Neither here? > > ditto. > > > > > } > > > > > > #ifdef CONFIG_PM > > > diff --git a/include/linux/virtio_balloon.h > > > b/include/linux/virtio_balloon.h > > > index
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 03:44:09PM -0300, Rafael Aquini wrote: > On Tue, Aug 14, 2012 at 11:33:20AM +0300, Michael S. Tsirkin wrote: > > On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote: > > > On Mon, 13 Aug 2012 11:41:23 +0300, "Michael S. Tsirkin" > > > wrote: > > > > On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote: > > > > > +/* > > > > > + * Populate balloon_mapping->a_ops->freepage method to help > > > > > compaction on > > > > > + * re-inserting an isolated page into the balloon page list. > > > > > + */ > > > > > +void virtballoon_putbackpage(struct page *page) > > > > > +{ > > > > > + spin_lock(_lock); > > > > > + list_add(>lru, _ptr->pages); > > > > > + spin_unlock(_lock); > > > > > > > > Could the following race trigger: > > > > migration happens while module unloading is in progress, > > > > module goes away between here and when the function > > > > returns, then code for this function gets overwritten? > > > > If yes we need locking external to module to prevent this. > > > > Maybe add a spinlock to struct address_space? > > > > > > The balloon module cannot be unloaded until it has leaked all its pages, > > > so I think this is safe: > > > > > > static void remove_common(struct virtio_balloon *vb) > > > { > > > /* There might be pages left in the balloon: free them. */ > > > while (vb->num_pages) > > > leak_balloon(vb, vb->num_pages); > > > > > > Cheers, > > > Rusty. > > > > I know I meant something else. > > Let me lay this out: > > > > CPU1 executes: > > void virtballoon_putbackpage(struct page *page) > > { > > spin_lock(_lock); > > list_add(>lru, _ptr->pages); > > spin_unlock(_lock); > > > > > > at this point CPU2 unloads module: > > leak_balloon > > .. > > > > next CPU2 loads another module so code memory gets overwritten > > > > now CPU1 executes the next instruction: > > > > } > > > > which would normally return to function's caller, > > but it has been overwritten by CPU2 so we get corruption. > > > > No? > > At the point CPU2 is unloading the module, it will be kept looping at the > snippet Rusty pointed out because the isolation / migration steps do not mess > with 'vb->num_pages'. The driver will only unload after leaking the total > amount > of balloon's inflated pages, which means (for this hypothetical case) CPU2 > will > wait until CPU1 finishes the putaback procedure. > Yes but only until unlock finishes. The last return from function is not guarded and can be overwritten. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 11:33:20AM +0300, Michael S. Tsirkin wrote: > On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote: > > On Mon, 13 Aug 2012 11:41:23 +0300, "Michael S. Tsirkin" > > wrote: > > > On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote: > > > > +/* > > > > + * Populate balloon_mapping->a_ops->freepage method to help compaction > > > > on > > > > + * re-inserting an isolated page into the balloon page list. > > > > + */ > > > > +void virtballoon_putbackpage(struct page *page) > > > > +{ > > > > + spin_lock(_lock); > > > > + list_add(>lru, _ptr->pages); > > > > + spin_unlock(_lock); > > > > > > Could the following race trigger: > > > migration happens while module unloading is in progress, > > > module goes away between here and when the function > > > returns, then code for this function gets overwritten? > > > If yes we need locking external to module to prevent this. > > > Maybe add a spinlock to struct address_space? > > > > The balloon module cannot be unloaded until it has leaked all its pages, > > so I think this is safe: > > > > static void remove_common(struct virtio_balloon *vb) > > { > > /* There might be pages left in the balloon: free them. */ > > while (vb->num_pages) > > leak_balloon(vb, vb->num_pages); > > > > Cheers, > > Rusty. > > I know I meant something else. > Let me lay this out: > > CPU1 executes: > void virtballoon_putbackpage(struct page *page) > { > spin_lock(_lock); > list_add(>lru, _ptr->pages); > spin_unlock(_lock); > > > at this point CPU2 unloads module: > leak_balloon > .. > > next CPU2 loads another module so code memory gets overwritten > > now CPU1 executes the next instruction: > > } > > which would normally return to function's caller, > but it has been overwritten by CPU2 so we get corruption. > > No? At the point CPU2 is unloading the module, it will be kept looping at the snippet Rusty pointed out because the isolation / migration steps do not mess with 'vb->num_pages'. The driver will only unload after leaking the total amount of balloon's inflated pages, which means (for this hypothetical case) CPU2 will wait until CPU1 finishes the putaback procedure. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Mon, Aug 13, 2012 at 11:41:23AM +0300, Michael S. Tsirkin wrote: > > @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, > > size_t num) > > set_page_pfns(vb->pfns + vb->num_pfns, page); > > vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE; > > totalram_pages--; > > + spin_lock(_lock); > > list_add(>lru, >pages); > > If list_add above is reordered with mapping assignment below, > then nothing bad happens because balloon_mapping takes > pages_lock. > > > + page->mapping = balloon_mapping; > > + spin_unlock(_lock); > > } > > > > /* Didn't get any? Oh well. */ > > @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, > > size_t num) > > return; > > > > tell_host(vb, vb->inflate_vq); > > + mutex_unlock(_lock); > > } > > > > static void release_pages_by_pfn(const u32 pfns[], unsigned int num) > > @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, > > size_t num) > > /* We can only do one array worth at a time. */ > > num = min(num, ARRAY_SIZE(vb->pfns)); > > > > + mutex_lock(_lock); > > for (vb->num_pfns = 0; vb->num_pfns < num; > > vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) { > > + /* > > +* We can race against virtballoon_isolatepage() and end up > > +* stumbling across a _temporarily_ empty 'pages' list. > > +*/ > > + spin_lock(_lock); > > + if (unlikely(list_empty(>pages))) { > > + spin_unlock(_lock); > > + break; > > + } > > page = list_first_entry(>pages, struct page, lru); > > + page->mapping = NULL; > > Unlike the case above, here > if = NULL write above is reordered with list_del below, > then isolate_page can run on a page that is not > on lru. > > So I think this needs a wmb(). > And maybe a comment above explaining why it is safe? Good point. Presumably, this nit has potential to explain your guessing on the read barrier stuff at movable_balloon_page() on the other patch. > > list_del(>lru); > > I wonder why changing page->lru here is safe against > races with unmap_and_move in the previous patch. leak_balloon() doesn't race against unmap_and_move() because the later works on an isolated page set. So, theoretically, pages being dequeued from balloon page list here are either migrated (already) or they were not isolated yet. > > +/* > > + * '*vb_ptr' allows virtballoon_migratepage() & virtballoon_putbackpage() > > to > > + * access pertinent elements from struct virtio_balloon > > + */ > > What if there is more than one balloon device? Is it possible to load this driver twice, or are you foreseeing a future case where this driver will be able to manage several distinct memory balloons for the same guest? > > + /* Init the ballooned page->mapping special balloon_mapping */ > > + balloon_mapping = kmalloc(sizeof(*balloon_mapping), GFP_KERNEL); > > + if (!balloon_mapping) { > > + err = -ENOMEM; > > + goto out_free_vb; > > + } > > Can balloon_mapping be dereferenced at this point? > Then what happens? Since there's no balloon page enqueued for this balloon driver yet, there's no chance of balloon_mapping being dereferenced at this point. > > + > > + INIT_RADIX_TREE(_mapping->page_tree, GFP_ATOMIC | __GFP_NOWARN); > > + INIT_LIST_HEAD(_mapping->i_mmap_nonlinear); > > + spin_lock_init(_mapping->tree_lock); > > + balloon_mapping->a_ops = _balloon_aops; > > > > err = init_vqs(vb); > > if (err) > > @@ -373,6 +493,7 @@ out_del_vqs: > > vdev->config->del_vqs(vdev); > > out_free_vb: > > kfree(vb); > > + kfree(balloon_mapping); > > No need to set it to NULL? It seems if someone else allocates a mapping > and gets this chunk of memory by chance, the logic in mm will get > confused. Good point. It surely doesn't hurt be asured of this sort of safety. > > out: > > return err; > > } > > @@ -397,6 +518,7 @@ static void __devexit virtballoon_remove(struct > > virtio_device *vdev) > > kthread_stop(vb->thread); > > remove_common(vb); > > kfree(vb); > > + kfree(balloon_mapping); > > Neither here? ditto. > > } > > > > #ifdef CONFIG_PM > > diff --git a/include/linux/virtio_balloon.h b/include/linux/virtio_balloon.h > > index 652dc8b..930f1b7 100644 > > --- a/include/linux/virtio_balloon.h > > +++ b/include/linux/virtio_balloon.h > > @@ -56,4 +56,8 @@ struct virtio_balloon_stat { > > u64 val; > > } __attribute__((packed)); > > > > +#if !defined(CONFIG_COMPACTION) > > +struct address_space *balloon_mapping; > > +#endif > > + > > Anyone including this header will get a different copy of > balloon_mapping. Besides, need to be ifdef KERNEL. Good point. I'll move this hunk to the balloon driver itself. -- To unsubscribe from this list: send the line
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote: > On Mon, 13 Aug 2012 11:41:23 +0300, "Michael S. Tsirkin" > wrote: > > On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote: > > > +/* > > > + * Populate balloon_mapping->a_ops->freepage method to help compaction on > > > + * re-inserting an isolated page into the balloon page list. > > > + */ > > > +void virtballoon_putbackpage(struct page *page) > > > +{ > > > + spin_lock(_lock); > > > + list_add(>lru, _ptr->pages); > > > + spin_unlock(_lock); > > > > Could the following race trigger: > > migration happens while module unloading is in progress, > > module goes away between here and when the function > > returns, then code for this function gets overwritten? > > If yes we need locking external to module to prevent this. > > Maybe add a spinlock to struct address_space? > > The balloon module cannot be unloaded until it has leaked all its pages, > so I think this is safe: > > static void remove_common(struct virtio_balloon *vb) > { > /* There might be pages left in the balloon: free them. */ > while (vb->num_pages) > leak_balloon(vb, vb->num_pages); > > Cheers, > Rusty. I know I meant something else. Let me lay this out: CPU1 executes: void virtballoon_putbackpage(struct page *page) { spin_lock(_lock); list_add(>lru, _ptr->pages); spin_unlock(_lock); at this point CPU2 unloads module: leak_balloon .. next CPU2 loads another module so code memory gets overwritten now CPU1 executes the next instruction: } which would normally return to function's caller, but it has been overwritten by CPU2 so we get corruption. No? -- MST -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote: On Mon, 13 Aug 2012 11:41:23 +0300, Michael S. Tsirkin m...@redhat.com wrote: On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote: +/* + * Populate balloon_mapping-a_ops-freepage method to help compaction on + * re-inserting an isolated page into the balloon page list. + */ +void virtballoon_putbackpage(struct page *page) +{ + spin_lock(pages_lock); + list_add(page-lru, vb_ptr-pages); + spin_unlock(pages_lock); Could the following race trigger: migration happens while module unloading is in progress, module goes away between here and when the function returns, then code for this function gets overwritten? If yes we need locking external to module to prevent this. Maybe add a spinlock to struct address_space? The balloon module cannot be unloaded until it has leaked all its pages, so I think this is safe: static void remove_common(struct virtio_balloon *vb) { /* There might be pages left in the balloon: free them. */ while (vb-num_pages) leak_balloon(vb, vb-num_pages); Cheers, Rusty. I know I meant something else. Let me lay this out: CPU1 executes: void virtballoon_putbackpage(struct page *page) { spin_lock(pages_lock); list_add(page-lru, vb_ptr-pages); spin_unlock(pages_lock); at this point CPU2 unloads module: leak_balloon .. next CPU2 loads another module so code memory gets overwritten now CPU1 executes the next instruction: } which would normally return to function's caller, but it has been overwritten by CPU2 so we get corruption. No? -- MST -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Mon, Aug 13, 2012 at 11:41:23AM +0300, Michael S. Tsirkin wrote: @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num) set_page_pfns(vb-pfns + vb-num_pfns, page); vb-num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE; totalram_pages--; + spin_lock(pages_lock); list_add(page-lru, vb-pages); If list_add above is reordered with mapping assignment below, then nothing bad happens because balloon_mapping takes pages_lock. + page-mapping = balloon_mapping; + spin_unlock(pages_lock); } /* Didn't get any? Oh well. */ @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num) return; tell_host(vb, vb-inflate_vq); + mutex_unlock(balloon_lock); } static void release_pages_by_pfn(const u32 pfns[], unsigned int num) @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num) /* We can only do one array worth at a time. */ num = min(num, ARRAY_SIZE(vb-pfns)); + mutex_lock(balloon_lock); for (vb-num_pfns = 0; vb-num_pfns num; vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) { + /* +* We can race against virtballoon_isolatepage() and end up +* stumbling across a _temporarily_ empty 'pages' list. +*/ + spin_lock(pages_lock); + if (unlikely(list_empty(vb-pages))) { + spin_unlock(pages_lock); + break; + } page = list_first_entry(vb-pages, struct page, lru); + page-mapping = NULL; Unlike the case above, here if = NULL write above is reordered with list_del below, then isolate_page can run on a page that is not on lru. So I think this needs a wmb(). And maybe a comment above explaining why it is safe? Good point. Presumably, this nit has potential to explain your guessing on the read barrier stuff at movable_balloon_page() on the other patch. list_del(page-lru); I wonder why changing page-lru here is safe against races with unmap_and_move in the previous patch. leak_balloon() doesn't race against unmap_and_move() because the later works on an isolated page set. So, theoretically, pages being dequeued from balloon page list here are either migrated (already) or they were not isolated yet. +/* + * '*vb_ptr' allows virtballoon_migratepage() virtballoon_putbackpage() to + * access pertinent elements from struct virtio_balloon + */ What if there is more than one balloon device? Is it possible to load this driver twice, or are you foreseeing a future case where this driver will be able to manage several distinct memory balloons for the same guest? + /* Init the ballooned page-mapping special balloon_mapping */ + balloon_mapping = kmalloc(sizeof(*balloon_mapping), GFP_KERNEL); + if (!balloon_mapping) { + err = -ENOMEM; + goto out_free_vb; + } Can balloon_mapping be dereferenced at this point? Then what happens? Since there's no balloon page enqueued for this balloon driver yet, there's no chance of balloon_mapping being dereferenced at this point. + + INIT_RADIX_TREE(balloon_mapping-page_tree, GFP_ATOMIC | __GFP_NOWARN); + INIT_LIST_HEAD(balloon_mapping-i_mmap_nonlinear); + spin_lock_init(balloon_mapping-tree_lock); + balloon_mapping-a_ops = virtio_balloon_aops; err = init_vqs(vb); if (err) @@ -373,6 +493,7 @@ out_del_vqs: vdev-config-del_vqs(vdev); out_free_vb: kfree(vb); + kfree(balloon_mapping); No need to set it to NULL? It seems if someone else allocates a mapping and gets this chunk of memory by chance, the logic in mm will get confused. Good point. It surely doesn't hurt be asured of this sort of safety. out: return err; } @@ -397,6 +518,7 @@ static void __devexit virtballoon_remove(struct virtio_device *vdev) kthread_stop(vb-thread); remove_common(vb); kfree(vb); + kfree(balloon_mapping); Neither here? ditto. } #ifdef CONFIG_PM diff --git a/include/linux/virtio_balloon.h b/include/linux/virtio_balloon.h index 652dc8b..930f1b7 100644 --- a/include/linux/virtio_balloon.h +++ b/include/linux/virtio_balloon.h @@ -56,4 +56,8 @@ struct virtio_balloon_stat { u64 val; } __attribute__((packed)); +#if !defined(CONFIG_COMPACTION) +struct address_space *balloon_mapping; +#endif + Anyone including this header will get a different copy of balloon_mapping. Besides, need to be ifdef KERNEL. Good point. I'll move this hunk to the balloon driver itself. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 11:33:20AM +0300, Michael S. Tsirkin wrote: On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote: On Mon, 13 Aug 2012 11:41:23 +0300, Michael S. Tsirkin m...@redhat.com wrote: On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote: +/* + * Populate balloon_mapping-a_ops-freepage method to help compaction on + * re-inserting an isolated page into the balloon page list. + */ +void virtballoon_putbackpage(struct page *page) +{ + spin_lock(pages_lock); + list_add(page-lru, vb_ptr-pages); + spin_unlock(pages_lock); Could the following race trigger: migration happens while module unloading is in progress, module goes away between here and when the function returns, then code for this function gets overwritten? If yes we need locking external to module to prevent this. Maybe add a spinlock to struct address_space? The balloon module cannot be unloaded until it has leaked all its pages, so I think this is safe: static void remove_common(struct virtio_balloon *vb) { /* There might be pages left in the balloon: free them. */ while (vb-num_pages) leak_balloon(vb, vb-num_pages); Cheers, Rusty. I know I meant something else. Let me lay this out: CPU1 executes: void virtballoon_putbackpage(struct page *page) { spin_lock(pages_lock); list_add(page-lru, vb_ptr-pages); spin_unlock(pages_lock); at this point CPU2 unloads module: leak_balloon .. next CPU2 loads another module so code memory gets overwritten now CPU1 executes the next instruction: } which would normally return to function's caller, but it has been overwritten by CPU2 so we get corruption. No? At the point CPU2 is unloading the module, it will be kept looping at the snippet Rusty pointed out because the isolation / migration steps do not mess with 'vb-num_pages'. The driver will only unload after leaking the total amount of balloon's inflated pages, which means (for this hypothetical case) CPU2 will wait until CPU1 finishes the putaback procedure. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 03:44:09PM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 11:33:20AM +0300, Michael S. Tsirkin wrote: On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote: On Mon, 13 Aug 2012 11:41:23 +0300, Michael S. Tsirkin m...@redhat.com wrote: On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote: +/* + * Populate balloon_mapping-a_ops-freepage method to help compaction on + * re-inserting an isolated page into the balloon page list. + */ +void virtballoon_putbackpage(struct page *page) +{ + spin_lock(pages_lock); + list_add(page-lru, vb_ptr-pages); + spin_unlock(pages_lock); Could the following race trigger: migration happens while module unloading is in progress, module goes away between here and when the function returns, then code for this function gets overwritten? If yes we need locking external to module to prevent this. Maybe add a spinlock to struct address_space? The balloon module cannot be unloaded until it has leaked all its pages, so I think this is safe: static void remove_common(struct virtio_balloon *vb) { /* There might be pages left in the balloon: free them. */ while (vb-num_pages) leak_balloon(vb, vb-num_pages); Cheers, Rusty. I know I meant something else. Let me lay this out: CPU1 executes: void virtballoon_putbackpage(struct page *page) { spin_lock(pages_lock); list_add(page-lru, vb_ptr-pages); spin_unlock(pages_lock); at this point CPU2 unloads module: leak_balloon .. next CPU2 loads another module so code memory gets overwritten now CPU1 executes the next instruction: } which would normally return to function's caller, but it has been overwritten by CPU2 so we get corruption. No? At the point CPU2 is unloading the module, it will be kept looping at the snippet Rusty pointed out because the isolation / migration steps do not mess with 'vb-num_pages'. The driver will only unload after leaking the total amount of balloon's inflated pages, which means (for this hypothetical case) CPU2 will wait until CPU1 finishes the putaback procedure. Yes but only until unlock finishes. The last return from function is not guarded and can be overwritten. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 03:22:45PM -0300, Rafael Aquini wrote: On Mon, Aug 13, 2012 at 11:41:23AM +0300, Michael S. Tsirkin wrote: @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num) set_page_pfns(vb-pfns + vb-num_pfns, page); vb-num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE; totalram_pages--; + spin_lock(pages_lock); list_add(page-lru, vb-pages); If list_add above is reordered with mapping assignment below, then nothing bad happens because balloon_mapping takes pages_lock. + page-mapping = balloon_mapping; + spin_unlock(pages_lock); } /* Didn't get any? Oh well. */ @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num) return; tell_host(vb, vb-inflate_vq); + mutex_unlock(balloon_lock); } static void release_pages_by_pfn(const u32 pfns[], unsigned int num) @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num) /* We can only do one array worth at a time. */ num = min(num, ARRAY_SIZE(vb-pfns)); + mutex_lock(balloon_lock); for (vb-num_pfns = 0; vb-num_pfns num; vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) { + /* + * We can race against virtballoon_isolatepage() and end up + * stumbling across a _temporarily_ empty 'pages' list. + */ + spin_lock(pages_lock); + if (unlikely(list_empty(vb-pages))) { + spin_unlock(pages_lock); + break; + } page = list_first_entry(vb-pages, struct page, lru); + page-mapping = NULL; Unlike the case above, here if = NULL write above is reordered with list_del below, then isolate_page can run on a page that is not on lru. So I think this needs a wmb(). And maybe a comment above explaining why it is safe? Good point. Presumably, this nit has potential to explain your guessing on the read barrier stuff at movable_balloon_page() on the other patch. What I think you should do is use rcu for access. And here sync rcu before freeing. Maybe an overkill but at least a documented synchronization primitive, and it is very light weight. list_del(page-lru); I wonder why changing page-lru here is safe against races with unmap_and_move in the previous patch. leak_balloon() doesn't race against unmap_and_move() because the later works on an isolated page set. So, theoretically, pages being dequeued from balloon page list here are either migrated (already) or they were not isolated yet. So add a comment explaining why it is safe pls. +/* + * '*vb_ptr' allows virtballoon_migratepage() virtballoon_putbackpage() to + * access pertinent elements from struct virtio_balloon + */ What if there is more than one balloon device? Is it possible to load this driver twice, or are you foreseeing a future case where this driver will be able to manage several distinct memory balloons for the same guest? Second. It is easy to create several balloons they are just pci devices. + /* Init the ballooned page-mapping special balloon_mapping */ + balloon_mapping = kmalloc(sizeof(*balloon_mapping), GFP_KERNEL); + if (!balloon_mapping) { + err = -ENOMEM; + goto out_free_vb; + } Can balloon_mapping be dereferenced at this point? Then what happens? Since there's no balloon page enqueued for this balloon driver yet, there's no chance of balloon_mapping being dereferenced at this point. + + INIT_RADIX_TREE(balloon_mapping-page_tree, GFP_ATOMIC | __GFP_NOWARN); + INIT_LIST_HEAD(balloon_mapping-i_mmap_nonlinear); + spin_lock_init(balloon_mapping-tree_lock); + balloon_mapping-a_ops = virtio_balloon_aops; err = init_vqs(vb); if (err) @@ -373,6 +493,7 @@ out_del_vqs: vdev-config-del_vqs(vdev); out_free_vb: kfree(vb); + kfree(balloon_mapping); No need to set it to NULL? It seems if someone else allocates a mapping and gets this chunk of memory by chance, the logic in mm will get confused. Good point. It surely doesn't hurt be asured of this sort of safety. out: return err; } @@ -397,6 +518,7 @@ static void __devexit virtballoon_remove(struct virtio_device *vdev) kthread_stop(vb-thread); remove_common(vb); kfree(vb); + kfree(balloon_mapping); Neither here? ditto. } #ifdef CONFIG_PM diff --git a/include/linux/virtio_balloon.h b/include/linux/virtio_balloon.h index 652dc8b..930f1b7 100644 --- a/include/linux/virtio_balloon.h +++ b/include/linux/virtio_balloon.h @@ -56,4 +56,8 @@ struct virtio_balloon_stat { u64 val; } __attribute__((packed)); +#if !defined(CONFIG_COMPACTION) +struct address_space
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote: On Tue, Aug 14, 2012 at 03:22:45PM -0300, Rafael Aquini wrote: On Mon, Aug 13, 2012 at 11:41:23AM +0300, Michael S. Tsirkin wrote: @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num) set_page_pfns(vb-pfns + vb-num_pfns, page); vb-num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE; totalram_pages--; + spin_lock(pages_lock); list_add(page-lru, vb-pages); If list_add above is reordered with mapping assignment below, then nothing bad happens because balloon_mapping takes pages_lock. + page-mapping = balloon_mapping; + spin_unlock(pages_lock); } /* Didn't get any? Oh well. */ @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num) return; tell_host(vb, vb-inflate_vq); + mutex_unlock(balloon_lock); } static void release_pages_by_pfn(const u32 pfns[], unsigned int num) @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num) /* We can only do one array worth at a time. */ num = min(num, ARRAY_SIZE(vb-pfns)); + mutex_lock(balloon_lock); for (vb-num_pfns = 0; vb-num_pfns num; vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) { + /* +* We can race against virtballoon_isolatepage() and end up +* stumbling across a _temporarily_ empty 'pages' list. +*/ + spin_lock(pages_lock); + if (unlikely(list_empty(vb-pages))) { + spin_unlock(pages_lock); + break; + } page = list_first_entry(vb-pages, struct page, lru); + page-mapping = NULL; Unlike the case above, here if = NULL write above is reordered with list_del below, then isolate_page can run on a page that is not on lru. So I think this needs a wmb(). And maybe a comment above explaining why it is safe? Good point. Presumably, this nit has potential to explain your guessing on the read barrier stuff at movable_balloon_page() on the other patch. What I think you should do is use rcu for access. And here sync rcu before freeing. Maybe an overkill but at least a documented synchronization primitive, and it is very light weight. list_del(page-lru); I wonder why changing page-lru here is safe against races with unmap_and_move in the previous patch. leak_balloon() doesn't race against unmap_and_move() because the later works on an isolated page set. So, theoretically, pages being dequeued from balloon page list here are either migrated (already) or they were not isolated yet. So add a comment explaining why it is safe pls. +/* + * '*vb_ptr' allows virtballoon_migratepage() virtballoon_putbackpage() to + * access pertinent elements from struct virtio_balloon + */ What if there is more than one balloon device? Is it possible to load this driver twice, or are you foreseeing a future case where this driver will be able to manage several distinct memory balloons for the same guest? Second. It is easy to create several balloons they are just pci devices. and it might not be too important to make it work but at least would be nice not to have a crash in this setup. + /* Init the ballooned page-mapping special balloon_mapping */ + balloon_mapping = kmalloc(sizeof(*balloon_mapping), GFP_KERNEL); + if (!balloon_mapping) { + err = -ENOMEM; + goto out_free_vb; + } Can balloon_mapping be dereferenced at this point? Then what happens? Since there's no balloon page enqueued for this balloon driver yet, there's no chance of balloon_mapping being dereferenced at this point. + + INIT_RADIX_TREE(balloon_mapping-page_tree, GFP_ATOMIC | __GFP_NOWARN); + INIT_LIST_HEAD(balloon_mapping-i_mmap_nonlinear); + spin_lock_init(balloon_mapping-tree_lock); + balloon_mapping-a_ops = virtio_balloon_aops; err = init_vqs(vb); if (err) @@ -373,6 +493,7 @@ out_del_vqs: vdev-config-del_vqs(vdev); out_free_vb: kfree(vb); + kfree(balloon_mapping); No need to set it to NULL? It seems if someone else allocates a mapping and gets this chunk of memory by chance, the logic in mm will get confused. Good point. It surely doesn't hurt be asured of this sort of safety. out: return err; } @@
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote: What if there is more than one balloon device? Is it possible to load this driver twice, or are you foreseeing a future case where this driver will be able to manage several distinct memory balloons for the same guest? Second. It is easy to create several balloons they are just pci devices. and it might not be too important to make it work but at least would be nice not to have a crash in this setup. Fair enough. For now, as I believe it's safe to assume we are only inflating one balloon per guest, I'd like to propose this as a future enhancement. Sounds good? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote: What I think you should do is use rcu for access. And here sync rcu before freeing. Maybe an overkill but at least a documented synchronization primitive, and it is very light weight. I liked your suggestion on barriers, as well. Rik, Mel ? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote: What if there is more than one balloon device? Is it possible to load this driver twice, or are you foreseeing a future case where this driver will be able to manage several distinct memory balloons for the same guest? Second. It is easy to create several balloons they are just pci devices. and it might not be too important to make it work but at least would be nice not to have a crash in this setup. Fair enough. For now, as I believe it's safe to assume we are only inflating one balloon per guest, I'd like to propose this as a future enhancement. Sounds good? Since guest crashes when it's not the case, no it doesn't, sorry :(. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote: On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote: What if there is more than one balloon device? Is it possible to load this driver twice, or are you foreseeing a future case where this driver will be able to manage several distinct memory balloons for the same guest? Second. It is easy to create several balloons they are just pci devices. and it might not be too important to make it work but at least would be nice not to have a crash in this setup. Fair enough. For now, as I believe it's safe to assume we are only inflating one balloon per guest, I'd like to propose this as a future enhancement. Sounds good? Since guest crashes when it's not the case, no it doesn't, sorry :(. Ok, but right now this driver only takes care of 1 balloon per guest, so how could this approach crash it? Your point is a good thing to be on a to-do list for future enhancements, but it's not a dealbreaker for the present balloon driver implementation, IMHO. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote: What I think you should do is use rcu for access. And here sync rcu before freeing. Maybe an overkill but at least a documented synchronization primitive, and it is very light weight. I liked your suggestion on barriers, as well. Rik, Mel ? Further instead of simple assignment I would add an api in mm to set/clear the balloon mapping, with proper locking. This could fail if already set, and thus fix crash with many ballons. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote: On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote: What if there is more than one balloon device? Is it possible to load this driver twice, or are you foreseeing a future case where this driver will be able to manage several distinct memory balloons for the same guest? Second. It is easy to create several balloons they are just pci devices. and it might not be too important to make it work but at least would be nice not to have a crash in this setup. Fair enough. For now, as I believe it's safe to assume we are only inflating one balloon per guest, I'd like to propose this as a future enhancement. Sounds good? Since guest crashes when it's not the case, no it doesn't, sorry :(. Ok, but right now this driver only takes care of 1 balloon per guest, It does? Are you sure? There is no global state as far as I can see. So I can create 2 devices and driver will happily create two instances, each one can be inflated/deflated independently. so how could this approach crash it? Add device. inflate. Add another device. inflate. deflate. unplug. Now you have pointer to freed memory and when mm touches page from first device, you ge use after free. Your point is a good thing to be on a to-do list for future enhancements, but it's not a dealbreaker for the present balloon driver implementation, IMHO. Yes it looks like a dealbreaker to me. -- MST -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 11:49:06PM +0300, Michael S. Tsirkin wrote: On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote: On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote: What if there is more than one balloon device? Is it possible to load this driver twice, or are you foreseeing a future case where this driver will be able to manage several distinct memory balloons for the same guest? Second. It is easy to create several balloons they are just pci devices. and it might not be too important to make it work but at least would be nice not to have a crash in this setup. Fair enough. For now, as I believe it's safe to assume we are only inflating one balloon per guest, I'd like to propose this as a future enhancement. Sounds good? Since guest crashes when it's not the case, no it doesn't, sorry :(. Ok, but right now this driver only takes care of 1 balloon per guest, It does? Are you sure? There is no global state as far as I can see. So I can create 2 devices and driver will happily create two instances, each one can be inflated/deflated independently. so how could this approach crash it? Add device. inflate. Add another device. inflate. deflate. unplug. Now you have pointer to freed memory and when mm touches page from first device, you ge use after free. Your point is a good thing to be on a to-do list for future enhancements, but it's not a dealbreaker for the present balloon driver implementation, IMHO. Yes it looks like a dealbreaker to me. To clarify, the global state that this patch adds, is ugly even if we didn't support multiple balloons yet. So I don't think I can accept such a patch. Rusty has a final word here, maybe he thinks differently. -- MST -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On 08/14/2012 04:54 PM, Michael S. Tsirkin wrote: To clarify, the global state that this patch adds, is ugly even if we didn't support multiple balloons yet. So I don't think I can accept such a patch. Rusty has a final word here, maybe he thinks differently. Before deciding that does not support multiple balloon drivers at once is an issue, is there any use case at all for having multiple balloon drivers active at a time? I do not see any. -- All rights reversed -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 11:49:06PM +0300, Michael S. Tsirkin wrote: On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote: On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote: What if there is more than one balloon device? Is it possible to load this driver twice, or are you foreseeing a future case where this driver will be able to manage several distinct memory balloons for the same guest? Second. It is easy to create several balloons they are just pci devices. and it might not be too important to make it work but at least would be nice not to have a crash in this setup. Fair enough. For now, as I believe it's safe to assume we are only inflating one balloon per guest, I'd like to propose this as a future enhancement. Sounds good? Since guest crashes when it's not the case, no it doesn't, sorry :(. Ok, but right now this driver only takes care of 1 balloon per guest, It does? Are you sure? There is no global state as far as I can see. So I can create 2 devices and driver will happily create two instances, each one can be inflated/deflated independently. so how could this approach crash it? Add device. inflate. Add another device. inflate. deflate. unplug. Now you have pointer to freed memory and when mm touches page from first device, you ge use after free. Your point is a good thing to be on a to-do list for future enhancements, but it's not a dealbreaker for the present balloon driver implementation, IMHO. Yes it looks like a dealbreaker to me. Sorry. You're right, I'm wrong. I'll get back to the scracthpad to overcome this constraint. I believe the way this patch was at its v4 revision (wrt this particular case) could possibly address this concern of yours. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 04:56:59PM -0400, Rik van Riel wrote: On 08/14/2012 04:54 PM, Michael S. Tsirkin wrote: To clarify, the global state that this patch adds, is ugly even if we didn't support multiple balloons yet. So I don't think I can accept such a patch. Rusty has a final word here, maybe he thinks differently. Before deciding that does not support multiple balloon drivers at once is an issue, is there any use case at all for having multiple balloon drivers active at a time? I do not see any. For example, we had a proposal for a page-cache backed device. So it could be useful to have two, a regular balloon and a pagecache backed one. There could be other uses - it certainly looks like it works so how can you be sure it's useless? And even ignoring that, global pointer to a device is an ugly hack and ugly hacks tend to explode. And even ignoring estetics, and if we decide we are fine with a single balloon, it needs to fail gracefully not crash like it does now. -- All rights reversed -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, Aug 14, 2012 at 06:34:13PM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 11:49:06PM +0300, Michael S. Tsirkin wrote: On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote: On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote: On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote: What if there is more than one balloon device? Is it possible to load this driver twice, or are you foreseeing a future case where this driver will be able to manage several distinct memory balloons for the same guest? Second. It is easy to create several balloons they are just pci devices. and it might not be too important to make it work but at least would be nice not to have a crash in this setup. Fair enough. For now, as I believe it's safe to assume we are only inflating one balloon per guest, I'd like to propose this as a future enhancement. Sounds good? Since guest crashes when it's not the case, no it doesn't, sorry :(. Ok, but right now this driver only takes care of 1 balloon per guest, It does? Are you sure? There is no global state as far as I can see. So I can create 2 devices and driver will happily create two instances, each one can be inflated/deflated independently. so how could this approach crash it? Add device. inflate. Add another device. inflate. deflate. unplug. Now you have pointer to freed memory and when mm touches page from first device, you ge use after free. Your point is a good thing to be on a to-do list for future enhancements, but it's not a dealbreaker for the present balloon driver implementation, IMHO. Yes it looks like a dealbreaker to me. Sorry. You're right, I'm wrong. I'll get back to the scracthpad to overcome this constraint. I believe the way this patch was at its v4 revision (wrt this particular case) could possibly address this concern of yours. Almost. We still have a global balloon_mapping. The only reason for it to exist seems solely to detect balloon mappings, so it can just be replaced by a flag in the mapping, or in mapping ops, or elsewhere. Also, please add APIs to mm so we can avoid doing internal mm stuff like INIT_RADIX_TREE(balloon_mapping-page_tree, GFP_ATOMIC | __GFP_NOWARN); in the driver. It should be alloc_address_mapping(virtio_balloon_aops); free_address_mapping Make page-mapping use rcu, and sync rcu in free_address_mapping. -- MST -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Tue, 14 Aug 2012 11:33:20 +0300, Michael S. Tsirkin m...@redhat.com wrote: On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote: On Mon, 13 Aug 2012 11:41:23 +0300, Michael S. Tsirkin m...@redhat.com wrote: On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote: +/* + * Populate balloon_mapping-a_ops-freepage method to help compaction on + * re-inserting an isolated page into the balloon page list. + */ +void virtballoon_putbackpage(struct page *page) +{ + spin_lock(pages_lock); + list_add(page-lru, vb_ptr-pages); + spin_unlock(pages_lock); Could the following race trigger: migration happens while module unloading is in progress, module goes away between here and when the function returns, then code for this function gets overwritten? If yes we need locking external to module to prevent this. Maybe add a spinlock to struct address_space? The balloon module cannot be unloaded until it has leaked all its pages, so I think this is safe: static void remove_common(struct virtio_balloon *vb) { /* There might be pages left in the balloon: free them. */ while (vb-num_pages) leak_balloon(vb, vb-num_pages); Cheers, Rusty. I know I meant something else. Let me lay this out: CPU1 executes: void virtballoon_putbackpage(struct page *page) { spin_lock(pages_lock); list_add(page-lru, vb_ptr-pages); spin_unlock(pages_lock); at this point CPU2 unloads module: leak_balloon .. next CPU2 loads another module so code memory gets overwritten now CPU1 executes the next instruction: } which would normally return to function's caller, but it has been overwritten by CPU2 so we get corruption. Actually, I have no idea. Where does virtballoon_putbackpage get called from? It's some weird mm thing, and I stay out of that mess. The vb thread is stopped before we spin checking vb-num_pages, so it's not touching pages; who would be calling this? Confused, Rusty. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Mon, 13 Aug 2012 11:41:23 +0300, "Michael S. Tsirkin" wrote: > On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote: > > +/* > > + * Populate balloon_mapping->a_ops->freepage method to help compaction on > > + * re-inserting an isolated page into the balloon page list. > > + */ > > +void virtballoon_putbackpage(struct page *page) > > +{ > > + spin_lock(_lock); > > + list_add(>lru, _ptr->pages); > > + spin_unlock(_lock); > > Could the following race trigger: > migration happens while module unloading is in progress, > module goes away between here and when the function > returns, then code for this function gets overwritten? > If yes we need locking external to module to prevent this. > Maybe add a spinlock to struct address_space? The balloon module cannot be unloaded until it has leaked all its pages, so I think this is safe: static void remove_common(struct virtio_balloon *vb) { /* There might be pages left in the balloon: free them. */ while (vb->num_pages) leak_balloon(vb, vb->num_pages); Cheers, Rusty. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote: > Memory fragmentation introduced by ballooning might reduce significantly > the number of 2MB contiguous memory blocks that can be used within a guest, > thus imposing performance penalties associated with the reduced number of > transparent huge pages that could be used by the guest workload. > > Besides making balloon pages movable at allocation time and introducing > the necessary primitives to perform balloon page migration/compaction, > this patch also introduces the following locking scheme to provide the > proper synchronization and protection for struct virtio_balloon elements > against concurrent accesses due to parallel operations introduced by > memory compaction / page migration. > - balloon_lock (mutex) : synchronizes the access demand to elements of > struct virtio_balloon and its queue operations; > - pages_lock (spinlock): special protection to balloon pages list against > concurrent list handling operations; > > Signed-off-by: Rafael Aquini > --- > drivers/virtio/virtio_balloon.c | 138 > +--- > include/linux/virtio_balloon.h | 4 ++ > 2 files changed, 134 insertions(+), 8 deletions(-) > > diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c > index 0908e60..7c937a0 100644 > --- a/drivers/virtio/virtio_balloon.c > +++ b/drivers/virtio/virtio_balloon.c > @@ -27,6 +27,7 @@ > #include > #include > #include > +#include > > /* > * Balloon device works in 4K page units. So each page is pointed to by > @@ -35,6 +36,12 @@ > */ > #define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT) > > +/* Synchronizes accesses/updates to the struct virtio_balloon elements */ > +DEFINE_MUTEX(balloon_lock); > + > +/* Protects 'virtio_balloon->pages' list against concurrent handling */ > +DEFINE_SPINLOCK(pages_lock); > + > struct virtio_balloon > { > struct virtio_device *vdev; > @@ -51,6 +58,7 @@ struct virtio_balloon > > /* Number of balloon pages we've told the Host we're not using. */ > unsigned int num_pages; > + > /* >* The pages we've told the Host we're not using. >* Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE > @@ -125,10 +133,12 @@ static void fill_balloon(struct virtio_balloon *vb, > size_t num) > /* We can only do one array worth at a time. */ > num = min(num, ARRAY_SIZE(vb->pfns)); > > + mutex_lock(_lock); > for (vb->num_pfns = 0; vb->num_pfns < num; >vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) { > - struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY | > - __GFP_NOMEMALLOC | __GFP_NOWARN); > + struct page *page = alloc_page(GFP_HIGHUSER_MOVABLE | > + __GFP_NORETRY | __GFP_NOWARN | > + __GFP_NOMEMALLOC); > if (!page) { > if (printk_ratelimit()) > dev_printk(KERN_INFO, >vdev->dev, > @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, > size_t num) > set_page_pfns(vb->pfns + vb->num_pfns, page); > vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE; > totalram_pages--; > + spin_lock(_lock); > list_add(>lru, >pages); If list_add above is reordered with mapping assignment below, then nothing bad happens because balloon_mapping takes pages_lock. > + page->mapping = balloon_mapping; > + spin_unlock(_lock); > } > > /* Didn't get any? Oh well. */ > @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, > size_t num) > return; > > tell_host(vb, vb->inflate_vq); > + mutex_unlock(_lock); > } > > static void release_pages_by_pfn(const u32 pfns[], unsigned int num) > @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, > size_t num) > /* We can only do one array worth at a time. */ > num = min(num, ARRAY_SIZE(vb->pfns)); > > + mutex_lock(_lock); > for (vb->num_pfns = 0; vb->num_pfns < num; >vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) { > + /* > + * We can race against virtballoon_isolatepage() and end up > + * stumbling across a _temporarily_ empty 'pages' list. > + */ > + spin_lock(_lock); > + if (unlikely(list_empty(>pages))) { > + spin_unlock(_lock); > + break; > + } > page = list_first_entry(>pages, struct page, lru); > + page->mapping = NULL; Unlike the case above, here if = NULL write above is reordered with list_del below, then isolate_page can run on a page that is not on lru. So I think
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote: Memory fragmentation introduced by ballooning might reduce significantly the number of 2MB contiguous memory blocks that can be used within a guest, thus imposing performance penalties associated with the reduced number of transparent huge pages that could be used by the guest workload. Besides making balloon pages movable at allocation time and introducing the necessary primitives to perform balloon page migration/compaction, this patch also introduces the following locking scheme to provide the proper synchronization and protection for struct virtio_balloon elements against concurrent accesses due to parallel operations introduced by memory compaction / page migration. - balloon_lock (mutex) : synchronizes the access demand to elements of struct virtio_balloon and its queue operations; - pages_lock (spinlock): special protection to balloon pages list against concurrent list handling operations; Signed-off-by: Rafael Aquini aqu...@redhat.com --- drivers/virtio/virtio_balloon.c | 138 +--- include/linux/virtio_balloon.h | 4 ++ 2 files changed, 134 insertions(+), 8 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 0908e60..7c937a0 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -27,6 +27,7 @@ #include linux/delay.h #include linux/slab.h #include linux/module.h +#include linux/fs.h /* * Balloon device works in 4K page units. So each page is pointed to by @@ -35,6 +36,12 @@ */ #define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE VIRTIO_BALLOON_PFN_SHIFT) +/* Synchronizes accesses/updates to the struct virtio_balloon elements */ +DEFINE_MUTEX(balloon_lock); + +/* Protects 'virtio_balloon-pages' list against concurrent handling */ +DEFINE_SPINLOCK(pages_lock); + struct virtio_balloon { struct virtio_device *vdev; @@ -51,6 +58,7 @@ struct virtio_balloon /* Number of balloon pages we've told the Host we're not using. */ unsigned int num_pages; + /* * The pages we've told the Host we're not using. * Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE @@ -125,10 +133,12 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num) /* We can only do one array worth at a time. */ num = min(num, ARRAY_SIZE(vb-pfns)); + mutex_lock(balloon_lock); for (vb-num_pfns = 0; vb-num_pfns num; vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) { - struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY | - __GFP_NOMEMALLOC | __GFP_NOWARN); + struct page *page = alloc_page(GFP_HIGHUSER_MOVABLE | + __GFP_NORETRY | __GFP_NOWARN | + __GFP_NOMEMALLOC); if (!page) { if (printk_ratelimit()) dev_printk(KERN_INFO, vb-vdev-dev, @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num) set_page_pfns(vb-pfns + vb-num_pfns, page); vb-num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE; totalram_pages--; + spin_lock(pages_lock); list_add(page-lru, vb-pages); If list_add above is reordered with mapping assignment below, then nothing bad happens because balloon_mapping takes pages_lock. + page-mapping = balloon_mapping; + spin_unlock(pages_lock); } /* Didn't get any? Oh well. */ @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num) return; tell_host(vb, vb-inflate_vq); + mutex_unlock(balloon_lock); } static void release_pages_by_pfn(const u32 pfns[], unsigned int num) @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num) /* We can only do one array worth at a time. */ num = min(num, ARRAY_SIZE(vb-pfns)); + mutex_lock(balloon_lock); for (vb-num_pfns = 0; vb-num_pfns num; vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) { + /* + * We can race against virtballoon_isolatepage() and end up + * stumbling across a _temporarily_ empty 'pages' list. + */ + spin_lock(pages_lock); + if (unlikely(list_empty(vb-pages))) { + spin_unlock(pages_lock); + break; + } page = list_first_entry(vb-pages, struct page, lru); + page-mapping = NULL; Unlike the case above, here if = NULL write above is reordered with list_del below, then isolate_page can run on a page that is not on lru. So I think this needs a
Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
On Mon, 13 Aug 2012 11:41:23 +0300, Michael S. Tsirkin m...@redhat.com wrote: On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote: +/* + * Populate balloon_mapping-a_ops-freepage method to help compaction on + * re-inserting an isolated page into the balloon page list. + */ +void virtballoon_putbackpage(struct page *page) +{ + spin_lock(pages_lock); + list_add(page-lru, vb_ptr-pages); + spin_unlock(pages_lock); Could the following race trigger: migration happens while module unloading is in progress, module goes away between here and when the function returns, then code for this function gets overwritten? If yes we need locking external to module to prevent this. Maybe add a spinlock to struct address_space? The balloon module cannot be unloaded until it has leaked all its pages, so I think this is safe: static void remove_common(struct virtio_balloon *vb) { /* There might be pages left in the balloon: free them. */ while (vb-num_pages) leak_balloon(vb, vb-num_pages); Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
Memory fragmentation introduced by ballooning might reduce significantly the number of 2MB contiguous memory blocks that can be used within a guest, thus imposing performance penalties associated with the reduced number of transparent huge pages that could be used by the guest workload. Besides making balloon pages movable at allocation time and introducing the necessary primitives to perform balloon page migration/compaction, this patch also introduces the following locking scheme to provide the proper synchronization and protection for struct virtio_balloon elements against concurrent accesses due to parallel operations introduced by memory compaction / page migration. - balloon_lock (mutex) : synchronizes the access demand to elements of struct virtio_balloon and its queue operations; - pages_lock (spinlock): special protection to balloon pages list against concurrent list handling operations; Signed-off-by: Rafael Aquini --- drivers/virtio/virtio_balloon.c | 138 +--- include/linux/virtio_balloon.h | 4 ++ 2 files changed, 134 insertions(+), 8 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 0908e60..7c937a0 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -27,6 +27,7 @@ #include #include #include +#include /* * Balloon device works in 4K page units. So each page is pointed to by @@ -35,6 +36,12 @@ */ #define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT) +/* Synchronizes accesses/updates to the struct virtio_balloon elements */ +DEFINE_MUTEX(balloon_lock); + +/* Protects 'virtio_balloon->pages' list against concurrent handling */ +DEFINE_SPINLOCK(pages_lock); + struct virtio_balloon { struct virtio_device *vdev; @@ -51,6 +58,7 @@ struct virtio_balloon /* Number of balloon pages we've told the Host we're not using. */ unsigned int num_pages; + /* * The pages we've told the Host we're not using. * Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE @@ -125,10 +133,12 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num) /* We can only do one array worth at a time. */ num = min(num, ARRAY_SIZE(vb->pfns)); + mutex_lock(_lock); for (vb->num_pfns = 0; vb->num_pfns < num; vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) { - struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY | - __GFP_NOMEMALLOC | __GFP_NOWARN); + struct page *page = alloc_page(GFP_HIGHUSER_MOVABLE | + __GFP_NORETRY | __GFP_NOWARN | + __GFP_NOMEMALLOC); if (!page) { if (printk_ratelimit()) dev_printk(KERN_INFO, >vdev->dev, @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num) set_page_pfns(vb->pfns + vb->num_pfns, page); vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE; totalram_pages--; + spin_lock(_lock); list_add(>lru, >pages); + page->mapping = balloon_mapping; + spin_unlock(_lock); } /* Didn't get any? Oh well. */ @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num) return; tell_host(vb, vb->inflate_vq); + mutex_unlock(_lock); } static void release_pages_by_pfn(const u32 pfns[], unsigned int num) @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num) /* We can only do one array worth at a time. */ num = min(num, ARRAY_SIZE(vb->pfns)); + mutex_lock(_lock); for (vb->num_pfns = 0; vb->num_pfns < num; vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) { + /* +* We can race against virtballoon_isolatepage() and end up +* stumbling across a _temporarily_ empty 'pages' list. +*/ + spin_lock(_lock); + if (unlikely(list_empty(>pages))) { + spin_unlock(_lock); + break; + } page = list_first_entry(>pages, struct page, lru); + page->mapping = NULL; list_del(>lru); + spin_unlock(_lock); set_page_pfns(vb->pfns + vb->num_pfns, page); vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE; } @@ -182,8 +208,11 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num) * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST); * is true, we *have* to do it in this order */ - tell_host(vb, vb->deflate_vq); -
[PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages
Memory fragmentation introduced by ballooning might reduce significantly the number of 2MB contiguous memory blocks that can be used within a guest, thus imposing performance penalties associated with the reduced number of transparent huge pages that could be used by the guest workload. Besides making balloon pages movable at allocation time and introducing the necessary primitives to perform balloon page migration/compaction, this patch also introduces the following locking scheme to provide the proper synchronization and protection for struct virtio_balloon elements against concurrent accesses due to parallel operations introduced by memory compaction / page migration. - balloon_lock (mutex) : synchronizes the access demand to elements of struct virtio_balloon and its queue operations; - pages_lock (spinlock): special protection to balloon pages list against concurrent list handling operations; Signed-off-by: Rafael Aquini aqu...@redhat.com --- drivers/virtio/virtio_balloon.c | 138 +--- include/linux/virtio_balloon.h | 4 ++ 2 files changed, 134 insertions(+), 8 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 0908e60..7c937a0 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -27,6 +27,7 @@ #include linux/delay.h #include linux/slab.h #include linux/module.h +#include linux/fs.h /* * Balloon device works in 4K page units. So each page is pointed to by @@ -35,6 +36,12 @@ */ #define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE VIRTIO_BALLOON_PFN_SHIFT) +/* Synchronizes accesses/updates to the struct virtio_balloon elements */ +DEFINE_MUTEX(balloon_lock); + +/* Protects 'virtio_balloon-pages' list against concurrent handling */ +DEFINE_SPINLOCK(pages_lock); + struct virtio_balloon { struct virtio_device *vdev; @@ -51,6 +58,7 @@ struct virtio_balloon /* Number of balloon pages we've told the Host we're not using. */ unsigned int num_pages; + /* * The pages we've told the Host we're not using. * Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE @@ -125,10 +133,12 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num) /* We can only do one array worth at a time. */ num = min(num, ARRAY_SIZE(vb-pfns)); + mutex_lock(balloon_lock); for (vb-num_pfns = 0; vb-num_pfns num; vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) { - struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY | - __GFP_NOMEMALLOC | __GFP_NOWARN); + struct page *page = alloc_page(GFP_HIGHUSER_MOVABLE | + __GFP_NORETRY | __GFP_NOWARN | + __GFP_NOMEMALLOC); if (!page) { if (printk_ratelimit()) dev_printk(KERN_INFO, vb-vdev-dev, @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num) set_page_pfns(vb-pfns + vb-num_pfns, page); vb-num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE; totalram_pages--; + spin_lock(pages_lock); list_add(page-lru, vb-pages); + page-mapping = balloon_mapping; + spin_unlock(pages_lock); } /* Didn't get any? Oh well. */ @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num) return; tell_host(vb, vb-inflate_vq); + mutex_unlock(balloon_lock); } static void release_pages_by_pfn(const u32 pfns[], unsigned int num) @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num) /* We can only do one array worth at a time. */ num = min(num, ARRAY_SIZE(vb-pfns)); + mutex_lock(balloon_lock); for (vb-num_pfns = 0; vb-num_pfns num; vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) { + /* +* We can race against virtballoon_isolatepage() and end up +* stumbling across a _temporarily_ empty 'pages' list. +*/ + spin_lock(pages_lock); + if (unlikely(list_empty(vb-pages))) { + spin_unlock(pages_lock); + break; + } page = list_first_entry(vb-pages, struct page, lru); + page-mapping = NULL; list_del(page-lru); + spin_unlock(pages_lock); set_page_pfns(vb-pfns + vb-num_pfns, page); vb-num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE; } @@ -182,8 +208,11 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num) * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);