Re: Memory corruption in powerpc guests with virtio_balloon (was Re: [PATCH v3] virtio_balloon: fix deadlock on OOM)

2017-12-03 Thread Michael Ellerman
"Michael S. Tsirkin"  writes:
> On Fri, Dec 01, 2017 at 11:31:08PM +1100, Michael Ellerman wrote:
>> "Michael S. Tsirkin"  writes:
>> 
>> > fill_balloon doing memory allocations under balloon_lock
>> > can cause a deadlock when leak_balloon is called from
>> > virtballoon_oom_notify and tries to take same lock.
...
>> 
>> 
>> Somehow this commit seems to be killing powerpc guests.
...
>
> Thanks for the report!
> A fix was just posted:
> virtio_balloon: fix increment of vb->num_pfns in fill_balloon()
>
> Would appreciate testing.

Yep that fixes it. Thanks.

Tested-by: Michael Ellerman 

cheers
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: Memory corruption in powerpc guests with virtio_balloon (was Re: [PATCH v3] virtio_balloon: fix deadlock on OOM)

2017-12-01 Thread Michael S. Tsirkin
On Fri, Dec 01, 2017 at 11:31:08PM +1100, Michael Ellerman wrote:
> "Michael S. Tsirkin"  writes:
> 
> > fill_balloon doing memory allocations under balloon_lock
> > can cause a deadlock when leak_balloon is called from
> > virtballoon_oom_notify and tries to take same lock.
> >
> > To fix, split page allocation and enqueue and do allocations outside the 
> > lock.
> >
> > Here's a detailed analysis of the deadlock by Tetsuo Handa:
> >
> > In leak_balloon(), mutex_lock(>balloon_lock) is called in order to
> > serialize against fill_balloon(). But in fill_balloon(),
> > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> > implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
> > is specified, this allocation attempt might indirectly depend on somebody
> > else's __GFP_DIRECT_RECLAIM memory allocation. And such indirect
> > __GFP_DIRECT_RECLAIM memory allocation might call leak_balloon() via
> > virtballoon_oom_notify() via blocking_notifier_call_chain() callback via
> > out_of_memory() when it reached __alloc_pages_may_oom() and held oom_lock
> > mutex. Since vb->balloon_lock mutex is already held by fill_balloon(), it
> > will cause OOM lockup.
> >
> >   Thread1   Thread2
> > fill_balloon()
> >   takes a balloon_lock
> >   balloon_page_enqueue()
> > alloc_page(GFP_HIGHUSER_MOVABLE)
> >   direct reclaim (__GFP_FS context)   takes a fs lock
> > waits for that fs lock  alloc_page(GFP_NOFS)
> >   
> > __alloc_pages_may_oom()
> > takes the oom_lock
> > out_of_memory()
> >   
> > blocking_notifier_call_chain()
> > leak_balloon()
> >   tries to take 
> > that balloon_lock and deadlocks
> >
> > Reported-by: Tetsuo Handa 
> > Cc: Michal Hocko 
> > Cc: Wei Wang 
> > Signed-off-by: Michael S. Tsirkin 
> >
> >  drivers/virtio/virtio_balloon.c| 24 +++-
> >  include/linux/balloon_compaction.h | 35 ++-
> >  mm/balloon_compaction.c| 28 +---
> >  3 files changed, 74 insertions(+), 13 deletions(-)
> 
> 
> Somehow this commit seems to be killing powerpc guests.
> 
> The symptom is that the first page (64K) of the guests memory gets over
> written with zeroes, which is where our interrupt handlers are, so the
> system rapidly locks up due to illegal instructions in the illegal
> instruction handler.
> 
> There seems to be some element of a race, because it doesn't always
> crash. Sometimes I can boot to a shell, but not often. When it does
> happen it's fairly late in boot, but before I get to a shell.
> 
> I had a few bisects go off into the weeds due to the intermittent
> nature. But once I realised that I changed my script to boot 5 times
> before declaring a kernel good, and that bisected straight here.
> 
> I can also revert this commit on v4.15-rc1 and everything's fine again -
> I got through ~250 boots with that kernel.
> 
> So I'm pretty sure this commit is triggering/exposing/causing the bug.
> 
> The other data point is that the page that's overwritten is mapped read
> only in the guest kernel. So either the guest kernel is writing to it in
> real mode (MMU off), or the hypervisor/DMA is doing it.
> 
> I haven't isolated if it's host kernel/qemu version dependent, at the
> moment I'm just using distro packaged versions of both.
> 
> Anyway I'll try and dig further into it on Monday, but I thought I'd let
> you know in case this is a known bug with a fix in the pipeline, or
> rings any bells or whatever.
> 
> cheers

Thanks for the report!
A fix was just posted:
virtio_balloon: fix increment of vb->num_pfns in fill_balloon()

Would appreciate testing.

> 
> > diff --git a/drivers/virtio/virtio_balloon.c 
> > b/drivers/virtio/virtio_balloon.c
> > index f0b3a0b..7960746 100644
> > --- a/drivers/virtio/virtio_balloon.c
> > +++ b/drivers/virtio/virtio_balloon.c
> > @@ -143,16 +143,17 @@ static void set_page_pfns(struct virtio_balloon *vb,
> >  
> >  static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
> >  {
> > -   struct balloon_dev_info *vb_dev_info = >vb_dev_info;
> > unsigned num_allocated_pages;
> > +   unsigned num_pfns;
> > +   struct page *page;
> > +   LIST_HEAD(pages);
> >  
> > /* We can only do one array worth at a time. */
> > num = min(num, ARRAY_SIZE(vb->pfns));
> >  
> > -   mutex_lock(>balloon_lock);
> > -   for (vb->num_pfns = 0; 

Memory corruption in powerpc guests with virtio_balloon (was Re: [PATCH v3] virtio_balloon: fix deadlock on OOM)

2017-12-01 Thread Michael Ellerman
"Michael S. Tsirkin"  writes:

> fill_balloon doing memory allocations under balloon_lock
> can cause a deadlock when leak_balloon is called from
> virtballoon_oom_notify and tries to take same lock.
>
> To fix, split page allocation and enqueue and do allocations outside the lock.
>
> Here's a detailed analysis of the deadlock by Tetsuo Handa:
>
> In leak_balloon(), mutex_lock(>balloon_lock) is called in order to
> serialize against fill_balloon(). But in fill_balloon(),
> alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
> is specified, this allocation attempt might indirectly depend on somebody
> else's __GFP_DIRECT_RECLAIM memory allocation. And such indirect
> __GFP_DIRECT_RECLAIM memory allocation might call leak_balloon() via
> virtballoon_oom_notify() via blocking_notifier_call_chain() callback via
> out_of_memory() when it reached __alloc_pages_may_oom() and held oom_lock
> mutex. Since vb->balloon_lock mutex is already held by fill_balloon(), it
> will cause OOM lockup.
>
>   Thread1   Thread2
> fill_balloon()
>   takes a balloon_lock
>   balloon_page_enqueue()
> alloc_page(GFP_HIGHUSER_MOVABLE)
>   direct reclaim (__GFP_FS context)   takes a fs lock
> waits for that fs lock  alloc_page(GFP_NOFS)
>   __alloc_pages_may_oom()
> takes the oom_lock
> out_of_memory()
>   
> blocking_notifier_call_chain()
> leak_balloon()
>   tries to take 
> that balloon_lock and deadlocks
>
> Reported-by: Tetsuo Handa 
> Cc: Michal Hocko 
> Cc: Wei Wang 
> Signed-off-by: Michael S. Tsirkin 
>
>  drivers/virtio/virtio_balloon.c| 24 +++-
>  include/linux/balloon_compaction.h | 35 ++-
>  mm/balloon_compaction.c| 28 +---
>  3 files changed, 74 insertions(+), 13 deletions(-)


Somehow this commit seems to be killing powerpc guests.

The symptom is that the first page (64K) of the guests memory gets over
written with zeroes, which is where our interrupt handlers are, so the
system rapidly locks up due to illegal instructions in the illegal
instruction handler.

There seems to be some element of a race, because it doesn't always
crash. Sometimes I can boot to a shell, but not often. When it does
happen it's fairly late in boot, but before I get to a shell.

I had a few bisects go off into the weeds due to the intermittent
nature. But once I realised that I changed my script to boot 5 times
before declaring a kernel good, and that bisected straight here.

I can also revert this commit on v4.15-rc1 and everything's fine again -
I got through ~250 boots with that kernel.

So I'm pretty sure this commit is triggering/exposing/causing the bug.

The other data point is that the page that's overwritten is mapped read
only in the guest kernel. So either the guest kernel is writing to it in
real mode (MMU off), or the hypervisor/DMA is doing it.

I haven't isolated if it's host kernel/qemu version dependent, at the
moment I'm just using distro packaged versions of both.

Anyway I'll try and dig further into it on Monday, but I thought I'd let
you know in case this is a known bug with a fix in the pipeline, or
rings any bells or whatever.

cheers


> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index f0b3a0b..7960746 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -143,16 +143,17 @@ static void set_page_pfns(struct virtio_balloon *vb,
>  
>  static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>  {
> - struct balloon_dev_info *vb_dev_info = >vb_dev_info;
>   unsigned num_allocated_pages;
> + unsigned num_pfns;
> + struct page *page;
> + LIST_HEAD(pages);
>  
>   /* We can only do one array worth at a time. */
>   num = min(num, ARRAY_SIZE(vb->pfns));
>  
> - mutex_lock(>balloon_lock);
> - for (vb->num_pfns = 0; vb->num_pfns < num;
> -  vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> - struct page *page = balloon_page_enqueue(vb_dev_info);
> + for (num_pfns = 0; num_pfns < num;
> +  num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> + struct page *page = balloon_page_alloc();
>  
>   if (!page) {
>   dev_info_ratelimited(>vdev->dev,
> @@