Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS

Greg Kurz Wed, 24 May 2017 08:57:03 -0700

On Wed, 24 May 2017 12:14:02 +0200
Igor Mammedov <imamm...@redhat.com> wrote:


> On Wed, 24 May 2017 11:28:57 +0200
> Greg Kurz <gr...@kaod.org> wrote:
> 
> > On Wed, 24 May 2017 15:07:54 +1000
> > David Gibson <da...@gibson.dropbear.id.au> wrote:
> >   
> > > On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:    
> > > > If the OS is not started, QEMU sends an event to the OS
> > > > that is lost and cannot be recovered. An unplug is not
> > > > able to restore QEMU in a coherent state.
> > > > So, while the OS is not started, disable CPU and memory hotplug.
> > > > We use option vector 6 to know if the OS is started
> > > > 
> > > > Signed-off-by: Laurent Vivier <lviv...@redhat.com>      
> > > 
> > > Urgh.. I'm not terribly confident that this is really correct.  As
> > > discussed on the previous patch, you're essentially using OV6 as a
> > > flag that CAS is complete.
> > > 
> > > But while it undoubtedly makes the race window much smaller, I don't
> > > see that there's any guarantee the guest OS will really be able to
> > > handle hotplug events immediately after CAS.
> > > 
> > > In particular if the CAS process completes partially but then needs to
> > > trigger a reboot, I think that would end up setting the ov6 variable,
> > > but the OS would definitely not be in a state to accept events.  
> wouldn't the guest pick up the updated fdt on reboot and online the cpu
> hotplugged before the crash along with the initial cpus?
> 

Yes and that's what actually happens with cpus.

But catching up with the background for this series, I have the
impression that the issue isn't the fact we lose an event if the OS
isn't started (which is not true), but more something wrong happening
when hotplugging+unplugging memory as described in this commit:

commit fe6824d12642b005c69123ecf8631f9b13553f8b
Author: Laurent Vivier <lviv...@redhat.com>
Date:   Tue Mar 28 14:09:34 2017 +0200

    spapr: fix memory hot-unplugging

> > We never have any guarantee that the OS will process an event that
> > we've sent actually (think of a kernel crash just after a successful
> > CAS negotiation for example, or any failure with the various guest
> > components involved in the process of hotplug).
> >   
> > > Mike, I really think we need some input from someone familiar with how
> > > these hotplug events are supposed to work.  What do we need to do to
> > > handle lost or stale events, such as those delivered when an OS is not
> > > booted.
> > >     
> > 
> > AFAIK, in the PowerVM world, the HMC exposes a user configurable timeout.
> > 
> > https://www.ibm.com/support/knowledgecenter/POWER8/p8hat/p8hat_dlparprocpoweraddp6.htm
> > 
> > I'm not sure we can do anything better than being able to "cancel" a
> > previous hotplug attempt if it takes too long, but I'm not necessarily
> > the expert you're looking for :)
> From x86/ACPI world:
>  - if hotplug happens early at boot before guest OS is running
>    hotplug notification (SCI interrupt) stays pending and once guest
>    is up it will/should handle it and online CPU
>  - if guest crashed and is rebooted it will pick up updated ACPI tables
>    (fdt equivalent) with all present cpus (including the one hotplugged
>    before the crash) and online the hotplugged cpu along with coldplugged ones
>  - if guest loses SCI somehow, it's considered a guest issue and such cpu
>    stays unpluggable until guest picks it up somehow (reboot, manually
>    running cpus scan method from ACPI or another cpu hotplug event) and
>    explicitly ejects it.
> 
> Taking into account that CPUs don't support surprise removal and require
> guest cooperation, it's fine to leave the CPU plugged in until the guest
> ejects it.
> That's what I'd expect to happen on baremetal, 
> you hotplug CPU, hardware notifies OS about it and that's all,
> cpu won't suddenly pop out if OS isn't able to online it.
> 
> Moreover, that hotplugged cpu might be executing some code or one of the
> already present cpus might be executing initialization routines to online
> it (think of host overcommit and arbitrary delays) so it is not really safe
> to remove a hotplugged but not onlined cpu without OS consent
> (i.e. explicit eject by OS/firmware). I think the lost event handling should
> be fixed on guest side and not in QEMU.
> 
>
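
For illustration only, the x86/ACPI behaviour described in the quoted text can
be sketched as a toy Python model. This is not QEMU or guest code: the class
name, method names and data structures are all invented for the example, and
it only assumes the three points made above (a notification stays pending
until a guest is running, a rebooted guest onlines every present cpu, and a
cpu is only removed once the guest ejects it).

```python
class ToyMachine:
    """Toy model of the hotplug semantics discussed above (hypothetical names)."""

    def __init__(self, coldplugged):
        self.present = set(coldplugged)  # cpus described by the tables / fdt
        self.online = set()              # cpus the guest has brought online
        self.pending = []                # notifications the guest has not handled
        self.guest_running = False

    def hotplug(self, cpu):
        # The cpu becomes present immediately; the notification may wait.
        self.present.add(cpu)
        self.pending.append(("add", cpu))
        self._deliver()

    def request_unplug(self, cpu):
        # No surprise removal: this only asks the guest to eject the cpu.
        self.pending.append(("remove", cpu))
        self._deliver()

    def boot(self):
        # A (re)booting guest reads the current tables and onlines all cpus,
        # then handles whatever notifications were left pending.
        self.guest_running = True
        self.online = set(self.present)
        self._deliver()

    def crash(self):
        self.guest_running = False
        self.online.clear()

    def _deliver(self):
        if not self.guest_running:
            return  # notification stays pending until a guest is up
        while self.pending:
            kind, cpu = self.pending.pop(0)
            if kind == "add":
                self.online.add(cpu)
            elif cpu in self.present:
                # The guest offlines the cpu and explicitly ejects it.
                self.online.discard(cpu)
                self.present.discard(cpu)


machine = ToyMachine({0})
machine.hotplug(1)          # guest not started yet: event stays pending
machine.request_unplug(1)   # cpu 1 is still present, nothing ejected it
machine.boot()              # guest handles the pending add, then the eject
```

In this model an unplug request issued before the guest is up never removes
the cpu by itself; the cpu only goes away once a running guest processes the
request, which matches the "leave it plugged in until the guest ejects it"
position above.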
