On Mon, Apr 12, 2021 at 04:27:43PM -0300, Daniel Henrique Barboza wrote: > > > On 3/31/21 11:37 PM, David Gibson wrote: > > On Wed, Mar 31, 2021 at 09:04:37PM -0300, Daniel Henrique Barboza wrote: > > > Commit 47c8c915b162 fixed a problem where multiple spapr_drc_detach() > > > requests were breaking QEMU. The solution was to just spapr_drc_detach() > > > once, and use spapr_drc_unplug_requested() to filter whether we already > > > detached it or not. The commit also tied the hotplug request to the > > > guest in the same condition. > > > > > > Turns out that there is a reliable way for a CPU hotunplug to fail. If a > > > guest with one CPU hotplugs a CPU1, then offline CPU0s via 'echo 0 > > > > /sys/devices/system/cpu/cpu0/online', then attempts to hotunplug CPU1, > > > the kernel will refuse it because it's the last online CPU of the > > > system. Given that we're pulsing the IRQ only in the first try, in a > > > failed attempt, all other CPU1 hotunplug attempts will fail, regardless > > > of the online state of CPU1 in the kernel, because we're simply not > > > letting the guest know that we want to hotunplug the device. > > > > > > Let's move spapr_hotplug_req_remove_by_index() back out of the "if > > > (!spapr_drc_unplug_requested(drc))" conditional, allowing for multiple > > > 'device_del' requests to the same CPU core to reach the guest, in case > > > the CPU core didn't fully hotunplugged previously. > > > > > > Signed-off-by: Daniel Henrique Barboza <danielhb...@gmail.com> > > > > I've applied these to ppc-for-6.0, but.. > > > > > --- > > > hw/ppc/spapr.c | 11 ++++++++++- > > > 1 file changed, 10 insertions(+), 1 deletion(-) > > > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > > index 05a765fab4..e4be00b732 100644 > > > --- a/hw/ppc/spapr.c > > > +++ b/hw/ppc/spapr.c > > > @@ -3777,8 +3777,17 @@ void spapr_core_unplug_request(HotplugHandler > > > *hotplug_dev, DeviceState *dev, > > > if (!spapr_drc_unplug_requested(drc)) { > > > spapr_drc_unplug_request(drc); > > > - spapr_hotplug_req_remove_by_index(drc); > > > } > > > + > > > + /* > > > + * spapr_hotplug_req_remove_by_index is left unguarded, out of the > > > + * "!spapr_drc_unplug_requested" check, to allow for multiple IRQ > > > + * pulses removing the same CPU. Otherwise, in an failed hotunplug > > > + * attempt (e.g. the kernel will refuse to remove the last online > > > + * CPU), we will never attempt it again because unplug_requested > > > + * will still be 'true' in that case. > > > + */ > > > + spapr_hotplug_req_remove_by_index(drc); > > > > I think we need similar changes for all the other unplug types (LMB, > > PCI, PHB) - basically retries should always be allowed, and at worst > > be a no-op, rather than generating an error like they do now. > > > For PHBs should be straightforward. Not so sure about PCI because there is > all the PCI function logic around the hotunplug of function 0. > > As for LMBs, we block further attempts because there is no way we can tell > if the hotunplug is being executed but it is taking some time (it is not > uncommon for a DIMM unplug to take 20-30 seconds to complete), versus > an error scenario.
I don't see why that prevents retries. Can't you reissue the index+size unplug request anyway? The code you already have to fail unplugs on a reconfigure should work for both the original request and the retry, shouldn't it? > What we do ATM is check is the pending DIMM unplug > state exists, and if it does, assume that a hotunplug is pending. I have > no idea what would happen if an unplug request for a LMB DRC reaches the > kernel in the middle of an error rollback (when the kernel reconnects all > the LMBs again) and the same DRC that was rolled back is disconnected > again. > > We would need to check not only if the pending dimm unplug state exists, but > also if partially exists. In other words, if there are DRCs of that DIMM that > were unplugged already. That way we can prevent to issue a removal while > the unplug is still running. > > > Thanks, > > > DHB > > > > > > } > > > int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr, > > > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature