Re: [PATCH v3 6/7] spapr_drc.c: add hotunplug timeout for CPUs

David Gibson Tue, 16 Feb 2021 17:33:14 -0800

On Thu, Feb 11, 2021 at 07:52:45PM -0300, Daniel Henrique Barboza wrote:
> There is a reliable way to make a CPU hotunplug fail in the pseries
> machine. Hotplug a CPU A, then offline all other CPUs inside the guest
> but A. When trying to hotunplug A the guest kernel will refuse to do
> it, because A is now the last online CPU of the guest. PAPR has no
> 'error callback' in this situation to report back to the platform,
> so the guest kernel will silently deny the unplug and QEMU will never
> know what happened. The unplug pending state of A will remain until
> the guest is shut down or rebooted.
> 
> Previous attempts at fixing it (see [1] and [2]) were aimed at trying to
> mitigate the effects of the problem. In [1] we were trying to guess which
> guest CPUs were online to forbid hotunplug of the last online CPU in the QEMU
> layer, avoiding the scenario described above because QEMU is now failing
> on behalf of the guest. This is not robust because the last online CPU of
> the guest can change while we're in the middle of the unplug process, and
> our initial assumptions are now invalid. In [2] we were accepting that our
> unplug process is uncertain and the user should be allowed to spam the IRQ
> hotunplug queue of the guest in case the CPU hotunplug fails.
> 
> This patch presents another alternative, using the timeout infrastructure
> introduced in the previous patch. CPU hotunplugs in the pSeries machine will
> now time out after 15 seconds. This is a long time for a single CPU unplug
> to occur, regardless of guest load - although the user is *strongly*
> encouraged to *not* hotunplug devices from a guest under high load - and we
> can be sure that something went wrong if it takes longer than that for the
> guest to release the CPU (the same can't be said about memory hotunplug -
> more on that in the next patch).
> 
> Timing out the unplug operation will reset the unplug state of the CPU and
> allow the user to try it again, regardless of the error situation that
> prevented the hotunplug from occurring. Of all the not so pretty
> fixes/mitigations for CPU hotunplug errors in pSeries, timing out the
> operation is an admission that we have no control over the process, and must
> assume the worst case if the operation doesn't succeed in a sensible time
> frame.
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg03353.html
> [2] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04400.html
> 
> Reported-by: Xujun Ma <x...@redhat.com>
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1911414
> Signed-off-by: Daniel Henrique Barboza <danielhb...@gmail.com>


Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>

> ---
>  hw/ppc/spapr.c             |  4 ++++
>  hw/ppc/spapr_drc.c         | 17 +++++++++++++++++
>  include/hw/ppc/spapr_drc.h |  3 +++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index b066df68cb..ecce8abf14 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3724,6 +3724,10 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
>      if (!spapr_drc_unplug_requested(drc)) {
>          spapr_drc_unplug_request(drc);
>          spapr_hotplug_req_remove_by_index(drc);
> +    } else {
> +        error_setg(errp, "core-id %d unplug is still pending, %d seconds "
> +                   "timeout remaining",
> +                   cc->core_id, spapr_drc_unplug_timeout_remaining_sec(drc));

Reporting this information is a nice touch.

>      }
>  }
>  
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index c88bb524c5..c143bfb6d3 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -398,6 +398,12 @@ void spapr_drc_unplug_request(SpaprDrc *drc)
>  
>      drc->unplug_requested = true;
>  
> +    if (drck->unplug_timeout_seconds != 0) {
> +        timer_mod(drc->unplug_timeout_timer,
> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +                  drck->unplug_timeout_seconds * 1000);
> +    }
> +
>      if (drc->state != drck->empty_state) {
>          trace_spapr_drc_awaiting_quiesce(spapr_drc_index(drc));
>          return;
> @@ -406,6 +412,16 @@ void spapr_drc_unplug_request(SpaprDrc *drc)
>      spapr_drc_release(drc);
>  }
>  
> +int spapr_drc_unplug_timeout_remaining_sec(SpaprDrc *drc)
> +{
> +    if (drc->unplug_requested && timer_pending(drc->unplug_timeout_timer)) {
> +        return (qemu_timeout_ns_to_ms(drc->unplug_timeout_timer->expire_time) -
> +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL)) / 1000;

Hmm.  Reaching into the timer's internal fields isn't ideal.  I wonder
if we should add a helper in the timer code for reporting this information.
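
To make the suggestion concrete, here is a rough, standalone sketch of
what such a helper could look like.  It deliberately does not use the
real QEMU types: MockTimer, mock_timer_deadline_ms() and
mock_unplug_timeout_remaining_sec() are all hypothetical names, and the
current time is passed in explicitly instead of being read from a QEMU
clock.  The point is only that the DRC code would ask the timer layer
"how long until you fire?" rather than reading expire_time itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NS_PER_MS 1000000LL

/* Hypothetical stand-in for a timer; NOT the real QEMUTimer layout. */
typedef struct MockTimer {
    int64_t expire_time_ns;   /* absolute deadline, in nanoseconds */
    bool pending;             /* true while the timer is armed */
} MockTimer;

/* Round up so a sub-millisecond remainder is not reported as zero. */
static int64_t ns_to_ms_round_up(int64_t ns)
{
    return (ns + NS_PER_MS - 1) / NS_PER_MS;
}

/*
 * Hypothetical helper owned by the timer code: milliseconds left until
 * the timer fires, or 0 if it is not armed or has already expired.
 * 'now_ns' is supplied by the caller (an assumption of this sketch).
 */
int64_t mock_timer_deadline_ms(const MockTimer *t, int64_t now_ns)
{
    assert(t != NULL);

    if (!t->pending || t->expire_time_ns <= now_ns) {
        return 0;
    }
    return ns_to_ms_round_up(t->expire_time_ns - now_ns);
}

/* The DRC-side caller no longer touches the timer's fields directly. */
int mock_unplug_timeout_remaining_sec(const MockTimer *t, int64_t now_ns)
{
    return (int)(mock_timer_deadline_ms(t, now_ns) / 1000);
}
```

With a timer armed for 15 seconds, a query made 5 seconds in would
report 10 seconds remaining, and a query after expiry (or on an unarmed
timer) would report 0, matching the fallback return in the patch.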

> +    }
> +
> +    return 0;
> +}
> +
>  bool spapr_drc_reset(SpaprDrc *drc)
>  {
>      SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> @@ -706,6 +722,7 @@ static void spapr_drc_cpu_class_init(ObjectClass *k, void *data)
>      drck->drc_name_prefix = "CPU ";
>      drck->release = spapr_core_release;
>      drck->dt_populate = spapr_core_dt_populate;
> +    drck->unplug_timeout_seconds = 15;
>  }
>  
>  static void spapr_drc_pci_class_init(ObjectClass *k, void *data)
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index b2e6222d09..26599c385a 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -211,6 +211,8 @@ typedef struct SpaprDrcClass {
>  
>      int (*dt_populate)(SpaprDrc *drc, struct SpaprMachineState *spapr,
>                         void *fdt, int *fdt_start_offset, Error **errp);
> +
> +    int unplug_timeout_seconds;
>  } SpaprDrcClass;
>  
>  typedef struct SpaprDrcPhysical {
> @@ -246,6 +248,7 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask);
>   */
>  void spapr_drc_attach(SpaprDrc *drc, DeviceState *d);
>  void spapr_drc_unplug_request(SpaprDrc *drc);
> +int spapr_drc_unplug_timeout_remaining_sec(SpaprDrc *drc);
>  
>  /*
>   * Reset all DRCs, causing pending hot-plug/unplug requests to complete.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
