Re: [Qemu-devel] [PATCH for-2.11 v2] hw/ppc: CAS reset on early device hotplug
On Fri, Aug 25, 2017 at 06:11:18PM -0300, Daniel Henrique Barboza wrote:
> v2:
> - rebased with ppc-for-2.11
> - function 'spapr_cas_completed' dropped
> - function 'spapr_drc_needed' made public and it's now used inside
> 'spapr_hotplugged_dev_before_cas'
> - 'spapr_drc_needed' was changed to support the migration of logical
> DRCs with devs attached in UNUSABLE state
> - new function: 'spapr_clear_pending_events'. This function is used
> inside ppc_spapr_reset to reset the pending_events QTAILQ

Thanks for the followup; unfortunately there is still an important bug
left, see comments on the patch itself.

At a higher level, though, looking at the event reset code made me
think of a possibly even simpler solution to this problem.

The queue of events (both hotplug and EPOW) is already in a simple
internal form that's independent of the two delivery mechanisms. The
only difference is which event source triggers the interrupt. This
explains why an extra hotplug event after the CAS "unstuck" the queue.

AFAICT, a spurious interrupt here should be harmless: the kernel will
just check the queue and find nothing there.

So, it should be sufficient to pulse the hotplug queue interrupt after
CAS, if the hotplug queue was negotiated.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you. NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
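To make the suggestion above concrete, here is a toy model. It is a sketch under stated assumptions, not QEMU code: struct machine, queue_hotplug_event(), guest_handle_irq() and cas_completed() are all hypothetical names. It models one shared event queue, two possible interrupt sources, and a single extra pulse on the hotplug source once CAS completes, which is enough to drain an event queued before CAS. A second, spurious pulse just finds an empty queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model; none of these names are QEMU's. */
struct machine {
    size_t queued;              /* events waiting in the shared RTAS event queue */
    bool hp_queue_negotiated;   /* guest picked the dedicated hotplug source at CAS */
    unsigned legacy_irq_pulses; /* pulses on the check_exception source */
    unsigned hp_irq_pulses;     /* pulses on the hotplug event source */
};

/* Queuing is identical for both mechanisms; only the IRQ source differs. */
static void pulse_event_irq(struct machine *m)
{
    if (m->hp_queue_negotiated) {
        m->hp_irq_pulses++;
    } else {
        m->legacy_irq_pulses++;
    }
}

static void queue_hotplug_event(struct machine *m)
{
    assert(m->queued < SIZE_MAX); /* counter overflow guard */
    m->queued++;
    pulse_event_irq(m);
}

/* Guest side: on any pulse, drain whatever is queued; empty is harmless. */
static size_t guest_handle_irq(struct machine *m)
{
    size_t drained = m->queued;
    m->queued = 0;
    return drained;
}

/* Suggested fix: once CAS completes, pulse the hotplug source (if it was
 * negotiated) so events queued before CAS get noticed by the guest. */
static void cas_completed(struct machine *m, bool guest_wants_hp_queue)
{
    m->hp_queue_negotiated = guest_wants_hp_queue;
    if (m->hp_queue_negotiated) {
        pulse_event_irq(m);
    }
}
```

In this model, an event queued before CAS pulses the legacy source, which nothing handles at that stage; the post-CAS pulse on the hotplug source is what lets the guest drain it.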
Re: [Qemu-devel] [PATCH for-2.11 v2] hw/ppc: CAS reset on early device hotplug
On Fri, Aug 25, 2017 at 06:11:19PM -0300, Daniel Henrique Barboza wrote:
> This patch is a follow up on the discussions made in patch
> "hw/ppc: disable hotplug before CAS is completed" that can be
> found at [1].
>
> At this moment, we do not support CPU/memory hotplug in early
> boot stages, before CAS. When a hotplug occurs, the event is logged
> in an internal RTAS event log queue and an IRQ pulse is fired. In
> regular conditions, the guest handles the interrupt by executing
> check_exception, fetching the generated hotplug event and enabling
> the device for use.
>
> In early boot, this IRQ isn't caught (SLOF does not handle hotplug
> events), leaving the event in the RTAS event log queue. If the guest
> executes check_exception due to another hotplug event, the re-assertion
> of the IRQ ends up de-queuing the first hotplug event as well. In short,
> a device hotplugged before CAS is considered coldplugged by SLOF.
> This leads to device misbehavior and, in some cases, a guest kernel
> Oops when trying to unplug the device.
>
> A proper fix would be to turn every device hotplugged before CAS
> into a coldplugged device. This is not trivial to do with the current
> code base, though: the FDT is written in the guest memory at
> ppc_spapr_reset and can't be retrieved without adding extra state
> (fdt_size for example) that would need to be managed and migrated. Adding
> the hotplugged DT in the middle of CAS negotiation via the updated DT
> tree works with CPU devs, but panics the guest kernel at boot. Additional
> analysis would be necessary for LMBs and PCI devices. There are open
> questions at the QEMU/SLOF/kernel level about how we can make
> this change in a sustainable way.
>
> Until we go all the way with the proper fix, this patch works around
> the situation by issuing a CAS reset if a hotplugged device is detected
> during CAS:
>
> - the DRC conditions that warrant a CAS reset are the same as those that
> trigger a DRC migration: the DRC must have a device attached and
> the DRC state must not be equal to its ready_state. With that in mind, this
> patch makes use of 'spapr_drc_needed' to determine if a CAS reset
> is needed.
>
> - In the middle of CAS negotiations, the function
> 'spapr_hotplugged_dev_before_cas' goes through all the DRCs to see
> if there is any DRC that requires a reset, using spapr_drc_needed. If
> so, 'spapr_h_cas_compose_response' returns '1', which sets
> spapr->cas_reboot to true, causing the machine to reboot.
>
> - a small fix was made in 'spapr_drc_needed' to change how we detect
> a DRC device. Using dr_entity_sense worked for physical DRCs but,
> for logical DRCs, it didn't cover the case where a logical DRC has
> a drc->dev but the state is LOGICAL_UNUSABLE (e.g. a CPU hotplugged before
> CAS). In this case, the dr_entity_sense of this DRC returns UNUSABLE and
> spapr_drc_needed was returning 'false' in a scenario where we would like
> to migrate the DRC (or issue a CAS reset). Checking for drc->dev
> instead works for all DRC types.
>
> - a new function called 'spapr_clear_pending_events' was created
> and is called inside ppc_spapr_reset. This function clears
> the pending_events QTAILQ that holds the RTAS event logs. This prevents
> old/deprecated events from persisting after a reset.
>
> No changes are made for coldplug devices.
>
> [1] http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg02855.html
>
> Signed-off-by: Daniel Henrique Barboza

Sorry I've taken a while to review.
> ---
>  hw/ppc/spapr.c             | 34 ++
>  hw/ppc/spapr_drc.c         |  5 ++---
>  include/hw/ppc/spapr_drc.h |  1 +
>  3 files changed, 37 insertions(+), 3 deletions(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index fb1e5e0..4b23ad3 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -790,6 +790,26 @@ out:
>      return ret;
>  }
>
> +static bool spapr_hotplugged_dev_before_cas(void)
> +{
> +    Object *drc_container, *obj;
> +    ObjectProperty *prop;
> +    ObjectPropertyIterator iter;
> +
> +    drc_container = container_get(object_get_root(), "/dr-connector");
> +    object_property_iter_init(&iter, drc_container);
> +    while ((prop = object_property_iter_next(&iter))) {
> +        if (!strstart(prop->type, "link<", NULL)) {
> +            continue;
> +        }
> +        obj = object_property_get_link(drc_container, prop->name, NULL);
> +        if (spapr_drc_needed(obj)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
>  int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
>                                   target_ulong addr, target_ulong size,
>                                   sPAPROptionVector *ov5_updates)
> @@ -797,6 +817,10 @@ int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
>      void *fdt, *fdt_skel;
>      sPAPRDeviceTreeUpdateHeader hdr = { .version_id = 1 };
>
> +    if
[Qemu-devel] [PATCH for-2.11 v2] hw/ppc: CAS reset on early device hotplug
This patch is a follow up on the discussions made in patch
"hw/ppc: disable hotplug before CAS is completed" that can be
found at [1].

At this moment, we do not support CPU/memory hotplug in early
boot stages, before CAS. When a hotplug occurs, the event is logged
in an internal RTAS event log queue and an IRQ pulse is fired. In
regular conditions, the guest handles the interrupt by executing
check_exception, fetching the generated hotplug event and enabling
the device for use.

In early boot, this IRQ isn't caught (SLOF does not handle hotplug
events), leaving the event in the RTAS event log queue. If the guest
executes check_exception due to another hotplug event, the re-assertion
of the IRQ ends up de-queuing the first hotplug event as well. In short,
a device hotplugged before CAS is considered coldplugged by SLOF.
This leads to device misbehavior and, in some cases, a guest kernel
Oops when trying to unplug the device.

A proper fix would be to turn every device hotplugged before CAS
into a coldplugged device. This is not trivial to do with the current
code base, though: the FDT is written in the guest memory at
ppc_spapr_reset and can't be retrieved without adding extra state
(fdt_size for example) that would need to be managed and migrated. Adding
the hotplugged DT in the middle of CAS negotiation via the updated DT
tree works with CPU devs, but panics the guest kernel at boot. Additional
analysis would be necessary for LMBs and PCI devices. There are open
questions at the QEMU/SLOF/kernel level about how we can make
this change in a sustainable way.

Until we go all the way with the proper fix, this patch works around
the situation by issuing a CAS reset if a hotplugged device is detected
during CAS:

- the DRC conditions that warrant a CAS reset are the same as those that
trigger a DRC migration: the DRC must have a device attached and
the DRC state must not be equal to its ready_state. With that in mind, this
patch makes use of 'spapr_drc_needed' to determine if a CAS reset
is needed.

- In the middle of CAS negotiations, the function
'spapr_hotplugged_dev_before_cas' goes through all the DRCs to see
if there is any DRC that requires a reset, using spapr_drc_needed. If
so, 'spapr_h_cas_compose_response' returns '1', which sets
spapr->cas_reboot to true, causing the machine to reboot.

- a small fix was made in 'spapr_drc_needed' to change how we detect
a DRC device. Using dr_entity_sense worked for physical DRCs but,
for logical DRCs, it didn't cover the case where a logical DRC has
a drc->dev but the state is LOGICAL_UNUSABLE (e.g. a CPU hotplugged before
CAS). In this case, the dr_entity_sense of this DRC returns UNUSABLE and
spapr_drc_needed was returning 'false' in a scenario where we would like
to migrate the DRC (or issue a CAS reset). Checking for drc->dev
instead works for all DRC types.

- a new function called 'spapr_clear_pending_events' was created
and is called inside ppc_spapr_reset. This function clears
the pending_events QTAILQ that holds the RTAS event logs. This prevents
old/deprecated events from persisting after a reset.

No changes are made for coldplug devices.
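The DRC rule described above can be illustrated with a small standalone model. This is a hedged sketch, not QEMU's sPAPRDRConnector API: struct drc, the three DRC_* states, drc_needed_old(), drc_needed(), hotplugged_dev_before_cas() and cas_compose_response() are hypothetical stand-ins. It only encodes what the text states: the old check was gated on entity sense (which reports a logical DRC in UNUSABLE as not present), while the new check looks at whether a device is attached, and any DRC that needs attention makes the CAS response return 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states, loosely modeled on logical DRC states. */
enum drc_state { DRC_UNUSABLE, DRC_AVAILABLE, DRC_CONFIGURED };

struct drc {
    bool dev_attached;          /* models drc->dev != NULL */
    enum drc_state state;
    enum drc_state ready_state; /* state a fully plugged device ends up in */
};

/* Old rule, per the description: gated on entity sense, which reports an
 * UNUSABLE logical DRC as "not present" even with a device attached. */
static bool drc_needed_old(const struct drc *d)
{
    bool sense_present = d->state != DRC_UNUSABLE;
    return sense_present && d->state != d->ready_state;
}

/* New rule: a device is attached and the DRC hasn't reached ready_state. */
static bool drc_needed(const struct drc *d)
{
    return d->dev_attached && d->state != d->ready_state;
}

/* Models the DRC scan done during CAS. */
static bool hotplugged_dev_before_cas(const struct drc *drcs, size_t n)
{
    assert(n == 0 || drcs != NULL);
    for (size_t i = 0; i < n; i++) {
        if (drc_needed(&drcs[i])) {
            return true;
        }
    }
    return false;
}

/* Models the early return: 1 means "request a CAS reboot". */
static int cas_compose_response(const struct drc *drcs, size_t n)
{
    return hotplugged_dev_before_cas(drcs, n) ? 1 : 0;
}
```

A CPU hotplugged before CAS would look like { true, DRC_UNUSABLE, DRC_CONFIGURED } here: drc_needed_old() misses it, while drc_needed() flags it and triggers the reboot path. A coldplugged device already in its ready_state, or an empty connector, does not.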
[1] http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg02855.html

Signed-off-by: Daniel Henrique Barboza
---
 hw/ppc/spapr.c             | 34 ++
 hw/ppc/spapr_drc.c         |  5 ++---
 include/hw/ppc/spapr_drc.h |  1 +
 3 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index fb1e5e0..4b23ad3 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -790,6 +790,26 @@ out:
     return ret;
 }
 
+static bool spapr_hotplugged_dev_before_cas(void)
+{
+    Object *drc_container, *obj;
+    ObjectProperty *prop;
+    ObjectPropertyIterator iter;
+
+    drc_container = container_get(object_get_root(), "/dr-connector");
+    object_property_iter_init(&iter, drc_container);
+    while ((prop = object_property_iter_next(&iter))) {
+        if (!strstart(prop->type, "link<", NULL)) {
+            continue;
+        }
+        obj = object_property_get_link(drc_container, prop->name, NULL);
+        if (spapr_drc_needed(obj)) {
+            return true;
+        }
+    }
+    return false;
+}
+
 int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
                                  target_ulong addr, target_ulong size,
                                  sPAPROptionVector *ov5_updates)
@@ -797,6 +817,10 @@ int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
     void *fdt, *fdt_skel;
     sPAPRDeviceTreeUpdateHeader hdr = { .version_id = 1 };
 
+    if (spapr_hotplugged_dev_before_cas()) {
+        return 1;
+    }
+
     size -= sizeof(hdr);
 
     /* Create sceleton */
@@ -1369,6 +1393,15 @@ static void find_unknown_sysbus_device(SysBusDevice *sbdev, void *opaque)
     }
 }
 
+static void spapr_clear_pending_events(sPAPRMachineState *spapr)
+{
+    sPAPREventLogEntry *entry = NULL;
+
+
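The posted diff is cut off inside spapr_clear_pending_events, so its body is not visible here. As a rough, hypothetical sketch of what "clear the pending_events queue on reset" involves, the following uses a plain hand-rolled linked list standing in for QEMU's QTAILQ; event_entry, event_queue, push_event() and clear_pending_events() are illustrative names only, not the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an RTAS event log entry. */
struct event_entry {
    int log_type;               /* hypothetical payload field */
    struct event_entry *next;
};

/* Minimal tail queue: head pointer plus pointer to the last 'next' slot. */
struct event_queue {
    struct event_entry *head;
    struct event_entry **tail;
};

static void event_queue_init(struct event_queue *q)
{
    q->head = NULL;
    q->tail = &q->head;
}

static void push_event(struct event_queue *q, int log_type)
{
    struct event_entry *e = malloc(sizeof(*e));
    assert(e != NULL);
    e->log_type = log_type;
    e->next = NULL;
    *q->tail = e;               /* append at the tail */
    q->tail = &e->next;
}

/* On reset: unlink and free every entry so no stale event survives. */
static void clear_pending_events(struct event_queue *q)
{
    struct event_entry *e = q->head;
    while (e != NULL) {
        struct event_entry *next = e->next;
        free(e);
        e = next;
    }
    event_queue_init(q);        /* leave an empty, reusable queue */
}

static size_t event_count(const struct event_queue *q)
{
    size_t n = 0;
    for (const struct event_entry *e = q->head; e != NULL; e = e->next) {
        n++;
    }
    return n;
}
```

The important property for the reset path is that the queue is left empty and reusable, so events queued before a reset (for example, a hotplug event that was never fetched) cannot be delivered to the guest after it boots again.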
[Qemu-devel] [PATCH for-2.11 v2] hw/ppc: CAS reset on early device hotplug
v2:
- rebased with ppc-for-2.11
- function 'spapr_cas_completed' dropped
- function 'spapr_drc_needed' made public and it's now used inside
  'spapr_hotplugged_dev_before_cas'
- 'spapr_drc_needed' was changed to support the migration of logical
  DRCs with devs attached in UNUSABLE state
- new function: 'spapr_clear_pending_events'. This function is used
  inside ppc_spapr_reset to reset the pending_events QTAILQ

Daniel Henrique Barboza (1):
  hw/ppc: CAS reset on early device hotplug

 hw/ppc/spapr.c             | 34 ++
 hw/ppc/spapr_drc.c         |  5 ++---
 include/hw/ppc/spapr_drc.h |  1 +
 3 files changed, 37 insertions(+), 3 deletions(-)

-- 
2.9.4