Re: [PATCH] spapr_pci: Robustify support of PCI bridges
On Thu, Jul 16, 2020 at 04:57:54PM +0200, Greg Kurz wrote: > On Thu, 16 Jul 2020 16:23:52 +0200 > Markus Armbruster wrote: > > > David Gibson writes: > > > > > On Thu, Jul 16, 2020 at 12:32:44PM +0200, Greg Kurz wrote: > > >> On Thu, 16 Jul 2020 14:45:40 +1000 > > >> David Gibson wrote: > > >> > > >> > On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote: > > >> > > Some recent error handling cleanups unveiled issues with our support > > >> > > of > > >> > > PCI bridges: > > >> > > > > >> > > 1) QEMU aborts when using non-standard PCI bridge types, > > >> > >unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error > > >> > > handling" > > >> > > > > >> > > $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge > > >> > > Unexpected error in object_property_find() at qom/object.c:1240: > > >> > > qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' > > >> > > not found > > >> > > Aborted (core dumped) > > >> > > > >> > Oops, I thought we had a check that we actually had a "pci-bridge" > > >> > device before continuing with the hotplug, but I guess not. > > >> > > >> Ah... are you suggesting we should explicitly check the actual type > > >> of the bridge rather than looking for the "chassis_nr" property ? > > > > > > Uh.. I thought about it, but I don't think it matters much which way > > > we do it. > > > > Would it make sense to add the "chassis_nr" property to *all* PCI > > bridge devices? > > > > I see that the "PCI Express to PCI/PCI-X Bridge Specification" mentions > a "Chassis Number Register" which looks very similar to the what exists > in standard PCI-to-PCI brdiges. This doesn't seem to be implemented in > our "pcie-pci-bridge" device model though, but of course I have no idea > why :) We could consider it, but I don't think there's a lot to be gained by it at this stage. I don't think there's really any reason to want to use bridges other than plain "pci-bridge" on the pseries machine. PCI is a bit weird on pseries, since it's explicitly paravirt. Although you can use extended config space, and thereby PCI-E devices on it, the topology really looks pretty much identical to vanilla PCI. So, I don't think there's any reason to use PCI-E bridges on pseries. Other than PCI-E bridges of various sorts, a quick scan suggests all the other bridge types in qemu are weird variants that are mostly specific to some particular platform. I don't see any reason we'd want those on pseries either. > Maybe Michael or Marcel (cc'd) can share some thoughts about that ? -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [PATCH] spapr_pci: Robustify support of PCI bridges
On Thu, Jul 16, 2020 at 04:42:00PM +0200, Greg Kurz wrote: > On Thu, 16 Jul 2020 16:01:18 +0200 > Markus Armbruster wrote: > > > David Gibson writes: > > > > > On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote: > > >> Some recent error handling cleanups unveiled issues with our support of > > >> PCI bridges: > > >> > > >> 1) QEMU aborts when using non-standard PCI bridge types, > > >>unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error > > >> handling" > > >> > > >> $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge > > >> Unexpected error in object_property_find() at qom/object.c:1240: > > >> qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not > > >> found > > >> Aborted (core dumped) > > > > > > Oops, I thought we had a check that we actually had a "pci-bridge" > > > device before continuing with the hotplug, but I guess not. > > > > > >> This happens because we assume all PCI bridge types to have a > > >> "chassis_nr" > > >> property. This property only exists with the standard PCI bridge type > > >> "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems > > >> much simpler to check the presence of "chassis_nr" earlier. > > > > > > Hrm, right, 7ef1553dac was not really correct since add_drcs() really > > > can fail. > > > > Right. I failed to see that we can run into a bridge without a > > "chassis_nr" here. And I missed it on review, as well. > > >> 2) QEMU abort if same "chassis_nr" value is used several times, > > >>unveiled by commit d2623129a7de "qom: Drop parameter @errp of > > >>object_property_add() & friends" > > >> > > >> $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \ > > >> -device pci-bridge,chassis_nr=1 > > >> Unexpected error in object_property_try_add() at qom/object.c:1167: > > >> qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add > > >> duplicate property '4100' to object (type 'container') > > >> Aborted (core dumped) > > > > Before d2623129a7de, the error got *ignored* in > > spapr_dr_connector_new(): > > > > SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type, > > uint32_t id) > > { > > SpaprDrc *drc = SPAPR_DR_CONNECTOR(object_new(type)); > > char *prop_name; > > > > drc->id = id; > > drc->owner = owner; > > prop_name = g_strdup_printf("dr-connector[%"PRIu32"]", > > spapr_drc_index(drc)); > > object_property_add_child(owner, prop_name, OBJECT(drc), > > &error_abort); > > object_unref(OBJECT(drc)); > > --->object_property_set_bool(OBJECT(drc), true, "realized", NULL); > > g_free(prop_name); > > > > return drc; > > } > > > > I doubt that's healthy. Indeed. > This isn't. The object_property_set_bool() was later converted to > qdev_realize() (thanks again for the cleanups!) but the problem > remains. Realize can fail and I see now reason we don't do proper > error handling when it comes to the DRCs. > > I'll look into fixing that. > > > >> This happens because we assume that "chassis_nr" values are unique, but > > >> nobody enforces that and we end up generating duplicate DRC ids. The PCI > > >> code doesn't really care for duplicate "chassis_nr" properties since it > > >> is only used to initialize the "Chassis Number Register" of the bridge, > > >> with no functional impact on QEMU. So, even if passing the same value > > >> several times might look weird, it never broke anything before, so > > >> I guess we don't necessarily want to enforce strict checking in the PCI > > >> code now. > > > > > > Yeah, I guess. I'm pretty sure that the chassis number of bridges is > > > supposed to be system-unique (well, unique within the PCI domain at > > > least, I guess) as part of the hardware spec. So specifying multiple > > > chassis ids the same is a user error, but we need a better failure > > > mode. > > > > > >> Workaround both issues in the PAPR code: check that the bridge has a > > >> unique and non null "chassis_nr" when plugging it into its parent bus. > > >> > > >> Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids") > > > > > > Arguably, it's really fixing 7ef1553dac. > > > > I agree 7ef1553dac broke the "use a bridge that doesn't have property > > 'chassis_nr' case. > > > > I suspect the "duplicate chassis_nr" case has always been broken, and > > d2623129a7de merely uncovered it. > > Yes. I agree. > > If we can trigger the abort with hot-plug, then d2623129a7de made things > > materially worse (new way to accidentally kill your guest and maybe lose > > data), and I'd add a Fixes: blaming it. > > > > Yes it does. > > David, > > Maybe consider folding a third Fixes: tag into this patch ? Done. > > >> Reported-by: Thomas Huth > > >> Signed-off-by: Greg Kurz > > > > > > I had a few misgivings about the details of this, bu
Re: [PATCH] spapr_pci: Robustify support of PCI bridges
On Thu, 16 Jul 2020 16:23:52 +0200 Markus Armbruster wrote: > David Gibson writes: > > > On Thu, Jul 16, 2020 at 12:32:44PM +0200, Greg Kurz wrote: > >> On Thu, 16 Jul 2020 14:45:40 +1000 > >> David Gibson wrote: > >> > >> > On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote: > >> > > Some recent error handling cleanups unveiled issues with our support of > >> > > PCI bridges: > >> > > > >> > > 1) QEMU aborts when using non-standard PCI bridge types, > >> > >unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error > >> > > handling" > >> > > > >> > > $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge > >> > > Unexpected error in object_property_find() at qom/object.c:1240: > >> > > qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not > >> > > found > >> > > Aborted (core dumped) > >> > > >> > Oops, I thought we had a check that we actually had a "pci-bridge" > >> > device before continuing with the hotplug, but I guess not. > >> > >> Ah... are you suggesting we should explicitly check the actual type > >> of the bridge rather than looking for the "chassis_nr" property ? > > > > Uh.. I thought about it, but I don't think it matters much which way > > we do it. > > Would it make sense to add the "chassis_nr" property to *all* PCI > bridge devices? > I see that the "PCI Express to PCI/PCI-X Bridge Specification" mentions a "Chassis Number Register" which looks very similar to the what exists in standard PCI-to-PCI brdiges. This doesn't seem to be implemented in our "pcie-pci-bridge" device model though, but of course I have no idea why :) Maybe Michael or Marcel (cc'd) can share some thoughts about that ? > [...] >
Re: [PATCH] spapr_pci: Robustify support of PCI bridges
On Thu, 16 Jul 2020 16:01:18 +0200 Markus Armbruster wrote: > David Gibson writes: > > > On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote: > >> Some recent error handling cleanups unveiled issues with our support of > >> PCI bridges: > >> > >> 1) QEMU aborts when using non-standard PCI bridge types, > >>unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling" > >> > >> $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge > >> Unexpected error in object_property_find() at qom/object.c:1240: > >> qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not > >> found > >> Aborted (core dumped) > > > > Oops, I thought we had a check that we actually had a "pci-bridge" > > device before continuing with the hotplug, but I guess not. > > > >> This happens because we assume all PCI bridge types to have a "chassis_nr" > >> property. This property only exists with the standard PCI bridge type > >> "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems > >> much simpler to check the presence of "chassis_nr" earlier. > > > > Hrm, right, 7ef1553dac was not really correct since add_drcs() really > > can fail. > > Right. I failed to see that we can run into a bridge without a > "chassis_nr" here. > > >> 2) QEMU abort if same "chassis_nr" value is used several times, > >>unveiled by commit d2623129a7de "qom: Drop parameter @errp of > >>object_property_add() & friends" > >> > >> $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \ > >> -device pci-bridge,chassis_nr=1 > >> Unexpected error in object_property_try_add() at qom/object.c:1167: > >> qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add > >> duplicate property '4100' to object (type 'container') > >> Aborted (core dumped) > > Before d2623129a7de, the error got *ignored* in > spapr_dr_connector_new(): > > SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type, > uint32_t id) > { > SpaprDrc *drc = SPAPR_DR_CONNECTOR(object_new(type)); > char *prop_name; > > drc->id = id; > drc->owner = owner; > prop_name = g_strdup_printf("dr-connector[%"PRIu32"]", > spapr_drc_index(drc)); > object_property_add_child(owner, prop_name, OBJECT(drc), > &error_abort); > object_unref(OBJECT(drc)); > --->object_property_set_bool(OBJECT(drc), true, "realized", NULL); > g_free(prop_name); > > return drc; > } > > I doubt that's healthy. > This isn't. The object_property_set_bool() was later converted to qdev_realize() (thanks again for the cleanups!) but the problem remains. Realize can fail and I see now reason we don't do proper error handling when it comes to the DRCs. I'll look into fixing that. > >> This happens because we assume that "chassis_nr" values are unique, but > >> nobody enforces that and we end up generating duplicate DRC ids. The PCI > >> code doesn't really care for duplicate "chassis_nr" properties since it > >> is only used to initialize the "Chassis Number Register" of the bridge, > >> with no functional impact on QEMU. So, even if passing the same value > >> several times might look weird, it never broke anything before, so > >> I guess we don't necessarily want to enforce strict checking in the PCI > >> code now. > > > > Yeah, I guess. I'm pretty sure that the chassis number of bridges is > > supposed to be system-unique (well, unique within the PCI domain at > > least, I guess) as part of the hardware spec. So specifying multiple > > chassis ids the same is a user error, but we need a better failure > > mode. > > > >> Workaround both issues in the PAPR code: check that the bridge has a > >> unique and non null "chassis_nr" when plugging it into its parent bus. > >> > >> Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids") > > > > Arguably, it's really fixing 7ef1553dac. > > I agree 7ef1553dac broke the "use a bridge that doesn't have property > 'chassis_nr' case. > > I suspect the "duplicate chassis_nr" case has always been broken, and > d2623129a7de merely uncovered it. > Yes. > If we can trigger the abort with hot-plug, then d2623129a7de made things > materially worse (new way to accidentally kill your guest and maybe lose > data), and I'd add a Fixes: blaming it. > Yes it does. David, Maybe consider folding a third Fixes: tag into this patch ? > >> Reported-by: Thomas Huth > >> Signed-off-by: Greg Kurz > > > > I had a few misgivings about the details of this, but I think I've > > convinced myself they're fine. There's a couple of things I'd like to > > polish, but I'll do that as a follow up. >
Re: [PATCH] spapr_pci: Robustify support of PCI bridges
David Gibson writes: > On Thu, Jul 16, 2020 at 12:32:44PM +0200, Greg Kurz wrote: >> On Thu, 16 Jul 2020 14:45:40 +1000 >> David Gibson wrote: >> >> > On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote: >> > > Some recent error handling cleanups unveiled issues with our support of >> > > PCI bridges: >> > > >> > > 1) QEMU aborts when using non-standard PCI bridge types, >> > >unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error >> > > handling" >> > > >> > > $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge >> > > Unexpected error in object_property_find() at qom/object.c:1240: >> > > qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not >> > > found >> > > Aborted (core dumped) >> > >> > Oops, I thought we had a check that we actually had a "pci-bridge" >> > device before continuing with the hotplug, but I guess not. >> >> Ah... are you suggesting we should explicitly check the actual type >> of the bridge rather than looking for the "chassis_nr" property ? > > Uh.. I thought about it, but I don't think it matters much which way > we do it. Would it make sense to add the "chassis_nr" property to *all* PCI bridge devices? [...]
Re: [PATCH] spapr_pci: Robustify support of PCI bridges
David Gibson writes: > On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote: >> Some recent error handling cleanups unveiled issues with our support of >> PCI bridges: >> >> 1) QEMU aborts when using non-standard PCI bridge types, >>unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling" >> >> $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge >> Unexpected error in object_property_find() at qom/object.c:1240: >> qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found >> Aborted (core dumped) > > Oops, I thought we had a check that we actually had a "pci-bridge" > device before continuing with the hotplug, but I guess not. > >> This happens because we assume all PCI bridge types to have a "chassis_nr" >> property. This property only exists with the standard PCI bridge type >> "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems >> much simpler to check the presence of "chassis_nr" earlier. > > Hrm, right, 7ef1553dac was not really correct since add_drcs() really > can fail. Right. I failed to see that we can run into a bridge without a "chassis_nr" here. >> 2) QEMU abort if same "chassis_nr" value is used several times, >>unveiled by commit d2623129a7de "qom: Drop parameter @errp of >>object_property_add() & friends" >> >> $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \ >> -device pci-bridge,chassis_nr=1 >> Unexpected error in object_property_try_add() at qom/object.c:1167: >> qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add duplicate >> property '4100' to object (type 'container') >> Aborted (core dumped) Before d2623129a7de, the error got *ignored* in spapr_dr_connector_new(): SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type, uint32_t id) { SpaprDrc *drc = SPAPR_DR_CONNECTOR(object_new(type)); char *prop_name; drc->id = id; drc->owner = owner; prop_name = g_strdup_printf("dr-connector[%"PRIu32"]", spapr_drc_index(drc)); object_property_add_child(owner, prop_name, OBJECT(drc), &error_abort); object_unref(OBJECT(drc)); --->object_property_set_bool(OBJECT(drc), true, "realized", NULL); g_free(prop_name); return drc; } I doubt that's healthy. >> This happens because we assume that "chassis_nr" values are unique, but >> nobody enforces that and we end up generating duplicate DRC ids. The PCI >> code doesn't really care for duplicate "chassis_nr" properties since it >> is only used to initialize the "Chassis Number Register" of the bridge, >> with no functional impact on QEMU. So, even if passing the same value >> several times might look weird, it never broke anything before, so >> I guess we don't necessarily want to enforce strict checking in the PCI >> code now. > > Yeah, I guess. I'm pretty sure that the chassis number of bridges is > supposed to be system-unique (well, unique within the PCI domain at > least, I guess) as part of the hardware spec. So specifying multiple > chassis ids the same is a user error, but we need a better failure > mode. > >> Workaround both issues in the PAPR code: check that the bridge has a >> unique and non null "chassis_nr" when plugging it into its parent bus. >> >> Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids") > > Arguably, it's really fixing 7ef1553dac. I agree 7ef1553dac broke the "use a bridge that doesn't have property 'chassis_nr' case. I suspect the "duplicate chassis_nr" case has always been broken, and d2623129a7de merely uncovered it. If we can trigger the abort with hot-plug, then d2623129a7de made things materially worse (new way to accidentally kill your guest and maybe lose data), and I'd add a Fixes: blaming it. >> Reported-by: Thomas Huth >> Signed-off-by: Greg Kurz > > I had a few misgivings about the details of this, but I think I've > convinced myself they're fine. There's a couple of things I'd like to > polish, but I'll do that as a follow up.
Re: [PATCH] spapr_pci: Robustify support of PCI bridges
On Thu, Jul 16, 2020 at 12:32:44PM +0200, Greg Kurz wrote: > On Thu, 16 Jul 2020 14:45:40 +1000 > David Gibson wrote: > > > On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote: > > > Some recent error handling cleanups unveiled issues with our support of > > > PCI bridges: > > > > > > 1) QEMU aborts when using non-standard PCI bridge types, > > >unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error > > > handling" > > > > > > $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge > > > Unexpected error in object_property_find() at qom/object.c:1240: > > > qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not > > > found > > > Aborted (core dumped) > > > > Oops, I thought we had a check that we actually had a "pci-bridge" > > device before continuing with the hotplug, but I guess not. > > Ah... are you suggesting we should explicitly check the actual type > of the bridge rather than looking for the "chassis_nr" property ? Uh.. I thought about it, but I don't think it matters much which way we do it. > > > This happens because we assume all PCI bridge types to have a "chassis_nr" > > > property. This property only exists with the standard PCI bridge type > > > "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems > > > much simpler to check the presence of "chassis_nr" earlier. > > > > Hrm, right, 7ef1553dac was not really correct since add_drcs() really > > can fail. > > Yes. > > > > 2) QEMU abort if same "chassis_nr" value is used several times, > > >unveiled by commit d2623129a7de "qom: Drop parameter @errp of > > >object_property_add() & friends" > > > > > > $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \ > > > -device pci-bridge,chassis_nr=1 > > > Unexpected error in object_property_try_add() at qom/object.c:1167: > > > qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add > > > duplicate property '4100' to object (type 'container') > > > Aborted (core dumped) > > > > > > This happens because we assume that "chassis_nr" values are unique, but > > > nobody enforces that and we end up generating duplicate DRC ids. The PCI > > > code doesn't really care for duplicate "chassis_nr" properties since it > > > is only used to initialize the "Chassis Number Register" of the bridge, > > > with no functional impact on QEMU. So, even if passing the same value > > > several times might look weird, it never broke anything before, so > > > I guess we don't necessarily want to enforce strict checking in the PCI > > > code now. > > > > Yeah, I guess. I'm pretty sure that the chassis number of bridges is > > supposed to be system-unique (well, unique within the PCI domain at > > least, I guess) as part of the hardware spec. So specifying multiple > > chassis ids the same is a user error, but we need a better failure > > mode. > > According to section 3.2.6.4 of "PCI-to-PCI Bridge Architecture > Specification", the chassis number is exposed to the OS as a > non-volatile r/w register. Argh. Dammit. I could have sworn I checked that chassis numbers were supposed to be unique (and not guest alterable). That's the whole reason I chose it. > It seems to be expected that chassis > numbers might collide, in which case the system software can > overwrite the register with a new number. So I'm not sure that > specifying the same number multiple times is an actual user error. > > > > Workaround both issues in the PAPR code: check that the bridge has a > > > unique and non null "chassis_nr" when plugging it into its parent bus. > > > > > > Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids") > > > > Arguably, it's really fixing 7ef1553dac. > > True for issue 1 but not for issue 2, which is the result of > 05929a6c5dfe (switch to "chassis_nr" introduces a condition > to end up with duplicate DRC ids) Well, technically. But the method we had before was *way* more broken than chassis numbers. It was using bus number which is not stable across the VM's lifetime, which is a non-starter. Plus, the bus number too won't be unique, until the guest has enumerated the bus, which is too late for DRC creation. The only reason we got away with it, was that it was basically dead code - at that stage we didn't support hotplug under bridges, so we never actually created DRCs except for the root bus. > and d2623129a7de (assert > when trying to add a duplicated DRC). Well, again, only on a technicality. It might not have immediatley assert()ed, but adding a duplicated DRC was still completely broken before that. > I'm starting to think I should maybe split this in > two patches. One for each issue. At this stage, I'd kind of prefer to merge this fix now, with the intention of doing a pull request tomorrow. AFAICT none of the suggested improvements can't be done as followups. > > > Reported-by: Thomas Huth > > > Signed-off-by: Greg Kurz > > > > I had a few misgivings abou
Re: [PATCH] spapr_pci: Robustify support of PCI bridges
On Thu, 16 Jul 2020 02:53:07 -0400 "Michael S. Tsirkin" wrote: > On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote: > > Some recent error handling cleanups unveiled issues with our support of > > PCI bridges: > > > > 1) QEMU aborts when using non-standard PCI bridge types, > >unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling" > > > > $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge > > Unexpected error in object_property_find() at qom/object.c:1240: > > qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found > > Aborted (core dumped) > > > > This happens because we assume all PCI bridge types to have a "chassis_nr" > > property. This property only exists with the standard PCI bridge type > > "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems > > much simpler to check the presence of "chassis_nr" earlier. > > > > 2) QEMU abort if same "chassis_nr" value is used several times, > >unveiled by commit d2623129a7de "qom: Drop parameter @errp of > >object_property_add() & friends" > > > > $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \ > > -device pci-bridge,chassis_nr=1 > > Unexpected error in object_property_try_add() at qom/object.c:1167: > > qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add > > duplicate property '4100' to object (type 'container') > > Aborted (core dumped) > > > > This happens because we assume that "chassis_nr" values are unique, but > > nobody enforces that and we end up generating duplicate DRC ids. The PCI > > code doesn't really care for duplicate "chassis_nr" properties since it > > is only used to initialize the "Chassis Number Register" of the bridge, > > with no functional impact on QEMU. So, even if passing the same value > > several times might look weird, it never broke anything before, so > > I guess we don't necessarily want to enforce strict checking in the PCI > > code now. > > > > Workaround both issues in the PAPR code: check that the bridge has a > > unique and non null "chassis_nr" when plugging it into its parent bus. > > > > Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids") > > Reported-by: Thomas Huth > > Signed-off-by: Greg Kurz > > > > > --- > > hw/ppc/spapr_pci.c | 57 > > > > 1 file changed, 57 insertions(+) > > > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > > index 329002ac040e..09d52ef7954d 100644 > > --- a/hw/ppc/spapr_pci.c > > +++ b/hw/ppc/spapr_pci.c > > @@ -1480,6 +1480,57 @@ static void spapr_pci_bridge_plug(SpaprPhbState *phb, > > add_drcs(phb, bus); > > } > > > > +/* Returns non-zero if the value of "chassis_nr" is already in use */ > > +static int check_chassis_nr(Object *obj, void *opaque) > > +{ > > +int new_chassis_nr = > > +object_property_get_uint(opaque, "chassis_nr", &error_abort); > > +int chassis_nr = > > +object_property_get_uint(obj, "chassis_nr", NULL); > > + > > +if (!object_dynamic_cast(obj, TYPE_PCI_BRIDGE)) { > > +return 0; > > +} > > + > > +/* Skip unsupported bridge types */ > > +if (!chassis_nr) { > > +return 0; > > +} > > + > > +/* Skip self */ > > +if (obj == opaque) { > > +return 0; > > +} > > + > > +return chassis_nr == new_chassis_nr; > > +} > > + > > +static bool bridge_has_valid_chassis_nr(Object *bridge, Error **errp) > > I would rename this "bridge_has_unique_chassis_nr". > Right. > > +{ > > +int chassis_nr = > > +object_property_get_uint(bridge, "chassis_nr", NULL); > > + > > +/* > > + * slotid_cap_init() already ensures that "chassis_nr" isn't null for > > + * standard PCI bridges, so this really tells if "chassis_nr" is > > present > > + * or not. > > + */ > > +if (!chassis_nr) { > > +error_setg(errp, "PCI Bridge lacks a \"chassis_nr\" property"); > > +error_append_hint(errp, "Try -device pci-bridge instead.\n"); > > +return false; > > +} > > + > > +/* We want unique values for "chassis_nr" */ > > +if (object_child_foreach_recursive(object_get_root(), check_chassis_nr, > > + bridge)) { > > +error_setg(errp, "Bridge chassis %d already in use", chassis_nr); > > +return false; > > +} > > + > > +return true; > > +} > > + > > static void spapr_pci_plug(HotplugHandler *plug_handler, > > DeviceState *plugged_dev, Error **errp) > > { > > @@ -1491,6 +1542,12 @@ static void spapr_pci_plug(HotplugHandler > > *plug_handler, > > PCIBus *bus = PCI_BUS(qdev_get_parent_bus(DEVICE(pdev))); > > uint32_t slotnr = PCI_SLOT(pdev->devfn); > > > > +if (pc->is_bridge) { > > +if (!bridge_has_valid_chassis_nr(OBJECT(plugged_dev), errp)) { > > +return; > > +} > > +} > > + > > > Add a comm
Re: [PATCH] spapr_pci: Robustify support of PCI bridges
On Thu, 16 Jul 2020 14:45:40 +1000 David Gibson wrote: > On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote: > > Some recent error handling cleanups unveiled issues with our support of > > PCI bridges: > > > > 1) QEMU aborts when using non-standard PCI bridge types, > >unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling" > > > > $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge > > Unexpected error in object_property_find() at qom/object.c:1240: > > qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found > > Aborted (core dumped) > > Oops, I thought we had a check that we actually had a "pci-bridge" > device before continuing with the hotplug, but I guess not. > Ah... are you suggesting we should explicitly check the actual type of the bridge rather than looking for the "chassis_nr" property ? > > This happens because we assume all PCI bridge types to have a "chassis_nr" > > property. This property only exists with the standard PCI bridge type > > "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems > > much simpler to check the presence of "chassis_nr" earlier. > > Hrm, right, 7ef1553dac was not really correct since add_drcs() really > can fail. > Yes. > > 2) QEMU abort if same "chassis_nr" value is used several times, > >unveiled by commit d2623129a7de "qom: Drop parameter @errp of > >object_property_add() & friends" > > > > $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \ > > -device pci-bridge,chassis_nr=1 > > Unexpected error in object_property_try_add() at qom/object.c:1167: > > qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add > > duplicate property '4100' to object (type 'container') > > Aborted (core dumped) > > > > This happens because we assume that "chassis_nr" values are unique, but > > nobody enforces that and we end up generating duplicate DRC ids. The PCI > > code doesn't really care for duplicate "chassis_nr" properties since it > > is only used to initialize the "Chassis Number Register" of the bridge, > > with no functional impact on QEMU. So, even if passing the same value > > several times might look weird, it never broke anything before, so > > I guess we don't necessarily want to enforce strict checking in the PCI > > code now. > > Yeah, I guess. I'm pretty sure that the chassis number of bridges is > supposed to be system-unique (well, unique within the PCI domain at > least, I guess) as part of the hardware spec. So specifying multiple > chassis ids the same is a user error, but we need a better failure > mode. > According to section 3.2.6.4 of "PCI-to-PCI Bridge Architecture Specification", the chassis number is exposed to the OS as a non-volatile r/w register. It seems to be expected that chassis numbers might collide, in which case the system software can overwrite the register with a new number. So I'm not sure that specifying the same number multiple times is an actual user error. > > Workaround both issues in the PAPR code: check that the bridge has a > > unique and non null "chassis_nr" when plugging it into its parent bus. > > > > Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids") > > Arguably, it's really fixing 7ef1553dac. > True for issue 1 but not for issue 2, which is the result of 05929a6c5dfe (switch to "chassis_nr" introduces a condition to end up with duplicate DRC ids) and d2623129a7de (assert when trying to add a duplicated DRC). I'm starting to think I should maybe split this in two patches. One for each issue. > > Reported-by: Thomas Huth > > Signed-off-by: Greg Kurz > > I had a few misgivings about the details of this, but I think I've > convinced myself they're fine. There's a couple of things I'd like to > polish, but I'll do that as a follow up. > Some fixes for d2623129a7de just got merged. Let me have a look again. > > --- > > hw/ppc/spapr_pci.c | 57 > > > > 1 file changed, 57 insertions(+) > > > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > > index 329002ac040e..09d52ef7954d 100644 > > --- a/hw/ppc/spapr_pci.c > > +++ b/hw/ppc/spapr_pci.c > > @@ -1480,6 +1480,57 @@ static void spapr_pci_bridge_plug(SpaprPhbState *phb, > > add_drcs(phb, bus); > > } > > > > +/* Returns non-zero if the value of "chassis_nr" is already in use */ > > +static int check_chassis_nr(Object *obj, void *opaque) > > +{ > > +int new_chassis_nr = > > +object_property_get_uint(opaque, "chassis_nr", &error_abort); > > +int chassis_nr = > > +object_property_get_uint(obj, "chassis_nr", NULL); > > + > > +if (!object_dynamic_cast(obj, TYPE_PCI_BRIDGE)) { > > +return 0; > > +} > > + > > +/* Skip unsupported bridge types */ > > +if (!chassis_nr) { > > +return 0; > > +} > > + > > +/* Skip self */ > > +if (obj == opaque) { > > +return 0; > > +
Re: [PATCH] spapr_pci: Robustify support of PCI bridges
On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote: > Some recent error handling cleanups unveiled issues with our support of > PCI bridges: > > 1) QEMU aborts when using non-standard PCI bridge types, >unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling" > > $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge > Unexpected error in object_property_find() at qom/object.c:1240: > qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found > Aborted (core dumped) > > This happens because we assume all PCI bridge types to have a "chassis_nr" > property. This property only exists with the standard PCI bridge type > "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems > much simpler to check the presence of "chassis_nr" earlier. > > 2) QEMU abort if same "chassis_nr" value is used several times, >unveiled by commit d2623129a7de "qom: Drop parameter @errp of >object_property_add() & friends" > > $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \ > -device pci-bridge,chassis_nr=1 > Unexpected error in object_property_try_add() at qom/object.c:1167: > qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add duplicate > property '4100' to object (type 'container') > Aborted (core dumped) > > This happens because we assume that "chassis_nr" values are unique, but > nobody enforces that and we end up generating duplicate DRC ids. The PCI > code doesn't really care for duplicate "chassis_nr" properties since it > is only used to initialize the "Chassis Number Register" of the bridge, > with no functional impact on QEMU. So, even if passing the same value > several times might look weird, it never broke anything before, so > I guess we don't necessarily want to enforce strict checking in the PCI > code now. > > Workaround both issues in the PAPR code: check that the bridge has a > unique and non null "chassis_nr" when plugging it into its parent bus. > > Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids") > Reported-by: Thomas Huth > Signed-off-by: Greg Kurz > --- > hw/ppc/spapr_pci.c | 57 > > 1 file changed, 57 insertions(+) > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > index 329002ac040e..09d52ef7954d 100644 > --- a/hw/ppc/spapr_pci.c > +++ b/hw/ppc/spapr_pci.c > @@ -1480,6 +1480,57 @@ static void spapr_pci_bridge_plug(SpaprPhbState *phb, > add_drcs(phb, bus); > } > > +/* Returns non-zero if the value of "chassis_nr" is already in use */ > +static int check_chassis_nr(Object *obj, void *opaque) > +{ > +int new_chassis_nr = > +object_property_get_uint(opaque, "chassis_nr", &error_abort); > +int chassis_nr = > +object_property_get_uint(obj, "chassis_nr", NULL); > + > +if (!object_dynamic_cast(obj, TYPE_PCI_BRIDGE)) { > +return 0; > +} > + > +/* Skip unsupported bridge types */ > +if (!chassis_nr) { > +return 0; > +} > + > +/* Skip self */ > +if (obj == opaque) { > +return 0; > +} > + > +return chassis_nr == new_chassis_nr; > +} > + > +static bool bridge_has_valid_chassis_nr(Object *bridge, Error **errp) I would rename this "bridge_has_unique_chassis_nr". > +{ > +int chassis_nr = > +object_property_get_uint(bridge, "chassis_nr", NULL); > + > +/* > + * slotid_cap_init() already ensures that "chassis_nr" isn't null for > + * standard PCI bridges, so this really tells if "chassis_nr" is present > + * or not. > + */ > +if (!chassis_nr) { > +error_setg(errp, "PCI Bridge lacks a \"chassis_nr\" property"); > +error_append_hint(errp, "Try -device pci-bridge instead.\n"); > +return false; > +} > + > +/* We want unique values for "chassis_nr" */ > +if (object_child_foreach_recursive(object_get_root(), check_chassis_nr, > + bridge)) { > +error_setg(errp, "Bridge chassis %d already in use", chassis_nr); > +return false; > +} > + > +return true; > +} > + > static void spapr_pci_plug(HotplugHandler *plug_handler, > DeviceState *plugged_dev, Error **errp) > { > @@ -1491,6 +1542,12 @@ static void spapr_pci_plug(HotplugHandler > *plug_handler, > PCIBus *bus = PCI_BUS(qdev_get_parent_bus(DEVICE(pdev))); > uint32_t slotnr = PCI_SLOT(pdev->devfn); > > +if (pc->is_bridge) { > +if (!bridge_has_valid_chassis_nr(OBJECT(plugged_dev), errp)) { > +return; > +} > +} > + Add a comment here explaning DRC ids are generated from chassis_nr and these need to be unique? > /* if DR is disabled we don't need to do anything in the case of > * hotplug or coldplug callbacks > */ > With above fixed: Acked-by: Michael S. Tsirkin Feel free to merge.
Re: [PATCH] spapr_pci: Robustify support of PCI bridges
On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote: > Some recent error handling cleanups unveiled issues with our support of > PCI bridges: > > 1) QEMU aborts when using non-standard PCI bridge types, >unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling" > > $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge > Unexpected error in object_property_find() at qom/object.c:1240: > qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found > Aborted (core dumped) Oops, I thought we had a check that we actually had a "pci-bridge" device before continuing with the hotplug, but I guess not. > This happens because we assume all PCI bridge types to have a "chassis_nr" > property. This property only exists with the standard PCI bridge type > "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems > much simpler to check the presence of "chassis_nr" earlier. Hrm, right, 7ef1553dac was not really correct since add_drcs() really can fail. > 2) QEMU abort if same "chassis_nr" value is used several times, >unveiled by commit d2623129a7de "qom: Drop parameter @errp of >object_property_add() & friends" > > $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \ > -device pci-bridge,chassis_nr=1 > Unexpected error in object_property_try_add() at qom/object.c:1167: > qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add duplicate > property '4100' to object (type 'container') > Aborted (core dumped) > > This happens because we assume that "chassis_nr" values are unique, but > nobody enforces that and we end up generating duplicate DRC ids. The PCI > code doesn't really care for duplicate "chassis_nr" properties since it > is only used to initialize the "Chassis Number Register" of the bridge, > with no functional impact on QEMU. So, even if passing the same value > several times might look weird, it never broke anything before, so > I guess we don't necessarily want to enforce strict checking in the PCI > code now. Yeah, I guess. I'm pretty sure that the chassis number of bridges is supposed to be system-unique (well, unique within the PCI domain at least, I guess) as part of the hardware spec. So specifying multiple chassis ids the same is a user error, but we need a better failure mode. > Workaround both issues in the PAPR code: check that the bridge has a > unique and non null "chassis_nr" when plugging it into its parent bus. > > Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids") Arguably, it's really fixing 7ef1553dac. > Reported-by: Thomas Huth > Signed-off-by: Greg Kurz I had a few misgivings about the details of this, but I think I've convinced myself they're fine. There's a couple of things I'd like to polish, but I'll do that as a follow up. > --- > hw/ppc/spapr_pci.c | 57 > > 1 file changed, 57 insertions(+) > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > index 329002ac040e..09d52ef7954d 100644 > --- a/hw/ppc/spapr_pci.c > +++ b/hw/ppc/spapr_pci.c > @@ -1480,6 +1480,57 @@ static void spapr_pci_bridge_plug(SpaprPhbState *phb, > add_drcs(phb, bus); > } > > +/* Returns non-zero if the value of "chassis_nr" is already in use */ > +static int check_chassis_nr(Object *obj, void *opaque) > +{ > +int new_chassis_nr = > +object_property_get_uint(opaque, "chassis_nr", &error_abort); > +int chassis_nr = > +object_property_get_uint(obj, "chassis_nr", NULL); > + > +if (!object_dynamic_cast(obj, TYPE_PCI_BRIDGE)) { > +return 0; > +} > + > +/* Skip unsupported bridge types */ > +if (!chassis_nr) { > +return 0; > +} > + > +/* Skip self */ > +if (obj == opaque) { > +return 0; > +} > + > +return chassis_nr == new_chassis_nr; > +} > + > +static bool bridge_has_valid_chassis_nr(Object *bridge, Error **errp) > +{ > +int chassis_nr = > +object_property_get_uint(bridge, "chassis_nr", NULL); > + > +/* > + * slotid_cap_init() already ensures that "chassis_nr" isn't null for > + * standard PCI bridges, so this really tells if "chassis_nr" is present > + * or not. > + */ > +if (!chassis_nr) { > +error_setg(errp, "PCI Bridge lacks a \"chassis_nr\" property"); > +error_append_hint(errp, "Try -device pci-bridge instead.\n"); > +return false; > +} > + > +/* We want unique values for "chassis_nr" */ > +if (object_child_foreach_recursive(object_get_root(), check_chassis_nr, > + bridge)) { > +error_setg(errp, "Bridge chassis %d already in use", chassis_nr); > +return false; > +} > + > +return true; > +} > + > static void spapr_pci_plug(HotplugHandler *plug_handler, > DeviceState *plugged_dev, Error **errp) > { > @@ -1491,6 +1542,12 @@ static void spapr_pci_plug(H
[PATCH] spapr_pci: Robustify support of PCI bridges
Some recent error handling cleanups unveiled issues with our support of PCI bridges: 1) QEMU aborts when using non-standard PCI bridge types, unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling" $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge Unexpected error in object_property_find() at qom/object.c:1240: qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found Aborted (core dumped) This happens because we assume all PCI bridge types to have a "chassis_nr" property. This property only exists with the standard PCI bridge type "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems much simpler to check the presence of "chassis_nr" earlier. 2) QEMU abort if same "chassis_nr" value is used several times, unveiled by commit d2623129a7de "qom: Drop parameter @errp of object_property_add() & friends" $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \ -device pci-bridge,chassis_nr=1 Unexpected error in object_property_try_add() at qom/object.c:1167: qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add duplicate property '4100' to object (type 'container') Aborted (core dumped) This happens because we assume that "chassis_nr" values are unique, but nobody enforces that and we end up generating duplicate DRC ids. The PCI code doesn't really care for duplicate "chassis_nr" properties since it is only used to initialize the "Chassis Number Register" of the bridge, with no functional impact on QEMU. So, even if passing the same value several times might look weird, it never broke anything before, so I guess we don't necessarily want to enforce strict checking in the PCI code now. Workaround both issues in the PAPR code: check that the bridge has a unique and non null "chassis_nr" when plugging it into its parent bus. Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids") Reported-by: Thomas Huth Signed-off-by: Greg Kurz --- hw/ppc/spapr_pci.c | 57 1 file changed, 57 insertions(+) diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c index 329002ac040e..09d52ef7954d 100644 --- a/hw/ppc/spapr_pci.c +++ b/hw/ppc/spapr_pci.c @@ -1480,6 +1480,57 @@ static void spapr_pci_bridge_plug(SpaprPhbState *phb, add_drcs(phb, bus); } +/* Returns non-zero if the value of "chassis_nr" is already in use */ +static int check_chassis_nr(Object *obj, void *opaque) +{ +int new_chassis_nr = +object_property_get_uint(opaque, "chassis_nr", &error_abort); +int chassis_nr = +object_property_get_uint(obj, "chassis_nr", NULL); + +if (!object_dynamic_cast(obj, TYPE_PCI_BRIDGE)) { +return 0; +} + +/* Skip unsupported bridge types */ +if (!chassis_nr) { +return 0; +} + +/* Skip self */ +if (obj == opaque) { +return 0; +} + +return chassis_nr == new_chassis_nr; +} + +static bool bridge_has_valid_chassis_nr(Object *bridge, Error **errp) +{ +int chassis_nr = +object_property_get_uint(bridge, "chassis_nr", NULL); + +/* + * slotid_cap_init() already ensures that "chassis_nr" isn't null for + * standard PCI bridges, so this really tells if "chassis_nr" is present + * or not. + */ +if (!chassis_nr) { +error_setg(errp, "PCI Bridge lacks a \"chassis_nr\" property"); +error_append_hint(errp, "Try -device pci-bridge instead.\n"); +return false; +} + +/* We want unique values for "chassis_nr" */ +if (object_child_foreach_recursive(object_get_root(), check_chassis_nr, + bridge)) { +error_setg(errp, "Bridge chassis %d already in use", chassis_nr); +return false; +} + +return true; +} + static void spapr_pci_plug(HotplugHandler *plug_handler, DeviceState *plugged_dev, Error **errp) { @@ -1491,6 +1542,12 @@ static void spapr_pci_plug(HotplugHandler *plug_handler, PCIBus *bus = PCI_BUS(qdev_get_parent_bus(DEVICE(pdev))); uint32_t slotnr = PCI_SLOT(pdev->devfn); +if (pc->is_bridge) { +if (!bridge_has_valid_chassis_nr(OBJECT(plugged_dev), errp)) { +return; +} +} + /* if DR is disabled we don't need to do anything in the case of * hotplug or coldplug callbacks */