Re: [PATCH] spapr: Modify ibm, get-config-addr-info2 to set DEVNUM in PE config address.
On 2021-04-30 11:53:24 Fri, Oliver O'Halloran wrote: > On Thu, Apr 29, 2021 at 7:02 PM Mahesh J Salgaonkar > wrote: > > > > On 2021-04-28 22:33:45 Wed, Oliver O'Halloran wrote: > > > On Tue, Apr 27, 2021 at 9:56 PM Mahesh Salgaonkar > > > wrote: > > > > > > > > With upstream kernel, especially after commit 98ba956f6a389 > > > > ("powerpc/pseries/eeh: Rework device EEH PE determination") we see that > > > > KVM > > > > guest isn't able to enable EEH option for PCI pass-through devices > > > > anymore. > > > > > > How are you passing the devices through to the guest? > > > > I am using libvirt with below xml section to add pass-through: > > > > > > > > > > > > > >> function='0x0' multifunction='on'/> > > > > > > > > > > > > > >> function='0x1' multifunction='on'/> > > > > > > Looks like libvirt does not allow pass through device in slot zero, and > > throws following error. > > > > error: XML error: Invalid PCI address :01:00.0. slot must be >= 1 > > Failed. Try again? [y,n,i,f,?]: > > That's pretty odd and I have no idea why that's happening. I seem to > remember being able to use slot 0 for vfio devices when doing the > passthru manually with the qemu command line so this might be a > libvirt quirk. > > > *snip* > > > > Agree. I realize my fix is not correctly handling this. The current code > > under ibm,set-eeh-option is checking for individual PCI device presence. > > Better fix should be to check if there is any PCI device (vfio-pci) > > present under specified bus and enable the EEH if found. And no change > > in return value of get-config-addr-info2. What do you say ? > > That sounds reasonable. You would however need to verify that all the > devices on that bus are within the same PE on the hypervisor side. I see that the spapr_phb_eeh_available(sphb) check in ibm,set-eeh-option already makes sure that all the devices on that bus are from same iommu group (within same PE) before going ahead with enabling EEH. - rtas_ibm_set_eeh_option(): ... if (!spapr_phb_eeh_available(sphb)) { <--- goto param_error_exit; } ret = spapr_phb_vfio_eeh_set_option(sphb, addr, option); rtas_st(rets, 0, ret); return; param_error_exit: rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR); -- I verified that, if we add devices from two different PHB under same sphb then spapr_phb_eeh_available(sphb) returns false and we fail to enable EEH. Hence we are good here, we fail very early if devices on that bus are not within same PE. So, I just need to check for presence of any vfio-pci device under the specified bus and enable the EEH. Let me know your views on this. Thanks, -Mahesh. -- Mahesh J Salgaonkar
Re: [PATCH] spapr: Modify ibm, get-config-addr-info2 to set DEVNUM in PE config address.
On 2021-04-30 07:52:58 Fri, Daniel Henrique Barboza wrote: > > > On 4/29/21 6:02 AM, Mahesh J Salgaonkar wrote: > > On 2021-04-28 22:33:45 Wed, Oliver O'Halloran wrote: > > > On Tue, Apr 27, 2021 at 9:56 PM Mahesh Salgaonkar > > > wrote: > > > > > > > > With upstream kernel, especially after commit 98ba956f6a389 > > > > ("powerpc/pseries/eeh: Rework device EEH PE determination") we see that > > > > KVM > > > > guest isn't able to enable EEH option for PCI pass-through devices > > > > anymore. > > > > > > How are you passing the devices through to the guest? > > > > I am using libvirt with below xml section to add pass-through: > > > > > > > > > > > > > > > function='0x0' multifunction='on'/> > > > > > > > > > > > > > > > function='0x1' multifunction='on'/> > > > > > > Looks like libvirt does not allow pass through device in slot zero, and > > throws following error. > > There's no such restriction in Libvirt, at least as far as I remember. > > > > > error: XML error: Invalid PCI address :01:00.0. slot must be >= 1 > > Failed. Try again? [y,n,i,f,?]: > > > This looks odd. The error message is complaining about :01:00.0 but > your XML up there is declaring :01:01.0. Above XML snipphet is working one. I see the XML error when I change slot value to zero. > > Also, the 'multifunction' bool is usually used only in the function 0 > passthrough, indicating that the guest will configure all other functions as > the the multifunction device. You can ignore this bool in the XML for > PCI passthrough if you don't care about the guest seeing this device as > multifunction (i.e. all functions in the same IOMMU group and so on). > Thanks, -Mahesh.
Re: [PATCH] spapr: Modify ibm, get-config-addr-info2 to set DEVNUM in PE config address.
card 1 0005:ff:02.0 - card 2 And with get-config-addr-info2 returning 00BB0001, ibm,set-eeh-option RTAS call tries to check if device is present on the bus of spapr_phb at bus->devices[devfn] where devfn is 0. And since qemu does not support pass through device on slot zero bus->devices[0] is always NULL. And hence it fails to enable EEH. > > With this patch applied get-config-addr-info2 returns 00BBD001, so the > PE we get for each device will be: > > card 1 - 00ff0001 > card 2 - 00ff1001 > > Which implies the two are in different PEs. As a result, if the guest > requests a reset of card 1's PE then the guest will see an unexpected > reset of card 2 as well. From the hypervisor's point of view the two > are in the same PE so this is a legitimate thing to do, but due to > this patch the guest doesn't know that. Agree. I realize my fix is not correctly handling this. The current code under ibm,set-eeh-option is checking for individual PCI device presence. Better fix should be to check if there is any PCI device (vfio-pci) present under specified bus and enable the EEH if found. And no change in return value of get-config-addr-info2. What do you say ? -- Mahesh J Salgaonkar
Re: [PATCH] ppc/spapr: Set the effective address provided flag in mc error log.
On 2020-03-17 17:11:22 Tue, Greg Kurz wrote: > On Tue, 17 Mar 2020 08:51:50 -0700 (PDT) > no-re...@patchew.org wrote: > > > Patchew URL: > > https://patchew.org/QEMU/158444819283.31599.12155058652686614304.stgit@jupiter/ > > > > > > > > Hi, > > > > This series seems to have some coding style problems. See output below for > > more information: > > > > Subject: [PATCH] ppc/spapr: Set the effective address provided flag in mc > > error log. > > Message-id: 158444819283.31599.12155058652686614304.stgit@jupiter > > Type: series > > > > === TEST SCRIPT BEGIN === > > #!/bin/bash > > git rev-parse base > /dev/null || exit 0 > > git config --local diff.renamelimit 0 > > git config --local diff.renames True > > git config --local diff.algorithm histogram > > ./scripts/checkpatch.pl --mailback base.. > > === TEST SCRIPT END === > > > > Switched to a new branch 'test' > > 62d8ada ppc/spapr: Set the effective address provided flag in mc error log. > > > > === OUTPUT BEGIN === > > ERROR: code indent should never use tabs > > #57: FILE: hw/ppc/spapr_events.c:739: > > +^Iswitch (ext_elog->mc.error_type) {$ > > > > Yeah no tabs allowed in the QEMU code (see CODING_STYLE.rst). > > If your editor is emacs, you can consider setting these: > > (setq indent-tabs-mode nil) > (setq c-basic-offset 4)) My bad. Will fix it and respin v2. Thanks, -Mahesh.
Re: [PATCH v2 2/8] ppc/spapr: Change FWNMI names
On 2020-03-17 00:26:07 Tue, Nicholas Piggin wrote: > The option is called "FWNMI", and it involves more than just machine > checks, also machine checks can be delivered without the FWNMI option, > so re-name various things to reflect that. > > Signed-off-by: Nicholas Piggin > --- > hw/ppc/spapr.c| 28 ++-- > hw/ppc/spapr_caps.c | 14 +++--- > hw/ppc/spapr_events.c | 14 +++--- > hw/ppc/spapr_rtas.c | 17 + > include/hw/ppc/spapr.h| 27 +-- > tests/qtest/libqos/libqos-spapr.h | 2 +- > 6 files changed, 55 insertions(+), 47 deletions(-) > [...] > @@ -626,14 +626,14 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = { > .type = "bool", > .apply = cap_ccf_assist_apply, > }, > -[SPAPR_CAP_FWNMI_MCE] = { > -.name = "fwnmi-mce", > -.description = "Handle fwnmi machine check exceptions", > -.index = SPAPR_CAP_FWNMI_MCE, > +[SPAPR_CAP_FWNMI] = { > +.name = "fwnmi", I guess this should be fine and should hit QEMU 5.0 release so that we don't end up with two different CAP names for 5.0 and future releases. Thanks, -Mahesh.
Re: [PATCH v2 7/8] ppc/spapr: Implement FWNMI System Reset delivery
On 2020-03-17 00:26:12 Tue, Nicholas Piggin wrote: > PAPR requires that if "ibm,nmi-register" succeeds, then the hypervisor > delivers all system reset and machine check exceptions to the registered > addresses. > > System Resets are delivered with registers set to the architected state, > and with no interlock. > > Signed-off-by: Nicholas Piggin > --- > hw/ppc/spapr.c | 46 -- > 1 file changed, 44 insertions(+), 2 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index 25221d843c..78e649f47d 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -967,7 +967,29 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, void > *fdt) > _FDT(fdt_setprop(fdt, rtas, "ibm,max-associativity-domains", > maxdomains, sizeof(maxdomains))); > > -_FDT(fdt_setprop_cell(fdt, rtas, "rtas-size", RTAS_SIZE)); > +/* > + * FWNMI reserves RTAS_ERROR_LOG_MAX for the machine check error log, > + * and 16 bytes per CPU for system reset error log plus an extra 8 bytes. > + * > + * The system reset requirements are driven by existing Linux and PowerVM > + * implementation which (contrary to PAPR) saves r3 in the error log > + * structure like machine check, so Linux expects to find the saved r3 > + * value at the address in r3 upon FWNMI-enabled sreset interrupt (and > + * does not look at the error value). > + * > + * System reset interrupts are not subject to interlock like machine > + * check, so this memory area could be corrupted if the sreset is > + * interrupted by a machine check (or vice versa) if it was shared. To > + * prevent this, system reset uses per-CPU areas for the sreset save > + * area. A system reset that interrupts a system reset handler could > + * still overwrite this area, but Linux doesn't try to recover in that > + * case anyway. > + * > + * The extra 8 bytes is required because Linux's FWNMI error log check > + * is off-by-one. > + */ > +_FDT(fdt_setprop_cell(fdt, rtas, "rtas-size", RTAS_ERROR_LOG_MAX + > + ms->smp.max_cpus * sizeof(uint64_t)*2 + > sizeof(uint64_t))); Currently the rtas region is only of size 2048 (i.e RTAS_ERROR_LOG_MAX). Do we need SLOF change to increase rtas area as well ? Otherwise QEMU may corrupt guest memory area OR Am I wrong ? Thanks, -Mahesh/ > _FDT(fdt_setprop_cell(fdt, rtas, "rtas-error-log-max", >RTAS_ERROR_LOG_MAX)); > _FDT(fdt_setprop_cell(fdt, rtas, "rtas-event-scan-rate", > @@ -3399,8 +3421,28 @@ static void spapr_machine_finalizefn(Object *obj) > > void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg) > { > +SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine()); > + > cpu_synchronize_state(cs); > -ppc_cpu_do_system_reset(cs, -1); > +/* If FWNMI is inactive, addr will be -1, which will deliver to 0x100 */ > +if (spapr->fwnmi_system_reset_addr != -1) { > +uint64_t rtas_addr, addr; > +PowerPCCPU *cpu = POWERPC_CPU(cs); > +CPUPPCState *env = >env; > + > +/* get rtas addr from fdt */ > +rtas_addr = spapr_get_rtas_addr(); > +if (!rtas_addr) { > +qemu_system_guest_panicked(NULL); > +return; > +} > + > +addr = rtas_addr + RTAS_ERROR_LOG_MAX + cs->cpu_index * > sizeof(uint64_t)*2; > +stq_be_phys(_space_memory, addr, env->gpr[3]); > +stq_be_phys(_space_memory, addr + sizeof(uint64_t), 0); > +env->gpr[3] = addr; > +} > +ppc_cpu_do_system_reset(cs, spapr->fwnmi_system_reset_addr); > } > > static void spapr_nmi(NMIState *n, int cpu_index, Error **errp) > -- > 2.23.0 > > -- Mahesh J Salgaonkar
Re: [PATCH v2 3/8] ppc/spapr: Add FWNMI System Reset state
On 2020-03-17 00:26:08 Tue, Nicholas Piggin wrote: > The FWNMI option must deliver system reset interrupts to their > registered address, and there are a few constraints on the handler > addresses specified in PAPR. Add the system reset address state and > checks. > > Signed-off-by: Nicholas Piggin Looks good to me. Reviwed-by: Mahesh Salgaonkar Thanks, -Mahesh. > --- > hw/ppc/spapr.c | 2 ++ > hw/ppc/spapr_rtas.c| 14 +- > include/hw/ppc/spapr.h | 3 ++- > 3 files changed, 17 insertions(+), 2 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index b03b26370d..5f93c49706 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -1704,6 +1704,7 @@ static void spapr_machine_reset(MachineState *machine) > > spapr->cas_reboot = false; > > +spapr->fwnmi_system_reset_addr = -1; > spapr->fwnmi_machine_check_addr = -1; > spapr->fwnmi_machine_check_interlock = -1; > > @@ -2023,6 +2024,7 @@ static const VMStateDescription vmstate_spapr_fwnmi = { > .needed = spapr_fwnmi_needed, > .pre_save = spapr_fwnmi_pre_save, > .fields = (VMStateField[]) { > +VMSTATE_UINT64(fwnmi_system_reset_addr, SpaprMachineState), > VMSTATE_UINT64(fwnmi_machine_check_addr, SpaprMachineState), > VMSTATE_INT32(fwnmi_machine_check_interlock, SpaprMachineState), > VMSTATE_END_OF_LIST() > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c > index 0b8c481593..521e6b0b72 100644 > --- a/hw/ppc/spapr_rtas.c > +++ b/hw/ppc/spapr_rtas.c > @@ -414,6 +414,7 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu, >uint32_t nret, target_ulong rets) > { > hwaddr rtas_addr; > +target_ulong sreset_addr, mce_addr; > > if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI) == SPAPR_CAP_OFF) { > rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED); > @@ -426,7 +427,18 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu, > return; > } > > -spapr->fwnmi_machine_check_addr = rtas_ld(args, 1); > +sreset_addr = rtas_ld(args, 0); > +mce_addr = rtas_ld(args, 1); > + > +/* PAPR requires these are in the first 32M of memory and within RMA */ > +if (sreset_addr >= 32 * MiB || sreset_addr >= spapr->rma_size || > + mce_addr >= 32 * MiB ||mce_addr >= spapr->rma_size) { > +rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR); > +return; > +} > + > +spapr->fwnmi_system_reset_addr = sreset_addr; > +spapr->fwnmi_machine_check_addr = mce_addr; > > rtas_st(rets, 0, RTAS_OUT_SUCCESS); > } > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h > index 64b83402cb..42d64a0368 100644 > --- a/include/hw/ppc/spapr.h > +++ b/include/hw/ppc/spapr.h > @@ -194,9 +194,10 @@ struct SpaprMachineState { > > /* State related to FWNMI option */ > > -/* Machine Check Notification Routine address > +/* System Reset and Machine Check Notification Routine addresses > * registered by "ibm,nmi-register" RTAS call. > */ > +target_ulong fwnmi_system_reset_addr; > target_ulong fwnmi_machine_check_addr; > > /* Machine Check FWNMI synchronization, fwnmi_machine_check_interlock is > -- > 2.23.0 > > -- Mahesh J Salgaonkar