Re: [PATCH] spapr: Modify ibm, get-config-addr-info2 to set DEVNUM in PE config address.

2021-05-03 Thread Mahesh J Salgaonkar
On 2021-04-30 11:53:24 Fri, Oliver O'Halloran wrote:
> On Thu, Apr 29, 2021 at 7:02 PM Mahesh J Salgaonkar
>  wrote:
> >
> > On 2021-04-28 22:33:45 Wed, Oliver O'Halloran wrote:
> > > On Tue, Apr 27, 2021 at 9:56 PM Mahesh Salgaonkar  
> > > wrote:
> > > >
> > > > With upstream kernel, especially after commit 98ba956f6a389
> > > > ("powerpc/pseries/eeh: Rework device EEH PE determination") we see that 
> > > > KVM
> > > > guest isn't able to enable EEH option for PCI pass-through devices 
> > > > anymore.
> > >
> > > How are you passing the devices through to the guest?
> >
> > I am using libvirt with below xml section to add pass-through:
> >
> > 
> >   
> >   
> > 
> >   
> >> function='0x0' multifunction='on'/>
> > 
> > 
> >   
> >   
> > 
> >   
> >> function='0x1' multifunction='on'/>
> > 
> >
> > Looks like libvirt does not allow pass through device in slot zero, and
> > throws following error.
> >
> > error: XML error: Invalid PCI address :01:00.0. slot must be >= 1
> > Failed. Try again? [y,n,i,f,?]:
> 
> That's pretty odd and I have no idea why that's happening. I seem to
> remember being able to use slot 0 for vfio devices when doing the
> passthru manually with the qemu command line so this might be a
> libvirt quirk.
> 
> > *snip*
> >
> > Agree. I realize my fix is not correctly handling this. The current code
> > under ibm,set-eeh-option is checking for individual PCI device presence.
> > Better fix should be to check if there is any PCI device (vfio-pci)
> > present under specified bus and enable the EEH if found. And no change
> > in return value of get-config-addr-info2. What do you say ?
> 
> That sounds reasonable. You would however need to verify that all the
> devices on that bus are within the same PE on the hypervisor side.

I see that the spapr_phb_eeh_available(sphb) check in ibm,set-eeh-option
already makes sure that all the devices on that bus are from same iommu
group (within same PE) before going ahead with enabling EEH.

-
rtas_ibm_set_eeh_option():
...
if (!spapr_phb_eeh_available(sphb)) {  <---
goto param_error_exit;
}

ret = spapr_phb_vfio_eeh_set_option(sphb, addr, option);
rtas_st(rets, 0, ret);
return;

param_error_exit:
rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
--

I verified that, if we add devices from two different PHB under same
sphb then spapr_phb_eeh_available(sphb) returns false and we fail to
enable EEH. Hence we are good here, we fail very early if devices on
that bus are not within same PE.

So, I just need to check for presence of any vfio-pci device under the
specified bus and enable the EEH. Let me know your views on this.

Thanks,
-Mahesh.

-- 
Mahesh J Salgaonkar



Re: [PATCH] spapr: Modify ibm, get-config-addr-info2 to set DEVNUM in PE config address.

2021-05-03 Thread Mahesh J Salgaonkar
On 2021-04-30 07:52:58 Fri, Daniel Henrique Barboza wrote:
> 
> 
> On 4/29/21 6:02 AM, Mahesh J Salgaonkar wrote:
> > On 2021-04-28 22:33:45 Wed, Oliver O'Halloran wrote:
> > > On Tue, Apr 27, 2021 at 9:56 PM Mahesh Salgaonkar  
> > > wrote:
> > > > 
> > > > With upstream kernel, especially after commit 98ba956f6a389
> > > > ("powerpc/pseries/eeh: Rework device EEH PE determination") we see that 
> > > > KVM
> > > > guest isn't able to enable EEH option for PCI pass-through devices 
> > > > anymore.
> > > 
> > > How are you passing the devices through to the guest?
> > 
> > I am using libvirt with below xml section to add pass-through:
> > 
> >  
> >
> >
> >  
> >
> > > function='0x0' multifunction='on'/>
> >  
> >  
> >
> >
> >  
> >
> > > function='0x1' multifunction='on'/>
> >  
> > 
> > Looks like libvirt does not allow pass through device in slot zero, and
> > throws following error.
> 
> There's no such restriction in Libvirt, at least as far as I remember.
> 
> > 
> > error: XML error: Invalid PCI address :01:00.0. slot must be >= 1
> > Failed. Try again? [y,n,i,f,?]:
> 
> 
> This looks odd. The error message is complaining about :01:00.0 but
> your XML up there is declaring :01:01.0.

Above XML snipphet is working one. I see the XML error when I change slot
value to zero.

> 
> Also, the 'multifunction' bool is usually used only in the function 0
> passthrough, indicating that the guest will configure all other functions as
> the the multifunction device. You can ignore this bool in the XML for
> PCI passthrough if you don't care about the guest seeing this device as
> multifunction (i.e. all functions in the same IOMMU group and so on).
> 

Thanks,
-Mahesh.



Re: [PATCH] spapr: Modify ibm, get-config-addr-info2 to set DEVNUM in PE config address.

2021-04-29 Thread Mahesh J Salgaonkar
 card 1
 0005:ff:02.0 - card 2

And with get-config-addr-info2 returning 00BB0001, ibm,set-eeh-option
RTAS call tries to check if device is present on the bus of spapr_phb at
bus->devices[devfn] where devfn is 0. And since qemu does not support
pass through device on slot zero bus->devices[0] is always NULL. And
hence it fails to enable EEH.

> 
> With this patch applied get-config-addr-info2 returns 00BBD001, so the
> PE we get for each device will be:
> 
> card 1 - 00ff0001
> card 2 - 00ff1001
> 
> Which implies the two are in different PEs. As a result, if the guest
> requests a reset of card 1's PE then the guest will see an unexpected
> reset of card 2 as well. From the hypervisor's point of view the two
> are in the same PE so this is a legitimate thing to do, but due to
> this patch the guest doesn't know that.

Agree. I realize my fix is not correctly handling this. The current code
under ibm,set-eeh-option is checking for individual PCI device presence.
Better fix should be to check if there is any PCI device (vfio-pci)
present under specified bus and enable the EEH if found. And no change
in return value of get-config-addr-info2. What do you say ?

-- 
Mahesh J Salgaonkar



Re: [PATCH] ppc/spapr: Set the effective address provided flag in mc error log.

2020-03-17 Thread Mahesh J Salgaonkar
On 2020-03-17 17:11:22 Tue, Greg Kurz wrote:
> On Tue, 17 Mar 2020 08:51:50 -0700 (PDT)
> no-re...@patchew.org wrote:
> 
> > Patchew URL: 
> > https://patchew.org/QEMU/158444819283.31599.12155058652686614304.stgit@jupiter/
> > 
> > 
> > 
> > Hi,
> > 
> > This series seems to have some coding style problems. See output below for
> > more information:
> > 
> > Subject: [PATCH] ppc/spapr: Set the effective address provided flag in mc 
> > error log.
> > Message-id: 158444819283.31599.12155058652686614304.stgit@jupiter
> > Type: series
> > 
> > === TEST SCRIPT BEGIN ===
> > #!/bin/bash
> > git rev-parse base > /dev/null || exit 0
> > git config --local diff.renamelimit 0
> > git config --local diff.renames True
> > git config --local diff.algorithm histogram
> > ./scripts/checkpatch.pl --mailback base..
> > === TEST SCRIPT END ===
> > 
> > Switched to a new branch 'test'
> > 62d8ada ppc/spapr: Set the effective address provided flag in mc error log.
> > 
> > === OUTPUT BEGIN ===
> > ERROR: code indent should never use tabs
> > #57: FILE: hw/ppc/spapr_events.c:739:
> > +^Iswitch (ext_elog->mc.error_type) {$
> > 
> 
> Yeah no tabs allowed in the QEMU code (see CODING_STYLE.rst).
> 
> If your editor is emacs, you can consider setting these:
> 
>   (setq indent-tabs-mode nil)
>   (setq c-basic-offset 4))

My bad. Will fix it and respin v2.

Thanks,
-Mahesh.




Re: [PATCH v2 2/8] ppc/spapr: Change FWNMI names

2020-03-16 Thread Mahesh J Salgaonkar
On 2020-03-17 00:26:07 Tue, Nicholas Piggin wrote:
> The option is called "FWNMI", and it involves more than just machine
> checks, also machine checks can be delivered without the FWNMI option,
> so re-name various things to reflect that.
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  hw/ppc/spapr.c| 28 ++--
>  hw/ppc/spapr_caps.c   | 14 +++---
>  hw/ppc/spapr_events.c | 14 +++---
>  hw/ppc/spapr_rtas.c   | 17 +
>  include/hw/ppc/spapr.h| 27 +--
>  tests/qtest/libqos/libqos-spapr.h |  2 +-
>  6 files changed, 55 insertions(+), 47 deletions(-)
> 
[...]
> @@ -626,14 +626,14 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>  .type = "bool",
>  .apply = cap_ccf_assist_apply,
>  },
> -[SPAPR_CAP_FWNMI_MCE] = {
> -.name = "fwnmi-mce",
> -.description = "Handle fwnmi machine check exceptions",
> -.index = SPAPR_CAP_FWNMI_MCE,
> +[SPAPR_CAP_FWNMI] = {
> +.name = "fwnmi",

I guess this should be fine and should hit QEMU 5.0 release so that we
don't end up with two different CAP names for 5.0 and future releases.

Thanks,
-Mahesh.




Re: [PATCH v2 7/8] ppc/spapr: Implement FWNMI System Reset delivery

2020-03-16 Thread Mahesh J Salgaonkar
On 2020-03-17 00:26:12 Tue, Nicholas Piggin wrote:
> PAPR requires that if "ibm,nmi-register" succeeds, then the hypervisor
> delivers all system reset and machine check exceptions to the registered
> addresses.
> 
> System Resets are delivered with registers set to the architected state,
> and with no interlock.
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  hw/ppc/spapr.c | 46 --
>  1 file changed, 44 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 25221d843c..78e649f47d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -967,7 +967,29 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, void 
> *fdt)
>  _FDT(fdt_setprop(fdt, rtas, "ibm,max-associativity-domains",
>   maxdomains, sizeof(maxdomains)));
> 
> -_FDT(fdt_setprop_cell(fdt, rtas, "rtas-size", RTAS_SIZE));
> +/*
> + * FWNMI reserves RTAS_ERROR_LOG_MAX for the machine check error log,
> + * and 16 bytes per CPU for system reset error log plus an extra 8 bytes.
> + *
> + * The system reset requirements are driven by existing Linux and PowerVM
> + * implementation which (contrary to PAPR) saves r3 in the error log
> + * structure like machine check, so Linux expects to find the saved r3
> + * value at the address in r3 upon FWNMI-enabled sreset interrupt (and
> + * does not look at the error value).
> + *
> + * System reset interrupts are not subject to interlock like machine
> + * check, so this memory area could be corrupted if the sreset is
> + * interrupted by a machine check (or vice versa) if it was shared. To
> + * prevent this, system reset uses per-CPU areas for the sreset save
> + * area. A system reset that interrupts a system reset handler could
> + * still overwrite this area, but Linux doesn't try to recover in that
> + * case anyway.
> + *
> + * The extra 8 bytes is required because Linux's FWNMI error log check
> + * is off-by-one.
> + */
> +_FDT(fdt_setprop_cell(fdt, rtas, "rtas-size", RTAS_ERROR_LOG_MAX +
> +   ms->smp.max_cpus * sizeof(uint64_t)*2 + 
> sizeof(uint64_t)));

Currently the rtas region is only of size 2048 (i.e RTAS_ERROR_LOG_MAX).
Do we need SLOF change to increase rtas area as well ? Otherwise QEMU
may corrupt guest memory area OR Am I wrong ?

Thanks,
-Mahesh/

>  _FDT(fdt_setprop_cell(fdt, rtas, "rtas-error-log-max",
>RTAS_ERROR_LOG_MAX));
>  _FDT(fdt_setprop_cell(fdt, rtas, "rtas-event-scan-rate",
> @@ -3399,8 +3421,28 @@ static void spapr_machine_finalizefn(Object *obj)
> 
>  void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg)
>  {
> +SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +
>  cpu_synchronize_state(cs);
> -ppc_cpu_do_system_reset(cs, -1);
> +/* If FWNMI is inactive, addr will be -1, which will deliver to 0x100 */
> +if (spapr->fwnmi_system_reset_addr != -1) {
> +uint64_t rtas_addr, addr;
> +PowerPCCPU *cpu = POWERPC_CPU(cs);
> +CPUPPCState *env = >env;
> +
> +/* get rtas addr from fdt */
> +rtas_addr = spapr_get_rtas_addr();
> +if (!rtas_addr) {
> +qemu_system_guest_panicked(NULL);
> +return;
> +}
> +
> +addr = rtas_addr + RTAS_ERROR_LOG_MAX + cs->cpu_index * 
> sizeof(uint64_t)*2;
> +stq_be_phys(_space_memory, addr, env->gpr[3]);
> +stq_be_phys(_space_memory, addr + sizeof(uint64_t), 0);
> +env->gpr[3] = addr;
> +}
> +ppc_cpu_do_system_reset(cs, spapr->fwnmi_system_reset_addr);
>  }
> 
>  static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
> -- 
> 2.23.0
> 
> 

-- 
Mahesh J Salgaonkar




Re: [PATCH v2 3/8] ppc/spapr: Add FWNMI System Reset state

2020-03-16 Thread Mahesh J Salgaonkar
On 2020-03-17 00:26:08 Tue, Nicholas Piggin wrote:
> The FWNMI option must deliver system reset interrupts to their
> registered address, and there are a few constraints on the handler
> addresses specified in PAPR. Add the system reset address state and
> checks.
> 
> Signed-off-by: Nicholas Piggin 

Looks good to me.

Reviwed-by: Mahesh Salgaonkar 

Thanks,
-Mahesh.
> ---
>  hw/ppc/spapr.c |  2 ++
>  hw/ppc/spapr_rtas.c| 14 +-
>  include/hw/ppc/spapr.h |  3 ++-
>  3 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index b03b26370d..5f93c49706 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1704,6 +1704,7 @@ static void spapr_machine_reset(MachineState *machine)
> 
>  spapr->cas_reboot = false;
> 
> +spapr->fwnmi_system_reset_addr = -1;
>  spapr->fwnmi_machine_check_addr = -1;
>  spapr->fwnmi_machine_check_interlock = -1;
> 
> @@ -2023,6 +2024,7 @@ static const VMStateDescription vmstate_spapr_fwnmi = {
>  .needed = spapr_fwnmi_needed,
>  .pre_save = spapr_fwnmi_pre_save,
>  .fields = (VMStateField[]) {
> +VMSTATE_UINT64(fwnmi_system_reset_addr, SpaprMachineState),
>  VMSTATE_UINT64(fwnmi_machine_check_addr, SpaprMachineState),
>  VMSTATE_INT32(fwnmi_machine_check_interlock, SpaprMachineState),
>  VMSTATE_END_OF_LIST()
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 0b8c481593..521e6b0b72 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -414,6 +414,7 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>uint32_t nret, target_ulong rets)
>  {
>  hwaddr rtas_addr;
> +target_ulong sreset_addr, mce_addr;
> 
>  if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI) == SPAPR_CAP_OFF) {
>  rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> @@ -426,7 +427,18 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>  return;
>  }
> 
> -spapr->fwnmi_machine_check_addr = rtas_ld(args, 1);
> +sreset_addr = rtas_ld(args, 0);
> +mce_addr = rtas_ld(args, 1);
> +
> +/* PAPR requires these are in the first 32M of memory and within RMA */
> +if (sreset_addr >= 32 * MiB || sreset_addr >= spapr->rma_size ||
> +   mce_addr >= 32 * MiB ||mce_addr >= spapr->rma_size) {
> +rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +return;
> +}
> +
> +spapr->fwnmi_system_reset_addr = sreset_addr;
> +spapr->fwnmi_machine_check_addr = mce_addr;
> 
>  rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>  }
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 64b83402cb..42d64a0368 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -194,9 +194,10 @@ struct SpaprMachineState {
> 
>  /* State related to FWNMI option */
> 
> -/* Machine Check Notification Routine address
> +/* System Reset and Machine Check Notification Routine addresses
>   * registered by "ibm,nmi-register" RTAS call.
>   */
> +target_ulong fwnmi_system_reset_addr;
>  target_ulong fwnmi_machine_check_addr;
> 
>  /* Machine Check FWNMI synchronization, fwnmi_machine_check_interlock is
> -- 
> 2.23.0
> 
> 

-- 
Mahesh J Salgaonkar