Re: [PATCH 3/8] drivers/vfio: New IOCTL command VFIO_EEH_INFO
On Mon, May 19, 2014 at 06:37:24PM -0600, Alex Williamson wrote: >On Tue, 2014-05-20 at 10:22 +1000, Gavin Shan wrote: >> On Mon, May 19, 2014 at 04:33:10PM -0600, Alex Williamson wrote: >> >On Wed, 2014-05-14 at 14:11 +1000, Gavin Shan wrote: >> >> The patch adds new IOCTL command VFIO_EEH_INFO to VFIO container >> >> to support EEH functionality for PCI devices, which have been >> >> passed from host to guest via VFIO. >> >> Thanks for your comments, Alex.W :-) >> >> > >> >Some comments throughout, but overall this seems to forgo every bit of >> >the device ownership and protection model used by VFIO and lets the user >> >pick arbitrary host devices and do various operations, mostly unchecked. >> >That's not acceptable. >> > >> >> As what I replied to patch[2], I'm going to let VFIO-PCI-dev fd handle >> the newly introduced IOCTL command. That way, we should follow the VFIO >> design principles (ownership and protection) because VFIO-PCI-dev fd >> is owned by QEMU process usually. >> >> Also, the address mapping maintained in EEH will be removed. >> >> >> Signed-off-by: Gavin Shan >> >> --- >> >> arch/powerpc/platforms/powernv/Makefile | 1 + >> >> arch/powerpc/platforms/powernv/eeh-vfio.c | 593 >> >> ++ >> >> drivers/vfio/vfio_iommu_spapr_tce.c | 12 + >> >> include/uapi/linux/vfio.h | 57 +++ >> >> 4 files changed, 663 insertions(+) >> >> create mode 100644 arch/powerpc/platforms/powernv/eeh-vfio.c >> >> >> >> diff --git a/arch/powerpc/platforms/powernv/Makefile >> >> b/arch/powerpc/platforms/powernv/Makefile >> >> index 63cebb9..2b15a03 100644 >> >> --- a/arch/powerpc/platforms/powernv/Makefile >> >> +++ b/arch/powerpc/platforms/powernv/Makefile >> >> @@ -6,5 +6,6 @@ obj-y += opal-msglog.o >> >> obj-$(CONFIG_SMP)+= smp.o >> >> obj-$(CONFIG_PCI)+= pci.o pci-p5ioc2.o pci-ioda.o >> >> obj-$(CONFIG_EEH)+= eeh-ioda.o eeh-powernv.o >> >> +obj-$(CONFIG_VFIO_EEH) += eeh-vfio.o >> >> obj-$(CONFIG_PPC_SCOM) += opal-xscom.o >> >> obj-$(CONFIG_MEMORY_FAILURE) += opal-memory-errors.o >> >> diff --git a/arch/powerpc/platforms/powernv/eeh-vfio.c >> >> b/arch/powerpc/platforms/powernv/eeh-vfio.c >> >> new file mode 100644 >> >> index 000..69d5f2d >> >> --- /dev/null >> >> +++ b/arch/powerpc/platforms/powernv/eeh-vfio.c >> >> @@ -0,0 +1,593 @@ >> >> +/* >> >> + * The file intends to support EEH funtionality for those PCI devices, >> >> + * which have been passed through from host to guest via VFIO. So this >> >> + * file is naturally part of VFIO implementation on PowerNV platform. >> >> + * >> >> + * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2014. >> >> + * >> >> + * This program is free software; you can redistribute it and/or modify >> >> + * it under the terms of the GNU General Public License as published by >> >> + * the Free Software Foundation; either version 2 of the License, or >> >> + * (at your option) any later version. >> >> + */ >> >> + >> >> +#include >> >> +#include >> >> +#include >> >> +#include >> >> +#include >> >> +#include >> >> +#include >> >> +#include >> >> +#include >> >> + >> >> +#include >> >> +#include >> >> +#include >> >> +#include >> >> +#include >> >> +#include >> >> +#include >> >> +#include >> >> +#include >> >> +#include >> >> + >> >> +#include "powernv.h" >> >> +#include "pci.h" >> >> + >> >> +static int powernv_eeh_vfio_map(struct vfio_eeh_info *info) >> >> +{ >> >> + struct pci_bus *bus, *pe_bus; >> >> + struct pci_dev *pdev; >> >> + struct eeh_dev *edev; >> >> + struct eeh_pe *pe; >> >> + int domain, bus_no, devfn; >> >> + >> >> + /* Host address */ >> >> + domain = info->map.host_domain; >> >> + bus_no = (info->map.host_cfg_addr >> 8) & 0xff; >> >> + devfn = info->map.host_cfg_addr & 0xff; >> > >> >Where are we validating that the user has any legitimate claim to be >> >touching this device? >> > >> >> I'll let VFIO-PCI-dev fd handle the IOCTL command. With that, we shouldn't >> have the problem. >> >> >> + /* Find PCI bus */ >> >> + bus = pci_find_bus(domain, bus_no); >> >> + if (!bus) { >> >> + pr_warn("%s: PCI bus %04x:%02x not found\n", >> >> + __func__, domain, bus_no); >> >> + return -ENODEV; >> >> + } >> >> + >> >> + /* Find PCI device */ >> >> + pdev = pci_get_slot(bus, devfn); >> >> + if (!pdev) { >> >> + pr_warn("%s: PCI device %04x:%02x:%02x.%01x not found\n", >> >> + __func__, domain, bus_no, >> >> + PCI_SLOT(devfn), PCI_FUNC(devfn)); >> >> + return -ENODEV; >> >> + } >> >> + >> >> + /* No EEH device - almost impossible */ >> >> + edev = pci_dev_to_eeh_dev(pdev); >> >> + if (unlikely(!edev)) { >> >> + pci_dev_put(pdev); >> >> + pr_warn("%s: No EEH dev for PCI device %s\n", >> >> + __func__, pci_name(pdev)); >> >> + return -ENODEV; >> >> + } >> >> + >> >> + /* Doesn't
[PATCH 2/4] powerpc/eeh: Flags for passed device and PE
The patch introduces new flags for EEH device and PE to indicate that the device or PE has been passed through to guest. In turn, we will deliver EEH errors to guest for further handling, which will be done in subsequent patches. Signed-off-by: Gavin Shan --- arch/powerpc/include/asm/eeh.h | 32 1 file changed, 32 insertions(+) diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h index 7782056..34a2d83 100644 --- a/arch/powerpc/include/asm/eeh.h +++ b/arch/powerpc/include/asm/eeh.h @@ -72,6 +72,7 @@ struct device_node; #define EEH_PE_RESET (1 << 2)/* PE reset in progress */ #define EEH_PE_KEEP(1 << 8)/* Keep PE on hotplug */ +#define EEH_PE_PASSTHROUGH (1 << 9)/* PE owned by guest*/ struct eeh_pe { int type; /* PE type: PHB/Bus/Device */ @@ -93,6 +94,21 @@ struct eeh_pe { #define eeh_pe_for_each_dev(pe, edev, tmp) \ list_for_each_entry_safe(edev, tmp, &pe->edevs, list) +static inline bool eeh_pe_passed(struct eeh_pe *pe) +{ + return pe ? !!(pe->state & EEH_PE_PASSTHROUGH) : false; +} + +static inline void eeh_pe_set_passed(struct eeh_pe *pe, bool passed) +{ + if (pe) { + if (passed) + pe->state |= EEH_PE_PASSTHROUGH; + else + pe->state &= ~EEH_PE_PASSTHROUGH; + } +} + /* * The struct is used to trace EEH state for the associated * PCI device node or PCI device. In future, it might @@ -110,6 +126,7 @@ struct eeh_pe { #define EEH_DEV_SYSFS (1 << 9)/* Sysfs created*/ #define EEH_DEV_REMOVED(1 << 10) /* Removed permanently */ #define EEH_DEV_FRESET (1 << 11) /* Fundamental reset*/ +#define EEH_DEV_PASSTHROUGH(1 << 12) /* Owned by guest */ struct eeh_dev { int mode; /* EEH mode */ @@ -138,6 +155,21 @@ static inline struct pci_dev *eeh_dev_to_pci_dev(struct eeh_dev *edev) return edev ? edev->pdev : NULL; } +static inline bool eeh_dev_passed(struct eeh_dev *dev) +{ + return dev ? !!(dev->mode & EEH_DEV_PASSTHROUGH) : false; +} + +static inline void eeh_dev_set_passed(struct eeh_dev *dev, bool passed) +{ + if (dev) { + if (passed) + dev->mode |= EEH_DEV_PASSTHROUGH; + else + dev->mode &= ~EEH_DEV_PASSTHROUGH; + } +} + /* Return values from eeh_ops::next_error */ enum { EEH_NEXT_ERR_NONE = 0, -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] drivers/vfio: Introduce CONFIG_VFIO_PCI_EEH
The patch introduces CONFIG_VFIO_PCI_EEH for more IOCTL commands on VFIO PCI device support EEH funtionality for PCI devices that are passed through from host to guest. Signed-off-by: Gavin Shan --- drivers/vfio/pci/Kconfig | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig index c41b01e..dd0e0f0 100644 --- a/drivers/vfio/pci/Kconfig +++ b/drivers/vfio/pci/Kconfig @@ -1,6 +1,7 @@ config VFIO_PCI tristate "VFIO support for PCI devices" depends on VFIO && PCI && EVENTFD + select VFIO_PCI_EEH if PPC_POWERNV help Support for the PCI VFIO bus driver. This is required to make use of PCI drivers using the VFIO framework. @@ -16,3 +17,8 @@ config VFIO_PCI_VGA BIOS and generic video drivers. If you don't know what to do here, say N. + +config VFIO_PCI_EEH + tristate + depends on EEH && VFIO_IOMMU_SPAPR_TCE + default n -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] drivers/vfio: New IOCTL command VFIO_EEH_INFO
The patch adds new IOCTL command VFIO_EEH_OP to VFIO PCI device to support EEH functionality for PCI devices, which have been passed from host to guest via VFIO. Signed-off-by: Gavin Shan --- arch/powerpc/platforms/powernv/Makefile | 1 + arch/powerpc/platforms/powernv/eeh-vfio.c | 445 ++ drivers/vfio/pci/vfio_pci.c | 24 +- drivers/vfio/pci/vfio_pci_private.h | 16 ++ include/uapi/linux/vfio.h | 43 +++ 5 files changed, 523 insertions(+), 6 deletions(-) create mode 100644 arch/powerpc/platforms/powernv/eeh-vfio.c diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index 63cebb9..45cd833 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -6,5 +6,6 @@ obj-y += opal-msglog.o obj-$(CONFIG_SMP) += smp.o obj-$(CONFIG_PCI) += pci.o pci-p5ioc2.o pci-ioda.o obj-$(CONFIG_EEH) += eeh-ioda.o eeh-powernv.o +obj-$(CONFIG_VFIO_PCI_EEH) += eeh-vfio.o obj-$(CONFIG_PPC_SCOM) += opal-xscom.o obj-$(CONFIG_MEMORY_FAILURE) += opal-memory-errors.o diff --git a/arch/powerpc/platforms/powernv/eeh-vfio.c b/arch/powerpc/platforms/powernv/eeh-vfio.c new file mode 100644 index 000..11adc55 --- /dev/null +++ b/arch/powerpc/platforms/powernv/eeh-vfio.c @@ -0,0 +1,445 @@ +/* + * The file intends to support EEH funtionality for those PCI devices, + * which have been passed through from host to guest via VFIO. So this + * file is naturally part of VFIO implementation on PowerNV platform. + * + * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2014. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "powernv.h" +#include "pci.h" + +static int powernv_eeh_vfio_check_dev(struct pci_dev *pdev, + struct eeh_dev **pedev, + struct eeh_pe **ppe, + struct pnv_phb **pphb) +{ + struct eeh_dev *edev; + struct pnv_phb *phb; + + /* No device ? */ + if (!pdev) + return -ENODEV; + + edev = pci_dev_to_eeh_dev(pdev); + if (!edev || !eeh_dev_passed(edev) || + !edev->pe || !eeh_pe_passed(edev->pe)) + return -ENODEV; + + /* EEH isn't supported ? */ + phb = edev->phb->private_data; + if (!(phb->flags & PNV_PHB_FLAG_EEH)) + return -EACCES; + + if (pedev) + *pedev = edev; + if (ppe) + *ppe = edev->pe; + if (pphb) + *pphb = phb; + + return 0; +} + +static int powernv_eeh_vfio_set_option(struct pci_dev *pdev, + struct vfio_eeh_op *info) +{ + struct eeh_dev *edev; + struct eeh_pe *pe; + struct pnv_phb *phb; + int opcode = info->option.option; + int ret = 0; + + /* Device existing ? */ + ret = powernv_eeh_vfio_check_dev(pdev, &edev, &pe, &phb); + if (ret) { + pr_debug("%s: Cannot find device\n", + __func__); + info->option.ret = -7; + goto out; + } + + /* Invalid opcode ? */ + if (opcode < EEH_OPT_DISABLE || + opcode > EEH_OPT_THAW_DMA) { + pr_debug("%s: Opcode#%d out of range (%d, %d)\n", +__func__, opcode, EEH_OPT_DISABLE, EEH_OPT_THAW_DMA); + info->option.ret = -3; + ret = -EINVAL; + goto out; + } + + if (opcode == EEH_OPT_DISABLE || + opcode == EEH_OPT_ENABLE) { + info->option.ret = 0; + } else { + if (!phb->eeh_ops || !phb->eeh_ops->set_option) { + info->option.ret = -7; + ret = -ENOENT; + goto out; + } + + ret = phb->eeh_ops->set_option(pe, opcode); + if (ret) { + pr_debug("%s: Failure %d from backend\n", + __func__, ret); + info->option.ret = -3; + goto out; + } + + info->option.ret = 0; + } +out: + return ret; +} + +static int powernv_eeh_vfio_get_addr(struct pci_dev *pdev, +struct vfio_eeh_op *info) +{ + struct pci_bus *bus; + struct eeh_dev *edev; + struct eeh_pe *pe; + struct pnv_phb *phb; + int o
[PATCH 4/4] powerpc/eeh: Avoid event on passed PE
If we detects frozen state on PE that has been passed to guest, we needn't handle it. Instead, we rely on the guest to detect and recover it. The patch avoid EEH event on the frozen passed PE so that the guest can have chance to handle that. Signed-off-by: Gavin Shan --- arch/powerpc/kernel/eeh.c | 8 arch/powerpc/platforms/powernv/eeh-ioda.c | 3 ++- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index 9c6b899..6543f05 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -400,6 +400,14 @@ int eeh_dev_check_failure(struct eeh_dev *edev) if (ret > 0) return ret; + /* +* If the PE has been passed to guest, we won't check the +* state. Instead, let the guest handle it if the PE has +* been frozen. +*/ + if (eeh_pe_passed(pe)) + return 0; + /* If we already have a pending isolation event for this * slot, we know it's bad already, we don't need to check. * Do this checking under a lock; as multiple PCI devices diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c index 1b5982f..03a3ed2 100644 --- a/arch/powerpc/platforms/powernv/eeh-ioda.c +++ b/arch/powerpc/platforms/powernv/eeh-ioda.c @@ -890,7 +890,8 @@ static int ioda_eeh_next_error(struct eeh_pe **pe) opal_pci_eeh_freeze_clear(phb->opal_id, frozen_pe_no, OPAL_EEH_ACTION_CLEAR_FREEZE_ALL); ret = EEH_NEXT_ERR_NONE; - } else if ((*pe)->state & EEH_PE_ISOLATED) { + } else if ((*pe)->state & EEH_PE_ISOLATED || + eeh_pe_passed(*pe)) { ret = EEH_NEXT_ERR_NONE; } else { pr_err("EEH: Frozen PHB#%x-PE#%x (%s) detected\n", -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFCv4 0/4] EEH Support for VFIO PCI device
The series of patches intends to support EEH for PCI devices, which are passed through to PowerKVM based guest via VFIO. The implementation is straightforward based on the issues or problems we have to resolve to support EEH for PowerKVM based guest. - Emulation for EEH RTAS requests. All EEH RTAS requests goes to QEMU firstly. If QEMU can't handle it, the request will be sent to host via newly introduced VFIO container IOCTL command (VFIO_EEH_OP) and gets handled in host kernel. The series of patches requires corresponding QEMU changes. Change log == v1 -> v2: * EEH RTAS requests are routed to QEMU, and then possiblly to host kerenl. The mechanism KVM in-kernel handling is dropped. * Error injection is reimplemented based syscall, instead of KVM in-kerenl handling. The logic for error injection token management is moved to QEMU. The error injection request is routed to QEMU and then possiblly to host kernel. v2 -> v3: * Make the fields in struct eeh_vfio_pci_addr, struct vfio_eeh_info based on the comments from Alexey. * Define macros for EEH VFIO operations (Alexey). * Clear frozen state after successful PE reset. * Merge original [PATCH 1/2/3] to one. v3 -> v4: * Remove the error injection from the patchset. Mike or I will work on that later. * Rename CONFIG_VFIO_EEH to VFIO_PCI_EEH. * Rename the IOCTL command to VFIO_EEH_OP and it's handled by VFIO-PCI device instead of VFIO container. * Rename the IOCTL argument structure to "vfio_eeh_op" accordingly. Also, more fields added to hold return values for RTAS requests. * The address mapping stuff is totally removed. When opening or releasing VFIO PCI device, notification sent to EEH to update the flags indicates the device is passed to guest or not. * Change pr_warn() to pr_debug() to avoid DOS as pointed by Alex.W * Argument size check issue pointed by Alex.W. Testing on P7 = - Emulex adapter Testing on P8 = Need testing for more after the code is finalized. The logic is required by P7/P8 machines shouldn't be different. Gavin Shan (4): drivers/vfio: Introduce CONFIG_VFIO_PCI_EEH powerpc/eeh: Flags for passed device and PE drivers/vfio: New IOCTL command VFIO_EEH_INFO powerpc/eeh: Avoid event on passed PE arch/powerpc/include/asm/eeh.h| 32 +++ arch/powerpc/kernel/eeh.c | 8 + arch/powerpc/platforms/powernv/Makefile | 1 + arch/powerpc/platforms/powernv/eeh-ioda.c | 3 +- arch/powerpc/platforms/powernv/eeh-vfio.c | 445 ++ drivers/vfio/pci/Kconfig | 6 + drivers/vfio/pci/vfio_pci.c | 24 +- drivers/vfio/pci/vfio_pci_private.h | 16 ++ include/uapi/linux/vfio.h | 43 +++ 9 files changed, 571 insertions(+), 7 deletions(-) create mode 100644 arch/powerpc/platforms/powernv/eeh-vfio.c -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/6] KVM: PPC: Book3S PR: Expose TM registers
On Mon, May 19, 2014 at 03:09:07PM +0200, Alexander Graf wrote: > > On 17.05.14 08:20, Paul Mackerras wrote: > >On Tue, Apr 29, 2014 at 06:17:42PM +0200, Alexander Graf wrote: > >>POWER8 introduces transactional memory which brings along a number of new > >>registers and MSR bits. > >> > >>Implementing all of those is a pretty big headache, so for now let's at > >>least > >>emulate enough to make Linux's context switching code happy. > >[snip] > > > >>- if (!(vcpu->arch.fscr & (1ULL << fac))) { > >>+ /* We get TM interrupts only when EBB is disabled? Sigh. */ > >This comment doesn't make sense to me. Not every reason code reported > >in the high bits of FSCR corresponds directly to an enable bit in > >FSCR. In fact, of the 7 defined reason codes in POWER8, only three > >correspond to an enable bit... > > Is there any documentation on which relate to what? Yes, Power ISA v2.07 book 3S section 6.2.10 describes the FSCR enable bits and the interruption cause field. There are 6 cause values defined, of which 3 correspond to enable bits in the FSCR, and the other 3 correspond to things enabled/disabled in MMCR0 (usermode PMC anb BHRB access) or MSR (TM stuff). Paul. -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/8] drivers/vfio: New IOCTL command VFIO_EEH_INFO
On 20.05.14 10:28, Gavin Shan wrote: On Mon, May 19, 2014 at 06:37:24PM -0600, Alex Williamson wrote: On Tue, 2014-05-20 at 10:22 +1000, Gavin Shan wrote: On Mon, May 19, 2014 at 04:33:10PM -0600, Alex Williamson wrote: On Wed, 2014-05-14 at 14:11 +1000, Gavin Shan wrote: The patch adds new IOCTL command VFIO_EEH_INFO to VFIO container to support EEH functionality for PCI devices, which have been passed from host to guest via VFIO. [...] diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index cb9023d..1fd1bfb 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -455,6 +455,63 @@ struct vfio_iommu_spapr_tce_info { #define VFIO_IOMMU_SPAPR_TCE_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12) +/* + * The VFIO EEH info struct provides way to support EEH functionality + * for PCI device that is passed from host to guest via VFIO. + */ +#define VFIO_EEH_OP_MAP0 +#define VFIO_EEH_OP_UNMAP 1 +#define VFIO_EEH_OP_SET_OPTION 2 +#define VFIO_EEH_OP_GET_ADDR 3 +#define VFIO_EEH_OP_GET_STATE 4 +#define VFIO_EEH_OP_PE_RESET 5 +#define VFIO_EEH_OP_PE_CONFIG 6 Is this really an "info" ioctl? Yeah, "VFIO_EEH_INFO" isn't a good name. How about to have "VFIO_EEH_HANDLER" ? VFIO_EEH_OP perhaps. Thanks, Ok. Will rename it to VFIO_EEH_OP in next revision. Is there any benefit of a multiplexing EEH ioctl over just 7 individual ioctls? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/8] drivers/vfio: New IOCTL command VFIO_EEH_INFO
On Tue, May 20, 2014 at 12:02:22PM +0200, Alexander Graf wrote: > >On 20.05.14 10:28, Gavin Shan wrote: >>On Mon, May 19, 2014 at 06:37:24PM -0600, Alex Williamson wrote: >>>On Tue, 2014-05-20 at 10:22 +1000, Gavin Shan wrote: On Mon, May 19, 2014 at 04:33:10PM -0600, Alex Williamson wrote: >On Wed, 2014-05-14 at 14:11 +1000, Gavin Shan wrote: >>The patch adds new IOCTL command VFIO_EEH_INFO to VFIO container >>to support EEH functionality for PCI devices, which have been >>passed from host to guest via VFIO. > >[...] > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index cb9023d..1fd1bfb 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -455,6 +455,63 @@ struct vfio_iommu_spapr_tce_info { #define VFIO_IOMMU_SPAPR_TCE_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12) +/* + * The VFIO EEH info struct provides way to support EEH functionality + * for PCI device that is passed from host to guest via VFIO. + */ +#define VFIO_EEH_OP_MAP0 +#define VFIO_EEH_OP_UNMAP 1 +#define VFIO_EEH_OP_SET_OPTION 2 +#define VFIO_EEH_OP_GET_ADDR 3 +#define VFIO_EEH_OP_GET_STATE 4 +#define VFIO_EEH_OP_PE_RESET 5 +#define VFIO_EEH_OP_PE_CONFIG 6 >Is this really an "info" ioctl? > Yeah, "VFIO_EEH_INFO" isn't a good name. How about to have "VFIO_EEH_HANDLER" ? >>>VFIO_EEH_OP perhaps. Thanks, >>> >>Ok. Will rename it to VFIO_EEH_OP in next revision. > >Is there any benefit of a multiplexing EEH ioctl over just 7 >individual ioctls? > One benefit is pass one data struct to the real handler implemented in arch/powerpc/platforms/powernv/eeh-vfio.c. With 7 commands, we either passing "void *arg" + "command" to the handler, or export 7 functions with EXPORT_SYMBOL_GPL(). I just send RFCv4 out. VFIO_EEH_OP_{MAP, UNMAP} is removed there. Thanks, Gavin -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] drivers/vfio: New IOCTL command VFIO_EEH_INFO
On 20.05.14 10:30, Gavin Shan wrote: The patch adds new IOCTL command VFIO_EEH_OP to VFIO PCI device to support EEH functionality for PCI devices, which have been passed from host to guest via VFIO. Signed-off-by: Gavin Shan --- arch/powerpc/platforms/powernv/Makefile | 1 + arch/powerpc/platforms/powernv/eeh-vfio.c | 445 ++ drivers/vfio/pci/vfio_pci.c | 24 +- drivers/vfio/pci/vfio_pci_private.h | 16 ++ include/uapi/linux/vfio.h | 43 +++ 5 files changed, 523 insertions(+), 6 deletions(-) create mode 100644 arch/powerpc/platforms/powernv/eeh-vfio.c Why doesn't this code live inside the vfio module? If I don't load the vfio module, I don't need that code to waste memory in my kernel, no? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] powerpc/eeh: Avoid event on passed PE
On 20.05.14 10:30, Gavin Shan wrote: If we detects frozen state on PE that has been passed to guest, we needn't handle it. Instead, we rely on the guest to detect and recover it. The patch avoid EEH event on the frozen passed PE so that the guest can have chance to handle that. Signed-off-by: Gavin Shan How does the guest learn about this failure? We'd need to inject an error into it, no? I think what you want is an irqfd that the in-kernel eeh code notifies when it sees a failure. When such an fd exists, the kernel skips its own error handling. Alex --- arch/powerpc/kernel/eeh.c | 8 arch/powerpc/platforms/powernv/eeh-ioda.c | 3 ++- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index 9c6b899..6543f05 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -400,6 +400,14 @@ int eeh_dev_check_failure(struct eeh_dev *edev) if (ret > 0) return ret; + /* +* If the PE has been passed to guest, we won't check the +* state. Instead, let the guest handle it if the PE has +* been frozen. +*/ + if (eeh_pe_passed(pe)) + return 0; + /* If we already have a pending isolation event for this * slot, we know it's bad already, we don't need to check. * Do this checking under a lock; as multiple PCI devices diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c index 1b5982f..03a3ed2 100644 --- a/arch/powerpc/platforms/powernv/eeh-ioda.c +++ b/arch/powerpc/platforms/powernv/eeh-ioda.c @@ -890,7 +890,8 @@ static int ioda_eeh_next_error(struct eeh_pe **pe) opal_pci_eeh_freeze_clear(phb->opal_id, frozen_pe_no, OPAL_EEH_ACTION_CLEAR_FREEZE_ALL); ret = EEH_NEXT_ERR_NONE; - } else if ((*pe)->state & EEH_PE_ISOLATED) { + } else if ((*pe)->state & EEH_PE_ISOLATED || + eeh_pe_passed(*pe)) { ret = EEH_NEXT_ERR_NONE; } else { pr_err("EEH: Frozen PHB#%x-PE#%x (%s) detected\n", -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] drivers/vfio: New IOCTL command VFIO_EEH_INFO
On 20.05.14 13:21, Alexander Graf wrote: On 20.05.14 10:30, Gavin Shan wrote: The patch adds new IOCTL command VFIO_EEH_OP to VFIO PCI device to support EEH functionality for PCI devices, which have been passed from host to guest via VFIO. Signed-off-by: Gavin Shan --- arch/powerpc/platforms/powernv/Makefile | 1 + arch/powerpc/platforms/powernv/eeh-vfio.c | 445 ++ drivers/vfio/pci/vfio_pci.c | 24 +- drivers/vfio/pci/vfio_pci_private.h | 16 ++ include/uapi/linux/vfio.h | 43 +++ 5 files changed, 523 insertions(+), 6 deletions(-) create mode 100644 arch/powerpc/platforms/powernv/eeh-vfio.c Why doesn't this code live inside the vfio module? If I don't load the vfio module, I don't need that code to waste memory in my kernel, no? So I think from a modeling point of view, you want VFIO code that calls reasonably generic helpers inside the kernel to deal with errors. The "generic helpers" don't have anything to do with VFIO. Everything that interfaces via ioctls with user space is 100% VFIO code. The latter should be tristate inside vfio.ko, the former can be =y. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] drivers/vfio: New IOCTL command VFIO_EEH_INFO
On Tue, May 20, 2014 at 01:28:40PM +0200, Alexander Graf wrote: > >On 20.05.14 13:21, Alexander Graf wrote: >> >>On 20.05.14 10:30, Gavin Shan wrote: >>>The patch adds new IOCTL command VFIO_EEH_OP to VFIO PCI device >>>to support EEH functionality for PCI devices, which have been >>>passed from host to guest via VFIO. >>> >>>Signed-off-by: Gavin Shan >>>--- >>> arch/powerpc/platforms/powernv/Makefile | 1 + >>> arch/powerpc/platforms/powernv/eeh-vfio.c | 445 >>>++ >>> drivers/vfio/pci/vfio_pci.c | 24 +- >>> drivers/vfio/pci/vfio_pci_private.h | 16 ++ >>> include/uapi/linux/vfio.h | 43 +++ >>> 5 files changed, 523 insertions(+), 6 deletions(-) >>> create mode 100644 arch/powerpc/platforms/powernv/eeh-vfio.c >> >>Why doesn't this code live inside the vfio module? If I don't load >>the vfio module, I don't need that code to waste memory in my >>kernel, no? Yes, It saves some memory. > >So I think from a modeling point of view, you want VFIO code that >calls reasonably generic helpers inside the kernel to deal with >errors. > >The "generic helpers" don't have anything to do with VFIO. Everything >that interfaces via ioctls with user space is 100% VFIO code. > >The latter should be tristate inside vfio.ko, the former can be =y. > The main reason I put eeh-vfio.c to arch/powerpc/platforms/powernv/ is the source file needs access data structures (struct pnv_phb) defined in "pci.h" under that directory. Thanks, Gavin -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] drivers/vfio: New IOCTL command VFIO_EEH_INFO
On 20.05.14 13:40, Gavin Shan wrote: On Tue, May 20, 2014 at 01:28:40PM +0200, Alexander Graf wrote: On 20.05.14 13:21, Alexander Graf wrote: On 20.05.14 10:30, Gavin Shan wrote: The patch adds new IOCTL command VFIO_EEH_OP to VFIO PCI device to support EEH functionality for PCI devices, which have been passed from host to guest via VFIO. Signed-off-by: Gavin Shan --- arch/powerpc/platforms/powernv/Makefile | 1 + arch/powerpc/platforms/powernv/eeh-vfio.c | 445 ++ drivers/vfio/pci/vfio_pci.c | 24 +- drivers/vfio/pci/vfio_pci_private.h | 16 ++ include/uapi/linux/vfio.h | 43 +++ 5 files changed, 523 insertions(+), 6 deletions(-) create mode 100644 arch/powerpc/platforms/powernv/eeh-vfio.c Why doesn't this code live inside the vfio module? If I don't load the vfio module, I don't need that code to waste memory in my kernel, no? Yes, It saves some memory. So I think from a modeling point of view, you want VFIO code that calls reasonably generic helpers inside the kernel to deal with errors. The "generic helpers" don't have anything to do with VFIO. Everything that interfaces via ioctls with user space is 100% VFIO code. The latter should be tristate inside vfio.ko, the former can be =y. The main reason I put eeh-vfio.c to arch/powerpc/platforms/powernv/ is the source file needs access data structures (struct pnv_phb) defined in "pci.h" under that directory. Then create a good in-kernel framework from that directory and make use of it from the VFIO code :). But please don't mesh together VFIO, powernv EEH handling and RTAS. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/6] KVM: PPC: Book3S PR: Expose TM registers
On 20.05.2014, at 11:59, Paul Mackerras wrote: > On Mon, May 19, 2014 at 03:09:07PM +0200, Alexander Graf wrote: >> >> On 17.05.14 08:20, Paul Mackerras wrote: >>> On Tue, Apr 29, 2014 at 06:17:42PM +0200, Alexander Graf wrote: POWER8 introduces transactional memory which brings along a number of new registers and MSR bits. Implementing all of those is a pretty big headache, so for now let's at least emulate enough to make Linux's context switching code happy. >>> [snip] >>> - if (!(vcpu->arch.fscr & (1ULL << fac))) { + /* We get TM interrupts only when EBB is disabled? Sigh. */ >>> This comment doesn't make sense to me. Not every reason code reported >>> in the high bits of FSCR corresponds directly to an enable bit in >>> FSCR. In fact, of the 7 defined reason codes in POWER8, only three >>> correspond to an enable bit... >> >> Is there any documentation on which relate to what? > > Yes, Power ISA v2.07 book 3S section 6.2.10 describes the FSCR enable > bits and the interruption cause field. There are 6 cause values > defined, of which 3 correspond to enable bits in the FSCR, and the > other 3 correspond to things enabled/disabled in MMCR0 (usermode PMC > anb BHRB access) or MSR (TM stuff). I see. How's this? Alex commit a8e53f5f5e6c5d99363ad0d695a9ee520e1d262d Author: Alexander Graf Date: Tue Apr 29 17:54:40 2014 +0200 KVM: PPC: Book3S PR: Expose TM registers POWER8 introduces transactional memory which brings along a number of new registers and MSR bits. Implementing all of those is a pretty big headache, so for now let's at least emulate enough to make Linux's context switching code happy. Signed-off-by: Alexander Graf --- v1 -> v2: - move to book3s_64 only section - restrict to CONFIG_PPC_TRANSACTIONAL_MEM v2 -> v3: - check MSR.TM for TM enablement inside the guest diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index e1165ba..9bdff15 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -451,6 +451,17 @@ int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val) case SPRN_EBBRR: vcpu->arch.ebbrr = spr_val; break; +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM + case SPRN_TFHAR: + vcpu->arch.tfhar = spr_val; + break; + case SPRN_TEXASR: + vcpu->arch.texasr = spr_val; + break; + case SPRN_TFIAR: + vcpu->arch.tfiar = spr_val; + break; +#endif #endif case SPRN_ICTC: case SPRN_THRM1: @@ -572,6 +583,17 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val case SPRN_EBBRR: *spr_val = vcpu->arch.ebbrr; break; +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM + case SPRN_TFHAR: + *spr_val = vcpu->arch.tfhar; + break; + case SPRN_TEXASR: + *spr_val = vcpu->arch.texasr; + break; + case SPRN_TFIAR: + *spr_val = vcpu->arch.tfiar; + break; +#endif #endif case SPRN_THRM1: case SPRN_THRM2: diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index 7d27a95..23367a7 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -794,9 +794,27 @@ static void kvmppc_emulate_fac(struct kvm_vcpu *vcpu, ulong fac) /* Enable facilities (TAR, EBB, DSCR) for the guest */ static int kvmppc_handle_fac(struct kvm_vcpu *vcpu, ulong fac) { + bool guest_fac_enabled; BUG_ON(!cpu_has_feature(CPU_FTR_ARCH_207S)); - if (!(vcpu->arch.fscr & (1ULL << fac))) { + /* +* Not every facility is enabled by FSCR bits, check whether the +* guest has this facility enabled at all. +*/ + switch (fac) { + case FSCR_TAR_LG: + case FSCR_EBB_LG: + guest_fac_enabled = (vcpu->arch.fscr & (1ULL << fac)); + break; + case FSCR_TM_LG: + guest_fac_enabled = kvmppc_get_msr(vcpu) & MSR_TM; + break; + default: + guest_fac_enabled = false; + break; + } + + if (!guest_fac_enabled) { /* Facility not enabled by the guest */ kvmppc_trigger_fac_interrupt(vcpu, fac); return RESUME_GUEST;-- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] powerpc/eeh: Avoid event on passed PE
On Tue, May 20, 2014 at 01:25:11PM +0200, Alexander Graf wrote: > >On 20.05.14 10:30, Gavin Shan wrote: >>If we detects frozen state on PE that has been passed to guest, we >>needn't handle it. Instead, we rely on the guest to detect and recover >>it. The patch avoid EEH event on the frozen passed PE so that the guest >>can have chance to handle that. >> >>Signed-off-by: Gavin Shan > >How does the guest learn about this failure? We'd need to inject an >error into it, no? > When error is existing in HW level, 0xFF's will be turned on reading PCI config space or memory BARs. Guest retrieves the failure state, which is captured by HW automatically, via RTAS call "ibm,read-slot-reset-state2" when seeing 0xFF's on reading PCI config space or memory BARs. If "ibm,read-slot-reset-state2" reports errors in HW, the guest kernel starts to recovery. It can be called as "passive" reporting. There possible has one case that the error can't be reported for ever: No device driver binding to the VFIO PCI device and no access to device's config space and memory BARs. However, it doesn't matter. As we don't use the device, we needn't detect and recover the error at all. >I think what you want is an irqfd that the in-kernel eeh code >notifies when it sees a failure. When such an fd exists, the kernel >skips its own error handling. > Yeah, it's a good idea and something for me to improve in phase II. We can discuss for more later. For now, what I have in my head is something like this: [ Host ] -> Error detected -> irqfd (or eventfd) -> QEMU | -(A)- | Send one EEH event to guest kernel | Guest kernel starts the recovery (A): I didn't figure out one convienent way to do the EEH event injection yet. Thanks, Gavin >>--- >> arch/powerpc/kernel/eeh.c | 8 >> arch/powerpc/platforms/powernv/eeh-ioda.c | 3 ++- >> 2 files changed, 10 insertions(+), 1 deletion(-) >> >>diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c >>index 9c6b899..6543f05 100644 >>--- a/arch/powerpc/kernel/eeh.c >>+++ b/arch/powerpc/kernel/eeh.c >>@@ -400,6 +400,14 @@ int eeh_dev_check_failure(struct eeh_dev *edev) >> if (ret > 0) >> return ret; >>+ /* >>+ * If the PE has been passed to guest, we won't check the >>+ * state. Instead, let the guest handle it if the PE has >>+ * been frozen. >>+ */ >>+ if (eeh_pe_passed(pe)) >>+ return 0; >>+ >> /* If we already have a pending isolation event for this >> * slot, we know it's bad already, we don't need to check. >> * Do this checking under a lock; as multiple PCI devices >>diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c >>b/arch/powerpc/platforms/powernv/eeh-ioda.c >>index 1b5982f..03a3ed2 100644 >>--- a/arch/powerpc/platforms/powernv/eeh-ioda.c >>+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c >>@@ -890,7 +890,8 @@ static int ioda_eeh_next_error(struct eeh_pe **pe) >> opal_pci_eeh_freeze_clear(phb->opal_id, >> frozen_pe_no, >> OPAL_EEH_ACTION_CLEAR_FREEZE_ALL); >> ret = EEH_NEXT_ERR_NONE; >>- } else if ((*pe)->state & EEH_PE_ISOLATED) { >>+ } else if ((*pe)->state & EEH_PE_ISOLATED || >>+eeh_pe_passed(*pe)) { >> ret = EEH_NEXT_ERR_NONE; >> } else { >> pr_err("EEH: Frozen PHB#%x-PE#%x (%s) >> detected\n", > -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] powerpc/eeh: Avoid event on passed PE
On 20.05.14 13:56, Gavin Shan wrote: On Tue, May 20, 2014 at 01:25:11PM +0200, Alexander Graf wrote: On 20.05.14 10:30, Gavin Shan wrote: If we detects frozen state on PE that has been passed to guest, we needn't handle it. Instead, we rely on the guest to detect and recover it. The patch avoid EEH event on the frozen passed PE so that the guest can have chance to handle that. Signed-off-by: Gavin Shan How does the guest learn about this failure? We'd need to inject an error into it, no? When error is existing in HW level, 0xFF's will be turned on reading PCI config space or memory BARs. Guest retrieves the failure state, which is captured by HW automatically, via RTAS call "ibm,read-slot-reset-state2" when seeing 0xFF's on reading PCI config space or memory BARs. If "ibm,read-slot-reset-state2" reports errors in HW, the guest kernel starts to recovery. It can be called as "passive" reporting. There possible has one case that the error can't be reported for ever: No device driver binding to the VFIO PCI device and no access to device's config space and memory BARs. However, it doesn't matter. As we don't use the device, we needn't detect and recover the error at all. So if the guest is waiting for an interrupt to happen it will wait forever? Not really nice. I think what you want is an irqfd that the in-kernel eeh code notifies when it sees a failure. When such an fd exists, the kernel skips its own error handling. Yeah, it's a good idea and something for me to improve in phase II. We can discuss for more later. I think it makes sense to at least walk into that direction immediately. The reason I brought it up in the context of this patch is that with an irqfd you wouldn't need the passed flag at all. For now, what I have in my head is something like this: [ Host ] -> Error detected -> irqfd (or eventfd) -> QEMU | -(A)- | Send one EEH event to guest kernel | Guest kernel starts the recovery (A): I didn't figure out one convienent way to do the EEH event injection yet. How does the guest learn about errors in pHyp? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] drivers/vfio: New IOCTL command VFIO_EEH_INFO
On Tue, May 20, 2014 at 01:44:21PM +0200, Alexander Graf wrote: > >On 20.05.14 13:40, Gavin Shan wrote: >>On Tue, May 20, 2014 at 01:28:40PM +0200, Alexander Graf wrote: >>>On 20.05.14 13:21, Alexander Graf wrote: On 20.05.14 10:30, Gavin Shan wrote: >The patch adds new IOCTL command VFIO_EEH_OP to VFIO PCI device >to support EEH functionality for PCI devices, which have been >passed from host to guest via VFIO. > >Signed-off-by: Gavin Shan >--- > arch/powerpc/platforms/powernv/Makefile | 1 + > arch/powerpc/platforms/powernv/eeh-vfio.c | 445 >++ > drivers/vfio/pci/vfio_pci.c | 24 +- > drivers/vfio/pci/vfio_pci_private.h | 16 ++ > include/uapi/linux/vfio.h | 43 +++ > 5 files changed, 523 insertions(+), 6 deletions(-) > create mode 100644 arch/powerpc/platforms/powernv/eeh-vfio.c Why doesn't this code live inside the vfio module? If I don't load the vfio module, I don't need that code to waste memory in my kernel, no? >>Yes, It saves some memory. >> >>>So I think from a modeling point of view, you want VFIO code that >>>calls reasonably generic helpers inside the kernel to deal with >>>errors. >>> >>>The "generic helpers" don't have anything to do with VFIO. Everything >>>that interfaces via ioctls with user space is 100% VFIO code. >>> >>>The latter should be tristate inside vfio.ko, the former can be =y. >>> >>The main reason I put eeh-vfio.c to arch/powerpc/platforms/powernv/ is >>the source file needs access data structures (struct pnv_phb) defined >>in "pci.h" under that directory. > >Then create a good in-kernel framework from that directory and make >use of it from the VFIO code :). But please don't mesh together VFIO, >powernv EEH handling and RTAS. > Yeah. How about this? :-) - Move eeh-vfio.c to drivers/vfio/pci/ - From eeh-vfio.c, dereference arch/powerpc/kernel/eeh.c::eeh_ops, which is arch/powerpc/plaforms/powernv/eeh-powernv.c::powernv_eeh_ops. Call to the corresponding callbacks in "eeh_ops" based on incoming RTAS request. The file would be renamed to "vfio_eeh.c" as well after moving to VFIO driver directory. Thanks, Gavin -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] drivers/vfio: New IOCTL command VFIO_EEH_INFO
On 20.05.14 14:21, Gavin Shan wrote: On Tue, May 20, 2014 at 01:44:21PM +0200, Alexander Graf wrote: On 20.05.14 13:40, Gavin Shan wrote: On Tue, May 20, 2014 at 01:28:40PM +0200, Alexander Graf wrote: On 20.05.14 13:21, Alexander Graf wrote: On 20.05.14 10:30, Gavin Shan wrote: The patch adds new IOCTL command VFIO_EEH_OP to VFIO PCI device to support EEH functionality for PCI devices, which have been passed from host to guest via VFIO. Signed-off-by: Gavin Shan --- arch/powerpc/platforms/powernv/Makefile | 1 + arch/powerpc/platforms/powernv/eeh-vfio.c | 445 ++ drivers/vfio/pci/vfio_pci.c | 24 +- drivers/vfio/pci/vfio_pci_private.h | 16 ++ include/uapi/linux/vfio.h | 43 +++ 5 files changed, 523 insertions(+), 6 deletions(-) create mode 100644 arch/powerpc/platforms/powernv/eeh-vfio.c Why doesn't this code live inside the vfio module? If I don't load the vfio module, I don't need that code to waste memory in my kernel, no? Yes, It saves some memory. So I think from a modeling point of view, you want VFIO code that calls reasonably generic helpers inside the kernel to deal with errors. The "generic helpers" don't have anything to do with VFIO. Everything that interfaces via ioctls with user space is 100% VFIO code. The latter should be tristate inside vfio.ko, the former can be =y. The main reason I put eeh-vfio.c to arch/powerpc/platforms/powernv/ is the source file needs access data structures (struct pnv_phb) defined in "pci.h" under that directory. Then create a good in-kernel framework from that directory and make use of it from the VFIO code :). But please don't mesh together VFIO, powernv EEH handling and RTAS. Yeah. How about this? :-) - Move eeh-vfio.c to drivers/vfio/pci/ - From eeh-vfio.c, dereference arch/powerpc/kernel/eeh.c::eeh_ops, which is arch/powerpc/plaforms/powernv/eeh-powernv.c::powernv_eeh_ops. Call Hrm, I think it'd be nicer to just export individual functions that do thing you want to do from eeh.c. Alex to the corresponding callbacks in "eeh_ops" based on incoming RTAS request. The file would be renamed to "vfio_eeh.c" as well after moving to VFIO driver directory. Thanks, Gavin -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] drivers/vfio: New IOCTL command VFIO_EEH_INFO
On Tue, May 20, 2014 at 02:25:45PM +0200, Alexander Graf wrote: > >On 20.05.14 14:21, Gavin Shan wrote: >>On Tue, May 20, 2014 at 01:44:21PM +0200, Alexander Graf wrote: >>>On 20.05.14 13:40, Gavin Shan wrote: On Tue, May 20, 2014 at 01:28:40PM +0200, Alexander Graf wrote: >On 20.05.14 13:21, Alexander Graf wrote: >>On 20.05.14 10:30, Gavin Shan wrote: >>>The patch adds new IOCTL command VFIO_EEH_OP to VFIO PCI device >>>to support EEH functionality for PCI devices, which have been >>>passed from host to guest via VFIO. >>> >>>Signed-off-by: Gavin Shan >>>--- >>> arch/powerpc/platforms/powernv/Makefile | 1 + >>> arch/powerpc/platforms/powernv/eeh-vfio.c | 445 >>>++ >>> drivers/vfio/pci/vfio_pci.c | 24 +- >>> drivers/vfio/pci/vfio_pci_private.h | 16 ++ >>> include/uapi/linux/vfio.h | 43 +++ >>> 5 files changed, 523 insertions(+), 6 deletions(-) >>> create mode 100644 arch/powerpc/platforms/powernv/eeh-vfio.c >>Why doesn't this code live inside the vfio module? If I don't load >>the vfio module, I don't need that code to waste memory in my >>kernel, no? Yes, It saves some memory. >So I think from a modeling point of view, you want VFIO code that >calls reasonably generic helpers inside the kernel to deal with >errors. > >The "generic helpers" don't have anything to do with VFIO. Everything >that interfaces via ioctls with user space is 100% VFIO code. > >The latter should be tristate inside vfio.ko, the former can be =y. > The main reason I put eeh-vfio.c to arch/powerpc/platforms/powernv/ is the source file needs access data structures (struct pnv_phb) defined in "pci.h" under that directory. >>>Then create a good in-kernel framework from that directory and make >>>use of it from the VFIO code :). But please don't mesh together VFIO, >>>powernv EEH handling and RTAS. >>> >>Yeah. How about this? :-) >> >>- Move eeh-vfio.c to drivers/vfio/pci/ >>- From eeh-vfio.c, dereference arch/powerpc/kernel/eeh.c::eeh_ops, which >> is arch/powerpc/plaforms/powernv/eeh-powernv.c::powernv_eeh_ops. Call > >Hrm, I think it'd be nicer to just export individual functions that >do thing you want to do from eeh.c. > Ok. Got it. Thanks for your comments :) Thanks, Gavin > >Alex > >> to the corresponding callbacks in "eeh_ops" based on incoming RTAS request. >> >> The file would be renamed to "vfio_eeh.c" as well after moving to VFIO >> driver directory. >> >>Thanks, >>Gavin >> >>-- >>To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in >>the body of a message to majord...@vger.kernel.org >>More majordomo info at http://vger.kernel.org/majordomo-info.html > >-- >To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in >the body of a message to majord...@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] powerpc/eeh: Avoid event on passed PE
On Tue, May 20, 2014 at 02:14:56PM +0200, Alexander Graf wrote: > >On 20.05.14 13:56, Gavin Shan wrote: >>On Tue, May 20, 2014 at 01:25:11PM +0200, Alexander Graf wrote: >>>On 20.05.14 10:30, Gavin Shan wrote: If we detects frozen state on PE that has been passed to guest, we needn't handle it. Instead, we rely on the guest to detect and recover it. The patch avoid EEH event on the frozen passed PE so that the guest can have chance to handle that. Signed-off-by: Gavin Shan >>>How does the guest learn about this failure? We'd need to inject an >>>error into it, no? >>> >>When error is existing in HW level, 0xFF's will be turned on reading >>PCI config space or memory BARs. Guest retrieves the failure state, >>which is captured by HW automatically, via RTAS call >>"ibm,read-slot-reset-state2" when seeing 0xFF's on reading PCI config >>space or memory BARs. If "ibm,read-slot-reset-state2" reports errors in HW, >>the guest kernel starts to recovery. >> >>It can be called as "passive" reporting. There possible has one case that >>the error can't be reported for ever: No device driver binding to the VFIO >>PCI device and no access to device's config space and memory BARs. However, >>it doesn't matter. As we don't use the device, we needn't detect and recover >>the error at all. > >So if the guest is waiting for an interrupt to happen it will wait >forever? Not really nice. > Nope, the error reporting in guest isn't interrupt-driven. It's always "polling" :-) >>>I think what you want is an irqfd that the in-kernel eeh code >>>notifies when it sees a failure. When such an fd exists, the kernel >>>skips its own error handling. >>> >>Yeah, it's a good idea and something for me to improve in phase II. We >>can discuss for more later. > >I think it makes sense to at least walk into that direction >immediately. The reason I brought it up in the context of this patch >is that with an irqfd you wouldn't need the passed flag at all. > I don't see how it can avoid the "passed" flag. Without the flag, any PCI config and memory BAR access on host side could trigger EEH recovery for those PCI devices passed to guest. That's unexpected behaviour. For host, we have 2 ways to report errors: interrupt driven and polling. For the guest, we only have "polling" :-) >> For now, what I have in my head is something >>like this: >> >> [ Host ] -> Error detected -> irqfd (or eventfd) -> QEMU >>| >>-(A)- >>| >> Send one EEH event to guest kernel >>| >> Guest kernel starts the recovery >> >>(A): I didn't figure out one convienent way to do the EEH event injection yet. > >How does the guest learn about errors in pHyp? > It relies on "polling". Thanks, Gavin -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] powerpc/eeh: Avoid event on passed PE
On 20.05.14 14:45, Gavin Shan wrote: On Tue, May 20, 2014 at 02:14:56PM +0200, Alexander Graf wrote: On 20.05.14 13:56, Gavin Shan wrote: On Tue, May 20, 2014 at 01:25:11PM +0200, Alexander Graf wrote: On 20.05.14 10:30, Gavin Shan wrote: If we detects frozen state on PE that has been passed to guest, we needn't handle it. Instead, we rely on the guest to detect and recover it. The patch avoid EEH event on the frozen passed PE so that the guest can have chance to handle that. Signed-off-by: Gavin Shan How does the guest learn about this failure? We'd need to inject an error into it, no? When error is existing in HW level, 0xFF's will be turned on reading PCI config space or memory BARs. Guest retrieves the failure state, which is captured by HW automatically, via RTAS call "ibm,read-slot-reset-state2" when seeing 0xFF's on reading PCI config space or memory BARs. If "ibm,read-slot-reset-state2" reports errors in HW, the guest kernel starts to recovery. It can be called as "passive" reporting. There possible has one case that the error can't be reported for ever: No device driver binding to the VFIO PCI device and no access to device's config space and memory BARs. However, it doesn't matter. As we don't use the device, we needn't detect and recover the error at all. So if the guest is waiting for an interrupt to happen it will wait forever? Not really nice. Nope, the error reporting in guest isn't interrupt-driven. It's always "polling" :-) That sucks :). I think what you want is an irqfd that the in-kernel eeh code notifies when it sees a failure. When such an fd exists, the kernel skips its own error handling. Yeah, it's a good idea and something for me to improve in phase II. We can discuss for more later. I think it makes sense to at least walk into that direction immediately. The reason I brought it up in the context of this patch is that with an irqfd you wouldn't need the passed flag at all. I don't see how it can avoid the "passed" flag. Without the flag, any PCI config and memory BAR access on host side could trigger EEH recovery for those PCI devices passed to guest. That's unexpected behaviour. Instead of if (passed_flag) return; you would do if (trigger_irqfd) { trigger_irqfd(); return; } which would be a much nicer, generic interface. For host, we have 2 ways to report errors: interrupt driven and polling. For the guest, we only have "polling" :-) And the interrupt path is powernv specific? Does sPAPR specify anything here? For now, what I have in my head is something like this: [ Host ] -> Error detected -> irqfd (or eventfd) -> QEMU | -(A)- | Send one EEH event to guest kernel | Guest kernel starts the recovery (A): I didn't figure out one convienent way to do the EEH event injection yet. How does the guest learn about errors in pHyp? It relies on "polling". Sigh ;). So how about we just implement this whole thing properly as irqfd? Whether QEMU can actually do anything with the interrupt is a different question - we can leave it be for now. But we could model all the code with the assumption that it should either handle the error itself or trigger and irqfd write. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] powerpc/eeh: Avoid event on passed PE
On Tue, 2014-05-20 at 21:56 +1000, Gavin Shan wrote: .../... > >I think what you want is an irqfd that the in-kernel eeh code > >notifies when it sees a failure. When such an fd exists, the kernel > >skips its own error handling. > > > > Yeah, it's a good idea and something for me to improve in phase II. We > can discuss for more later. For now, what I have in my head is something > like this: However, this would be a deviation from (or extension of) PAPR. At the moment, the way things work in PAPR is that the guest is responsible for querying the EEH state when something "looks" like an error (ie, getting ff's back). This is also how it works in pHyp. We have an interrupt path in the host when doing "native" EEH, and it would be nice to extend PAPR to also be able to shoot an event to the guest possibly using RTAS events, but let's get the basics working and upstream first. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] powerpc/eeh: Avoid event on passed PE
On Tue, 2014-05-20 at 15:49 +0200, Alexander Graf wrote: > Instead of > >if (passed_flag) > return; > > you would do > >if (trigger_irqfd) { > trigger_irqfd(); > return; >} > > which would be a much nicer, generic interface. But that's not how PAPR works. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] powerpc/eeh: Avoid event on passed PE
On Tue, 2014-05-20 at 15:49 +0200, Alexander Graf wrote: > So how about we just implement this whole thing properly as irqfd? > Whether QEMU can actually do anything with the interrupt is a different > question - we can leave it be for now. But we could model all the code > with the assumption that it should either handle the error itself or > trigger and irqfd write. I don't object to the idea... however this smells of Deja Vu... You often tend to want to turn something submitted that fills a specific gap and implements a specific spec/function into some kind of idealized grand design :-) And that means nothing gets upstream for weeks or monthes as we churn and churn... Sometimes it's probably worth it. Here I would argue against it and would advocate for doing the basic functionality first, as it is used by guests, and later add the irqfd option. I don't see any emergency here and adding the irqfd will not cause fundamental design changes: The "passed" flag (though I'm not fan of the name) is really something we want in the low level handlers to avoid triggering host side EEH in various places, regardless of whether we use irqfd or not. This is totally orthogonal from the mechanism used for notifications. Even in host, the detection path doesn't always involve interrupts, and we can detect some things as a result of a host side config space access for example etc... So let's keep things nice and separate here. The interrupt notification is just an "optimization" which will speed up discovery of the error in *some* cases later on (but adds its own complexity since we have multiple discovery path in host, so we need to keep track whether we have notified yet or not etc...) so let's keep it for later. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] drivers/vfio: New IOCTL command VFIO_EEH_INFO
On Tue, 2014-05-20 at 14:25 +0200, Alexander Graf wrote: > > - Move eeh-vfio.c to drivers/vfio/pci/ > > - From eeh-vfio.c, dereference arch/powerpc/kernel/eeh.c::eeh_ops, > which > >is arch/powerpc/plaforms/powernv/eeh-powernv.c::powernv_eeh_ops. > Call > > Hrm, I think it'd be nicer to just export individual functions that do > thing you want to do from eeh.c. We already have an eeh_ops backend system with callbacks since we have different backends for RTAS and powernv, so we could do what you suggest but it would probably just boil down to wrappers around the EEH ops. No big opinion either way on my side though. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] drivers/vfio: New IOCTL command VFIO_EEH_INFO
On Tue, 2014-05-20 at 22:39 +1000, Gavin Shan wrote: > >>Yeah. How about this? :-) > >> > >>- Move eeh-vfio.c to drivers/vfio/pci/ > >>- From eeh-vfio.c, dereference arch/powerpc/kernel/eeh.c::eeh_ops, which > >> is arch/powerpc/plaforms/powernv/eeh-powernv.c::powernv_eeh_ops. Call > > > >Hrm, I think it'd be nicer to just export individual functions that > >do thing you want to do from eeh.c. > > > > Ok. Got it. Thanks for your comments :) The interesting thing with this approach is that VFIO per-se can work with EEH RTAS backend too in the host. IE, with PR KVM for example or with non-KVM uses of VFIO, it would be possible to use a device in a user process and exploit EEH even when running under a PAPR hypervisor. That is, vfio-eeh uses "generic" exported EEH APIs from the EEH core that will work on both powernv and RTAS backends. Note to Alex: This definitely kills the notifier idea for now though, at least as a first class citizen of the design. We can add it as an optional optimization on top later. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] drivers/vfio: New IOCTL command VFIO_EEH_INFO
On Wed, May 21, 2014 at 10:23:52AM +1000, Benjamin Herrenschmidt wrote: >On Tue, 2014-05-20 at 22:39 +1000, Gavin Shan wrote: >> >>Yeah. How about this? :-) >> >> >> >>- Move eeh-vfio.c to drivers/vfio/pci/ >> >>- From eeh-vfio.c, dereference arch/powerpc/kernel/eeh.c::eeh_ops, which >> >> is arch/powerpc/plaforms/powernv/eeh-powernv.c::powernv_eeh_ops. Call >> > >> >Hrm, I think it'd be nicer to just export individual functions that >> >do thing you want to do from eeh.c. >> > >> >> Ok. Got it. Thanks for your comments :) > >The interesting thing with this approach is that VFIO per-se can work >with EEH RTAS backend too in the host. > Yeah, it's another benefit for the approach. I need think about it later. Thanks, Ben. It's really good point. >IE, with PR KVM for example or with non-KVM uses of VFIO, it would be >possible to use a device in a user process and exploit EEH even when >running under a PAPR hypervisor. > >That is, vfio-eeh uses "generic" exported EEH APIs from the EEH core >that will work on both powernv and RTAS backends. > >Note to Alex: This definitely kills the notifier idea for now though, >at least as a first class citizen of the design. We can add it as an >optional optimization on top later. > Yeah, I'll address the notifier later. Thanks, Gavin -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] powerpc/eeh: Avoid event on passed PE
On Wed, May 21, 2014 at 10:12:11AM +1000, Benjamin Herrenschmidt wrote: >On Tue, 2014-05-20 at 21:56 +1000, Gavin Shan wrote: > > .../... > >> >I think what you want is an irqfd that the in-kernel eeh code >> >notifies when it sees a failure. When such an fd exists, the kernel >> >skips its own error handling. >> > >> >> Yeah, it's a good idea and something for me to improve in phase II. We >> can discuss for more later. For now, what I have in my head is something >> like this: > >However, this would be a deviation from (or extension of) PAPR. At the >moment, the way things work in PAPR is that the guest is responsible for >querying the EEH state when something "looks" like an error (ie, getting >ff's back). This is also how it works in pHyp. > >We have an interrupt path in the host when doing "native" EEH, and it >would be nice to extend PAPR to also be able to shoot an event to the >guest possibly using RTAS events, but let's get the basics working and >upstream first. > Got it. Thanks, Ben :-) Thanks, Gavin -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 0/4] EEH Support for VFIO PCI device
The series of patches intends to support EEH for PCI devices, which are passed through to PowerKVM based guest via VFIO. The implementation is straightforward based on the issues or problems we have to resolve to support EEH for PowerKVM based guest. - Emulation for EEH RTAS requests. All EEH RTAS requests goes to QEMU firstly. If QEMU can't handle it, the request will be sent to host via newly introduced VFIO container IOCTL command (VFIO_EEH_OP) and gets handled in host kernel. The series of patches requires corresponding QEMU changes. Change log == v1 -> v2: * EEH RTAS requests are routed to QEMU, and then possiblly to host kerenl. The mechanism KVM in-kernel handling is dropped. * Error injection is reimplemented based syscall, instead of KVM in-kerenl handling. The logic for error injection token management is moved to QEMU. The error injection request is routed to QEMU and then possiblly to host kernel. v2 -> v3: * Make the fields in struct eeh_vfio_pci_addr, struct vfio_eeh_info based on the comments from Alexey. * Define macros for EEH VFIO operations (Alexey). * Clear frozen state after successful PE reset. * Merge original [PATCH 1/2/3] to one. v3 -> v4: * Remove the error injection from the patchset. Mike or I will work on that later. * Rename CONFIG_VFIO_EEH to VFIO_PCI_EEH. * Rename the IOCTL command to VFIO_EEH_OP and it's handled by VFIO-PCI device instead of VFIO container. * Rename the IOCTL argument structure to "vfio_eeh_op" accordingly. Also, more fields added to hold return values for RTAS requests. * The address mapping stuff is totally removed. When opening or releasing VFIO PCI device, notification sent to EEH to update the flags indicates the device is passed to guest or not. * Change pr_warn() to pr_debug() to avoid DOS as pointed by Alex.W * Argument size check issue pointed by Alex.W. v4 -> v5: * Functions for VFIO PCI EEH support are moved to eeh.c and exported from there. VFIO PCI driver just uses those functions to tackle IOCTL command VFIO_EEH_OP. All of this is to make the code organized in a good way as suggested by Alex.G. Another potential benefit is PowerNV/pSeries are sharing "eeh_ops" and same infrastructure could possiblly work for KVM_PR and KVM_HV mode at the same time. * Don't clear error injection registers after finishing PE reset as the patchset is doing nothing related to error injection. * Amending Documentation/vfio.txt, which was missed in last revision. * No QEMU changes for this revision. "v4" works well. Also, remove "RFC" from the subject as the design is basically recognized. Gavin Shan (4): drivers/vfio: Introduce CONFIG_VFIO_PCI_EEH powerpc/eeh: Flags for passed device and PE drivers/vfio: EEH support for VFIO PCI device powerpc/eeh: Avoid event on passed PE Documentation/vfio.txt| 6 +- arch/powerpc/include/asm/eeh.h| 42 arch/powerpc/kernel/eeh.c | 331 ++ arch/powerpc/platforms/powernv/eeh-ioda.c | 3 +- drivers/vfio/pci/Kconfig | 6 + drivers/vfio/pci/vfio_pci.c | 99 - include/uapi/linux/vfio.h | 43 7 files changed, 522 insertions(+), 8 deletions(-) -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 3/4] drivers/vfio: EEH support for VFIO PCI device
The patch adds new IOCTL command VFIO_EEH_OP to VFIO PCI device to support EEH functionality for PCI devices, which have been passed from host to guest via VFIO. Signed-off-by: Gavin Shan --- Documentation/vfio.txt | 6 +- arch/powerpc/include/asm/eeh.h | 10 ++ arch/powerpc/kernel/eeh.c | 323 + drivers/vfio/pci/vfio_pci.c| 99 - include/uapi/linux/vfio.h | 43 ++ 5 files changed, 474 insertions(+), 7 deletions(-) diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt index b9ca023..bb17ec7 100644 --- a/Documentation/vfio.txt +++ b/Documentation/vfio.txt @@ -305,7 +305,10 @@ faster, the map/unmap handling has been implemented in real mode which provides an excellent performance which has limitations such as inability to do locked pages accounting in real time. -So 3 additional ioctls have been added: +4) PPC64 guests detect PCI errors and recover from them via EEH RTAS services, +which works on the basis of additional ioctl command VFIO_EEH_OP. + +So 4 additional ioctls have been added: VFIO_IOMMU_SPAPR_TCE_GET_INFO - returns the size and the start of the DMA window on the PCI bus. @@ -316,6 +319,7 @@ So 3 additional ioctls have been added: VFIO_IOMMU_DISABLE - disables the container. + VFIO_EEH_OP - EEH dependent operations The code flow from the example above should be slightly changed: diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h index 34a2d83..93922ef 100644 --- a/arch/powerpc/include/asm/eeh.h +++ b/arch/powerpc/include/asm/eeh.h @@ -305,6 +305,16 @@ void eeh_add_device_late(struct pci_dev *); void eeh_add_device_tree_late(struct pci_bus *); void eeh_add_sysfs_files(struct pci_bus *); void eeh_remove_device(struct pci_dev *); +#ifdef CONFIG_VFIO_PCI_EEH +int eeh_vfio_open(struct pci_dev *pdev); +void eeh_vfio_release(struct pci_dev *pdev); +int eeh_vfio_set_pe_option(struct pci_dev *pdev, int option, int *retval); +int eeh_vfio_get_pe_addr(struct pci_dev *pdev, int option, +int *retval, int *info); +int eeh_vfio_get_pe_state(struct pci_dev *pdev, int *retval, int *state); +int eeh_vfio_reset_pe(struct pci_dev *pdev, int option, int *retval); +int eeh_vfio_configure_pe(struct pci_dev *pdev, int *retval); +#endif /** * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure. diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index 9c6b899..2aaf90e 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -1098,6 +1098,329 @@ void eeh_remove_device(struct pci_dev *dev) edev->mode &= ~EEH_DEV_SYSFS; } +#ifdef CONFIG_VFIO_PCI_EEH +int eeh_vfio_open(struct pci_dev *pdev) +{ + struct eeh_dev *edev; + + /* No PCI device ? */ + if (!pdev) + return -ENODEV; + + /* No EEH device ? */ + edev = pci_dev_to_eeh_dev(pdev); + if (!edev || !edev->pe) + return -ENODEV; + + eeh_dev_set_passed(edev, true); + eeh_pe_set_passed(edev->pe, true); + + return 0; +} +EXPORT_SYMBOL_GPL(eeh_vfio_open); + +void eeh_vfio_release(struct pci_dev *pdev) +{ + bool release_pe = true; + struct eeh_pe *pe = NULL; + struct eeh_dev *tmp, *edev; + + /* No PCI device ? */ + if (!pdev) + return; + + /* No EEH device ? */ + edev = pci_dev_to_eeh_dev(pdev); + if (!edev || !eeh_dev_passed(edev) || + !edev->pe || !eeh_pe_passed(pe)) + return; + + /* Release device */ + pe = edev->pe; + eeh_dev_set_passed(edev, false); + + /* Release PE */ + eeh_pe_for_each_dev(pe, edev, tmp) { + if (eeh_dev_passed(edev)) { + release_pe = false; + break; + } + } + + if (release_pe) + eeh_pe_set_passed(pe, false); +} +EXPORT_SYMBOL(eeh_vfio_release); + +static int eeh_vfio_check_dev(struct pci_dev *pdev, + struct eeh_dev **pedev, + struct eeh_pe **ppe) +{ + struct eeh_dev *edev; + + /* No device ? */ + if (!pdev) + return -ENODEV; + + edev = pci_dev_to_eeh_dev(pdev); + if (!edev || !eeh_dev_passed(edev) || + !edev->pe || !eeh_pe_passed(edev->pe)) + return -ENODEV; + + if (pedev) + *pedev = edev; + if (ppe) + *ppe = edev->pe; + + return 0; +} + +int eeh_vfio_set_pe_option(struct pci_dev *pdev, int option, int *retval) +{ + struct eeh_dev *edev; + struct eeh_pe *pe; + int ret = 0; + + /* Device existing ? */ + ret = eeh_vfio_check_dev(pdev, &edev, &pe); + if (ret) { + pr_debug("%s: Cannot find device %s\n", + __func__, pdev ? pci_name(pdev) : "NULL"); + *retval = -7; +
[PATCH v5 2/4] powerpc/eeh: Flags for passed device and PE
The patch introduces new flags for EEH device and PE to indicate that the device or PE has been passed through to guest. In turn, we will deliver EEH errors to guest for further handling, which will be done in subsequent patches. Signed-off-by: Gavin Shan --- arch/powerpc/include/asm/eeh.h | 32 1 file changed, 32 insertions(+) diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h index 7782056..34a2d83 100644 --- a/arch/powerpc/include/asm/eeh.h +++ b/arch/powerpc/include/asm/eeh.h @@ -72,6 +72,7 @@ struct device_node; #define EEH_PE_RESET (1 << 2)/* PE reset in progress */ #define EEH_PE_KEEP(1 << 8)/* Keep PE on hotplug */ +#define EEH_PE_PASSTHROUGH (1 << 9)/* PE owned by guest*/ struct eeh_pe { int type; /* PE type: PHB/Bus/Device */ @@ -93,6 +94,21 @@ struct eeh_pe { #define eeh_pe_for_each_dev(pe, edev, tmp) \ list_for_each_entry_safe(edev, tmp, &pe->edevs, list) +static inline bool eeh_pe_passed(struct eeh_pe *pe) +{ + return pe ? !!(pe->state & EEH_PE_PASSTHROUGH) : false; +} + +static inline void eeh_pe_set_passed(struct eeh_pe *pe, bool passed) +{ + if (pe) { + if (passed) + pe->state |= EEH_PE_PASSTHROUGH; + else + pe->state &= ~EEH_PE_PASSTHROUGH; + } +} + /* * The struct is used to trace EEH state for the associated * PCI device node or PCI device. In future, it might @@ -110,6 +126,7 @@ struct eeh_pe { #define EEH_DEV_SYSFS (1 << 9)/* Sysfs created*/ #define EEH_DEV_REMOVED(1 << 10) /* Removed permanently */ #define EEH_DEV_FRESET (1 << 11) /* Fundamental reset*/ +#define EEH_DEV_PASSTHROUGH(1 << 12) /* Owned by guest */ struct eeh_dev { int mode; /* EEH mode */ @@ -138,6 +155,21 @@ static inline struct pci_dev *eeh_dev_to_pci_dev(struct eeh_dev *edev) return edev ? edev->pdev : NULL; } +static inline bool eeh_dev_passed(struct eeh_dev *dev) +{ + return dev ? !!(dev->mode & EEH_DEV_PASSTHROUGH) : false; +} + +static inline void eeh_dev_set_passed(struct eeh_dev *dev, bool passed) +{ + if (dev) { + if (passed) + dev->mode |= EEH_DEV_PASSTHROUGH; + else + dev->mode &= ~EEH_DEV_PASSTHROUGH; + } +} + /* Return values from eeh_ops::next_error */ enum { EEH_NEXT_ERR_NONE = 0, -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 1/4] drivers/vfio: Introduce CONFIG_VFIO_PCI_EEH
The patch introduces CONFIG_VFIO_PCI_EEH for more IOCTL commands on VFIO PCI device support EEH funtionality for PCI devices that are passed through from host to guest. Signed-off-by: Gavin Shan --- drivers/vfio/pci/Kconfig | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig index c41b01e..dd0e0f0 100644 --- a/drivers/vfio/pci/Kconfig +++ b/drivers/vfio/pci/Kconfig @@ -1,6 +1,7 @@ config VFIO_PCI tristate "VFIO support for PCI devices" depends on VFIO && PCI && EVENTFD + select VFIO_PCI_EEH if PPC_POWERNV help Support for the PCI VFIO bus driver. This is required to make use of PCI drivers using the VFIO framework. @@ -16,3 +17,8 @@ config VFIO_PCI_VGA BIOS and generic video drivers. If you don't know what to do here, say N. + +config VFIO_PCI_EEH + tristate + depends on EEH && VFIO_IOMMU_SPAPR_TCE + default n -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 4/4] powerpc/eeh: Avoid event on passed PE
If we detects frozen state on PE that has been passed to guest, we needn't handle it. Instead, we rely on the guest to detect and recover it. The patch avoid EEH event on the frozen passed PE so that the guest can have chance to handle that. Signed-off-by: Gavin Shan --- arch/powerpc/kernel/eeh.c | 8 arch/powerpc/platforms/powernv/eeh-ioda.c | 3 ++- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index 2aaf90e..25fd12d 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -400,6 +400,14 @@ int eeh_dev_check_failure(struct eeh_dev *edev) if (ret > 0) return ret; + /* +* If the PE has been passed to guest, we won't check the +* state. Instead, let the guest handle it if the PE has +* been frozen. +*/ + if (eeh_pe_passed(pe)) + return 0; + /* If we already have a pending isolation event for this * slot, we know it's bad already, we don't need to check. * Do this checking under a lock; as multiple PCI devices diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c index 1b5982f..03a3ed2 100644 --- a/arch/powerpc/platforms/powernv/eeh-ioda.c +++ b/arch/powerpc/platforms/powernv/eeh-ioda.c @@ -890,7 +890,8 @@ static int ioda_eeh_next_error(struct eeh_pe **pe) opal_pci_eeh_freeze_clear(phb->opal_id, frozen_pe_no, OPAL_EEH_ACTION_CLEAR_FREEZE_ALL); ret = EEH_NEXT_ERR_NONE; - } else if ((*pe)->state & EEH_PE_ISOLATED) { + } else if ((*pe)->state & EEH_PE_ISOLATED || + eeh_pe_passed(*pe)) { ret = EEH_NEXT_ERR_NONE; } else { pr_err("EEH: Frozen PHB#%x-PE#%x (%s) detected\n", -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] powerpc/eeh: Avoid event on passed PE
> Am 21.05.2014 um 02:13 schrieb Benjamin Herrenschmidt > : > >> On Tue, 2014-05-20 at 15:49 +0200, Alexander Graf wrote: >> Instead of >> >> if (passed_flag) >> return; >> >> you would do >> >> if (trigger_irqfd) { >> trigger_irqfd(); >> return; >> } >> >> which would be a much nicer, generic interface. > > But that's not how PAPR works. But it's what a non-QEMU VFIO user would want, and it should be easy to implement. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] powerpc/eeh: Avoid event on passed PE
> Am 21.05.2014 um 02:19 schrieb Benjamin Herrenschmidt > : > >> On Tue, 2014-05-20 at 15:49 +0200, Alexander Graf wrote: >> So how about we just implement this whole thing properly as irqfd? >> Whether QEMU can actually do anything with the interrupt is a different >> question - we can leave it be for now. But we could model all the code >> with the assumption that it should either handle the error itself or >> trigger and irqfd write. > > I don't object to the idea... however this smells of Deja Vu... > > You often tend to want to turn something submitted that fills a specific > gap and implements a specific spec/function into some kind of idealized > grand design :-) And that means nothing gets upstream for weeks or monthes > as we churn and churn... > > Sometimes it's probably worth it. Here I would argue against it and would > advocate for doing the basic functionality first, as it is used by guests, > and later add the irqfd option. I don't see any emergency here and adding > the irqfd will not cause fundamental design changes: > > The "passed" flag (though I'm not fan of the name) is really something > we want in the low level handlers to avoid triggering host side EEH in > various places, regardless of whether we use irqfd or not. > > This is totally orthogonal from the mechanism used for notifications. > > Even in host, the detection path doesn't always involve interrupts, and > we can detect some things as a result of a host side config space access > for example etc... > > So let's keep things nice and separate here. The interrupt notification > is just an "optimization" which will speed up discovery of the error in > *some* cases later on (but adds its own complexity since we have multiple > discovery path in host, so we need to keep track whether we have notified > yet or not etc...) so let's keep it for later. EEH handling is your call, but I only see reduced complexity here. I won't nak the current approach though. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] drivers/vfio: New IOCTL command VFIO_EEH_INFO
> Am 21.05.2014 um 02:23 schrieb Benjamin Herrenschmidt > : > > On Tue, 2014-05-20 at 22:39 +1000, Gavin Shan wrote: Yeah. How about this? :-) - Move eeh-vfio.c to drivers/vfio/pci/ - From eeh-vfio.c, dereference arch/powerpc/kernel/eeh.c::eeh_ops, which is arch/powerpc/plaforms/powernv/eeh-powernv.c::powernv_eeh_ops. Call >>> >>> Hrm, I think it'd be nicer to just export individual functions that >>> do thing you want to do from eeh.c. >> >> Ok. Got it. Thanks for your comments :) > > The interesting thing with this approach is that VFIO per-se can work > with EEH RTAS backend too in the host. > > IE, with PR KVM for example or with non-KVM uses of VFIO, it would be > possible to use a device in a user process and exploit EEH even when > running under a PAPR hypervisor. > > That is, vfio-eeh uses "generic" exported EEH APIs from the EEH core > that will work on both powernv and RTAS backends. > > Note to Alex: This definitely kills the notifier idea for now though, > at least as a first class citizen of the design. We can add it as an > optional optimization on top later. I don't think it does. The notifier would just get triggered on config space read failures for example :). It's really just an aid for the vfio user to have a common code path for error handling. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html