Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Alex Williamson
On Wed, 2014-05-28 at 02:44 +0200, Alexander Graf wrote:
> On 28.05.14 02:39, Alex Williamson wrote:
> > On Wed, 2014-05-28 at 00:49 +0200, Alexander Graf wrote:
> >> On 27.05.14 20:15, Alex Williamson wrote:
> >>> On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:
> >>>> The patch adds new IOCTL commands for sPAPR VFIO container device
> >>>> to support EEH functionality for PCI devices, which have been passed
> >>>> through from host to somebody else via VFIO.
> >>>>
> >>>> Signed-off-by: Gavin Shan
> >>>> ---
> >>>>  Documentation/vfio.txt  | 92 -
> >>>>  drivers/vfio/pci/Makefile   |  1 +
> >>>>  drivers/vfio/pci/vfio_pci.c | 20 +---
> >>>>  drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
> >>>>  drivers/vfio/pci/vfio_pci_private.h |  5 ++
> >>>>  drivers/vfio/vfio_iommu_spapr_tce.c | 85 ++
> >>>>  include/uapi/linux/vfio.h   | 66 ++
> >>>>  7 files changed, 308 insertions(+), 7 deletions(-)
> >>>>  create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c
> >> [...]
> >>
> >>>> +
> >>>> +	return ret;
> >>>> +}
> >>>> +
> >>>>  static long tce_iommu_ioctl(void *iommu_data,
> >>>>  				 unsigned int cmd, unsigned long arg)
> >>>>  {
> >>>> @@ -283,6 +363,11 @@ static long tce_iommu_ioctl(void *iommu_data,
> >>>>  		tce_iommu_disable(container);
> >>>>  		mutex_unlock(&container->lock);
> >>>>  		return 0;
> >>>> +	case VFIO_EEH_PE_SET_OPTION:
> >>>> +	case VFIO_EEH_PE_GET_STATE:
> >>>> +	case VFIO_EEH_PE_RESET:
> >>>> +	case VFIO_EEH_PE_CONFIGURE:
> >>>> +		return tce_iommu_eeh_ioctl(iommu_data, cmd, arg);
> >>> This is where it would have really made sense to have a single
> >>> VFIO_EEH_OP ioctl with a data structure passed to indicate the sub-op.
> >>> AlexG, are you really attached to splitting these out into separate
> >>> ioctls?
> >> I don't see the problem. We need to forward 4 ioctls to a separate piece
> >> of code, so we forward 4 ioctls to a separate piece of code :). Putting
> >> them into one ioctl just moves the switch() into another function.
> > And uses an extra 3 ioctl numbers and gives us extra things to update if
> > we ever need to add more ioctls, etc.  ioctl numbers are an address
> > space, how much address space do we really want to give to EEH?  It's
> > not a big difference, but I don't think it's completely even either.
> > Thanks,
> 
> Yes, that's the point. I by far prefer to have you push back on anyone 
> who introduces useless ioctls rather than have a separate EEH number 
> space that people can just throw anything in they like ;).

Well, I appreciate that, but having them as separate ioctls doesn't
really prevent that either.  Any one of these 4 could be set to take a
sub-option to extend and contort the EEH interface.  The only way to
prevent that would be to avoid the argsz+flags hack that makes the ioctl
extendable.  Thanks,

Alex
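
For illustration only, the single-ioctl alternative discussed in this
thread might look roughly like the sketch below. The names eeh_sub_op,
eeh_op_arg and eeh_op_dispatch, and the sub-op numbers, are made up for
this example and do not come from the patch. The point is that the
sub-op travels inside an argsz+flags structure, so the switch() moves
into the handler rather than into the ioctl number space.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sub-op numbers; not taken from the patch. */
enum eeh_sub_op {
	EEH_OP_SET_OPTION = 0,
	EEH_OP_GET_STATE  = 1,
	EEH_OP_RESET      = 2,
	EEH_OP_CONFIGURE  = 3,
};

/* Hypothetical argument block in the usual VFIO argsz+flags style. */
struct eeh_op_arg {
	uint32_t argsz;		/* structure size as seen by the caller */
	uint32_t flags;		/* must be zero until a flag is defined */
	uint32_t op;		/* one of enum eeh_sub_op */
	uint32_t option;	/* sub-op specific value */
};

/* Stand-in for the one ioctl handler: validate, then switch on the sub-op. */
static long eeh_op_dispatch(const struct eeh_op_arg *arg)
{
	assert(arg != NULL);

	if (arg->argsz < sizeof(*arg) || arg->flags)
		return -EINVAL;

	switch (arg->op) {
	case EEH_OP_SET_OPTION:
	case EEH_OP_GET_STATE:
	case EEH_OP_RESET:
	case EEH_OP_CONFIGURE:
		return 0;	/* a real handler would call into EEH here */
	default:
		return -EINVAL;	/* unknown sub-op */
	}
}
```

A new operation would then only need a new enum value and a new case,
which is also why this form does nothing by itself to stop the interface
from growing.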


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Gavin Shan
On Tue, May 27, 2014 at 12:15:27PM -0600, Alex Williamson wrote:
>On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:
>> The patch adds new IOCTL commands for sPAPR VFIO container device
>> to support EEH functionality for PCI devices, which have been passed
>> through from host to somebody else via VFIO.
>> 
>> Signed-off-by: Gavin Shan 
>> ---
>>  Documentation/vfio.txt  | 92 -
>>  drivers/vfio/pci/Makefile   |  1 +
>>  drivers/vfio/pci/vfio_pci.c | 20 +---
>>  drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
>>  drivers/vfio/pci/vfio_pci_private.h |  5 ++
>>  drivers/vfio/vfio_iommu_spapr_tce.c | 85 ++
>>  include/uapi/linux/vfio.h   | 66 ++
>>  7 files changed, 308 insertions(+), 7 deletions(-)
>>  create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c
>> 
>> diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
>> index b9ca023..d890fed 100644
>> --- a/Documentation/vfio.txt
>> +++ b/Documentation/vfio.txt
>> @@ -305,7 +305,15 @@ faster, the map/unmap handling has been implemented in real mode which provides
>>  an excellent performance which has limitations such as inability to do
>>  locked pages accounting in real time.
>>  
>> -So 3 additional ioctls have been added:
>> +4) According to sPAPR specification, A Partitionable Endpoint (PE) is an I/O
>> +subtree that can be treated as a unit for the purposes of partitioning and
>> +error recovery. A PE may be a single or multi-function IOA (IO Adapter), a
>> +function of a multi-function IOA, or multiple IOAs (possibly including switch
>> +and bridge structures above the multiple IOAs). PPC64 guests detect PCI errors
>> +and recover from them via EEH RTAS services, which works on the basis of
>> +additional ioctl commands.
>> +
>> +So 7 additional ioctls have been added:
>>  
>>  VFIO_IOMMU_SPAPR_TCE_GET_INFO - returns the size and the start
>>  of the DMA window on the PCI bus.
>> @@ -316,6 +324,17 @@ So 3 additional ioctls have been added:
>>  
>>  VFIO_IOMMU_DISABLE - disables the container.
>>  
>> +VFIO_EEH_PE_SET_OPTION - enables or disables EEH functionality on the
>> +specified device. Also, it can be used to remove IO or DMA
>> +stopped state on the frozen PE.
>> +
>> +VFIO_EEH_PE_GET_STATE - retrieve PE's state: frozen or normal state.
>> +
>> +VFIO_EEH_PE_RESET - do PE reset, which is one of the major steps for
>> +error recovery.
>> +
>> +VFIO_EEH_PE_CONFIGURE - configure the PCI bridges after PE reset. It's
>> +one of the major steps for error recovery.
>>  
>>  The code flow from the example above should be slightly changed:
>>  
>> @@ -346,6 +365,77 @@ The code flow from the example above should be slightly changed:
>>  ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
>>  .
>>  
>> +Based on the initial example we have, the following piece of code could be
>> +reference for EEH setup and error handling:
>> +
>> +struct vfio_eeh_pe_set_option option = { .argsz = sizeof(option) };
>> +struct vfio_eeh_pe_get_state state = { .argsz = sizeof(state) };
>> +struct vfio_eeh_pe_reset reset = { .argsz = sizeof(reset) };
>> +struct vfio_eeh_pe_configure configure = { .argsz = sizeof(configure) };
>> +
>> +
>> +
>> +/* Get a file descriptor for the device */
>> +device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, ":06:0d.0");
>> +
>> +/* Enable the EEH functionality on the device */
>> +option.option = VFIO_EEH_PE_SET_OPT_ENABLE;
>> +ioctl(container, VFIO_EEH_PE_SET_OPTION, &option);
>> +
>> +/* You're suggested to create additional data struct to represent
>> + * PE, and put child devices belonging to same IOMMU group to the
>> + * PE instance for later reference.
>> + */
>> +
>> +/* Check the PE's state and make sure it's in functional state */
>> +ioctl(container, VFIO_EEH_PE_GET_STATE, &state);
>> +
>> +/* Save device's state. pci_save_state() would be good enough
>> + * as an example.
>> + */
>> +
>> +/* Test and setup the device */
>> +ioctl(device, VFIO_DEVICE_GET_INFO, &device_info);
>> +
>> +
>> +
>> +/* When 0xFF's returned from reading PCI config space or IO BARs
>> + * of the PCI device. Check the PE state to see if that has been
>> + * frozen.
>> + */
>> +ioctl(container, VFIO_EEH_PE_GET_STATE, &state);
>> +
>> +/* Waiting for pending PCI transactions to be completed and don't
>> + * produce any more PCI traffic from/to the affected PE until
>> + * recovery is finished.
>> + */
>> +
>> +/* Enable IO for the affected PE and collect logs. Usually, the
>> + * standard part of PCI config space, AER registers are dumped
>> + * as logs for further analysis.
>> + */
>> +option.option = VFIO_EEH_PE_SET_OPT_IO;
>> +ioctl(contain

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Alexander Graf


On 28.05.14 02:39, Alex Williamson wrote:

On Wed, 2014-05-28 at 00:49 +0200, Alexander Graf wrote:

On 27.05.14 20:15, Alex Williamson wrote:

On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:

The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.

Signed-off-by: Gavin Shan 
---
   Documentation/vfio.txt  | 92 -
   drivers/vfio/pci/Makefile   |  1 +
   drivers/vfio/pci/vfio_pci.c | 20 +---
   drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
   drivers/vfio/pci/vfio_pci_private.h |  5 ++
   drivers/vfio/vfio_iommu_spapr_tce.c | 85 ++
   include/uapi/linux/vfio.h   | 66 ++
   7 files changed, 308 insertions(+), 7 deletions(-)
   create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c

[...]


+
+   return ret;
+}
+
   static long tce_iommu_ioctl(void *iommu_data,
 unsigned int cmd, unsigned long arg)
   {
@@ -283,6 +363,11 @@ static long tce_iommu_ioctl(void *iommu_data,
tce_iommu_disable(container);
mutex_unlock(&container->lock);
return 0;
+   case VFIO_EEH_PE_SET_OPTION:
+   case VFIO_EEH_PE_GET_STATE:
+   case VFIO_EEH_PE_RESET:
+   case VFIO_EEH_PE_CONFIGURE:
+   return tce_iommu_eeh_ioctl(iommu_data, cmd, arg);

This is where it would have really made sense to have a single
VFIO_EEH_OP ioctl with a data structure passed to indicate the sub-op.
AlexG, are you really attached to splitting these out into separate
ioctls?

I don't see the problem. We need to forward 4 ioctls to a separate piece
of code, so we forward 4 ioctls to a separate piece of code :). Putting
them into one ioctl just moves the switch() into another function.

And uses an extra 3 ioctl numbers and gives us extra things to update if
we ever need to add more ioctls, etc.  ioctl numbers are an address
space, how much address space do we really want to give to EEH?  It's
not a big difference, but I don't think it's completely even either.
Thanks,


Yes, that's the point. I by far prefer to have you push back on anyone 
who introduces useless ioctls rather than have a separate EEH number 
space that people can just throw anything in they like ;).



Alex



Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Alex Williamson
On Wed, 2014-05-28 at 00:49 +0200, Alexander Graf wrote:
> On 27.05.14 20:15, Alex Williamson wrote:
> > On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:
> >> The patch adds new IOCTL commands for sPAPR VFIO container device
> >> to support EEH functionality for PCI devices, which have been passed
> >> through from host to somebody else via VFIO.
> >>
> >> Signed-off-by: Gavin Shan 
> >> ---
> >>   Documentation/vfio.txt  | 92 -
> >>   drivers/vfio/pci/Makefile   |  1 +
> >>   drivers/vfio/pci/vfio_pci.c | 20 +---
> >>   drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
> >>   drivers/vfio/pci/vfio_pci_private.h |  5 ++
> >>   drivers/vfio/vfio_iommu_spapr_tce.c | 85 ++
> >>   include/uapi/linux/vfio.h   | 66 ++
> >>   7 files changed, 308 insertions(+), 7 deletions(-)
> >>   create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c
> 
> [...]
> 
> >> +
> >> +  return ret;
> >> +}
> >> +
> >>   static long tce_iommu_ioctl(void *iommu_data,
> >> unsigned int cmd, unsigned long arg)
> >>   {
> >> @@ -283,6 +363,11 @@ static long tce_iommu_ioctl(void *iommu_data,
> >>tce_iommu_disable(container);
> >>mutex_unlock(&container->lock);
> >>return 0;
> >> +  case VFIO_EEH_PE_SET_OPTION:
> >> +  case VFIO_EEH_PE_GET_STATE:
> >> +  case VFIO_EEH_PE_RESET:
> >> +  case VFIO_EEH_PE_CONFIGURE:
> >> +  return tce_iommu_eeh_ioctl(iommu_data, cmd, arg);
> > This is where it would have really made sense to have a single
> > VFIO_EEH_OP ioctl with a data structure passed to indicate the sub-op.
> > AlexG, are you really attached to splitting these out into separate
> > ioctls?
> 
> I don't see the problem. We need to forward 4 ioctls to a separate piece 
> of code, so we forward 4 ioctls to a separate piece of code :). Putting 
> them into one ioctl just moves the switch() into another function.

And uses an extra 3 ioctl numbers and gives us extra things to update if
we ever need to add more ioctls, etc.  ioctl numbers are an address
space, how much address space do we really want to give to EEH?  It's
not a big difference, but I don't think it's completely even either.
Thanks,

Alex




Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Alexander Graf


On 27.05.14 20:15, Alex Williamson wrote:

On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:

The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.

Signed-off-by: Gavin Shan 
---
  Documentation/vfio.txt  | 92 -
  drivers/vfio/pci/Makefile   |  1 +
  drivers/vfio/pci/vfio_pci.c | 20 +---
  drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
  drivers/vfio/pci/vfio_pci_private.h |  5 ++
  drivers/vfio/vfio_iommu_spapr_tce.c | 85 ++
  include/uapi/linux/vfio.h   | 66 ++
  7 files changed, 308 insertions(+), 7 deletions(-)
  create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c


[...]


+
+   return ret;
+}
+
  static long tce_iommu_ioctl(void *iommu_data,
 unsigned int cmd, unsigned long arg)
  {
@@ -283,6 +363,11 @@ static long tce_iommu_ioctl(void *iommu_data,
tce_iommu_disable(container);
mutex_unlock(&container->lock);
return 0;
+   case VFIO_EEH_PE_SET_OPTION:
+   case VFIO_EEH_PE_GET_STATE:
+   case VFIO_EEH_PE_RESET:
+   case VFIO_EEH_PE_CONFIGURE:
+   return tce_iommu_eeh_ioctl(iommu_data, cmd, arg);

This is where it would have really made sense to have a single
VFIO_EEH_OP ioctl with a data structure passed to indicate the sub-op.
AlexG, are you really attached to splitting these out into separate
ioctls?


I don't see the problem. We need to forward 4 ioctls to a separate piece 
of code, so we forward 4 ioctls to a separate piece of code :). Putting 
them into one ioctl just moves the switch() into another function.



Alex



Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Benjamin Herrenschmidt
On Tue, 2014-05-27 at 14:37 -0600, Alex Williamson wrote:

> > The usual way is the driver asks for one or the other, this plumbs back
> > into the guest EEH code which itself plumbs into the PCIe error recovery
> > framework in Linux.
> 
> So magic?

Yes. The driver is expected to more or less know what kind of reset it
wants for its device. Ideally hot reset is sufficient but some drivers
know that the device they drive is crappy enough that it mostly ignores
hot reset and really needs a PERST for example...

Also we have other reasons to expose those interfaces outside of EEH. 

For example, some drivers might want to specifically trigger a PERST
after a microcode update. I.e. there are paths outside of EEH error
recovery where drivers in the guest might want to trigger a reset
of the device, and under some circumstances they have control over
which kind of reset they are doing (and the guest Linux does have
different code paths to do a hot reset vs. a fundamental reset).

So we need to expose that distinction to be able to honor the guest
decision.

Cheers,
Ben.
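
As a rough sketch of how that guest decision could be carried through
to the reset ioctl: the option values below are the ones from struct
vfio_eeh_pe_reset in the patch under review, while guest_reset_hint and
pick_reset_option are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Option values as defined in struct vfio_eeh_pe_reset in the patch. */
#define VFIO_EEH_PE_RESET_DEACTIVATE	0
#define VFIO_EEH_PE_RESET_HOT		1
#define VFIO_EEH_PE_RESET_FUNDAMENTAL	3

/* Hypothetical record of what the guest driver asked for. */
struct guest_reset_hint {
	bool needs_fundamental;	/* e.g. device ignores hot reset */
};

/* Map the guest's request onto the option passed to VFIO_EEH_PE_RESET. */
static uint32_t pick_reset_option(const struct guest_reset_hint *hint)
{
	uint32_t option;

	/* No hint at all: assume hot reset is sufficient. */
	if (hint != NULL && hint->needs_fundamental)
		option = VFIO_EEH_PE_RESET_FUNDAMENTAL;
	else
		option = VFIO_EEH_PE_RESET_HOT;

	/* This helper only ever selects a reset type to assert. */
	assert(option != VFIO_EEH_PE_RESET_DEACTIVATE);
	return option;
}
```

The caller would place the returned value in reset.option before issuing
the ioctl; how the hint gets populated from the guest is outside the
scope of this sketch.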




Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Alex Williamson
On Wed, 2014-05-28 at 06:30 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2014-05-27 at 12:15 -0600, Alex Williamson wrote:
> 
> > > +/*
> > > + * Reset is the major step to recover problematic PE. The following
> > > + * command helps on that.
> > > + */
> > > +struct vfio_eeh_pe_reset {
> > > + __u32 argsz;
> > > + __u32 flags;
> > > + __u32 option;
> > > +#define VFIO_EEH_PE_RESET_DEACTIVATE	0	/* Deactivate reset	*/
> > > +#define VFIO_EEH_PE_RESET_HOT		1	/* Hot reset		*/
> > > +#define VFIO_EEH_PE_RESET_FUNDAMENTAL	3	/* Fundamental reset	*/
> > 
> > How does a user know which of these to use?
> 
> The usual way is the driver asks for one or the other, this plumbs back
> into the guest EEH code which itself plumbs into the PCIe error recovery
> framework in Linux.

So magic?

> 
> However I do have a question for Gavin here: Why do we expose an
> explicit "deactivate" ? The reset functions should do the whole
> reset sequence (assertion, delay, deassertion). In fact the firmware
> doesn't really give you a choice for PERST right ? Or do we have
> a requirement to expose both phases for RTAS? (In that case I'm
> happy to ignore the deassertion there too).
> 
> Cheers,
> Ben.
> 





Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Benjamin Herrenschmidt
On Tue, 2014-05-27 at 12:15 -0600, Alex Williamson wrote:

> > +/*
> > + * Reset is the major step to recover problematic PE. The following
> > + * command helps on that.
> > + */
> > +struct vfio_eeh_pe_reset {
> > +   __u32 argsz;
> > +   __u32 flags;
> > +   __u32 option;
> > +#define VFIO_EEH_PE_RESET_DEACTIVATE	0	/* Deactivate reset	*/
> > +#define VFIO_EEH_PE_RESET_HOT		1	/* Hot reset		*/
> > +#define VFIO_EEH_PE_RESET_FUNDAMENTAL	3	/* Fundamental reset	*/
> 
> How does a user know which of these to use?

The usual way is the driver asks for one or the other, this plumbs back
into the guest EEH code which itself plumbs into the PCIe error recovery
framework in Linux.

However I do have a question for Gavin here: Why do we expose an
explicit "deactivate" ? The reset functions should do the whole
reset sequence (assertion, delay, deassertion). In fact the firmware
doesn't really give you a choice for PERST right ? Or do we have
a requirement to expose both phases for RTAS? (In that case I'm
happy to ignore the deassertion there too).

Cheers,
Ben.
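
To make the "whole reset sequence" idea concrete, here is a minimal
sketch of a helper that performs assertion, delay and deassertion as one
operation. It is not what the patch does. pe_reset_backend,
pe_reset_full and the trace_* helpers are hypothetical; issue_reset()
merely stands in for ioctl(container, VFIO_EEH_PE_RESET, &reset).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Option values as defined in the quoted struct vfio_eeh_pe_reset. */
#define VFIO_EEH_PE_RESET_DEACTIVATE	0
#define VFIO_EEH_PE_RESET_HOT		1
#define VFIO_EEH_PE_RESET_FUNDAMENTAL	3

/* Hypothetical backend: how a reset is issued and how the delay is done. */
struct pe_reset_backend {
	int (*issue_reset)(void *ctx, uint32_t option);
	void (*hold)(void *ctx, unsigned int msec);
	void *ctx;
};

/* Run assertion, delay and deassertion as a single operation. */
static int pe_reset_full(const struct pe_reset_backend *be,
			 uint32_t option, unsigned int hold_ms)
{
	int ret;

	assert(be != NULL && be->issue_reset != NULL);

	/* Only a real reset type may start the sequence. */
	if (option != VFIO_EEH_PE_RESET_HOT &&
	    option != VFIO_EEH_PE_RESET_FUNDAMENTAL)
		return -EINVAL;

	ret = be->issue_reset(be->ctx, option);		/* assert reset */
	if (ret < 0)
		return ret;

	if (be->hold != NULL)
		be->hold(be->ctx, hold_ms);		/* keep it asserted */

	return be->issue_reset(be->ctx, VFIO_EEH_PE_RESET_DEACTIVATE);
}

/* Recording backend so the helper can be exercised without hardware. */
struct reset_trace {
	uint32_t options[4];
	size_t count;
	unsigned int held_ms;
};

static int trace_issue_reset(void *ctx, uint32_t option)
{
	struct reset_trace *t = ctx;

	if (t->count >= 4)
		return -ENOSPC;
	t->options[t->count++] = option;
	return 0;
}

static void trace_hold(void *ctx, unsigned int msec)
{
	struct reset_trace *t = ctx;

	t->held_ms += msec;
}
```

With such a helper the caller never issues the deactivate step on its
own. Whether RTAS needs both phases exposed separately is exactly the
open question above, so treat this as one possible shape only.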



Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Alex Williamson
On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:
> The patch adds new IOCTL commands for sPAPR VFIO container device
> to support EEH functionality for PCI devices, which have been passed
> through from host to somebody else via VFIO.
> 
> Signed-off-by: Gavin Shan 
> ---
>  Documentation/vfio.txt  | 92 -
>  drivers/vfio/pci/Makefile   |  1 +
>  drivers/vfio/pci/vfio_pci.c | 20 +---
>  drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
>  drivers/vfio/pci/vfio_pci_private.h |  5 ++
>  drivers/vfio/vfio_iommu_spapr_tce.c | 85 ++
>  include/uapi/linux/vfio.h   | 66 ++
>  7 files changed, 308 insertions(+), 7 deletions(-)
>  create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c
> 
> diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
> index b9ca023..d890fed 100644
> --- a/Documentation/vfio.txt
> +++ b/Documentation/vfio.txt
> @@ -305,7 +305,15 @@ faster, the map/unmap handling has been implemented in real mode which provides
>  an excellent performance which has limitations such as inability to do
>  locked pages accounting in real time.
>  
> -So 3 additional ioctls have been added:
> +4) According to sPAPR specification, A Partitionable Endpoint (PE) is an I/O
> +subtree that can be treated as a unit for the purposes of partitioning and
> +error recovery. A PE may be a single or multi-function IOA (IO Adapter), a
> +function of a multi-function IOA, or multiple IOAs (possibly including switch
> +and bridge structures above the multiple IOAs). PPC64 guests detect PCI errors
> +and recover from them via EEH RTAS services, which works on the basis of
> +additional ioctl commands.
> +
> +So 7 additional ioctls have been added:
>  
>   VFIO_IOMMU_SPAPR_TCE_GET_INFO - returns the size and the start
>   of the DMA window on the PCI bus.
> @@ -316,6 +324,17 @@ So 3 additional ioctls have been added:
>  
>   VFIO_IOMMU_DISABLE - disables the container.
>  
> + VFIO_EEH_PE_SET_OPTION - enables or disables EEH functionality on the
> + specified device. Also, it can be used to remove IO or DMA
> + stopped state on the frozen PE.
> +
> + VFIO_EEH_PE_GET_STATE - retrieve PE's state: frozen or normal state.
> +
> + VFIO_EEH_PE_RESET - do PE reset, which is one of the major steps for
> + error recovery.
> +
> + VFIO_EEH_PE_CONFIGURE - configure the PCI bridges after PE reset. It's
> + one of the major steps for error recovery.
>  
>  The code flow from the example above should be slightly changed:
>  
> @@ -346,6 +365,77 @@ The code flow from the example above should be slightly changed:
>   ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
>   .
>  
> +Based on the initial example we have, the following piece of code could be
> +reference for EEH setup and error handling:
> +
> + struct vfio_eeh_pe_set_option option = { .argsz = sizeof(option) };
> + struct vfio_eeh_pe_get_state state = { .argsz = sizeof(state) };
> + struct vfio_eeh_pe_reset reset = { .argsz = sizeof(reset) };
> + struct vfio_eeh_pe_configure configure = { .argsz = sizeof(configure) };
> +
> + 
> +
> + /* Get a file descriptor for the device */
> + device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, ":06:0d.0");
> +
> + /* Enable the EEH functionality on the device */
> + option.option = VFIO_EEH_PE_SET_OPT_ENABLE;
> + ioctl(container, VFIO_EEH_PE_SET_OPTION, &option);
> +
> + /* You're suggested to create additional data struct to represent
> +  * PE, and put child devices belonging to same IOMMU group to the
> +  * PE instance for later reference.
> +  */
> +
> + /* Check the PE's state and make sure it's in functional state */
> + ioctl(container, VFIO_EEH_PE_GET_STATE, &state);
> +
> + /* Save device's state. pci_save_state() would be good enough
> +  * as an example.
> +  */
> +
> + /* Test and setup the device */
> + ioctl(device, VFIO_DEVICE_GET_INFO, &device_info);
> +
> + 
> +
> + /* When 0xFF's returned from reading PCI config space or IO BARs
> +  * of the PCI device. Check the PE state to see if that has been
> +  * frozen.
> +  */
> + ioctl(container, VFIO_EEH_PE_GET_STATE, &state);
> +
> + /* Waiting for pending PCI transactions to be completed and don't
> +  * produce any more PCI traffic from/to the affected PE until
> +  * recovery is finished.
> +  */
> +
> + /* Enable IO for the affected PE and collect logs. Usually, the
> +  * standard part of PCI config space, AER registers are dumped
> +  * as logs for further analysis.
> +  */
> + option.option = VFIO_EEH_PE_SET_OPT_IO;
> + ioctl(container, VFIO_EEH_PE_SET_OPTION, &option);
> +
> + /* Issue PE reset */
> + reset.option = VFIO_EEH_PE_RESET_HOT;
> + ioctl(con

Re: [PATCH v6 2/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Alex Williamson
On Sat, 2014-05-24 at 12:06 +1000, Gavin Shan wrote:
> On Fri, May 23, 2014 at 08:29:59AM -0600, Alex Williamson wrote:
> >On Fri, 2014-05-23 at 14:37 +1000, Gavin Shan wrote:
> >> On Thu, May 22, 2014 at 09:10:53PM -0600, Alex Williamson wrote:
> >> >On Thu, 2014-05-22 at 18:23 +1000, Gavin Shan wrote:
> 
> .../...
> 
> >No, sorry, I mean how does the user get information about the error?
> >The interface we have here is:
> >a) find that something bad has happened
> >b) kick it into working again
> >c) continue
> >
> >How does the user figure out what happened and if it makes sense to
> >attempt to recover?  Where does the user learn that their disk is on
> >fire?
> >
> 
> When 0xFF's returned from config or IO read, user should check the
> device (PE)'s state with ioctl command VFIO_EEH_PE_GET_STATE. If the
> device (PE) has been put into "frozen" state, It's confirmed the device
> ("disk" you mentioned) is on fire.

No, this only confirms that something bad happened, not _what_ bad thing
happened.

>  User should kick off recovery, which
> includes:

And here you're just describing the kick operation again...

> 
> - User stops any operations (config, IO, DMA) on the device because any
>   PCI traffic to "frozen" device will be dropped from software or hardware
>   level. Also, we don't expect DMA traffic during recovery. Otherwise,
>   we will bump into recursive errors and the recovery should fail.
> - VFIO_EEH_PE_SET_OPTION to enable I/O path ("DMA" path is still under frozen
>   state). EEH_VFIO_PE_CONFIGURE to reconfigure affected PCI bridges and then
>   do error log retrieval.

These logs, where do they go?  How does the user get access?  That's
what I'm trying to ask about.

> - VFIO_EEH_PE_RESET to reset the affected device (PE). EEH_VFIO_PE_CONFIGURE
>   to restore BARs.
> - User resumes the device to start PCI traffic and device is brought to
>   functional state.
> 
> .../...
> 
> >
> >No, I prefer to stay consistent with the rest of the VFIO API and use
> >argsz + flags.
> >
> 
> Here's the recap for previous reply: I have several cases for ioctl().
> 
> - ioctl(fd, cmd, NULL):   I needn't any input info.
> - ioctl(fd, cmd, &data):  I need input info
> 
> For all the cases, should I simply have a data struct to include 
> "argsz+flags"?

Anything that requires data should have argsz+flags, if it doesn't
require data, it doesn't need them, but think long and hard about whether
there's any possibility that we'll need parameters in the future.

> For return value from ioctl(), can we simply to have additional field in the
> above data struct to carry it? "0" is the information I have to return for
> some of the cases.

If for instance your ioctl is returning something like "number of
errors", then it's perfectly fine to use that as the ioctl return.  <0
is error, >= zero is a success with value.
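
A small sketch of both conventions together, assuming a hypothetical
eeh_state_arg structure and eeh_get_state handler (neither is from the
patch): the handler accepts any argsz at least as large as the fields it
knows about, rejects unknown flags, and returns the value directly as a
non-negative result.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument layout: only argsz and flags exist today. */
struct eeh_state_arg {
	uint32_t argsz;
	uint32_t flags;
};

/* Smallest size this handler understands: up to and including flags. */
#define EEH_STATE_MINSZ \
	(offsetof(struct eeh_state_arg, flags) + sizeof(uint32_t))

/*
 * Returns the PE state (>= 0) on success or a negative errno.
 * pe_state stands in for whatever the platform code would report.
 */
static long eeh_get_state(const struct eeh_state_arg *arg, int pe_state)
{
	assert(arg != NULL && pe_state >= 0);

	/* A larger argsz is fine: a newer caller may append fields. */
	if (arg->argsz < EEH_STATE_MINSZ)
		return -EINVAL;

	/* No flags are defined yet, so any set bit is an error. */
	if (arg->flags != 0)
		return -EINVAL;

	return pe_state;
}
```

On the caller's side this reads as: a result below zero is an error,
anything else is the value itself, so no extra output field is needed
for a plain integer.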



[PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Gavin Shan
The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.

Signed-off-by: Gavin Shan 
---
 Documentation/vfio.txt  | 92 -
 drivers/vfio/pci/Makefile   |  1 +
 drivers/vfio/pci/vfio_pci.c | 20 +---
 drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
 drivers/vfio/pci/vfio_pci_private.h |  5 ++
 drivers/vfio/vfio_iommu_spapr_tce.c | 85 ++
 include/uapi/linux/vfio.h   | 66 ++
 7 files changed, 308 insertions(+), 7 deletions(-)
 create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c

diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
index b9ca023..d890fed 100644
--- a/Documentation/vfio.txt
+++ b/Documentation/vfio.txt
@@ -305,7 +305,15 @@ faster, the map/unmap handling has been implemented in real mode which provides
 an excellent performance which has limitations such as inability to do
 locked pages accounting in real time.
 
-So 3 additional ioctls have been added:
+4) According to sPAPR specification, A Partitionable Endpoint (PE) is an I/O
+subtree that can be treated as a unit for the purposes of partitioning and
+error recovery. A PE may be a single or multi-function IOA (IO Adapter), a
+function of a multi-function IOA, or multiple IOAs (possibly including switch
+and bridge structures above the multiple IOAs). PPC64 guests detect PCI errors
+and recover from them via EEH RTAS services, which work on the basis of
+additional ioctl commands.
+
+So 7 additional ioctls have been added:
 
VFIO_IOMMU_SPAPR_TCE_GET_INFO - returns the size and the start
of the DMA window on the PCI bus.
@@ -316,6 +324,17 @@ So 3 additional ioctls have been added:
 
VFIO_IOMMU_DISABLE - disables the container.
 
+   VFIO_EEH_PE_SET_OPTION - enables or disables EEH functionality on the
+   specified device. Also, it can be used to remove IO or DMA
+   stopped state on the frozen PE.
+
+   VFIO_EEH_PE_GET_STATE - retrieve PE's state: frozen or normal state.
+
+   VFIO_EEH_PE_RESET - do PE reset, which is one of the major steps for
+   error recovery.
+
+   VFIO_EEH_PE_CONFIGURE - configure the PCI bridges after PE reset. It's
+   one of the major steps for error recovery.
 
 The code flow from the example above should be slightly changed:
 
@@ -346,6 +365,77 @@ The code flow from the example above should be slightly changed:
ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
.
 
+Based on the initial example we have, the following piece of code could
+serve as a reference for EEH setup and error handling:
+
+   struct vfio_eeh_pe_set_option option = { .argsz = sizeof(option) };
+   struct vfio_eeh_pe_get_state state = { .argsz = sizeof(state) };
+   struct vfio_eeh_pe_reset reset = { .argsz = sizeof(reset) };
+   struct vfio_eeh_pe_configure configure = { .argsz = sizeof(configure) };
+
+   
+
+   /* Get a file descriptor for the device */
+   device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, ":06:0d.0");
+
+   /* Enable the EEH functionality on the device */
+   option.option = VFIO_EEH_PE_SET_OPT_ENABLE;
+   ioctl(container, VFIO_EEH_PE_SET_OPTION, &option);
+
+   /* It is suggested to create an additional data structure to
+    * represent the PE, and to attach the child devices belonging to
+    * the same IOMMU group to that PE instance for later reference.
+    */
+
+   /* Check the PE's state and make sure it's in functional state */
+   ioctl(container, VFIO_EEH_PE_GET_STATE, &state);
+
+   /* Save device's state. pci_save_state() would be good enough
+* as an example.
+*/
+
+   /* Test and setup the device */
+   ioctl(device, VFIO_DEVICE_GET_INFO, &device_info);
+
+   
+
+   /* When 0xFF's are returned from reading PCI config space or IO
+    * BARs of the PCI device, check the PE state to see if it has
+    * been frozen.
+    */
+   ioctl(container, VFIO_EEH_PE_GET_STATE, &state);
+
+   /* Wait for pending PCI transactions to complete and don't
+    * produce any more PCI traffic from/to the affected PE until
+    * recovery is finished.
+    */
+
+   /* Enable IO for the affected PE and collect logs. Usually, the
+    * standard part of PCI config space and the AER registers are
+    * dumped as logs for further analysis.
+    */
+   option.option = VFIO_EEH_PE_SET_OPT_IO;
+   ioctl(container, VFIO_EEH_PE_SET_OPTION, &option);
+
+   /* Issue PE reset */
+   reset.option = VFIO_EEH_PE_RESET_HOT;
+   ioctl(container, VFIO_EEH_PE_RESET, &reset);
+   reset.option = VFIO_EEH_PE_RESET_DEACTIVATE;
+   ioctl(container, VFIO_EEH_PE_RESET, &reset);
+
+   /* Configure the PCI bridges for 

[PATCH v7 1/3] powerpc/eeh: Avoid event on passed PE

2014-05-27 Thread Gavin Shan
If we detect the frozen state on a PE that has been passed through to
somebody else, we needn't handle it. Instead, we rely on the device's
owner to detect and recover it. The patch avoids raising an EEH event on
a frozen passed-through PE so that the device's owner has a chance to
handle it.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/include/asm/eeh.h| 32 +++
 arch/powerpc/kernel/eeh.c |  8 
 arch/powerpc/platforms/powernv/eeh-ioda.c |  3 ++-
 3 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 7782056..34a2d83 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -72,6 +72,7 @@ struct device_node;
 #define EEH_PE_RESET		(1 << 2)	/* PE reset in progress */
 
 #define EEH_PE_KEEP		(1 << 8)	/* Keep PE on hotplug	*/
+#define EEH_PE_PASSTHROUGH	(1 << 9)	/* PE owned by guest	*/
 
 struct eeh_pe {
int type;   /* PE type: PHB/Bus/Device  */
@@ -93,6 +94,21 @@ struct eeh_pe {
 #define eeh_pe_for_each_dev(pe, edev, tmp) \
list_for_each_entry_safe(edev, tmp, &pe->edevs, list)
 
+static inline bool eeh_pe_passed(struct eeh_pe *pe)
+{
+   return pe ? !!(pe->state & EEH_PE_PASSTHROUGH) : false;
+}
+
+static inline void eeh_pe_set_passed(struct eeh_pe *pe, bool passed)
+{
+   if (pe) {
+   if (passed)
+   pe->state |= EEH_PE_PASSTHROUGH;
+   else
+   pe->state &= ~EEH_PE_PASSTHROUGH;
+   }
+}
+
 /*
  * The struct is used to trace EEH state for the associated
  * PCI device node or PCI device. In future, it might
@@ -110,6 +126,7 @@ struct eeh_pe {
 #define EEH_DEV_SYSFS  (1 << 9)/* Sysfs created*/
 #define EEH_DEV_REMOVED(1 << 10)   /* Removed permanently  */
 #define EEH_DEV_FRESET (1 << 11)   /* Fundamental reset*/
+#define EEH_DEV_PASSTHROUGH(1 << 12)   /* Owned by guest   */
 
 struct eeh_dev {
int mode;   /* EEH mode */
@@ -138,6 +155,21 @@ static inline struct pci_dev *eeh_dev_to_pci_dev(struct eeh_dev *edev)
return edev ? edev->pdev : NULL;
 }
 
+static inline bool eeh_dev_passed(struct eeh_dev *dev)
+{
+   return dev ? !!(dev->mode & EEH_DEV_PASSTHROUGH) : false;
+}
+
+static inline void eeh_dev_set_passed(struct eeh_dev *dev, bool passed)
+{
+   if (dev) {
+   if (passed)
+   dev->mode |= EEH_DEV_PASSTHROUGH;
+   else
+   dev->mode &= ~EEH_DEV_PASSTHROUGH;
+   }
+}
+
 /* Return values from eeh_ops::next_error */
 enum {
EEH_NEXT_ERR_NONE = 0,
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 9c6b899..3bc8b12 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -400,6 +400,14 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
if (ret > 0)
return ret;
 
+   /*
+* If the PE isn't owned by us, we shouldn't check the
+* state. Instead, let the owner handle it if the PE has
+* been frozen.
+*/
+   if (eeh_pe_passed(pe))
+   return 0;
+
/* If we already have a pending isolation event for this
 * slot, we know it's bad already, we don't need to check.
 * Do this checking under a lock; as multiple PCI devices
diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index cab3e62..79193eb 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -892,7 +892,8 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
opal_pci_eeh_freeze_clear(phb->opal_id, frozen_pe_no,
OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
ret = EEH_NEXT_ERR_NONE;
-   } else if ((*pe)->state & EEH_PE_ISOLATED) {
+   } else if ((*pe)->state & EEH_PE_ISOLATED ||
+  eeh_pe_passed(*pe)) {
ret = EEH_NEXT_ERR_NONE;
} else {
pr_err("EEH: Frozen PHB#%x-PE#%x (%s) detected\n",
-- 
1.8.3.2
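
For readers following along, the effect of this patch can be modelled in plain userspace C. This is only a sketch under stated assumptions: EEH_PE_PASSTHROUGH and the two flag helpers mirror the diff above, while struct model_pe, model_check_failure(), its hw_frozen parameter and model_example() are made up for illustration and are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EEH_PE_PASSTHROUGH (1 << 9)	/* value taken from the patch */

/* Hypothetical, stripped-down stand-in for struct eeh_pe. */
struct model_pe {
	int state;
};

/* Same logic as eeh_pe_passed() in the patch, applied to the stand-in. */
static bool model_pe_passed(const struct model_pe *pe)
{
	return pe ? !!(pe->state & EEH_PE_PASSTHROUGH) : false;
}

/* Same logic as eeh_pe_set_passed() in the patch. */
static void model_pe_set_passed(struct model_pe *pe, bool passed)
{
	if (!pe)
		return;
	if (passed)
		pe->state |= EEH_PE_PASSTHROUGH;
	else
		pe->state &= ~EEH_PE_PASSTHROUGH;
}

/*
 * Hypothetical model of the early return added to
 * eeh_dev_check_failure(): a passed-through PE is never reported by
 * the host, whatever the (simulated) hardware state is.
 */
static int model_check_failure(const struct model_pe *pe, bool hw_frozen)
{
	if (model_pe_passed(pe))
		return 0;
	return hw_frozen ? 1 : 0;
}

/* Walk through the three ownership phases of one PE. */
static void model_example(void)
{
	struct model_pe pe = { .state = 0 };

	assert(model_check_failure(&pe, true) == 1);	/* host-owned: reported */
	model_pe_set_passed(&pe, true);
	assert(model_check_failure(&pe, true) == 0);	/* passed: left to owner */
	model_pe_set_passed(&pe, false);
	assert(model_check_failure(&pe, true) == 1);	/* reclaimed: reported */
}
```

The point of the model is that the host keeps no extra bookkeeping: one bit in the PE state is enough to decide whether a frozen PE is the host's problem or the owner's.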



[PATCH v7 2/3] powerpc/eeh: EEH support for VFIO PCI device

2014-05-27 Thread Gavin Shan
The patch exports functions to be used by new ioctl commands, which
will be introduced in a subsequent patch, to support EEH functionality
for VFIO PCI devices.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/include/asm/eeh.h |  15 +++
 arch/powerpc/kernel/eeh.c  | 286 +
 2 files changed, 301 insertions(+)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 34a2d83..ffc95e7 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -26,6 +26,7 @@
 #include 
 #include 
 
+struct iommu_table;
 struct pci_dev;
 struct pci_bus;
 struct device_node;
@@ -191,6 +192,8 @@ enum {
 #define EEH_OPT_ENABLE 1   /* EEH enable   */
 #define EEH_OPT_THAW_MMIO  2   /* MMIO enable  */
 #define EEH_OPT_THAW_DMA   3   /* DMA enable   */
+#define EEH_OPT_GET_PE_ADDR0   /* Get PE addr  */
+#define EEH_OPT_GET_PE_MODE1   /* Get PE mode  */
 #define EEH_STATE_UNAVAILABLE  (1 << 0)/* State unavailable*/
 #define EEH_STATE_NOT_SUPPORT  (1 << 1)/* EEH not supported*/
 #define EEH_STATE_RESET_ACTIVE (1 << 2)/* Active reset */
@@ -198,6 +201,11 @@ enum {
 #define EEH_STATE_DMA_ACTIVE   (1 << 4)/* Active DMA   */
 #define EEH_STATE_MMIO_ENABLED (1 << 5)/* MMIO enabled */
 #define EEH_STATE_DMA_ENABLED  (1 << 6)/* DMA enabled  */
+#define EEH_PE_STATE_NORMAL0   /* Normal state */
+#define EEH_PE_STATE_RESET 1   /* PE reset */
+#define EEH_PE_STATE_STOPPED_IO_DMA2   /* Stopped  */
+#define EEH_PE_STATE_STOPPED_DMA   4   /* Stopped DMA  */
+#define EEH_PE_STATE_UNAVAIL   5   /* Unavailable  */
 #define EEH_RESET_DEACTIVATE   0   /* Deactivate the PE reset  */
 #define EEH_RESET_HOT  1   /* Hot reset*/
 #define EEH_RESET_FUNDAMENTAL  3   /* Fundamental reset*/
@@ -305,6 +313,13 @@ void eeh_add_device_late(struct pci_dev *);
 void eeh_add_device_tree_late(struct pci_bus *);
 void eeh_add_sysfs_files(struct pci_bus *);
 void eeh_remove_device(struct pci_dev *);
+int eeh_dev_open(struct pci_dev *pdev);
+void eeh_dev_release(struct pci_dev *pdev);
+struct eeh_pe *eeh_iommu_table_to_pe(struct iommu_table *tbl);
+int eeh_pe_set_option(struct eeh_pe *pe, int option);
+int eeh_pe_get_state(struct eeh_pe *pe);
+int eeh_pe_reset(struct eeh_pe *pe, int option);
+int eeh_pe_configure(struct eeh_pe *pe);
 
 /**
  * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure.
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 3bc8b12..30693c1 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -108,6 +109,9 @@ struct eeh_ops *eeh_ops = NULL;
 /* Lock to avoid races due to multiple reports of an error */
 DEFINE_RAW_SPINLOCK(confirm_error_lock);
 
+/* Lock to protect passed flags */
+static DEFINE_MUTEX(eeh_dev_mutex);
+
 /* Buffer for reporting pci register dumps. Its here in BSS, and
  * not dynamically alloced, so that it ends up in RMO where RTAS
  * can access it.
@@ -1106,6 +1110,288 @@ void eeh_remove_device(struct pci_dev *dev)
edev->mode &= ~EEH_DEV_SYSFS;
 }
 
+/**
+ * eeh_dev_open - Mark EEH device and PE as passed through
+ * @pdev: PCI device
+ *
+ * Mark the indicated EEH device and PE as passed through.
+ * As a result, the EEH errors detected on the PE won't be
+ * reported. The owner of the device will be responsible for
+ * detection and recovery.
+ */
+int eeh_dev_open(struct pci_dev *pdev)
+{
+   struct eeh_dev *edev;
+
+   mutex_lock(&eeh_dev_mutex);
+
+   /* No PCI device ? */
+   if (!pdev) {
+   mutex_unlock(&eeh_dev_mutex);
+   return -ENODEV;
+   }
+
+   /* No EEH device ? */
+   edev = pci_dev_to_eeh_dev(pdev);
+   if (!edev || !edev->pe) {
+   mutex_unlock(&eeh_dev_mutex);
+   return -ENODEV;
+   }
+
+   eeh_dev_set_passed(edev, true);
+   eeh_pe_set_passed(edev->pe, true);
+   mutex_unlock(&eeh_dev_mutex);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(eeh_dev_open);
+
+/**
+ * eeh_dev_release - Reclaim the ownership of EEH device
+ * @pdev: PCI device
+ *
+ * Reclaim ownership of the EEH device and, potentially, the
+ * corresponding PE. As a result, the EEH errors detected on the
+ * PE will be reported and handled as usual.
+ */
+void eeh_dev_release(struct pci_dev *pdev)
+{
+   bool release_pe = true;
+   struct eeh_pe *pe = NULL;
+   struct eeh_dev *tmp, *edev;
+
+   mutex_lock(&eeh_dev_mutex);
+
+   /* No PCI device ? */
+   if (!pdev) {
+   mutex_unlock(&eeh_dev_mutex);
+   return;
+   }
+
+   /* No EEH device ? */
+   edev = pci_dev_to_eeh_dev(pdev);

[PATCH v7 0/3] EEH Support for VFIO PCI Device

2014-05-27 Thread Gavin Shan
This series of patches adds EEH support for PCI devices that are passed
through to a PowerKVM-based guest via VFIO. The implementation follows
directly from the issues we have to resolve to support EEH for a
PowerKVM-based guest.

- Emulation for EEH RTAS requests. All EEH RTAS requests go to QEMU first.
  If QEMU can't handle a request, it is sent to the host via the newly
  introduced VFIO container IOCTL command (VFIO_EEH_OP) and handled in the
  host kernel.
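
The routing described in the bullet above can be sketched roughly as follows. This is a userspace illustration only, not QEMU or kernel code: every name here (enum model_req, model_emulator_handle(), model_host_ioctl(), model_dispatch(), model_example()) is hypothetical, and the host side is a stub standing in for the ioctl() on the container fd.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request kinds; real RTAS tokens are not modelled here. */
enum model_req { MODEL_REQ_LOCAL, MODEL_REQ_NEEDS_HOST };

/* Hypothetical: returns true when the emulator serviced the request. */
static bool model_emulator_handle(enum model_req req, int *result)
{
	if (req != MODEL_REQ_LOCAL)
		return false;
	*result = 0;
	return true;
}

/* Hypothetical stub standing in for ioctl() on the VFIO container fd. */
static int model_host_ioctl(int container_fd, enum model_req req)
{
	(void)req;
	return container_fd >= 0 ? 0 : -EBADF;
}

/* Try the emulator first; fall back to the host only when it declines. */
static int model_dispatch(int container_fd, enum model_req req,
			  bool *went_to_host)
{
	int result;

	*went_to_host = false;
	if (model_emulator_handle(req, &result))
		return result;

	*went_to_host = true;
	return model_host_ioctl(container_fd, req);
}

/* One request of each kind, checking which path it took. */
static void model_example(void)
{
	bool host;

	assert(model_dispatch(3, MODEL_REQ_LOCAL, &host) == 0 && !host);
	assert(model_dispatch(3, MODEL_REQ_NEEDS_HOST, &host) == 0 && host);
}
```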

The series of patches requires corresponding QEMU changes.

Change log
==========
v1 -> v2:
* EEH RTAS requests are routed to QEMU, and then possibly to the host
  kernel. The KVM in-kernel handling mechanism is dropped.
* Error injection is reimplemented based on a syscall, instead of KVM
  in-kernel handling. The logic for error injection token management is
  moved to QEMU. The error injection request is routed to QEMU and then
  possibly to the host kernel.
v2 -> v3:
* Make the fields in struct eeh_vfio_pci_addr, struct vfio_eeh_info based
  on the comments from Alexey.
* Define macros for EEH VFIO operations (Alexey).
* Clear frozen state after successful PE reset.
* Merge original [PATCH 1/2/3] to one.
v3 -> v4:
* Remove the error injection from the patchset. Mike or I will work on
  that later.
* Rename CONFIG_VFIO_EEH to VFIO_PCI_EEH.
* Rename the IOCTL command to VFIO_EEH_OP; it is handled by the VFIO-PCI
  device instead of the VFIO container.
* Rename the IOCTL argument structure to "vfio_eeh_op" accordingly. Also,
  more fields are added to hold return values for RTAS requests.
* The address mapping stuff is totally removed. When opening or releasing
  a VFIO PCI device, a notification is sent to EEH to update the flags
  that indicate whether the device is passed to a guest.
* Change pr_warn() to pr_debug() to avoid DoS, as pointed out by Alex.W.
* Fix the argument size check issue pointed out by Alex.W.
v4 -> v5:
* Functions for VFIO PCI EEH support are moved to eeh.c and exported from
  there. The VFIO PCI driver just uses those functions to handle the IOCTL
  command VFIO_EEH_OP. All of this keeps the code well organized, as
  suggested by Alex.G. Another potential benefit is that PowerNV/pSeries
  share "eeh_ops", so the same infrastructure could possibly work for
  KVM_PR and KVM_HV mode at the same time.
* Don't clear error injection registers after finishing PE reset, as the
  patchset does nothing related to error injection.
* Amend Documentation/vfio.txt, which was missed in the last revision.
* No QEMU changes for this revision; "v4" works well. Also, remove "RFC"
  from the subject as the design is basically accepted.
v5 -> v6:
* CONFIG_VFIO_PCI_EEH removed; CONFIG_EEH is used instead.
* Split one ioctl command into 5.
* In eeh.c, descriptions have been added for the exported functions. Also,
  the functions return negative values for errors and other values for
  information. All hard-coded numbers have been replaced by macros defined
  in eeh.h. The comments, including the function names, have been amended
  not to mention "guest" or "vfio".
* Add one mutex to protect the flags in eeh_dev_open()/release().
* Add more information on how to use those ioctl commands to
  Documentation/vfio.txt.
v6 -> v7:
* Remove ioctl command VFIO_EEH_PE_GET_ADDR; the PE address will be
  figured out in userland (e.g. QEMU) as Alex.G suggested.
* Let the sPAPR VFIO container process the ioctl commands, as a VFIO
  container naturally corresponds to an IOMMU group (aka PE on the sPAPR
  platform).
* All VFIO PCI EEH ioctl commands have "argsz+flags" in their companion
  data struct.
* For VFIO PCI EEH ioctl commands, ioctl() returns a negative number to
  indicate an error or zero for success. Additional output information is
  carried in the companion data struct.
* Explain PE in Documentation/vfio.txt, fix typos, and add more comments
  as suggested by Alex.G.
* Split/merge patches according to suggestions from Alex.G and Alex.W.
* Add an EEH stub in drivers/vfio/pci/, as suggested by Alex.W.
* Define various EEH options as macros in vfio.h for userland to use.
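
To make the "argsz+flags" and return-value conventions from the v7 notes concrete, here is a minimal userspace sketch of the handler side. It is an illustration under assumptions, not the code from the patch: struct model_eeh_state, MODEL_MINSZ, model_get_state() and model_example() are hypothetical names, and only the leading argsz/flags pair and the "negative for error, zero for success, output in the struct" rule come from the change log.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical companion struct; only argsz/flags follow the change log. */
struct model_eeh_state {
	uint32_t argsz;		/* size of the struct as seen by the caller */
	uint32_t flags;		/* no flags are defined in this sketch */
	uint32_t state;		/* output: filled in on success */
};

/* Smallest size that still contains every field the handler touches. */
#define MODEL_MINSZ \
	(offsetof(struct model_eeh_state, state) + sizeof(uint32_t))

/*
 * Returns 0 on success and a negative errno value on failure; the
 * actual result travels back in arg->state, never in the return value.
 */
static int model_get_state(struct model_eeh_state *arg, uint32_t cur_state)
{
	if (!arg || arg->argsz < MODEL_MINSZ)
		return -EINVAL;
	if (arg->flags)
		return -EINVAL;

	arg->state = cur_state;
	return 0;
}

/* A well-formed call followed by one with an undersized argsz. */
static void model_example(void)
{
	struct model_eeh_state arg = { .argsz = sizeof(arg), .flags = 0 };

	assert(model_get_state(&arg, 2) == 0);
	assert(arg.state == 2);

	arg.argsz = 4;	/* too small to hold the output field */
	assert(model_get_state(&arg, 2) == -EINVAL);
}
```

Keeping the result out of the return value means a caller only has to test for a negative return before reading the struct, which is the behaviour the v7 notes describe.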

Gavin Shan (3):
  powerpc/eeh: Avoid event on passed PE
  powerpc/eeh: EEH support for VFIO PCI device
  drivers/vfio: EEH support for VFIO PCI device

Documentation/vfio.txt|  92 +++-
arch/powerpc/include/asm/eeh.h|  47 +++
arch/powerpc/kernel/eeh.c | 294 +