Re: pci-assign terminates the guest upon pread() / pwrite() error?

2012-09-21 Thread Etienne Martineau

On 09/20/2012 05:13 PM, Alex Williamson wrote:

On Thu, 2012-09-20 at 16:36 -0400, Etienne Martineau wrote:

On 09/20/2012 03:37 PM, Alex Williamson wrote:

On Thu, 2012-09-20 at 15:08 -0400, Etienne Martineau wrote:

On 09/20/2012 02:16 PM, Alex Williamson wrote:

On Thu, 2012-09-20 at 13:27 -0400, Etienne Martineau wrote:

In hw/kvm/pci-assign.c, a pread() error in assigned_dev_pci_read()
results in a hw_error(). Similarly, a pwrite() error in
assigned_dev_pci_write() also results in a hw_error().

Would there be a way to avoid terminating the guest in those cases? How
about we deassign the device upon error?


By terminating the guest we contain the error vs allowing the guest to
continue running with invalid data.  De-assigning the device is
asynchronous and relies on guest involvement, so damage is potentially
already done.  Is this a theoretical problem or do you actually have
hardware that hits this?  Thanks,

Alex



This problem is in the context of a hot-pluggable device assigned to the
guest. If the guest reads or writes the config space at the same time as
the device is physically removed, the guest will terminate with
hw_error().

Because this limits the availability of the guest, I think we should try
to recover instead. I don't see what other damage can happen, since the
guest's MMIO accesses to the stale device will go nowhere.


So you're looking at implementing surprise device removal?  There's not
just config space, there's slow BAR access and mmap'd spaces to worry
about too.  What does going nowhere mean?  If it means reads return -1
and the guest is trying to read the data portion of a packet from the
network or an HBA, we've now passed bad data to the guest.  Thanks,

Alex





Thanks for your answer.

Yes, we are doing 'surprise device removal' for an assigned device. Note
that the problem also exists with standard 'attention button' device removal.

The problem is all about fault isolation. Ideally, only the
corresponding driver should be affected by this 'surprise device
removal'. I think that taking down the guest is too coarse. Think about
a 'surprise device removal' on the host. In that case the host is not
taken down, so why not do the same with the guest?


It depends on the host hardware.  Some x86 hardware will try to isolate
the fault with an NMI; other architectures, such as ia64, would pull a
machine check on a driver access to an unresponsive device.


Yes, some badness will be latched into the guest, but really this is not
any different than having a misbehaving device.


... which is a bad thing, but often undetectable.  This is detectable.
Thanks,

Alex



Our hardware is throwing a surprise link down PCIe AER and we are acting
on it. I agree that for the generalized case NMI can be an issue.

Let me ask you this question: what would be the best way to support
device removal (surprise or not) for a guest-assigned device, then? How
about signaling the guest from vfio_pci_remove()?


thanks,
Etienne

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: pci-assign terminates the guest upon pread() / pwrite() error?

2012-09-21 Thread Alex Williamson
On Fri, 2012-09-21 at 11:17 -0400, Etienne Martineau wrote:
 On 09/20/2012 05:13 PM, Alex Williamson wrote:
  On Thu, 2012-09-20 at 16:36 -0400, Etienne Martineau wrote:
  On 09/20/2012 03:37 PM, Alex Williamson wrote:
  On Thu, 2012-09-20 at 15:08 -0400, Etienne Martineau wrote:
  On 09/20/2012 02:16 PM, Alex Williamson wrote:
  On Thu, 2012-09-20 at 13:27 -0400, Etienne Martineau wrote:
  In hw/kvm/pci-assign.c a pread() error part of assigned_dev_pci_read()
  result in a hw_error(). Similarly a pwrite() error part of
  assigned_dev_pci_write() also result in a hw_error().
 
  Would there be a way to avoid terminating the guest for those cases? 
  How
  about we deassign the device upon error?
 
  By terminating the guest we contain the error vs allowing the guest to
  continue running with invalid data.  De-assigning the device is
  asynchronous and relies on guest involvement, so damage is potentially
  already done.  Is this a theoretical problem or do you actually have
  hardware that hits this?  Thanks,
 
  Alex
 
 
  This problem is in the context of a Hot-pluggable device assigned to the
  guest. If the guest rd/wr the config space at the same time than the
  device is physically taken out then the guest will terminate with
  hw_error().
 
  Because this limits the availability of the guest I think we should try
  to recover instead. I don't see what other damage can happen since
  guest's MMIO access to the stale device will go nowhere?
 
  So you're looking at implementing surprise device removal?  There's not
  just config space, there's slow bar access and mmap'd spaces to worry
  about too.  What does going nowhere mean?  If it means reads return -1
  and the guest is trying to read the data portion of a packet from the
  network or an hba, we've now passed bad data to the guest.  Thanks,
 
  Alex
 
 
 
 
  Thanks for your answer;
 
  Yes we are doing 'surprise device removal' for assigned device. Note
  that the problem also exist with standard 'attention button' device 
  removal.
 
  The problem is all about fault isolation. Ideally, only the
  corresponding driver should be affected by this 'surprise device
  removal'. I think that taking down the guest is too coarse. Think about
  a 'surprise device removal' on the host. In that case the host is not
  taken down so why not do the same with the guest?
 
  It depends on the host hardware.  Some x86 hardware will try to isolate
  the fault with an NMI other architectures such as ia64 would pull a
  machine check on a driver access to unresponsive devices.
 
  Yes some badness will be latched into the guest but really this not any
  different that having a mis-behaving device.
 
  ... which is a bad thing, but often undetectable.  This is detectable.
  Thanks,
 
  Alex
 
 
 Our hardware is throwing a surprise link down PCIe AER and we are acting 
 on it. I agree that for the generalized case NMI can be an issue.
 
 Let me ask you that question. What would be the best way to support 
 device removal (surprise or not) for guest assigned device then? How 
 about signaling the guest from vfio_pci_remove()?

Thanks for using vfio! :)

The 440fx chipset is really not designed to deal with these kinds of
problems.  Generally, the best answer to "how should we expose foo to the
guest?" is to do it exactly like it is on the host.  That means sending a
surprise link down AER to the guest.  That should be possible with q35.
We could potentially signal that in vfio_pci_remove(), but we probably
want to figure out how to relay the AER event to the guest and inject it
into the emulated chipset.  Thanks,

Alex
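
For reference, a minimal sketch of how the surprise link down event mentioned above can be recognised in an AER capability. The register offset (0x04, Uncorrectable Error Status) and the bit (bit 5, Surprise Down) are those of the PCIe AER extended capability, and match PCI_ERR_UNCOR_STATUS and PCI_ERR_UNC_SURPDN in the Linux headers. Everything else is an assumption made for illustration: the function names are hypothetical, the code works on a plain byte snapshot of config space, and none of it is QEMU or vfio code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* AER capability layout: offset of the Uncorrectable Error Status
 * register and the Surprise Down bit within it. */
#define AER_UNCOR_STATUS 0x04u
#define AER_UNC_SURPDN   0x00000020u

/* Hypothetical helper: read a little-endian 32-bit register from a
 * config-space snapshot held in memory. */
static uint32_t cfg_le32(const uint8_t *cfg, size_t len, size_t off)
{
    assert(off + 4 <= len);
    return (uint32_t)cfg[off]
         | (uint32_t)cfg[off + 1] << 8
         | (uint32_t)cfg[off + 2] << 16
         | (uint32_t)cfg[off + 3] << 24;
}

/* Hypothetical helper: true if the AER capability located at aer_off
 * has the Surprise Down bit set in its uncorrectable status. */
static bool aer_surprise_down(const uint8_t *cfg, size_t len, size_t aer_off)
{
    return (cfg_le32(cfg, len, aer_off + AER_UNCOR_STATUS) & AER_UNC_SURPDN) != 0;
}
```

Relaying the event to a guest would additionally require setting the same bit in the emulated port's AER capability and raising the corresponding error interrupt, which this sketch does not attempt.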




Re: pci-assign terminates the guest upon pread() / pwrite() error?

2012-09-21 Thread Etienne Martineau

On 09/21/2012 11:49 AM, Alex Williamson wrote:

On Fri, 2012-09-21 at 11:17 -0400, Etienne Martineau wrote:

On 09/20/2012 05:13 PM, Alex Williamson wrote:

On Thu, 2012-09-20 at 16:36 -0400, Etienne Martineau wrote:

On 09/20/2012 03:37 PM, Alex Williamson wrote:

On Thu, 2012-09-20 at 15:08 -0400, Etienne Martineau wrote:

On 09/20/2012 02:16 PM, Alex Williamson wrote:

On Thu, 2012-09-20 at 13:27 -0400, Etienne Martineau wrote:

In hw/kvm/pci-assign.c a pread() error part of assigned_dev_pci_read()
result in a hw_error(). Similarly a pwrite() error part of
assigned_dev_pci_write() also result in a hw_error().

Would there be a way to avoid terminating the guest for those cases? How
about we deassign the device upon error?


By terminating the guest we contain the error vs allowing the guest to
continue running with invalid data.  De-assigning the device is
asynchronous and relies on guest involvement, so damage is potentially
already done.  Is this a theoretical problem or do you actually have
hardware that hits this?  Thanks,

Alex



This problem is in the context of a Hot-pluggable device assigned to the
guest. If the guest rd/wr the config space at the same time than the
device is physically taken out then the guest will terminate with
hw_error().

Because this limits the availability of the guest I think we should try
to recover instead. I don't see what other damage can happen since
guest's MMIO access to the stale device will go nowhere?


So you're looking at implementing surprise device removal?  There's not
just config space, there's slow bar access and mmap'd spaces to worry
about too.  What does going nowhere mean?  If it means reads return -1
and the guest is trying to read the data portion of a packet from the
network or an hba, we've now passed bad data to the guest.  Thanks,

Alex





Thanks for your answer;

Yes we are doing 'surprise device removal' for assigned device. Note
that the problem also exist with standard 'attention button' device removal.

The problem is all about fault isolation. Ideally, only the
corresponding driver should be affected by this 'surprise device
removal'. I think that taking down the guest is too coarse. Think about
a 'surprise device removal' on the host. In that case the host is not
taken down so why not do the same with the guest?


It depends on the host hardware.  Some x86 hardware will try to isolate
the fault with an NMI other architectures such as ia64 would pull a
machine check on a driver access to unresponsive devices.


Yes some badness will be latched into the guest but really this not any
different that having a mis-behaving device.


... which is a bad thing, but often undetectable.  This is detectable.
Thanks,

Alex



Our hardware is throwing a surprise link down PCIe AER and we are acting
on it. I agree that for the generalized case NMI can be an issue.

Let me ask you that question. What would be the best way to support
device removal (surprise or not) for guest assigned device then? How
about signaling the guest from vfio_pci_remove()?


Thanks for using vfio! :)

The 440fx chipset is really not designed to deal with these kinds of
problems.  Generally the best answer to how should we expose foo to the
guest is to do it exactly like it is on the host.  That means sending a
surprise link down aer to the guest.  That should be possible with q35.

We are using q35 at this time for those reasons, but the original QEMU
problem still exists. By the time the surprise link down (SPLD) AER
reaches the guest, the device is physically gone on the host. Any
transient guest MMIO/PCI config access to the stale assigned device can
be fatal (hw_error()).



We could potentially signal that in vfio_pci_remove, but we probably
want to figure out how to relay the aer event to the guest and inject it
into the emulated chipset.

We tried that, but there were some problems, such as mangling the TLP to
match the guest PCI topology and the propagation latency caused by the
chipset emulation layer during AER delivery. Right now we use a
straight lookup in the guest and fire the AER directly into the driver
callback pci_error. We do that to minimize exposure to the
stale assigned device.


thanks,
Etienne
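
The "straight lookup" approach described above might look roughly like the following. This is a hypothetical, self-contained model and not guest kernel code: struct bound_driver, notify_driver(), demo_stop_io() and the result codes are all invented for illustration, and the thread does not say how the lookup is actually implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical answers a driver can give to an error report. */
enum err_result { ERR_NONE, ERR_DISCONNECT };

typedef enum err_result (*err_cb)(uint16_t bdf);

/* Hypothetical binding of a guest-visible device to its driver callback. */
struct bound_driver {
    uint16_t bdf;      /* bus/device/function as seen by the guest */
    err_cb   on_error; /* error callback, NULL if the driver has none */
};

/* Example callback: a driver that stops all I/O and asks to be detached. */
static enum err_result demo_stop_io(uint16_t bdf)
{
    (void)bdf;
    return ERR_DISCONNECT;
}

/* Look the device up directly and call its driver, bypassing any
 * emulated chipset path. Returns ERR_NONE if nobody handled the event. */
static enum err_result notify_driver(const struct bound_driver *tbl,
                                     size_t n, uint16_t bdf)
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].bdf == bdf && tbl[i].on_error != NULL) {
            return tbl[i].on_error(bdf);
        }
    }
    return ERR_NONE;
}
```

The direct call shortens the window during which the driver may still touch the removed device, at the cost of skipping the topology-aware path that a relayed AER message would take.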


Re: pci-assign terminates the guest upon pread() / pwrite() error?

2012-09-20 Thread Jan Kiszka
On 2012-09-20 19:27, Etienne Martineau wrote:
 In hw/kvm/pci-assign.c a pread() error part of assigned_dev_pci_read()
 result in a hw_error(). Similarly a pwrite() error part of
 assigned_dev_pci_write() also result in a hw_error().
 
 Would there be a way to avoid terminating the guest for those cases? How
 about we deassign the device upon error?

First of all, is this a regression in the latest QEMU / qemu-kvm? Or was it
always like this for you?

Then, can you provide more information about the device (lspci -vv) and
which accesses go wrong (printf, complete console output)?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Re: pci-assign terminates the guest upon pread() / pwrite() error?

2012-09-20 Thread Alex Williamson
On Thu, 2012-09-20 at 13:27 -0400, Etienne Martineau wrote:
 In hw/kvm/pci-assign.c a pread() error part of assigned_dev_pci_read() 
 result in a hw_error(). Similarly a pwrite() error part of 
 assigned_dev_pci_write() also result in a hw_error().
 
 Would there be a way to avoid terminating the guest for those cases? How 
 about we deassign the device upon error?

By terminating the guest we contain the error vs allowing the guest to
continue running with invalid data.  De-assigning the device is
asynchronous and relies on guest involvement, so damage is potentially
already done.  Is this a theoretical problem or do you actually have
hardware that hits this?  Thanks,

Alex



Re: pci-assign terminates the guest upon pread() / pwrite() error?

2012-09-20 Thread Etienne Martineau

On 09/20/2012 03:37 PM, Alex Williamson wrote:

On Thu, 2012-09-20 at 15:08 -0400, Etienne Martineau wrote:

On 09/20/2012 02:16 PM, Alex Williamson wrote:

On Thu, 2012-09-20 at 13:27 -0400, Etienne Martineau wrote:

In hw/kvm/pci-assign.c a pread() error part of assigned_dev_pci_read()
result in a hw_error(). Similarly a pwrite() error part of
assigned_dev_pci_write() also result in a hw_error().

Would there be a way to avoid terminating the guest for those cases? How
about we deassign the device upon error?


By terminating the guest we contain the error vs allowing the guest to
continue running with invalid data.  De-assigning the device is
asynchronous and relies on guest involvement, so damage is potentially
already done.  Is this a theoretical problem or do you actually have
hardware that hits this?  Thanks,

Alex



This problem is in the context of a Hot-pluggable device assigned to the
guest. If the guest rd/wr the config space at the same time than the
device is physically taken out then the guest will terminate with
hw_error().

Because this limits the availability of the guest I think we should try
to recover instead. I don't see what other damage can happen since
guest's MMIO access to the stale device will go nowhere?


So you're looking at implementing surprise device removal?  There's not
just config space, there's slow bar access and mmap'd spaces to worry
about too.  What does going nowhere mean?  If it means reads return -1
and the guest is trying to read the data portion of a packet from the
network or an hba, we've now passed bad data to the guest.  Thanks,

Alex





Thanks for your answer.

Yes, we are doing 'surprise device removal' for an assigned device. Note
that the problem also exists with standard 'attention button' device removal.

The problem is all about fault isolation. Ideally, only the
corresponding driver should be affected by this 'surprise device
removal'. I think that taking down the guest is too coarse. Think about
a 'surprise device removal' on the host. In that case the host is not
taken down, so why not do the same with the guest?

Yes, some badness will be latched into the guest, but really this is not
any different than having a misbehaving device.


thanks,
Etienne
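
To make the recovery idea above concrete, here is a minimal sketch, not the pci-assign code, of a config-space read that degrades instead of aborting. The names struct cfg_dev and cfg_read() are hypothetical. The sketch assumes a little-endian host, and it assumes that returning all ones (what a read to an absent PCI device conventionally yields) is acceptable to the guest driver, which is exactly the point under dispute in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-device state; not a QEMU structure. */
struct cfg_dev {
    int  fd;   /* descriptor open on the device's config-space file */
    bool gone; /* set once an access has failed */
};

/*
 * Read len (1, 2 or 4) bytes of config space at offset pos.
 * On failure, mark the device as gone and return all ones rather than
 * terminating the process. Assumes a little-endian host.
 */
static uint32_t cfg_read(struct cfg_dev *d, off_t pos, size_t len)
{
    uint32_t val = 0;
    uint32_t all_ones;
    ssize_t ret;

    assert(len == 1 || len == 2 || len == 4);
    all_ones = (len == 4) ? UINT32_MAX : ((1u << (len * 8)) - 1);

    if (d->gone) {
        return all_ones; /* never touch a device already known to be gone */
    }

    do {
        ret = pread(d->fd, &val, len, pos);
    } while (ret < 0 && errno == EINTR);

    if (ret != (ssize_t)len) {
        d->gone = true;  /* caller should now start device removal */
        return all_ones;
    }
    return val;
}
```

A write path would follow the same shape with pwrite(), silently dropping the write once gone is set. Note that this only covers config space: accesses through BARs or mmap'd regions never pass through such a helper.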




Re: pci-assign terminates the guest upon pread() / pwrite() error?

2012-09-20 Thread Alex Williamson
On Thu, 2012-09-20 at 16:36 -0400, Etienne Martineau wrote:
 On 09/20/2012 03:37 PM, Alex Williamson wrote:
  On Thu, 2012-09-20 at 15:08 -0400, Etienne Martineau wrote:
  On 09/20/2012 02:16 PM, Alex Williamson wrote:
  On Thu, 2012-09-20 at 13:27 -0400, Etienne Martineau wrote:
  In hw/kvm/pci-assign.c a pread() error part of assigned_dev_pci_read()
  result in a hw_error(). Similarly a pwrite() error part of
  assigned_dev_pci_write() also result in a hw_error().
 
  Would there be a way to avoid terminating the guest for those cases? How
  about we deassign the device upon error?
 
  By terminating the guest we contain the error vs allowing the guest to
  continue running with invalid data.  De-assigning the device is
  asynchronous and relies on guest involvement, so damage is potentially
  already done.  Is this a theoretical problem or do you actually have
  hardware that hits this?  Thanks,
 
  Alex
 
 
  This problem is in the context of a Hot-pluggable device assigned to the
  guest. If the guest rd/wr the config space at the same time than the
  device is physically taken out then the guest will terminate with
  hw_error().
 
  Because this limits the availability of the guest I think we should try
  to recover instead. I don't see what other damage can happen since
  guest's MMIO access to the stale device will go nowhere?
 
  So you're looking at implementing surprise device removal?  There's not
  just config space, there's slow bar access and mmap'd spaces to worry
  about too.  What does going nowhere mean?  If it means reads return -1
  and the guest is trying to read the data portion of a packet from the
  network or an hba, we've now passed bad data to the guest.  Thanks,
 
  Alex
 
 
 
 
 Thanks for your answer;
 
 Yes we are doing 'surprise device removal' for assigned device. Note 
 that the problem also exist with standard 'attention button' device removal.
 
 The problem is all about fault isolation. Ideally, only the 
 corresponding driver should be affected by this 'surprise device 
 removal'. I think that taking down the guest is too coarse. Think about 
 a 'surprise device removal' on the host. In that case the host is not 
 taken down so why not do the same with the guest?

It depends on the host hardware.  Some x86 hardware will try to isolate
the fault with an NMI; other architectures, such as ia64, would pull a
machine check on a driver access to an unresponsive device.

 Yes some badness will be latched into the guest but really this not any 
 different that having a mis-behaving device.

... which is a bad thing, but often undetectable.  This is detectable.
Thanks,

Alex
