Re: [PATCH v8 1/7] PCI: initialize and release SR-IOV capability

2009-02-13 Thread Andi Kleen
Yu Zhao yu.z...@intel.com writes:
 +
 +
 +static int sriov_init(struct pci_dev *dev, int pos)
 +{
 + int i;
 + int rc;
 + int nres;
 + u32 pgsz;
 + u16 ctrl, total, offset, stride;
 + struct pci_sriov *iov;
 + struct resource *res;
 + struct pci_dev *pdev;
 +
 + if (dev-pcie_type != PCI_EXP_TYPE_RC_END 
 + dev-pcie_type != PCI_EXP_TYPE_ENDPOINT)
 + return -ENODEV;
 +

It would be a good idea to put a might_sleep() here just in 
case the msleep happens below and drivers call it incorrectly.

 + pci_read_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl);
 + if (ctrl  PCI_SRIOV_CTRL_VFE) {
 + pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, 0);
 + msleep(100);


That's really long. Hopefully that's really needed.

 +
 + pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl);
 + pci_write_config_word(dev, pos + PCI_SRIOV_NUM_VF, total);
 + pci_read_config_word(dev, pos + PCI_SRIOV_VF_OFFSET, offset);
 + pci_read_config_word(dev, pos + PCI_SRIOV_VF_STRIDE, stride);
 + if (!offset || (total  1  !stride))
 + return -EIO;
 +
 + pci_read_config_dword(dev, pos + PCI_SRIOV_SUP_PGSIZE, pgsz);
 + i = PAGE_SHIFT  12 ? PAGE_SHIFT - 12 : 0;
 + pgsz = ~((1  i) - 1);
 + if (!pgsz)
 + return -EIO;

All the error paths don't seem to undo the config space writes.
How will the devices behave with half initialized context?

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/7] PCI: initialize and release SR-IOV capability

2009-02-13 Thread Matthew Wilcox
On Fri, Feb 13, 2009 at 05:56:44PM +0100, Andi Kleen wrote:
  +   pci_read_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl);
  +   if (ctrl  PCI_SRIOV_CTRL_VFE) {
  +   pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, 0);
  +   msleep(100);
 
 That's really long. Hopefully that's really needed.

Yes and no.  The spec says:

  To allow components to perform internal initialization, system software
  must wait for at least 100 ms after changing the VF Enable bit from
  a 0 to a 1, before it is permitted to issue Configuration Requests to
  the VFs which are enabled by that VF Enable bit.

So we don't have to wait here, but we do have to wait before exposing
all these virtual functions to the rest of the system.  Should we add
more complexity, perhaps spawn a thread to do it asynchronously, or add
0.1 seconds to device initialisation?  A question without an easy
answer, iMO.

  +
  +   pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl);
  +   pci_write_config_word(dev, pos + PCI_SRIOV_NUM_VF, total);
  +   pci_read_config_word(dev, pos + PCI_SRIOV_VF_OFFSET, offset);
  +   pci_read_config_word(dev, pos + PCI_SRIOV_VF_STRIDE, stride);
  +   if (!offset || (total  1  !stride))
  +   return -EIO;
  +
  +   pci_read_config_dword(dev, pos + PCI_SRIOV_SUP_PGSIZE, pgsz);
  +   i = PAGE_SHIFT  12 ? PAGE_SHIFT - 12 : 0;
  +   pgsz = ~((1  i) - 1);
  +   if (!pgsz)
  +   return -EIO;
 
 All the error paths don't seem to undo the config space writes.
 How will the devices behave with half initialized context?

I think we should clear the VF_ENABLE bit.  That action is also fraught
with danger:

  If software Clears VF Enable, software must allow 1 second after VF
  Enable is Cleared before reading any field in the SR-IOV Extended
  Capability or the VF Migration State Array (see Section 3.3.15.1).

Another msleep(1000) here?  Not pretty, but what else can we do?

Not to mention the danger of something else innocently using lspci -
to read a field in the extended capability -- I suspect we also need to
block user config accesses before clearing this bit.

-- 
Matthew Wilcox  Intel Open Source Technology Centre
Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/7] PCI: initialize and release SR-IOV capability

2009-02-13 Thread Yu Zhao
On Sat, Feb 14, 2009 at 12:56:44AM +0800, Andi Kleen wrote:
 Yu Zhao yu.z...@intel.com writes:
  +
  +
  +static int sriov_init(struct pci_dev *dev, int pos)
  +{
  +   int i;
  +   int rc;
  +   int nres;
  +   u32 pgsz;
  +   u16 ctrl, total, offset, stride;
  +   struct pci_sriov *iov;
  +   struct resource *res;
  +   struct pci_dev *pdev;
  +
  +   if (dev-pcie_type != PCI_EXP_TYPE_RC_END 
  +   dev-pcie_type != PCI_EXP_TYPE_ENDPOINT)
  +   return -ENODEV;
  +
 
 It would be a good idea to put a might_sleep() here just in 
 case the msleep happens below and drivers call it incorrectly.

Yes, will do.

  +   pci_read_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl);
  +   if (ctrl  PCI_SRIOV_CTRL_VFE) {
  +   pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, 0);
  +   msleep(100);
 
 That's really long. Hopefully that's really needed.

It's needed according to SR-IOV spec, however, these lines clear
the VF Enable bit if the BIOS or something else has set it. So it
doesn't always run into this.

  +
  +   pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl);
  +   pci_write_config_word(dev, pos + PCI_SRIOV_NUM_VF, total);
  +   pci_read_config_word(dev, pos + PCI_SRIOV_VF_OFFSET, offset);
  +   pci_read_config_word(dev, pos + PCI_SRIOV_VF_STRIDE, stride);
  +   if (!offset || (total  1  !stride))
  +   return -EIO;
  +
  +   pci_read_config_dword(dev, pos + PCI_SRIOV_SUP_PGSIZE, pgsz);
  +   i = PAGE_SHIFT  12 ? PAGE_SHIFT - 12 : 0;
  +   pgsz = ~((1  i) - 1);
  +   if (!pgsz)
  +   return -EIO;
 
 All the error paths don't seem to undo the config space writes.
 How will the devices behave with half initialized context?

Since the VF Enable bit is cleared before the initialization, setting
others SR-IOV registers won't change state of the device. So it should
be OK even without undo these writes as long as the VF Enable bit is
not set.

Thanks,
Yu
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/7] PCI: initialize and release SR-IOV capability

2009-02-13 Thread Yu Zhao
On Sat, Feb 14, 2009 at 01:49:59AM +0800, Matthew Wilcox wrote:
 On Fri, Feb 13, 2009 at 05:56:44PM +0100, Andi Kleen wrote:
   + pci_read_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl);
   + if (ctrl  PCI_SRIOV_CTRL_VFE) {
   + pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, 0);
   + msleep(100);
  
  That's really long. Hopefully that's really needed.
 
 Yes and no.  The spec says:
 
   To allow components to perform internal initialization, system software
   must wait for at least 100 ms after changing the VF Enable bit from
   a 0 to a 1, before it is permitted to issue Configuration Requests to
   the VFs which are enabled by that VF Enable bit.
 
 So we don't have to wait here, but we do have to wait before exposing
 all these virtual functions to the rest of the system.  Should we add
 more complexity, perhaps spawn a thread to do it asynchronously, or add
 0.1 seconds to device initialisation?  A question without an easy
 answer, iMO.

This clears the VF Enable bit only if the BIOS has set it, so it doesn't
always happen. Actually the `msleep(100)' should be `ssleep(1)' here,
according to the spec you showed us below. I remembered the waiting time
incorrectly as 100ms which is the requirment for setting the VF Enable
bit rather than clearing it.

   +
   + pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl);
   + pci_write_config_word(dev, pos + PCI_SRIOV_NUM_VF, total);
   + pci_read_config_word(dev, pos + PCI_SRIOV_VF_OFFSET, offset);
   + pci_read_config_word(dev, pos + PCI_SRIOV_VF_STRIDE, stride);
   + if (!offset || (total  1  !stride))
   + return -EIO;
   +
   + pci_read_config_dword(dev, pos + PCI_SRIOV_SUP_PGSIZE, pgsz);
   + i = PAGE_SHIFT  12 ? PAGE_SHIFT - 12 : 0;
   + pgsz = ~((1  i) - 1);
   + if (!pgsz)
   + return -EIO;
  
  All the error paths don't seem to undo the config space writes.
  How will the devices behave with half initialized context?
 
 I think we should clear the VF_ENABLE bit.  That action is also fraught
 with danger:

The VF Eanble bit hasn't been set yet :-) Actually the spec forbids the
s/w to write those registers (NumVFs, Supported Page Size, etc.) when the
enabling bit is set.

 
   If software Clears VF Enable, software must allow 1 second after VF
   Enable is Cleared before reading any field in the SR-IOV Extended
   Capability or the VF Migration State Array (see Section 3.3.15.1).
 
 Another msleep(1000) here?  Not pretty, but what else can we do?
 
 Not to mention the danger of something else innocently using lspci -
 to read a field in the extended capability -- I suspect we also need to
 block user config accesses before clearing this bit.

Yes, we should block user config access.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html