Re: [PATCH v8 1/7] PCI: initialize and release SR-IOV capability
Yu Zhao yu.z...@intel.com writes: + + +static int sriov_init(struct pci_dev *dev, int pos) +{ + int i; + int rc; + int nres; + u32 pgsz; + u16 ctrl, total, offset, stride; + struct pci_sriov *iov; + struct resource *res; + struct pci_dev *pdev; + + if (dev-pcie_type != PCI_EXP_TYPE_RC_END + dev-pcie_type != PCI_EXP_TYPE_ENDPOINT) + return -ENODEV; + It would be a good idea to put a might_sleep() here just in case the msleep happens below and drivers call it incorrectly. + pci_read_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl); + if (ctrl PCI_SRIOV_CTRL_VFE) { + pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, 0); + msleep(100); That's really long. Hopefully that's really needed. + + pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl); + pci_write_config_word(dev, pos + PCI_SRIOV_NUM_VF, total); + pci_read_config_word(dev, pos + PCI_SRIOV_VF_OFFSET, offset); + pci_read_config_word(dev, pos + PCI_SRIOV_VF_STRIDE, stride); + if (!offset || (total 1 !stride)) + return -EIO; + + pci_read_config_dword(dev, pos + PCI_SRIOV_SUP_PGSIZE, pgsz); + i = PAGE_SHIFT 12 ? PAGE_SHIFT - 12 : 0; + pgsz = ~((1 i) - 1); + if (!pgsz) + return -EIO; All the error paths don't seem to undo the config space writes. How will the devices behave with half initialized context? -Andi -- a...@linux.intel.com -- Speaking for myself only. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/7] PCI: initialize and release SR-IOV capability
On Fri, Feb 13, 2009 at 05:56:44PM +0100, Andi Kleen wrote: + pci_read_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl); + if (ctrl PCI_SRIOV_CTRL_VFE) { + pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, 0); + msleep(100); That's really long. Hopefully that's really needed. Yes and no. The spec says: To allow components to perform internal initialization, system software must wait for at least 100 ms after changing the VF Enable bit from a 0 to a 1, before it is permitted to issue Configuration Requests to the VFs which are enabled by that VF Enable bit. So we don't have to wait here, but we do have to wait before exposing all these virtual functions to the rest of the system. Should we add more complexity, perhaps spawn a thread to do it asynchronously, or add 0.1 seconds to device initialisation? A question without an easy answer, iMO. + + pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl); + pci_write_config_word(dev, pos + PCI_SRIOV_NUM_VF, total); + pci_read_config_word(dev, pos + PCI_SRIOV_VF_OFFSET, offset); + pci_read_config_word(dev, pos + PCI_SRIOV_VF_STRIDE, stride); + if (!offset || (total 1 !stride)) + return -EIO; + + pci_read_config_dword(dev, pos + PCI_SRIOV_SUP_PGSIZE, pgsz); + i = PAGE_SHIFT 12 ? PAGE_SHIFT - 12 : 0; + pgsz = ~((1 i) - 1); + if (!pgsz) + return -EIO; All the error paths don't seem to undo the config space writes. How will the devices behave with half initialized context? I think we should clear the VF_ENABLE bit. That action is also fraught with danger: If software Clears VF Enable, software must allow 1 second after VF Enable is Cleared before reading any field in the SR-IOV Extended Capability or the VF Migration State Array (see Section 3.3.15.1). Another msleep(1000) here? Not pretty, but what else can we do? Not to mention the danger of something else innocently using lspci - to read a field in the extended capability -- I suspect we also need to block user config accesses before clearing this bit. -- Matthew Wilcox Intel Open Source Technology Centre Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/7] PCI: initialize and release SR-IOV capability
On Sat, Feb 14, 2009 at 12:56:44AM +0800, Andi Kleen wrote: Yu Zhao yu.z...@intel.com writes: + + +static int sriov_init(struct pci_dev *dev, int pos) +{ + int i; + int rc; + int nres; + u32 pgsz; + u16 ctrl, total, offset, stride; + struct pci_sriov *iov; + struct resource *res; + struct pci_dev *pdev; + + if (dev-pcie_type != PCI_EXP_TYPE_RC_END + dev-pcie_type != PCI_EXP_TYPE_ENDPOINT) + return -ENODEV; + It would be a good idea to put a might_sleep() here just in case the msleep happens below and drivers call it incorrectly. Yes, will do. + pci_read_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl); + if (ctrl PCI_SRIOV_CTRL_VFE) { + pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, 0); + msleep(100); That's really long. Hopefully that's really needed. It's needed according to SR-IOV spec, however, these lines clear the VF Enable bit if the BIOS or something else has set it. So it doesn't always run into this. + + pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl); + pci_write_config_word(dev, pos + PCI_SRIOV_NUM_VF, total); + pci_read_config_word(dev, pos + PCI_SRIOV_VF_OFFSET, offset); + pci_read_config_word(dev, pos + PCI_SRIOV_VF_STRIDE, stride); + if (!offset || (total 1 !stride)) + return -EIO; + + pci_read_config_dword(dev, pos + PCI_SRIOV_SUP_PGSIZE, pgsz); + i = PAGE_SHIFT 12 ? PAGE_SHIFT - 12 : 0; + pgsz = ~((1 i) - 1); + if (!pgsz) + return -EIO; All the error paths don't seem to undo the config space writes. How will the devices behave with half initialized context? Since the VF Enable bit is cleared before the initialization, setting others SR-IOV registers won't change state of the device. So it should be OK even without undo these writes as long as the VF Enable bit is not set. Thanks, Yu -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/7] PCI: initialize and release SR-IOV capability
On Sat, Feb 14, 2009 at 01:49:59AM +0800, Matthew Wilcox wrote: On Fri, Feb 13, 2009 at 05:56:44PM +0100, Andi Kleen wrote: + pci_read_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl); + if (ctrl PCI_SRIOV_CTRL_VFE) { + pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, 0); + msleep(100); That's really long. Hopefully that's really needed. Yes and no. The spec says: To allow components to perform internal initialization, system software must wait for at least 100 ms after changing the VF Enable bit from a 0 to a 1, before it is permitted to issue Configuration Requests to the VFs which are enabled by that VF Enable bit. So we don't have to wait here, but we do have to wait before exposing all these virtual functions to the rest of the system. Should we add more complexity, perhaps spawn a thread to do it asynchronously, or add 0.1 seconds to device initialisation? A question without an easy answer, iMO. This clears the VF Enable bit only if the BIOS has set it, so it doesn't always happen. Actually the `msleep(100)' should be `ssleep(1)' here, according to the spec you showed us below. I remembered the waiting time incorrectly as 100ms which is the requirment for setting the VF Enable bit rather than clearing it. + + pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl); + pci_write_config_word(dev, pos + PCI_SRIOV_NUM_VF, total); + pci_read_config_word(dev, pos + PCI_SRIOV_VF_OFFSET, offset); + pci_read_config_word(dev, pos + PCI_SRIOV_VF_STRIDE, stride); + if (!offset || (total 1 !stride)) + return -EIO; + + pci_read_config_dword(dev, pos + PCI_SRIOV_SUP_PGSIZE, pgsz); + i = PAGE_SHIFT 12 ? PAGE_SHIFT - 12 : 0; + pgsz = ~((1 i) - 1); + if (!pgsz) + return -EIO; All the error paths don't seem to undo the config space writes. How will the devices behave with half initialized context? I think we should clear the VF_ENABLE bit. That action is also fraught with danger: The VF Eanble bit hasn't been set yet :-) Actually the spec forbids the s/w to write those registers (NumVFs, Supported Page Size, etc.) when the enabling bit is set. If software Clears VF Enable, software must allow 1 second after VF Enable is Cleared before reading any field in the SR-IOV Extended Capability or the VF Migration State Array (see Section 3.3.15.1). Another msleep(1000) here? Not pretty, but what else can we do? Not to mention the danger of something else innocently using lspci - to read a field in the extended capability -- I suspect we also need to block user config accesses before clearing this bit. Yes, we should block user config access. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html