On 04/14/2016 11:40 AM, David Gibson wrote:
On Fri, Apr 08, 2016 at 04:36:44PM +1000, Alexey Kardashevskiy wrote:
When SRIOV is disabled, the existing code presumes there is no
virtual function (VF) in use and destroys all associated PEs.
However it is possible to get into the situation when the user
activated SRIOV disabling while a VF is still in use via VFIO.
For example, unbinding a physical function (PF) while there is a guest
running with a VF passed throuhgh via VFIO will trigger the bug.

This defines an IODA2-specific IOMMU group release() callback.
This moves all the disposal code from pnv_ioda_release_vf_PE() to this
new callback so the cleanup happens when the last user of an IOMMU
group released the reference.

As pnv_pci_ioda2_release_dma_pe() was reduced to just calling
iommu_group_put(), this merges pnv_pci_ioda2_release_dma_pe()
into pnv_ioda_release_vf_PE().

Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
---
  arch/powerpc/platforms/powernv/pci-ioda.c | 33 +++++++++++++------------------
  1 file changed, 14 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index ce9f2bf..8108c54 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1333,27 +1333,25 @@ static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe 
*pe, bool enable);
  static void pnv_pci_ioda2_group_release(void *iommu_data)
  {
        struct iommu_table_group *table_group = iommu_data;
+       struct pnv_ioda_pe *pe = container_of(table_group,
+                       struct pnv_ioda_pe, table_group);
+       struct pci_controller *hose = pci_bus_to_host(pe->parent_dev->bus);
+       struct pnv_phb *phb = hose->private_data;
+       struct iommu_table *tbl = pe->table_group.tables[0];
+       int64_t rc;

-       table_group->group = NULL;
-}
-
-static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct 
pnv_ioda_pe *pe)
-{
-       struct iommu_table    *tbl;
-       int64_t               rc;
-
-       tbl = pe->table_group.tables[0];
        rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);

Is it safe to go manipulating the PE windows, etc. after SR-IOV is
disabled?

Manipulating windows in this case is just updating 8 bytes in the TVT. At this point a VF is expected to be destroyed but PE is expected to remain not free so pnv_ioda2_pick_m64_pe() (or pnv_ioda2_reserve_m64_pe()?) won't use it.


When SR-IOV is disabled, you need to immediately disable the VF (I'm
guessing that happens somewhere) and stop all access to the VF
"hardware".

drivers/pci/iov.c
===
static void sriov_disable(struct pci_dev *dev)
{
...
for (i = 0; i < iov->num_VFs; i++)
        pci_iov_remove_virtfn(dev, i, 0);
...
pcibios_sriov_disable(dev);
===

pcibios_sriov_disable() is where pnv_pci_ioda2_release_dma_pe() is called from.

Only the iommu group structure *has* to stick around
until the reference count drops to zero.  I think other structures and
hardware reconfiguration can be deferred or done immediately,
whichever is more convenient.

I deferred everything because of convenience as iommu_table_group is embedded into pnv_ioda struct, not a pointer.



        if (rc)
                pe_warn(pe, "OPAL error %ld release DMA window\n", rc);

        pnv_pci_ioda2_set_bypass(pe, false);
-       if (pe->table_group.group) {
-               iommu_group_put(pe->table_group.group);
-               BUG_ON(pe->table_group.group);
-       }
+
+       BUG_ON(!tbl);
        pnv_pci_ioda2_table_free_pages(tbl);
-       iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
+       iommu_free_table(tbl, of_node_full_name(pe->parent_dev->dev.of_node));
+
+       pnv_ioda_deconfigure_pe(phb, pe);
+       pnv_ioda_free_pe(phb, pe->pe_number);
  }

  static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
@@ -1376,16 +1374,13 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
                if (pe->parent_dev != pdev)
                        continue;

-               pnv_pci_ioda2_release_dma_pe(pdev, pe);
-
                /* Remove from list */
                mutex_lock(&phb->ioda.pe_list_mutex);
                list_del(&pe->list);
                mutex_unlock(&phb->ioda.pe_list_mutex);

-               pnv_ioda_deconfigure_pe(phb, pe);
-
-               pnv_ioda_free_pe(phb, pe->pe_number);
+               if (pe->table_group.group)
+                       iommu_group_put(pe->table_group.group);
        }
  }




--
Alexey
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Reply via email to