On Wed, Jul 15, 2020 at 3:24 PM Alexey Kardashevskiy <a...@ozlabs.ru> wrote: > > > > @@ -158,9 +157,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct > > pci_dev *pdev) > > goto disable_iov; > > pdev->dev.archdata.iov_data = iov; > > > > + /* FIXME: totalvfs > phb->ioda.total_pe_num is going to be a problem > > */ > > > WARN_ON_ONCE() then?
can't hurt > > @@ -173,50 +172,51 @@ static void pnv_pci_ioda_fixup_iov_resources(struct > > pci_dev *pdev) > > goto disable_iov; > > } > > > > - total_vf_bar_sz += pci_iov_resource_size(pdev, > > - i + PCI_IOV_RESOURCES); > > + vf_bar_sz = pci_iov_resource_size(pdev, i + > > PCI_IOV_RESOURCES); > > > > /* > > - * If bigger than quarter of M64 segment size, just round up > > - * power of two. > > + * Generally, one segmented M64 BAR maps one IOV BAR. However, > > + * if a VF BAR is too large we end up wasting a lot of space. > > + * If we've got a BAR that's bigger than greater than 1/4 of > > the > > > bigger, greater, huger? :) > > Also, a nit: s/got a BAR/got a VF BAR/ whatever, it's just words > > + * default window's segment size then switch to using single > > PE > > + * windows. This limits the total number of VFs we can > > support. > > Just to get idea about absolute numbers here. > > On my P9: > > ./pciex@600c3c0300000/ibm,opal-m64-window > 00060200 00000000 00060200 00000000 00000040 00000000 > > so that default window's segment size is 0x40.0000.0000/512 = 512MB? Yeah. It'll vary a bit since PHB3 and some PHB4s have 256. > > * > > - * Generally, one M64 BAR maps one IOV BAR. To avoid conflict > > - * with other devices, IOV BAR size is expanded to be > > - * (total_pe * VF_BAR_size). When VF_BAR_size is half of M64 > > - * segment size , the expanded size would equal to half of the > > - * whole M64 space size, which will exhaust the M64 Space and > > - * limit the system flexibility. This is a design decision to > > - * set the boundary to quarter of the M64 segment size. > > + * The 1/4 limit is arbitrary and can be tweaked. > > */ > > - if (total_vf_bar_sz > gate) { > > - mul = roundup_pow_of_two(total_vfs); > > - dev_info(&pdev->dev, > > - "VF BAR Total IOV size %llx > %llx, roundup > > to %d VFs\n", > > - total_vf_bar_sz, gate, mul); > > - iov->m64_single_mode = true; > > - break; > > - } > > - } > > + if (vf_bar_sz > (phb->ioda.m64_segsize >> 2)) { > > + /* > > + * On PHB3, the minimum size alignment of M64 BAR in > > + * single mode is 32MB. If this VF BAR is smaller than > > + * 32MB, but still too large for a segmented window > > + * then we can't map it and need to disable SR-IOV for > > + * this device. > > > Why not use single PE mode for such BAR? Better than nothing. Suppose you could, but I figured VFs were mainly interesting since you could give each VF to a separate guest. If there's multiple VFs under the same single PE BAR then they'd have to be assigned to the same guest in order to retain the freeze/unfreeze behaviour that PAPR requires. I guess that's how it used to work, but it seems better just to disable them rather than having VFs which sort of work.