Re: [PATCH 15/15] powerpc/powernv/sriov: Make single PE mode a per-BAR setting

2020-07-23 Thread Oliver O'Halloran
On Wed, Jul 22, 2020 at 8:06 PM Alexey Kardashevskiy  wrote:
>
> >> Well, realistically the segment size should be 8MB to make this matter
> >> (or the whole window 2GB) which does not seem to happen so it does not
> >> matter.
> >
> > I'm not sure what you mean.
>
> I mean how can we possibly hit this case, what m64_segsize would the
> platform have to trigger this. The whole check seems useless but whatever.

Yeah maybe.

IIRC some old P8 FSP systems had tiny M64 windows so it might have
been an issue there. Maybe we can get rid of it., but I'd rather just
leave the behaviour as-is for now.


Re: [PATCH 15/15] powerpc/powernv/sriov: Make single PE mode a per-BAR setting

2020-07-22 Thread Alexey Kardashevskiy



On 22/07/2020 15:39, Oliver O'Halloran wrote:
> On Wed, Jul 15, 2020 at 6:00 PM Alexey Kardashevskiy  wrote:
>>
>*
> -  * Generally, one M64 BAR maps one IOV BAR. To avoid 
> conflict
> -  * with other devices, IOV BAR size is expanded to be
> -  * (total_pe * VF_BAR_size).  When VF_BAR_size is half of 
> M64
> -  * segment size , the expanded size would equal to half of 
> the
> -  * whole M64 space size, which will exhaust the M64 Space 
> and
> -  * limit the system flexibility.  This is a design decision 
> to
> -  * set the boundary to quarter of the M64 segment size.
> +  * The 1/4 limit is arbitrary and can be tweaked.
>*/
> - if (total_vf_bar_sz > gate) {
> - mul = roundup_pow_of_two(total_vfs);
> - dev_info(>dev,
> - "VF BAR Total IOV size %llx > %llx, roundup 
> to %d VFs\n",
> - total_vf_bar_sz, gate, mul);
> - iov->m64_single_mode = true;
> - break;
> - }
> - }
> + if (vf_bar_sz > (phb->ioda.m64_segsize >> 2)) {
> + /*
> +  * On PHB3, the minimum size alignment of M64 BAR in
> +  * single mode is 32MB. If this VF BAR is smaller 
> than
> +  * 32MB, but still too large for a segmented window
> +  * then we can't map it and need to disable SR-IOV 
> for
> +  * this device.


 Why not use single PE mode for such BAR? Better than nothing.
>>>
>>> Suppose you could, but I figured VFs were mainly interesting since you
>>> could give each VF to a separate guest. If there's multiple VFs under
>>> the same single PE BAR then they'd have to be assigned to the same
>>
>> True. But with one PE per VF we can still have 15 (or 14?) isolated VFs
>> which is not hundreds but better than 0.
> 
> We can only use single PE BARs if the per-VF size is >= 32MB due to
> the alignment requirements on P8. If the per-VF size is smaller then
> we're stuck with multiple VFs inside the same BAR which is bad due to
> the PAPR requirements mentioned below. Sure we could look at doing
> something else, but considering this matches the current behaviour
> it's a bit hard to care...
>
>>> guest in order to retain the freeze/unfreeze behaviour that PAPR
>>> requires. I guess that's how it used to work, but it seems better just
>>> to disable them rather than having VFs which sort of work.
>>
>> Well, realistically the segment size should be 8MB to make this matter
>> (or the whole window 2GB) which does not seem to happen so it does not
>> matter.
> 
> I'm not sure what you mean.

I mean how can we possibly hit this case, what m64_segsize would the
platform have to trigger this. The whole check seems useless but whatever.



-- 
Alexey


Re: [PATCH 15/15] powerpc/powernv/sriov: Make single PE mode a per-BAR setting

2020-07-21 Thread Oliver O'Halloran
On Wed, Jul 15, 2020 at 6:00 PM Alexey Kardashevskiy  wrote:
>
> >>>*
> >>> -  * Generally, one M64 BAR maps one IOV BAR. To avoid 
> >>> conflict
> >>> -  * with other devices, IOV BAR size is expanded to be
> >>> -  * (total_pe * VF_BAR_size).  When VF_BAR_size is half of 
> >>> M64
> >>> -  * segment size , the expanded size would equal to half of 
> >>> the
> >>> -  * whole M64 space size, which will exhaust the M64 Space 
> >>> and
> >>> -  * limit the system flexibility.  This is a design decision 
> >>> to
> >>> -  * set the boundary to quarter of the M64 segment size.
> >>> +  * The 1/4 limit is arbitrary and can be tweaked.
> >>>*/
> >>> - if (total_vf_bar_sz > gate) {
> >>> - mul = roundup_pow_of_two(total_vfs);
> >>> - dev_info(>dev,
> >>> - "VF BAR Total IOV size %llx > %llx, roundup 
> >>> to %d VFs\n",
> >>> - total_vf_bar_sz, gate, mul);
> >>> - iov->m64_single_mode = true;
> >>> - break;
> >>> - }
> >>> - }
> >>> + if (vf_bar_sz > (phb->ioda.m64_segsize >> 2)) {
> >>> + /*
> >>> +  * On PHB3, the minimum size alignment of M64 BAR in
> >>> +  * single mode is 32MB. If this VF BAR is smaller 
> >>> than
> >>> +  * 32MB, but still too large for a segmented window
> >>> +  * then we can't map it and need to disable SR-IOV 
> >>> for
> >>> +  * this device.
> >>
> >>
> >> Why not use single PE mode for such BAR? Better than nothing.
> >
> > Suppose you could, but I figured VFs were mainly interesting since you
> > could give each VF to a separate guest. If there's multiple VFs under
> > the same single PE BAR then they'd have to be assigned to the same
>
> True. But with one PE per VF we can still have 15 (or 14?) isolated VFs
> which is not hundreds but better than 0.

We can only use single PE BARs if the per-VF size is >= 32MB due to
the alignment requirements on P8. If the per-VF size is smaller then
we're stuck with multiple VFs inside the same BAR which is bad due to
the PAPR requirements mentioned below. Sure we could look at doing
something else, but considering this matches the current behaviour
it's a bit hard to care...

> > guest in order to retain the freeze/unfreeze behaviour that PAPR
> > requires. I guess that's how it used to work, but it seems better just
> > to disable them rather than having VFs which sort of work.
>
> Well, realistically the segment size should be 8MB to make this matter
> (or the whole window 2GB) which does not seem to happen so it does not
> matter.

I'm not sure what you mean.


Re: [PATCH 15/15] powerpc/powernv/sriov: Make single PE mode a per-BAR setting

2020-07-15 Thread Alexey Kardashevskiy



On 15/07/2020 16:16, Oliver O'Halloran wrote:
> On Wed, Jul 15, 2020 at 3:24 PM Alexey Kardashevskiy  wrote:
>>
>>
>>> @@ -158,9 +157,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
>>> pci_dev *pdev)
>>>   goto disable_iov;
>>>   pdev->dev.archdata.iov_data = iov;
>>>
>>> + /* FIXME: totalvfs > phb->ioda.total_pe_num is going to be a problem 
>>> */
>>
>>
>> WARN_ON_ONCE() then?
> 
> can't hurt
> 
>>> @@ -173,50 +172,51 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
>>> pci_dev *pdev)
>>>   goto disable_iov;
>>>   }
>>>
>>> - total_vf_bar_sz += pci_iov_resource_size(pdev,
>>> - i + PCI_IOV_RESOURCES);
>>> + vf_bar_sz = pci_iov_resource_size(pdev, i + 
>>> PCI_IOV_RESOURCES);
>>>
>>>   /*
>>> -  * If bigger than quarter of M64 segment size, just round up
>>> -  * power of two.
>>> +  * Generally, one segmented M64 BAR maps one IOV BAR. However,
>>> +  * if a VF BAR is too large we end up wasting a lot of space.
>>> +  * If we've got a BAR that's bigger than greater than 1/4 of 
>>> the
>>
>>
>> bigger, greater, huger? :)
>>
>> Also, a nit: s/got a BAR/got a VF BAR/
> 
> whatever, it's just words

You are talking about these BARs and those BARs and since we want "to
help out
the next sucker^Wperson who needs to tinker with it", using precise term
is kinda essential here.


> 
>>> +  * default window's segment size then switch to using single 
>>> PE
>>> +  * windows. This limits the total number of VFs we can 
>>> support.
>>
>> Just to get idea about absolute numbers here.
>>
>> On my P9:
>>
>> ./pciex@600c3c030/ibm,opal-m64-window
>>  00060200  00060200  0040 
>>
>> so that default window's segment size is 0x40../512 = 512MB?
> 
> Yeah. It'll vary a bit since PHB3 and some PHB4s have 256.
> 
>>>*
>>> -  * Generally, one M64 BAR maps one IOV BAR. To avoid conflict
>>> -  * with other devices, IOV BAR size is expanded to be
>>> -  * (total_pe * VF_BAR_size).  When VF_BAR_size is half of M64
>>> -  * segment size , the expanded size would equal to half of the
>>> -  * whole M64 space size, which will exhaust the M64 Space and
>>> -  * limit the system flexibility.  This is a design decision to
>>> -  * set the boundary to quarter of the M64 segment size.
>>> +  * The 1/4 limit is arbitrary and can be tweaked.
>>>*/
>>> - if (total_vf_bar_sz > gate) {
>>> - mul = roundup_pow_of_two(total_vfs);
>>> - dev_info(>dev,
>>> - "VF BAR Total IOV size %llx > %llx, roundup 
>>> to %d VFs\n",
>>> - total_vf_bar_sz, gate, mul);
>>> - iov->m64_single_mode = true;
>>> - break;
>>> - }
>>> - }
>>> + if (vf_bar_sz > (phb->ioda.m64_segsize >> 2)) {
>>> + /*
>>> +  * On PHB3, the minimum size alignment of M64 BAR in
>>> +  * single mode is 32MB. If this VF BAR is smaller than
>>> +  * 32MB, but still too large for a segmented window
>>> +  * then we can't map it and need to disable SR-IOV for
>>> +  * this device.
>>
>>
>> Why not use single PE mode for such BAR? Better than nothing.
> 
> Suppose you could, but I figured VFs were mainly interesting since you
> could give each VF to a separate guest. If there's multiple VFs under
> the same single PE BAR then they'd have to be assigned to the same

True. But with one PE per VF we can still have 15 (or 14?) isolated VFs
which is not hundreds but better than 0.


> guest in order to retain the freeze/unfreeze behaviour that PAPR
> requires. I guess that's how it used to work, but it seems better just
> to disable them rather than having VFs which sort of work.

Well, realistically the segment size should be 8MB to make this matter
(or the whole window 2GB) which does not seem to happen so it does not
matter.


-- 
Alexey


Re: [PATCH 15/15] powerpc/powernv/sriov: Make single PE mode a per-BAR setting

2020-07-15 Thread Oliver O'Halloran
On Wed, Jul 15, 2020 at 3:24 PM Alexey Kardashevskiy  wrote:
>
>
> > @@ -158,9 +157,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
> > pci_dev *pdev)
> >   goto disable_iov;
> >   pdev->dev.archdata.iov_data = iov;
> >
> > + /* FIXME: totalvfs > phb->ioda.total_pe_num is going to be a problem 
> > */
>
>
> WARN_ON_ONCE() then?

can't hurt

> > @@ -173,50 +172,51 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
> > pci_dev *pdev)
> >   goto disable_iov;
> >   }
> >
> > - total_vf_bar_sz += pci_iov_resource_size(pdev,
> > - i + PCI_IOV_RESOURCES);
> > + vf_bar_sz = pci_iov_resource_size(pdev, i + 
> > PCI_IOV_RESOURCES);
> >
> >   /*
> > -  * If bigger than quarter of M64 segment size, just round up
> > -  * power of two.
> > +  * Generally, one segmented M64 BAR maps one IOV BAR. However,
> > +  * if a VF BAR is too large we end up wasting a lot of space.
> > +  * If we've got a BAR that's bigger than greater than 1/4 of 
> > the
>
>
> bigger, greater, huger? :)
>
> Also, a nit: s/got a BAR/got a VF BAR/

whatever, it's just words

> > +  * default window's segment size then switch to using single 
> > PE
> > +  * windows. This limits the total number of VFs we can 
> > support.
>
> Just to get idea about absolute numbers here.
>
> On my P9:
>
> ./pciex@600c3c030/ibm,opal-m64-window
>  00060200  00060200  0040 
>
> so that default window's segment size is 0x40../512 = 512MB?

Yeah. It'll vary a bit since PHB3 and some PHB4s have 256.

> >*
> > -  * Generally, one M64 BAR maps one IOV BAR. To avoid conflict
> > -  * with other devices, IOV BAR size is expanded to be
> > -  * (total_pe * VF_BAR_size).  When VF_BAR_size is half of M64
> > -  * segment size , the expanded size would equal to half of the
> > -  * whole M64 space size, which will exhaust the M64 Space and
> > -  * limit the system flexibility.  This is a design decision to
> > -  * set the boundary to quarter of the M64 segment size.
> > +  * The 1/4 limit is arbitrary and can be tweaked.
> >*/
> > - if (total_vf_bar_sz > gate) {
> > - mul = roundup_pow_of_two(total_vfs);
> > - dev_info(>dev,
> > - "VF BAR Total IOV size %llx > %llx, roundup 
> > to %d VFs\n",
> > - total_vf_bar_sz, gate, mul);
> > - iov->m64_single_mode = true;
> > - break;
> > - }
> > - }
> > + if (vf_bar_sz > (phb->ioda.m64_segsize >> 2)) {
> > + /*
> > +  * On PHB3, the minimum size alignment of M64 BAR in
> > +  * single mode is 32MB. If this VF BAR is smaller than
> > +  * 32MB, but still too large for a segmented window
> > +  * then we can't map it and need to disable SR-IOV for
> > +  * this device.
>
>
> Why not use single PE mode for such BAR? Better than nothing.

Suppose you could, but I figured VFs were mainly interesting since you
could give each VF to a separate guest. If there's multiple VFs under
the same single PE BAR then they'd have to be assigned to the same
guest in order to retain the freeze/unfreeze behaviour that PAPR
requires. I guess that's how it used to work, but it seems better just
to disable them rather than having VFs which sort of work.


Re: [PATCH 15/15] powerpc/powernv/sriov: Make single PE mode a per-BAR setting

2020-07-14 Thread Alexey Kardashevskiy



On 10/07/2020 15:23, Oliver O'Halloran wrote:
> Using single PE BARs to map an SR-IOV BAR is really a choice about what
> strategy to use when mapping a BAR. It doesn't make much sense for this to
> be a global setting since a device might have one large BAR which needs to
> be mapped with single PE windows and another smaller BAR that can be mapped
> with a regular segmented window. Make the segmented vs single decision a
> per-BAR setting and clean up the logic that decides which mode to use.
> 
> Signed-off-by: Oliver O'Halloran 
> ---
>  arch/powerpc/platforms/powernv/pci-sriov.c | 131 +++--
>  arch/powerpc/platforms/powernv/pci.h   |  10 +-
>  2 files changed, 75 insertions(+), 66 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c 
> b/arch/powerpc/platforms/powernv/pci-sriov.c
> index 8de03636888a..87377d95d648 100644
> --- a/arch/powerpc/platforms/powernv/pci-sriov.c
> +++ b/arch/powerpc/platforms/powernv/pci-sriov.c
> @@ -146,10 +146,9 @@
>  static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>  {
>   struct pnv_phb *phb = pci_bus_to_pnvhb(pdev->bus);
> - const resource_size_t gate = phb->ioda.m64_segsize >> 2;
>   struct resource *res;
>   int i;
> - resource_size_t size, total_vf_bar_sz;
> + resource_size_t vf_bar_sz;
>   struct pnv_iov_data *iov;
>   int mul, total_vfs;
>  
> @@ -158,9 +157,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
> pci_dev *pdev)
>   goto disable_iov;
>   pdev->dev.archdata.iov_data = iov;
>  
> + /* FIXME: totalvfs > phb->ioda.total_pe_num is going to be a problem */


WARN_ON_ONCE() then?


>   total_vfs = pci_sriov_get_totalvfs(pdev);
>   mul = phb->ioda.total_pe_num;
> - total_vf_bar_sz = 0;
>  
>   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>   res = >resource[i + PCI_IOV_RESOURCES];
> @@ -173,50 +172,51 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
> pci_dev *pdev)
>   goto disable_iov;
>   }
>  
> - total_vf_bar_sz += pci_iov_resource_size(pdev,
> - i + PCI_IOV_RESOURCES);
> + vf_bar_sz = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>  
>   /*
> -  * If bigger than quarter of M64 segment size, just round up
> -  * power of two.
> +  * Generally, one segmented M64 BAR maps one IOV BAR. However,
> +  * if a VF BAR is too large we end up wasting a lot of space.
> +  * If we've got a BAR that's bigger than greater than 1/4 of the


bigger, greater, huger? :)

Also, a nit: s/got a BAR/got a VF BAR/


> +  * default window's segment size then switch to using single PE
> +  * windows. This limits the total number of VFs we can support.

Just to get idea about absolute numbers here.

On my P9:

./pciex@600c3c030/ibm,opal-m64-window
 00060200  00060200  0040 

so that default window's segment size is 0x40../512 = 512MB?


>*
> -  * Generally, one M64 BAR maps one IOV BAR. To avoid conflict
> -  * with other devices, IOV BAR size is expanded to be
> -  * (total_pe * VF_BAR_size).  When VF_BAR_size is half of M64
> -  * segment size , the expanded size would equal to half of the
> -  * whole M64 space size, which will exhaust the M64 Space and
> -  * limit the system flexibility.  This is a design decision to
> -  * set the boundary to quarter of the M64 segment size.
> +  * The 1/4 limit is arbitrary and can be tweaked.
>*/
> - if (total_vf_bar_sz > gate) {
> - mul = roundup_pow_of_two(total_vfs);
> - dev_info(>dev,
> - "VF BAR Total IOV size %llx > %llx, roundup to 
> %d VFs\n",
> - total_vf_bar_sz, gate, mul);
> - iov->m64_single_mode = true;
> - break;
> - }
> - }
> + if (vf_bar_sz > (phb->ioda.m64_segsize >> 2)) {
> + /*
> +  * On PHB3, the minimum size alignment of M64 BAR in
> +  * single mode is 32MB. If this VF BAR is smaller than
> +  * 32MB, but still too large for a segmented window
> +  * then we can't map it and need to disable SR-IOV for
> +  * this device.


Why not use single PE mode for such BAR? Better than nothing.


> +  */
> + if (vf_bar_sz < SZ_32M) {
> + pci_err(pdev, "VF BAR%d: %pR can't be mapped in 
> single PE mode\n",
> + i, res);
> + goto disable_iov;
> + }
>  
> - for (i = 0; i 

[PATCH 15/15] powerpc/powernv/sriov: Make single PE mode a per-BAR setting

2020-07-09 Thread Oliver O'Halloran
Using single PE BARs to map an SR-IOV BAR is really a choice about what
strategy to use when mapping a BAR. It doesn't make much sense for this to
be a global setting since a device might have one large BAR which needs to
be mapped with single PE windows and another smaller BAR that can be mapped
with a regular segmented window. Make the segmented vs single decision a
per-BAR setting and clean up the logic that decides which mode to use.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/powernv/pci-sriov.c | 131 +++--
 arch/powerpc/platforms/powernv/pci.h   |  10 +-
 2 files changed, 75 insertions(+), 66 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c 
b/arch/powerpc/platforms/powernv/pci-sriov.c
index 8de03636888a..87377d95d648 100644
--- a/arch/powerpc/platforms/powernv/pci-sriov.c
+++ b/arch/powerpc/platforms/powernv/pci-sriov.c
@@ -146,10 +146,9 @@
 static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 {
struct pnv_phb *phb = pci_bus_to_pnvhb(pdev->bus);
-   const resource_size_t gate = phb->ioda.m64_segsize >> 2;
struct resource *res;
int i;
-   resource_size_t size, total_vf_bar_sz;
+   resource_size_t vf_bar_sz;
struct pnv_iov_data *iov;
int mul, total_vfs;
 
@@ -158,9 +157,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev 
*pdev)
goto disable_iov;
pdev->dev.archdata.iov_data = iov;
 
+   /* FIXME: totalvfs > phb->ioda.total_pe_num is going to be a problem */
total_vfs = pci_sriov_get_totalvfs(pdev);
mul = phb->ioda.total_pe_num;
-   total_vf_bar_sz = 0;
 
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
res = >resource[i + PCI_IOV_RESOURCES];
@@ -173,50 +172,51 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
goto disable_iov;
}
 
-   total_vf_bar_sz += pci_iov_resource_size(pdev,
-   i + PCI_IOV_RESOURCES);
+   vf_bar_sz = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
 
/*
-* If bigger than quarter of M64 segment size, just round up
-* power of two.
+* Generally, one segmented M64 BAR maps one IOV BAR. However,
+* if a VF BAR is too large we end up wasting a lot of space.
+* If we've got a BAR that's bigger than greater than 1/4 of the
+* default window's segment size then switch to using single PE
+* windows. This limits the total number of VFs we can support.
 *
-* Generally, one M64 BAR maps one IOV BAR. To avoid conflict
-* with other devices, IOV BAR size is expanded to be
-* (total_pe * VF_BAR_size).  When VF_BAR_size is half of M64
-* segment size , the expanded size would equal to half of the
-* whole M64 space size, which will exhaust the M64 Space and
-* limit the system flexibility.  This is a design decision to
-* set the boundary to quarter of the M64 segment size.
+* The 1/4 limit is arbitrary and can be tweaked.
 */
-   if (total_vf_bar_sz > gate) {
-   mul = roundup_pow_of_two(total_vfs);
-   dev_info(>dev,
-   "VF BAR Total IOV size %llx > %llx, roundup to 
%d VFs\n",
-   total_vf_bar_sz, gate, mul);
-   iov->m64_single_mode = true;
-   break;
-   }
-   }
+   if (vf_bar_sz > (phb->ioda.m64_segsize >> 2)) {
+   /*
+* On PHB3, the minimum size alignment of M64 BAR in
+* single mode is 32MB. If this VF BAR is smaller than
+* 32MB, but still too large for a segmented window
+* then we can't map it and need to disable SR-IOV for
+* this device.
+*/
+   if (vf_bar_sz < SZ_32M) {
+   pci_err(pdev, "VF BAR%d: %pR can't be mapped in 
single PE mode\n",
+   i, res);
+   goto disable_iov;
+   }
 
-   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-   res = >resource[i + PCI_IOV_RESOURCES];
-   if (!res->flags || res->parent)
+   iov->m64_single_mode[i] = true;
continue;
+   }
+
 
-   size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
/*
-* On PHB3, the minimum size alignment of M64 BAR in single
-* mode is 32MB.
+* This BAR can be mapped with one segmented window, so