Re: [ovs-dev] [PATCH] docs: Add HyperThreading notes for auto-lb usage.

2023-01-31 Thread Ilya Maximets
On 1/31/23 02:07, Cheng Li wrote:
> On Mon, Jan 30, 2023 at 05:51:57PM +0100, Ilya Maximets wrote:
>> On 1/30/23 16:13, Cheng Li wrote:
>>> On Mon, Jan 30, 2023 at 01:38:45PM +0100, Ilya Maximets wrote:
 On 1/29/23 02:33, Cheng Li wrote:
> On Fri, Jan 27, 2023 at 04:04:55PM +0100, Ilya Maximets wrote:
>> On 1/24/23 16:52, Kevin Traynor wrote:
>>> On 08/01/2023 03:55, Cheng Li wrote:
 In my test, if one logical core is pinned to a PMD thread while the
 other logical core (of the same physical core) is not, the PMD
 performance is affected by the load on the not-pinned logical core.
 This makes it difficult to estimate the loads during a dry-run.

 Signed-off-by: Cheng Li 
 ---
   Documentation/topics/dpdk/pmd.rst | 4 
   1 file changed, 4 insertions(+)

 diff --git a/Documentation/topics/dpdk/pmd.rst 
 b/Documentation/topics/dpdk/pmd.rst
 index 9006fd4..b220199 100644
 --- a/Documentation/topics/dpdk/pmd.rst
 +++ b/Documentation/topics/dpdk/pmd.rst
 @@ -312,6 +312,10 @@ If not set, the default variance improvement 
 threshold is 25%.
   when all PMD threads are running on cores from a single NUMA 
 node. In this
   case cross-NUMA datapaths will not change after reassignment.
   +    For the same reason, please ensure that the pmd threads are 
 pinned to SMT
 +    siblings if HyperThreading is enabled. Otherwise, PMDs within a 
 NUMA may
 +    not have the same performance.
>>
>> Uhm... Am I reading this wrong or this note suggests to pin PMD threads
>> to SMT siblings?  It sounds like that's the opposite of what you were
>> trying to say.  Siblings are sharing the same physical core, so if some
>> PMDs are pinned to siblings, the load prediction can not work correctly.
>
> Thanks for the review, Ilya.
>
> The note indeed suggests pinning PMD threads to siblings. Siblings share
> the same physical core; if PMDs are pinned to one sibling while the other
> sibling of the same physical core is left unpinned, the load prediction may
> not work correctly, because the pinned sibling's performance may be affected
> by the not-pinned sibling's workload. So we suggest pinning both
> siblings of the same physical core.

 But this makes sense only if all the PMD threads are on siblings of the
 same physical core.  If more than one physical core is involved, the load
 calculations will be incorrect.  For example, let's say we have 4 threads
 A, B, C and D, where A and B are siblings and C and D are siblings.  And
 it happened that we have only 2 ports, both of which are assigned to A.
 It makes a huge difference whether we move one of the ports from A to B
 or if we move it from A to C.  It is an oversimplified example, but we
 can't rely on load calculations in the general case if PMD threads are running
 on SMT siblings.
>>>
>>> Thanks for the detailed explanation, now I get your point.
>>>
>>> In your example, PMDs B, C, and D have no rxq assigned and will be sleeping,
>>> which costs few CPU cycles. When a logical core (B) is sleeping, its
>>> sibling core (A) gets most of the physical core's resources, so it is
>>> more powerful. If we move one port from A to B, one physical core is
>>> running; if we move it from A to C, two physical cores are
>>> running. So the resulting performance will be hugely different. Hope I
>>> understand correctly.
>>
>> Yes.
>>
>> And even if they are not sleeping.  E.g. if each thread has 1 port
>> to poll, and thread A has 3 ports to poll.  The calculated variance
>> will not match the actual performance impact as the actual available
>> CPU power is different across cores.
> 
> "CPU power is different across cores". How does this happen? Because of
> cpufreq may be different across cores? Or because of different poll
> count?
> As I understand it, no matter how many rxq to poll, no matter the rxqs
> are busy or free(pps), PMD is always in poll loop which cost 100% CPU
> cycles. Maybe it costs different cache resource?

While it's true that PMD threads are just running at 100% CPU usage, when
we talk about SMT siblings, their, let's call it, "effective power" is not
the same as "effective power" of the sibling of another physical core.
Simply because their siblings are running different workloads and utilizing
different physical components of their cores at different times, including,
yes, differences in cache utilization.  So, 100% of one core is not equal to
100% of another core.  Unless they are fully independent physical cores with
no noisy siblings.

Hopefully, thresholds can amortize the difference.  But they will not help
if some of the threads are actually sleeping.

I didn't take into account dynamic frequency adjustments which, of course,
will make everything even more complicated.  Though, if t
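
As a rough illustration (a sketch, assuming a build that provides the usual
dpif-netdev appctl commands), the per-PMD and per-rxq load figures that the
dry run starts from can be sampled as below; note that they are reported per
logical core and say nothing about what the SMT sibling of that core is doing:

    # Reset the counters, let traffic run for a while, then sample.
    $ ovs-appctl dpif-netdev/pmd-stats-clear
    $ sleep 10
    # Busy/idle cycles per PMD thread (one logical core each).
    $ ovs-appctl dpif-netdev/pmd-stats-show
    # Per-rxq share of each PMD's cycles, roughly the input to the auto-lb estimate.
    $ ovs-appctl dpif-netdev/pmd-rxq-show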

Re: [ovs-dev] [PATCH] docs: Add HyperThreading notes for auto-lb usage.

2023-01-30 Thread Cheng Li
On Mon, Jan 30, 2023 at 05:51:57PM +0100, Ilya Maximets wrote:
> On 1/30/23 16:13, Cheng Li wrote:
> > On Mon, Jan 30, 2023 at 01:38:45PM +0100, Ilya Maximets wrote:
> >> On 1/29/23 02:33, Cheng Li wrote:
> >>> On Fri, Jan 27, 2023 at 04:04:55PM +0100, Ilya Maximets wrote:
>  On 1/24/23 16:52, Kevin Traynor wrote:
> > On 08/01/2023 03:55, Cheng Li wrote:
> >> In my test, if one logical core is pinned to a PMD thread while the
> >> other logical core (of the same physical core) is not, the PMD
> >> performance is affected by the load on the not-pinned logical core.
> >> This makes it difficult to estimate the loads during a dry-run.
> >>
> >> Signed-off-by: Cheng Li 
> >> ---
> >>   Documentation/topics/dpdk/pmd.rst | 4 
> >>   1 file changed, 4 insertions(+)
> >>
> >> diff --git a/Documentation/topics/dpdk/pmd.rst 
> >> b/Documentation/topics/dpdk/pmd.rst
> >> index 9006fd4..b220199 100644
> >> --- a/Documentation/topics/dpdk/pmd.rst
> >> +++ b/Documentation/topics/dpdk/pmd.rst
> >> @@ -312,6 +312,10 @@ If not set, the default variance improvement 
> >> threshold is 25%.
> >>   when all PMD threads are running on cores from a single NUMA 
> >> node. In this
> >>   case cross-NUMA datapaths will not change after reassignment.
> >>   +    For the same reason, please ensure that the pmd threads are 
> >> pinned to SMT
> >> +    siblings if HyperThreading is enabled. Otherwise, PMDs within a 
> >> NUMA may
> >> +    not have the same performance.
> 
>  Uhm... Am I reading this wrong or this note suggests to pin PMD threads
>  to SMT siblings?  It sounds like that's the opposite of what you were
>  trying to say.  Siblings are sharing the same physical core, so if some
>  PMDs are pinned to siblings, the load prediction can not work correctly.
> >>>
> >>> Thanks for the review, Ilya.
> >>>
> >>> The note indeed suggests pinning PMD threads to siblings. Siblings share
> >>> the same physical core; if PMDs are pinned to one sibling while the other
> >>> sibling of the same physical core is left unpinned, the load prediction may
> >>> not work correctly, because the pinned sibling's performance may be affected
> >>> by the not-pinned sibling's workload. So we suggest pinning both
> >>> siblings of the same physical core.
> >>
> >> But this makes sense only if all the PMD threads are on siblings of the
> >> same physical core.  If more than one physical core is involved, the load
> >> calculations will be incorrect.  For example, let's say we have 4 threads
> >> A, B, C and D, where A and B are siblings and C and D are siblings.  And
> >> it happened that we have only 2 ports, both of which are assigned to A.
> >> It makes a huge difference whether we move one of the ports from A to B
> >> or if we move it from A to C.  It is an oversimplified example, but we
> >> can't rely on load calculations in the general case if PMD threads are running
> >> on SMT siblings.
> > 
> > Thanks for the detailed explanation, now I get your point.
> >
> > In your example, PMDs B, C, and D have no rxq assigned and will be sleeping,
> > which costs few CPU cycles. When a logical core (B) is sleeping, its
> > sibling core (A) gets most of the physical core's resources, so it is
> > more powerful. If we move one port from A to B, one physical core is
> > running; if we move it from A to C, two physical cores are
> > running. So the resulting performance will be hugely different. Hope I
> > understand correctly.
> 
> Yes.
> 
> And even if they are not sleeping.  E.g. if each thread has 1 port
> to poll, and thread A has 3 ports to poll.  The calculated variance
> will not match the actual performance impact as the actual available
> CPU power is different across cores.

"CPU power is different across cores". How does this happen? Because of
cpufreq may be different across cores? Or because of different poll
count?
As I understand it, no matter how many rxq to poll, no matter the rxqs
are busy or free(pps), PMD is always in poll loop which cost 100% CPU
cycles. Maybe it costs different cache resource?

> 
> > 
> > To cover this case, one choice is to use only one of the siblings while
> > leaving the other sibling unused (isolated). I have done some tests:
> > using both siblings gives a 25% performance improvement over using only
> > one sibling while leaving the other sibling unused. So this may not be
> > a good choice.
> 
> Leaving 20-25% of performance on the table might not be a wise choice,
> but it seems to be the only way to have predictable results with the
> current implementation of auto load-balancing.
> 
> To confidently suggest that users use SMT siblings with ALB enabled, we
> will need to make ALB aware of SMT topology.  Maybe make it hierarchical,
> i.e. balance between SMT siblings, then between physical core packages
> (or in the opposite order).

Agreed, this would be a good solution for the case you mentioned.

> 
> N

Re: [ovs-dev] [PATCH] docs: Add HyperThreading notes for auto-lb usage.

2023-01-30 Thread Ilya Maximets
On 1/30/23 16:13, Cheng Li wrote:
> On Mon, Jan 30, 2023 at 01:38:45PM +0100, Ilya Maximets wrote:
>> On 1/29/23 02:33, Cheng Li wrote:
>>> On Fri, Jan 27, 2023 at 04:04:55PM +0100, Ilya Maximets wrote:
 On 1/24/23 16:52, Kevin Traynor wrote:
> On 08/01/2023 03:55, Cheng Li wrote:
>> In my test, if one logical core is pinned to a PMD thread while the
>> other logical core (of the same physical core) is not, the PMD
>> performance is affected by the load on the not-pinned logical core.
>> This makes it difficult to estimate the loads during a dry-run.
>>
>> Signed-off-by: Cheng Li 
>> ---
>>   Documentation/topics/dpdk/pmd.rst | 4 
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/Documentation/topics/dpdk/pmd.rst 
>> b/Documentation/topics/dpdk/pmd.rst
>> index 9006fd4..b220199 100644
>> --- a/Documentation/topics/dpdk/pmd.rst
>> +++ b/Documentation/topics/dpdk/pmd.rst
>> @@ -312,6 +312,10 @@ If not set, the default variance improvement 
>> threshold is 25%.
>>   when all PMD threads are running on cores from a single NUMA node. 
>> In this
>>   case cross-NUMA datapaths will not change after reassignment.
>>   +    For the same reason, please ensure that the pmd threads are 
>> pinned to SMT
>> +    siblings if HyperThreading is enabled. Otherwise, PMDs within a 
>> NUMA may
>> +    not have the same performance.

 Uhm... Am I reading this wrong or this note suggests to pin PMD threads
 to SMT siblings?  It sounds like that's the opposite of what you were
 trying to say.  Siblings are sharing the same physical core, so if some
 PMDs are pinned to siblings, the load prediction can not work correctly.
>>>
>>> Thanks for the review, Ilya.
>>>
>>> The note indeed suggests pinning PMD threads to siblings. Siblings share
>>> the same physical core; if PMDs are pinned to one sibling while the other
>>> sibling of the same physical core is left unpinned, the load prediction may
>>> not work correctly, because the pinned sibling's performance may be affected
>>> by the not-pinned sibling's workload. So we suggest pinning both
>>> siblings of the same physical core.
>>
>> But this makes sense only if all the PMD threads are on siblings of the
>> same physical core.  If more than one physical core is involved, the load
>> calculations will be incorrect.  For example, let's say we have 4 threads
>> A, B, C and D, where A and B are siblings and C and D are siblings.  And
>> it happened that we have only 2 ports, both of which are assigned to A.
>> It makes a huge difference whether we move one of the ports from A to B
>> or if we move it from A to C.  It is an oversimplified example, but we
>> can't rely on load calculations in the general case if PMD threads are running
>> on SMT siblings.
> 
> Thanks for the detailed explanation, now I get your point.
>
> In your example, PMDs B, C, and D have no rxq assigned and will be sleeping,
> which costs few CPU cycles. When a logical core (B) is sleeping, its
> sibling core (A) gets most of the physical core's resources, so it is
> more powerful. If we move one port from A to B, one physical core is
> running; if we move it from A to C, two physical cores are
> running. So the resulting performance will be hugely different. Hope I
> understand correctly.

Yes.

And even if they are not sleeping.  E.g. if each thread has 1 port
to poll, and thread A has 3 ports to poll.  The calculated variance
will not match the actual performance impact as the actual available
CPU power is different across cores.

> 
> To cover this case, one choice is to use only one of the siblings while
> leaving the other sibling unused (isolated). I have done some tests:
> using both siblings gives a 25% performance improvement over using only
> one sibling while leaving the other sibling unused. So this may not be
> a good choice.

Leaving 20-25% of performance on the table might not be a wise choice,
but it seems to be the only way to have predictable results with the
current implementation of auto load-balancing.

To confidently suggest that users use SMT siblings with ALB enabled, we
will need to make ALB aware of SMT topology.  Maybe make it hierarchical,
i.e. balance between SMT siblings, then between physical core packages
(or in the opposite order).

Note: balancing between NUMA nodes is more complicated because of device
  locality.

Best regards, Ilya Maximets.

> 
>>
>>>
>>>

 Nit: s/pmd/PMD/

 Best regards, Ilya Maximets.

>> +
>>   The minimum time between 2 consecutive PMD auto load balancing 
>> iterations can
>>   also be configured by::
>>   
>
> I don't think it's a hard requirement as siblings should not impact as 
> much as cross-numa might but it's probably good advice in general.
>
> Acked-by: Kevin Traynor 
>

Re: [ovs-dev] [PATCH] docs: Add HyperThreading notes for auto-lb usage.

2023-01-30 Thread Cheng Li
On Mon, Jan 30, 2023 at 01:38:45PM +0100, Ilya Maximets wrote:
> On 1/29/23 02:33, Cheng Li wrote:
> > On Fri, Jan 27, 2023 at 04:04:55PM +0100, Ilya Maximets wrote:
> >> On 1/24/23 16:52, Kevin Traynor wrote:
> >>> On 08/01/2023 03:55, Cheng Li wrote:
>  In my test, if one logical core is pinned to a PMD thread while the
>  other logical core (of the same physical core) is not, the PMD
>  performance is affected by the load on the not-pinned logical core.
>  This makes it difficult to estimate the loads during a dry-run.
> 
>  Signed-off-by: Cheng Li 
>  ---
>    Documentation/topics/dpdk/pmd.rst | 4 
>    1 file changed, 4 insertions(+)
> 
>  diff --git a/Documentation/topics/dpdk/pmd.rst 
>  b/Documentation/topics/dpdk/pmd.rst
>  index 9006fd4..b220199 100644
>  --- a/Documentation/topics/dpdk/pmd.rst
>  +++ b/Documentation/topics/dpdk/pmd.rst
>  @@ -312,6 +312,10 @@ If not set, the default variance improvement 
>  threshold is 25%.
>    when all PMD threads are running on cores from a single NUMA node. 
>  In this
>    case cross-NUMA datapaths will not change after reassignment.
>    +    For the same reason, please ensure that the pmd threads are 
>  pinned to SMT
>  +    siblings if HyperThreading is enabled. Otherwise, PMDs within a 
>  NUMA may
>  +    not have the same performance.
> >>
> >> Uhm... Am I reading this wrong or this note suggests to pin PMD threads
> >> to SMT siblings?  It sounds like that's the opposite of what you were
> >> trying to say.  Siblings are sharing the same physical core, so if some
> >> PMDs are pinned to siblings, the load prediction can not work correctly.
> > 
> > Thanks for the review, Ilya.
> > 
> > The note indeed suggests pinning PMD threads to siblings. Siblings share
> > the same physical core; if PMDs are pinned to one sibling while the other
> > sibling of the same physical core is left unpinned, the load prediction may
> > not work correctly, because the pinned sibling's performance may be affected
> > by the not-pinned sibling's workload. So we suggest pinning both
> > siblings of the same physical core.
> 
> But this makes sense only if all the PMD threads are on siblings of the
> same physical core.  If more than one physical core is involved, the load
> calculations will be incorrect.  For example, let's say we have 4 threads
> A, B, C and D, where A and B are siblings and C and D are siblings.  And
> it happened that we have only 2 ports, both of which are assigned to A.
> It makes a huge difference whether we move one of the ports from A to B
> or if we move it from A to C.  It is an oversimplified example, but we
> can't rely on load calculations in the general case if PMD threads are running
> on SMT siblings.

Thanks for the detailed explanation, now I get your point.

In your example, PMDs B, C, and D have no rxq assigned and will be sleeping,
which costs few CPU cycles. When a logical core (B) is sleeping, its
sibling core (A) gets most of the physical core's resources, so it is
more powerful. If we move one port from A to B, one physical core is
running; if we move it from A to C, two physical cores are
running. So the resulting performance will be hugely different. Hope I
understand correctly.

To cover this case, one choice is to use only one of the siblings while
leaving the other sibling unused (isolated). I have done some tests:
using both siblings gives a 25% performance improvement over using only
one sibling while leaving the other sibling unused. So this may not be
a good choice.
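
For reference, the two setups compared here differ only in the PMD CPU mask.
A minimal sketch, assuming a hypothetical host where logical CPUs 2 and 22 are
SMT siblings on one physical core and CPUs 3 and 23 on another (the real pairs
can be read from /sys/devices/system/cpu/cpuN/topology/thread_siblings_list):

    # Both siblings of the two physical cores: CPUs 2,3,22,23 -> mask 0xC0000C.
    $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xC0000C
    # Only one sibling per physical core: CPUs 2,3 -> mask 0xC.
    $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xC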

> 
> > 
> > 
> >>
> >> Nit: s/pmd/PMD/
> >>
> >> Best regards, Ilya Maximets.
> >>
>  +
>    The minimum time between 2 consecutive PMD auto load balancing 
>  iterations can
>    also be configured by::
>    
> >>>
> >>> I don't think it's a hard requirement as siblings should not impact as 
> >>> much as cross-numa might but it's probably good advice in general.
> >>>
> >>> Acked-by: Kevin Traynor 
> >>>


Re: [ovs-dev] [PATCH] docs: Add HyperThreading notes for auto-lb usage.

2023-01-30 Thread Ilya Maximets
On 1/29/23 02:33, Cheng Li wrote:
> On Fri, Jan 27, 2023 at 04:04:55PM +0100, Ilya Maximets wrote:
>> On 1/24/23 16:52, Kevin Traynor wrote:
>>> On 08/01/2023 03:55, Cheng Li wrote:
 In my test, if one logical core is pinned to a PMD thread while the
 other logical core (of the same physical core) is not, the PMD
 performance is affected by the load on the not-pinned logical core.
 This makes it difficult to estimate the loads during a dry-run.

 Signed-off-by: Cheng Li 
 ---
   Documentation/topics/dpdk/pmd.rst | 4 
   1 file changed, 4 insertions(+)

 diff --git a/Documentation/topics/dpdk/pmd.rst 
 b/Documentation/topics/dpdk/pmd.rst
 index 9006fd4..b220199 100644
 --- a/Documentation/topics/dpdk/pmd.rst
 +++ b/Documentation/topics/dpdk/pmd.rst
 @@ -312,6 +312,10 @@ If not set, the default variance improvement 
 threshold is 25%.
   when all PMD threads are running on cores from a single NUMA node. 
 In this
   case cross-NUMA datapaths will not change after reassignment.
   +    For the same reason, please ensure that the pmd threads are pinned 
 to SMT
 +    siblings if HyperThreading is enabled. Otherwise, PMDs within a NUMA 
 may
 +    not have the same performance.
>>
>> Uhm... Am I reading this wrong or this note suggests to pin PMD threads
>> to SMT siblings?  It sounds like that's the opposite of what you were
>> trying to say.  Siblings are sharing the same physical core, so if some
>> PMDs are pinned to siblings, the load prediction can not work correctly.
> 
> Thanks for the review, Ilya.
> 
> The note indeed suggests pinning PMD threads to siblings. Siblings share
> the same physical core; if PMDs are pinned to one sibling while the other
> sibling of the same physical core is left unpinned, the load prediction may
> not work correctly, because the pinned sibling's performance may be affected
> by the not-pinned sibling's workload. So we suggest pinning both
> siblings of the same physical core.

But this makes sense only if all the PMD threads are on siblings of the
same physical core.  If more than one physical core is involved, the load
calculations will be incorrect.  For example, let's say we have 4 threads
A, B, C and D, where A and B are siblings and C and D are siblings.  And
it happened that we have only 2 ports, both of which are assigned to A.
It makes a huge difference whether we move one of the ports from A to B
or if we move it from A to C.  It is an oversimplified example, but we
can't rely on load calculations in the general case if PMD threads are running
on SMT siblings.

> 
> 
>>
>> Nit: s/pmd/PMD/
>>
>> Best regards, Ilya Maximets.
>>
 +
   The minimum time between 2 consecutive PMD auto load balancing 
 iterations can
   also be configured by::
   
>>>
>>> I don't think it's a hard requirement as siblings should not impact as much 
>>> as cross-numa might but it's probably good advice in general.
>>>
>>> Acked-by: Kevin Traynor 
>>>


Re: [ovs-dev] [PATCH] docs: Add HyperThreading notes for auto-lb usage.

2023-01-28 Thread Cheng Li
On Fri, Jan 27, 2023 at 04:04:55PM +0100, Ilya Maximets wrote:
> On 1/24/23 16:52, Kevin Traynor wrote:
> > On 08/01/2023 03:55, Cheng Li wrote:
> >> In my test, if one logical core is pinned to a PMD thread while the
> >> other logical core (of the same physical core) is not, the PMD
> >> performance is affected by the load on the not-pinned logical core.
> >> This makes it difficult to estimate the loads during a dry-run.
> >>
> >> Signed-off-by: Cheng Li 
> >> ---
> >>   Documentation/topics/dpdk/pmd.rst | 4 
> >>   1 file changed, 4 insertions(+)
> >>
> >> diff --git a/Documentation/topics/dpdk/pmd.rst 
> >> b/Documentation/topics/dpdk/pmd.rst
> >> index 9006fd4..b220199 100644
> >> --- a/Documentation/topics/dpdk/pmd.rst
> >> +++ b/Documentation/topics/dpdk/pmd.rst
> >> @@ -312,6 +312,10 @@ If not set, the default variance improvement 
> >> threshold is 25%.
> >>   when all PMD threads are running on cores from a single NUMA node. 
> >> In this
> >>   case cross-NUMA datapaths will not change after reassignment.
> >>   +    For the same reason, please ensure that the pmd threads are pinned 
> >> to SMT
> >> +    siblings if HyperThreading is enabled. Otherwise, PMDs within a NUMA 
> >> may
> >> +    not have the same performance.
> 
> Uhm... Am I reading this wrong or this note suggests to pin PMD threads
> to SMT siblings?  It sounds like that's the opposite of what you were
> trying to say.  Siblings are sharing the same physical core, so if some
> PMDs are pinned to siblings, the load prediction can not work correctly.

Thanks for the review, Ilya.

The note indeed suggests pinning PMD threads to siblings. Siblings share
the same physical core; if PMDs are pinned to one sibling while the other
sibling of the same physical core is left unpinned, the load prediction may
not work correctly, because the pinned sibling's performance may be affected
by the not-pinned sibling's workload. So we suggest pinning both
siblings of the same physical core.
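
As a sketch of how to find the sibling pairs before building such a pinning
(assuming a Linux host that exposes the usual sysfs topology files):

    # Show each logical CPU with its physical core and socket.
    $ lscpu --extended=CPU,CORE,SOCKET
    # Or query the SMT siblings of one CPU directly, e.g. CPU 2 (hypothetical id).
    $ cat /sys/devices/system/cpu/cpu2/topology/thread_siblings_list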


> 
> Nit: s/pmd/PMD/
> 
> Best regards, Ilya Maximets.
> 
> >> +
> >>   The minimum time between 2 consecutive PMD auto load balancing 
> >> iterations can
> >>   also be configured by::
> >>   
> > 
> > I don't think it's a hard requirement as siblings should not impact as much 
> > as cross-numa might but it's probably good advice in general.
> > 
> > Acked-by: Kevin Traynor 
> > 


Re: [ovs-dev] [PATCH] docs: Add HyperThreading notes for auto-lb usage.

2023-01-27 Thread Ilya Maximets
On 1/24/23 16:52, Kevin Traynor wrote:
> On 08/01/2023 03:55, Cheng Li wrote:
>> In my test, if one logical core is pinned to a PMD thread while the
>> other logical core (of the same physical core) is not, the PMD
>> performance is affected by the load on the not-pinned logical core.
>> This makes it difficult to estimate the loads during a dry-run.
>>
>> Signed-off-by: Cheng Li 
>> ---
>>   Documentation/topics/dpdk/pmd.rst | 4 
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/Documentation/topics/dpdk/pmd.rst 
>> b/Documentation/topics/dpdk/pmd.rst
>> index 9006fd4..b220199 100644
>> --- a/Documentation/topics/dpdk/pmd.rst
>> +++ b/Documentation/topics/dpdk/pmd.rst
>> @@ -312,6 +312,10 @@ If not set, the default variance improvement threshold 
>> is 25%.
>>   when all PMD threads are running on cores from a single NUMA node. In 
>> this
>>   case cross-NUMA datapaths will not change after reassignment.
>>   +    For the same reason, please ensure that the pmd threads are pinned to 
>> SMT
>> +    siblings if HyperThreading is enabled. Otherwise, PMDs within a NUMA may
>> +    not have the same performance.

Uhm... Am I reading this wrong or this note suggests to pin PMD threads
to SMT siblings?  It sounds like that's the opposite of what you were
trying to say.  Siblings are sharing the same physical core, so if some
PMDs are pinned to siblings, the load prediction can not work correctly.

Nit: s/pmd/PMD/

Best regards, Ilya Maximets.

>> +
>>   The minimum time between 2 consecutive PMD auto load balancing iterations 
>> can
>>   also be configured by::
>>   
> 
> I don't think it's a hard requirement as siblings should not impact as much 
> as cross-numa might but it's probably good advice in general.
> 
> Acked-by: Kevin Traynor 
> 


Re: [ovs-dev] [PATCH] docs: Add HyperThreading notes for auto-lb usage.

2023-01-24 Thread Kevin Traynor

On 08/01/2023 03:55, Cheng Li wrote:

In my test, if one logical core is pinned to a PMD thread while the
other logical core (of the same physical core) is not, the PMD
performance is affected by the load on the not-pinned logical core.
This makes it difficult to estimate the loads during a dry-run.

Signed-off-by: Cheng Li 
---
  Documentation/topics/dpdk/pmd.rst | 4 
  1 file changed, 4 insertions(+)

diff --git a/Documentation/topics/dpdk/pmd.rst 
b/Documentation/topics/dpdk/pmd.rst
index 9006fd4..b220199 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -312,6 +312,10 @@ If not set, the default variance improvement threshold is 
25%.
  when all PMD threads are running on cores from a single NUMA node. In this
  case cross-NUMA datapaths will not change after reassignment.
  
+For the same reason, please ensure that the pmd threads are pinned to SMT

+siblings if HyperThreading is enabled. Otherwise, PMDs within a NUMA may
+not have the same performance.
+
  The minimum time between 2 consecutive PMD auto load balancing iterations can
  also be configured by::
  


I don't think it's a hard requirement as siblings should not impact as 
much as cross-numa might but it's probably good advice in general.


Acked-by: Kevin Traynor 
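
For context, a minimal sketch of the auto load balancing knobs that the patched
section sits next to, assuming an OVS release recent enough to have these
other_config keys; the values shown are only examples:

    # Enable PMD auto load balancing (disabled by default).
    $ ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb="true"
    # Require more than this % of predicted variance improvement before reassigning.
    $ ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb-improvement-threshold="25"
    # Minimum time, in minutes, between two consecutive reassignments.
    $ ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb-rebal-interval="5"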



Re: [ovs-dev] [PATCH] docs: Add HyperThreading notes for auto-lb usage.

2023-01-20 Thread Simon Horman
On Sun, Jan 08, 2023 at 03:55:45AM +, Cheng Li wrote:
> In my test, if one logical core is pinned to a PMD thread while the
> other logical core (of the same physical core) is not, the PMD
> performance is affected by the load on the not-pinned logical core.
> This makes it difficult to estimate the loads during a dry-run.
> 
> Signed-off-by: Cheng Li 

Makes sense to me.

Reviewed-by: Simon Horman 

> ---
>  Documentation/topics/dpdk/pmd.rst | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/Documentation/topics/dpdk/pmd.rst 
> b/Documentation/topics/dpdk/pmd.rst
> index 9006fd4..b220199 100644
> --- a/Documentation/topics/dpdk/pmd.rst
> +++ b/Documentation/topics/dpdk/pmd.rst
> @@ -312,6 +312,10 @@ If not set, the default variance improvement threshold 
> is 25%.
>  when all PMD threads are running on cores from a single NUMA node. In 
> this
>  case cross-NUMA datapaths will not change after reassignment.
>  
> +For the same reason, please ensure that the pmd threads are pinned to SMT
> +siblings if HyperThreading is enabled. Otherwise, PMDs within a NUMA may
> +not have the same performance.
> +
>  The minimum time between 2 consecutive PMD auto load balancing iterations can
>  also be configured by::
>  
> -- 
> 1.8.3.1
> 