> Thanks for sharing your experience with it. My fear with the proposal is that
> someone turns this on and then tells us performance is worse and/or OVS
> assignments/ALB are broken, because it has an impact on their case.
> 
> In terms of limiting possible negative effects,
> - it can be opt-in and recommended only for phy ports
> - could print a warning when it is enabled
> - ALB is currently disabled with cross-numa polling (except a limited
> case) but it's clear you want to remove that restriction too
> - for ALB, a user could increase the improvement threshold to account for any
> reassignments triggered by inaccuracies

[Jan] Yes, we want to enable cross-NUMA polling of selected (typically phy) 
ports in ALB "group" mode as an opt-in config option (default off). Based on 
our observations, we are not too concerned about the loss of ALB prediction 
accuracy, but increasing the threshold may be a way of taking that into 
account, if wanted.
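
To make the threshold point concrete, here is a minimal Python sketch of an 
ALB-style decision. It is a simplified model, not the OVS code: it assumes the 
check compares the variance of per-PMD load before and after a dry-run 
reassignment, and that 25% is the default threshold. The function name and the 
load numbers are made up for illustration.

```python
from statistics import pvariance


def alb_should_rebalance(current_loads, predicted_loads, threshold_pct=25):
    """Return True if the predicted variance improvement meets the threshold.

    current_loads / predicted_loads: per-PMD load in percent, before and
    after a dry-run reassignment. threshold_pct models the improvement
    threshold; the default of 25 is an assumption of this sketch.
    """
    cur = pvariance(current_loads)
    new = pvariance(predicted_loads)
    if cur == 0 or new >= cur:
        # Already balanced, or the dry run predicts no improvement.
        return False
    improvement = (cur - new) * 100 / cur
    return improvement >= threshold_pct


# Two PMDs at 90% and 50%; the dry run predicts 85% and 55%.
# Variance drops from 400 to 225, a 43.75% improvement.
print(alb_should_rebalance([90, 50], [85, 55]))                    # default threshold
print(alb_should_rebalance([90, 50], [85, 55], threshold_pct=50))  # raised threshold
```

If cross-NUMA polling makes the predicted loads less reliable, a higher 
threshold means only reassignments with a clearly larger predicted gain are 
acted upon.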

> 
> There are also some improvements that can be made to the proposed method
> when used with group assignment,
> - we can prefer local numa where there is no difference between pmd cores.
> (e.g. two unused cores available, pick the local numa one)
> - we can flatten the list of pmds, so best pmd can be selected. This will
> remove issues with RR numa when there are different num of pmd cores or
> loads per numa.
> - I wrote an RFC that does these two items, I can post when(/if!) consensus is
> reached on the broader topic

[Jan] In our alternative version of the current upstream "group" ALB [1] we 
already maintained a flat list of PMDs. So we would support that feature. Using 
NUMA-locality as a tie-breaker makes sense.

[1] https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384546.html
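
As an illustration of the flat PMD list with NUMA locality as a tie-breaker, 
here is a minimal Python sketch. It is not taken from either patch: the `Pmd` 
class, the `pick_pmd` and `assign` helpers, the per-rxq `cross_numa` flag and 
the cycle numbers are all hypothetical, and it assumes every NUMA node has at 
least one PMD.

```python
from dataclasses import dataclass


@dataclass
class Pmd:
    core_id: int
    numa_id: int
    load: int = 0  # estimated cycles already assigned to this PMD


def pick_pmd(pmds, port_numa, cross_numa):
    """Pick the least-loaded PMD; on equal load prefer the port's NUMA node."""
    if cross_numa:
        candidates = pmds  # flat list across all NUMA nodes
    else:
        candidates = [p for p in pmds if p.numa_id == port_numa]
    # Sort key: lowest load first, then NUMA-local, then lowest core id.
    return min(candidates,
               key=lambda p: (p.load, p.numa_id != port_numa, p.core_id))


def assign(rxqs, pmds):
    """Assign rxqs given as (name, port_numa, cycles, cross_numa), busiest first."""
    placement = {}
    for name, port_numa, cycles, cross_numa in sorted(rxqs, key=lambda r: -r[2]):
        pmd = pick_pmd(pmds, port_numa, cross_numa)
        pmd.load += cycles
        placement[name] = pmd.core_id
    return placement


# One PMD per NUMA node; three rxqs of a phy port attached to NUMA 0.
rxqs = [("q0", 0, 50, True), ("q1", 0, 30, True), ("q2", 0, 20, True)]
pmds = [Pmd(core_id=0, numa_id=0), Pmd(core_id=1, numa_id=1)]
print(assign(rxqs, pmds), [p.load for p in pmds])
```

With `cross_numa` set to False for the same rxqs, everything lands on core 0 
while core 1 stays idle, which is the bottleneck case discussed in this thread.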

> 
> In summary, it's a trade-off,
> 
> With no cross-numa polling (current):
> - won't have any impact on OVS assignment or ALB accuracy
> - there could be a bottleneck on one numa pmds while other numa pmd cores
> are idle and unused
> 
> With cross-numa rx pinning (current):
> - will have access to pmd cores on all numas
> - may require more cycles for some traffic paths
> - won't have any impact on OVS assignment or ALB accuracy
> - >1 pinned rxqs per core may cause a bottleneck depending on traffic
> 
> With cross-numa interface setting (proposed):
> - will have access to all pmd cores on all numas (i.e. no unused pmd cores
> during highest load)
> - will require more cycles for some traffic paths
> - will impact OVS assignment and ALB accuracy
> 
> Anything missing above, or is it a reasonable summary?

I think that is a reasonable summary, albeit I would have characterized the 
third option a bit more positively:
- Gives ALB maximum freedom to balance load of PMDs on all NUMA nodes (in the 
likely scenario of uneven VM load on the NUMAs)
- Accepts an increase of cycles on cross-NUMA paths for a better utilization of 
free PMD cycles
- Mostly suitable for phy ports due to limited cycle increase for cross-NUMA 
polling of phy rx queues
- Could negatively impact the ALB prediction accuracy in certain scenarios

We will post a new version of our patch [2] for cross-numa polling on selected 
ports adapted to the current OVS master shortly.

[2] https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384547.html

Thanks, Jan


_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
