> Thanks for sharing your experience with it. My fear with the proposal is that
> someone turns this on and then tells us performance is worse and/or OVS
> assignments/ALB are broken, because it has an impact on their case.
>
> In terms of limiting possible negative effects,
> - it can be opt-in and recommended only for phy ports
> - could print a warning when it is enabled
> - ALB is currently disabled with cross-numa polling (except a limited
>   case) but it's clear you want to remove that restriction too
> - for ALB, a user could increase the improvement threshold to account for any
>   reassignments triggered by inaccuracies

[Jan] Yes, we want to enable cross-NUMA polling of selected (typically phy)
ports in ALB "group" mode as an opt-in config option (default off). Based on
our observations we are not too concerned about the loss of ALB prediction
accuracy, but increasing the threshold may be a way of taking that into
account, if wanted.

> There is also some improvements that can be made to the proposed method
> when used with group assignment,
> - we can prefer local numa where there is no difference between pmd cores.
>   (e.g. two unused cores available, pick the local numa one)
> - we can flatten the list of pmds, so best pmd can be selected. This will
>   remove issues with RR numa when there are different num of pmd cores or
>   loads per numa.
> - I wrote an RFC that does these two items, I can post when(/if!) consensus is
>   reached on the broader topic

[Jan] In our alternative version of the current upstream "group" ALB [1] we
already maintained a flat list of PMDs, so we would support that feature.
Using NUMA locality as a tie-breaker makes sense.

[1] https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384546.html

> In summary, it's a trade-off,
>
> With no cross-numa polling (current):
> - won't have any impact to OVS assignment or ALB accuracy
> - there could be a bottleneck on one numa pmds while other numa pmd cores
>   are idle and unused
>
> With cross-numa rx pinning (current):
> - will have access to pmd cores on all numas
> - may require more cycles for some traffic paths
> - won't have any impact to OVS assignment or ALB accuracy
> - >1 pinned rxqs per core may cause a bottleneck depending on traffic
>
> With cross-numa interface setting (proposed):
> - will have access to all pmd cores on all numas (i.e. no unused pmd cores
>   during highest load)
> - will require more cycles for some traffic paths
> - will impact on OVS assignment and ALB accuracy
>
> Anything missing above, or is it a reasonable summary?

I think that is a reasonable summary, albeit I would have characterized the
third option a bit more positively:
- Gives ALB maximum freedom to balance the load of PMDs on all NUMA nodes (in
  the likely scenario of uneven VM load on the NUMAs)
- Accepts an increase of cycles on cross-NUMA paths for better utilization
  of free PMD cycles
- Mostly suitable for phy ports due to the limited cycle increase for
  cross-NUMA polling of phy rx queues
- Could negatively impact the ALB prediction accuracy in certain scenarios

We will shortly post a new version of our patch [2] for cross-numa polling on
selected ports, adapted to the current OVS master.

[2] https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384547.html

Thanks, Jan
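
For illustration only, the flat PMD list with NUMA locality as a tie-breaker
discussed above could be sketched roughly as follows. This is not OVS code and
not taken from either patch: the names `Pmd`, `Rxq`, `pick_pmd`, `assign` and
the per-queue `cross_numa` flag are hypothetical, and the load model (a plain
sum of measured rxq cycles) is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pmd:
    core_id: int
    numa_id: int
    load: int = 0  # cycles already assigned to this PMD (hypothetical unit)


@dataclass
class Rxq:
    name: str
    numa_id: int              # NUMA node of the port this queue belongs to
    cycles: int               # measured processing cycles for this queue
    cross_numa: bool = False  # hypothetical per-port opt-in (default off)


def pick_pmd(rxq: Rxq, pmds: List[Pmd]) -> Pmd:
    """Pick the least-loaded eligible PMD from one flat list."""
    if rxq.cross_numa:
        candidates = pmds
    else:
        # Without the opt-in, only local-NUMA PMDs are eligible. Assumption:
        # fall back to all PMDs if the local NUMA node has none.
        candidates = [p for p in pmds if p.numa_id == rxq.numa_id] or pmds
    # Sort key: lowest load first, then local NUMA (False sorts before True),
    # then lowest core id so the result is deterministic.
    return min(
        candidates,
        key=lambda p: (p.load, p.numa_id != rxq.numa_id, p.core_id),
    )


def assign(rxqs: List[Rxq], pmds: List[Pmd]) -> Dict[str, int]:
    """Assign the busiest queues first, each to the currently best PMD."""
    result: Dict[str, int] = {}
    for rxq in sorted(rxqs, key=lambda q: q.cycles, reverse=True):
        pmd = pick_pmd(rxq, pmds)
        pmd.load += rxq.cycles
        result[rxq.name] = pmd.core_id
    return result
```

With two idle cores on different NUMA nodes, the tie-breaker keeps a phy rxq
on its local node; once the local cores carry more load than a remote core, an
opted-in rxq moves to the remote core, while a queue without the opt-in stays
local.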