[ovs-dev] Re: [PATCH 6/6] ci: add the opts about ALLOW_EXPERIMENTAL_API

2022-12-17 Thread Nole Zhang


> -----Original Message-----
> From: David Marchand 
> Sent: December 17, 2022 4:02
> To: Simon Horman 
> Cc: d...@openvswitch.org; Eli Britstein ; Chaoyong He
> ; oss-drivers ; Ilya
> Maximets ; Nole Zhang 
> Subject: Re: [ovs-dev] [PATCH 6/6] ci: add the opts about
> ALLOW_EXPERIMENTAL_API
> 
> On Fri, Dec 16, 2022 at 4:52 PM Simon Horman 
> wrote:
> >
> > From: Peng Zhang 
> >
> > This commit adds support for OVS-DPDK with
> > -DALLOW_EXPERIMENTAL_API.
> >
> > Tunnel offloads and Meter offloads are experimental APIs in DPDK. To
> > enable these features, the build needs to add -DALLOW_EXPERIMENTAL_API.
> > So the workflow also needs a new test with -DALLOW_EXPERIMENTAL_API.
> >
> > Signed-off-by: Peng Zhang 
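
For context, the define matters because DPDK tags these APIs as
__rte_experimental, and the gate lives in DPDK's rte_compat.h. A simplified
sketch of that gating (paraphrased, not copied verbatim;
rte_some_experimental_call() is a made-up declaration for illustration):

```
#ifndef ALLOW_EXPERIMENTAL_API
/* Without the define, every experimental symbol carries a deprecation
 * attribute, so any call site fails under -Werror. */
#define __rte_experimental \
    __attribute__((deprecated("Symbol is not yet part of stable ABI"), \
                   section(".text.experimental")))
#else
/* With -DALLOW_EXPERIMENTAL_API the deprecation attribute is dropped and
 * the call compiles cleanly. */
#define __rte_experimental \
    __attribute__((section(".text.experimental")))
#endif

/* Experimental DPDK APIs, such as the rte_flow tunnel offload and rte_mtr
 * calls behind these offloads, are declared roughly like this: */
__rte_experimental int rte_some_experimental_call(void);
```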
> 
> We have a similar patch in the dpdk-latest branch.
> https://github.com/openvswitch/ovs/commit/a8f6be98801f0c43d52173843d649df2af5e1c0d
> Is something wrong with it?

The patch is good for me, I just didn't notice it. Thanks for pointing it out.
> 
> 
> --
> David Marchand
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] dpif-netdev: calculate per numa variance

2022-12-17 Thread Cheng Li
Currently, pmd_rebalance_dry_run() calculates the overall variance of
all pmds regardless of their numa location. The overall result may
hide an imbalance within an individual numa.

Consider the following case. Numa0 is idle because the VMs on numa0
are not sending pkts, while numa1 is busy. Within numa1, the pmd
workloads are not balanced. Obviously, moving 500 kpps of workload from
pmd 126 to pmd 62 would make numa1 much more balanced. For numa1 the
variance improvement would be almost 100%, because after the rebalance
each pmd in numa1 holds the same workload (variance ~= 0). But the
overall variance improvement is only about 20%, which may not trigger
auto_lb.

```
numa_id   core_id   kpps
      0        30      0
      0        31      0
      0        94      0
      0        95      0
      1       126   1500
      1       127   1000
      1        63   1000
      1        62    500
```

As auto_lb doesn't balance workloads across numa nodes, it makes more
sense to calculate the variance improvement per numa node.
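
A rough standalone check of the numbers above (not part of the patch),
assuming variance() in lib/dpif-netdev.c computes a plain population
variance of the per-PMD load, with the kpps figures standing in for that
load metric:

```
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Population variance: mean of squared deviations from the mean. */
static uint64_t
var(const uint64_t a[], int n)
{
    uint64_t mean = 0, sq = 0;

    for (int i = 0; i < n; i++) {
        mean += a[i];
    }
    mean /= n;
    for (int i = 0; i < n; i++) {
        uint64_t d = a[i] > mean ? a[i] - mean : mean - a[i];
        sq += d * d;
    }
    return sq / n;
}

int
main(void)
{
    /* Current load of all 8 pmds (numa0 idle, numa1 busy). */
    uint64_t all_cur[] = { 0, 0, 0, 0, 1500, 1000, 1000, 500 };
    /* Estimated load after moving 500 kpps from pmd 126 to pmd 62. */
    uint64_t all_est[] = { 0, 0, 0, 0, 1000, 1000, 1000, 1000 };
    /* Same data restricted to numa1. */
    uint64_t n1_cur[] = { 1500, 1000, 1000, 500 };
    uint64_t n1_est[] = { 1000, 1000, 1000, 1000 };

    uint64_t oc = var(all_cur, 8), oe = var(all_est, 8);
    uint64_t nc = var(n1_cur, 4), ne = var(n1_est, 4);

    /* Overall: 312500 -> 250000, i.e. only a 20% improvement. */
    printf("overall: %"PRIu64" -> %"PRIu64" (%"PRIu64"%%)\n",
           oc, oe, (oc - oe) * 100 / oc);
    /* Numa1 alone: 125000 -> 0, i.e. a 100% improvement. */
    printf("numa1:   %"PRIu64" -> %"PRIu64" (%"PRIu64"%%)\n",
           nc, ne, (nc - ne) * 100 / nc);
    return 0;
}
```

Compiled and run, this prints a 20% overall improvement against a 100%
improvement for numa1 alone, matching the figures in the commit message.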

Signed-off-by: Cheng Li 
Signed-off-by: Kevin Traynor 
Co-authored-by: Kevin Traynor 
---

Notes:
v2:
- Commit msg update
- Update doc for per numa variance
- Reword variance improvement log msg
- Do not break out of the per-numa variance loop, for debug purposes

 Documentation/topics/dpdk/pmd.rst |  8 ++--
 lib/dpif-netdev.c | 85 +++
 2 files changed, 46 insertions(+), 47 deletions(-)

diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
index b259cc8..c335757 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -278,10 +278,10 @@ If a PMD core is detected to be above the load threshold and the minimum
 pre-requisites are met, a dry-run using the current PMD assignment algorithm is
 performed.
 
-The current variance of load between the PMD cores and estimated variance from
-the dry-run are both calculated. If the estimated dry-run variance is improved
-from the current one by the variance threshold, a new Rx queue to PMD
-assignment will be performed.
+For each numa node, the current variance of load between the PMD cores and
+estimated variance from the dry-run are both calculated. If any numa's
+estimated dry-run variance is improved from the current one by the variance
+threshold, a new Rx queue to PMD assignment will be performed.
 
 For example, to set the variance improvement threshold to 40%::
 
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 2c08a71..7ff923b 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -6076,39 +6076,33 @@ rxq_scheduling(struct dp_netdev *dp)
 static uint64_t variance(uint64_t a[], int n);
 
 static uint64_t
-sched_numa_list_variance(struct sched_numa_list *numa_list)
+sched_numa_variance(struct sched_numa *numa)
 {
-    struct sched_numa *numa;
     uint64_t *percent_busy = NULL;
-    unsigned total_pmds = 0;
     int n_proc = 0;
     uint64_t var;
 
-    HMAP_FOR_EACH (numa, node, &numa_list->numas) {
-        total_pmds += numa->n_pmds;
-        percent_busy = xrealloc(percent_busy,
-                                total_pmds * sizeof *percent_busy);
+    percent_busy = xmalloc(numa->n_pmds * sizeof *percent_busy);
 
-        for (unsigned i = 0; i < numa->n_pmds; i++) {
-            struct sched_pmd *sched_pmd;
-            uint64_t total_cycles = 0;
+    for (unsigned i = 0; i < numa->n_pmds; i++) {
+        struct sched_pmd *sched_pmd;
+        uint64_t total_cycles = 0;
 
-            sched_pmd = &numa->pmds[i];
-            /* Exclude isolated PMDs from variance calculations. */
-            if (sched_pmd->isolated == true) {
-                continue;
-            }
-            /* Get the total pmd cycles for an interval. */
-            atomic_read_relaxed(&sched_pmd->pmd->intrvl_cycles, &total_cycles);
+        sched_pmd = &numa->pmds[i];
+        /* Exclude isolated PMDs from variance calculations. */
+        if (sched_pmd->isolated == true) {
+            continue;
+        }
+        /* Get the total pmd cycles for an interval. */
+        atomic_read_relaxed(&sched_pmd->pmd->intrvl_cycles, &total_cycles);
 
-            if (total_cycles) {
-                /* Estimate the cycles to cover all intervals. */
-                total_cycles *= PMD_INTERVAL_MAX;
-                percent_busy[n_proc++] = (sched_pmd->pmd_proc_cycles * 100)
-                                             / total_cycles;
-            } else {
-                percent_busy[n_proc++] = 0;
-            }
+        if (total_cycles) {
+            /* Estimate the cycles to cover all intervals. */
+            total_cycles *= PMD_INTERVAL_MAX;
+            percent_busy[n_proc++] = (sched_pmd->pmd_proc_cycles * 100)
+                                         / total_cycles;
+        } else {
+            percent_busy[n_proc++] = 0;
         }
     }
     var = variance(percent_busy, n_proc);
@@ -6182,6 +6176,7 @@ pmd_rebalance_dry_run(struct dp_netdev *dp)
 struct sched_numa_list