> Hi Bhanu,
Thanks for the patch - some comments inline.

Cheers,
Mark

>Set the DPDK pmd thread scheduling policy to SCHED_RR and static
>priority to highest priority value of the policy. This is to deal with
>pmd thread starvation case where another cpu hogging process can get
>scheduled/affinitized on to the same core the pmd thread is running
>there by significantly impacting the datapath performance.
>
>Setting the realtime scheduling policy to the pmd threads is one step
>towards Fastpath Service Assurance in OVS DPDK.
>
>The realtime scheduling policy is applied only when CPU mask is passed
>to 'pmd-cpu-mask'. For example:
>
>  * In the absence of pmd-cpu-mask, one pmd thread shall be created
>    and default scheduling policy and prority gets applied.

Typo above - 'prority'

>
>  * If pmd-cpu-mask is specified, one ore more pmd threads shall be

Typo above - 'ore'

>    spawned on the corresponding core(s) in the mask and real time
>    scheduling policy SCHED_RR and highest priority of the policy is
>    applied to the pmd thread(s).
>
>To reproduce the pmd thread starvation case:
>
>ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6
>taskset 0x2 cat /dev/zero > /dev/null &
>
>With this commit OVS control threads and pmd threads can't have same
>affinity ('dpdk-lcore-mask','pmd-cpu-mask' should be non-overlapping).
>Also other processes with same affinity as PMD thread will be unresponsive.
>
>Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodire...@intel.com>
>---
>
>v2->v3:
>* Move set_priority() function to lib/ovs-numa.c
>* Apply realtime scheduling policy and priority to pmd thread only if
>  pmd-cpu-mask is passed.
>* Update INSTALL.DPDK-ADVANCED.
>
>v1->v2:
>* Removed #ifdef and introduced dummy function "pmd_thread_setpriority"
>  in netdev-dpdk.h
>* Rebase
>
>
> INSTALL.DPDK-ADVANCED.md | 15 +++++++++++----
> lib/dpif-netdev.c        |  9 +++++++++
> lib/ovs-numa.c           | 18 ++++++++++++++++++
> lib/ovs-numa.h           |  1 +
> 4 files changed, 39 insertions(+), 4 deletions(-)
>
>diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
>index 9ae536d..d828290 100644
>--- a/INSTALL.DPDK-ADVANCED.md
>+++ b/INSTALL.DPDK-ADVANCED.md
>@@ -205,8 +205,10 @@ needs to be affinitized accordingly.
>      pmd thread is CPU bound, and needs to be affinitized to isolated
>      cores for optimum performance.
>
>-     By setting a bit in the mask, a pmd thread is created and pinned
>-     to the corresponding CPU core. e.g. to run a pmd thread on core 2
>+     By setting a bit in the mask, a pmd thread is created, pinned
>+     to the corresponding CPU core and the scheduling policy SCHED_RR
>+     along with maximum priority of the policy applied to the pmd thread.
>+     e.g. to pin a pmd thread on core 2
>
>        `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=4`
>
>@@ -234,8 +236,10 @@ needs to be affinitized accordingly.
>      responsible for different ports/rxq's. Assignment of ports/rxq's to
>      pmd threads is done automatically.
>
>-     A set bit in the mask means a pmd thread is created and pinned
>-     to the corresponding CPU core. e.g. to run pmd threads on core 1 and 2
>+     A set bit in the mask means a pmd thread is created, pinned to the
>+     corresponding CPU core and the scheduling policy SCHED_RR with highest
>+     priority of the scheduling policy applied to pmd thread.
>+     e.g. to run pmd threads on core 1 and 2

There's some repetition in the last paragraph - I'm reviewing this patch in
isolation, so the text may make sense/be required in the full document.

>
>        `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6`
>
>@@ -246,6 +250,9 @@ needs to be affinitized accordingly.
>
>     NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
>
>+     Note: 'dpdk-lcore-mask' and 'pmd-cpu-mask' cpu mask settings should be
>+     non-overlapping.

Although it's mentioned in the commit message, it might be worth mentioning
here the consequences of attempting to pin non-PMD processes to a
pmd-cpu-mask core (i.e. CPU starvation).

>+
> ### 4.3 DPDK physical port Rx Queues
>
>  `ovs-vsctl set Interface <DPDK interface> options:n_rxq=<integer>`
>
>diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>index e0107b7..805d0ae 100644
>--- a/lib/dpif-netdev.c
>+++ b/lib/dpif-netdev.c
>@@ -2851,6 +2851,15 @@ pmd_thread_main(void *f_)
>     ovs_numa_thread_setaffinity_core(pmd->core_id);
>     dpdk_set_lcore_id(pmd->core_id);
>     poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
>+
>+    /* When cpu affinity mask explicitly set using pmd-cpu-mask, pmd thread's
>+     * scheduling policy is set to SCHED_RR and the priority to highest priority
>+     * of SCHED_RR policy. In the absence of pmd-cpu-mask, default scheduling
>+     * policy and priority shall apply to pmd thread.
>+     */
>+    if (pmd->dp->pmd_cmask) {
>+        ovs_numa_thread_setpriority(SCHED_RR);
>+    }
> reload:
>     emc_cache_init(&pmd->flow_cache);
>
>diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
>index 7652636..d9f8ea1 100644
>--- a/lib/ovs-numa.c
>+++ b/lib/ovs-numa.c
>@@ -613,3 +613,21 @@ int ovs_numa_thread_setaffinity_core(unsigned core_id OVS_UNUSED)
>     return EOPNOTSUPP;
> #endif /* __linux__ */
> }
>+
>+void
>+ovs_numa_thread_setpriority(int policy)
>+{
>+    if (dummy_numa) {
>+        return;
>+    }
>+
>+    struct sched_param threadparam;
>+    int err;
>+
>+    memset(&threadparam, 0, sizeof(threadparam));
>+    threadparam.sched_priority = sched_get_priority_max(policy);
>+    err = pthread_setschedparam(pthread_self(), policy, &threadparam);
>+    if (err) {
>+        VLOG_ERR("Thread priority error %d",err);

The convention in this file seems to be to use ovs_strerror when reporting
errors; suggest that you stick with the same.
>+    }
>+}
>diff --git a/lib/ovs-numa.h b/lib/ovs-numa.h
>index be836b2..94f0884 100644
>--- a/lib/ovs-numa.h
>+++ b/lib/ovs-numa.h
>@@ -56,6 +56,7 @@ void ovs_numa_unpin_core(unsigned core_id);
> struct ovs_numa_dump *ovs_numa_dump_cores_on_numa(int numa_id);
> void ovs_numa_dump_destroy(struct ovs_numa_dump *);
> int ovs_numa_thread_setaffinity_core(unsigned core_id);
>+void ovs_numa_thread_setpriority(int policy);
>
> #define FOR_EACH_CORE_ON_NUMA(ITER, DUMP)               \
>     LIST_FOR_EACH((ITER), list_node, &(DUMP)->dump)
>--
>2.4.11
>
>_______________________________________________
>dev mailing list
>dev@openvswitch.org
>http://openvswitch.org/mailman/listinfo/dev