Hi Shahaji,

It seems to be caused by some periodic task. In the pmd thread, pmd auto load balance is done periodically:

/* Time in microseconds of the interval in which rxq processing cycles used
 * in rxq to pmd assignments is measured and stored. */
#define PMD_RXQ_INTERVAL_LEN 10000000LL
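To illustrate the interaction (a sketch only; the helper names below are assumptions for illustration, not the actual dpif-netdev code), a periodic task inside the PMD loop stalls polling while it runs, and the 10-second interval lines up with the drops you see every 10-15 seconds:

#include <stdint.h>

#define PMD_RXQ_INTERVAL_LEN 10000000LL   /* interval length in microseconds */

uint64_t now_usec(void);                  /* assumed monotonic clock helper */
void poll_all_rxqs(void);                 /* assumed per-queue rx polling */
void store_rxq_cycles(void);              /* assumed periodic bookkeeping */

void pmd_main_loop(void)
{
    uint64_t next = now_usec() + PMD_RXQ_INTERVAL_LEN;
    for (;;) {
        poll_all_rxqs();
        if (now_usec() >= next) {
            /* While this runs, no ring is serviced; on a ~100% loaded
             * core even a short detour can overflow a small rx ring. */
            store_rxq_cycles();
            next += PMD_RXQ_INTERVAL_LEN;
        }
    }
}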
Would you like to disable it if it is not necessary?

Best Regards,
Wei Yanqin

From: Shahaji Bhosle <shahaji.bho...@broadcom.com>
Sent: Monday, July 6, 2020 8:24 PM
To: Yanqin Wei <yanqin....@arm.com>
Cc: Flavio Leitner <f...@sysclose.org>; ovs-dev@openvswitch.org; nd <n...@arm.com>; Ilya Maximets <i.maxim...@samsung.com>; Lee Reed <lee.r...@broadcom.com>; Vinay Gupta <vinay.gu...@broadcom.com>; Alex Barba <alex.ba...@broadcom.com>
Subject: Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)

Hi Yanqin,
The drops come at random intervals; sometimes I can run for minutes without drops. The case is very borderline, with CPUs close to 99% and around 1000 flows. We see the drops once every 10-15 seconds, and they are random in nature. If I use one ring per core the drops go away; if I enable the EMC the drops go away; etc.
Thanks, Shahaji

On Mon, Jul 6, 2020 at 5:27 AM Yanqin Wei <yanqin....@arm.com> wrote:

Hi Shahaji,
I have not measured context switch overhead, but I feel it should be acceptable, because 10 Mpps throughput with zero packet drop (20 s) could be achieved on some Arm servers. Maybe you could do performance profiling on your test bench to find out the root cause of the performance degradation with multiple rings.

Best Regards,
Wei Yanqin

From: Shahaji Bhosle <shahaji.bho...@broadcom.com>
Sent: Thursday, July 2, 2020 9:27 PM
To: Yanqin Wei <yanqin....@arm.com>
Cc: Flavio Leitner <f...@sysclose.org>; ovs-dev@openvswitch.org; nd <n...@arm.com>; Ilya Maximets <i.maxim...@samsung.com>; Lee Reed <lee.r...@broadcom.com>; Vinay Gupta <vinay.gu...@broadcom.com>; Alex Barba <alex.ba...@broadcom.com>
Subject: Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)

Thanks Yanqin,
I am not seeing any context switches beyond 40 usec in our do-nothing loop test. But when OvS polls multiple rings (queues) on the same CPU and the number of packets it batches (MAX_BURST_SIZE) grows, the loops take more time, and I can see the rings getting filled up. It then becomes a feedback loop. The CPUs are running close to 100%, so any disturbance at that point I think is too much. Do you have any data that you use to monitor OvS? I am doing all the above experiments without OvS.
Thanks, Shahaji
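(A sketch of the feedback loop described above; ring count, names and helpers are made up for illustration, this is not OvS code:)

#include <stdint.h>

#define MAX_BURST_SIZE 32
#define NUM_RINGS 3                /* e.g. several rx queues on one core */

struct pkt;                        /* opaque packet handle */
int  ring_dequeue_burst(int ring, struct pkt **pkts, int max);  /* assumed */
void process_batch(struct pkt **pkts, int n);                   /* assumed */

void pmd_pass(void)
{
    struct pkt *pkts[MAX_BURST_SIZE];
    for (int r = 0; r < NUM_RINGS; r++) {
        int n = ring_dequeue_burst(r, pkts, MAX_BURST_SIZE);
        /* Processing cost scales with n: fuller bursts stretch the whole
         * pass, every ring then waits longer for its next poll, and the
         * rings come back even fuller on the next pass. */
        process_batch(pkts, n);
    }
}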
On Thu, Jul 2, 2020 at 4:43 AM Yanqin Wei <yanqin....@arm.com> wrote:

Hi Shahaji,
IIUC, the 1 Hz time tick cannot be disabled even with full dynticks, right? But I have no idea why it would cause packet loss, because it should be only a small overhead when rcu_nocbs is enabled.

Best Regards,
Wei Yanqin

===========

From: Shahaji Bhosle <shahaji.bho...@broadcom.com>
Sent: Thursday, July 2, 2020 6:11 AM
To: Yanqin Wei <yanqin....@arm.com>
Cc: Flavio Leitner <f...@sysclose.org>; ovs-dev@openvswitch.org; nd <n...@arm.com>; Ilya Maximets <i.maxim...@samsung.com>; Lee Reed <lee.r...@broadcom.com>; Vinay Gupta <vinay.gu...@broadcom.com>; Alex Barba <alex.ba...@broadcom.com>
Subject: Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)

Hi Yanqin,
I added the patch you gave me to my script, which runs a do-nothing for loop. You can see the spikes in the plot below: 976 out of 1000 iterations are perfect, but around every 1 second you can see something going wrong. I don't see anything wrong in the trace-cmd world.
Thanks, Shahaji

root@bcm958802a8046c:~/vinay_rx/dynticks-testing# ./run_isb_rdtsc
+ TARGET=2
+ MASK=4
+ NUM_ITER=1000
+ NUM_MS=100
+ N=37500000
+ LOGFILE=loop_1000iter_100ms.log
+ tee loop_1000iter_100ms.log
+ trace-cmd record -p function_graph -e all -M 4 -o trace_1000iter_100ms.dat taskset -c 2 /home/root/arm_stb_user_loop_isb_rdtsc 1000 37500000
  plugin 'function_graph'
Cycles/Second (Hz) = 3000000000
Nano-seconds per cycle = 0.3333

Using ISB() before rte_rdtsc()
num_iter: 1000
do_nothing_loop for (N)=37500000
Running 1000 iterations of do_nothing_loop for (N)=37500000

Average = 100282.193430333 u-secs
Max     = 124777.488666667 u-secs
Min     = 100000.017666667 u-secs
σ       = 1931.352376508 u-secs

Average = 300846580.29 cycles
Max     = 374332466.00 cycles
Min     = 300000053.00 cycles
σ       = 5794057.13 cycles

#σ = events
 0 = 976
 1 = 3
 2 = 4
 3 = 3
 4 = 3
 5 = 2
 6 = 2
 7 = 2
 8 = 1
 9 = 1
10 = 1
12 = 2

On Wed, Jul 1, 2020 at 3:57 AM Yanqin Wei <yanqin....@arm.com> wrote:

Hi Shahaji,
Adding an isb instruction can help make rdtsc precise; it synchronizes the system counter read (cntvct_el0) with the pipeline. There is a patch in DPDK: https://patchwork.dpdk.org/patch/66561/
So it may not be related to the intermittent drops you observed.

Best Regards,
Wei Yanqin
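(For reference, the idea in that patch looks roughly like the sketch below; this is an illustration of the approach, not the patch's exact contents:)

#include <stdint.h>

static inline uint64_t read_system_counter_precise(void)
{
    uint64_t cnt;
    /* The ISB serializes the pipeline so the counter read cannot be
     * reordered around the code being timed. */
    asm volatile("isb" : : : "memory");
    asm volatile("mrs %0, cntvct_el0" : "=r"(cnt));
    return cnt;
}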
> -----Original Message-----
> From: dev <ovs-dev-boun...@openvswitch.org> On Behalf Of Shahaji Bhosle via dev
> Sent: Wednesday, July 1, 2020 6:05 AM
> To: Flavio Leitner <f...@sysclose.org>
> Cc: ovs-dev@openvswitch.org; Ilya Maximets <i.maxim...@samsung.com>; Lee Reed <lee.r...@broadcom.com>; Vinay Gupta <vinay.gu...@broadcom.com>; Alex Barba <alex.ba...@broadcom.com>
> Subject: Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)
>
> Hi Flavio,
> I still see intermittent drops with rcu_nocbs, so I wrote that do_nothing()
> loop to avoid all the other distractions and see if Linux is messing with
> the OVS loop, just to see what is going on.
> The interesting thing is the case in *BOLD* below, where I use an ISB()
> instruction, and what happens to the STD deviation. Both runs basically
> DO NOTHING FOR 100 msec and see what happens to time :)
> Thanks, Shahaji
>
> static inline uint64_t
> rte_get_tsc_cycles(void)
> {
>     uint64_t tsc;
> #ifdef USE_ISB
>     asm volatile("isb; mrs %0, pmccntr_el0" : "=r"(tsc));
> #else
>     asm volatile("mrs %0, pmccntr_el0" : "=r"(tsc));
> #endif
>     return tsc;
> }
> #endif /* RTE_ARM_EAL_RDTSC_USE_PMU */
>
> ==================================
>     usleep(100);
>     for (volatile int i = 0; i < num_iter; i++) {
>         const uint64_t tsc_start = rte_get_tsc_cycles();
>         /* do nothing for ~100 msec */
> #ifdef USE_ISB
>         for (volatile int j = 0; j < num_us; j++);  /* <<< THIS IS MESSED UP:
>             100 msec of doing nothing, and I am getting 2033 usec STD DEVIATION */
> #else
>         for (volatile int j = 0; j < num_us; j++);  /* <<< THIS LOOP HAS
>             VERY LOW STD DEVIATION */
>         rte_isb();
> #endif
>         volatile uint64_t tsc_end = rte_get_tsc_cycles();
>         cycles[i] = tsc_end - tsc_start;
>     }
>     usleep(100);
>     calc_avg_var_stddev(num_iter, &cycles[0]);
> ===================================
>
> #ifdef USE_ISB:
> root@bcm958802a8046c:~/vinay_rx/dynticks-testing# ./run_isb_rdtsc
> + TARGET=2
> + MASK=4
> + NUM_ITER=1000
> + NUM_MS=100
> + N=37500000
> + LOGFILE=loop_1000iter_100ms.log
> + tee loop_1000iter_100ms.log
> + trace-cmd record -p function_graph -e all -M 4 -o trace_1000iter_100ms.dat taskset -c 2 /home/root/arm_stb_user_loop_isb_rdtsc 1000 37500000
>   plugin 'function_graph'
> Cycles/Second (Hz) = 3000000000
> Nano-seconds per cycle = 0.3333
>
> Using ISB() before rte_rdtsc()
> num_iter: 1000
> do_nothing_loop for (N)=37500000
> Running 1000 iterations of do_nothing_loop for (N)=37500000
>
> Average = 100328.158561667 u-secs
> Max     = 123024.795333333 u-secs
> Min     = 100000.017666667 u-secs
> σ       = 2033.118969489 u-secs
>
> Average = 300984475.69 cycles
> Max     = 369074386.00 cycles
> Min     = 300000053.00 cycles
> σ       = 6099356.91 cycles
>
> #σ = events
>  0 = 968
>  1 = 8
>  2 = 5
>  3 = 3
>  4 = 3
>  5 = 3
>  6 = 3
>  8 = 3
> 10 = 3
> 11 = 1
>
> #else (NO ISB):
> root@bcm958802a8046c:~/vinay_rx/dynticks-testing# ./run_isb_loop
> + TARGET=2
> + MASK=4
> + NUM_ITER=1000
> + NUM_MS=100
> + N=7316912
> + LOGFILE=loop_1000iter_100ms.log
> + tee loop_1000iter_100ms.log
> + trace-cmd record -p function_graph -e all -M 4 -o trace_1000iter_100ms.dat taskset -c 2 /home/root/arm_stb_user_loop_isb_loop 1000 7316912
>   plugin 'function_graph'
> Cycles/Second (Hz) = 3000000000
> Nano-seconds per cycle = 0.3333
>
> NO ISB() before rte_rdtsc()
> num_iter: 1000
> do_nothing_loop for (N)=7316912
> Running 1000 iterations of do_nothing_loop for (N)=7316912
>
> Average = 99999.863256333 u-secs
> Max     = 100052.790333333 u-secs
> Min     = 99997.807333333 u-secs
> σ       = 6.497043982 u-secs
>
> Average = 299999589.77 cycles
> Max     = 300158371.00 cycles
> Min     = 299993422.00 cycles
> σ       = 19491.13 cycles
>
> #σ = events
> 0 = 900
> 2 = 79
> 4 = 17
> 5 = 3
> 8 = 1
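(calc_avg_var_stddev() is not shown in the thread; the sketch below is a guess at what it computes over the collected samples, matching the Average/Max/Min/σ output above. Hypothetical code, not the actual test program:)

#include <inttypes.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>

void calc_avg_var_stddev(int n, const uint64_t *cycles)
{
    double sum = 0.0, sumsq = 0.0;
    uint64_t min = cycles[0], max = cycles[0];

    for (int i = 0; i < n; i++) {
        double c = (double)cycles[i];
        sum += c;
        sumsq += c * c;
        if (cycles[i] < min) min = cycles[i];
        if (cycles[i] > max) max = cycles[i];
    }
    double avg = sum / n;
    double stddev = sqrt(sumsq / n - avg * avg);   /* population stddev */

    printf("Average = %.2f cycles\n", avg);
    printf("Max     = %" PRIu64 " cycles\n", max);
    printf("Min     = %" PRIu64 " cycles\n", min);
    printf("sigma   = %.2f cycles\n", stddev);
}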
On Tue, Jun 30, 2020 at 4:42 PM Flavio Leitner <f...@sysclose.org> wrote:
> >
> > Hi Shahaji,
> >
> > Did it help with the rcu_nocbs?
> >
> > fbl
> >
> > On Tue, Jun 30, 2020 at 12:56:27PM -0400, Shahaji Bhosle wrote:
> > > Thanks Flavio,
> > > Are there any special requirements for RCU on ARM vs x86?
> > >
> > > I am following what the above document is saying... Do you think I
> > > need to do something more than the below?
> > > Thanks again and appreciate the help. Shahaji
> > >
> > > 1. Isolate the CPU cores:
> > >    isolcpus=1,2,3,4,5,6,7 nohz_full=1-7 rcu_nocbs=1-7
> > > 2. Set CONFIG_NO_HZ_FULL=y:
> > > root@bcm958802a8046c:~/vinay_rx/dynticks-testing# zcat /proc/config.gz | grep HZ
> > > CONFIG_NO_HZ_COMMON=y
> > > # CONFIG_HZ_PERIODIC is not set
> > > # CONFIG_NO_HZ_IDLE is not set
> > > CONFIG_NO_HZ_FULL=y
> > > # CONFIG_NO_HZ_FULL_ALL is not set
> > > # CONFIG_NO_HZ is not set
> > > # CONFIG_HZ_100 is not set
> > > CONFIG_HZ_250=y
> > > # CONFIG_HZ_300 is not set
> > > # CONFIG_HZ_1000 is not set
> > > CONFIG_HZ=250
> > >
> > > On Tue, Jun 30, 2020 at 12:50 PM Flavio Leitner <f...@sysclose.org> wrote:
> > > >
> > > > Right, you might want to review Documentation/timers/no_hz.rst
> > > > from the kernel sources and look for the RCU implications section,
> > > > where it explains how to move RCU callbacks.
> > > >
> > > > fbl
> > > >
> > > > On Tue, Jun 30, 2020 at 12:08:05PM -0400, Shahaji Bhosle wrote:
> > > > > Hi Flavio,
> > > > > I wrote a small program which has a do-nothing for loop, and I
> > > > > measure the timestamps across the loop. About 3% of the time,
> > > > > around the 1 second mark when the arch_timer fires, the timestamps
> > > > > are off by 25% of the expected value. I ran trace-cmd to see what
> > > > > is going on and see the below. Looks like some issue with
> > > > > gic_handle_irq(); I am not seeing this behaviour on an x86 host,
> > > > > so something is special with ARMv8.
> > > > > Thanks, Shahaji
> > > > >
> > > > > %21.77 (14181) arm_stb_user_lo rcu_dyntick #922
> > > > > |
> > > > > --- rcu_dyntick
> > > > >     |
> > > > >     |--%46.85-- gic_handle_irq # 432
> > > > >     |
> > > > >     |--%23.32-- context_tracking_user_exit # 215
> > > > >     |
> > > > >     |--%22.34-- context_tracking_user_enter # 206
> > > > >     |
> > > > >     |--%2.60-- SyS_execve # 24
> > > > >     |
> > > > >     |--%1.30-- do_page_fault # 12
> > > > >     |
> > > > >     |--%0.65-- SyS_write # 6
> > > > >     |
> > > > >     |--%0.65-- schedule # 6
> > > > >     |
> > > > >     |--%0.65-- SyS_nanosleep # 6
> > > > >     |
> > > > >     |--%0.65-- syscall_trace_enter # 6
> > > > >     |
> > > > >     |--%0.65-- SyS_faccessat # 6
> > > > >
> > > > > %5.01 (14181) arm_stb_user_lo rcu_utilization #212
> > > > > |
> > > > > --- rcu_utilization
> > > > >     |
> > > > >     |--%96.23-- gic_handle_irq # 204
> > > > >     |
> > > > >     |--%1.89-- SyS_nanosleep # 4
> > > > >     |
> > > > >     |--%0.94-- SyS_exit_group # 2
> > > > >     |
> > > > >     |--%0.94-- do_notify_resume # 2
> > > > >
> > > > > %4.86 (14181) arm_stb_user_lo user_exit #206
> > > > > |
> > > > > --- user_exit
> > > > >     context_tracking_user_exit
> > > > >
> > > > > %4.86 (14181) arm_stb_user_lo context_tracking_user_exit #206
> > > > > |
> > > > > --- context_tracking_user_exit
> > > > >
> > > > > %4.86 (14181) arm_stb_user_lo context_tracking_user_enter #206
> > > > > |
> > > > > --- context_tracking_user_enter
> > > > >
> > > > > %4.86 (14181) arm_stb_user_lo user_enter #206
> > > > > |
> > > > > --- user_enter
> > > > >     context_tracking_user_enter
> > > > >
> > > > > %2.95 (14181) arm_stb_user_lo gic_handle_irq #125
> > > > > |
> > > > > --- gic_handle_irq
> > > > >
> > > > > On Tue, Jun 30, 2020 at 9:45 AM Flavio Leitner <f...@sysclose.org> wrote:
> On Tue, Jun 02, 2020 at 12:56:51PM -0700, Vinay Gupta wrote:
> > > > > > > Hi Flavio,
> > > > > > >
> > > > > > > Thanks for your reply.
> > > > > > > I have captured the suggested information but do not see anything
> > > > > > > that could cause the packet drops.
> > > > > > > Can you please take a look at the below data and see if you can
> > > > > > > find something unusual?
> > > > > > > The PMDs are running on CPUs 1,2,3,4 and CPUs 1-7 are isolated cores.
> > > > > > > --------------------------------------------------------------------
> > > > > > > root@bcm958802a8046c:~# cstats ; sleep 10; cycles
> > > > > > > pmd thread numa_id 0 core_id 1:
> > > > > > >   idle cycles: 99140849 (7.93%)
> > > > > > >   processing cycles: 1151423715 (92.07%)
> > > > > > >   avg cycles per packet: 116.94 (1250564564/10693918)
> > > > > > >   avg processing cycles per packet: 107.67 (1151423715/10693918)
> > > > > > > pmd thread numa_id 0 core_id 2:
> > > > > > >   idle cycles: 118373662 (9.47%)
> > > > > > >   processing cycles: 1132193442 (90.53%)
> > > > > > >   avg cycles per packet: 124.39 (1250567104/10053309)
> > > > > > >   avg processing cycles per packet: 112.62 (1132193442/10053309)
> > > > > > > pmd thread numa_id 0 core_id 3:
> > > > > > >   idle cycles: 53805933 (4.30%)
> > > > > > >   processing cycles: 1196762002 (95.70%)
> > > > > > >   avg cycles per packet: 107.35 (1250567935/11649948)
> > > > > > >   avg processing cycles per packet: 102.73 (1196762002/11649948)
> > > > > > > pmd thread numa_id 0 core_id 4:
> > > > > > >   idle cycles: 189102938 (15.12%)
> > > > > > >   processing cycles: 1061463293 (84.88%)
> > > > > > >   avg cycles per packet: 143.47 (1250566231/8716828)
> > > > > > >   avg processing cycles per packet: 121.77 (1061463293/8716828)
> > > > > > > pmd thread numa_id 0 core_id 5:
> > > > > > > pmd thread numa_id 0 core_id 6:
> > > > > > > pmd thread numa_id 0 core_id 7:
> > > > > >
> > > > > > core_id 3 is heavily loaded, and is therefore more likely to
> > > > > > show the drop issue when some other event happens.
> > > > > >
> > > > > > I think you need to run perf as I recommended before and see
> > > > > > if there are context switches happening and why they are happening.
> > > > > >
> > > > > > If a context switch happens, it's either because the core is
> > > > > > not well isolated or some other thing is going on. It will
> > > > > > help to understand why the queue wasn't serviced for a certain
> > > > > > amount of time.
> > > > > >
> > > > > > The issue is that running perf might introduce some load, so
> > > > > > you will need to adjust the traffic rate accordingly.
> > > > > >
> > > > > > HTH,
> > > > > > fbl
> > > > > >
> > > > > > > Runtime summary
> > > > > > >                     comm   parent  sched-in  run-time  min-run  avg-run  max-run  stddev  migrations
> > > > > > >                                     (count)    (msec)   (msec)   (msec)   (msec)       %
> > > > > > > -----------------------------------------------------------------------------------------------------
> > > > > > >           ksoftirqd/0[7]        2         1     0.079    0.079    0.079    0.079    0.00           0
> > > > > > >             rcu_sched[8]        2        14     0.067    0.002    0.004    0.009    9.96           0
> > > > > > >              rcuos/4[38]        2         6     0.027    0.002    0.004    0.008   20.97           0
> > > > > > >              rcuos/5[45]        2         4     0.018    0.004    0.004    0.005    6.63           0
> > > > > > >          kworker/0:1[71]        2        12     0.156    0.008    0.013    0.019    6.72           0
> > > > > > >           mmcqd/0[1230]         2         3     0.054    0.001    0.018    0.031   47.29           0
> > > > > > >      kworker/0:1H[1248]         2         1     0.006    0.006    0.006    0.006    0.00           0
> > > > > > >     kworker/u16:2[1547]         2        16     0.045    0.001    0.002    0.012   26.19           0
> > > > > > >              ntpd[5282]         1         1     0.063    0.063    0.063    0.063    0.00           0
> > > > > > >          watchdog[6988]         1         2     0.089    0.012    0.044    0.076   72.26           0
> > > > > > >      ovs-vswitchd[9239]         1         2     0.326    0.152    0.163    0.173    6.45           0
> > > > > > > revalidator8[9309/9239]      9239         2     1.260    0.607    0.630    0.652    3.58           0
> > > > > > >            perf[27150]      27140         1     0.000    0.000    0.000    0.000    0.00           0
> > > > > > >
> > > > > > > Terminated tasks:
> > > > > > >           sleep[27151]      27150         4     1.002    0.015    0.250    0.677   58.22           0
> > > > > > >
> > > > > > > Idle stats:
> > > > > > > CPU 0 idle for 999.814 msec ( 99.84%)
> > > > > > > CPU 1 idle entire time window
> > > > > > > CPU 2 idle entire time window
> > > > > > > CPU 3 idle entire time window
> > > > > > > CPU 4 idle entire time window
> > > > > > > CPU 5 idle for 500.326 msec ( 49.96%)
> > > > > > > CPU 6 idle entire time window
> > > > > > > CPU 7 idle entire time window
> > > > > > >
> > > > > > > Total number of unique tasks: 14
> > > > > > > Total number of context switches: 115
> > > > > > > Total run time (msec): 3.198
> > > > > > > Total scheduling time (msec): 1001.425 (x 8)
> > > > > > >
> > > > > > > 02:16:22   UID   TGID    TID    %usr  %system  %guest  %wait    %CPU  CPU  Command
> > > > > > > 02:16:23     0   9239      -  100.00     0.00    0.00   0.00  100.00    5  ovs-vswitchd
> > > > > > > 02:16:23     0      -   9239    2.00     0.00    0.00   0.00    2.00    5  |__ovs-vswitchd
> > > > > > > 02:16:23     0      -   9240    0.00     0.00    0.00   0.00    0.00    0  |__vfio-sync
> > > > > > > 02:16:23     0      -   9241    0.00     0.00    0.00   0.00    0.00    5  |__eal-intr-thread
> > > > > > > 02:16:23     0      -   9242    0.00     0.00    0.00   0.00    0.00    5  |__dpdk_watchdog1
> > > > > > > 02:16:23     0      -   9244    0.00     0.00    0.00   0.00    0.00    5  |__urcu2
> > > > > > > 02:16:23     0      -   9279    0.00     0.00    0.00   0.00    0.00    5  |__ct_clean3
> > > > > > > 02:16:23     0      -   9308    0.00     0.00    0.00   0.00    0.00    5  |__handler9
> > > > > > > 02:16:23     0      -   9309    0.00     0.00    0.00   0.00    0.00    5  |__revalidator8
> > > > > > > 02:16:23     0      -   9328    0.00     0.00    0.00   0.00    0.00    6  |__pmd13
> > > > > > > 02:16:23     0      -   9330  100.00     0.00    0.00   0.00  100.00    3  |__pmd12
> > > > > > > 02:16:23     0      -   9331  100.00     0.00    0.00   0.00  100.00    1  |__pmd11
> > > > > > > 02:16:23     0      -   9332    0.00     0.00    0.00   0.00    0.00    7  |__pmd10
> > > > > > > 02:16:23     0      -   9333    0.00     0.00    0.00   0.00    0.00    5  |__pmd16
> > > > > > > 02:16:23     0      -   9334  100.00     0.00    0.00   0.00  100.00    2  |__pmd15
> > > > > > > 02:16:23     0      -   9335  100.00     0.00    0.00   0.00  100.00    4  |__pmd14
> > > > > > > --------------------------------------------------------------------
> > > > > > >
> > > > > > > Thanks
> > > > > > > Vinay
> > > > > > >
> > > > > > > On Tue, Jun 2, 2020 at 12:06 PM Flavio Leitner <f...@sysclose.org> wrote:
> > > > > > >
> > > > > > > > On Mon, Jun 01, 2020 at 07:27:09PM -0400, Shahaji Bhosle via dev wrote:
> > > > > > > > > Hi Ben/Ilya,
> > > > > > > > > Hope you guys are doing well and staying safe. I have been
> > > > > > > > > chasing a weird problem with small drops, and I think it is
> > > > > > > > > causing lots of TCP retransmissions.
> > > > > > > > >
> > > > > > > > > Setup details:
> > > > > > > > > iPerf3 (1k-5K Servers) <--- DPDK2:OvS+DPDK (VxLAN:BOND)[DPDK0+DPDK1] <====2x25G==== [DPDK0+DPDK1](VxLAN:BOND) OvS+DPDK:DPDK2 <--- iPerf3 (Clients)
> > > > > > > > >
> > > > > > > > > All the drops are ring drops on the BONDed functions on the
> > > > > > > > > server side. I have 4 CPUs, each with 3 PMD threads; DPDK0,
> > > > > > > > > DPDK1 and DPDK2 are all running with 4 Rx rings each.
> > > > > > > > >
> > > > > > > > > What is interesting is that when I give each Rx ring its own
> > > > > > > > > CPU, the drops go away. Or if I set
> > > > > > > > > other_config:emc-insert-inv-prob=1, the drops go away.
> > > > > > > > > But I need to scale up the number of flows, so I am trying to
> > > > > > > > > run this with the EMC disabled.
> > > > > > > > >
> > > > > > > > > I can tell that the rings are not getting serviced for
> > > > > > > > > 30-40 usec because of some kind of context switch or interrupts
> > > > > > > > > on these cores. I have tried the usual isolation: nohz_full,
> > > > > > > > > rcu_nocbs, etc., and moved all the interrupts away from these
> > > > > > > > > cores. But nothing helps. I mean, it improves, but the drops
> > > > > > > > > still happen.
> > > > > > > >
> > > > > > > > When you disable the EMC (or reduce its efficiency) the
> > > > > > > > per-packet cost increases, and then it becomes more sensitive
> > > > > > > > to variations. If you share a CPU with multiple queues, you
> > > > > > > > decrease the amount of time available to process each queue.
> > > > > > > > In either case, there will be less room to tolerate variations.
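(For reference: a sketch of what probabilistic EMC insertion does; illustrative code, not the exact OVS source. With other_config:emc-insert-inv-prob=N, a flow that misses the EMC is inserted with probability roughly 1/N, and N=1 inserts on every miss, which is why that setting made the drops disappear above:)

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

static uint32_t emc_insert_inv_prob = 100;   /* OVS default */

static bool emc_should_insert(void)
{
    if (emc_insert_inv_prob <= 1) {
        return true;                          /* always insert on a miss */
    }
    /* Insert on roughly 1 out of every inv_prob misses. */
    return (uint32_t)random() % emc_insert_inv_prob == 0;
}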
> > > > > > > > Well, you might want to use 'perf' and monitor the scheduling
> > > > > > > > events, and then, based on the stack trace, see what is causing
> > > > > > > > them and try to prevent it.
> > > > > > > >
> > > > > > > > For example:
> > > > > > > > # perf record -e sched:sched_switch -a -g sleep 1
> > > > > > > >
> > > > > > > > For instance, you might see that another NIC used for management
> > > > > > > > has IRQs assigned to one isolated CPU. You can move it to another
> > > > > > > > CPU to reduce the noise, etc...
> > > > > > > >
> > > > > > > > Another suggestion is to look at the PMD thread idle statistics,
> > > > > > > > because they will tell you how much "extra" room you have left.
> > > > > > > > As it approaches 0, the more fine-tuned your setup needs to be
> > > > > > > > to avoid drops.
> > > > > > > >
> > > > > > > > HTH,
> > > > > > > > --
> > > > > > > > fbl
> > > > > >
> > > > > > --
> > > > > > fbl
> > > >
> > > > --
> > > > fbl
> >
> > --
> > fbl

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev