Hi Shahaji,

It seems to be caused by some periodic task. In the pmd thread, pmd auto load balance is done periodically:

/* Time in microseconds of the interval in which rxq processing cycles used
 * in rxq to pmd assignments is measured and stored. */
#define PMD_RXQ_INTERVAL_LEN 10000000LL
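To illustrate the interaction (a sketch only; the helper names below are assumptions for illustration, not the actual dpif-netdev code), a periodic task inside the PMD loop stalls polling while it runs, and the 10-second interval lines up with the drops you see every 10-15 seconds:

#include <stdint.h>

#define PMD_RXQ_INTERVAL_LEN 10000000LL   /* interval length in microseconds */

uint64_t now_usec(void);                  /* assumed monotonic clock helper */
void poll_all_rxqs(void);                 /* assumed per-queue rx polling */
void store_rxq_cycles(void);              /* assumed periodic bookkeeping */

void pmd_main_loop(void)
{
    uint64_t next = now_usec() + PMD_RXQ_INTERVAL_LEN;
    for (;;) {
        poll_all_rxqs();
        if (now_usec() >= next) {
            /* While this runs, no ring is serviced; on a ~100% loaded
             * core even a short detour can overflow a small rx ring. */
            store_rxq_cycles();
            next += PMD_RXQ_INTERVAL_LEN;
        }
    }
}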
Would you like to disable it if it is not necessary?

Best Regards,
Wei Yanqin

From: Shahaji Bhosle <shahaji.bho...@broadcom.com>
Sent: Monday, July 6, 2020 8:24 PM
To: Yanqin Wei <yanqin....@arm.com>
Cc: Flavio Leitner <f...@sysclose.org>; ovs-dev@openvswitch.org; nd <n...@arm.com>; Ilya Maximets <i.maxim...@samsung.com>; Lee Reed <lee.r...@broadcom.com>; Vinay Gupta <vinay.gu...@broadcom.com>; Alex Barba <alex.ba...@broadcom.com>
Subject: Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)

Hi Yanqin,
The drops come at random intervals; sometimes I can run for minutes without drops. The case is very borderline, with CPUs close to 99% and around 1000 flows. We see the drops once every 10-15 seconds, and they are random in nature. If I use one ring per core the drops go away; if I enable the EMC the drops go away; etc.
Thanks, Shahaji

On Mon, Jul 6, 2020 at 5:27 AM Yanqin Wei <yanqin....@arm.com> wrote:

Hi Shahaji,
I have not measured context switch overhead, but I feel it should be acceptable, because 10 Mpps throughput with zero packet drop (20 s) could be achieved on some Arm servers. Maybe you could do performance profiling on your test bench to find out the root cause of the performance degradation with multiple rings.

Best Regards,
Wei Yanqin

From: Shahaji Bhosle <shahaji.bho...@broadcom.com>
Sent: Thursday, July 2, 2020 9:27 PM
To: Yanqin Wei <yanqin....@arm.com>
Cc: Flavio Leitner <f...@sysclose.org>; ovs-dev@openvswitch.org; nd <n...@arm.com>; Ilya Maximets <i.maxim...@samsung.com>; Lee Reed <lee.r...@broadcom.com>; Vinay Gupta <vinay.gu...@broadcom.com>; Alex Barba <alex.ba...@broadcom.com>
Subject: Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)

Thanks Yanqin,
I am not seeing any context switches beyond 40 usec in our do-nothing loop test. But when OvS polls multiple rings (queues) on the same CPU and the number of packets it batches (MAX_BURST_SIZE) grows, the loops take more time, and I can see the rings getting filled up. It then becomes a feedback loop. The CPUs are running close to 100%, so any disturbance at that point I think is too much. Do you have any data that you use to monitor OvS? I am doing all the above experiments without OvS.
Thanks, Shahaji
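(A sketch of the feedback loop described above; ring count, names and helpers are made up for illustration, this is not OvS code:)

#include <stdint.h>

#define MAX_BURST_SIZE 32
#define NUM_RINGS 3                /* e.g. several rx queues on one core */

struct pkt;                        /* opaque packet handle */
int  ring_dequeue_burst(int ring, struct pkt **pkts, int max);  /* assumed */
void process_batch(struct pkt **pkts, int n);                   /* assumed */

void pmd_pass(void)
{
    struct pkt *pkts[MAX_BURST_SIZE];
    for (int r = 0; r < NUM_RINGS; r++) {
        int n = ring_dequeue_burst(r, pkts, MAX_BURST_SIZE);
        /* Processing cost scales with n: fuller bursts stretch the whole
         * pass, every ring then waits longer for its next poll, and the
         * rings come back even fuller on the next pass. */
        process_batch(pkts, n);
    }
}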
On Thu, Jul 2, 2020 at 4:43 AM Yanqin Wei <yanqin....@arm.com> wrote:

Hi Shahaji,
IIUC, the 1 Hz time tick cannot be disabled even with full dynticks, right? But I have no idea why it would cause packet loss, because it should be only a small overhead when rcu_nocbs is enabled.

Best Regards,
Wei Yanqin

===========

From: Shahaji Bhosle <shahaji.bho...@broadcom.com>
Sent: Thursday, July 2, 2020 6:11 AM
To: Yanqin Wei <yanqin....@arm.com>
Cc: Flavio Leitner <f...@sysclose.org>; ovs-dev@openvswitch.org; nd <n...@arm.com>; Ilya Maximets <i.maxim...@samsung.com>; Lee Reed <lee.r...@broadcom.com>; Vinay Gupta <vinay.gu...@broadcom.com>; Alex Barba <alex.ba...@broadcom.com>
Subject: Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)

Hi Yanqin,
I added the patch you gave me to my script, which runs a do-nothing for loop. You can see the spikes in the plot below: 976 out of 1000 iterations are perfect, but around every 1 second you can see something going wrong. I don't see anything wrong in the trace-cmd world.
Thanks, Shahaji

root@bcm958802a8046c:~/vinay_rx/dynticks-testing# ./run_isb_rdtsc
+ TARGET=2
+ MASK=4
+ NUM_ITER=1000
+ NUM_MS=100
+ N=37500000
+ LOGFILE=loop_1000iter_100ms.log
+ tee loop_1000iter_100ms.log
+ trace-cmd record -p function_graph -e all -M 4 -o trace_1000iter_100ms.dat taskset -c 2 /home/root/arm_stb_user_loop_isb_rdtsc 1000 37500000
  plugin 'function_graph'
Cycles/Second (Hz) = 3000000000
Nano-seconds per cycle = 0.3333

Using ISB() before rte_rdtsc()
num_iter: 1000
do_nothing_loop for (N)=37500000
Running 1000 iterations of do_nothing_loop for (N)=37500000

Average = 100282.193430333 u-secs
Max     = 124777.488666667 u-secs
Min     = 100000.017666667 u-secs
σ       = 1931.352376508 u-secs

Average = 300846580.29 cycles
Max     = 374332466.00 cycles
Min     = 300000053.00 cycles
σ       = 5794057.13 cycles

#σ = events
 0 = 976
 1 = 3
 2 = 4
 3 = 3
 4 = 3
 5 = 2
 6 = 2
 7 = 2
 8 = 1
 9 = 1
10 = 1
12 = 2

On Wed, Jul 1, 2020 at 3:57 AM Yanqin Wei <yanqin....@arm.com> wrote:

Hi Shahaji,
Adding an isb instruction can help make rdtsc precise; it synchronizes the system counter read (cntvct_el0) with the pipeline. There is a patch in DPDK: https://patchwork.dpdk.org/patch/66561/
So it may not be related to the intermittent drops you observed.

Best Regards,
Wei Yanqin
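(For reference, the idea in that patch looks roughly like the sketch below; this is an illustration of the approach, not the patch's exact contents:)

#include <stdint.h>

static inline uint64_t read_system_counter_precise(void)
{
    uint64_t cnt;
    /* The ISB serializes the pipeline so the counter read cannot be
     * reordered around the code being timed. */
    asm volatile("isb" : : : "memory");
    asm volatile("mrs %0, cntvct_el0" : "=r"(cnt));
    return cnt;
}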
> -----Original Message-----
> From: dev <ovs-dev-boun...@openvswitch.org> On Behalf Of Shahaji Bhosle via dev
> Sent: Wednesday, July 1, 2020 6:05 AM
> To: Flavio Leitner <f...@sysclose.org>
> Cc: ovs-dev@openvswitch.org; Ilya Maximets <i.maxim...@samsung.com>; Lee Reed <lee.r...@broadcom.com>; Vinay Gupta <vinay.gu...@broadcom.com>; Alex Barba <alex.ba...@broadcom.com>
> Subject: Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)
>
> Hi Flavio,
> I still see intermittent drops with rcu_nocbs, so I wrote that do_nothing()
> loop to avoid all the other distractions and see if Linux is messing with
> the OVS loop, just to see what is going on.
> The interesting thing is the case in *BOLD* below, where I use an ISB()
> instruction, and what happens to the STD deviation. Both runs basically
> DO NOTHING FOR 100 msec and see what happens to time :)
> Thanks, Shahaji
>
> static inline uint64_t
> rte_get_tsc_cycles(void)
> {
>     uint64_t tsc;
> #ifdef USE_ISB
>     asm volatile("isb; mrs %0, pmccntr_el0" : "=r"(tsc));
> #else
>     asm volatile("mrs %0, pmccntr_el0" : "=r"(tsc));
> #endif
>     return tsc;
> }
> #endif /* RTE_ARM_EAL_RDTSC_USE_PMU */
>
> ==================================
>     usleep(100);
>     for (volatile int i = 0; i < num_iter; i++) {
>         const uint64_t tsc_start = rte_get_tsc_cycles();
>         /* do nothing for ~100 msec */
> #ifdef USE_ISB
>         for (volatile int j = 0; j < num_us; j++);  /* <<< THIS IS MESSED UP:
>             100 msec of doing nothing, and I am getting 2033 usec STD DEVIATION */
> #else
>         for (volatile int j = 0; j < num_us; j++);  /* <<< THIS LOOP HAS
>             VERY LOW STD DEVIATION */
>         rte_isb();
> #endif
>         volatile uint64_t tsc_end = rte_get_tsc_cycles();
>         cycles[i] = tsc_end - tsc_start;
>     }
>     usleep(100);
>     calc_avg_var_stddev(num_iter, &cycles[0]);
> ===================================
>
> #ifdef USE_ISB:
> root@bcm958802a8046c:~/vinay_rx/dynticks-testing# ./run_isb_rdtsc
> + TARGET=2
> + MASK=4
> + NUM_ITER=1000
> + NUM_MS=100
> + N=37500000
> + LOGFILE=loop_1000iter_100ms.log
> + tee loop_1000iter_100ms.log
> + trace-cmd record -p function_graph -e all -M 4 -o trace_1000iter_100ms.dat taskset -c 2 /home/root/arm_stb_user_loop_isb_rdtsc 1000 37500000
>   plugin 'function_graph'
> Cycles/Second (Hz) = 3000000000
> Nano-seconds per cycle = 0.3333
>
> Using ISB() before rte_rdtsc()
> num_iter: 1000
> do_nothing_loop for (N)=37500000
> Running 1000 iterations of do_nothing_loop for (N)=37500000
>
> Average = 100328.158561667 u-secs
> Max     = 123024.795333333 u-secs
> Min     = 100000.017666667 u-secs
> σ       = 2033.118969489 u-secs
>
> Average = 300984475.69 cycles
> Max     = 369074386.00 cycles
> Min     = 300000053.00 cycles
> σ       = 6099356.91 cycles
>
> #σ = events
>  0 = 968
>  1 = 8
>  2 = 5
>  3 = 3
>  4 = 3
>  5 = 3
>  6 = 3
>  8 = 3
> 10 = 3
> 11 = 1
>
> #else (NO ISB):
> root@bcm958802a8046c:~/vinay_rx/dynticks-testing# ./run_isb_loop
> + TARGET=2
> + MASK=4
> + NUM_ITER=1000
> + NUM_MS=100
> + N=7316912
> + LOGFILE=loop_1000iter_100ms.log
> + tee loop_1000iter_100ms.log
> + trace-cmd record -p function_graph -e all -M 4 -o trace_1000iter_100ms.dat taskset -c 2 /home/root/arm_stb_user_loop_isb_loop 1000 7316912
>   plugin 'function_graph'
> Cycles/Second (Hz) = 3000000000
> Nano-seconds per cycle = 0.3333
>
> NO ISB() before rte_rdtsc()
> num_iter: 1000
> do_nothing_loop for (N)=7316912
> Running 1000 iterations of do_nothing_loop for (N)=7316912
>
> Average = 99999.863256333 u-secs
> Max     = 100052.790333333 u-secs
> Min     = 99997.807333333 u-secs
> σ       = 6.497043982 u-secs
>
> Average = 299999589.77 cycles
> Max     = 300158371.00 cycles
> Min     = 299993422.00 cycles
> σ       = 19491.13 cycles
>
> #σ = events
> 0 = 900
> 2 = 79
> 4 = 17
> 5 = 3
> 8 = 1
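(calc_avg_var_stddev() is not shown in the thread; the sketch below is a guess at what it computes over the collected samples, matching the Average/Max/Min/σ output above. Hypothetical code, not the actual test program:)

#include <inttypes.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>

void calc_avg_var_stddev(int n, const uint64_t *cycles)
{
    double sum = 0.0, sumsq = 0.0;
    uint64_t min = cycles[0], max = cycles[0];

    for (int i = 0; i < n; i++) {
        double c = (double)cycles[i];
        sum += c;
        sumsq += c * c;
        if (cycles[i] < min) min = cycles[i];
        if (cycles[i] > max) max = cycles[i];
    }
    double avg = sum / n;
    double stddev = sqrt(sumsq / n - avg * avg);   /* population stddev */

    printf("Average = %.2f cycles\n", avg);
    printf("Max     = %" PRIu64 " cycles\n", max);
    printf("Min     = %" PRIu64 " cycles\n", min);
    printf("sigma   = %.2f cycles\n", stddev);
}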
On Tue, Jun 30, 2020 at 4:42 PM Flavio Leitner <f...@sysclose.org> wrote:
> >
> > Hi Shahaji,
> >
> > Did it help with the rcu_nocbs?
> >
> > fbl
> >
> > On Tue, Jun 30, 2020 at 12:56:27PM -0400, Shahaji Bhosle wrote:
> > > Thanks Flavio,
> > > Are there any special requirements for RCU on ARM vs x86?
> > >
> > > I am following what the above document is saying... Do you think I
> > > need to do something more than the below?
> > > Thanks again and appreciate the help. Shahaji
> > >
> > > 1. Isolate the CPU cores:
> > >    isolcpus=1,2,3,4,5,6,7 nohz_full=1-7 rcu_nocbs=1-7
> > > 2. Set CONFIG_NO_HZ_FULL=y:
> > > root@bcm958802a8046c:~/vinay_rx/dynticks-testing# zcat /proc/config.gz | grep HZ
> > > CONFIG_NO_HZ_COMMON=y
> > > # CONFIG_HZ_PERIODIC is not set
> > > # CONFIG_NO_HZ_IDLE is not set
> > > CONFIG_NO_HZ_FULL=y
> > > # CONFIG_NO_HZ_FULL_ALL is not set
> > > # CONFIG_NO_HZ is not set
> > > # CONFIG_HZ_100 is not set
> > > CONFIG_HZ_250=y
> > > # CONFIG_HZ_300 is not set
> > > # CONFIG_HZ_1000 is not set
> > > CONFIG_HZ=250
> > >
> > > On Tue, Jun 30, 2020 at 12:50 PM Flavio Leitner <f...@sysclose.org> wrote:
> > > >
> > > > Right, you might want to review Documentation/timers/no_hz.rst
> > > > from the kernel sources and look for the RCU implications section,
> > > > where it explains how to move RCU callbacks.
> > > >
> > > > fbl
> > > >
> > > > On Tue, Jun 30, 2020 at 12:08:05PM -0400, Shahaji Bhosle wrote:
> > > > > Hi Flavio,
> > > > > I wrote a small program which has a do-nothing for loop, and I
> > > > > measure the timestamps across the loop. About 3% of the time,
> > > > > around the 1 second mark when the arch_timer fires, the timestamps
> > > > > are off by 25% of the expected value. I ran trace-cmd to see what
> > > > > is going on and see the below. Looks like some issue with
> > > > > gic_handle_irq(); I am not seeing this behaviour on an x86 host,
> > > > > so something is special with ARMv8.
> > > > > Thanks, Shahaji
> > > > >
> > > > > %21.77 (14181) arm_stb_user_lo rcu_dyntick #922
> > > > > |
> > > > > --- rcu_dyntick
> > > > >     |
> > > > >     |--%46.85-- gic_handle_irq # 432
> > > > >     |
> > > > >     |--%23.32-- context_tracking_user_exit # 215
> > > > >     |
> > > > >     |--%22.34-- context_tracking_user_enter # 206
> > > > >     |
> > > > >     |--%2.60-- SyS_execve # 24
> > > > >     |
> > > > >     |--%1.30-- do_page_fault # 12
> > > > >     |
> > > > >     |--%0.65-- SyS_write # 6
> > > > >     |
> > > > >     |--%0.65-- schedule # 6
> > > > >     |
> > > > >     |--%0.65-- SyS_nanosleep # 6
> > > > >     |
> > > > >     |--%0.65-- syscall_trace_enter # 6
> > > > >     |
> > > > >     |--%0.65-- SyS_faccessat # 6
> > > > >
> > > > > %5.01 (14181) arm_stb_user_lo rcu_utilization #212
> > > > > |
> > > > > --- rcu_utilization
> > > > >     |
> > > > >     |--%96.23-- gic_handle_irq # 204
> > > > >     |
> > > > >     |--%1.89-- SyS_nanosleep # 4
> > > > >     |
> > > > >     |--%0.94-- SyS_exit_group # 2
> > > > >     |
> > > > >     |--%0.94-- do_notify_resume # 2
> > > > >
> > > > > %4.86 (14181) arm_stb_user_lo user_exit #206
> > > > > |
> > > > > --- user_exit
> > > > >     context_tracking_user_exit
> > > > >
> > > > > %4.86 (14181) arm_stb_user_lo context_tracking_user_exit #206
> > > > > |
> > > > > --- context_tracking_user_exit
> > > > >
> > > > > %4.86 (14181) arm_stb_user_lo context_tracking_user_enter #206
> > > > > |
> > > > > --- context_tracking_user_enter
> > > > >
> > > > > %4.86 (14181) arm_stb_user_lo user_enter #206
> > > > > |
> > > > > --- user_enter
> > > > >     context_tracking_user_enter
> > > > >
> > > > > %2.95 (14181) arm_stb_user_lo gic_handle_irq #125
> > > > > |
> > > > > --- gic_handle_irq
> > > > >
> > > > > On Tue, Jun 30, 2020 at 9:45 AM Flavio Leitner <f...@sysclose.org> wrote:
> On Tue, Jun 02, 2020 at 12:56:51PM -0700, Vinay Gupta wrote:
> > > > > > > Hi Flavio,
> > > > > > >
> > > > > > > Thanks for your reply.
> > > > > > > I have captured the suggested information but do not see anything
> > > > > > > that could cause the packet drops.
> > > > > > > Can you please take a look at the below data and see if you can
> > > > > > > find something unusual?
> > > > > > > The PMDs are running on CPUs 1,2,3,4 and CPUs 1-7 are isolated cores.
> > > > > > > --------------------------------------------------------------------
> > > > > > > root@bcm958802a8046c:~# cstats ; sleep 10; cycles
> > > > > > > pmd thread numa_id 0 core_id 1:
> > > > > > >   idle cycles: 99140849 (7.93%)
> > > > > > >   processing cycles: 1151423715 (92.07%)
> > > > > > >   avg cycles per packet: 116.94 (1250564564/10693918)
> > > > > > >   avg processing cycles per packet: 107.67 (1151423715/10693918)
> > > > > > > pmd thread numa_id 0 core_id 2:
> > > > > > >   idle cycles: 118373662 (9.47%)
> > > > > > >   processing cycles: 1132193442 (90.53%)
> > > > > > >   avg cycles per packet: 124.39 (1250567104/10053309)
> > > > > > >   avg processing cycles per packet: 112.62 (1132193442/10053309)
> > > > > > > pmd thread numa_id 0 core_id 3:
> > > > > > >   idle cycles: 53805933 (4.30%)
> > > > > > >   processing cycles: 1196762002 (95.70%)
> > > > > > >   avg cycles per packet: 107.35 (1250567935/11649948)
> > > > > > >   avg processing cycles per packet: 102.73 (1196762002/11649948)
> > > > > > > pmd thread numa_id 0 core_id 4:
> > > > > > >   idle cycles: 189102938 (15.12%)
> > > > > > >   processing cycles: 1061463293 (84.88%)
> > > > > > >   avg cycles per packet: 143.47 (1250566231/8716828)
> > > > > > >   avg processing cycles per packet: 121.77 (1061463293/8716828)
> > > > > > > pmd thread numa_id 0 core_id 5:
> > > > > > > pmd thread numa_id 0 core_id 6:
> > > > > > > pmd thread numa_id 0 core_id 7:
> > > > > >
> > > > > > core_id 3 is heavily loaded, and is therefore more likely to
> > > > > > show the drop issue when some other event happens.
> > > > > >
> > > > > > I think you need to run perf as I recommended before and see
> > > > > > if there are context switches happening and why they are happening.
> > > > > >
> > > > > > If a context switch happens, it's either because the core is
> > > > > > not well isolated or some other thing is going on. It will
> > > > > > help to understand why the queue wasn't serviced for a certain
> > > > > > amount of time.
> > > > > >
> > > > > > The issue is that running perf might introduce some load, so
> > > > > > you will need to adjust the traffic rate accordingly.
> > > > > >
> > > > > > HTH,
> > > > > > fbl
> > > > > >
> > > > > > > Runtime summary
> > > > > > >                     comm   parent  sched-in  run-time  min-run  avg-run  max-run  stddev  migrations
> > > > > > >                                     (count)    (msec)   (msec)   (msec)   (msec)       %
> > > > > > > -----------------------------------------------------------------------------------------------------
> > > > > > >           ksoftirqd/0[7]        2         1     0.079    0.079    0.079    0.079    0.00           0
> > > > > > >             rcu_sched[8]        2        14     0.067    0.002    0.004    0.009    9.96           0
> > > > > > >              rcuos/4[38]        2         6     0.027    0.002    0.004    0.008   20.97           0
> > > > > > >              rcuos/5[45]        2         4     0.018    0.004    0.004    0.005    6.63           0
> > > > > > >          kworker/0:1[71]        2        12     0.156    0.008    0.013    0.019    6.72           0
> > > > > > >           mmcqd/0[1230]         2         3     0.054    0.001    0.018    0.031   47.29           0
> > > > > > >      kworker/0:1H[1248]         2         1     0.006    0.006    0.006    0.006    0.00           0
> > > > > > >     kworker/u16:2[1547]         2        16     0.045    0.001    0.002    0.012   26.19           0
> > > > > > >              ntpd[5282]         1         1     0.063    0.063    0.063    0.063    0.00           0
> > > > > > >          watchdog[6988]         1         2     0.089    0.012    0.044    0.076   72.26           0
> > > > > > >      ovs-vswitchd[9239]         1         2     0.326    0.152    0.163    0.173    6.45           0
> > > > > > > revalidator8[9309/9239]      9239         2     1.260    0.607    0.630    0.652    3.58           0
> > > > > > >            perf[27150]      27140         1     0.000    0.000    0.000    0.000    0.00           0
> > > > > > >
> > > > > > > Terminated tasks:
> > > > > > >           sleep[27151]      27150         4     1.002    0.015    0.250    0.677   58.22           0
> > > > > > >
> > > > > > > Idle stats:
> > > > > > > CPU 0 idle for 999.814 msec ( 99.84%)
> > > > > > > CPU 1 idle entire time window
> > > > > > > CPU 2 idle entire time window
> > > > > > > CPU 3 idle entire time window
> > > > > > > CPU 4 idle entire time window
> > > > > > > CPU 5 idle for 500.326 msec ( 49.96%)
> > > > > > > CPU 6 idle entire time window
> > > > > > > CPU 7 idle entire time window
> > > > > > >
> > > > > > > Total number of unique tasks: 14
> > > > > > > Total number of context switches: 115
> > > > > > > Total run time (msec): 3.198
> > > > > > > Total scheduling time (msec): 1001.425 (x 8)
> > > > > > >
> > > > > > > 02:16:22   UID   TGID    TID    %usr  %system  %guest  %wait    %CPU  CPU  Command
> > > > > > > 02:16:23     0   9239      -  100.00     0.00    0.00   0.00  100.00    5  ovs-vswitchd
> > > > > > > 02:16:23     0      -   9239    2.00     0.00    0.00   0.00    2.00    5  |__ovs-vswitchd
> > > > > > > 02:16:23     0      -   9240    0.00     0.00    0.00   0.00    0.00    0  |__vfio-sync
> > > > > > > 02:16:23     0      -   9241    0.00     0.00    0.00   0.00    0.00    5  |__eal-intr-thread
> > > > > > > 02:16:23     0      -   9242    0.00     0.00    0.00   0.00    0.00    5  |__dpdk_watchdog1
> > > > > > > 02:16:23     0      -   9244    0.00     0.00    0.00   0.00    0.00    5  |__urcu2
> > > > > > > 02:16:23     0      -   9279    0.00     0.00    0.00   0.00    0.00    5  |__ct_clean3
> > > > > > > 02:16:23     0      -   9308    0.00     0.00    0.00   0.00    0.00    5  |__handler9
> > > > > > > 02:16:23     0      -   9309    0.00     0.00    0.00   0.00    0.00    5  |__revalidator8
> > > > > > > 02:16:23     0      -   9328    0.00     0.00    0.00   0.00    0.00    6  |__pmd13
> > > > > > > 02:16:23     0      -   9330  100.00     0.00    0.00   0.00  100.00    3  |__pmd12
> > > > > > > 02:16:23     0      -   9331  100.00     0.00    0.00   0.00  100.00    1  |__pmd11
> > > > > > > 02:16:23     0      -   9332    0.00     0.00    0.00   0.00    0.00    7  |__pmd10
> > > > > > > 02:16:23     0      -   9333    0.00     0.00    0.00   0.00    0.00    5  |__pmd16
> > > > > > > 02:16:23     0      -   9334  100.00     0.00    0.00   0.00  100.00    2  |__pmd15
> > > > > > > 02:16:23     0      -   9335  100.00     0.00    0.00   0.00  100.00    4  |__pmd14
> > > > > > > --------------------------------------------------------------------
> > > > > > >
> > > > > > > Thanks
> > > > > > > Vinay
> > > > > > >
> > > > > > > On Tue, Jun 2, 2020 at 12:06 PM Flavio Leitner <f...@sysclose.org> wrote:
> > > > > > >
> > > > > > > > On Mon, Jun 01, 2020 at 07:27:09PM -0400, Shahaji Bhosle via dev wrote:
> > > > > > > > > Hi Ben/Ilya,
> > > > > > > > > Hope you guys are doing well and staying safe. I have been
> > > > > > > > > chasing a weird problem with small drops, and I think it is
> > > > > > > > > causing lots of TCP retransmissions.
> > > > > > > > >
> > > > > > > > > Setup details:
> > > > > > > > > iPerf3 (1k-5K Servers) <--- DPDK2:OvS+DPDK (VxLAN:BOND)[DPDK0+DPDK1] <====2x25G==== [DPDK0+DPDK1](VxLAN:BOND) OvS+DPDK:DPDK2 <--- iPerf3 (Clients)
> > > > > > > > >
> > > > > > > > > All the drops are ring drops on the BONDed functions on the
> > > > > > > > > server side. I have 4 CPUs, each with 3 PMD threads; DPDK0,
> > > > > > > > > DPDK1 and DPDK2 are all running with 4 Rx rings each.
> > > > > > > > >
> > > > > > > > > What is interesting is that when I give each Rx ring its own
> > > > > > > > > CPU, the drops go away. Or if I set
> > > > > > > > > other_config:emc-insert-inv-prob=1, the drops go away.
> > > > > > > > > But I need to scale up the number of flows, so I am trying to
> > > > > > > > > run this with the EMC disabled.
> > > > > > > > >
> > > > > > > > > I can tell that the rings are not getting serviced for
> > > > > > > > > 30-40 usec because of some kind of context switch or interrupts
> > > > > > > > > on these cores. I have tried the usual isolation: nohz_full,
> > > > > > > > > rcu_nocbs, etc., and moved all the interrupts away from these
> > > > > > > > > cores. But nothing helps. I mean, it improves, but the drops
> > > > > > > > > still happen.
> > > > > > > >
> > > > > > > > When you disable the EMC (or reduce its efficiency) the
> > > > > > > > per-packet cost increases, and then it becomes more sensitive
> > > > > > > > to variations. If you share a CPU with multiple queues, you
> > > > > > > > decrease the amount of time available to process each queue.
> > > > > > > > In either case, there will be less room to tolerate variations.
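(For reference: a sketch of what probabilistic EMC insertion does; illustrative code, not the exact OVS source. With other_config:emc-insert-inv-prob=N, a flow that misses the EMC is inserted with probability roughly 1/N, and N=1 inserts on every miss, which is why that setting made the drops disappear above:)

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

static uint32_t emc_insert_inv_prob = 100;   /* OVS default */

static bool emc_should_insert(void)
{
    if (emc_insert_inv_prob <= 1) {
        return true;                          /* always insert on a miss */
    }
    /* Insert on roughly 1 out of every inv_prob misses. */
    return (uint32_t)random() % emc_insert_inv_prob == 0;
}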
> > > > > > > > Well, you might want to use 'perf' and monitor the scheduling
> > > > > > > > events, and then, based on the stack trace, see what is causing
> > > > > > > > them and try to prevent it.
> > > > > > > >
> > > > > > > > For example:
> > > > > > > > # perf record -e sched:sched_switch -a -g sleep 1
> > > > > > > >
> > > > > > > > For instance, you might see that another NIC used for management
> > > > > > > > has IRQs assigned to one isolated CPU. You can move it to another
> > > > > > > > CPU to reduce the noise, etc...
> > > > > > > >
> > > > > > > > Another suggestion is to look at the PMD thread idle statistics,
> > > > > > > > because they will tell you how much "extra" room you have left.
> > > > > > > > As it approaches 0, the more fine-tuned your setup needs to be
> > > > > > > > to avoid drops.
> > > > > > > >
> > > > > > > > HTH,
> > > > > > > > --
> > > > > > > > fbl
> > > > > >
> > > > > > --
> > > > > > fbl
> > > >
> > > > --
> > > > fbl
> >
> > --
> > fbl

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev