Hi Shahaji,
Did it help with the rcu_nocbs?
fbl
On Tue, Jun 30, 2020 at 12:56:27PM -0400, Shahaji Bhosle wrote:
> Thanks Flavio,
> Are there any special requirements for RCU on ARM vs x86?
>
> I am following what the document above says. Do you think I need to do
> anything beyond the steps below?
> Thanks again and appreciate the help. Shahaji
>
> 1. Isolate the CPU cores
> *isolcpus=1,2,3,4,5,6,7 nohz_full=1-7 rcu_nocbs=1-7*
> 2. Setting CONFIG_NO_HZ_FULL=y
> root@bcm958802a8046c:~/vinay_rx/dynticks-testing# zcat /proc/config.gz | grep HZ
> CONFIG_NO_HZ_COMMON=y
> # CONFIG_HZ_PERIODIC is not set
> # CONFIG_NO_HZ_IDLE is not set
> *CONFIG_NO_HZ_FULL*=y
> # CONFIG_NO_HZ_FULL_ALL is not set
> # CONFIG_NO_HZ is not set
> # CONFIG_HZ_100 is not set
> CONFIG_HZ_250=y
> # CONFIG_HZ_300 is not set
> # CONFIG_HZ_1000 is not set
> CONFIG_HZ=250
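One way to double-check the three boot parameters listed above is to expand each CPU list and confirm they cover the same cores. A minimal sketch in Python, assuming plain lists and ranges like the ones in this thread (flag prefixes and strides are not handled); the helper names parse_cpu_list and isolation_sets are made up for illustration, and on a live system the string would come from /proc/cmdline:

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as "1-3,5" into a set of CPU numbers.

    Only plain numbers and ranges are handled (an assumption of this sketch).
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def isolation_sets(cmdline):
    """Return {parameter: set of CPUs} for the isolation-related parameters."""
    wanted = ("isolcpus", "nohz_full", "rcu_nocbs")
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in wanted:
            found[key] = parse_cpu_list(value)
    return found


# Parameters as quoted in this thread.
cmdline = "isolcpus=1,2,3,4,5,6,7 nohz_full=1-7 rcu_nocbs=1-7"
sets = isolation_sets(cmdline)
for name in sorted(sets):
    print(name, sorted(sets[name]))
if len({frozenset(cpus) for cpus in sets.values()}) > 1:
    print("warning: the parameters do not cover the same CPUs")
```

If a parameter is missing from the result, it was not on the command line at all, which is worth ruling out before looking further.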
>
>
>
> On Tue, Jun 30, 2020 at 12:50 PM Flavio Leitner <f...@sysclose.org> wrote:
>
> >
> > Right, you might want to review Documentation/timers/no_hz.rst from
> > the kernel sources and look for the RCU implications section, which
> > explains how to move RCU callbacks.
> >
> > fbl
> >
> > On Tue, Jun 30, 2020 at 12:08:05PM -0400, Shahaji Bhosle wrote:
> > > Hi Flavio,
> > > > I wrote a small program with a do-nothing for loop and I measure the
> > > > timestamps across that loop. About 3% of the time, around the 1 second
> > > > mark when the arch_timer fires, the timestamps are off by 25% of the
> > > > expected value. I ran trace-cmd to see what is going on and see the
> > > > output below. It looks like some issue with *gic_handle_irq*(). I am
> > > > not seeing this behaviour on an x86 host, so it seems to be something
> > > > specific to ARM v8.
> > > Thanks, Shahaji
> > >
> > > %21.77 (14181) arm_stb_user_lo rcu_dyntick #922
> > > |
> > > --- *rcu_dyntick*
> > > |
> > > |--%46.85-- gic_handle_irq # 432
> > > |
> > > |--%23.32-- context_tracking_user_exit # 215
> > > |
> > > |--%22.34-- context_tracking_user_enter # 206
> > > |
> > > |--%2.60-- SyS_execve # 24
> > > |
> > > |--%1.30-- do_page_fault # 12
> > > |
> > > |--%0.65-- SyS_write # 6
> > > |
> > > |--%0.65-- schedule # 6
> > > |
> > > |--%0.65-- SyS_nanosleep # 6
> > > |
> > > |--%0.65-- syscall_trace_enter # 6
> > > |
> > > |--%0.65-- SyS_faccessat # 6
> > >
> > > %5.01 (14181) arm_stb_user_lo rcu_utilization #212
> > > |
> > > --- *rcu_utilization*
> > > |
> > > |--%96.23-- gic_handle_irq # 204
> > > |
> > > |--%1.89-- SyS_nanosleep # 4
> > > |
> > > |--%0.94-- SyS_exit_group # 2
> > > |
> > > |--%0.94-- do_notify_resume # 2
> > >
> > > %4.86 (14181) arm_stb_user_lo user_exit #206
> > > |
> > > --- *user_exit*
> > > context_tracking_user_exit
> > >
> > > %4.86 (14181) arm_stb_user_lo context_tracking_user_exit #206
> > > |
> > > --- context_tracking_user_exit
> > >
> > > %4.86 (14181) arm_stb_user_lo context_tracking_user_enter #206
> > > |
> > > --- context_tracking_user_enter
> > >
> > > %4.86 (14181) arm_stb_user_lo user_enter #206
> > > |
> > > --- *user_enter*
> > > context_tracking_user_enter
> > >
> > > %2.95 (14181) arm_stb_user_lo gic_handle_irq #125
> > > |
> > > --- gic_handle_irq
> > >
> > >
> > > On Tue, Jun 30, 2020 at 9:45 AM Flavio Leitner <f...@sysclose.org> wrote:
> > >
> > > > On Tue, Jun 02, 2020 at 12:56:51PM -0700, Vinay Gupta wrote:
> > > > > Hi Flavio,
> > > > >
> > > > > Thanks for your reply.
> > > > > > I have captured the suggested information but do not see anything
> > > > > > that could cause the packet drops.
> > > > > > Can you please take a look at the data below and see if you can
> > > > > > find something unusual?
> > > > > The PMDs are running on CPU 1,2,3,4 and CPU 1-7 are isolated cores.
> > > > >
> > > > >
> > > >
> > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > > > > root@bcm958802a8046c:~# cstats ; sleep 10; cycles
> > > > > pmd thread numa_id 0 core_id 1:
> > > > > idle cycles: 99140849 (7.93%)
> > > > > processing cycles: 1151423715 (92.07%)
> > > > > avg cycles per packet: 116.94 (1250564564/10693918)
> > > > > avg processing cycles per packet: 107.67 (1151423715/10693918)
> > > > > pmd thread numa_id 0 core_id 2:
> > > > > idle cycles: 118373662 (9.47%)
> > > > > processing cycles: 1132193442 (90.53%)
> > > > > avg cycles per packet: 124.39 (1250567104/10053309)
> > > > > avg processing cycles per packet: 112.62 (1132193442/10053309)
> > > > > pmd thread numa_id 0 core_id 3:
> > > > > idle cycles: 53805933 (4.30%)
> > > > > processing cycles: 1196762002 (95.70%)
> > > > > avg cycles per packet: 107.35 (1250567935/11649948)
> > > > > avg processing cycles per packet: 102.73 (1196762002/11649948)
> > > > > pmd thread numa_id 0 core_id 4:
> > > > > idle cycles: 189102938 (15.12%)
> > > > > processing cycles: 1061463293 (84.88%)
> > > > > avg cycles per packet: 143.47 (1250566231/8716828)
> > > > > avg processing cycles per packet: 121.77 (1061463293/8716828)
> > > > > pmd thread numa_id 0 core_id 5:
> > > > > pmd thread numa_id 0 core_id 6:
> > > > > pmd thread numa_id 0 core_id 7:
> > > >
> > > >
> > > > core_id 3 is highly loaded, so it is more likely to show the
> > > > drop issue when some other event happens.
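The percentages in the PMD statistics follow directly from the cycle counters, which makes it easy to compare the headroom between cores. A small sketch, using the core_id 3 numbers from the output above; the function name pmd_headroom is made up for illustration:

```python
def pmd_headroom(idle_cycles, processing_cycles, packets):
    """Derive the per-core figures shown in the PMD statistics output."""
    total = idle_cycles + processing_cycles
    return {
        "idle_pct": 100.0 * idle_cycles / total,
        "avg_cycles_per_packet": total / packets,
        "avg_processing_cycles_per_packet": processing_cycles / packets,
    }


# core_id 3, taken from the quoted statistics
stats = pmd_headroom(53805933, 1196762002, 11649948)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

The lower idle_pct is, the less slack that core has to absorb an interruption before its Rx ring overflows.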
> > > >
> > > > I think you need to run perf as I recommended before and see if
> > > > there are context switches happening and why they are happening.
> > > >
> > > > If a context switch happens, it's either because the core is not
> > > > well isolated or some other thing is going on. It will help to
> > > > understand why the queue wasn't serviced for a certain amount of
> > > > time.
> > > >
> > > > The issue is that running perf might introduce some load, so you
> > > > will need to adjust the traffic rate accordingly.
> > > >
> > > > HTH,
> > > > fbl
> > > >
> > > >
> > > >
> > > > >
> > > > >
> > > > > > *Runtime summary*
> > > > > > comm                    parent  sched-in  run-time  min-run  avg-run  max-run  stddev  migrations
> > > > > >                                  (count)    (msec)   (msec)   (msec)   (msec)       %
> > > > > > -------------------------------------------------------------------------------------------------
> > > > > > ksoftirqd/0[7]               2         1     0.079    0.079    0.079    0.079    0.00           0
> > > > > > rcu_sched[8]                 2        14     0.067    0.002    0.004    0.009    9.96           0
> > > > > > rcuos/4[38]                  2         6     0.027    0.002    0.004    0.008   20.97           0
> > > > > > rcuos/5[45]                  2         4     0.018    0.004    0.004    0.005    6.63           0
> > > > > > kworker/0:1[71]              2        12     0.156    0.008    0.013    0.019    6.72           0
> > > > > > mmcqd/0[1230]                2         3     0.054    0.001    0.018    0.031   47.29           0
> > > > > > kworker/0:1H[1248]           2         1     0.006    0.006    0.006    0.006    0.00           0
> > > > > > kworker/u16:2[1547]          2        16     0.045    0.001    0.002    0.012   26.19           0
> > > > > > ntpd[5282]                   1         1     0.063    0.063    0.063    0.063    0.00           0
> > > > > > watchdog[6988]               1         2     0.089    0.012    0.044    0.076   72.26           0
> > > > > > ovs-vswitchd[9239]           1         2     0.326    0.152    0.163    0.173    6.45           0
> > > > > > revalidator8[9309/9239]   9239         2     1.260    0.607    0.630    0.652    3.58           0
> > > > > > perf[27150]              27140         1     0.000    0.000    0.000    0.000    0.00           0
> > > > > >
> > > > > > Terminated tasks:
> > > > > > sleep[27151]             27150         4     1.002    0.015    0.250    0.677   58.22           0
> > > > >
> > > > > Idle stats:
> > > > > CPU 0 idle for 999.814 msec ( 99.84%)
> > > > >
> > > > >
> > > > >
> > > > > > *CPU 1 idle entire time window*
> > > > > > *CPU 2 idle entire time window*
> > > > > > *CPU 3 idle entire time window*
> > > > > > *CPU 4 idle entire time window*
> > > > > CPU 5 idle for 500.326 msec ( 49.96%)
> > > > > CPU 6 idle entire time window
> > > > > CPU 7 idle entire time window
> > > > >
> > > > > Total number of unique tasks: 14
> > > > > Total number of context switches: 115
> > > > > Total run time (msec): 3.198
> > > > > Total scheduling time (msec): 1001.425 (x 8)
> > > > > (END)
> > > > >
> > > > >
> > > > >
> > > > > *02:16:22 UID TGID TID %usr %system %guest
> > %wait
> > > > > %CPU CPU Command *02:16:23 0 9239 - 100.00
> > > > 0.00
> > > > > 0.00 0.00 100.00 5 ovs-vswitchd
> > > > > 02:16:23 0 - 9239 2.00 0.00 0.00 0.00
> > > > > 2.00 5 |__ovs-vswitchd
> > > > > 02:16:23 0 - 9240 0.00 0.00 0.00 0.00
> > > > > 0.00 0 |__vfio-sync
> > > > > 02:16:23 0 - 9241 0.00 0.00 0.00 0.00
> > > > > 0.00 5 |__eal-intr-thread
> > > > > 02:16:23 0 - 9242 0.00 0.00 0.00 0.00
> > > > > 0.00 5 |__dpdk_watchdog1
> > > > > 02:16:23 0 - 9244 0.00 0.00 0.00 0.00
> > > > > 0.00 5 |__urcu2
> > > > > 02:16:23 0 - 9279 0.00 0.00 0.00 0.00
> > > > > 0.00 5 |__ct_clean3
> > > > > 02:16:23 0 - 9308 0.00 0.00 0.00 0.00
> > > > > 0.00 5 |__handler9
> > > > > 02:16:23 0 - 9309 0.00 0.00 0.00 0.00
> > > > > 0.00 5 |__revalidator8
> > > > > 02:16:23 0 - 9328 0.00 0.00 0.00 0.00
> > > > > 0.00 6 |__pmd13
> > > > > 02:16:23 0 - 9330 100.00 0.00 0.00 0.00
> > > > > 100.00 3 |__pmd12
> > > > > 02:16:23 0 - 9331 100.00 0.00 0.00 0.00
> > > > > 100.00 1 |__pmd11
> > > > > 02:16:23 0 - 9332 0.00 0.00 0.00 0.00
> > > > > 0.00 7 |__pmd10
> > > > > 02:16:23 0 - 9333 0.00 0.00 0.00 0.00
> > > > > 0.00 5 |__pmd16
> > > > > 02:16:23 0 - 9334 100.00 0.00 0.00 0.00
> > > > > 100.00 2 |__pmd15
> > > > > 02:16:23 0 - 9335 100.00 0.00 0.00 0.00
> > > > > 100.00 4 |__pmd14
> > > > >
> > > >
> > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > > > >
> > > > > Thanks
> > > > > Vinay
> > > > >
> > > > > > On Tue, Jun 2, 2020 at 12:06 PM Flavio Leitner <f...@sysclose.org> wrote:
> > > > > >
> > > > > > > On Mon, Jun 01, 2020 at 07:27:09PM -0400, Shahaji Bhosle via dev wrote:
> > > > > > > Hi Ben/Ilya,
> > > > > > > > Hope you guys are doing well and staying safe. I have been
> > > > > > > > chasing a weird problem with small drops and I think that is
> > > > > > > > causing lots of TCP retransmissions.
> > > > > > >
> > > > > > > Setup details
> > > > > > > > iPerf3(1k-5K Servers)<--DPDK2:OvS+DPDK(VxLAN:BOND)[DPDK0+DPDK1)<====2x25G<====
> > > > > > > > [DPDK0+DPDK1)(VxLAN:BOND)OVS+DPDKDPDK2<---iPerf3(Clients)
> > > > > > >
> > > > > > > > All the drops are ring drops on the bonded functions on the
> > > > > > > > server side. I have 4 CPUs each with 3 PMD threads; DPDK0, DPDK1
> > > > > > > > and DPDK2 are all running with 4 Rx rings each.
> > > > > > >
> > > > > > > > What is interesting is that when I give each Rx ring its own CPU
> > > > > > > > the drops go away. Or if I set other_config:emc-insert-inv-prob=1
> > > > > > > > the drops go away. But I need to scale up the number of flows, so
> > > > > > > > I am trying to run this with the EMC disabled.
> > > > > > >
> > > > > > > > I can tell that the rings are not getting serviced for 30-40 usec
> > > > > > > > because of some kind of context switch or interrupts on these
> > > > > > > > cores. I have tried the usual isolation (nohz_full, rcu_nocbs,
> > > > > > > > etc.) and moved all the interrupts away from these cores, but
> > > > > > > > nothing helps. I mean it improves, but the drops still happen.
> > > > > >
> > > > > > > When you disable the EMC (or reduce its efficiency) the per-packet
> > > > > > > cost increases, so processing becomes more sensitive to
> > > > > > > variations. If you share a CPU among multiple queues, you decrease
> > > > > > > the amount of time available to process each queue. In either
> > > > > > > case, there will be less room to tolerate variations.
> > > > > >
> > > > > > > Well, you might want to use 'perf' to monitor the scheduling
> > > > > > > events and then, based on the stack trace, see what is causing
> > > > > > > them and try to prevent it.
> > > > > >
> > > > > > For example:
> > > > > > # perf record -e sched:sched_switch -a -g sleep 1
> > > > > >
> > > > > > > For instance, you might see that another NIC used for management
> > > > > > > has IRQs assigned to one isolated CPU. You can move them to
> > > > > > > another CPU to reduce the noise, etc.
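To spot IRQs that are still allowed on isolated CPUs, the affinity of every IRQ can be compared against the isolated set. A rough sketch, assuming the affinities are read from /proc/irq/<N>/smp_affinity_list and contain only plain lists and ranges; the helper names are made up for illustration:

```python
import glob
import os


def parse_cpu_list(spec):
    """Expand a CPU list such as "0-3,5" into a set of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def irqs_on_isolated(affinities, isolated):
    """Map each IRQ to the isolated CPUs its affinity still allows.

    `affinities` is {irq number: CPU list string}; IRQs that do not touch
    any isolated CPU are left out of the result.
    """
    hits = {}
    for irq, spec in affinities.items():
        overlap = parse_cpu_list(spec) & isolated
        if overlap:
            hits[irq] = overlap
    return hits


def read_irq_affinities(root="/proc/irq"):
    """Read the affinity list of every IRQ directory under `root`."""
    affinities = {}
    for path in glob.glob(os.path.join(root, "[0-9]*", "smp_affinity_list")):
        irq = int(os.path.basename(os.path.dirname(path)))
        try:
            with open(path) as f:
                affinities[irq] = f.read().strip()
        except OSError:
            continue  # the IRQ may have gone away or be unreadable
    return affinities
```

Something like irqs_on_isolated(read_irq_affinities(), parse_cpu_list("1-7")) lists the candidates. An IRQ whose affinity merely includes an isolated CPU may never fire there, so the per-CPU counts in /proc/interrupts are still worth checking.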
> > > > > >
> > > > > > > Another suggestion is to look at the PMD thread idle statistics,
> > > > > > > because they will tell you how much "extra" room you have left.
> > > > > > > The closer it gets to 0, the more finely tuned your setup needs
> > > > > > > to be to avoid drops.
> > > > > >
> > > > > > HTH,
> > > > > > --
> > > > > > fbl
> > > > > >
> > > >
> > > > --
> > > > fbl
> > > >
> >
> > --
> > fbl
> >
--
fbl
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev