On Sun, Feb 11, 2024 at 07:22:55PM -0500, Joel Fernandes wrote:
>
>
> On 2/8/2024 9:51 AM, Uladzislau Rezki wrote:
> > On Thu, Feb 08, 2024 at 01:53:58PM +0100, Uladzislau Rezki wrote:
> >> On Thu, Feb 08, 2024 at 07:55:37AM +0100, Andrea Righi wrote:
> >>> On Thu, Feb 08, 2024 at 12:54:58AM -0500, Joel Fernandes wrote:
> >>> ...
> >>>>>> Slightly related, but one of the things we are wondering also is how
> >>>>>> much of the overhead for your nohz-full and lazy-RCU test (on top of
> >>>>>> baseline - that is just CONFIG_HZ=1000 without nohz-full or nocbs) is
> >>>>>> because of just using NOCB. Uladsizlau mentioned he might run a test
> >>>>>> for comparing along those lines as well.
> >>>>>
> >>>>> Just to clarify, "lazy rcu on" results are just with rcu_nocb=all and
> >>>>> lazy RCUs enabled (and HZ=1000), so without nohz_full.
> >>>>>
> >>>>> If I enable only nohz_full=all (without rcu_nocb) I see something like
> >>>>> this:
> >>>>
> >>>> Ok. I did want to mention nohz_full implies rcu_nocb on the same CPUs as
> >>>> well.
> >>>>
> >>>> Its also mentioned in the boot param docs on the last line of the
> >>>> description:
> >>>>
> >>>> nohz_full= [KNL,BOOT,SMP,ISOL]
> >>>> The argument is a cpu list, as described above.
> >>>> In kernels built with CONFIG_NO_HZ_FULL=y, set
> >>>> the specified list of CPUs whose tick will be
> >>>> stopped
> >>>> whenever possible. The boot CPU will be forced
> >>>> outside
> >>>> the range to maintain the timekeeping. Any CPUs
> >>>> in this list will have their RCU callbacks
> >>>> offloaded,
> >>>> just as if they had also been called out in the
> >>>> rcu_nocbs= boot parameter.
> >>>
> >>> Ah I didn't realize that, it definitely makes sense, thanks for
> >>> clarifying it.
> >>>
> >>> Then basically in the results that I posted the difference is
> >>> "nohz_full=all+rcu_nocb=all" vs "rcu_nocb=all+lazy_RCU=on".
> >>>
> >> So, you say that a hrtimer_interrupt() handler takes more time in case
> >> of lazy + nocb + rcu_nocb=all and for nohz_full + rcu_nocb=all it faster?
> >> Could you please clarify this? I will try to measure from my side!
> >>
> >> I have done some basic research about hrtimer_interrupt() latency on my
> >> HW with latest Linux kernel. I have compared below cases:
> >>
> >> case a: 1000HZ + lazy + nocb_all_cpus
> >> case b: 1000HZ + nocb_all_cpus
> >>
> >> I used "ftrace" to measure time(in microseconds). Steps:
> >>
> >> echo 0 > tracing_on
> >> echo function_graph > current_tracer
> >> echo funcgraph-proc > trace_options
> >> echo funcgraph-abstime > trace_options
> >> echo hrtimer_interrupt > set_ftrace_filter
> >>
> >> fio --rw=write --bs=1M --size=1G --numjobs=8 --name=worker --time_based
> >> --runtime=50&
> >>
> >> echo 1 > tracing_on; sleep 10; echo 0 > tracing_on
> >>
> >> data is based on 10 seconds collection:
> >>
> >> <case a>
> >> 6 2102 ############################################################
> >> 8 2079 ############################################################
> >> 10 1464 ##########################################
> >> 7 897 ##########################
>
> So first column is microseconds and second one is count?
>
> >> 9 625 ##################
> >> 12 490 ##############
> >> 13 479 ##############
> >> 11 289 #########
> >> 5 249 ########
> >> 14 124 ####
> >> 15 72 ###
> >> 16 41 ##
> >> 17 24 #
> >> 4 22 #
> >> 18 12 #
> >> 22 2 #
> >> 19 1 #
> >> <case a>
> >>
> >> <case b>
> >> 9 1658 ############################################################
> >> 13 1308 ################################################
> >> 12 1224 #############################################
>
> Assuming that, it does seem the "best" case is off by 3 microseconds (9 vs 6),
> still would not warrant being regarded a bug and possibly just in the noise.
>
> >> 10 972 ####################################
> >> 8 703 ##########################
> >> 14 595 ######################
> >> 15 571 #####################
> >> 11 525 ###################
> >> 17 350 #############
> >> 16 235 #########
> >> 7 214 ########
> >> 4 73 ###
> >> 5 68 ###
> >> 6 54 ##
> >> 20 9 #
> >> 18 9 #
> >> 19 6 #
> >> 33 1 #
> >> 3 1 #
> >> 28 1 #
> >> 27 1 #
> >> 25 1 #
> >> 22 1 #
> >> 21 1 #
> >> <case b>
> >>
> >> I do not see the difference, there is a nose of 1/2/3 microseconds diff.
> >>
> > Let me further have a look at what we use for lazy in terms on hrtimer
> > though.
>
> Thanks for tracing it. Yeah it would be nice to count how many counts of
> do_nocb_deferred_wakeup() does the fio test trigger. If it is few, then maybe
> the problem with hrtimer_interrupt() is something else.
>
urezki@pc638:~$ fio --rw=write --bs=1M --size=1G --numjobs=8 --name=worker
--time_based --runtime=1000 > /dev/null 2>&1&
[1] 697
urezki@pc638:~$ sudo cat /sys/kernel/debug/tracing/trace_pipe | grep Timer
<idle>-0 [005] ..s1. 175.762689: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [002] ..s1. 186.002386: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [005] ..s1. 186.002391: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [001] ..s1. 228.497061: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [001] ..s1. 242.831956: rcu_nocb_wake: rcu_preempt 4
Timer
fio-705 [004] ..s.. 272.009997: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [002] ..s1. 282.253447: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [004] ..s1. 347.786758: rcu_nocb_wake: rcu_preempt 4
Timer
cron-714 [004] ..s1. 362.429287: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [007] .Ns1. 377.993814: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [002] ..s1. 393.353371: rcu_nocb_wake: rcu_preempt 0
Timer
kworker/6:1H-172 [006] ..s1. 410.760933: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [007] ..s1. 427.148456: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [004] ..s1. 438.220173: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [007] .Ns1. 451.719831: rcu_nocb_wake: rcu_preempt 4
Timer
kworker/6:1H-172 [006] ..s1. 472.199389: rcu_nocb_wake: rcu_preempt 4
Timer
<...>-717 [000] ..s.. 482.434074: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [004] ..s1. 484.969012: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [007] ..s1. 495.750151: rcu_nocb_wake: rcu_preempt 4
Timer
fio-702 [000] ..s1. 516.439595: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [004] ..s1. 578.212713: rcu_nocb_wake: rcu_preempt 4
Timer
kworker/6:1H-172 [006] ..s1. 595.583863: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [004] ..s1. 597.634693: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [005] ..s1. 617.598664: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [004] ..s1. 655.996890: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [004] ..s1. 660.606691: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [004] ..s1. 680.059867: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [004] ..s1. 688.227533: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [004] ..s1. 698.491126: rcu_nocb_wake: rcu_preempt 4
Timer
fio-703 [000] ..s1. 705.925823: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [004] ..s1. 721.018256: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [007] ..s1. 732.793797: rcu_nocb_wake: rcu_preempt 4
Timer
kworker/6:1H-172 [006] ..s1. 747.129268: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [007] ..s1. 766.075583: rcu_nocb_wake: rcu_preempt 4
Timer
fio-701 [004] ..s2. 767.047547: rcu_nocb_wake: rcu_preempt 4
Timer
kworker/6:1H-172 [006] ..s1. 782.456032: rcu_nocb_wake: rcu_preempt 4
Timer
kworker/6:1H-172 [006] ..s1. 797.303567: rcu_nocb_wake: rcu_preempt 4
Timer
kworker/6:1H-172 [006] ..s1. 813.686996: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [005] ..s1. 828.534430: rcu_nocb_wake: rcu_preempt 4
Timer
fio-700 [004] ..s2. 850.536698: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [007] ..s1. 860.789367: rcu_nocb_wake: rcu_preempt 4
Timer
ksoftirqd/6-63 [006] ..s.. 888.948515: rcu_nocb_wake: rcu_preempt 4
Timer
fio-699 [000] ..s.. 905.854912: rcu_nocb_wake: rcu_preempt 0
Timer
kworker/6:1H-172 [006] ..s1. 911.987781: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [005] ..s1. 924.275337: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [004] ..s1. 938.208899: rcu_nocb_wake: rcu_preempt 4
Timer
kworker/0:1H-100 [000] ..s1. 938.236053: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [004] ..s1. 948.338615: rcu_nocb_wake: rcu_preempt 4
Timer
fio-702 [000] ..s2. 958.216274: rcu_nocb_wake: rcu_preempt 0
Timer
<...>-728 [004] ..s.. 962.432149: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [002] ..s1. 978.545645: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [003] ..s1. 1002.608935: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [004] ..s1. 1002.614904: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [001] ..s1. 1012.848592: rcu_nocb_wake: rcu_preempt 0
Timer
kworker/6:1H-172 [006] ..s1. 1021.552408: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [002] ..s1. 1042.543682: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [001] ..s1. 1068.654892: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [002] ..s1. 1080.430541: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [000] ..s1. 1098.349992: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [002] ..s1. 1115.757464: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [005] ..s1. 1170.539822: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [000] ..s1. 1178.191592: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [000] ..s1. 1188.459281: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [004] ..s1. 1196.139077: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [000] ..s1. 1216.762430: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [004] ..s1. 1238.374788: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [004] ..s1. 1241.194246: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [003] ..s1. 1251.433991: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [004] ..s1. 1261.673634: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [001] ..s1. 1261.673637: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [007] ..s1. 1271.913347: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [005] ..s1. 1308.263711: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [000] ..s1. 1308.263738: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [005] ..s1. 1308.270707: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [000] ..s1. 1326.183705: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [004] ..s1. 1326.183709: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [004] ..s1. 1326.193714: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [005] ..s1. 1336.423403: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [003] ..s1. 1336.423404: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [000] ..s1. 1368.678480: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [007] ..s1. 1398.373573: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [005] ..s1. 1428.580662: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [001] ..s1. 1428.580666: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [007] ..s1. 1468.515358: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [000] ..s1. 1488.482412: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [000] ..s1. 1548.385082: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [005] ..s1. 1548.385083: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [003] ..s1. 1572.448367: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [004] ..s1. 1572.448367: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [004] ..s1. 1572.453972: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [005] ..s1. 1582.688060: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [001] ..s1. 1582.688062: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [004] ..s1. 1592.927895: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [007] ..s1. 1608.287294: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [001] ..s1. 1608.799312: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [002] ..s1. 1621.086909: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [005] ..s1. 1631.326609: rcu_nocb_wake: rcu_preempt 4
Timer
<idle>-0 [003] .Ns1. 1653.854052: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [001] ..s1. 1668.189531: rcu_nocb_wake: rcu_preempt 0
Timer
<idle>-0 [007] ..s1. 1678.429219: rcu_nocb_wake: rcu_preempt 4
Timer
<snip>
1000 seconds runtime of "fio" test triggers the do_nocb_deferred_wakeup_timer()
around 100 times. Most of them are handled from the soft-irq context. I have
looked at the perf figures vs hrtimer_interrupt(). I do not see any obvious
performance issues with it.
8xCPUs KVM system;
1000HZ timer;
lazy + all offloading.
--
Uladzislau Rezki