On 03/08/20 20:22, Thomas Gleixner wrote:
> Valentin,
>
> Valentin Schneider <valentin.schnei...@arm.com> writes:
>> On 03/08/20 16:13, Thomas Gleixner wrote:
>>> Vladimir Oltean <olte...@gmail.com> writes:
>>>>> 1) When irq accounting is disabled, RT throttling kicks in as
>>>>>    expected.
>>>>>
>>>>> 2) With irq accounting the RT throttler does not kick in and the RCU
>>>>>    stall/lockups happen.
>>>> What is this telling us?
>>>
>>> It seems that the fine grained irq time accounting affects the runtime
>>> accounting in some way which I haven't figured out yet.
>>>
>>
>> With IRQ_TIME_ACCOUNTING, rq_clock_task() will always be incremented by a
>> lesser-or-equal value than when not having the option; you start with the
>> same delta_exec but slice some for the IRQ accounting, and leave the rest
>> for the rq_clock_task() (+paravirt).
>>
>> IIUC this means that if you spend e.g. 10% of the time in IRQ and 90% of
>> the time running the stress-ng RT tasks, despite having RT tasks hogging
>> the entirety of the "available time" it is still only 90% runtime, which is
>> below the 95% default and the throttling doesn't happen.
>
>    totaltime = irqtime + tasktime
>
> Ignoring irqtime and pretending that totaltime is what the scheduler
> can control and deal with is naive at best.
>

Agreed, however AFAICT rt_time is only incremented by rq_clock_task()
deltas, which don't include IRQ time with IRQ_TIME_ACCOUNTING=y. That would
then be directly compared to the sysctl runtime.

Adding some prints in sched_rt_runtime_exceeded() and running this test
case on my Juno, I get:

  # IRQ_TIME_ACCOUNTING=y
  cpu=2 rt_time=713455220 runtime=950000000 rq->avg_irq.util_avg=265
  (rt_time oscillates between [70.1e7, 75.1e7]; avg_irq between [220, 270])

  # IRQ_TIME_ACCOUNTING=n
  cpu=2 rt_time=963035300 runtime=949951811
  (rt_time oscillates between [94.1e7, 96.1e7];

Throttling happens for IRQ_TIME_ACCOUNTING=n and doesn't for
IRQ_TIME_ACCOUNTING=y - clearly the accounted rt_time isn't high enough for
that to happen, and it does look like what is missing in rt_time (or what
should be subtracted from the available runtime) is there in the avg_irq.

Or is that another case where I shouldn't have been writing emails at this
hour?

> Thanks,
>
>         tglx
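
As a rough sanity check of the numbers printed above, here is a small
sketch of the arithmetic. It is not kernel code. It assumes the default RT
period of 1s (sched_rt_period_us=1000000) and that util_avg is a PELT
average on a 0..1024 scale, so the IRQ share it yields is only an
approximation over that period. The helper names are made up for
illustration, and would_throttle_with_irq() is a hypothetical "what if",
not a proposed fix.

```python
# Assumptions, not taken from the prints themselves:
RT_PERIOD_NS = 1_000_000_000  # default sched_rt_period_us, in ns
UTIL_SCALE = 1024             # full-scale value of a PELT util_avg


def rt_share(rt_time_ns, period_ns=RT_PERIOD_NS):
    """Fraction of the period that was accounted as RT task time."""
    return rt_time_ns / period_ns


def irq_share(util_avg, scale=UTIL_SCALE):
    """Approximate fraction of time spent in IRQ, from avg_irq.util_avg."""
    return util_avg / scale


def would_throttle(rt_time_ns, runtime_ns):
    """The comparison as described above: accounted rt_time vs. runtime."""
    return rt_time_ns > runtime_ns


def would_throttle_with_irq(rt_time_ns, runtime_ns, util_avg,
                            period_ns=RT_PERIOD_NS):
    """Hypothetical: add the estimated IRQ time back before comparing."""
    irq_ns = irq_share(util_avg) * period_ns
    return rt_time_ns + irq_ns > runtime_ns


if __name__ == "__main__":
    # IRQ_TIME_ACCOUNTING=y sample
    print(would_throttle(713455220, 950000000))
    print(rt_share(713455220) + irq_share(265))
    print(would_throttle_with_irq(713455220, 950000000, 265))
    # IRQ_TIME_ACCOUNTING=n sample
    print(would_throttle(963035300, 949951811))
```

With the IRQ_TIME_ACCOUNTING=y sample, the RT share is roughly 0.71 and the
estimated IRQ share roughly 0.26, so the two together land at about 0.97 of
the period, above the 0.95 limit, while rt_time alone stays below it.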