On Wed, Jan 10, 2018 at 11:54 AM, Juri Lelli <juri.le...@redhat.com> wrote:
> On 09/01/18 16:50, Rafael J. Wysocki wrote:
>> On Tue, Jan 9, 2018 at 3:43 PM, Leonard Crestez <leonard.cres...@nxp.com>
>> wrote:
>
> [...]
>
>> > Every 4 seconds (really it's /proc/sys/kernel/watchdog_thresh * 2 / 5
>> > and watchdog_thresh defaults to 10). There is a per-cpu hrtimer which
>> > wakes the per-cpu thread in order to check that tasks can still
>> > execute, this works very well against bugs like infinite loops in
>> > softirq mode. The timers are synchronized initially but can get
>> > staggered (for example by hotplug).
>> >
>> > My guess is that it's only marked RT so that it executes ahead of other
>> > threads and the watchdog doesn't trigger simply when there are lots of
>> > userspace tasks.
>>
>> I think so too.
>>
>> I see a couple of more-or-less hackish ways to avoid the issue, but
>> nothing particularly attractive ATM.
>>
>> I wouldn't change the general behavior with respect to RT tasks
>> because of this, though, as we would quickly find a case in which that
>> would turn out to be not desirable.
>
> I agree we cannot generalize to all RT tasks, but what Patrick proposed
> (clamping utilization of certain known tasks) might help here:
>
> lkml.kernel.org/r/20170824180857.32103-1-patrick.bell...@arm.com
>
> Maybe with a per-task interface instead of using cgroups?

The problem here is that this is a kernel thing and user space should
not be expected to have to do anything about fixing this IMO.

> The other option would be to relax DL tasks affinity constraints, so
> that a case like this might be handled. Daniel and Tommaso proposed
> possible approaches, this might be a driving use case. Not sure how we
> would come up with a proper runtime for the watchdog, though.

That is a problem.

Basically, it needs to run as soon as possible, but it will be running
for a very short time, every time.

Overall, using a thread for that seems wasteful ...

Thanks,
Rafael
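
For reference, the "every 4 seconds" figure quoted above follows directly
from the formula Leonard gives (watchdog_thresh * 2 / 5, with
watchdog_thresh defaulting to 10). Below is a minimal sketch of that
arithmetic only. The function names are made up for illustration and are
not kernel identifiers, and the wakeups-per-minute helper assumes one
timer per CPU firing at exactly that period, which ignores any staggering.

```python
def watchdog_sample_period(watchdog_thresh: int = 10) -> float:
    """Seconds between per-CPU watchdog timer wakeups.

    Uses the formula quoted in the thread: watchdog_thresh * 2 / 5.
    The default of 10 matches the default mentioned there.
    """
    if watchdog_thresh <= 0:
        # Hypothetical guard for this sketch: a non-positive threshold
        # gives no meaningful period.
        raise ValueError("watchdog_thresh must be positive")
    return watchdog_thresh * 2 / 5


def watchdog_wakeups_per_minute(watchdog_thresh: int = 10, cpus: int = 1) -> float:
    """Total wakeups per minute, assuming one timer per CPU (illustrative)."""
    return cpus * 60 / watchdog_sample_period(watchdog_thresh)


if __name__ == "__main__":
    print(watchdog_sample_period())            # period with the default threshold
    print(watchdog_wakeups_per_minute(10, 8))  # e.g. an 8-CPU system
```

With the default threshold this gives a 4-second period, so each CPU's
watchdog thread wakes 15 times per minute, each time for a very short run.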