This has been implemented more or less as described. First ANR report arrived on sentry already :)

On Sun, Jul 14, 2024 at 7:27 PM Harald Sitter <sit...@kde.org> wrote:
>
> Ciao!
>
> A while ago I was thinking about ANR handling but then forgot about it
> again, some malfunction reminded me of it again so I thought I should
> write down some musings. Maybe y'all have some input as well.
>
> Right now we don't really know when our applications deadlock because
> kwin somewhat gracefully kills the process when it detects no answer
> to window actions, leaving no trace of the malfunction for debugging.
> Even outside that feature it's exceptionally hard for a user to
> generate an ANR report because the user either needs to SEGV the app
> manually (at which point kcrash and drkonqi kick in), or attach a
> debugger (requiring basically developer-level knowledge). All in all a
> garbage situation.
>
> It's actually a bit tricky to solve because currently it seems neither
> POSIX nor Linux have a concept of ANR defects so we need some custom
> metadata on top.
>
> Here's my thinking...
>
> - The way kwin does the killing is in a helper binary that more or
>   less simply calls kill() on the stuck pid
> - The kill helper could write some trivial metadata to
>   .cache/kwin/anr/$exe.$bootid.$pid.$time_at_time_of_crash.json (the
>   name format is the one used by coredumpd as well FWIW)
> - It could then send ABRT instead of KILL as first signal
> - It should probably also make sure the pid actually shoved off in
>   some timeout or else send KILL
> - KCrash kicks in and does the handover dance with drkonqi
> - DrKonqi can check for the ANR metadata and then mark the report ANR
>   for sentry and bugzilla
>
> 3rd party software would still get ABRT and if they have a crash
> handler they'll be able to handle it, it will look like a random ABRT
> at a glance but they'll have at least the possibility of noticing that
> something is deadlocking in their software. I don't think there's a
> better solution for them right now, seeing as we have no platform way
> to tell them this ABRT was ANR. Of course if they are running outside
> a sandbox they are free to also pick up the kwin ANR metadata.
>
> When no crash handlers of any sort are installed ABRT will by default
> cause a core dump, which then ideally goes into a crash handler daemon
> like coredumpd. In fact, with coredumpd the user is then able to
> excavate such deadlock traces via drkonqi's crashed process viewer.
> Improving the debugging UX of deadlocks in general.
>
> Since systemd also sends ABRT when a service watchdog barks this also
> allows us to notice daemon deadlocks on systems where drkonqi covers
> all software (e.g. plasma-mobile).
>
> Any further thoughts on this?
>
> HS
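
For anyone reading this in the archive, the kill-helper steps from the proposal above can be sketched roughly like this. It is an illustrative Python sketch under stated assumptions, not the code that actually landed in kwin: the function names, the JSON fields, the timeout and the poll interval are all made up for the example. Only the file name format and the ABRT-first, KILL-as-fallback order come from the proposal.

```python
import json
import os
import signal
import time
from pathlib import Path


def write_anr_metadata(cache_dir, exe, boot_id, pid, timestamp):
    """Write a small ANR marker using the name format from the proposal:
    <cache>/kwin/anr/$exe.$bootid.$pid.$time.json
    The JSON fields are hypothetical; the proposal only says "trivial metadata".
    """
    path = Path(cache_dir) / "kwin" / "anr" / f"{exe}.{boot_id}.{pid}.{timestamp}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"exe": exe, "bootId": boot_id,
                                "pid": pid, "time": timestamp}))
    return path


def process_exists(pid):
    """Probe a pid with signal 0, which checks existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to someone else
    return True


def abort_then_kill(pid, timeout=5.0, poll=0.1,
                    send=os.kill, exists=process_exists):
    """Send SIGABRT first so a crash handler can run, then fall back to
    SIGKILL if the pid is still around once the timeout has passed.
    `send` and `exists` are injectable only to keep the sketch testable.
    """
    send(pid, signal.SIGABRT)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not exists(pid):
            return "aborted"
        time.sleep(poll)
    try:
        send(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "aborted"  # it went away just after the last poll
    return "killed"
```

A caller would write the metadata first and then call abort_then_kill(), so that the marker file already exists by the time the crash handler looks for it.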