Ciao! A while ago I was thinking about ANR (application not responding) handling but then forgot about it. A recent malfunction reminded me of it, so I thought I should write down some musings. Maybe y'all have some input as well.
Right now we don't really know when our applications deadlock, because kwin somewhat gracefully kills the process when it detects no answer to window actions, leaving no trace of the malfunction for debugging. Even outside that feature it's exceptionally hard for a user to generate an ANR report, because the user either needs to SEGV the app manually (at which point kcrash and drkonqi kick in) or attach a debugger (requiring basically developer-level knowledge). All in all a garbage situation.

It's actually a bit tricky to solve because currently it seems neither POSIX nor Linux has a concept of ANR defects, so we need some custom metadata on top. Here's my thinking...

- The way kwin does the killing is in a helper binary that more or less simply calls kill() on the stuck pid
- The kill helper could write some trivial metadata to .cache/kwin/anr/$exe.$bootid.$pid.$time_at_time_of_crash.json (the name format is the one used by coredumpd as well FWIW)
- It could then send ABRT instead of KILL as first signal
- It should probably also make sure the pid actually shoved off within some timeout or else send KILL
- KCrash kicks in and does the handover dance with drkonqi
- DrKonqi can check for the ANR metadata and then mark the report ANR for sentry and bugzilla

3rd party software would still get ABRT, and if they have a crash handler they'll be able to handle it. It will look like a random ABRT at a glance, but they'll at least have the possibility of noticing that something is deadlocking in their software. I don't think there's a better solution for them right now, seeing as we have no platform way to tell them this ABRT was ANR. Of course if they are running outside a sandbox they are free to also pick up the kwin ANR metadata.

When no crash handlers of any sort are installed ABRT will by default cause a core dump, which then ideally goes into a crash handler daemon like coredumpd.
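To make the helper steps listed above a bit more concrete, here is a rough Python sketch. The real helper is a native binary, so this is illustration only. The JSON fields, all function names and the 5 second grace period are made up for the example; only the file name layout and the ABRT-then-KILL order come from the list above.

```python
import json
import os
import signal
import time
from pathlib import Path


def read_boot_id() -> str:
    # Linux exposes the id of the current boot here.
    return Path("/proc/sys/kernel/random/boot_id").read_text().strip()


def anr_metadata_path(cache_dir: Path, exe: str, boot_id: str, pid: int, timestamp: int) -> Path:
    # <cache>/kwin/anr/$exe.$bootid.$pid.$time.json, as proposed in the list.
    return cache_dir / "kwin" / "anr" / f"{exe}.{boot_id}.{pid}.{timestamp}.json"


def write_anr_metadata(cache_dir: Path, exe: str, boot_id: str, pid: int, timestamp: int) -> Path:
    path = anr_metadata_path(cache_dir, exe, boot_id, pid, timestamp)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Hypothetical payload; the proposal only asks for "some trivial metadata".
    payload = {"exe": exe, "bootId": boot_id, "pid": pid, "timestamp": timestamp, "reason": "ANR"}
    path.write_text(json.dumps(payload))
    return path


def process_exists(pid: int) -> bool:
    # Signal 0 only checks whether the pid can be signalled; nothing is delivered.
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def abort_hung_process(pid: int, exe: str, boot_id: str, cache_dir: Path,
                       grace_seconds: float = 5.0) -> Path:
    # Write the marker first so a crash handler can find it once ABRT lands.
    path = write_anr_metadata(cache_dir, exe, boot_id, pid, int(time.time()))
    try:
        os.kill(pid, signal.SIGABRT)
    except ProcessLookupError:
        return path  # already gone
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not process_exists(pid):
            return path
        time.sleep(0.1)
    # Still around after the grace period: fall back to KILL.
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        pass
    return path


def find_anr_metadata(cache_dir: Path, exe: str, boot_id: str, pid: int):
    # What a crash handler could do: look for a marker matching the crashed process.
    anr_dir = cache_dir / "kwin" / "anr"
    if not anr_dir.is_dir():
        return None
    prefix = f"{exe}.{boot_id}.{pid}."
    for entry in sorted(anr_dir.iterdir()):
        if entry.name.startswith(prefix) and entry.name.endswith(".json"):
            return json.loads(entry.read_text())
    return None
```

The helper side would boil down to abort_hung_process(pid, exe, read_boot_id(), Path.home() / ".cache"), and the crash handler side would call find_anr_metadata with the same identifiers. Writing the marker before sending ABRT avoids a race where the crash handler runs before the file exists.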
In fact, with coredumpd the user is then able to excavate such deadlock traces via drkonqi's crashed process viewer, improving the debugging UX of deadlocks in general. Since systemd also sends ABRT when a service watchdog barks, this also allows us to notice daemon deadlocks on systems where drkonqi covers all software (e.g. plasma-mobile).

Any further thoughts on this?

HS