On Wednesday, 31 July 2019, at 12:26:23 CEST, Harald Sitter wrote:
> Moin Moin!
>
> I've been hunting down a nasty backtrace problem in drkonqi where it
> entirely fails to create a backtrace and am now fairly confident this
> is in fact a design flaw with kcrash, but I have no awesome ideas on
> how to solve this properly.
>
> Long story short: there is a space of time between SEGV occurring and
> drkonqi stopping the threads. This causes (e.g.) GIO threads to
> actively unavoidably crash the process. Most recently this could/can
> be observed with plasmashell which has a GIO thread sitting around
> when (I think) flatpak updates are being checked. The result is that
> the crash cannot be traced because the process dies before drkonqi has
> a chance to deal with it.
>
> If you have ever seen a warning or error of the kind "XCB connection
> lost" or something similar it is in fact the very same problem, albeit
> usually not fatal.
>
> When a process crashes, SEGV is sent to any one thread. The other
> threads continue to run!
> When the SEGV arrives, the standard handler will possibly restart the
> process, then close all open file descriptors, potentially start (and
> wait for) drkonqi and, when drkonqi has worked its magic, raise itself
> to a core pattern process if applicable [1].
> The threads have still not been suspended!
> When drkonqi starts, it sends STOP to the crashed process. STOP is
> delivered to every thread, thus stopping everything this time around.
> Only now is the process "safe" from crashing while crashing.
>
> And that's the race right there. In between the file descriptors
> getting closed and the STOPping, the threads that aren't handling the
> signal continue to run and can potentially access the now-closed file
> descriptors. In GIO's case it can try to read inotify events and run
> into an error (e.g. in ik_source_read_some_events) and g_error, which
> as far as I can tell will result in a TRAP because g_error almost
> always(?) ends in g_abort.
>
> The solution is simply: we shouldn't close FDs before all threads are
> stopped.
>
> Practically I can't think of a way to actually pull this off though.
> We'd need to close the FDs *at* STOP. But STOP, like KILL, cannot be
> handled.
>
> I think the actual solution here would need to be that kcrash stops
> invoking drkonqi and instead defers to a core handler through which
> drkonqi can get access to the core.
> Trouble is that there can only be one core handler and there are more
> software providers on a system than just us, so I guess this isn't
> really a viable solution :/
> Also the core stuff isn't too portable I think.
>
> I am fairly out of ideas :/

Tried looking at what breakpad does?

Cheers,
  Albert

P.S: I've no idea if I'm saying something stupid, sorry if I am ^_^

>
> [1] http://man7.org/linux/man-pages/man5/core.5.html
>