Hi

On Mon, Jan 22, 2018 at 4:54 PM, Adam Williamson <adamw...@fedoraproject.org> wrote:

> On Mon, 2018-01-22 at 07:16 -0500, Christian Fredrik Schaller wrote:
> > Sorry for responding to myself here, but I thought it could also be
> > worthwhile to mention that one of our primary tools for identifying
> > problems is the Fedora ABRT server. Looking at the current stats it looks
> > to me like F27 is actually doing better than F26 used to in
> > terms of minimizing crashers: https://goo.gl/babuJx
> >
> > There is always a chance ABRT is not catching the issues of course for
> > some reason,
>
> Well, see my mails from last month or so to desktop@ . There's several
> problems with how abrt interacts with GNOME and Wayland; I'm not sure
> to what extent these distort the figures.
>
> First problem: abrt considers *lots* of actually-unrelated crashes to
> be duplicates, because their tracebacks look similar - this happens
> because glib has a special 'logging' function which actually means
> (more or less) 'die intentionally, with this log message'. abrt tends
> to interpret many bugs that crash along that path as dupes of each
> other, even if the actual cause of the crash - whatever triggers that
> special log message call - is different in each case. I've filed a
> couple of variants of this at:
> https://bugzilla.redhat.com/show_bug.cgi?id=1509086
>
> Second problem: I *think* there's a similar issue with the recently-
> introduced `dump_gjs_stack_on_signal_handler` path; I've found at least
> some cases of apparently-unrelated bugs being marked as dupes due to
> that path. Details:
> https://github.com/abrt/satyr/issues/272
>
> Third problem: abrt doesn't do a very good job of reporting any crash
> that's caused by Xwayland dying. All you get is a backtrace that
> basically tells you "Xwayland crashed", but no useful information about
> why. Sometimes the system log extract that abrt captures happens to
> shed some light on the reason, but sometimes it doesn't. Details:
> https://github.com/abrt/satyr/issues/271
>
> I did some cleanup on false dupes and things caused by these problems,
> but it's necessarily incomplete, and I know more dupes have been filed
> since I did the cleanup...
>

Regarding the characterization of issues with Wayland, there is a bit of
history behind all this, and there are a couple of things to consider as
well.

With GNOME on Wayland, gnome-shell/mutter is the display server, but it
still depends on Xwayland to run [1] and cannot survive a crash in Xwayland.

Xwayland is an X server for X11 clients, but it is also a Wayland client,
so if gnome-shell/mutter crashes, Xwayland loses its connection to the
Wayland compositor and dies as well.

So both components (gnome-shell/mutter and Xwayland) are tightly coupled,
and neither can survive without the other (in GNOME).

That alone sometimes makes root-causing an issue afterwards, automatically
or even manually, a bit of a challenge: one first has to determine which of
the two components died first and took the other with it.
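
As a rough illustration of that first step, here is a minimal Python sketch
that picks whichever of the two processes crashed first from a list of
timestamped crash events. The event format (a timestamp plus a process name)
and the function name are made up for this example; they are not what abrt or
the journal actually provide, and in practice the two crashes can be close
enough together that the timestamps alone are not conclusive.

```python
from datetime import datetime

# Hypothetical names for the two tightly coupled processes.
TRACKED = {"gnome-shell", "Xwayland"}


def first_to_die(events):
    """Return the tracked process with the earliest crash time, or None.

    `events` is an iterable of (timestamp, process_name) tuples. This
    format is an assumption made for the sketch.
    """
    crashes = [(ts, name) for ts, name in events if name in TRACKED]
    if not crashes:
        return None
    # The earliest timestamp wins; that process most likely took the other down.
    return min(crashes, key=lambda item: item[0])[1]


events = [
    (datetime(2018, 1, 22, 10, 0, 5), "Xwayland"),
    (datetime(2018, 1, 22, 10, 0, 4), "gnome-shell"),
    (datetime(2018, 1, 22, 10, 0, 3), "some-other-app"),
]
print(first_to_die(events))
```

Here gnome-shell has the earlier timestamp, so the Xwayland crash would be
treated as a consequence rather than the cause.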

To make things slightly more challenging, Xwayland did not generate a core
file on a crash, just a self-generated backtrace that could be found with
journalctl. So in some cases it was almost impossible to tell why the
Wayland session crashed, as no core file for Xwayland was available (and
the self-generated backtrace is rarely of much help, sadly).

So gnome-shell/mutter added “-core” to the Xwayland command line (Xwayland
being started automatically by gnome-shell/mutter) so that we could capture
a core file every time Xwayland crashes [2].

Unfortunately, “-core” instructs Xwayland to generate a core file each time
a fatal error occurs, and losing the connection to the Wayland compositor is
a fatal error for Xwayland. So now, each time gnome-shell/mutter crashes, we
also get a core file for Xwayland and reports about a bug in Xwayland,
whereas the issue comes from gnome-shell/mutter. That alone generates a lot
of false positives for Xwayland, and a lot of duplicates (the backtrace
usually contains “xwl_read_events()”).
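
For anyone triaging these reports by hand, the pattern above can be turned
into a rough filter. The following Python sketch flags an Xwayland report as
a likely side effect of a compositor crash when its backtrace contains
xwl_read_events. The report representation (a component name plus a list of
function names) and the helper name are assumptions made for this example,
not abrt's actual data model. It is only a heuristic: a genuine Xwayland bug
could in principle go through that function too.

```python
# Frame mentioned above as usually present in the false-positive reports.
SUSPECT_FRAME = "xwl_read_events"


def likely_compositor_side_effect(component, frames):
    """Heuristic: True if an Xwayland report looks like a lost-connection exit.

    `component` is the reported component name and `frames` is a list of
    function names from the backtrace. Both are hypothetical inputs for
    this sketch.
    """
    if component != "Xwayland":
        return False
    return any(frame == SUSPECT_FRAME for frame in frames)


reports = [
    ("Xwayland", ["raise", "abort", "FatalError", "xwl_read_events"]),
    ("Xwayland", ["raise", "abort", "some_other_function"]),
    ("gnome-shell", ["raise", "abort", "xwl_read_events"]),
]
for component, frames in reports:
    print(component, likely_compositor_side_effect(component, frames))
```

Reports flagged this way would still need a look at the journal to confirm
that gnome-shell/mutter went down first.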

The way to solve that problem is to change Xwayland so that it does not call
FatalError() when the Wayland compositor dies, so that no core is generated
in this case. A patch for this has already landed in the xserver master
branch upstream [3].

With this, we should get a core file for “real” crashes but not when
Xwayland is exiting because the Wayland compositor (gnome-shell/mutter)
has crashed. Hopefully that will help with a better characterization of
Wayland issues in the future.

HTH,

Cheers,
Olivier

[1] https://bugzilla.gnome.org/show_bug.cgi?id=759538
[2] https://bugzilla.gnome.org/show_bug.cgi?id=789086
[3] https://cgit.freedesktop.org/xorg/xserver/commit/?id=fe46cbea0f19959d469ca4d1f09be379dc7b1e45
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org