OK, the ibox patch seemed to resolve this issue.
Thank you very much! :-)

But. As you proposed I started to play with ASAN ... and opened quite a can of worms apparently. E is now rather constantly crashing. I guess this is because of the "abort_on_error=1" setting of ASAN and it's, well, finding many memory leaks. So I hope we can squash them one by one.

First I want to say that I needed to add "log_path=asan.log" to the ASAN_OPTIONS variable in order to have the asan output actually written somewhere, so I would propose to add this information to the enlightenment homepage. Most users nowadays probably don't start E from a terminal where any stdout would be visible.

So I tried to capture one of the crashes as best as I could with both gdb and asan. This one seemed to be in the procstats module. The result is here: https://pastebin.com/M6V2QTwd

Also, now E brings an additional error popup when returning from the lock screen: "Authentication via PAM had errors setting up the authentication session. The error code was 6." This did not happen before the recompiling. So I was suspecting that this is somehow due to ASAN so I tried to remove the ASAN_OPTIONS from the .xsessionrc. But it seems that without this variable E won't even start now. I see the processes in the process list but the screen remains just black. Therefore back to ASAN it is. Also I could not find any related messages in auth.log or similar. Very strange and somewhat unsettling.

Concerning the ACPI daemon. I see, this seems to be a "hard" requirement of E then. Interesting design choice. For me personally running an ACPI daemon on a desktop system has exactly zero additional benefit. The power button is handled by systemd just fine and I am happy for every unnecessary daemon that I can prevent from cluttering my ps output. So, anyway, for now I just commented out the callback to the popup. Works great. ;-)

Cheers
Florian

On 9/5/21 6:27 AM, Carsten Haitzler wrote:
On Sat, 4 Sep 2021 17:52:09 +0900 Florian Schaefer <list...@netego.de> said:

On 9/4/21 4:55 PM, Carsten Haitzler wrote:
On Sat, 4 Sep 2021 11:47:20 +0900 Florian Schaefer <list...@netego.de> said:

Raster,

Thanks for the quick reply and help!

OK, so ibox seems to be the culprit. With the module unloaded I was not
able to crash the system. That's quite interesting, on my personal
machine I am using ibox ever since and never had any issues (just like
your test yesterday). So this seems to be somehow specific to my new
system here.

Anyway, thanks for pointing me into the right direction. With this I now
also finally understood how to identify which one of the many threads
was the segfaulting one. ;-)

Now for the backtrace. As it is quite short I will paste it below

========================================
(gdb) bt
#0  0x00007f23b417f872 in __libc_pause () at
../sysdeps/unix/sysv/linux/pause.c:29
#1  0x0000564440d159f7 in e_alert_show () at ../src/bin/e_alert.c:43
#2  0x0000564440cda47a in _e_crash () at ../src/bin/e_signals.c:81
#3  0x0000564440cda4a9 in e_sigseg_act (x=<optimized out>,
info=<optimized out>, data=<optimized out>) at ../src/bin/e_signals.c:91
#4  0x00007f23b4180140 in <signal handler called> () at
/lib/x86_64-linux-gnu/libpthread.so.0
#5  0x00007f23a57df211 in _ibox_icon_fill (ic=0x5644419a2910) at
../src/modules/ibox/e_mod_main.c:636
#6  0x00007f23a57df330 in _ibox_cb_icon_fill_timer (data=<optimized
out>) at ../src/modules/ibox/e_mod_main.c:526
#7  0x00007f23b4c25581 in _ecore_call_task_cb (data=<optimized out>,
func=<optimized out>) at ../src/lib/ecore/ecore_private.h:456
#8  _ecore_timer_legacy_tick (data=0x564441cbf230, event=0x7ffd43c61150)
at ../src/lib/ecore/ecore_timer.c:172
#9  0x00007f23b3b1c130 in _event_callback_call (obj_id=0x400000379067,
pd=0x5644412371e0, desc=0x7f23b4c521e0
<_EFL_LOOP_TIMER_EVENT_TIMER_TICK>, event_info=<optimized out>,
legacy_compare=legacy_compare@entry=0 '\000') at
../src/lib/eo/eo_base_class.c:2114
#10 0x00007f23b3b1c3ec in _efl_object_event_callback_call
(obj_id=<optimized out>, pd=<optimized out>, desc=<optimized out>,
event_info=<optimized out>) at ../src/lib/eo/eo_base_class.c:2186
#11 0x00007f23b3b16620 in efl_event_callback_call (obj=<optimized out>,
desc=desc@entry=0x7f23b4c521e0 <_EFL_LOOP_TIMER_EVENT_TIMER_TICK>,
event_info=event_info@entry=0x0) at ../src/lib/eo/eo_base_class.c:2189
#12 0x00007f23b4c26e15 in _efl_loop_timer_expired_call
(obj=obj@entry=0x40000000012d, pd=pd@entry=0x5644411fd460,
when=when@entry=436613.23437423998) at ../src/lib/ecore/ecore_timer.c:669
#13 0x00007f23b4c26f43 in _efl_loop_timer_expired_timers_call
(obj=obj@entry=0x40000000012d, pd=pd@entry=0x5644411fd460,
when=436613.23437423998) at ../src/lib/ecore/ecore_timer.c:621
#14 0x00007f23b4bf2fae in _ecore_main_loop_iterate_internal
(obj=obj@entry=0x40000000012d, pd=pd@entry=0x5644411fd460,
once_only=once_only@entry=0) at ../src/lib/ecore/ecore_main.c:2431
#15 0x00007f23b4bf383f in _ecore_main_loop_begin
(obj=obj@entry=0x40000000012d, pd=pd@entry=0x5644411fd460) at
../src/lib/ecore/ecore_main.c:1231
#16 0x00007f23b4bf7e6d in _efl_loop_begin (obj=0x40000000012d,
pd=0x5644411fd460) at ../src/lib/ecore/efl_loop.c:57
#17 0x00007f23b4bf7233 in efl_loop_begin (obj=0x40000000012d) at
src/lib/ecore/efl_loop.eo.c:28
#18 0x00007f23b4bf390c in ecore_main_loop_begin () at
../src/lib/ecore/ecore_main.c:1316
#19 0x0000564440cb8c50 in main (argc=<optimized out>, argv=<optimized
out>) at ../src/bin/e_main.c:1121

(gdb) fr 5
#5  0x00007f23a57df211 in _ibox_icon_fill (ic=0x5644419a2910) at
../src/modules/ibox/e_mod_main.c:636
636        if ((ic->ibox->inst->ci->show_preview) &&
(edje_object_part_exists(ic->o_holder, "e.swallow.preview")))

(gdb) list
631     }
632
633     static void
634     _ibox_icon_fill(IBox_Icon *ic)
635     {
636        if ((ic->ibox->inst->ci->show_preview) &&
(edje_object_part_exists(ic->o_holder, "e.swallow.preview")))
637          _ibox_icon_fill_preview(ic, EINA_FALSE);
638        else
639          _ibox_icon_fill_icon(ic);
640

(gdb) print ic
$1 = (IBox_Icon *) 0x5644419a2910

(gdb) print *ic
$2 = {ibox = 0x564441cc3fe0, o_holder = 0x0, o_icon = 0x0, o_holder2 =
0x0, o_icon2 = 0x0, client = 0x0, drag = {start = 0 '\000', dnd = 0
'\000', x = 0, y = 0, dx = 0, dy = 128}}

(gdb) print *(ic->ibox)
$3 = {inst = 0x40, o_box = 0xe1, o_drop = 0x564441a499b0, o_drop_over =
0x7f23b4165cb0 <main_arena+304>, o_empty = 0x7474756200726162,
ic_drop_before = 0x81646c3698761235, drop_before = 1103904792, icons =
0x0, zone = 0x698761254, dnd_x = 0, dnd_y = 1769170290}

(gdb) print *(ic->ibox->inst)
Cannot access memory at address 0x40
========================================

So somehow we've got some garbage pointer in ic->ibox->inst.

actually.. ic->ibox is junk. it happens to point to some memory we can
access but it's full of ... garbage. like dnd_y is an unrealistic coord.
zone does not look like a proper pointer (o_drop does) and o_box is nothing
like what a pointer should look like. drop_before seems junky too. so ...
what happened to ic->ibox? or ... for that matter what happened to ic?
maybe ic has been freed and now the ibox ptr has been overwritten to point
to some junk as i cant imagine the ibox struct being freed as that struct
is still there for the ibox gadget. so ...

Ah I see. It certainly makes debugging easier if you know what a pointer
is supposed to look like. :-)

well turning on ASAN (search enlightenment.org for asan and how to enable
it) in efl and e would probably instantly point out the problem. you can
try that as an exercise in being able to divine better debug info from efl
+ e. it's pretty easy now with meson.... :) unlike valgrind it's not
prohibitively slow either. it's usable day to day on a fast enough machine.

Interesting. Thanks for the pointer to new debugging tools. (And yes,
valgrind is really slow.) I found the documentation you mentioned. I
think I will give it a try before applying your patch, just to see what
happens and to be able to play around with it for a bit.

and i can see the problem:

     ecore_timer_add(0.1, _ibox_cb_icon_fill_timer, ic);

a timer is created to fill the icon in 0.1 sec... but ... imagine the icon
(ic) has been freed/deleted BEFORE the timer fires... in 0.1sec from
now. ... someone added a timer without remembering to delete it when the
icon the timer is for is deleted! a bit sloppy...

Bad boy. ;-)

This means that on my old laptop I never ran into any issues because it
is just too slow for this race condition to occur?

d12acf0d01e628d71548adbb77670c7e40aef043 commit in git now fixes that.
problem is in e ... not efl :)
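
The pattern described above (remember the timer handle, cancel it when the icon goes away) can be sketched in plain C. This is only an illustration and not the content of the commit: timer_add(), timer_del(), timers_fire_all() and the Icon struct are hypothetical stand-ins written for this example. In E the real calls would be ecore_timer_add(), which returns a handle, and ecore_timer_del(), which cancels it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical one-shot timer queue, standing in for the real timer API. */
typedef void (*Timer_Cb)(void *data);

typedef struct Timer {
    Timer_Cb cb;
    void *data;
    struct Timer *next;
} Timer;

static Timer *pending_timers = NULL; /* list of armed timers */

Timer *timer_add(Timer_Cb cb, void *data)
{
    assert(cb != NULL);
    Timer *t = malloc(sizeof(*t));
    if (!t) return NULL;
    t->cb = cb;
    t->data = data;
    t->next = pending_timers;
    pending_timers = t;
    return t;
}

/* Cancel a timer that has not fired yet: unlink it and release it. */
void timer_del(Timer *t)
{
    for (Timer **p = &pending_timers; *p; p = &(*p)->next) {
        if (*p == t) {
            *p = t->next;
            free(t);
            return;
        }
    }
}

/* Fire every pending timer once; returns how many callbacks ran. */
int timers_fire_all(void)
{
    int fired = 0;
    while (pending_timers) {
        Timer *t = pending_timers;
        pending_timers = t->next;
        t->cb(t->data);
        free(t);
        fired++;
    }
    return fired;
}

/* Hypothetical icon that owns its pending fill timer. */
typedef struct Icon {
    Timer *fill_timer; /* NULL when no fill is pending */
    int filled;
} Icon;

void icon_fill_cb(void *data)
{
    Icon *ic = data;
    ic->fill_timer = NULL; /* one-shot: the handle is gone after this call */
    ic->filled = 1;
}

Icon *icon_new(void)
{
    Icon *ic = calloc(1, sizeof(*ic));
    if (!ic) return NULL;
    ic->fill_timer = timer_add(icon_fill_cb, ic); /* keep the handle */
    return ic;
}

void icon_free(Icon *ic)
{
    if (!ic) return;
    /* Without this cancel, the callback would later run on freed memory. */
    if (ic->fill_timer) timer_del(ic->fill_timer);
    free(ic);
}
```

If an icon is freed before its timer fires, the callback never runs for it, so there is no stale ic pointer left to dereference.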

Great. Thanks! As said before, I will try to tackle this with ASAN first
for training and then see how your solution is holding up. That will
hopefully be tomorrow.

Now to the second point of my first mail from yesterday: Is there any
way for me to disable/silence the error popup on startup that no ACPI
daemon is running?

oh yes. install acpid and have it run. :)

Cheers,
Florian

I tried to poke into the preceding frames (#6 and #7) but only hit
optimized out variables. This is efl territory, right? This morning I
recompiled enlightenment with "-O0 -g" but I guess I should also have
done the same to efl. Well, I can do this the next time I'm in office if
helpful.

Any ideas?

For now I gave ibar a try. Not exactly a replacement for me. I don't
need a launcher (using everything and favorites menu instead) or a
tracker of running windows (I know what windows I have open). I only
need something to show my minimized windows so that I can open them
again (I know, they appear with Alt+Tab...) and this seems to be the
only scenario that cannot be reproduced by ibar. -- I guess I never
bought into the MacOS style launcher bar. ;-)

ibar will show both running and minimized icons for windows .. but ok -
yeah - it doesnt "show only minimized"... :)

Cheers
Florian

On 9/4/21 1:25 AM, Carsten Haitzler wrote:
On Fri, 3 Sep 2021 21:04:35 +0900 Florian Schaefer <list...@netego.de>
said:

quick - if you unload the ibox module ... does the problem stop? that
crash is inside ibox code - memory it's accessing is bad/wrong - why i
don't know. no more information. line 363 in ibox is:

      if ((ic->ibox->inst->ci->show_preview) &&
(edje_object_part_exists(ic->o_holder, "e.swallow.preview")))

so what is ic? what is ic->ibox, ic->ibox->inst, ic->ibox->inst->ci ?

if you attach gdb when e crashes and dump these values - i'd know more.
maybe. I actually stopped using ibox a while ago since ibar does both
effectively these days. perhaps it is an ibox bug and i havent seen it as
i dont use it. so try the above, if it goes away - attach gdb

i can say that i dont see the problem here with ibox enabled and on amd +
e (git).

Dear everyone,

so I got a new desktop PC at work and the first thing I did, of course,
was to install Debian sid and enlightenment-git. ;-)

The machine has a Nvidia T600 card and this is where troubles probably
begin. As I kind of need the graphics performance for CAD I went with
the drivers from Nvidia (the stock open source drivers were terribly
slow).

Now what happens is that enlightenment crashes often. Like kind of
constantly. I got the impression it happens mostly when several windows
are going through their appearance fade-in transition at the same time.
Then the "red screen of death" appears and I need to press F1 to
continue. With some applications this happens always (Eagle anyone?)
with others only sometimes. After the forced restart many windows (e.g.
terminology always, firefox sometimes) need to be minimized and
uncovered again for their content to display again. Some dialog windows
won't even show their content from the beginning and instead just some
different portion of the screen. Needless to say that for a machine at
work this is not an optimal situation.

The most pressing issue are of course the crashes. I recompiled
everything with debugging symbols and optimization disabled (or at least
I thought so, some things seem still to be optimized away) to get some
meaningful dumps. One of which I uploaded to pastebin
(https://pastebin.com/YWSarC10) hoping that it makes sense to someone.

I am sure that it is not E that is "at fault" but Nvidia, but for now I
need to find a way around this so that I can work without having to
reset everything every five minutes. Any ideas?

Oh, I also tried to disable OpenGL in the compositor settings and
choosing the software option. And it still crashes!

For starters I was hoping that I can just switch off all the window
transition-fading eye-candy but I did not understand whether this is
possible. Is it?

Finally, being a desktop system (my first in like 10 years or so) it
does not run an acpi daemon. I don't really see any reason to do so.
Therefore E also complains on every startup that no acpi daemon can be
found. I did not find any compile time or runtime options to disable
acpi. Is there a way to silence this error/warning?

Cheers,
Florian


_______________________________________________
enlightenment-users mailing list
enlightenment-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-users




