On Sun, 5 Sep 2021 11:25:35 +0900 Florian Schaefer <list...@netego.de> said:
> OK, the ibox patch seemed to resolve this issue.
> Thank you very much! :-)
>
> But. As you proposed I started to play with ASAN ... and opened quite a
> can of worms apparently. E is now rather constantly crashing. I guess
> this is because of the "abort_on_error=1" setting of ASAN and it's,
> well, finding many memory leaks. So I hope we can squash them one by one.

export ASAN_OPTIONS="detect_odr_violation=0:detect_leaks=0:abort_on_error=1:new_delete_type_mismatch=0"

:) it will only barf on real memory errors - not smaller things that don't
cause crashes. for leaks i'm more interested in using massif for that, but
they won't cause crashes so those are "worry about another day" if anything.

> First I want to say that I needed to add "log_path=asan.log" to the
> ASAN_OPTIONS variable in order to have the asan output actually written
> somewhere, so I would propose to add this information to the
> enlightenment homepage. Most users nowadays probably don't start E from a
> terminal where any stdout would be visible.

actually i just redirect ALL stdout/err from e to ~/.xsession-errors so that
handles it anyway :) you won't need to do the above special asan log if
you're doing that and i'd generally say it's a smart move. if you don't you
can also check your journald logs from systemd etc.

> So I tried to capture one of the crashes as best as I could with both
> gdb and asan. This one seemed to be in the procstats module. The result
> is here: https://pastebin.com/M6V2QTwd

ooh procstats... i do not run that, so that probably explains why i don't see
this... /me summons a netstar

> Also, now E brings an additional error popup when returning from the
> lock screen: "Authentication via PAM had errors setting up the
> authentication session. The error code was 6." This did not happen
> before the recompiling. So I was suspecting that this is somehow due to
> ASAN so I tried to remove the ASAN_OPTIONS from the .xsessionrc. But it
> seems that without this variable E won't even start now. I see the
> processes in the process list but the screen remains just black.
> Therefore back to ASAN it is. Also I could not find any related messages
> in auth.log or similar. Very strange and somewhat unsettling.

aaaah yes. i think error code is changing because asan detects something e.g.
like a leak on shutdown of the ckpasswd slave binary thus making this not
work. basically "don't rely on desklock to work right" if using asan. kind of
a "gotcha".

> Concerning the ACPI daemon. I see, this seems to be a "hard" requirement
> of E then. Interesting design choice. For me personally running an ACPI

It's a soft requirement. E works without BUT you will be missing events for
things like: lid open/close, some power/reset buttons being pressed, ac
adaptor plug/unplug ... e will check if your system has acpi at all - if it
does it will want events from acpid to handle these. it may be you are lucky
and don't need these (eg only have a power button - you already get a key
press for it and no reset button, no lid, no ac adapter/battery), but e will
basically insist this runs because you have these as possible events. it's a
trivially small daemon to run and every distro i know of has it, so not much
to just go do this. i added this because people complained e didn't suspend
their laptop on lid close and it ended up they didn't follow the
recommendation of having acpid to handle that. this is there because people
don't follow docs so now it's pushing it on everyone to avoid things like a
laptop in your backpack running and overheating and running your entire
battery empty in a few hours.

> daemon on a desktop system has exactly zero additional benefit. The
> power button is handled by systemd just fine and I am happy for every

actually it's not. e inhibits systemd handling this - always. no choice. when
e runs system will ignore this. e is handling it. it can handle it either via
an x11 power key press event OR an acpi button press. see above. :)

> unnecessary daemon that I can prevent from cluttering my ps output. So,
> anyway, for now I just commented out the callback to the popup. Works
> great. ;-)

see above. too many times people don't follow the recommendations, so now
forcing it on everyone. i have considered adding acpi support to
enlightenment_system that runs as root, but i haven't done that so until then
... you need acpid. :)

> Cheers
> Florian
>
> On 9/5/21 6:27 AM, Carsten Haitzler wrote:
> > On Sat, 4 Sep 2021 17:52:09 +0900 Florian Schaefer <list...@netego.de> said:
> >
> >> On 9/4/21 4:55 PM, Carsten Haitzler wrote:
> >>> On Sat, 4 Sep 2021 11:47:20 +0900 Florian Schaefer <list...@netego.de>
> >>> said:
> >>>
> >>>> Raster,
> >>>>
> >>>> Thanks for the quick reply and help!
> >>>>
> >>>> OK, so ibox seems to be the culprit. With the module unloaded I was not
> >>>> able to crash the system. That's quite interesting, on my personal
> >>>> machine I am using ibox ever since and never had any issues (just like
> >>>> your test yesterday). So this seems to be somehow specific to my new
> >>>> system here.
> >>>>
> >>>> Anyway, thanks for pointing me into the right direction. With this I now
> >>>> also finally understood how to identify which one of the many threads
> >>>> was the segfaulting one. ;-)
> >>>>
> >>>> Now for the backtrace.
> >>>> As it is quite short I will paste it below
> >>>>
> >>>> ========================================
> >>>> (gdb) bt
> >>>> #0 0x00007f23b417f872 in __libc_pause () at
> >>>> ../sysdeps/unix/sysv/linux/pause.c:29
> >>>> #1 0x0000564440d159f7 in e_alert_show () at ../src/bin/e_alert.c:43
> >>>> #2 0x0000564440cda47a in _e_crash () at ../src/bin/e_signals.c:81
> >>>> #3 0x0000564440cda4a9 in e_sigseg_act (x=<optimized out>,
> >>>> info=<optimized out>, data=<optimized out>) at ../src/bin/e_signals.c:91
> >>>> #4 0x00007f23b4180140 in <signal handler called> () at
> >>>> /lib/x86_64-linux-gnu/libpthread.so.0
> >>>> #5 0x00007f23a57df211 in _ibox_icon_fill (ic=0x5644419a2910) at
> >>>> ../src/modules/ibox/e_mod_main.c:636
> >>>> #6 0x00007f23a57df330 in _ibox_cb_icon_fill_timer (data=<optimized
> >>>> out>) at ../src/modules/ibox/e_mod_main.c:526
> >>>> #7 0x00007f23b4c25581 in _ecore_call_task_cb (data=<optimized out>,
> >>>> func=<optimized out>) at ../src/lib/ecore/ecore_private.h:456
> >>>> #8 _ecore_timer_legacy_tick (data=0x564441cbf230, event=0x7ffd43c61150)
> >>>> at ../src/lib/ecore/ecore_timer.c:172
> >>>> #9 0x00007f23b3b1c130 in _event_callback_call (obj_id=0x400000379067,
> >>>> pd=0x5644412371e0, desc=0x7f23b4c521e0
> >>>> <_EFL_LOOP_TIMER_EVENT_TIMER_TICK>, event_info=<optimized out>,
> >>>> legacy_compare=legacy_compare@entry=0 '\000') at
> >>>> ../src/lib/eo/eo_base_class.c:2114
> >>>> #10 0x00007f23b3b1c3ec in _efl_object_event_callback_call
> >>>> (obj_id=<optimized out>, pd=<optimized out>, desc=<optimized out>,
> >>>> event_info=<optimized out>) at ../src/lib/eo/eo_base_class.c:2186
> >>>> #11 0x00007f23b3b16620 in efl_event_callback_call (obj=<optimized out>,
> >>>> desc=desc@entry=0x7f23b4c521e0 <_EFL_LOOP_TIMER_EVENT_TIMER_TICK>,
> >>>> event_info=event_info@entry=0x0) at ../src/lib/eo/eo_base_class.c:2189
> >>>> #12 0x00007f23b4c26e15 in _efl_loop_timer_expired_call
> >>>> (obj=obj@entry=0x40000000012d, pd=pd@entry=0x5644411fd460,
> >>>> when=when@entry=436613.23437423998) at ../src/lib/ecore/ecore_timer.c:669
> >>>> #13 0x00007f23b4c26f43 in _efl_loop_timer_expired_timers_call
> >>>> (obj=obj@entry=0x40000000012d, pd=pd@entry=0x5644411fd460,
> >>>> when=436613.23437423998) at ../src/lib/ecore/ecore_timer.c:621
> >>>> #14 0x00007f23b4bf2fae in _ecore_main_loop_iterate_internal
> >>>> (obj=obj@entry=0x40000000012d, pd=pd@entry=0x5644411fd460,
> >>>> once_only=once_only@entry=0) at ../src/lib/ecore/ecore_main.c:2431
> >>>> #15 0x00007f23b4bf383f in _ecore_main_loop_begin
> >>>> (obj=obj@entry=0x40000000012d, pd=pd@entry=0x5644411fd460) at
> >>>> ../src/lib/ecore/ecore_main.c:1231
> >>>> #16 0x00007f23b4bf7e6d in _efl_loop_begin (obj=0x40000000012d,
> >>>> pd=0x5644411fd460) at ../src/lib/ecore/efl_loop.c:57
> >>>> #17 0x00007f23b4bf7233 in efl_loop_begin (obj=0x40000000012d) at
> >>>> src/lib/ecore/efl_loop.eo.c:28
> >>>> #18 0x00007f23b4bf390c in ecore_main_loop_begin () at
> >>>> ../src/lib/ecore/ecore_main.c:1316
> >>>> #19 0x0000564440cb8c50 in main (argc=<optimized out>, argv=<optimized
> >>>> out>) at ../src/bin/e_main.c:1121
> >>>>
> >>>> (gdb) fr 5
> >>>> #5 0x00007f23a57df211 in _ibox_icon_fill (ic=0x5644419a2910) at
> >>>> ../src/modules/ibox/e_mod_main.c:636
> >>>> 636 if ((ic->ibox->inst->ci->show_preview) &&
> >>>> (edje_object_part_exists(ic->o_holder, "e.swallow.preview")))
> >>>>
> >>>> (gdb) list
> >>>> 631 }
> >>>> 632
> >>>> 633 static void
> >>>> 634 _ibox_icon_fill(IBox_Icon *ic)
> >>>> 635 {
> >>>> 636 if ((ic->ibox->inst->ci->show_preview) &&
> >>>> (edje_object_part_exists(ic->o_holder, "e.swallow.preview")))
> >>>> 637 _ibox_icon_fill_preview(ic, EINA_FALSE);
> >>>> 638 else
> >>>> 639 _ibox_icon_fill_icon(ic);
> >>>> 640
> >>>>
> >>>> (gdb) print ic
> >>>> $1 = (IBox_Icon *) 0x5644419a2910
> >>>>
> >>>> (gdb) print *ic
> >>>> $2 = {ibox = 0x564441cc3fe0, o_holder = 0x0, o_icon = 0x0, o_holder2 =
> >>>> 0x0, o_icon2 = 0x0, client = 0x0, drag = {start = 0 '\000', dnd = 0
> >>>> '\000', x = 0, y = 0, dx = 0, dy = 128}}
> >>>>
> >>>> (gdb) print *(ic->ibox)
> >>>> $3 = {inst = 0x40, o_box = 0xe1, o_drop = 0x564441a499b0, o_drop_over =
> >>>> 0x7f23b4165cb0 <main_arena+304>, o_empty = 0x7474756200726162,
> >>>> ic_drop_before = 0x81646c3698761235, drop_before = 1103904792, icons =
> >>>> 0x0, zone = 0x698761254, dnd_x = 0, dnd_y = 1769170290}
> >>>>
> >>>> (gdb) print *(ic->ibox->inst)
> >>>> Cannot access memory at address 0x40
> >>>> ========================================
> >>>>
> >>>> So somehow we've got some garbage pointer in ic->ibox->inst.
> >>>
> >>> actually.. ic->ibox is junk. it happens to point to some memory we can
> >>> access but it's full of ... garbage. like dnd_y is an unrealistic coord.
> >>> zone does not look like a proper pointer (o_drop does) and o_box is
> >>> nothing like what a pointer should look like. drop_before seems junky
> >>> too. so ... what happened to ic->ibox? or ... for that matter what
> >>> happened to ic? maybe ic has been freed and now the ibox ptr has been
> >>> overwritten to point to some junk as i can't imagine the ibox struct being
> >>> freed as that struct is still there for the ibox gadget. so ...
> >>
> >> Ah I see. It certainly makes debugging easier if you know what a pointer
> >> is supposed to look like. :-)
> >>
> >>> well turning on ASAN (search enlightenment.org for asan and how to enable
> >>> it) in efl and e would probably instantly point out the problem. you can
> >>> try that as an exercise in being able to divine better debug info from
> >>> efl + e. it's pretty easy now with meson.... :) unlike valgrind it's not
> >>> prohibitively slow either. it's usable day to day on a fast enough
> >>> machine.
> >>
> >> Interesting. Thanks for the pointer to new debugging tools. (And yes,
> >> valgrind is really slow.) I found the documentation you mentioned.
> >> I think I will give it a try before applying your patch, just to see what
> >> happens and to be able to play around with it for a bit.
> >>
> >>> and i can see the problem:
> >>>
> >>> ecore_timer_add(0.1, _ibox_cb_icon_fill_timer, ic);
> >>>
> >>> a timer is created to fill the icon in 0.1 sec... but ... imagine the icon
> >>> (ic) has been freed/deleted BEFORE the timer fires... in 0.1sec from
> >>> now. ... someone added a timer without remembering to delete it when the
> >>> icon the timer is for is deleted! a bit sloppy...
> >>
> >> Bad boy. ;-)
> >>
> >> This means that on my old laptop I never ran into any issues because it
> >> is just too slow for this race condition to occur?
> >>
> >>> d12acf0d01e628d71548adbb77670c7e40aef043 commit in git now fixes that.
> >>> problem is in e ... not efl :)
> >>
> >> Great. Thanks! As said before, I will try to tackle this with ASAN first
> >> for training and then see how your solution is holding up. That will
> >> hopefully be tomorrow.
> >>
> >> Now to the second point of my first mail from yesterday: Is there any
> >> way for me to disable/silence the error popup on startup that no ACPI
> >> daemon is running?
> >
> > oh yes. install acpid and have it run. :)
> >
> >> Cheers,
> >> Florian
> >>
> >>>> I tried to poke into the preceding frames (#6 and #7) but only hit
> >>>> optimized out variables. This is efl territory, right? This morning I
> >>>> recompiled enlightenment with "-O0 -g" but I guess I should also have
> >>>> done the same to efl. Well, I can do this the next time I'm in office if
> >>>> helpful.
> >>>>
> >>>> Any ideas?
> >>>>
> >>>> For now I gave ibar a try. Not exactly a replacement for me. I don't
> >>>> need a launcher (using everything and favorites menu instead) or a
> >>>> tracker of running windows (I know what windows I have open). I only
> >>>> need something to show my minimized windows so that I can open them
> >>>> again (I know, they appear with Alt+Tab...) and this seems to be the
> >>>> only scenario that cannot be reproduced by ibar. -- I guess I never
> >>>> bought into the MacOS style launcher bar. ;-)
> >>>
> >>> ibar will show both running and minimized icons for windows .. but ok -
> >>> yeah - it doesn't "show only minimized"... :)
> >>>
> >>>> Cheers
> >>>> Florian
> >>>>
> >>>> On 9/4/21 1:25 AM, Carsten Haitzler wrote:
> >>>>> On Fri, 3 Sep 2021 21:04:35 +0900 Florian Schaefer <list...@netego.de>
> >>>>> said:
> >>>>>
> >>>>> quick - if you unload the ibox module ... does the problem stop? that
> >>>>> crash is inside ibox code - memory it's accessing is bad/wrong - why i
> >>>>> don't know. not more information. line 363 in ibox is:
> >>>>>
> >>>>> if ((ic->ibox->inst->ci->show_preview) &&
> >>>>> (edje_object_part_exists(ic->o_holder, "e.swallow.preview")))
> >>>>>
> >>>>> so what is ic? what is ic->ibox, ic->ibox->inst, ic->ibox->inst->ci ?
> >>>>>
> >>>>> if you attach gdb when e crashes and dump these values - i'd know more.
> >>>>> maybe. I actually stopped using ibox a while ago since ibar does both
> >>>>> effectively these days. perhaps it is an ibox bug and i haven't seen it
> >>>>> as i don't use it. so try the above, if it goes away - attach gdb
> >>>>>
> >>>>> i can say that i don't see the problem here with ibox enabled and on amd
> >>>>> + e (git).
> >>>>>
> >>>>>> Dear everyone,
> >>>>>>
> >>>>>> so I got a new desktop PC at work and the first thing I did, of course,
> >>>>>> was to install Debian sid and enlightenment-git. ;-)
> >>>>>>
> >>>>>> The machine has a Nvidia T600 card and this is where troubles probably
> >>>>>> begin. As I kind of need the graphics performance for CAD I went with
> >>>>>> the drivers from Nvidia (the stock open source drivers were terribly
> >>>>>> slow).
> >>>>>>
> >>>>>> Now what happens is that enlightenment crashes often. Like kind of
> >>>>>> constantly. I got the impression it happens mostly when several windows
> >>>>>> are going through their appearance fade-in transition at the same time.
> >>>>>> Then the "red screen of death" appears and I need to press F1 to
> >>>>>> continue. With some applications this happens always (Eagle anyone?)
> >>>>>> with others only sometimes. After the forced restart many windows (e.g.
> >>>>>> terminology always, firefox sometimes) need to be minimized and
> >>>>>> uncovered again for their content to display again. Some dialog windows
> >>>>>> won't even show their content from the beginning and instead just some
> >>>>>> different portion of the screen. Needless to say that for a machine at
> >>>>>> work this is not an optimal situation.
> >>>>>>
> >>>>>> The most pressing issue is of course the crashes. I recompiled
> >>>>>> everything with debugging symbols and optimization disabled (or at
> >>>>>> least I thought so, some things seem still to be optimized away) to
> >>>>>> get some meaningful dumps. One of which I uploaded to pastebin
> >>>>>> (https://pastebin.com/YWSarC10) hoping that it makes sense to someone.
> >>>>>>
> >>>>>> I am sure that it is not E that is "at fault" but Nvidia, but for now I
> >>>>>> need to find a way around this so that I can work without having to
> >>>>>> reset everything every five minutes. Any ideas?
> >>>>>>
> >>>>>> Oh, I also tried to disable OpenGL in the compositor settings and
> >>>>>> choosing the software option. And it still crashes!
> >>>>>>
> >>>>>> For starters I was hoping that I can just switch off all the window
> >>>>>> transition-fading eye-candy but I did not understand whether this is
> >>>>>> possible. Is it?
> >>>>>>
> >>>>>> Finally, being a desktop system (my first in like 10 years or so) it
> >>>>>> does not run an acpi daemon. I don't really see any reason to do so.
> >>>>>> Therefore E also complains on every startup that no acpi daemon can be
> >>>>>> found.
> >>>>>> I did not find any compile time or runtime options to disable
> >>>>>> acpi. Is there a way to silence this error/warning?
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Florian
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> enlightenment-users mailing list
> >>>>>> enlightenment-users@lists.sourceforge.net
> >>>>>> https://lists.sourceforge.net/lists/listinfo/enlightenment-users

--
------------- Codito, ergo sum - "I code, therefore I am" --------------
Carsten Haitzler - ras...@rasterman.com
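
For anyone reading this thread later: the ibox crash discussed above comes down to a timer that outlives the object it was created for. The sketch below illustrates that pattern and one way to avoid it, by storing the timer handle in the object and cancelling it when the object is freed. It is only an illustration under stated assumptions, not the actual fix from the commit mentioned above: it does not use EFL at all, and Timer, Icon, timer_add, timer_del, timers_fire, icon_new, icon_free and icon_demo are all hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a one-shot main-loop timer (not the EFL API). */
typedef struct Timer {
    void (*cb)(void *data);
    void *data;
} Timer;

#define MAX_TIMERS 8
static Timer *pending[MAX_TIMERS];   /* timers waiting to fire */

/* Register a one-shot timer; returns NULL if no slot or memory is left. */
Timer *timer_add(void (*cb)(void *), void *data)
{
    for (int i = 0; i < MAX_TIMERS; i++) {
        if (pending[i]) continue;
        Timer *t = malloc(sizeof *t);
        if (!t) return NULL;
        t->cb = cb;
        t->data = data;
        pending[i] = t;
        return t;
    }
    return NULL;
}

/* Cancel a pending timer; accepts NULL so callers need no extra check. */
void timer_del(Timer *t)
{
    if (!t) return;
    for (int i = 0; i < MAX_TIMERS; i++)
        if (pending[i] == t) pending[i] = NULL;
    free(t);
}

/* Fire every pending timer once and return how many callbacks ran. */
int timers_fire(void)
{
    int fired = 0;
    for (int i = 0; i < MAX_TIMERS; i++) {
        Timer *t = pending[i];
        if (!t) continue;
        pending[i] = NULL;
        void (*cb)(void *) = t->cb;
        void *data = t->data;
        free(t);              /* one-shot: the handle dies before the callback */
        cb(data);
        fired++;
    }
    return fired;
}

/* Hypothetical icon object that owns a delayed "fill" timer. */
typedef struct Icon {
    int filled;
    Timer *fill_timer;        /* remembered so icon_free() can cancel it */
} Icon;

void icon_fill_cb(void *data)
{
    Icon *ic = data;
    assert(ic != NULL);
    ic->fill_timer = NULL;    /* timer has fired, nothing left to cancel */
    ic->filled = 1;
}

Icon *icon_new(void)
{
    Icon *ic = calloc(1, sizeof *ic);
    if (!ic) return NULL;
    ic->fill_timer = timer_add(icon_fill_cb, ic);
    return ic;
}

void icon_free(Icon *ic)
{
    if (!ic) return;
    timer_del(ic->fill_timer);  /* without this the callback gets a freed ic */
    free(ic);
}

/* Returns 0 if only the surviving icon's timer fired. */
int icon_demo(void)
{
    Icon *a = icon_new();
    Icon *b = icon_new();
    if (!a || !b) { icon_free(a); icon_free(b); return -1; }
    icon_free(a);                 /* deleted before its timer fires */
    int fired = timers_fire();    /* only b's callback should run */
    int ok = (fired == 1) && (b->filled == 1);
    icon_free(b);
    return ok ? 0 : 1;
}
```

Calling icon_demo() from a main() exercises the case from the thread: an icon is freed while its timer is still pending. If the timer_del() call in icon_free() is removed, the callback runs on freed memory, which is the kind of access ASAN is meant to report. In EFL the corresponding calls would be ecore_timer_add() and ecore_timer_del().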