Thanks Florian, I didn't backport this to procstats...should have a fix...

On Tue, Sep 7, 2021 at 12:46 AM Florian Schaefer <list...@netego.de> wrote:

> I think we've got a second, independent issue with procstat here. This
> time it seems to me to be your friendly string-buffer overflow. Incidentally
> triggered by a long command line in terminology while compiling the latest
> enlightenment. ;)
>
> https://pastebin.com/Ue03AbmB
>
> Cheers,
> Florian
>
> "Al Poole" nets...@gmail.com – September 6, 2021 1:22 AM
> > Summoned
> >
> > On Sun, 5 Sep 2021, 14:12 Carsten Haitzler, <ras...@rasterman.com> wrote:
> >
> > > On Sun, 5 Sep 2021 11:25:35 +0900 Florian Schaefer <list...@netego.de>
> > > said:
> > >
> > > > OK, the ibox patch seemed to resolve this issue.
> > > > Thank you very much! :-)
> > > >
> > > > But. As you proposed I started to play with ASAN ... and opened quite a
> > > > can of worms apparently. E is now rather constantly crashing. I guess
> > > > this is because of the "abort_on_error=1" setting of ASAN and it's,
> > > > well, finding many memory leaks. So I hope we can squash them one by one.
> > >
> > > export ASAN_OPTIONS="detect_odr_violation=0:detect_leaks=0:abort_on_error=1:new_delete_type_mismatch=0"
> > >
> > > :) it will only barf on real memory errors - not smaller things that don't
> > > cause crashes. for leaks i'm more interested in using massif for that, but
> > > they won't cause crashes so those are "worry about another day" if anything.
> > >
> > > > First I want to say that I needed to add "log_path=asan.log" to the
> > > > ASAN_OPTIONS variable in order to have the asan output actually written
> > > > somewhere, so I would propose to add this information to the
> > > > enlightenment homepage. Most users nowadays probably don't start E from a
> > > > terminal where any stdout would be visible.
> > >
> > > actually i just redirect ALL stdout/err from e to ~/.xsession-errors so that
> > > handles it anyway :) you won't need to do the above special asan log if you're
> > > doing that and i'd generally say it's a smart move.
> > > if you don't you can also check your journald logs from systemd etc.
> > >
> > > > So I tried to capture one of the crashes as best as I could with both
> > > > gdb and asan. This one seemed to be in the procstats module. The result
> > > > is here: pastebin.com/M6V2QTwd
> > >
> > > ooh procstats... i do not run that, so that probably explains why i don't
> > > see this...
> > >
> > > /me summons a netstar
> > >
> > > > Also, now E brings an additional error popup when returning from the
> > > > lock screen: "Authentication via PAM had errors setting up the
> > > > authentication session. The error code was 6." This did not happen
> > > > before the recompiling. So I was suspecting that this is somehow due to
> > > > ASAN so I tried to remove the ASAN_OPTIONS from the .xsessionrc. But it
> > > > seems that without this variable E won't even start now. I see the
> > > > processes in the process list but the screen remains just black.
> > > > Therefore back to ASAN it is. Also I could not find any related messages
> > > > in auth.log or similar. Very strange and somewhat unsettling.
> > >
> > > aaaah yes. i think error code is changing because asan detects something
> > > e.g. like a leak on shutdown of the ckpasswd slave binary thus making this
> > > not work. basically "don't rely on desklock to work right" if using asan.
> > > kind of a "gotcha".
> > >
> > > > Concerning the ACPI daemon. I see, this seems to be a "hard" requirement
> > > > of E then. Interesting design choice. For me personally running an ACPI
> > >
> > > It's a soft requirement. E works without BUT you will be missing events for
> > > things like: lid open/close, some power/reset buttons being pressed, ac
> > > adaptor plug/unplug ... e will check if your system has acpi at all - if it
> > > does it will want events from acpid to handle these.
> > > it may be you are lucky and don't need these (eg only have a power button -
> > > you already get key press for it and no reset button, no lid, no ac
> > > adapter/battery), but e will basically insist this runs because you have
> > > these as possible events. it's a trivially small daemon to run and every
> > > distro i know of has it, so not much to just go do this. i added this
> > > because people complained e didn't suspend their laptop on lid close and it
> > > ended up they didn't follow the recommendation of having acpid to handle
> > > that. this is there because people don't follow docs so now it's pushing it
> > > on everyone to avoid things like a laptop in your backpack running and
> > > overheating and running your entire battery empty in a few hours.
> > >
> > > > daemon on a desktop system has exactly zero additional benefit. The
> > > > power button is handled by systemd just fine and I am happy for every
> > >
> > > actually it's not. e inhibits systemd handling this - always. no choice.
> > > when e runs system will ignore this. e is handling it. it can handle it
> > > either via a x11 power key press event OR an acpi button press. see above. :)
> > >
> > > > unnecessary daemon that I can prevent from cluttering my ps output. So,
> > > > anyway, for now I just commented out the callback to the popup. Works
> > > > great. ;-)
> > >
> > > see above. too many times people don't follow the recommendations, so now
> > > forcing it on everyone. i have considered adding acpi support to
> > > enlightenment_system that runs as root, but i haven't done that so until
> > > then ... you need acpid.
> > > :)
> > >
> > > > Cheers
> > > > Florian
> > > >
> > > > On 9/5/21 6:27 AM, Carsten Haitzler wrote:
> > > > > On Sat, 4 Sep 2021 17:52:09 +0900 Florian Schaefer <list...@netego.de>
> > > > > said:
> > > > >
> > > > >> On 9/4/21 4:55 PM, Carsten Haitzler wrote:
> > > > >>> On Sat, 4 Sep 2021 11:47:20 +0900 Florian Schaefer <list...@netego.de>
> > > > >>> said:
> > > > >>>
> > > > >>>> Raster,
> > > > >>>>
> > > > >>>> Thanks for the quick reply and help!
> > > > >>>>
> > > > >>>> OK, so ibox seems to be the culprit. With the module unloaded I was not
> > > > >>>> able to crash the system. That's quite interesting, on my personal
> > > > >>>> machine I am using ibox ever since and never had any issues (just like
> > > > >>>> your test yesterday). So this seems to be somehow specific to my new
> > > > >>>> system here.
> > > > >>>>
> > > > >>>> Anyway, thanks for pointing me into the right direction. With this I now
> > > > >>>> also finally understood how to identify which one of the many threads
> > > > >>>> was the segfaulting one. ;-)
> > > > >>>>
> > > > >>>> Now for the backtrace.
> > > > >>>> As it is quite short I will paste it below
> > > > >>>>
> > > > >>>> ========================================
> > > > >>>> (gdb) bt
> > > > >>>> #0  0x00007f23b417f872 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29
> > > > >>>> #1  0x0000564440d159f7 in e_alert_show () at ../src/bin/e_alert.c:43
> > > > >>>> #2  0x0000564440cda47a in _e_crash () at ../src/bin/e_signals.c:81
> > > > >>>> #3  0x0000564440cda4a9 in e_sigseg_act (x=<optimized out>, info=<optimized out>, data=<optimized out>) at ../src/bin/e_signals.c:91
> > > > >>>> #4  0x00007f23b4180140 in <signal handler called> () at /lib/x86_64-linux-gnu/libpthread.so.0
> > > > >>>> #5  0x00007f23a57df211 in _ibox_icon_fill (ic=0x5644419a2910) at ../src/modules/ibox/e_mod_main.c:636
> > > > >>>> #6  0x00007f23a57df330 in _ibox_cb_icon_fill_timer (data=<optimized out>) at ../src/modules/ibox/e_mod_main.c:526
> > > > >>>> #7  0x00007f23b4c25581 in _ecore_call_task_cb (data=<optimized out>, func=<optimized out>) at ../src/lib/ecore/ecore_private.h:456
> > > > >>>> #8  _ecore_timer_legacy_tick (data=0x564441cbf230, event=0x7ffd43c61150) at ../src/lib/ecore/ecore_timer.c:172
> > > > >>>> #9  0x00007f23b3b1c130 in _event_callback_call (obj_id=0x400000379067, pd=0x5644412371e0, desc=0x7f23b4c521e0 <_EFL_LOOP_TIMER_EVENT_TIMER_TICK>, event_info=<optimized out>, legacy_compare=legacy_compare@entry=0 '\000') at ../src/lib/eo/eo_base_class.c:2114
> > > > >>>> #10 0x00007f23b3b1c3ec in _efl_object_event_callback_call (obj_id=<optimized out>, pd=<optimized out>, desc=<optimized out>, event_info=<optimized out>) at ../src/lib/eo/eo_base_class.c:2186
> > > > >>>> #11 0x00007f23b3b16620 in efl_event_callback_call (obj=<optimized out>, desc=desc@entry=0x7f23b4c521e0 <_EFL_LOOP_TIMER_EVENT_TIMER_TICK>,
> > > > >>>>     event_info=event_info@entry=0x0) at ../src/lib/eo/eo_base_class.c:2189
> > > > >>>> #12 0x00007f23b4c26e15 in _efl_loop_timer_expired_call (obj=obj@entry=0x40000000012d, pd=pd@entry=0x5644411fd460, when=when@entry=436613.23437423998) at ../src/lib/ecore/ecore_timer.c:669
> > > > >>>> #13 0x00007f23b4c26f43 in _efl_loop_timer_expired_timers_call (obj=obj@entry=0x40000000012d, pd=pd@entry=0x5644411fd460, when=436613.23437423998) at ../src/lib/ecore/ecore_timer.c:621
> > > > >>>> #14 0x00007f23b4bf2fae in _ecore_main_loop_iterate_internal (obj=obj@entry=0x40000000012d, pd=pd@entry=0x5644411fd460, once_only=once_only@entry=0) at ../src/lib/ecore/ecore_main.c:2431
> > > > >>>> #15 0x00007f23b4bf383f in _ecore_main_loop_begin (obj=obj@entry=0x40000000012d, pd=pd@entry=0x5644411fd460) at ../src/lib/ecore/ecore_main.c:1231
> > > > >>>> #16 0x00007f23b4bf7e6d in _efl_loop_begin (obj=0x40000000012d, pd=0x5644411fd460) at ../src/lib/ecore/efl_loop.c:57
> > > > >>>> #17 0x00007f23b4bf7233 in efl_loop_begin (obj=0x40000000012d) at src/lib/ecore/efl_loop.eo.c:28
> > > > >>>> #18 0x00007f23b4bf390c in ecore_main_loop_begin () at ../src/lib/ecore/ecore_main.c:1316
> > > > >>>> #19 0x0000564440cb8c50 in main (argc=<optimized out>, argv=<optimized out>) at ../src/bin/e_main.c:1121
> > > > >>>>
> > > > >>>> (gdb) fr 5
> > > > >>>> #5  0x00007f23a57df211 in _ibox_icon_fill (ic=0x5644419a2910) at ../src/modules/ibox/e_mod_main.c:636
> > > > >>>> 636       if ((ic->ibox->inst->ci->show_preview) && (edje_object_part_exists(ic->o_holder, "e.swallow.preview")))
> > > > >>>>
> > > > >>>> (gdb) list
> > > > >>>> 631     }
> > > > >>>> 632
> > > > >>>> 633     static void
> > > > >>>> 634     _ibox_icon_fill(IBox_Icon *ic)
> > > > >>>> 635     {
> > > > >>>> 636       if ((ic->ibox->inst->ci->show_preview) &&
> > > > >>>>           (edje_object_part_exists(ic->o_holder, "e.swallow.preview")))
> > > > >>>> 637         _ibox_icon_fill_preview(ic, EINA_FALSE);
> > > > >>>> 638       else
> > > > >>>> 639         _ibox_icon_fill_icon(ic);
> > > > >>>> 640
> > > > >>>>
> > > > >>>> (gdb) print ic
> > > > >>>> $1 = (IBox_Icon *) 0x5644419a2910
> > > > >>>>
> > > > >>>> (gdb) print *ic
> > > > >>>> $2 = {ibox = 0x564441cc3fe0, o_holder = 0x0, o_icon = 0x0, o_holder2 = 0x0, o_icon2 = 0x0, client = 0x0, drag = {start = 0 '\000', dnd = 0 '\000', x = 0, y = 0, dx = 0, dy = 128}}
> > > > >>>>
> > > > >>>> (gdb) print *(ic->ibox)
> > > > >>>> $3 = {inst = 0x40, o_box = 0xe1, o_drop = 0x564441a499b0, o_drop_over = 0x7f23b4165cb0 <main_arena+304>, o_empty = 0x7474756200726162, ic_drop_before = 0x81646c3698761235, drop_before = 1103904792, icons = 0x0, zone = 0x698761254, dnd_x = 0, dnd_y = 1769170290}
> > > > >>>>
> > > > >>>> (gdb) print *(ic->ibox->inst)
> > > > >>>> Cannot access memory at address 0x40
> > > > >>>> ========================================
> > > > >>>>
> > > > >>>> So somehow we've got some garbage pointer in ic->ibox->inst.
> > > > >>>
> > > > >>> actually.. ic->ibox is junk. it happens to point to some memory we can
> > > > >>> access but it's full of ... garbage. like dnd_y is an unrealistic coord.
> > > > >>> zone does not look like a proper pointer (o_drop does) and o_box is
> > > > >>> nothing like what a pointer should look like. drop_before seems junky
> > > > >>> too. so ... what happened to ic->ibox? or ... for that matter what
> > > > >>> happened to ic? maybe ic has been freed and now the ibox ptr has been
> > > > >>> overwritten to point to some junk as i can't imagine the ibox struct being
> > > > >>> freed as that struct is still there for the ibox gadget. so ...
> > > > >>
> > > > >> Ah I see.
> > > > >> It certainly makes debugging easier if you know what a pointer
> > > > >> is supposed to look like. :-)
> > > > >>
> > > > >>> well turning on ASAN (search enlightenment.org for asan and how to enable
> > > > >>> it) in efl and e would probably instantly point out the problem. you can
> > > > >>> try that as an exercise in being able to divine better debug info from
> > > > >>> efl + e. it's pretty easy now with meson.... :) unlike valgrind it's not
> > > > >>> prohibitively slow either. it's usable day to day on a fast enough
> > > > >>> machine.
> > > > >>
> > > > >> Interesting. Thanks for the pointer to new debugging tools. (And yes,
> > > > >> valgrind is really slow.) I found the documentation you mentioned. I
> > > > >> think I will give it a try before applying your patch, just to see what
> > > > >> happens and to be able to play around with it for a bit.
> > > > >>
> > > > >>> and i can see the problem:
> > > > >>>
> > > > >>> ecore_timer_add(0.1, _ibox_cb_icon_fill_timer, ic);
> > > > >>>
> > > > >>> a timer is created to fill the icon in 0.1 sec... but ... imagine the icon
> > > > >>> (ic) has been freed/deleted BEFORE the timer fires... in 0.1sec from
> > > > >>> now. ... someone added a timer without remembering to delete it when the
> > > > >>> icon the timer is for is deleted! a bit sloppy...
> > > > >>
> > > > >> Bad boy. ;-)
> > > > >>
> > > > >> This means that on my old laptop I never ran into any issues because it
> > > > >> is just too slow for this race condition to occur?
> > > > >>
> > > > >>> d12acf0d01e628d71548adbb77670c7e40aef043 commit in git now fixes that.
> > > > >>> problem is in e ... not efl :)
> > > > >>
> > > > >> Great. Thanks! As said before, I will try to tackle this with ASAN first
> > > > >> for training and then see how your solution is holding up. That will
> > > > >> hopefully be tomorrow.
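
For anyone following along, the dangling-timer pattern raster describes can be sketched in plain C. This is only an illustration under assumptions, not the actual change in commit d12acf0d01e628d71548adbb77670c7e40aef043: `Icon`, `timer_add`, `timer_del`, `timers_run` and the fixed-size timer table are hypothetical stand-ins for `IBox_Icon`, `ecore_timer_add()`, `ecore_timer_del()` and the ecore main loop. The idea is just that the icon remembers its pending timer and cancels it when the icon is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an Ecore_Timer: one pending one-shot callback. */
typedef struct Timer {
    void (*cb)(void *data);
    void *data;
    int   active;
} Timer;

/* Hypothetical stand-in for IBox_Icon; only the fields needed here. */
typedef struct Icon {
    int    filled;
    Timer *fill_timer;   /* remembered so the free path can cancel it */
} Icon;

#define MAX_TIMERS 8
static Timer timers[MAX_TIMERS];

/* Register a callback; returns NULL if the (toy) table is full. */
Timer *timer_add(void (*cb)(void *), void *data)
{
    for (int i = 0; i < MAX_TIMERS; i++) {
        if (!timers[i].active) {
            timers[i] = (Timer){ cb, data, 1 };
            return &timers[i];
        }
    }
    return NULL;
}

/* Cancel a pending timer; safe to call with NULL. */
void timer_del(Timer *t)
{
    if (t) t->active = 0;
}

/* Fire every pending timer once; returns how many callbacks ran. */
int timers_run(void)
{
    int fired = 0;
    for (int i = 0; i < MAX_TIMERS; i++) {
        if (timers[i].active) {
            timers[i].active = 0;   /* one-shot: gone after firing */
            timers[i].cb(timers[i].data);
            fired++;
        }
    }
    return fired;
}

/* Timer callback: only valid while the icon is still alive. */
void icon_fill_cb(void *data)
{
    Icon *ic = data;
    assert(ic != NULL);
    ic->fill_timer = NULL;   /* the timer has fired, nothing left to cancel */
    ic->filled = 1;
}

Icon *icon_new(void)
{
    Icon *ic = calloc(1, sizeof(Icon));
    if (!ic) return NULL;
    ic->fill_timer = timer_add(icon_fill_cb, ic);
    return ic;
}

void icon_free(Icon *ic)
{
    if (!ic) return;
    timer_del(ic->fill_timer);   /* the step the buggy pattern forgets */
    free(ic);
}
```

With this shape, creating an icon and freeing it before the timer fires leaves nothing for `timers_run()` to call, so the callback never sees a freed `ic`. Without the `timer_del()` in `icon_free()`, the callback would run on freed memory, which matches the junk values seen in the gdb dump of `*(ic->ibox)`.
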
> > > > >>
> > > > >> Now to the second point of my first mail from yesterday: Is there any
> > > > >> way for me to disable/silence the error popup on startup that no ACPI
> > > > >> daemon is running?
> > > > >
> > > > > oh yes. install acpid and have it run. :)
> > > > >
> > > > >> Cheers,
> > > > >> Florian
> > > > >>
> > > > >>>> I tried to poke into the preceding frames (#6 and #7) but only hit
> > > > >>>> optimized out variables. This is efl territory, right? This morning I
> > > > >>>> recompiled enlightenment with "-O0 -g" but I guess I should also have
> > > > >>>> done the same to efl. Well, I can do this the next time I'm in office if
> > > > >>>> helpful.
> > > > >>>>
> > > > >>>> Any ideas?
> > > > >>>>
> > > > >>>> For now I gave ibar a try. Not exactly a replacement for me. I don't
> > > > >>>> need a launcher (using everything and favorites menu instead) or a
> > > > >>>> tracker of running windows (I know what windows I have open). I only
> > > > >>>> need something to show my minimized windows so that I can open them
> > > > >>>> again (I know, they appear with Alt+Tab...) and this seems to be the
> > > > >>>> only scenario that cannot be reproduced by ibar. -- I guess I never
> > > > >>>> bought into the MacOS style launcher bar. ;-)
> > > > >>>
> > > > >>> ibar will show both running and minimized icons for windows .. but ok -
> > > > >>> yeah - it doesn't "show only minimized"... :)
> > > > >>>
> > > > >>>> Cheers
> > > > >>>> Florian
> > > > >>>>
> > > > >>>> On 9/4/21 1:25 AM, Carsten Haitzler wrote:
> > > > >>>>> On Fri, 3 Sep 2021 21:04:35 +0900 Florian Schaefer <list...@netego.de>
> > > > >>>>> said:
> > > > >>>>>
> > > > >>>>> quick - if you unload the ibox module ... does the problem stop? that
> > > > >>>>> crash is inside ibox code - memory it's accessing is bad/wrong - why i
> > > > >>>>> don't know. not more information.
> > > > >>>>> line 636 in ibox is:
> > > > >>>>>
> > > > >>>>> if ((ic->ibox->inst->ci->show_preview) &&
> > > > >>>>> (edje_object_part_exists(ic->o_holder, "e.swallow.preview")))
> > > > >>>>>
> > > > >>>>> so what is ic? what is ic->ibox, ic->ibox->inst, ic->ibox->inst->ci ?
> > > > >>>>>
> > > > >>>>> if you attach gdb when e crashes and dump these values - i'd know more.
> > > > >>>>> maybe. I actually stopped using ibox a while ago since ibar does both
> > > > >>>>> effectively these days. perhaps it is an ibox bug and i haven't seen it
> > > > >>>>> as i don't use it. so try the above, if it goes away - attach gdb
> > > > >>>>>
> > > > >>>>> i can say that i don't see the problem here with ibox enabled and on amd
> > > > >>>>> + e (git).
> > > > >>>>>
> > > > >>>>>> Dear everyone,
> > > > >>>>>>
> > > > >>>>>> so I got a new desktop PC at work and the first thing I did, of course,
> > > > >>>>>> was to install Debian sid and enlightenment-git. ;-)
> > > > >>>>>>
> > > > >>>>>> The machine has a Nvidia T600 card and this is where troubles probably
> > > > >>>>>> begin. As I kind of need the graphics performance for CAD I went with
> > > > >>>>>> the drivers from Nvidia (the stock open source drivers were terribly
> > > > >>>>>> slow).
> > > > >>>>>>
> > > > >>>>>> Now what happens is that enlightenment crashes often. Like kind of
> > > > >>>>>> constantly. I got the impression it happens mostly when several windows
> > > > >>>>>> are going through their appearance fade-in transition at the same time.
> > > > >>>>>> Then the "red screen of death" appears and I need to press F1 to
> > > > >>>>>> continue. With some applications this happens always (Eagle anyone?)
> > > > >>>>>> with others only sometimes. After the forced restart many windows (e.g.
> > > > >>>>>> terminology always, firefox sometimes) need to be minimized and
> > > > >>>>>> uncovered again for their content to display again. Some dialog windows
> > > > >>>>>> won't even show their content from the beginning and instead just some
> > > > >>>>>> different portion of the screen. Needless to say that for a machine at
> > > > >>>>>> work this is not an optimal situation.
> > > > >>>>>>
> > > > >>>>>> The most pressing issue is of course the crashes. I recompiled
> > > > >>>>>> everything with debugging symbols and optimization disabled (or at
> > > > >>>>>> least I thought so, some things seem still to be optimized away) to
> > > > >>>>>> get some meaningful dumps. One of which I uploaded to pastebin
> > > > >>>>>> (pastebin.com/YWSarC10) hoping that it makes sense to someone.
> > > > >>>>>>
> > > > >>>>>> I am sure that it is not E that is "at fault" but Nvidia, but for now I
> > > > >>>>>> need to find a way around this so that I can work without having to
> > > > >>>>>> reset everything every five minutes. Any ideas?
> > > > >>>>>>
> > > > >>>>>> Oh, I also tried to disable OpenGL in the compositor settings and
> > > > >>>>>> choosing the software option. And it still crashes!
> > > > >>>>>>
> > > > >>>>>> For starters I was hoping that I can just switch off all the window
> > > > >>>>>> transition-fading eye-candy but I did not understand whether this is
> > > > >>>>>> possible. Is it?
> > > > >>>>>>
> > > > >>>>>> Finally, being a desktop system (my first in like 10 years or so) it
> > > > >>>>>> does not run an acpi daemon. I don't really see any reason to do so.
> > > > >>>>>> Therefore E also complains on every startup that no acpi daemon can be
> > > > >>>>>> found. I did not find any compile time or runtime options to disable
> > > > >>>>>> acpi. Is there a way to silence this error/warning?
> > > > >>>>>>
> > > > >>>>>> Cheers,
> > > > >>>>>> Florian
> _______________________________________________
> enlightenment-users mailing list
> enlightenment-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/enlightenment-users