On Fri, 10 Sep 2021 08:28:30 +0900 Florian Schaefer <list...@netego.de> said:
> On Thu, Sep 09, 2021 at 08:32:47AM +0100, Carsten Haitzler wrote:
> > On Thu, 9 Sep 2021 09:20:28 +0900 Florian Schaefer <list...@netego.de> said:
> >
> > > On Wed, Sep 08, 2021 at 11:08:00AM +0100, Carsten Haitzler wrote:
> > > > On Wed, 8 Sep 2021 17:35:12 +0900 Florian Schaefer <list...@netego.de>
> > > > said:
> > > >
> > > > > Seems to me to have been good last words this time. ;) So I am running
> > > > > this all day now and I think I did not have a segfault due to procstat
> > > > > so far. Thanks for the fixes and I like the new indicator icon. :)
> > > > >
> > > > > That being said, I still had some crashes today and I am thinking that
> > > > > perhaps finally I might have something relevant to the topic of this
> > > > > thread. At least it crashes within libnvidia and I do not get an ASAN
> > > > > trace.
> > > > >
> > > > > For what it's worth, I tried to record a trace as well as I can.
> > > > >
> > > > > https://pastebin.com/p41b7GKW
> > > > >
> > > > > This happens reproducibly when I change from X running E to the text
> > > > > console and then back to the graphics screen. (I did quite a lot of
> > > > > these switches lately for running gdb while E is still crashed.) When I
> > > > > have an "empty" E running it is fine. However, as soon as some window
> > > > > is open it reliably segfaults upon returning to X. Any ideas?
> > > >
> > > > time to stop asan and use valgrind. that can at least say if the memory
> > > > nvidia is accessing is beyond some array e provided - the shader flush
> > > > basically has e provide a block of mem containing vertexes etc. for the
> > > > gpu to draw. this array is expanded as new triangles are added then
> > > > flushed to the gpu at some point during rendering. that might be the
> > > > only thing i can think of that might be an efl bug - we use a dud
> > > > pointer? but then you could figure this out from valgrind + gdb...
> > > > maybe.
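For illustration, a minimal C sketch of the kind of vertex block described above: vertices are appended as triangles are added, the block grows with realloc(), and it is later handed off in a flush. This is an assumption, not the actual evas GL engine code; VertexArray, vertex_array_push(), vertex_array_flush() and vertex_array_free() are hypothetical names. The interesting part is the growth path: realloc() may move the block, so a Vertex pointer taken before a push can point into freed memory afterwards. A read through such a stale pointer is the sort of thing valgrind typically reports as an invalid read inside a block free'd by realloc(), which could help tell an efl bug apart from a driver bug.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical vertex type and growable array; NOT the evas GL engine code. */
typedef struct
{
   float x, y;
} Vertex;

typedef struct
{
   Vertex *data;   /* block that would be handed to the GL driver */
   size_t  count;  /* vertices currently queued */
   size_t  alloc;  /* vertices the block can hold */
} VertexArray;

/* Append one vertex, doubling the block when it is full.
 * Returns 0 on success, -1 if realloc() fails (array left unchanged). */
static int
vertex_array_push(VertexArray *va, float x, float y)
{
   assert(va != NULL);
   if (va->count == va->alloc)
     {
        size_t  new_alloc = va->alloc ? va->alloc * 2 : 16;
        Vertex *tmp = realloc(va->data, new_alloc * sizeof(Vertex));

        if (!tmp) return -1;
        /* realloc() may have moved the block: any Vertex * obtained before
         * this point may now point into freed memory. Reading through such
         * a stale ("dud") pointer is what valgrind would flag as an invalid
         * read inside a block free'd by realloc(). */
        va->data = tmp;
        va->alloc = new_alloc;
     }
   va->data[va->count].x = x;
   va->data[va->count].y = y;
   va->count++;
   return 0;
}

/* Stand-in for the shader flush: a real engine would pass va->data and
 * va->count to the driver here. Returns how many vertices were flushed and
 * keeps the allocation for the next batch. */
static size_t
vertex_array_flush(VertexArray *va)
{
   size_t n = va->count;

   va->count = 0;
   return n;
}

static void
vertex_array_free(VertexArray *va)
{
   free(va->data);
   va->data = NULL;
   va->count = 0;
   va->alloc = 0;
}
```

Zero-initialise a VertexArray, push vertices, flush per batch and free at the end; after any push, index through va->data again instead of reusing an older pointer.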
> > > > valgrind would see the errant pointer and perhaps if it's just
> > > > beyond some other block of mem or if that block was freed recently etc.
> > >
> > > So there are things that valgrind can do that asan cannot. More stuff to
> > > learn. :)
> >
> > Yeah. Valgrind is actually a cpu interpreter. it literally interprets every
> > instruction and while doing that tracks memory state. it also traps
> > malloc/free and so on too and tracks what memory has been allocated, freed
> > down to the byte, if it has been written to or not etc. - doing all of this
> > it can see every issue. if there is no debug info it can't tell you more
> > than "code in this library causes problem X", but with full gdb debug info
> > it can use that memory address to tell you the file, line number, function
> > name and so on too. This is why valgrind is slow. it's literally
> > interpreting everything a process under valgrind does.
> >
> > Asan has the compiler do the above instead. So when the compiler generates
> > the binary code for an application or library, it ADDS code that runs
> > natively that does tracking. This means that simple instructions that just
> > do add/sub/compare etc. just get generated as normal. instructions that
> > access memory get tracking code added like valgrind. this means only the
> > code that the compiler generates will get tracked (e.g. efl and
> > enlightenment), and other code that efl calls (stuff in libc, libjpeg,
> > opengl libs etc.) will not be. this is a major difference in design and
> > makes asan massively faster. it's actually usable day to day on a decently
> > fast machine. it does mean e uses a lot more memory as asan needs extra
> > memory in the process to do the tracking of every byte and its history and
> > it does need to execute more instructions whenever reading/writing to some
> > memory etc. ...
> > but not all the code your cpu runs will have this extra work, because
> > it's only these actions, and any libraries called that were not built
> > with asan will also not do this extra work. thus - asan can't find
> > anything in a library you did not build with asan support. thus sometimes
> > you still have to pull out ye-olde valgrind. valgrind is an amazing tool.
> > it's just slow. if you seem to have issues in e/efl the first port of call
> > is to try asan. it's fast enough to run day to day and not very intrusive
> > in that you can rebuild efl+e and then just ctrl+alt+end to restart e and
> > presto - asan is on. as long as you have set up a proper ASAN_OPTIONS
> > env var beforehand ... also i suggest you:
> >
> > export EINA_FREEQ_TOTAL_MAX=0
> > export EINA_FREEQ_MEM_MAX=0
> > export EINA_FREEQ_FILL_MAX=0
> >
> > as well. this may make e/efl a little more crashy and will also remove a
> > minor optimization (freeq is a ... free queue - it takes things that need
> > to be freed and adds them to a queue to free some time later. freeq will
> > collect things to free up until some limit. when items are added to the
> > queue it will fill their memory with some pattern like 0x555555 or
> > 0x777777 etc. - or rather the first N bytes of that memory object - and
> > then when it actually does the free later it will check that the pattern
> > is still there. if it's not, something wrote to that memory that SHOULD
> > have been left alone as the object was queued to be freed - it can give
> > you an indication that something is wrong but not exactly where). as
> > freeq waits until the app is idle (has nothing to do but wait for input or
> > things to happen), it then runs through the queue freeing objects, so
> > avoiding the work of the free until then. it's an efl self-check mechanism
> > put in to hunt down bugs and get a little optimization in return for the
> > extra work it has to do.
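The free queue described above could be modelled roughly like this. It is a simplified, hypothetical sketch, not Eina's actual free queue: FreeQ, fq_add(), fq_flush(), the 32-byte fill limit and the 0x77 fill byte are invented for illustration, and max_items only loosely corresponds to EINA_FREEQ_TOTAL_MAX (there is no memory-size limit here). Memory is filled with a pattern when it is queued, the pattern is checked when it is really freed, and max_items == 0 frees immediately, which mirrors what the EINA_FREEQ_*=0 exports above are meant to do.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical free queue, NOT Eina's implementation. */
#define FQ_CAPACITY 64    /* hard upper bound on queued items */
#define FQ_FILL_MAX 32    /* fill at most this many bytes per item */
#define FQ_PATTERN  0x77  /* fill byte written on enqueue */

typedef struct
{
   void  *ptr;
   size_t size;
} FqItem;

typedef struct
{
   FqItem items[FQ_CAPACITY];
   size_t count;      /* items waiting to be freed */
   size_t max_items;  /* 0 = free immediately (queue disabled) */
   size_t corrupted;  /* items whose fill pattern was overwritten */
} FreeQ;

static size_t
fq_fill_len(size_t size)
{
   return size < FQ_FILL_MAX ? size : FQ_FILL_MAX;
}

/* Really free everything queued, checking each fill pattern first.
 * A real main loop would call this when idle. */
static void
fq_flush(FreeQ *fq)
{
   for (size_t i = 0; i < fq->count; i++)
     {
        const unsigned char *p = fq->items[i].ptr;
        size_t n = fq_fill_len(fq->items[i].size);

        for (size_t j = 0; j < n; j++)
          {
             if (p[j] != FQ_PATTERN)
               {
                  /* something wrote to "freed" memory: we know that it
                   * happened, but not which code did it */
                  fq->corrupted++;
                  break;
               }
          }
        free(fq->items[i].ptr);
     }
   fq->count = 0;
}

/* Queue ptr (size bytes) for a deferred free. */
static void
fq_add(FreeQ *fq, void *ptr, size_t size)
{
   if (!ptr) return;
   if (fq->max_items == 0)
     {
        free(ptr);  /* queue disabled: free at once */
        return;
     }
   memset(ptr, FQ_PATTERN, fq_fill_len(size));
   if (fq->count >= fq->max_items || fq->count >= FQ_CAPACITY)
     fq_flush(fq);  /* limit reached: drain before queueing more */
   fq->items[fq->count].ptr = ptr;
   fq->items[fq->count].size = size;
   fq->count++;
}
```

Here fq_flush() is called by hand; in a real main loop it would run from an idle handler. The check only catches writes into the filled prefix, and it only reports that a write happened, not where it came from, which is why disabling the queue and letting asan or valgrind see the free immediately gives more precise results.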
> > by setting the above to zero you basically disable freeq and force
> > it to free immediately, which is what you want for both valgrind and asan
> > so they detect the problems right. note efl knows when it runs under
> > valgrind and auto disables freeq on its own. but with asan, it does not.
> >
> > i hope that helps explain the above (roughly - i glossed over a lot of
> > details to make it easier to explain in a short amount of time)
>
> Ahm, yeah, thanks for the explanations. I wasn't expecting such a ...
> verbose ... reply. But it is appreciated. Even though I did probably not
> fully understand everything I now see that valgrind is more than meets
> the eye and that the same is true for eina. ;)
>
> > > Anyway, I tried to follow the debugging instructions on E.org as well as
> > > I can (after having finally recompiled everything without asan, but
> > > leaving the debugging symbols in place).
> > >
> > > Three observations:
> > >
> > > 1. The valgrind option --db-attach seems to have been deprecated since
> > > 2015 and is not available any more. So I just omitted this. I hope
> > > that's fine.
> >
> > i know. :( you now need a separate shell running gdb to attach gdb to the
> > process then tell it to run. painful. :(
> >
> > > 2. Then I tried to use the ".xinitrc-debug" method. Upon starting E the
> > > startup apparently went into an infinite loop, generating pages and
> > > pages of valgrind and E startup messages (a few valgrind messages with
> > > something-something exiting 0) and generating many 120MB core dumps. So
> > > I never got to the point where I would actually get anything but a black
> > > screen from X.
> >
> > aaah with valgrind you probably want to bypass enlightenment_start - this
> > means any issue will drop you out of your login session but you will have a
> > chance to debug it. to avoid enlightenment_start do:
> >
> > export E_START=1
> > valgrind --tool=memcheck ... enlightenment
> >
> >
> > FYI when i valgrind i do:
> >
> > valgrind --suppressions=$HOME/.zsh/vgd.supp --tool=memcheck --num-callers=64 \
> >   --show-reachable=no --read-var-info=yes --leak-check=yes \
> >   --leak-resolution=high \
> >   --undef-value-errors=yes --track-origins=yes --vgdb-error=0 --vgdb=full \
> >   --redzone-size=512 --freelist-vol=100000000
> >
> > :) the suppressions file is a file i keep to tell valgrind to ignore that
> > issue - e.g. it's a common optimization in libc or freetype or something
> > that it should just pretend is not an issue. you can drop that option
> > because you won't maintain that file and that file is highly system
> > specific.
>
> Hmm, this valgrind stuff is more difficult than I expected. First I was
> struggling to get the X server and enlightenment to start properly. I
> finally settled on just creating the .xinitrc and letting the rest be
> sorted out by startx.
>
> But then, again, if I just start enlightenment without valgrind it
> works. With valgrind enabled everything stops at a black screen and the
> only way to get a responsive interface again is to reboot the machine.
>
> So here's what I do: https://pastebin.com/yzhy4gj1
>
> The first part shows my .xinitrc. At the end you see two alternative
> exec commands. The one with valgrind causes everything to hang. The one
> without works just fine.
>
> Even though with valgrind enabled I cannot really do anything, at least
> there is still heaps of stuff in the logfile, so that output is also
> included. Many "lost bytes" (not really dangerous, right?) and an
> unhandled instruction in e_comp_x_randr.c.

Hmmm. unhandled instruction. that means your compiler is outputting
instructions valgrind does not know how to interpret. e.g. it is optimizing
for a newer x86 instruction set extension. you might want to compile with
something very conservative in CFLAGS (e.g. -march=x86-64 rather than
-march=native; the old -mpentium flag is no longer accepted by current gcc).
you also might want to avoid --trace-children=yes if you are running
enlightenment directly (avoiding enlightenment_start).
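One way to see whether a given set of CFLAGS lets the compiler emit newer instruction set extensions is to look at the macros GCC and Clang predefine for the target, such as __SSE4_2__, __AVX__, __AVX2__ and __AVX512F__. The small sketch below (targeted_isa_extensions() is a hypothetical helper, not part of efl) reports which of a few extensions were enabled when it was compiled; building it once with -march=native and once with -march=x86-64 shows the difference. Which extensions valgrind can interpret depends on its version, and it has historically lagged behind new ones such as AVX-512, so a shorter list is the safer choice for a valgrind build. This only says something about code compiled with those flags, not about prebuilt libraries.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns a space-separated list of the optional x86
 * extensions the compiler was allowed to target for THIS translation unit,
 * based on GCC/Clang predefined macros. It says nothing about the CPU the
 * program later runs on, only about what the compiler may emit. */
static const char *
targeted_isa_extensions(void)
{
   static char buf[64];

   buf[0] = '\0';
#ifdef __SSE4_2__
   strcat(buf, "sse4.2 ");
#endif
#ifdef __AVX__
   strcat(buf, "avx ");
#endif
#ifdef __AVX2__
   strcat(buf, "avx2 ");
#endif
#ifdef __AVX512F__
   strcat(buf, "avx512f ");
#endif
   /* the longest possible result ("sse4.2 avx avx2 avx512f ") fits easily */
   assert(strlen(buf) < sizeof(buf));
   return buf;
}
```

For example, print the result of targeted_isa_extensions() from a throwaway test program built with the same CFLAGS as efl and e; an empty string means none of these extensions were enabled (as is also the case on non-x86 targets).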
> Cheers
> Florian
>
> > > 3. Then I tried it again, removing from the .xinitrc-debug script all
> > > options from valgrind but the --tool=memcheck one, thus being closer to
> > > the first example of using valgrind. This caused a complete lockup of my
> > > computer and my only rescue was a reboot via SysRq.
> > >
> > > I guess I will have to try this again with a somewhat different
> > > approach...
> > >
> > > Cheers,
> > > Florian
> > >
> > > PS: Can I hijack this thread to quickly paste an eina trace I get all
> > > the time when opening everything? ;) https://pastebin.com/rvupgMcx
> >
> _______________________________________________
> enlightenment-users mailing list
> enlightenment-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/enlightenment-users
>

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
Carsten Haitzler - ras...@rasterman.com