Hi Arne,

Alright. Any idea what the crash might be related to?

I’ve pushed the marty/call_frames branch now. As mentioned, something breaks when precompiled bytecode is decoded, so many testsuite tests will segfault (since they are precompiled).

Compiling --with-mc-stack-frames and running the very nice Debug.generate_perf_map() (previously implemented by TobiJ) should enable perf to extract what’s needed. I’ve used https://github.com/jrfonseca/gprof2dot and http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html for visualisation.

/Marty

> On 21 Feb 2017, at 20:31, Arne Goedeke <[email protected]> wrote:
>
> Hi Marty,
>
> thanks!
>
> Yes, low_mega_apply still needs to be refactored. It is slightly more
> "complicated" because of APPLY_STACK, where the return value will
> overwrite the function on the stack. I want to fix the last crash in the
> testsuite before refactoring that. If you are interested in working on
> those, just let me know so we don't both do it ;)
>
> Adding more perf support would be great, do you have your code in a
> branch somewhere? I would be interested to have a look at it.
>
> Arne
>
> On 02/20/17 23:47, Martin Karlgren wrote:
>> Hi Arne,
>>
>> That’s awesome!
>>
>> I’d love to help (with the limited spare time I have.) I guess
>> low_mega_apply should be refactored to make use of the new API too?
>>
>> Speaking of faster calls, I’ve incidentally been poking around a bit with
>> machine code function calling conventions lately. For profiling purposes
>> (i.e. Linux perf) I’ve added minimal call frame information to Pike
>> functions in the amd64 machine code generator. I’ve gotten to the point
>> where I can start Roxen and get proper stack traces from perf, but the
>> testsuite still fails – it seems related to decoding of dumped bytecode, and
>> I haven’t been able to sort out why.
>> Anyways, the good thing is that readymade visualisation tools built on perf
>> output can be used to profile Pike code, and the interaction between Pike
>> code and C functions is more apparent.
>> Examples from a very simple Roxen site being hit by apachebench:
>> http://marty.se/dotgraph.png (nodes with a
>> “perf-17628.map” header represent Pike functions)
>> http://marty.se/flamegraph.svg (time on
>> horizontal axis, stack depth on vertical axis).
>>
>> Hopefully this can be used to weed out where we should start looking for
>> optimisation candidates eventually.
>>
>> /Marty
>>
>
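
For context on the “perf-17628.map” header mentioned in the thread: Linux perf resolves addresses in JIT-generated code by reading /tmp/perf-<pid>.map, a text file with one "START SIZE name" entry per line, where START and SIZE are hexadecimal. The following is only a rough sketch of that lookup, assuming Debug.generate_perf_map() writes this standard format. The helper names (parse_perf_map, resolve), the addresses and the symbol names are made up for illustration and are not taken from Pike or perf.

```python
import bisect


def parse_perf_map(text):
    """Parse perf map text into a sorted list of (start, size, name) tuples."""
    entries = []
    for line in text.splitlines():
        parts = line.split(None, 2)  # the symbol name may itself contain spaces
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start = int(parts[0], 16)
        size = int(parts[1], 16)
        entries.append((start, size, parts[2].strip()))
    entries.sort()
    return entries


def resolve(entries, addr):
    """Return the symbol whose [start, start + size) range covers addr, else None."""
    starts = [entry[0] for entry in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None  # addr lies below the first mapped function
    start, size, name = entries[i]
    return name if addr < start + size else None


# Hypothetical map contents: two adjacent machine-code functions.
SAMPLE = """\
7f3a10000000 40 Foo.bar
7f3a10000040 1c Foo.baz
"""

entries = parse_perf_map(SAMPLE)
print(resolve(entries, 0x7F3A10000050))
```

An address that falls outside every listed range resolves to None here, which roughly corresponds to the frames perf cannot attribute to a named function when no map entry covers them.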
