I am not quite sure, since I have not had the time to look into it yet. My feeling is that callsite_reset() is currently broken, probably when trampolines are used. It's probably easy to fix. I was also planning to write a couple of tests which try to cover all possible paths of the function call API. Having to run the full testsuite can be quite annoying.
I also started adding some benchmarks for function calls to the pike-benchmark repo. That might make it easier to tweak specific optimizations.

Arne

On 02/21/17 22:12, Martin Karlgren wrote:
> Hi Arne,
>
> Alright. Any idea what the crash might be related to?
>
> I’ve pushed the marty/call_frames branch now. As mentioned, something breaks
> when precompiled bytecode is decoded, so many testsuite tests will segfault
> (since they are precompiled).
>
> Compiling --with-mc-stack-frames and running the very nice
> Debug.generate_perf_map() (previously implemented by TobiJ) should enable
> perf to extract what’s needed. I’ve used
> https://github.com/jrfonseca/gprof2dot and
> http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html for
> visualisation.
>
> /Marty
>
>> On 21 Feb 2017, at 20:31 , Arne Goedeke <[email protected]> wrote:
>>
>> Hi Marty,
>>
>> thanks!
>>
>> Yes, low_mega_apply still needs to be refactored. It is slightly more
>> "complicated" because of APPLY_STACK, where the return value will
>> overwrite the function on the stack. I want to fix the last crash in the
>> testsuite before refactoring that. If you are interested in working on
>> those, just let me know so we don't both do it ;)
>>
>> Adding more perf support would be great, do you have your code in a
>> branch somewhere? I would be interested to have a look at it.
>>
>> Arne
>>
>> On 02/20/17 23:47, Martin Karlgren wrote:
>>> Hi Arne,
>>>
>>> That’s awesome!
>>>
>>> I’d love to help (with the limited spare time I have.) I guess
>>> low_mega_apply should be refactored to make use of the new API too?
>>>
>>> Speaking of faster calls, I’ve incidentally been poking around a bit with
>>> machine code function calling conventions lately. For profiling purposes
>>> (i.e. Linux perf) I’ve added minimal call frame information to Pike
>>> functions in the amd64 machine code generator. I’ve gotten to the point
>>> where I can start Roxen and get proper stack traces from perf, but the
>>> testsuite still fails – it seems related to decoding of dumped bytecode,
>>> and I haven’t been able to sort out why.
>>> Anyways, the good thing is that readymade visualisation tools built on perf
>>> output can be used to profile Pike code, and the interaction between Pike
>>> code and C functions is more apparent.
>>> Examples from a very simple Roxen site being hit by apachebench:
>>> http://marty.se/dotgraph.png (nodes with a
>>> “perf-17628.map” header represent Pike functions)
>>> http://marty.se/flamegraph.svg (time on
>>> horizontal axis, stack depth on vertical axis).
>>>
>>> Hopefully this can be used to weed out where we should start looking for
>>> optimisation candidates eventually.
>>>
>>> /Marty
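
For readers unfamiliar with the "perf-17628.map" files mentioned in the thread: perf resolves addresses in runtime-generated machine code by reading a plain-text file named perf-<pid>.map (normally in /tmp), with one line per function giving the start address and size in hex (no 0x prefix) followed by the symbol name. The following is only a rough sketch of that file format in Python. It is not how Debug.generate_perf_map() is implemented in Pike, and the helper names, addresses and function names are made up for illustration.

```python
import os
import tempfile


def perf_map_line(start, size, name):
    """Format one map entry: hex start, hex size (no 0x prefix), symbol name."""
    return f"{start:x} {size:x} {name}\n"


def write_perf_map(entries, pid=None, directory="/tmp"):
    """Write (start, size, name) tuples to <directory>/perf-<pid>.map.

    Hypothetical helper: perf looks for the file in /tmp by default, and
    `directory` is only a parameter so the sketch can be tried elsewhere.
    """
    pid = os.getpid() if pid is None else pid
    path = os.path.join(directory, f"perf-{pid}.map")
    with open(path, "w") as f:
        # Sort by start address so the file is easy to read.
        for start, size, name in sorted(entries):
            f.write(perf_map_line(start, size, name))
    return path


if __name__ == "__main__":
    # Made-up addresses and names, standing in for JIT-compiled Pike functions.
    entries = [
        (0x7F00DEAD0040, 0x80, "Example.handle_request"),
        (0x7F00DEAD0000, 0x40, "Example.parse_header"),
    ]
    with tempfile.TemporaryDirectory() as d:
        path = write_perf_map(entries, pid=17628, directory=d)
        with open(path) as f:
            print(f.read(), end="")
```

With a map file like this in place for the profiled process, perf can label samples that land in generated code with the listed names instead of raw addresses, which is what makes the Pike functions show up in the graphs linked above.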
