Hi Marty, thanks!
Yes, low_mega_apply still needs to be refactored. It is slightly more
"complicated" because of APPLY_STACK, where the return value will overwrite
the function on the stack. I want to fix the last crash in the testsuite
before refactoring that. If you are interested in working on those, just let
me know so we don't both do it ;)

Adding more perf support would be great. Do you have your code in a branch
somewhere? I would be interested to have a look at it.

Arne

On 02/20/17 23:47, Martin Karlgren wrote:
> Hi Arne,
>
> That’s awesome!
>
> I’d love to help (with the limited spare time I have.) I guess low_mega_apply
> should be refactored to make use of the new API too?
>
> Speaking of faster calls, I’ve incidentally been poking around a bit with
> machine code function calling conventions lately. For profiling purposes
> (i.e. Linux perf) I’ve added minimal call frame information to Pike functions
> in the amd64 machine code generator. I’ve gotten to the point where I can
> start Roxen and get proper stack traces from perf, but the testsuite still
> fails – it seems related to decoding of dumped bytecode, and I haven’t been
> able to sort out why.
> Anyways, the good thing is that ready-made visualisation tools built on perf
> output can be used to profile Pike code, and the interaction between Pike
> code and C functions is more apparent.
> Examples from a very simple Roxen site being hit by apachebench:
> http://marty.se/dotgraph.png (nodes with a
> “perf-17628.map” header represent Pike functions)
> http://marty.se/flamegraph.svg (time on
> horizontal axis, stack depth on vertical axis).
>
> Hopefully this can be used to weed out where we should start looking for
> optimisation candidates eventually.
>
> /Marty
>
