Re: Faster calls (again)

Arne Goedeke Wed, 22 Feb 2017 00:37:50 -0800

I am not quite sure, since I have not had the time to look into it yet.
My feeling is that callsite_reset() is currently broken, probably when
trampolines are used. It's probably easy to fix. I was also planning to
write a couple of tests which try to cover all possible paths of the
function call API. Having to run the full testsuite can be quite annoying.


I also started adding some benchmarks for function calls to the
pike-benchmark repo. That might make it easier to tweak specific
optimizations.

Arne

On 02/21/17 22:12, Martin Karlgren wrote:
> Hi Arne,
> 
> Alright. Any idea what the crash might be related to?
> 
> I’ve pushed the marty/call_frames branch now. As mentioned, something breaks 
> when precompiled bytecode is decoded, so many testsuite tests will segfault 
> (since they are precompiled).
> 
> Compiling --with-mc-stack-frames and running the very nice 
> Debug.generate_perf_map() (previously implemented by TobiJ) should enable 
> perf to extract what’s needed. I’ve used 
> https://github.com/jrfonseca/gprof2dot and 
> http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html for 
> visualisation.
> 
> /Marty
> 
>> On 21 Feb 2017, at 20:31 , Arne Goedeke <[email protected]> wrote:
>>
>> Hi Marty,
>>
>> thanks!
>>
>> Yes, low_mega_apply still needs to be refactored. It is slightly more
>> "complicated" because of APPLY_STACK, where the return value will
>> overwrite the function on the stack. I want to fix the last crash in the
>> testsuite before refactoring that. If you are interested in working on
>> those, just let me know so we don't both do it ;)
>>
>> Adding more perf support would be great, do you have your code in a
>> branch somewhere? I would be interested to have a look at it.
>>
>> Arne
>>
>> On 02/20/17 23:47, Martin Karlgren wrote:
>>> Hi Arne,
>>>
>>> That’s awesome!
>>>
>>> I’d love to help (with the limited spare time I have). I guess 
>>> low_mega_apply should be refactored to make use of the new API too?
>>>
>>> Speaking of faster calls, I’ve incidentally been poking around a bit with 
>>> machine code function calling conventions lately. For profiling purposes 
>>> (i.e. Linux perf) I’ve added minimal call frame information to Pike 
>>> functions in the amd64 machine code generator. I’ve gotten to the point 
>>> where I can start Roxen and get proper stack traces from perf, but the 
>>> testsuite still fails – it seems related to decoding of dumped bytecode, 
>>> and I haven’t been able to sort out why.
>>> Anyways, the good thing is that readymade visualisation tools built on perf 
>>> output can be used to profile Pike code, and the interaction between Pike 
>>> code and C functions is more apparent.
>>> Examples from a very simple Roxen site being hit by apachebench:
>>> http://marty.se/dotgraph.png (nodes with a 
>>> “perf-17628.map” header represent Pike functions)
>>> http://marty.se/flamegraph.svg (time on 
>>> horizontal axis, stack depth on vertical axis).
>>>
>>> Hopefully this can be used to weed out where we should start looking for 
>>> optimisation candidates eventually.
>>>
>>> /Marty
>>>
>>
> 
>
