Re: Faster calls (again)

Arne Goedeke Tue, 21 Feb 2017 11:31:48 -0800

Hi Marty,

thanks!


Yes, low_mega_apply still needs to be refactored. It is slightly more
"complicated" because of APPLY_STACK, where the return value will
overwrite the function on the stack. I want to fix the last crash in the
testsuite before refactoring that. If you are interested in working on
those, just let me know so we don't both do it ;)

Adding more perf support would be great. Do you have your code in a
branch somewhere? I would be interested to have a look at it.

Arne

On 02/20/17 23:47, Martin Karlgren wrote:
> Hi Arne,
> 
> That’s awesome!
> 
> I’d love to help (with the limited spare time I have). I guess low_mega_apply 
> should be refactored to make use of the new API too?
> 
> Speaking of faster calls, I’ve incidentally been poking around a bit with 
> machine code function calling conventions lately. For profiling purposes 
> (i.e. Linux perf) I’ve added minimal call frame information to Pike functions 
> in the amd64 machine code generator. I’ve gotten to the point where I can 
> start Roxen and get proper stack traces from perf, but the testsuite still 
> fails – it seems related to decoding of dumped bytecode, and I haven’t been 
> able to sort out why.
> Anyways, the good thing is that readymade visualisation tools built on perf 
> output can be used to profile Pike code, and the interaction between Pike 
> code and C functions is more apparent.
> Examples from a very simple Roxen site being hit by apachebench:
> http://marty.se/dotgraph.png (nodes with a “perf-17628.map” header
> represent Pike functions)
> http://marty.se/flamegraph.svg (time on horizontal axis, stack depth on
> vertical axis).
> 
> Hopefully this can be used to weed out where we should start looking for 
> optimisation candidates eventually.
> 
> /Marty
>
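
For readers wondering about the "perf-17628.map" header above: perf resolves addresses in JIT-generated code through a plain text file named /tmp/perf-<pid>.map, one symbol per line in the form "START SIZE name", with START and SIZE in hex without a 0x prefix. Below is a minimal sketch of how a code generator could register a function there. It is not Marty's code and not taken from the Pike sources; the function names format_perf_map_line and perf_map_register are hypothetical. It also only covers symbol names. Getting full stack traces needs the call frame information Marty mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: format one perf map entry into buf.
 * The perf map format is "START SIZE name\n", where START and SIZE
 * are hex numbers without a 0x prefix.
 * Returns the length snprintf would have written (excluding the NUL). */
int format_perf_map_line(char *buf, size_t buflen,
                         uintptr_t start, size_t size, const char *name)
{
    assert(buf != NULL && name != NULL);
    return snprintf(buf, buflen, "%lx %lx %s\n",
                    (unsigned long)start, (unsigned long)size, name);
}

/* Hypothetical helper: append one entry to /tmp/perf-<pid>.map for
 * the current process. Returns 0 on success, -1 on failure. */
int perf_map_register(uintptr_t start, size_t size, const char *name)
{
    char path[64];
    char line[512];
    FILE *f;
    int n;

    snprintf(path, sizeof path, "/tmp/perf-%ld.map", (long)getpid());

    n = format_perf_map_line(line, sizeof line, start, size, name);
    if (n < 0 || (size_t)n >= sizeof line)
        return -1;              /* formatting failed or name too long */

    f = fopen(path, "a");       /* append: one line per generated function */
    if (f == NULL)
        return -1;
    if (fputs(line, f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

A generator would call perf_map_register(address, length, "Some.pike_function") once for each function it emits, so that perf report can show the name instead of a raw address.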
