Has anyone tried kcachegrind to speed profile parrot? Based on what the web page http://www.weidendorfers.de/kcachegrind/ says:

"The trace includes the number of instruction/data memory accesses and 1st/2nd level cache misses, and relates it to source lines and functions of the run program (a disadvantage is the slowdown involved in the processor emulation, it's unfortunately around 50 times slower). A patch for valgrind sources (see below) adds call tree tracing, i.e. how the functions call each other and how many events happen while running a function (including all called functions)."

This could be very useful for figuring out where parrot is slow. I've used the basic cachegrind interface to valgrind at work to figure out where I was doing things wrong w.r.t. memory, but kcachegrind is currently somewhat tricky for me, as none of the desktop machines I've got at home is running Linux on x86.

Nicholas Clark