[develnytprof-dev: 1305] How the sub profiler works, for the record, plus a fix (was: Completely confused ...)

Tim Bunce Fri, 03 Jul 2009 08:47:56 -0700

[Best viewed widescreen in a fixed width font :-]

On Thu, Jul 02, 2009 at 08:22:34PM -0500, Elliot Shank wrote:
> 
> When running this before, I just scanned the top of the list and
> didn't look further down.  But when I got to editing the report to
> anonymize it, I noticed the last subroutine in the list has a negative
> exclusive time equal in magnitude to the exclusive time of the
> subroutine under discussion; the former is called only by the latter.
> So, I did a second run after the first hack of NYTProf.xs with this
> other subroutine traced as well; this is the version of the
> incr_sub_inclusive_time() trace that you got (i.e. the report and the
> trace are not for the same run).


I realized afterwards you could have just grepped the trace log.
To make that easier I've checked in a change to add a log=... option.

> Once I noticed interesting_negative_time_sub(), I started digging
> down.  The exclusive subroutine time in question is 422s.  Further
> down the call tree there's
> 
>     "spent 202s making 232259 calls to Some::Moose::Object::new, avg
>     870µs/call"
> 
> the_weird_subroutine() does a bunch of setup before and cleanup after
> calling interesting_negative_time_sub().

> During the test, there are a couple of cases where an exception will
> be caught by the eval above.  None of those are from an xsubs or from
> Moose generated methods (though they do come from methods on Moose
> objects).

Well this is turning out to be a challenging puzzle! :)

I've written out a lot of detail here partly to help me think it through
(which is why this reads as a stream of thought), partly to help explain
a tricky part of the innards, and partly because I still might not have
the right answer... :)

The sub profiler logic is roughly:

 * Before the call:
        record current time
        record current cumulative_subr_secs as initial_subr_secs
        record current cumulative_overhead_ticks as initial_overhead_ticks

 * For perl subs, before the first statement is executed, a destructor
    is pushed onto the stack that will invoke the following logic when
    the sub call returns, by whatever means.
   For xsubs, we don't get control until the xsub has returned,
    in which case we execute the following logic at once:

 * When the call returns:
        overhead_ticks  = cumulative_overhead_ticks - initial_overhead_ticks
        called_sub_secs = cumulative_subr_secs - initial_subr_secs
        incl_subr_sec = (time since call) - overhead_ticks
        excl_subr_sec = incl_subr_sec - called_sub_secs
   Then:
        cumulative_subr_secs += excl_subr_sec

A problem with xsub calls is that we don't get control until after the
xsub has returned. If it exits via an exception then we don't get
control at all and the call is 'lost'.

Here's the key part of the trace of sub exits (edited for clarity):

         A                  B              C                 D       E        B 
       G       H       I         K  L
negsub   4.044777s incl -   3.989190s =    0.05s excl (sub 111.963-107.974=   
3.98, oh 8.7e+08-8.3e+08=3.0e+07t) d1 @568:337
oddsub   4.851043s incl -   4.850924s =    0.00s excl (sub 112.019-107.168=   
4.85, oh 8.7e+08-8.3e+08=3.5e+07t) d1 @568:224
negsub  14.143920s incl -  13.951861s =    0.19s excl (sub 127.769-113.817=  
13.95, oh 9.9e+08-8.8e+08=1.1e+08t) d1 @568:337
oddsub  15.940870s incl -  15.940748s =    0.00s excl (sub 127.961-112.02 =  
15.94, oh 9.9e+08-8.7e+08=1.2e+08t) d1 @568:224
negsub 177.096818s incl - 248.946867s = -426.00s excl (sub 410.867-161.92 = 
248.94, oh 3.2e+09-1.2e+09=1.9e+09t) d1 @568:337
oddsub 293.834499s incl --143.138517s =  436.90s excl (sub 
-15.176-127.962=-143.19, oh 3.2e+09-9.9e+08=-2.e+09t) d1 @568:224
negsub   0.064451s incl -   0.064371s =    0.00s excl (sub 520.538-520.474=   
0.06, oh 3.2e+09-3.2e+09=70860t)   d1 @568:337
oddsub   0.065378s incl -   0.065252s =    0.00s excl (sub 520.539-520.473=   
0.06, oh 3.2e+09-3.2e+09=80430t)   d1 @1:123

The first sign of trouble is B being greater than A on the 5th line.
B is derived from D and E. E looks consistent. D looks high.

Here's a key to the columns:

A: Inclusive time in subroutine and the subroutine it called
    incl_subr_sec <= (time_now-initial_call_time) - 
(overhead_ticks/ticks_per_sec)

B: Time spent in subs it called (current cumulative 'time in subs' - initial 
value)
    called_sub_secs <= cumulative_subr_secs D - initial_subr_secs E

C: Time spent in this sub minus Time spent in subs it called
    excl_subr_sec <= incl_subr_sec A - called_sub_secs B

D: Current cumulative exclusive subroutine time
    cumulative_subr_secs <= [ += excl_subr_sec C after each sub call returns ]

E: Initial cumulative exclusive subroutine time when sub was called
    initial_subr_secs <= [ value of cumulative_subr_secs D at when sub was 
called ]

G: cumulative_overhead_ticks  <= cumulative statement profiler overheads
H: initial_overhead_ticks     <= cumulative_overhead_ticks when sub was called
I: overhead_ticks             <= cumulative_overhead_ticks G - 
initial_overhead_ticks H
K: call_depth                 <= recursion depth (was broken till r795)
L: fid_line                   <= file id and line where sub was called from

D (cumulative_subr_secs) is accumulated from the C (excl_subr_sec) of
each subroutine as it exits. For D to be too high a previous sub call
must have had excl_subr_sec (C) too high. For C to be too high

I suspect the problem is exceptions being propagated out of xsub calls.
For example:

    perl sub -> xsub -> perl code that croaks

My hypothesis is that the problem is caused by the exit of the xsub being
'lost' (due to the way xsubs are profiled, missing them if they croak).

If the sub that called the xsub doesn't catch the exception then it'll
exit triggering the profiler to complete the profile of that sub call.
The calculation will be missing the time spent in the xsub because
cumulative_subr_secs didn't get incremented at the end of the xsub call.
So called_sub_secs will be too low, making excl_subr_sec too high.
But that doesn't fit the data.

So what happens if the sub that called the xsub does catch the
exception? In that case it carries on working, making calls etc.
Eventually it'll exit and the same logic will apply. So that doesn't fit
either.

There's another puzzle in the overhead figures. The statement profiler
measures time at the start and end of processing for each statement (so
time spent in the profiler, writing to the file for example, is excluded.)
It accumulates the measured overhead into the cumulative_overhead_ticks
variable. The subroutine profiler snapshots that value on subroutine
entry (into initial_overhead_ticks) so that on exit it can measure the
growth (cumulative_overhead_ticks-initial_overhead_ticks) and subtract it.

So cumulative_overhead_ticks should always be growing and
initial_overhead_ticks should always be <= cumulative_overhead_ticks.
But something odd happened:

    G             H               I
oh  8.7002e+08 -  8.39972e+08 =   3.0050e+07t
oh  8.7002e+08 -  8.34367e+08 =   3.5657e+07t
oh  9.9385e+08 -  8.82582e+08 =   1.1127e+08t
oh  9.9385e+08 -  8.70024e+08 =   1.2382e+08t
oh 32.1610e+08 - 12.38190e+08 =  19.7791e+08t
oh 32.1610e+08 -  9.93854e+08 = -21.4748e+08t

That last line smells of an integer overflow to me. The large negative
overhead would have added 214s to A (incl_subr_sec). That seems to fit
what happend on line 6 of the trace, but I'm not sure if it sheds any
light on line 5 and I'm way over time on this.

This looks like the culprit for the overhead overflow:

-    NV overhead_ticks = (int)(cumulative_overhead_ticks - 
sub_call_start->current_overhead_ticks);
+    NV overhead_ticks = cumulative_overhead_ticks - 
sub_call_start->initial_overhead_ticks;

I've no idea why that int cast was in there. It would be most likely to
affect a user with a 32-bit perl, on a system with posix realtime clock
(and thus a 10ns tick interval) profiling a long running process.

I'm guessing that's a good description of you :-)

I've checked in a bunch of changes, including this one and some others
that may also be bug fixes.

So, can you give r796 a whirl and let us know how it goes?

Tim.

--~--~---------~--~----~------------~-------~--~----~
You've received this message because you are subscribed to
the Devel::NYTProf Development User group.

Group hosted at:  http://groups.google.com/group/develnytprof-dev
Project hosted at:  http://perl-devel-nytprof.googlecode.com
CPAN distribution:  http://search.cpan.org/dist/Devel-NYTProf

To post, email:  [email protected]
To unsubscribe, email:  [email protected]
-~----------~----~----~----~------~----~------~--~---

[develnytprof-dev: 1305] How the sub profiler works, for the record, plus a fix (was: Completely confused ...)

Reply via email to