I've had similarly happy experiences with the profiler when working on Redex's pattern matcher. It has been great. (+1 on the sorting thing.)
Thanks, Eli. Robby On Sat, Mar 17, 2012 at 7:26 PM, Danny Yoo <[email protected]> wrote: > Here's a small running "internal dialogue" to demonstrate how the use > of a profiler can help detect unexpected performance problems. > > --- > > I was trying to trace a compilation performance issue in Whalesong. > One of the things I've added to help me chase this is an internal > debugging flag "--enable-profiling", that automatically runs Racket's > statistical profiler as it's doing its work. > > > So I run Whalesong on one of my programs and watch what comes out of > the profiler. From earlier discussion, I run Racket without the JIT > so that the profiler presents more useful information. > > ###### > $ racket --no-jit ../../whalesong.rkt build --enable-profiling nucleic2.rkt > ###### > > ... lots and lots of profiler output starts printing to screen. > > (It would be _very_ useful if the text renderer of the profiler were > sorted in terms of self time, by the way! Can this be changed?) > > > The profiler generates a sequence of "node" outputs, where each node > represents a function call and its context. I start scanning down the > "self" time column, which is how much time is being spent doing work > in a particular function. > > One of the functions in the profile jumps out at me: > > ####################################################################### > Profiling results > ----------------- > Total cpu time observed: 9144ms (out of 9268ms) > Number of samples taken: 3595 (once every 3ms) > ... > [106] 1822(20.0%) 1822(20.0%) for-loop > /home/dyoo/local/racket-5.2.1/lib/racket/collects/compiler/zo-parse.rkt:490:2 > ####################################################################### > > > Oh! About twenty percent of the time is being spent here, in parsing > zo files! Whoa. Why? Let me take a look at that snippet of code. > Here's is the surrounding context: > > ;; in zo-parse.rkt ;;;;;;;;;;;;;;;;; > (define cpt-table > ;; The "schcpt.h" mapping > `([0 escape] > [1 symbol] > ;; ... > ;; dyoo: omitting some output > ;; ... > [247 255 small-application])) > > (define (cpt-table-lookup i) > (for/or ([ent (in-list cpt-table)]) > (match ent > [(list k sym) > (and (= k i) (cons k sym))] > [(list k k* sym) > (and (<= k i) > (< i k*) > (cons k sym))]))) > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > > cpt-table-lookup defines the for-loop that the profiler reports as > being a hot-spot. Oh, wait! The function looks odd. What > cpt-table-lookup is doing is a lot of work, and I expected it to be a > simple, constant table lookup. (In a sense, it is "constant", but the > constant is larger than it should be.) > > > Since cpt-table is itself a constant value, that leaves > cpt-table-lookup ripe for an optimization: I can precompute a flat > lookup table. I'll fix cpt-table-lookup by creating a vector, > up-front, that flattens out the cpt-table. Here's what that looks > like: > > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > (define cpt-table-as-vector (make-vector 256)) > (for ([ent (in-list cpt-table)]) > (match ent > [(list k sym) > (vector-set! cpt-table-as-vector k (cons k sym))] > [(list k k* sym) > (for ([i (in-range k k*)]) > (vector-set! cpt-table-as-vector i (cons k sym)))])) > > (define (cpt-table-lookup i) > (vector-ref cpt-table-as-vector i)) > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > > > Ok, let's see what the profiler tells me now: > > ############################################ > > Profiling results > ----------------- > Total cpu time observed: 7052ms (out of 7148ms) > Number of samples taken: 2812 (once every 3ms) > ############################################ > > Sweet. That helped cut it down from 9 seconds to 7. The node > associated to the cpt-table-lookup function doesn't show up in the > profile anymore, either. About a 20% performance improvement. Cool! > > > The effect of this optimization isn't so pronounced when Racket is > running under its JIT. Under the JIT, my program took 4.3 seconds > before the optimization, and 3.9 seconds after the optimization, > giving me a 9% improvement. Still, not too bad. > > > I'll send a patch to compiler/zo-parse later. But I wanted to share > the importance of the profiler: I would not have easily pinpointed > zo-parse without it. > ____________________ > Racket Users list: > http://lists.racket-lang.org/users ____________________ Racket Users list: http://lists.racket-lang.org/users

