On Tue, Jun 30, 2009 at 03:50:18PM -0700, Jeffrey Thalhammer wrote:
> Hi Tim-
>
> I'm sorry I couldn't get this to you before YAPC.

No problem Jeffrey. You got it to me before OSCON, which is what I'm
preparing for, so thanks for that.

> I'm not very experienced with code optimization, so I'm not sure how
> much insight I really have.

Discovering what didn't work, or blind alleys, is also very useful.

> When we started looking to optimize Perl-Critic, our profile looked
> like this (total run time: 742 seconds):
>
> I didn't see any obvious bottlenecks, but it was pretty clear that
> some operations were happening a lot of times. For example isa() was
> called more than 6 million times. Our first attempt to optimize was to
> memoize some of the methods listed above. However, none of those
> methods were particularly slow -- the expense just came from the cost
> of changing the stack frame for each subroutine call. So memoizing
> actually slowed things down, because each call now required memoize to
> make one or more additional subroutine calls to normalize the
> arguments and find the return value in the cache.

Good point. I need to talk about "when not to memoize". Noted, thanks.
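To make that concrete, here's roughly the shape of the trade-off in
code (the Node class and its descend() method are invented for the
example): a Memoize'd cache hit still pays for the generated wrapper
sub plus its internal calls, while a hand-rolled hash keyed on refaddr
makes a hit cost a single hash lookup.

    use strict;
    use warnings;
    use Scalar::Util qw(refaddr);
    use Memoize;

    package Node;    # invented stand-in for a PPI-like class
    sub new     { return bless { name => $_[1] }, $_[0] }
    sub descend { my ($self) = @_; return lc $self->{name} }  # pretend this is costly

    package main;

    my $node = Node->new('Foo');

    # Option 1: Memoize. Every call, hit or miss, goes through the
    # generated wrapper, which makes further calls to normalize the
    # arguments and probe its cache.
    memoize('Node::descend', NORMALIZER => sub { refaddr $_[0] });
    my $r1 = $node->descend;

    # Option 2: a global hash per method, consulted inline at the hot
    # call sites, so a cache hit adds no extra stack frames at all.
    # ('//=' needs perl 5.10; use an 'exists' check on older perls)
    my %descend_cache;
    my $r2 = $descend_cache{ refaddr $node } //= $node->descend;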
> Since we couldn't make these subroutines any faster, we started
> looking for ways to change our algorithms so we wouldn't have to call
> these subroutines quite so often. NYTProf made this easy because we
> could drill into the reports and find out who was calling these
> subroutines and how often. But since they are xsubs, we couldn't get
> that information for the isa() and refaddr() subroutines. To work
> around this, we implemented pure-perl versions of the subroutines just
> for diagnostic purposes. This made the code a lot slower, but now we
> could see where the calls were coming from.

I need to fix the need to do that. Currently, for xsubs in packages that
also have perl subs executed, I add dummy stubs to the end of the source
code and annotate those with timing and caller details. For xsubs in
packages where no perl subs were executed I need to do more work,
possibly even manufacturing a completely fake source file. I'll bump
that up the to-do list. Getting cases like UNIVERSAL.pm to work
shouldn't be too hard.
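In the meantime, the kind of diagnostic shim Jeffrey describes only
takes a few lines. A sketch for refaddr() (the same trick should work
for isa()); note it has to run before the code under test imports
refaddr, since importing copies whatever sub is current at the time:

    use strict;
    use warnings;
    use Scalar::Util ();

    BEGIN {
        # Wrap the xsub in a thin pure-perl sub so the profiler can
        # attribute callers and times to it. Slower, but that's fine
        # for a diagnostic run.
        my $real_refaddr = \&Scalar::Util::refaddr;
        no warnings qw(redefine prototype);
        *Scalar::Util::refaddr = sub { $real_refaddr->(@_) };
    }

The extra call frame is exactly what makes the calls visible, but it
also means the absolute timings from such a run shouldn't be trusted.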
> Most Policies in Perl-Critic work by process-of-elimination. When
> given a node in the PPI document, the Policy performs a series of
> assertions on the node. If the node fails any of these assertions,
> then no violation is emitted. By looking at the NYTProf analysis, we
> could see which assertions were the most expensive and/or the most
> likely to fail. Then, we could put the assertions in the order that
> would take the least time or would fail most often, thus allowing us
> to exit the subroutine as quickly as possible.

Yes, "fail fast" (or "exit early") is a useful concept. Noted, thanks.

> NYTProf also showed us that operator overloading is expensive because
> it requires additional subroutine calls each time. Again, this is
> mostly just because it happens so often in Perl-Critic -- other
> systems might not have this problem. So we changed all of our code to
> call the underlying methods, rather than relying on operator
> overloading.

I'm planning to write a script to demonstrate the cost differences of
various constructs. I'll aim to demo overloading vs direct method
calls. That'll be interesting. I'd guess it's roughly a 'one extra
call' overhead (and that the overhead for tie() is similar).
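As a starting point I imagine something like this (MyNum is just a
throwaway class invented for the comparison):

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    package MyNum;    # toy class, invented for the comparison
    use overload '==' => 'equals', fallback => 1;
    sub new    { my ($class, $v) = @_; return bless { v => $v }, $class }
    sub equals { return $_[0]->{v} == $_[1]->{v} }

    package main;

    my $x = MyNum->new(42);
    my $y = MyNum->new(42);

    cmpthese(-3, {
        # dispatched through the overload machinery, then calls equals()
        overloaded => sub { my $r = ($x == $y) },
        # the underlying method, called directly
        direct     => sub { my $r = $x->equals($y) },
    });

If the 'one extra call' guess is right, the direct version should win
by roughly the cost of one method dispatch per operation.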
> Since memoizing didn't help, we started looking for other ways to
> cache the results of subroutine calls. We ended up overriding the most
> frequently used PPI methods to look up the results directly in a hash.
> This avoided the overhead of the extra subroutine calls that memoize
> makes. We didn't want the cache to be an attribute of the PPI objects
> because we didn't want to pollute it with a bunch of stuff.

That seems odd at first sight, but I'm sure you have good reasons.

> So instead we used a series of global hashes (one for each method that
> we optimized). The tricky part was figuring out when to purge the
> cache, and I'm still not sure we've got it right.

Having an option to disable, or frequently discard, the cache would
give you a quick way to check that it's not the cause of any problems
that get reported in future. (Like disabling garbage collection in
Parrot.)

> Finally, I used NYTProf as a benchmarking tool as well. I didn't
> really care about the overhead of NYTProf itself, I just wanted to
> know if it ran faster or slower than the last time (assuming that the
> overhead of NYTProf was fairly constant, or at least linear). But of
> course the total run time depends on the current load on the server,
> so there was always some variation that I couldn't account for, and
> sometimes the variation could be large.

You could use usecputime=1 (set via the NYTPROF environment variable,
e.g. NYTPROF=usecputime=1 perl -d:NYTProf yourscript.pl) to get (rough)
timings that should be fairly independent of load. If your system
supports POSIX realtime clocks then you could try clock=N, where N is
the integer id of the POSIX realtime clock you want to use. In this
case it might be CLOCK_PROF, CLOCK_VIRTUAL or CLOCK_PROCESS_CPUTIME_ID.
For more info see
http://search.cpan.org/~timb/Devel-NYTProf/lib/Devel/NYTProf.pm#CLOCKS

> I don't know if this makes sense, but I found myself wanting to see
> the time spent in each line/block/subroutine as a percentage of total
> time. That way I could make a more sensible comparison between runs.

For subroutines the percentage is available as a tool-tip when you
hover over the timing values in the subroutine table. I thought about
adding columns for it but decided not to at the time, though I can't
recall why now. I'll ponder it again.

I'd like an ecosystem of reporting tools/modules to grow around NYTProf
(which is partly why I've put effort into the data model) but that
hasn't happened yet. Many great tools, like a "performance diff", could
be developed. I think a key enabler of that would be to refactor the
reporting code into modules with a single 'nytprof' command as a common
front-end. Plugins could then be developed to support all sorts of
interesting analysis and reporting behaviour.

> We haven't released all the optimized code yet, but this is our best
> profile so far (total run time: 449 seconds):
>
> You can see that we reduced the number of times isa() and other
> subroutines are called by more than 50%. This has produced a total
> speedup of about 40% as well.

Impressive. Well done!

> At this point, I'm not sure we can squeeze any more performance out of
> Perl-Critic without major changes to the architecture.

How helpful is NYTProf at showing you what architectural-level changes
might help?

> Ideally, we could create an optimized domain-specific language for
> describing Policies, but that's not likely to happen soon. Also, rough
> calculations show that about 30% of the time is spent in PPI just
> parsing the document. I know Adam Kennedy is working on an XS version
> of PPI, so we could gain some speed there.
>
> As for my general experience with NYTProf, I have only a few comments:
> Personally, I found the red/yellow/green coloring of the results to be
> unhelpful

Umm. Could you explain why? I tend to see them as a way to get a quick
sense of what to focus on (red) and ignore (green). Do you think a
finer granularity of colors would help or hinder?

> and I was sometimes confused by the different units of measurement for
> time. I suppose that's why the colors are there, but I think I might
> prefer to see everything displayed in the same unit of measure, with
> the same number of significant digits (to me, the "size" of the number
> stands out more than anything else).

Interesting. That could be added as an option.

> I absolutely love the way NYTProf shows me the callers of each
> subroutine, sorted by the number of times it was called from there.
> I wonder if it would make sense to be able to sort by time as well.

Yes, and I'd love to do that. The problem really is just my lack of
experience with JavaScript and jQuery. Volunteers welcome!

> I haven't really figured out how to use the analysis by package prefix
> on the index page.

I'm happy to say that it'll be superseded by an interactive treemap
soon!

> I haven't had much use for the block & sub views.

Me neither. I'd like to get rid of them and show the block and sub
level data interactively on the same page. I think there can be
valuable information in the block data especially, but it needs a
better visual presentation. The effect of the perl optimizer also
confuses the picture.

> I usually start by looking for bottleneck subroutines in the "Top 15"
> list, and then start looking at the times for individual lines within
> those subs. I suppose that if you had really large subroutines then
> the block view would help narrow down the slow parts of the sub,

Yes, that's the idea.

> but Perl-Critic is fairly well factored. If I did use the block views,
> I think I might also want to see the "Top N" blocks in a given
> subroutine, and then see the "Top N" lines within that block. In other
> words, drill down through module->subroutine->block->line and see the
> top offenders at each level. But I'm speculating about all this, and
> it might just be a sign of my inexperience with the tools.

No, you're right. Though I'd counter that this is where the coloring
helps, by letting you scroll through the code fast looking for patches
of red.

> So that pretty much sums up my experience using NYTProf with
> Perl-Critic. It is a fabulous tool and I can't thank you enough for
> all the work you've done. With more time and practice, I might be able
> to give you more insights.

Drop me (or [email protected]) a note if anything comes to
mind.

> Please keep up the outstanding work!

Ditto! Many thanks Jeffrey. Very useful!

Tim.
