Perrin Harkins writes:

> To fix this, we moved to not generating anything until it was requested.  We
> would fetch the data the first time it was asked for, and then cache it for
> future requests.  (I think this corresponds to your option 2.)  Of course
> then you have to decide on a cache consistency approach for keeping that
> data fresh.  We used a simple TTL approach because it was fast and easy to
> implement ("good enough").
I'd be curious to know the cache hit stats.  BTW, this case seems to be
an example of immutable data, which is definitely worth caching if
performance dictates.

> However, for many of us caching is a necessity for decent
> performance.

I agree with the latter clause, but take issue with the former.  Typical
sites get a few hits a second at peak times.  If a site isn't returning
"typical" pages in under a second using mod_perl, it probably has some
type of basic problem, imo.  A common problem is a missing database
index.  Another is too much memory allocation, e.g. passing around a
large scalar instead of a reference, or overuse of objects (a classic
Java problem).  It isn't always the case that you can fix the problem,
but caching doesn't fix it either.  At least understand the performance
problem(s) thoroughly before adding the cache.

Here's a fun example of a design flaw.  It is a performance test sent to
another list.  The author happened to work for one of our
competitors. :-)

>> That may well be the problem.  Building giant strings using .= can
be incredibly slow; Perl has to reallocate and copy the string for each
append operation.  Performance would likely improve in most situations
if an array were used as a buffer, instead.  Push new strings onto the
array instead of appending them to a string.

#!/usr/bin/perl -w
### Append.bench ###
use Benchmark;

sub R () { 50 }
sub Q () { 100 }
@array = (" " x R) x Q;

sub Append {
    my $str = "";
    map { $str .= $_ } @array;
}

sub Push {
    my @temp;
    map { push @temp, $_ } @array;
    my $str = join "", @temp;
}

timethese($ARGV[0], { append => \&Append, push => \&Push });
<<

Such a simple piece of code, yet the conclusion is incorrect.  The
problem is in the use of map instead of foreach for the performance
test iterations.  The value returned by Append is a list whose length
is Q and whose elements grow from R to R * Q characters, so the
"append" case is charged for building that list as well.  Change the
map to a foreach and you'll see that push/join is much slower than .=.

Return a string reference from Append.
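To make the flaw concrete, here is a corrected sketch of the test above
(same R, Q, and @array as the quoted code; the foreach rewrite and the
reference return are mine, not the original author's):

```perl
#!/usr/bin/perl -w
# Corrected sketch of Append.bench: foreach in place of map, so neither
# sub pays for building map's return list, and each sub returns a
# string reference so the result isn't copied back to the caller.
use strict;
use Benchmark;

sub R () { 50 }
sub Q () { 100 }
my @array = (" " x R) x Q;

sub Append {
    my $str = "";
    $str .= $_ foreach @array;    # void context: no list is built
    return \$str;                 # reference saves copying R * Q chars
}

sub Push {
    my @temp;
    push @temp, $_ foreach @array;
    my $str = join "", @temp;
    return \$str;
}

print length ${ Append() }, "\n";    # prints 5000 (R * Q)

timethese($ARGV[0] || 10_000, { append => \&Append, push => \&Push });
```

Run it and compare against the map version; with the loop overhead made
equal, .= comes out ahead of push/join.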
It saves a copy.  If this is "the page", you'll see a significant
improvement in performance.

Interestingly, this couldn't be "the problem", because the hypothesis is
incorrect.  The incorrect test just validated something that was faulty
to begin with.  This brings up "you can't talk about it unless you can
measure it".  Use a profiler on the actual code.  Add performance stats
in your code.  For example, we encapsulate all DBI accesses and
accumulate the time spent in DBI on any request.  We also track the time
we spend processing the entire request.

Adding a cache is piling more code onto a solution.  It sometimes is
like adding lots of salt to bad cooking.  You do it when you have to,
but you end up paying for it later.

Sorry if my post seems pedantic or obvious.  I haven't seen this type of
stuff discussed much in this particular context.  Besides, I'm a
contrarian. ;-)

Rob
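P.S.  For the curious, the DBI timing encapsulation looks roughly like
this.  It is a sketch, not our production code: the `timed` wrapper and
`$dbi_time` accumulator are names I made up, and the stand-in coderef
takes the place of a real DBI call.

```perl
#!/usr/bin/perl -w
# Sketch of per-request DBI timing: route every database call through a
# wrapper that accumulates its wall-clock cost, then log the total
# alongside the overall request time.
use strict;
use Time::HiRes qw(gettimeofday tv_interval);

my $dbi_time = 0;    # reset this at the start of every request

# Hypothetical wrapper: time any coderef and add its cost to the tally.
# In real code the coderef would be a DBI call, e.g.
#   timed(sub { $dbh->selectall_arrayref($sql) });
sub timed {
    my $code = shift;
    my $t0   = [gettimeofday];
    my $ret  = $code->();
    $dbi_time += tv_interval($t0);
    return $ret;
}

my $rows = timed(sub { [ [1, "a"], [2, "b"] ] });    # stand-in "query"
printf "spent %.6fs in 'DBI' this request\n", $dbi_time;
```

At the end of the request you log $dbi_time next to the total request
time, which tells you immediately whether the database is the problem.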