Perrin Harkins writes:
> To fix this, we moved to not generating anything until it was requested.  We
> would fetch the data the first time it was asked for, and then cache it for
> future requests.  (I think this corresponds to your option 2.)  Of course
> then you have to decide on a cache consistency approach for keeping that
> data fresh.  We used a simple TTL approach because it was fast and easy to
> implement ("good enough").

I'd be curious to know the cache hit stats.  BTW, this case seems to
be an example of immutable data, which is definitely worth caching if
performance dictates.

> However, for many of us caching is a necessity for decent
> performance.

I agree with the latter clause, but take issue with the former.  Typical
sites get a few hits a second at peak times.  If a site isn't
returning "typical" pages in under a second using mod_perl, it
probably has some type of basic problem imo.

A common problem is a missing database index.  Another is too much
memory allocation, e.g. passing around a large scalar instead of a
reference, or overuse of objects (a classic Java problem).  It isn't
always the case that you can fix the problem, but caching doesn't fix
it either.  At least understand the performance problem(s) thoroughly
before adding the cache.
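
To make the scalar-vs-reference point concrete, here is a small
made-up sketch (the sub names and $big_html are hypothetical, just to
show the shape of it):

    # Hypothetical example -- passing a big string by value vs. by reference.
    sub render_copy {
        my ($page) = @_;           # copies the entire page body into $page
        return length $page;
    }

    sub render_ref {
        my ($page_ref) = @_;       # copies only a small reference
        return length $$page_ref;  # work through the reference; no string copy
    }

    render_copy($big_html);        # a full copy is made inside the sub
    render_ref(\$big_html);        # the page body is never copied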

Here's a fun example of a design flaw.  It is a performance test sent
to another list.  The author happened to work for one of our
competitors.  :-)

>>
  That may well be the problem. Building giant strings using .= can be
  incredibly slow; Perl has to reallocate and copy the string for each
  append operation. Performance would likely improve in most
  situations if an array were used as a buffer, instead. Push new
  strings onto the array instead of appending them to a string.

    #!/usr/bin/perl -w
    ### Append.bench ###

    use Benchmark;

    sub R () { 50 }
    sub Q () { 100 }
    @array = (" " x R) x Q;

    sub Append {
        my $str = "";
        map { $str .= $_ } @array;
    }

    sub Push {
        my @temp;
        map { push @temp, $_ } @array;
        my $str = join "", @temp;
    }

    timethese($ARGV[0],
        { append => \&Append,
          push   => \&Push });
<<

Such a simple piece of code, yet the conclusion is incorrect.  The
problem is in the use of map instead of foreach for the performance
test iterations.  The result of Append is a list whose length is
Q and whose elements grow in length from R to R * Q.  Change the map to a
foreach and you'll see that push/join is much slower than .=.
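
For reference, here is what I mean (my sketch, not the original
poster's code; untested):

    sub Append {
        my $str = "";
        $str .= $_ foreach @array;     # void context; no list of growing strings is built
    }

    sub Push {
        my @temp;
        push @temp, $_ foreach @array;
        my $str = join "", @temp;
    }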

Return a string reference from Append instead of the string itself; it
saves a copy.  If this string is "the page", you'll see a significant
improvement in performance.
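
In the sketch above that is just one more line (again, hypothetical
code, not the original author's):

    sub Append {
        my $str = "";
        $str .= $_ foreach @array;
        return \$str;               # hand back a reference; the big string is never copied
    }

    my $page_ref = Append();
    print $$page_ref;               # dereference only where the content is needed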

Interestingly, this couldn't be "the problem", because the hypothesis
is incorrect.  The incorrect test just validated something that was
faulty to begin with.  This brings up "you can't talk about it unless
you can measure it".  Use a profiler on the actual code.  Add
performance stats in your code.  For example, we encapsulate all DBI
accesses and accumulate the time spent in DBI on any request.  We also
track the time we spend processing the entire request.
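
Ours is wired into our own request handling, but the idea is roughly
this (a hypothetical wrapper, not our actual code; Time::HiRes does
the timing):

    use Time::HiRes qw(gettimeofday tv_interval);

    my $dbi_time = 0;                     # accumulated DBI time for this request

    sub timed_selectall {
        my ($dbh, $sql, @bind) = @_;
        my $start = [gettimeofday];
        my $rows  = $dbh->selectall_arrayref($sql, undef, @bind);
        $dbi_time += tv_interval($start); # add this query's elapsed time
        return $rows;
    }

    # At the end of the request, log $dbi_time next to the total request time.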

Adding a cache is piling more code onto a solution.  It's sometimes
like adding lots of salt to bad cooking: you do it when you have to,
but you end up paying for it later.

Sorry if my post seems pedantic or obvious.  I haven't seen this type
of stuff discussed much in this particular context.  Besides, I'm a
contrarian. ;-)

Rob
