Re: performance coding project? (was: Re: When to cache)

2002-02-01 Thread Ask Bjoern Hansen

On Sat, 26 Jan 2002, Stas Bekman wrote:

[...]
  It's much better to build your system, profile it, and fix the bottlenecks.
  The most effective changes are almost never simple coding changes like the
  one you showed, but rather large things like using qmail-inject instead of
  SMTP, caching a slow database query or method call, or changing your
  architecture to reduce the number of network accesses or inter-process
  communications.
 
 It all depends on what kind of application you have. If your code is 
 CPU-bound, these seemingly insignificant optimizations can have a very 
 significant influence on the overall service performance. Of course, if 
 your app is IO-bound or depends on some external component, then your 
 argumentation applies.

Eh, any real system will be a combination.  Sure; when everything
works then it's worth finding the CPU-intensive places and fixing them
up, but for the most part the system design is far, far more
important than any code optimization you can ever do.

My usual rhetoric: your average code optimization will gain you at
most a few percent.  A better design can often make
things 10 times faster and use only a fraction of the memory.

 On the other hand, how often do you get a chance to profile your code and 
 see how to improve its speed in the real world? Managers never plan 
 for a debugging period, let alone an optimization period. And while 
 premature optimizations are usually evil, as they will bite you later, 
 knowing the differences between coding styles does help in the long run, 
 and I don't consider these premature optimizations.

If you don't waste time profiling every little snippet of code you 
might have more time to fix the real bottlenecks in the end. ;-)
 
[...]
 All I want to say is that there is no one-size-fits-all solution in Perl, 
 because of TIMTOWTDI, so you can learn a lot from running benchmarks, 
 picking your favorite coding style, and changing it as the language 
 evolves. But you shouldn't blindly apply the outcomes of the benchmarks 
 without running your own.

Amen.

(And don't get me wrong; I think a repository of information about
the nitty gritty optimization things would be great - I just find it
to be bad advice to not tell people to do the proper design first).


 - ask

-- 
ask bjoern hansen, http://ask.netcetera.dk/ !try; do();
more than a billion impressions per week, http://valueclick.com





Re: performance coding project? (was: Re: When to cache)

2002-02-01 Thread Ask Bjoern Hansen

On Sat, 26 Jan 2002, Perrin Harkins wrote:

  It all depends on what kind of application you have. If your code is
  CPU-bound these seemingly insignificant optimizations can have a very
  significant influence on the overall service performance.
 
 Do such beasts really exist?  I mean, I guess they must, but I've never
 seen a mod_perl application that was CPU-bound.  They always seem to be
 constrained by database speed and memory.

At ValueClick we only have a few BerkeleyDBs that are in the
request loop for 99% of the traffic; everything else is in fairly
efficient in-memory data structures.

So there we do of course care about the tiny optimizations, 
because there's a direct correlation between saved CPU cycles and 
request capacity.

However, it's only that way because we made a good design for the
application in the first place. :-)  (And for all the other code we
rarely care about using a few more CPU cycles if it is
easier/cleaner/more flexible/comes to mind first.) Who cares if the
Perl code gets ready to wait for the database a few milliseconds
faster? :-)


 - ask

-- 
ask bjoern hansen, http://ask.netcetera.dk/ !try; do();
more than a billion impressions per week, http://valueclick.com




Re: performance coding project? (was: Re: When to cache)

2002-01-31 Thread raptor

One memory & speed saving is using global vars. I know it is not
recommended practice, but if from the beginning of the project you set a
convention for the names of global vars, it is OK. F.e. I'm using this in
all DB pages at the beginning:

our $dbh = dbConnect() or dbiError();

See, I know (I'm sure) that when I use the DB I will always have
initialized the var. One other example is (ASP.pm):

our $userID = $$Session{userID};
my $something = $$Request{Params}{something};

This is not saving me memory, but it shortens my typing, especially if it
is used frequently or if I need to change a FORM param from something to
anything, etc.

I think in the mod_perl world we should be after MEMORY optimisation
RATHER than speed optimisation... :)


raptor
[EMAIL PROTECTED]




Re: performance coding project? (was: Re: When to cache)

2002-01-27 Thread Ged Haywood

Hi all,

Stas has a point.  Perl makes it very easy to do silly things.
This is what I was doing last week:

if( m/\b$Needle\b/ ) {...}
Eight hours. (Silly:)

if( index($Haystack,$Needle) >= 0 && m/\b$Needle\b/ ) {...}
Twelve minutes.
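
(For reference, a minimal self-contained sketch of the same trick, with
invented needle/haystack contents; the point is that a cheap index()
rejects most lines before the anchored regex ever runs.)

#!/usr/bin/perl -w
use strict;
use Benchmark qw(timethese);

my $needle   = 'wombat';
my @haystack = ('x' x 70) x 10_000;          # mostly misses
$haystack[500] = 'spotted a wombat here';    # the occasional hit

timethese(100, {
    regex_only => sub {
        my $hits = 0;
        for (@haystack) { $hits++ if /\b\Q$needle\E\b/ }
    },
    index_first => sub {
        my $hits = 0;
        for (@haystack) {
            $hits++ if index($_, $needle) >= 0 && /\b\Q$needle\E\b/;
        }
    },
});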

73,
Ged.







Re: performance coding project? (was: Re: When to cache)

2002-01-26 Thread Perrin Harkins

 It all depends on what kind of application you have. If your code is
 CPU-bound these seemingly insignificant optimizations can have a very
 significant influence on the overall service performance.

Do such beasts really exist?  I mean, I guess they must, but I've never
seen a mod_perl application that was CPU-bound.  They always seem to be
constrained by database speed and memory.

 On the other hand, how often do you get a chance to profile your code
 and see how to improve its speed in the real world? Managers never plan
 for a debugging period, let alone an optimization period.

If you plan a good architecture that avoids the truly slow stuff
(disk/network access) as much as possible, your application is usually
fast enough without spending much time on optimization (except maybe
some database tuning).  At my last couple of jobs we actually did have
load testing and optimization as part of the development plan, but
that's because we knew we'd be getting pretty high levels of traffic.
Most people don't need to tune very much if they have a good
architecture, and it's enough for them to fix problems as they become
visible.

Back to your idea: you're obviously interested in the low-level
optimization stuff, so of course you should go ahead with it.  I don't
think it needs to be a separate project, but improvements to the
performance section of the guide are always a good idea.  I know that I
have taken all of the DBI performance tips to heart and found them very
useful.

I'm more interested in writing about higher level performance issues
(efficient shared data, config tuning, caching), so I'll continue to
work on those things.  I'm submitting a proposal for a talk on data
sharing techniques at this year's Perl Conference, so hopefully I can
contribute that to the guide after I finish it.

- Perrin




Re: performance coding project? (was: Re: When to cache)

2002-01-26 Thread Ed Grimm

On Sat, 26 Jan 2002, Perrin Harkins wrote:

 It all depends on what kind of application you have. If your code
 is CPU-bound these seemingly insignificant optimizations can have a
 very significant influence on the overall service performance.

 Do such beasts really exist?  I mean, I guess they must, but I've
 never seen a mod_perl application that was CPU-bound.  They always
 seem to be constrained by database speed and memory.

I've seen one.  However, it was much like a normal performance problem -
the issue was with one loop which ran one line which was quite
pathological.  Replacing the loop with an s///eg construct eliminated the
problem; there was no need for seemingly insignificant optimizations.
(Actually, the problem was *created* by premature optimization - the
coder had utilized code that was more efficient than s/// in one special
case, to handle a vastly different instance.)
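
(For illustration only - the original code isn't shown here, so this is an
invented example of the general pattern: a per-character loop collapsed
into a single s///eg pass.)

use strict;

my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
my $text   = 'a < b && b > c' x 1_000;

# Pathological: rebuild the string one character at a time.
sub escape_slow {
    my ($s) = @_;
    my $out = '';
    $out .= $entity{$_} || $_ for split //, $s;
    return $out;
}

# Same transformation in one pass; the regex engine does the loop.
sub escape_fast {
    my ($s) = @_;
    $s =~ s/([&<>])/$entity{$1}/eg;
    return $s;
}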

However, there could conceivably be code which is more of a performance
issue, especially when a mod_perl application utilizes a very successful
cache on a high-traffic site.

 On the other hand, how often do you get a chance to profile your code
 and see how to improve its speed in the real world? Managers never
 plan for a debugging period, let alone an optimization period.

Unless there's already a problem, and you have a good manager.  We've
had a couple of instances where we were given time (on the schedule,
before the release) to improve speed after a release.  It's quite rare,
though, and I've never seen it for a mod_perl project.

Ed




Re: performance coding project? (was: Re: When to cache)

2002-01-26 Thread Sam Tregar

On Sat, 26 Jan 2002, Perrin Harkins wrote:

  It all depends on what kind of application you have. If your code is
  CPU-bound these seemingly insignificant optimizations can have a very
  significant influence on the overall service performance.

 Do such beasts really exist?  I mean, I guess they must, but I've never
 seen a mod_perl application that was CPU-bound.  They always seem to be
 constrained by database speed and memory.

Think search engines.  Once you've figured out how to get your search
database to fit in memory (or devised a caching strategy to get the
important parts there) you're essentially looking at a CPU-bound problem.
These days the best solution is probably some judicious use of Inline::C.
Back when I last tackled the problem I had to hike up Mount XS to find my
grail...
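
(A toy sketch of the kind of hot-loop offloading Sam describes, using
Inline::C; the scoring function is invented for illustration.)

#!/usr/bin/perl -w
use strict;

# The hot inner loop lives in C; everything else stays in Perl.
use Inline C => <<'END_C';
int score(char* doc, char* term) {
    int   hits = 0;
    char* p    = doc;
    while ((p = strstr(p, term)) != NULL) {
        hits++;
        p++;
    }
    return hits;
}
END_C

print score("the cat sat on the mat", "the"), "\n";   # prints 2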

-sam






Re: performance coding project? (was: Re: When to cache)

2002-01-26 Thread Milo Hyson

On Saturday 26 January 2002 03:40 pm, Sam Tregar wrote:
 Think search engines.  Once you've figured out how to get your search
 database to fit in memory (or devised a caching strategy to get the
 important parts there) you're essentially looking at a CPU-bound problem.
 These days the best solution is probably some judicious use of Inline::C.
 Back when I last tackled the problem I had to hike up mount XS to find my
 grail...

I agree. There are some situations that are just too complex for a DBMS to 
handle directly, at least in any sort of efficient fashion. However, 
depending on the load in those cases, Perrin's solution for eToys is probably 
a good approach (i.e. custom search software written in C/C++).

-- 
Milo Hyson
CyberLife Labs, LLC



Re: performance coding project? (was: Re: When to cache)

2002-01-26 Thread Stas Bekman

Perrin Harkins wrote:


 Back to your idea: you're obviously interested in the low-level
 optimization stuff, so of course you should go ahead with it.  I don't
 think it needs to be a separate project, but improvements to the
 performance section of the guide are always a good idea.


It has to be runnable code, so people can verify the facts, which may 
change with different OSes/versions of Perl. E.g. Joe says that $r->args 
is slower than Apache::Request->param; I saw the opposite. Having these 
as runnable bits is much nicer.

  I know that I
 have taken all of the DBI performance tips to heart and found them very
 useful.


:)

That's mostly JWB's work I think.


 I'm more interested in writing about higher level performance issues
 (efficient shared data, config tuning, caching), so I'll continue to
 work on those things.  I'm submitting a proposal for a talk on data
 sharing techniques at this year's Perl Conference, so hopefully I can
 contribute that to the guide after I finish it.

Go Perrin!


_
Stas Bekman JAm_pH  --   Just Another mod_perl Hacker
http://stason.org/  mod_perl Guide   http://perl.apache.org/guide
mailto:[EMAIL PROTECTED]  http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/




Re: performance coding project? (was: Re: When to cache)

2002-01-25 Thread Issac Goldstand

Ah yes, but don't forget that to get this speed, you are sacrificing 
memory.  You now have another locally scoped variable for perl to keep 
track of, which increases memory usage and general overhead (allocation 
and garbage collection).  Now, those, too, are insignificant with one 
use, but the significance will probably rise with the speed gain as you 
use these techniques more often...

  Issac


Stas Bekman wrote:

 Rob Nagler wrote:

 Perrin Harkins writes:


 Here's a fun example of a design flaw.  It is a performance test sent
 to another list.  The author happened to work for one of our
 competitors.  :-)


   That may well be the problem. Building giant strings using .= can be
   incredibly slow; Perl has to reallocate and copy the string for each
   append operation. Performance would likely improve in most
   situations if an array were used as a buffer, instead. Push new
   strings onto the array instead of appending them to a string.

 #!/usr/bin/perl -w
 ### Append.bench ###

 use Benchmark;

 sub R () { 50 }
 sub Q () { 100 }
 @array = (" " x R) x Q;

 sub Append {
 my $str = "";
 map { $str .= $_ } @array;
 }

 sub Push {
 my @temp;
 map { push @temp, $_ } @array;
 my $str = join "", @temp;
 }

 timethese($ARGV[0],
 { append => \&Append,
   push   => \&Push });
 

 Such a simple piece of code, yet the conclusion is incorrect.  The
 problem is in the use of map instead of foreach for the performance
 test iterations.  The result of Append is an array whose length is
 Q and whose elements grow from R to R * Q.  Change the map to a
 foreach and you'll see that push/join is much slower than .=.

 Return a string reference from Append.  It saves a copy.
 If this is the page you're building, you'll see a significant
 improvement in performance.

 Interestingly, this couldn't be the problem, because the hypothesis
 is incorrect.  The incorrect test just validated something that was
 faulty to begin with.  This brings up "you can't talk about it unless
 you can measure it".  Use a profiler on the actual code.  Add
 performance stats in your code.  For example, we encapsulate all DBI
 accesses and accumulate the time spent in DBI on any request.  We also
 track the time we spend processing the entire request.


 While we are at this topic, I want to suggest a new project. I was 
 planning to start working on it a long time ago, but other things always 
 took over.

 The perl.apache.org/guide/performance.html and a whole bunch of 
 performance chapters in the upcoming modperl book have a lot of 
 benchmarks, comparing various coding techniques, such as the example 
 you've provided. The benchmarks are doing both pure Perl and mod_perl 
 specific code (which requires running Apache, a perfect job for the 
 new Apache::Test framework.)

 Now throw in the various techniques from the 'Effective Perl' book and 
 voila, you have a great project to learn from.

 Also remember that on various platforms and various Perl versions the 
 benchmark results will differ, sometimes very significantly.

 I even have a name for the project: Speedy Code Habits  :)

 The point is that I want to develop a coding style which tries hard to 
 do early premature optimizations. Let me give you an example of what I 
 mean. Tell me what's faster:

 if (ref $b eq 'ARRAY'){
$a = 1;
 }
 elsif (ref $b eq 'HASH'){
$a = 1;
 }

 or:

 my $ref = ref $b;
 if ($ref eq 'ARRAY'){
$a = 1;
 }
 elsif ($ref eq 'HASH'){
$a = 1;
 }

 Sure, the win can be very little, but it adds up as your code base's 
 size grows.

 To give you a similar example:

 if ($a->lookup eq 'ARRAY'){
$a = 1;
 }
 elsif ($a->lookup eq 'HASH'){
$a = 1;
 }

 or

 my $lookup = $a->lookup;
 if ($lookup eq 'ARRAY'){
$a = 1;
 }
 elsif ($lookup eq 'HASH'){
$a = 1;
 }

 now throw in sub attributes and re-run the test again.

 add examples of map vs for.
 add examples of method lookup vs. procedures
 add examples of concat vs. list vs. other stuff from the guide.

 mod_perl specific examples from the guide/book ($r->args vs 
 Apache::Request::param, etc.)

 If you understand where I'm trying to take you, help me pull this 
 project off, and I think in the long run we can benefit a lot.

 This goes along with the Apache::Benchmark project I think (which is 
 yet another thing I want to start...); we could probably put these two 
 ideas together.

 _







Re: performance coding project? (was: Re: When to cache)

2002-01-25 Thread Stas Bekman

Issac Goldstand wrote:

 Ah yes, but don't forget that to get this speed, you are sacrificing 
 memory.  You now have another locally scoped variable for perl to keep 
 track of, which increases memory usage and general overhead (allocation 
 and garbage collection).  Now, those, too, are insignificant with one 
 use, but the significance will probably rise with the speed gain as you 
 use these techniques more often...

Yes, I know. But from the benchmark you can probably get an idea of 
whether the 'caching' is worth the speedup (given that the benchmark is 
similar to your case). For example, it depends on how many times you need 
to use the cache, and on how big the value is. E.g. maybe caching 
$foo->bar isn't worth it, but what about $foo->bar->baz? Or if you 
have a deeply nested hash and you need to work with only a part of the 
subtree, do you grab a reference to this sub-tree node and work with it, 
or do you keep on dereferencing all the way from the root on every call?
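
(A minimal sketch of that nested-hash case, with an invented data shape:
grab the subtree node once versus dereferencing from the root each time.)

use strict;
use Benchmark qw(timethese);

my %config = (server => { httpd => { port => 80, name => 'www' } });

timethese(1_000_000, {
    full_path => sub {
        my $p = $config{server}{httpd}{port};
        my $n = $config{server}{httpd}{name};
    },
    subtree_ref => sub {
        my $node = $config{server}{httpd};   # cache the subtree node
        my $p = $node->{port};
        my $n = $node->{name};
    },
});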

But personally I still haven't decided which one is better, and every time 
I'm in a similar situation, I'm never sure which way to take: to cache 
or not to cache. But that's the cool thing about Perl, it keeps you on 
your toes all the time (if you want it to :).

BTW, if somebody has interesting reasoning for using one technique 
versus the other performance-wise (speed+memory), please share it.

This project's idea is to give straight numbers for some definitely bad 
coding practices (e.g. map() in void context), and for things which vary 
a lot depending on the context but are interesting to think about (e.g. 
the last example of caching the result of ref() or a method call).

_
Stas Bekman JAm_pH  --   Just Another mod_perl Hacker
http://stason.org/  mod_perl Guide   http://perl.apache.org/guide
mailto:[EMAIL PROTECTED]  http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/




Re: performance coding project? (was: Re: When to cache)

2002-01-25 Thread Rob Nagler

 This project's idea is to give straight numbers for some definitely bad 
 coding practices (e.g. map() in void context), and for things which vary 
 a lot depending on the context but are interesting to think about (e.g. 
 the last example of caching the result of ref() or a method call).

I think this would be handy.  I spend a fair bit of time
wondering/testing myself.  Would be nice to have a repository of the
tradeoffs.

OTOH, I spend too much time mulling over unimportant performance
optimizations.  The foreach/map comparison is a good example of this.
It only starts to matter (read: milliseconds) at the 100KB-and-up
range, I find.  If a site is returning 100KB pages for typical
responses, it has a problem at a completely different level than map
vs foreach.

Rob

"Premature optimization is the root of all evil." -- C.A.R. Hoare



Re: performance coding project? (was: Re: When to cache)

2002-01-25 Thread Perrin Harkins

 The point is that I want to develop a coding style which tries hard to
 do early premature optimizations.

We've talked about this kind of thing before.  My opinion is still the same
as it was: low-level speed optimization before you have a working system is
a waste of your time.

It's much better to build your system, profile it, and fix the bottlenecks.
The most effective changes are almost never simple coding changes like the
one you showed, but rather large things like using qmail-inject instead of
SMTP, caching a slow database query or method call, or changing your
architecture to reduce the number of network accesses or inter-process
communications.

The exception to this rule is that I do advocate thinking about memory usage
from the beginning.  There are no good tools for profiling memory used by
Perl, so you can't easily find the offenders later on.  Being careful about
passing references, slurping files, etc. pays off in better scalability
later.
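
(A tiny sketch of the pass-by-reference point, with invented names; the
second form avoids copying the large string on every call.)

use strict;

my $page = 'x' x 1_000_000;    # a large generated page

# Copies the megabyte into the sub's lexical on every call.
sub size_by_copy { my ($text) = @_; return length $text }

# Passes only a reference; nothing is copied.
sub size_by_ref  { my ($text_ref) = @_; return length $$text_ref }

size_by_copy($page);
size_by_ref(\$page);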

- Perrin




Re: performance coding project? (was: Re: When to cache)

2002-01-25 Thread David Wheeler

On Fri, 2002-01-25 at 09:08, Perrin Harkins wrote:

snip /

 It's much better to build your system, profile it, and fix the bottlenecks.
 The most effective changes are almost never simple coding changes like the
 one you showed, but rather large things like using qmail-inject instead of
 SMTP, caching a slow database query or method call, or changing your
 architecture to reduce the number of network accesses or inter-process
 communications.

qmail-inject? I've just been using sendmail or, preferentially,
Net::SMTP. Isn't using a system call more expensive? If not, how does
qmail-inject work?

Thanks,

David

-- 
David Wheeler AIM: dwTheory
[EMAIL PROTECTED] ICQ: 15726394
   Yahoo!: dew7e
   Jabber: [EMAIL PROTECTED]




Re: performance coding project? (was: Re: When to cache)

2002-01-25 Thread Matt Sergeant

On 25 Jan 2002, David Wheeler wrote:

 On Fri, 2002-01-25 at 09:08, Perrin Harkins wrote:

 snip /

  It's much better to build your system, profile it, and fix the bottlenecks.
  The most effective changes are almost never simple coding changes like the
  one you showed, but rather large things like using qmail-inject instead of
  SMTP, caching a slow database query or method call, or changing your
  architecture to reduce the number of network accesses or inter-process
  communications.

 qmail-inject? I've just been using sendmail or, preferentially,
 Net::SMTP. Isn't using a system call more expensive? If not, how does
 qmail-inject work?

With qmail, SMTP generally uses inetd, which is slow, or daemontools,
which is faster but still slow; and more importantly, it goes through:

  perl -> SMTP -> inetd -> qmail-smtpd -> qmail-inject.

So by going direct to qmail-inject, your email skips a boatload of
processing and goes directly into the queue.

Of course none of this is relevant if you're not using qmail ;-)
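
(For the curious: qmail-inject just reads a complete message, headers
first, on stdin and drops it into the queue, so from Perl it's a plain
pipe open. A minimal sketch, assuming the stock /var/qmail install path
and invented addresses.)

use strict;

open my $inject, '|-', '/var/qmail/bin/qmail-inject'
    or die "can't run qmail-inject: $!";
print $inject <<'END_MSG';
From: sender@example.com
To: recipient@example.com
Subject: test

Hello from qmail-inject.
END_MSG
close $inject or die "qmail-inject failed: $?";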

-- 
<!-- Matt -->
<:->Get a smart net</:->




Re: performance coding project? (was: Re: When to cache)

2002-01-25 Thread Tatsuhiko Miyagawa

On Fri, 25 Jan 2002 21:15:54 + (GMT)
Matt Sergeant [EMAIL PROTECTED] wrote:

 
 With qmail, SMTP generally uses inetd, which is slow, or daemontools,
 which is faster but still slow; and more importantly, it goes through:
 
   perl -> SMTP -> inetd -> qmail-smtpd -> qmail-inject.
 
 So by going direct to qmail-inject, your email skips a boatload of
 processing and goes directly into the queue.
 
 Of course none of this is relevant if you're not using qmail ;-)

Yet another solution:

use Mail::QmailQueue directly: 
http://search.cpan.org/search?dist=Mail-QmailQueue


--
Tatsuhiko Miyagawa [EMAIL PROTECTED]




Re: performance coding project? (was: Re: When to cache)

2002-01-25 Thread David Wheeler

On Fri, 2002-01-25 at 13:15, Matt Sergeant wrote:

 With qmail, SMTP generally uses inetd, which is slow, or daemontools,
 which is faster but still slow; and more importantly, it goes through:
 
   perl -> SMTP -> inetd -> qmail-smtpd -> qmail-inject.
 
 So by going direct to qmail-inject, your email skips a boatload of
 processing and goes directly into the queue.

Okay, that makes sense. In my activitymail CVS script I just used
sendmail.

 http://www.cpan.org/authors/id/D/DW/DWHEELER/activitymail-0.987

But it looks like this might be more efficient, if qmail happens to be
installed (not sure on SourceForge's servers).
 
 Of course none of this is relevant if you're not using qmail ;-)

Yes, and in Bricolage, I used Net::SMTP to keep it as
platform-independent as possible. It should work on Windows, even!
Besides, all mail gets sent during the Apache cleanup phase, so there
should be no noticeable delay for users.
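
(A sketch of that cleanup-phase pattern under mod_perl 1.x, with invented
addresses; register_cleanup defers the SMTP chat until after the response
has gone out to the client.)

use strict;
use Net::SMTP;

sub handler {
    my $r = shift;
    # ... generate and send the page ...
    $r->register_cleanup(\&send_notification);
    return 0;   # Apache::Constants::OK
}

sub send_notification {
    my $smtp = Net::SMTP->new('localhost') or return;
    $smtp->mail('app@example.com');
    $smtp->to('user@example.com');
    $smtp->data();
    $smtp->datasend("Subject: done\n\nSent after the response.\n");
    $smtp->dataend();
    $smtp->quit;
}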

David

-- 
David Wheeler AIM: dwTheory
[EMAIL PROTECTED] ICQ: 15726394
   Yahoo!: dew7e
   Jabber: [EMAIL PROTECTED]




Re: performance coding project? (was: Re: When to cache)

2002-01-25 Thread Joe Schaefer

Stas Bekman [EMAIL PROTECTED] writes:

 I even have a name for the project: Speedy Code Habits  :)
 
 The point is that I want to develop a coding style which tries hard to  
 do early premature optimizations.

I disagree with the POV you seem to be taking wrt write-time 
optimizations.  IMO, there are precious few situations where
writing Perl in some prescribed style will lead to the fastest code.
What's best for one code segment is often a mediocre (or even stupid)
choice for another.  And there's often no a priori way to predict this
without being intimate with many dirty aspects of Perl's innards.

I'm not at all against divining some abstract _principles_ for
accelerating a given solution to a problem, but trying to develop a 
Speedy Style is IMO folly.  My best and most universal advice would 
be to learn XS (or better, Inline) and use a language that was _designed_
for writing finely-tuned sections of code.  But that's in the
post-working-prototype stage, *not* before.

[...]

 mod_perl specific examples from the guide/book ($r->args vs 
 Apache::Request::param, etc.)

Well, I've complained about that one before, and since the 
guide's text hasn't changed yet I'll try saying it again:  

  Apache::Request::param() is FASTER THAN Apache::args(),
  and unless someone wants to rewrite args() IN C, it is 
  likely to remain that way. PERIOD.

Of course, if you are satisfied using Apache::args, then it would
be silly to change styles.

YMMV
-- 
Joe Schaefer




Re: performance coding project? (was: Re: When to cache)

2002-01-25 Thread Stas Bekman

Perrin Harkins wrote:

The point is that I want to develop a coding style which tries hard to
do early premature optimizations.

 
 We've talked about this kind of thing before.  My opinion is still the same
 as it was: low-level speed optimization before you have a working system is
 a waste of your time.
 
 It's much better to build your system, profile it, and fix the bottlenecks.
 The most effective changes are almost never simple coding changes like the
 one you showed, but rather large things like using qmail-inject instead of
 SMTP, caching a slow database query or method call, or changing your
 architecture to reduce the number of network accesses or inter-process
 communications.

It all depends on what kind of application you have. If your code is 
CPU-bound, these seemingly insignificant optimizations can have a very 
significant influence on the overall service performance. Of course, if 
your app is IO-bound or depends on some external component, then your 
argumentation applies.

On the other hand, how often do you get a chance to profile your code and 
see how to improve its speed in the real world? Managers never plan 
for a debugging period, let alone an optimization period. And while 
premature optimizations are usually evil, as they will bite you later, 
knowing the differences between coding styles does help in the long run, 
and I don't consider these premature optimizations.

Definitely this discussion has no end. Everybody is right in their 
particular project, since there are no two projects which are the same.

All I want to say is that there is no one-size-fits-all solution in Perl, 
because of TIMTOWTDI, so you can learn a lot from running benchmarks, 
picking your favorite coding style, and changing it as the language 
evolves. But you shouldn't blindly apply the outcomes of the benchmarks 
without running your own.

_
Stas Bekman JAm_pH  --   Just Another mod_perl Hacker
http://stason.org/  mod_perl Guide   http://perl.apache.org/guide
mailto:[EMAIL PROTECTED]  http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/




Re: When to cache

2002-01-24 Thread Rob Nagler

 1) The old cache entry is overwritten with the new.
 2) The old cache entry is expired, thus forcing a database hit (and 
 subsequent cache load) on the next request.

3) Cache only stuff which doesn't expire (except on server restarts).

We don't cache any mutable data, and there are no sessions. We let the
database do the caching.  We use Oracle, which has a pretty good
cache.  We do cache some stuff that doesn't change, e.g. default
permissions, and we release weekly, which involves a server restart
and a refresh of the cache.

If you hit http://www.bivio.com , you'll get a page back in under
300ms. There are probably 10 database queries involved if you are
logged in.  This page is complex, but far from our most complex.
For example, this page
http://www.bivio.com/demo_club/accounting/investments
sums up all the holdings of a portfolio from the individual
transactions (buys, sells, splits, etc.).  It also comes back in under
300ms.

Sorry if this wasn't the answer you were looking for. :)

Rob




Re: When to cache

2002-01-24 Thread Perrin Harkins

 I'm interested to know what the opinions are of those on this list with
 regards to caching objects during database write operations. I've
 encountered different views and I'm not really sure what the best
 approach is.

I described some of my views on this in the article on the eToys design,
which is archived at perl.com.

 Take a typical caching scenario: Data/objects are locally stored upon
 loading from a database to improve performance for subsequent requests.
 But when those objects change, what's the best method for refreshing the
 cache? There are two possible approaches (maybe more?):

 1) The old cache entry is overwritten with the new.
 2) The old cache entry is expired, thus forcing a database hit (and
 subsequent cache load) on the next request.

 The first approach would tend to yield better performance. However
 there's no guarantee the data will ever be read. The cache could end up
 with a large amount of data that's never referenced. The second approach
 would probably allow for a smaller cache by ensuring that data is only
 cached on reads.

There are actually thousands of variations on caching.  In this case you
seem to be asking about one specific aspect: what to cache.  Another
important question is how to ensure cache consistency.  The approach you
choose depends on frequency of updates, single server vs. cluster, etc.

There's a simple answer for what to cache: as much as you can, until you hit
some kind of limit or performance is good enough.  Sooner or later you will
hit the point where the tradeoff in storage or in time spent ensuring cache
consistency will force you to limit your cache.

People usually use something like a dbm or Cache::Cache to implement
mod_perl caches, since then you get to share the cache between processes.
Storing the cache on disk means your storage is nearly unlimited, so we'll
ignore that aspect for now.  There's a lot of academic research about
deciding what to cache in web proxy servers based on a limited amount of
space which you can look at if you have space limitations.  Lots of stuff on
LRU, LFU, and other popular cache expiration algorithms.

The limit you are more likely to hit is that it will start to take too long
to populate the cache with everything.  Here's an example from eToys:

We used to generate most of the site as static files by grinding through all
the products in the database and running the data through a templating
system.  This is a form of caching, and it gave great performance.  One day
we had to add a large number of products that more than doubled the size of
our database.  The time to generate all of them became prohibitive in that
our content editors wanted updates to happen within a certain number of
hours but it was taking longer than that number of hours to generate all the
static files.

To fix this, we moved to not generating anything until it was requested.  We
would fetch the data the first time it was asked for, and then cache it for
future requests.  (I think this corresponds to your option 2.)  Of course
then you have to decide on a cache consistency approach for keeping that
data fresh.  We used a simple TTL approach because it was fast and easy to
implement (good enough).
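
(That fetch-on-request-plus-TTL pattern maps directly onto Cache::Cache;
here's a minimal sketch with a file-backed cache. The product-fetching
function and key scheme are invented for illustration.)

use strict;
use Cache::FileCache;

my $cache = Cache::FileCache->new({
    namespace          => 'products',
    default_expires_in => 60 * 60,    # one-hour TTL: "good enough"
});

sub get_product {
    my ($id) = @_;
    my $data = $cache->get($id);
    unless (defined $data) {
        $data = fetch_product_from_db($id);   # hypothetical slow fetch
        $cache->set($id, $data);
    }
    return $data;
}

sub fetch_product_from_db { return "product data for $_[0]" }   # stub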

This is just scratching the surface of caching.  If you want to learn more,
I would suggest some introductory reading.  You can find lots of general
ideas about caching by searching Google for things like "cache consistency".
There are also a couple of good articles on the subject that I've read
recently.  Randal has an article that shows an implementation of what I
usually call "lazy reloading":
http://www.linux-mag.com/2001-01/perl_01.html

There's one about cache consistency on O'Reilly's onjava.com, but all the
examples are in Java:
http://www.onjava.com/pub/a/onjava/2002/01/09/dataexp1.html

Also, in reference to Rob Nagler's post, it's obviously better to be in a
position where you don't need to cache to improve performance.  Caching adds
a lot of complexity and causes problems that are hard to explain to
non-technical people.  However, for many of us caching is a necessity for
decent performance.

- Perrin




Re: When to cache

2002-01-24 Thread Rob Nagler

Perrin Harkins writes:
 To fix this, we moved to not generating anything until it was requested.  We
 would fetch the data the first time it was asked for, and then cache it for
 future requests.  (I think this corresponds to your option 2.)  Of course
 then you have to decide on a cache consistency approach for keeping that
 data fresh.  We used a simple TTL approach because it was fast and easy to
 implement (good enough).

I'd be curious to know the cache hit stats.  BTW, this case seems to
be an example of immutable data, which is definitely worth caching if
performance dictates.

 However, for many of us caching is a necessity for decent
 performance.

I agree with the latter clause, but take issue with the former.  Typical
sites get a few hits a second at peak times.  If a site isn't
returning typical pages in under a second using mod_perl, it
probably has some type of basic problem imo.

A common problem is a missing database index.  Another is too much
memory allocation, e.g. passing around a large scalar instead of a
reference or overuse of objects (classical Java problem).  It isn't
always the case that you can fix the problem, but caching doesn't fix
it either.  At least understand the performance problem(s) thoroughly
before adding the cache.

Here's a fun example of a design flaw.  It is a performance test sent
to another list.  The author happened to work for one of our
competitors.  :-)


  That may well be the problem. Building giant strings using .= can be
  incredibly slow; Perl has to reallocate and copy the string for each
  append operation. Performance would likely improve in most
  situations if an array were used as a buffer, instead. Push new
  strings onto the array instead of appending them to a string.

#!/usr/bin/perl -w
### Append.bench ###

use Benchmark;

sub R () { 50 }
sub Q () { 100 }
@array = (" " x R) x Q;

sub Append {
my $str = "";
map { $str .= $_ } @array;
}

sub Push {
my @temp;
map { push @temp, $_ } @array;
my $str = join "", @temp;
}

timethese($ARGV[0],
{ append => \&Append,
  push   => \&Push });


Such a simple piece of code, yet the conclusion is incorrect.  The
problem is in the use of map instead of foreach for the performance
test iterations.  The result of Append is an array whose length is
Q and whose elements grow from R to R * Q.  Change the map to a
foreach and you'll see that push/join is much slower than .=.
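
(A corrected harness along those lines might look like this - a sketch;
foreach in a statement modifier builds no return list, so the timing
measures only the appending itself.)

#!/usr/bin/perl -w
use strict;
use Benchmark qw(timethese);

sub R () { 50 }
sub Q () { 100 }
my @array = (" " x R) x Q;

sub Append {
    my $str = "";
    $str .= $_ foreach @array;
    return \$str;               # return a reference, saving a copy
}

sub Push {
    my @temp;
    push @temp, $_ foreach @array;
    my $str = join "", @temp;
    return \$str;
}

timethese($ARGV[0] || -3, {
    append => \&Append,
    push   => \&Push,
});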

Return a string reference from Append.  It saves a copy.
If this is the page you're building, you'll see a significant
improvement in performance.

Interestingly, this couldn't be the problem, because the hypothesis
is incorrect.  The incorrect test just validated something that was
faulty to begin with.  This brings up "you can't talk about it unless
you can measure it".  Use a profiler on the actual code.  Add
performance stats in your code.  For example, we encapsulate all DBI
accesses and accumulate the time spent in DBI on any request.  We also
track the time we spend processing the entire request.
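
(A sketch of that encapsulation idea; the wrapper name and the global
accumulator are invented. Every DBI access goes through one chokepoint
that adds its elapsed time to a per-request total.)

use strict;
use Time::HiRes qw(gettimeofday tv_interval);

my $dbi_time = 0;   # accumulated per request, reset between requests

sub timed_select_all {
    my ($dbh, $sql, @bind) = @_;
    my $t0   = [gettimeofday];
    my $rows = $dbh->selectall_arrayref($sql, undef, @bind);
    $dbi_time += tv_interval($t0);
    return $rows;
}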

Adding a cache is piling more code onto a solution.  It sometimes is
like adding lots of salt to bad cooking.  You do it when you have to,
but you end up paying for it later.

Sorry if my post seems pedantic or obvious.  I haven't seen this type
of stuff discussed much in this particular context.  Besides I'm a
contrarian. ;-)

Rob



Re: When to cache

2002-01-24 Thread Perrin Harkins

 Perrin Harkins writes:
  To fix this, we moved to not generating anything until it was requested.
  We would fetch the data the first time it was asked for, and then cache
  it for future requests.  (I think this corresponds to your option 2.)
  Of course then you have to decide on a cache consistency approach for
  keeping that data fresh.  We used a simple TTL approach because it was
  fast and easy to implement (good enough).

 I'd be curious to know the cache hit stats.

In this case, there was a high locality of access, so we got about a 99% hit
rate.  Obviously not every cache will be this successful.

 BTW, this case seems to
 be an example of immutable data, which is definitely worth caching if
 performance dictates.

It wasn't immutable, but it was data that we could allow to be out of sync
for a certain amount of time that was dictated by the business requirements.
When you dig into it, most sites have a lot of data that can be out of sync
for some period.

 I agree with the latter clause, but take issue with the former.  Typical
 sites get a few hits a second at peak times.  If a site isn't
 returning typical pages in under a second using mod_perl, it
 probably has some type of basic problem imo.

Some sites have complex requirements.  eToys may have been an anomaly
because of the amount of traffic, but the thing that forced us to cache was
database performance.  Tuning the perl stuff was not very hard, and it was
all pretty fast to begin with.  Tuning the database access hit a wall when
our DBAs had gone over the queries, indexes had been adjusted, and some
things were still slow.  The nature of the site design (lots of related data
on a single page) required many database calls and some of them were fairly
heavy SQL.  Some people would say to denormalize the database at that point,
but that's really just another form of caching.

 Use a profiler on the actual code.

Agreed.

 Add
 performance stats in your code.  For example, we encapsulate all DBI
 accesses and accumulate the time spent in DBI on any request.

No need to do that yourself.  Just use DBIx::Profile to find the hairy
queries.

 Adding a cache is piling more code onto a solution.  It sometimes is
 like adding lots of salt to bad cooking.  You do it when you have to,
 but you end up paying for it later.

It may seem like the wrong direction to add code in order to make things go
faster, but you have to consider the relative speeds: Perl code is really
fast, databases are often slower than we want them to be.

Ironically, I am quoted in Philip Greenspun's book on web publishing saying
just what you are saying: that databases should be fast enough without
middle-tier caching.  Sadly, sometimes they just aren't.

- Perrin




Re: When to cache

2002-01-24 Thread Rob Nagler

 When you dig into it, most sites have a lot of data that can be out of sync
 for some period.

Agreed. We run an accounting application which just happens to be
delivered via the web.  This definitely colors (distorts?) my view.

 heavy SQL.  Some people would say to denormalize the database at that point,
 but that's really just another form of caching.

Absolutely.  Denormalization is the root of all evil. ;-)

 No need to do that yourself.  Just use DBIx::Profile to find the hairy
 queries.

History.  Also, another good trick is to make sure your select
statements are as similar as possible.  It is often better to bundle a
couple of similar queries into a single one.  The query compiler
caches queries.
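
(In DBI terms that usually means placeholders plus prepare_cached, so the
server sees one statement text no matter the values; the connection
details here are invented.)

use strict;
use DBI;

my $dbh = DBI->connect('dbi:Oracle:prod', 'user', 'pass',
                       { RaiseError => 1 });

# Bad: a new statement text per value defeats the server's query cache.
# my $sth = $dbh->prepare("SELECT name FROM club WHERE id = $id");

# Good: one shared statement text, compiled once and reused.
my $id  = 42;
my $sth = $dbh->prepare_cached('SELECT name FROM club WHERE id = ?');
$sth->execute($id);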

 Ironically, I am quoted in Philip Greenspun's book on web publishing saying
 just what you are saying: that databases should be fast enough without
 middle-tier caching.  Sadly, sometimes they just aren't.

Every system design decision often has an equally valid converse.
The art is knowing when to buy and when to sell.  And Greenspun's book
is a great resource btw.

Rob



performance coding project? (was: Re: When to cache)

2002-01-24 Thread Stas Bekman

Rob Nagler wrote:

 Perrin Harkins writes:

 Here's a fun example of a design flaw.  It is a performance test sent
 to another list.  The author happened to work for one of our
 competitors.  :-)
 
 
   That may well be the problem. Building giant strings using .= can be
   incredibly slow; Perl has to reallocate and copy the string for each
   append operation. Performance would likely improve in most
   situations if an array were used as a buffer, instead. Push new
   strings onto the array instead of appending them to a string.
 
 #!/usr/bin/perl -w
 ### Append.bench ###
 
 use Benchmark;
 
 sub R () { 50 }
 sub Q () { 100 }
 @array = (" " x R) x Q;
 
 sub Append {
 my $str = "";
 map { $str .= $_ } @array;
 }
 
 sub Push {
 my @temp;
 map { push @temp, $_ } @array;
 my $str = join "", @temp;
 }
 
 timethese($ARGV[0],
 { append => \&Append,
   push   => \&Push });
 
 
 Such a simple piece of code, yet the conclusion is incorrect.  The
 problem is in the use of map instead of foreach for the performance
 test iterations.  The result of Append is an array whose length is
 Q and whose elements grow from R to R * Q.  Change the map to a
 foreach and you'll see that push/join is much slower than .=.
 
 Return a string reference from Append.  It saves a copy.
 If this is the page you're building, you'll see a significant
 improvement in performance.
 
 Interestingly, this couldn't be the problem, because the hypothesis
 is incorrect.  The incorrect test just validated something that was
 faulty to begin with.  This brings up "you can't talk about it unless
 you can measure it".  Use a profiler on the actual code.  Add
 performance stats in your code.  For example, we encapsulate all DBI
 accesses and accumulate the time spent in DBI on any request.  We also
 track the time we spend processing the entire request.

While we are at this topic, I want to suggest a new project. I was 
planning to start working on it a long time ago, but other things always 
took over.

The perl.apache.org/guide/performance.html and a whole bunch of 
performance chapters in the upcoming modperl book have a lot of 
benchmarks, comparing various coding techniques, such as the example 
you've provided. The benchmarks are doing both pure Perl and mod_perl 
specific code (which requires running Apache, a perfect job for the new 
Apache::Test framework.)

Now throw in the various techniques from the 'Effective Perl' book and 
voila, you have a great project to learn from.

Also remember that on various platforms and various Perl versions the 
benchmark results will differ, sometimes very significantly.

I even have a name for the project: Speedy Code Habits  :)

The point is that I want to develop a coding style which tries hard to 
do early premature optimizations. Let me give you an example of what I 
mean. Tell me what's faster:

if (ref $b eq 'ARRAY'){
$a = 1;
}
elsif (ref $b eq 'HASH'){
$a = 1;
}

or:

my $ref = ref $b;
if ($ref eq 'ARRAY'){
$a = 1;
}
elsif ($ref eq 'HASH'){
$a = 1;
}

Sure, the win can be very little, but it adds up as your code base's size 
grows.

To give you a similar example:

if ($a->lookup eq 'ARRAY'){
$a = 1;
}
elsif ($a->lookup eq 'HASH'){
$a = 1;
}

or

my $lookup = $a->lookup;
if ($lookup eq 'ARRAY'){
$a = 1;
}
elsif ($lookup eq 'HASH'){
$a = 1;
}

now throw in sub attributes and re-run the test again.
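
(For instance, a runnable version of the first comparison could be as
simple as this sketch; absolute numbers will vary by platform and Perl
version, as noted above. Variable names changed so strict-safe lexicals
don't shadow Perl's sort variables $a and $b.)

use strict;
use Benchmark qw(timethese);

my $data = [1, 2, 3];   # the thread's $b
my $x;                  # the thread's $a

timethese(2_000_000, {
    repeat_ref => sub {
        if    (ref $data eq 'ARRAY') { $x = 1 }
        elsif (ref $data eq 'HASH')  { $x = 1 }
    },
    cache_ref => sub {
        my $ref = ref $data;
        if    ($ref eq 'ARRAY') { $x = 1 }
        elsif ($ref eq 'HASH')  { $x = 1 }
    },
});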

add examples of map vs for.
add examples of method lookup vs. procedures
add examples of concat vs. list vs. other stuff from the guide.

mod_perl specific examples from the guide/book ($r->args vs 
Apache::Request::param, etc.)

If you understand where I'm trying to take you, help me pull this project 
off, and I think in the long run we can benefit a lot.

This goes along with the Apache::Benchmark project I think (which is yet 
another thing I want to start...); we could probably put these two ideas 
together.

_
Stas Bekman JAm_pH  --   Just Another mod_perl Hacker
http://stason.org/  mod_perl Guide   http://perl.apache.org/guide
mailto:[EMAIL PROTECTED]  http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/




When to cache

2002-01-23 Thread Milo Hyson

I'm interested to know what the opinions are of those on this list with 
regards to caching objects during database write operations. I've encountered 
different views and I'm not really sure what the best approach is.

Take a typical caching scenario: Data/objects are locally stored upon loading 
from a database to improve performance for subsequent requests. But when 
those objects change, what's the best method for refreshing the cache? There 
are two possible approaches (maybe more?):

1) The old cache entry is overwritten with the new.
2) The old cache entry is expired, thus forcing a database hit (and 
subsequent cache load) on the next request.

The first approach would tend to yield better performance. However there's no 
guarantee the data will ever be read. The cache could end up with a large 
amount of data that's never referenced. The second approach would probably 
allow for a smaller cache by ensuring that data is only cached on reads.
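
(In code, the two options differ only in what the write path does to the
cache. A sketch using Cache::MemoryCache, with an invented DB helper.)

use strict;
use Cache::MemoryCache;

my $cache = Cache::MemoryCache->new({ namespace => 'objects' });

sub update_object {
    my ($id, $obj) = @_;
    write_to_db($id, $obj);      # hypothetical database write

    # Option 1: write-through -- overwrite the entry, hot for next read.
    $cache->set($id, $obj);

    # Option 2: invalidate -- drop it; the next read repopulates.
    # $cache->remove($id);
}

sub write_to_db { }   # stub for the real database layer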

In the end, this probably boils down to application requirements. RAM and 
disk storage is so cheap these days that the first method is probably fine 
for most purposes. However I'm sure there are situations where resources are 
limited and the second is more effective. What does everyone think?

-- 
Milo Hyson
CyberLife Labs, LLC