[OT] benchmarking "typical" programs

2012-09-19 Thread Nicholas Clark
Sorry to ask an off topic question...

It's very easy to write a benchmark for a particular thing.
It's fairly easy to get such a beast to show that a particular change will
speed that benchmark up.
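
For example, a few lines with the core Benchmark module will happily "prove"
such a speedup (a toy sketch; the two alternatives compared are made up):

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Toy micro-benchmark: two made-up ways of building the same string,
    # each run for at least 3 CPU seconds, then compared.
    cmpthese(-3, {
        concat => sub { my $s = ''; $s .= $_ for 1 .. 100; return $s },
        join   => sub { return join '', 1 .. 100 },
    });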

But

Such benchmarks typically don't actually represent realistic code. They
usually don't throw enough data around or create enough objects to start
to stress the memory subsystems. And they don't do enough different things
to thrash any CPU instruction cache. So it's much harder to show whether a
particular change slows everything else down enough that it's not worth it.

So

Does the mighty hive mind of london.pm have any suggestions (preferably
useful) of what to use for benchmarking "typical" Perl programs?

Needs to do realistic things on a big enough scale to stress a typical system.
Needs to avoid external library dependencies, or particular system specifics.
Preferably needs to avoid being too Perl version specific.
Preferably needs to avoid being a maintenance headache itself.
With a pony too, if possible. :-)

Nicholas Clark

PS Ilmari, lunch!


Last chance to see...

2012-09-19 Thread Damian Conway
Dear fellow Mongers,

After more than a decade of only giving private classes in the U.K.,
I've been very fortunate to have had the opportunity to offer
public classes in London not once, but twice, this year.

However, I'm sorry to say that it now looks as though the second set of
public classes we're running next month:

http://blogs.perl.org/users/damian_conway/2012/08/back-in-london.html

will also be the last I'm able to offer in the U.K. for the foreseeable future.

I just wanted to let folks know, in case anyone had been contemplating
coming along but had decided to postpone attending until "next time".
At present, "next time" is likely to be at least five years away...perhaps longer. :-(

Meanwhile, I'm delighted that some of you *have* already signed up for the
October 11 and 12 classes and I'm very much looking forward to seeing
you there.

Damian


Re: [OT] benchmarking "typical" programs

2012-09-19 Thread Rafiq Gemmail

On 19 Sep 2012, at 12:09, Nicholas Clark wrote:

> Needs to do realistic things on a big enough scale to stress a typical system.
> Needs to avoid external library dependencies, or particular system specifics.
> Preferably needs to avoid being too Perl version specific.
> Preferably needs to avoid being a maintenance headache itself.

On a previous web-servicey $work project, I had some positive experiences in 
using Splunk (splunk.com) to extract real malperformant/normal sample data to 
throw at benchmarking, profiling and load-testing code. :-)  Splunk also let 
me use real live behaviour as a gauge in itself and deeply analyse the 
performance of classes of production request over time.  It also has a query 
API on CPAN.

With regard to the 'controlled tests', benchmarking used the Benchmark module 
(and JMeter), the profiling was NYTProf, and the load testing was JMeter 
(plus ab, whatever).  The tools did not matter so much as the sample data and 
the fact that I was able to compare runs against a fairly consistent 
architecture, datasets and execution paths (a function of the test data).  
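
To make that concrete, the harness need be nothing fancy; something like this
(a rough sketch: the sample file, its one-request-per-line format and
handle_request() are all stand-ins for whatever the real application exposes):

    use strict;
    use warnings;
    use Benchmark qw(timethese);

    # Replay a batch of captured requests through the code path under test.
    # 'sample_requests.txt' and its one-request-per-line format are made up.
    open my $fh, '<', 'sample_requests.txt' or die "open: $!";
    chomp(my @requests = <$fh>);
    close $fh;

    timethese(10, {
        replay => sub { handle_request($_) for @requests },
    });

    # Stand-in for whatever entry point the real application exposes.
    sub handle_request {
        my ($request) = @_;
        return length $request;
    }

The same script can then be run under Devel::NYTProf (perl -d:NYTProf
harness.pl, then nytprofhtml) to see where the time actually goes for each
class of request.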

Splunk is a really great tool for mining, joining and performing statistical 
time-series analysis on sloppily structured data (different daemon logs, 
network activity, custom instrumented output).  It's not free, but the free 
license gives you a reasonable daily volume of analytics to play with.  You 
can probably knock up your own scripts to extract sample input data, but I 
found it quite painless and powerful (also great visualisation tools).  With 
respect to the data, my criterion was to pick the worst outliers (in terms of 
response time/occurrence) and increase their incidence, mixed in with sample 
requests spread around the mean response time and various other 'known 
requests'.   
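
Without Splunk, a crude version of that selection is only a few lines of
Perl.  A toy sketch, where the log name, the tab-separated format and the
1%/x5 weighting are all invented:

    use strict;
    use warnings;
    use List::Util qw(sum);

    # Hypothetical log format: one "<response_time_ms>\t<request>" per line.
    open my $fh, '<', 'request_times.log' or die "open: $!";
    my @entries = map { chomp; [ split /\t/, $_, 2 ] } <$fh>;
    close $fh;
    die "no entries\n" unless @entries;

    my $mean    = sum(map { $_->[0] } @entries) / @entries;
    my @by_time = sort { $b->[0] <=> $a->[0] } @entries;

    # Worst ~1% of requests by response time, repeated to increase their
    # incidence, mixed with requests clustered around the mean.
    my @outliers = @by_time[ 0 .. int(@by_time / 100) ];
    my @typical  = grep { abs($_->[0] - $mean) < 0.1 * $mean } @entries;

    my @mix = ( (@outliers) x 5, @typical );
    print "$_->[1]\n" for @mix;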

This would let me hammer the application with likely runtime scenarios taken 
from real user behaviour (rather than contrived imaginings that just exercise 
the code you're already thinking about).  "Real" runtime scenarios would 
obviously differ because of caching, load balancing and all the usual 
production guff; however, it did give me metrics to _compare between 
releases_, so that one could see whether certain (or all) classes of queries 
had improved or degraded under these controlled conditions.

As stated, I was also able to use Splunk to watch trends in different classes 
of request over time.  This let me proactively home in on and investigate 
areas which had degraded in performance, or had always been generally 
rubbish.  One could even fire alerts based on these analytics to warn of 
exceptionally ill-performing periods.

This is just my experience, and not a "hammer to death and load-test all 
potential execution paths" approach, but I think that is not as important as 
ensuring you haven't regressed on the runtime scenarios you're likely to hit.

Not sure if that helps.

Splunk Fanboy