On Wed, Oct 09, 2002 at 10:44:02PM +0000, Nicholas Clark wrote:
> On Tue, Oct 08, 2002 at 05:49:33PM +0100, Simon Wistow wrote:
> 
> > Some examples. You want to remove the duplicates from a list. I always
> > use :
> > 
> > @nodups = keys %{{ map { $_ => 1 } @dups }};
> > 
> >  
> > I didn't always but once I worked it out by myself (or was shown it or
> > however I learnt it) it was useful to have in my head.
> 
> The one I always use (modulo bugs) is something close to
> 
> my %hash;
> @nodups = grep { !$hash{$_}++ } @dups;
> 
> admittedly this uses a temporary hash, so it's not as elegant as yours.
> 
> 
> However, thinking about how OPS ARE BAD, I believe the optimal answer is
> 
> my %hash;
> @hash{@dups} = @dups;
> @nodups = values %hash;
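For reference, here are the three idioms quoted above wrapped up as Perl subs. The sub names are made up for this sketch. Only the grep version keeps the first occurrence of each element in its original order. The other two return the unique elements in whatever order the hash happens to produce.

```perl
use strict;
use warnings;

# Simon's idiom: build an anonymous hash keyed on the elements, then take
# its keys. The unary + makes perl read the inner braces as an anonymous
# hash constructor rather than a block.
sub uniq_map {
    return keys %{ +{ map { $_ => 1 } @_ } };
}

# grep idiom: %seen counts each element; !$seen{$_}++ is true only the
# first time a given value is seen, so the original order is preserved.
sub uniq_grep {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Hash-slice idiom: store every element as both key and value in a single
# slice assignment, then return the values. Duplicates collapse because
# hash keys are unique.
sub uniq_slice {
    my %hash;
    @hash{@_} = @_;
    return values %hash;
}

my @dups = qw(b a b c a);
print join(",", uniq_grep(@dups)), "\n";                     # prints b,a,c
print join(",", sort { $a cmp $b } uniq_map(@dups)), "\n";   # prints a,b,c
print join(",", sort { $a cmp $b } uniq_slice(@dups)), "\n"; # prints a,b,c
```

One more difference: the keys-based version hands back stringified keys, while the slice version returns the stored values, so references survive as references.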

And the 3 laws of optimising are Benchmark, Benchmark, Benchmark

[and we all know what comes after lies and damn lies, so I think that gives
an idea of how reliable this whole game is]

> I believe I understand what I'm about to do, and that my analysis is correct:

With more tea, I now disagree with myself. (I thought last night "I should
have benchmarked this" but still hit send)

> Woohoo! No stinking loops.

But, given data that is highly repetitive (all the "words" in the 5.8
Changes file - 546016 words, 33748 unique):

Benchmark: timing 10 iterations of Cookbook, Nick, Simon...
  Cookbook: 12 wallclock secs (12.37 usr +  0.01 sys = 12.38 CPU) @  0.81/s (n=10)
      Nick: 24 wallclock secs (23.65 usr +  0.01 sys = 23.66 CPU) @  0.42/s (n=10)
     Simon: 31 wallclock secs (31.37 usr +  0.03 sys = 31.40 CPU) @  0.32/s (n=10)
         s/iter    Simon     Nick Cookbook
Simon      3.14       --     -25%     -61%
Nick       2.37      33%       --     -48%
Cookbook   1.24     154%      91%       --


Cookbook wins.
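A harness along these lines, using the core Benchmark module, should produce a table with the same layout as the ones above. It's a sketch under stated assumptions: the post doesn't say which Cookbook recipe was timed, so only the three idioms quoted in this thread are included. The word list is also hypothetical synthetic data (50,000 words drawn from 3,000 distinct values), not the 5.8 Changes file.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical, highly repetitive test data: 50_000 "words" drawn from a
# vocabulary of 3_000. Swap in real words (e.g. split from a file) to taste.
srand(42);
my @dups = map { "w" . int(rand(3_000)) } 1 .. 50_000;

# Run each variant 10 times and print a rate/percentage comparison chart.
# cmpthese also returns that chart as an array ref of rows (header first).
my $chart = cmpthese(10, {
    map_keys   => sub { my @nodups = keys %{ +{ map { $_ => 1 } @dups } } },
    grep_seen  => sub { my %seen; my @nodups = grep { !$seen{$_}++ } @dups },
    hash_slice => sub { my %hash; @hash{@dups} = @dups; my @nodups = values %hash },
});
```

With only 10 iterations Benchmark will probably warn that the count is too small to be reliable. Raise the count, or pass a negative number such as -3 to run each variant for at least 3 CPU seconds. As the results above show, the ranking can depend on how repetitive the data is, so it's worth timing against your own input.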

OK, so it's probably using less RAM than mine. Let's try something with fewer
repeated "words" - all the lines in the current AUTHORS file:

Benchmark: timing 1000 iterations of Cookbook, Nick, Simon...
  Cookbook:  5 wallclock secs ( 4.69 usr +  0.00 sys =  4.69 CPU) @ 213.22/s (n=1000)
      Nick:  5 wallclock secs ( 5.35 usr +  0.00 sys =  5.35 CPU) @ 186.92/s (n=1000)
     Simon:  8 wallclock secs ( 7.61 usr +  0.01 sys =  7.62 CPU) @ 131.23/s (n=1000)
          Rate    Simon     Nick Cookbook
Simon    131/s       --     -30%     -38%
Nick     187/s      42%       --     -12%
Cookbook 213/s      62%      14%       --


Oh. Cookbook still wins. 

I'll buy the cookbook.

Wait - I already have

Nicholas Clark

PS I wonder why - possibly time to investigate what the hash slice ops are
    actually doing.
PPS I'm sure Dave will argue that there's nothing wrong with having two or
    more copies of the same book (preferably his http://www.manning.com/cross/
    1 for home, 1 for work, 1 for reading on the toilet... Makes the
    perfect gift for all your friends, family, hamsters...)
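On the PS: one way to see what the hash-slice version actually compiles to is the core B::Concise module, which dumps a sub's op tree. The sketch below defines a hypothetical uniq_slice sub holding the slice idiom and captures the dump in a string rather than printing straight to STDOUT. Exact op names and flags vary between perl versions. From the shell, `perl -MO=Concise,-exec script.pl` gives a similar listing for a whole script.

```perl
use strict;
use warnings;
use B::Concise ();

# Hypothetical sub holding the hash-slice idiom from the thread.
sub uniq_slice {
    my %hash;
    @hash{@_} = @_;
    return values %hash;
}

# Redirect B::Concise's output into a string buffer instead of STDOUT.
my $ops = '';
B::Concise::walk_output(\$ops);

# compile() returns a closure; calling it renders the sub's ops one per
# line, listed in execution order because of the -exec option.
B::Concise::compile('-exec', \&uniq_slice)->();
print $ops;
```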

