Re: Hash Performance

Steven Lembark Thu, 15 Dec 2005 09:43:20 -0800


-- "Haufler, Wayne A" <[EMAIL PROTECTED]>


> One of my roles in a Tools Development group at Boeing is to help
> maintain existing Perl programs and to champion the use of Perl in
> future development.
> 
> One of the developers here has the impression /opinion that Perl Hashes
> are slow.  That was because of the poor performance of the GUI menu of a
> particular Perl tool, but I believe the poor performance is due to other
> factors.  May be able to prove that later.
> 
> I have browsed through my extensive collection of O'Reilly Perl books,
> and websites, but can't seem to find anything to support my contention
> that hashes are very efficient, almost as much as arrays, and shouldn't
> be avoided.
> 
> Am I right?  Can anyone point me to supporting material?
> (I couldn't find any perl-performance related mailing list)
> 
> I hope to include that in a Perl class I will be teaching to my group
> early next year.

"Slow" compared to what?

You can make anything slow enough with an inappropriate
algorithm.

Hash access is in most cases slower than array access for
sequential traversal, though for random access or very
long, sparse lists hashes can be faster.

If you want to walk down a single list then a for loop
on an array such as:

    for( @ary )
    {
        frobnicate $_;
    }

is going to be hugely faster than getting a list of 
keys and using a hash access for each value with 
something like:

    for( keys %hash )
    {
        frobnicate $hash{ $_ }
    }
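
If you want numbers for your own machine, a rough sketch with the
core Benchmark module is enough to compare the two loops. The data
size, the sub names and the use of a running sum as a stand-in for
"frobnicate" are all my assumptions here, so treat the output as a
ballpark, not a verdict:

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Hypothetical test data: the same 100_000 values in both structures.
my @ary  = ( 1 .. 100_000 );
my %hash = map { $_ => $_ } @ary;

# Stand-in for "frobnicate": add each value to a running total.
sub sum_array
{
    my $sum = 0;
    $sum += $_ for @ary;
    $sum
}

sub sum_hash
{
    my $sum = 0;
    $sum += $hash{ $_ } for keys %hash;
    $sum
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese( -1, { array => \&sum_array, hash => \&sum_hash } );
```

The rates will vary with the perl version and the size of the data,
which is the reason to measure instead of guessing.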

On the other hand, if the array has fifty entries ranging 
from 1 to 1_000_000_000_000_000 then you are obviously 
going to get better results with a hash -- if you could even
fit the array into virtual memory.
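
A minimal sketch of that sparse case, assuming made-up numeric IDs
spread evenly over the range (the ID spacing and the "item N" values
are only for illustration):

```perl
use strict;
use warnings;

# Fifty entries keyed by (hypothetical) IDs up to 1_000_000_000_000_000.
my %sparse;

for my $i ( 1 .. 50 )
{
    my $id = $i * 20_000_000_000_000;
    $sparse{ $id } = "item $i";
}

# The hash stores only the fifty keys that were assigned.
my $count = keys %sparse;
print "entries stored: $count\n";

# The array equivalent, $ary[ $last_id ] = 'item 50', would ask perl
# to extend the array all the way out to that index.
my $last_id = 1_000_000_000_000_000;
print 'last entry: ', $sparse{ $last_id }, "\n";
```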

All this is mitigated by the fact that most modern 
(say post 1990) computers are fast enough that array
or hash access outside of large, tight loops takes a 
minuscule amount of time: saving one half of 15 ms 
will not affect your users' appreciation for a menuing
system.

Hashes are also faster for random access. Given the 
choice of:

    my ( $item ) = grep { $_ eq $value } @array;

or

    my $item = $hash{ $value };

which one would you pick? 
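
For a list you already have, the usual trick is to build the hash
once and reuse it for every lookup. A small sketch, with invented
sample data and variable names:

```perl
use strict;
use warnings;

my @array = qw( apple banana cherry );
my $value = 'banana';

# Linear scan: grep tests every element on every lookup.
my ( $by_grep ) = grep { $_ eq $value } @array;

# Build the index once; after that each lookup is one hash access.
my %index   = map { $_ => 1 } @array;
my $by_hash = exists $index{ $value } ? $value : undef;

print "grep found: $by_grep\n";
print "hash found: $by_hash\n";
```

Building the index costs one pass over the list, so it pays off once
you do more than a handful of lookups against the same data.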


So, in the end it depends on what you are trying to do:

- To keep order use an array.
- To keep an association use a hash.
- For purely sequential access probably use an array.
- For random access a hash is the more likely choice.
- For sparse data hashes are also helpful, especially if
  the keyspace is not numeric or the number of items is
  moderately large (grep on a flat list gets expensive
  after a while).
- Hash slices are usually faster than array selections.
- Hash joins are nearly always faster than equivalent 
  grep/push.
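
To illustrate the last two items, here is a sketch of a hash slice
and of what I mean by a hash join: matching records from one list
against another by key with a hash lookup instead of a nested grep.
The price table and the orders are invented for the example:

```perl
use strict;
use warnings;

my %price  = ( apple => 1.25, banana => 0.50, cherry => 3.00 );
my @wanted = qw( apple cherry );

# Hash slice: fetch several values in one expression.
my @prices = @price{ @wanted };

# Hash join: keep the orders whose item has a price and attach
# that price, with one hash lookup per order.
my @orders = ( [ apple => 3 ], [ cherry => 2 ], [ durian => 1 ] );

my @joined
= map  { [ $_->[0], $_->[1], $price{ $_->[0] } ] }
  grep { exists $price{ $_->[0] } }
  @orders;

print "$_->[0]: $_->[1] at $_->[2]\n" for @joined;
```

The grep-only version would scan the whole price list once per
order, which is where the cost piles up as the lists grow.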

One thing you are nearly always doing with the code is
maintaining it: using mnemonic keys for values can save
a huge amount of programming time. You can tweak the 
performance all you like but if the code breaks or takes
too long to write then it has also failed. 
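
As a small illustration of the maintenance point, compare the same
record stored by position and by name. The menu-item fields are
hypothetical, not taken from any real tool:

```perl
use strict;
use warnings;

# Positional record: the reader has to remember what slot 2 means.
my @item_by_position = ( 'Open', 'Ctrl+O', 1 );

# Keyed record: the same data, but each field names itself.
my %item = ( label => 'Open', shortcut => 'Ctrl+O', enabled => 1 );

print "$item{label} ($item{shortcut})\n" if $item{enabled};
```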

Someone will probably point out a hole somewhere in this,
but the point is that hashes serve a rather useful purpose
and the convolutions you go through to avoid them where they
are useful will probably hurt your performance and definitely
screw up your development.

-- 
Steven Lembark                                       85-09 90th Street
Workhorse Computing                                Woodhaven, NY 11421
[EMAIL PROTECTED]                                     1 888 359 3508
