From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: 28 February 2006 15:47
To: [email protected]
Subject: How do Perl hashes work?
> Wizards,
>
> Those who've answered my previous questions about hashes have been
most helpful. I especially appreciated John > W. Kennedy's excellent
card catalogue cabinet analogy.
> I have one last (I hope) question on the topic:I understand from
previous answers that not all data sets will
> hash "politely." Some will have a very nice one-item-to-one-bucket
outcome, while others can have several, even > many, items in just a few
buckets. And that the worst case could be "10,000 items all in one
bucket." Is
> there a way of determining how well my data goes into its hash? As an
example of my question:
>
> $h{$_} = $_ for 'a'..'zzz';
> print scalar( %h ), "\n";
>
> prints out
>
> 14056/32768
>
> This says that the 18,278 values generated used only 14,056 of the
reserved 32,768 buckets. I understand that
> permutations would hash to the same key (er, the same bucket?), for
example "cab" and "abc" (although I suspect > this would depend on the
hashing algorithm, wouldn't it?). My question here is simply if there's
a way to see
> how "well-behaved" my data set is. Some of my scalars come out to:
>
> 59/128 -- for 69 values
> 78/128 -- for 120 values
>
> In fact, none of them come out "okay" -- that is, x buckets for x
values.
I would be very surprised if any came out "okay". Perl's hashing
algorithm has been refined over many years, and has a good reputation
(AIUI), but I wouldn't expect that sort of performance. What you are
talking about, where each key maps to a unique bucket, is generally
called a "perfect hash". This can normally only be done when you know
the set of keys in advance and can generate the hash function from them,
and is the sort of thing you might get into for your doctoral thesis.
See gperf for an example of a perfect hash generator. A further
refinement is a "minimal perfect hash", which is where the number of
keys equals the number of buckets.
>
> I guess a second question might be whether the answer to the first is
even meaningful--at least insofar as this > has any effect on
performance. If it doesn't, I suppose the question's almost pointless,
except as a point of
> knowledge.
Unless you have a performance problem that you can identify as hash
related, I wouldn't worry about it. If you want to find out about
hashes, I would suggest using a $search_engine or a good computer
science text. If you want the low down on how Perl hashes work, there is
always the source code.
HTH
--
Brian Raven
=================================
Atos Euronext Market Solutions Disclaimer
=================================
The information contained in this e-mail is confidential and solely for the
intended addressee(s). Unauthorised reproduction, disclosure, modification,
and/or distribution of this email may be unlawful.
If you have received this email in error, please notify the sender immediately
and delete it from your system. The views expressed in this message do not
necessarily reflect those of Atos Euronext Market Solutions.
L'information contenue dans cet e-mail est confidentielle et uniquement
destinee a la (aux) personnes a laquelle (auxquelle(s)) elle est adressee.
Toute copie, publication ou diffusion de cet email est interdite. Si cet e-mail
vous parvient par erreur, nous vous prions de bien vouloir prevenir
l'expediteur immediatement et d'effacer le e-mail et annexes jointes de votre
systeme. Le contenu de ce message electronique ne represente pas necessairement
la position ou le point de vue d'Atos Euronext Market Solutions.
_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs