Wizards,
Those who've answered my previous questions about hashes have been most helpful. I especially appreciated John W. Kennedy's excellent card catalogue cabinet analogy.
I have one last (I hope) question on the topic. I understand from previous answers that not all data sets will hash "politely": some will have a very nice one-item-to-one-bucket outcome, while others can have several, even many, items in just a few buckets, and the worst case could be "10,000 items all in one bucket." Is there a way of determining how well my data goes into its hash? As an example of my question:
$h{$_} = $_ for 'a'..'zzz';
print scalar( %h ), "\n";
prints out
14056/32768
This says that the 18,278 values generated used only 14,056 of the reserved 32,768 buckets. I understand that permutations would hash to the same key (er, the same bucket?), for example "cab" and "abc" (although I suspect this would depend on the hashing algorithm, wouldn't it?). My question here is simply if there's a way to see how "well-behaved" my data set is. Some of my scalars come out to:
59/128 -- for 69 values
78/128 -- for 120 values
In fact, none of them come out "okay" -- that is, x buckets for x values.
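To make concrete what I mean by "well-behaved," here is a rough sketch of the kind of tally I have in mind. It is only an illustration under stated assumptions: toy_hash is a made-up string hash of my own (it is not Perl's internal function), and bucket_report and expected_used are hypothetical helper names. The last function gives the number of occupied buckets I would expect if the keys were scattered uniformly at random, which seems like a fairer yardstick than "x buckets for x values."

```perl
use strict;
use warnings;
use List::Util qw(max);

# Toy string hash, for illustration only. This is an assumption,
# NOT the function Perl uses internally.
sub toy_hash {
    my ($key) = @_;
    my $h = 0;
    $h = ($h * 33 + ord($_)) % 16777216 for split //, $key;
    return $h;
}

# Count how many keys land in each of $buckets slots, then return
# the number of occupied slots and the size of the fullest slot.
sub bucket_report {
    my ($buckets, @keys) = @_;
    my @count = (0) x $buckets;
    $count[ toy_hash($_) % $buckets ]++ for @keys;
    my $used = grep { $_ > 0 } @count;
    return ($used, max(@count));
}

# Expected number of occupied buckets if $n keys were placed
# uniformly at random into $buckets slots.
sub expected_used {
    my ($buckets, $n) = @_;
    return $buckets * (1 - (1 - 1 / $buckets) ** $n);
}

my @keys = ('a' .. 'zzz');
my ($used, $longest) = bucket_report(32768, @keys);
printf "toy hash: %d/%d buckets used, fullest bucket holds %d\n",
    $used, 32768, $longest;
printf "uniform random expectation: about %.0f buckets used\n",
    expected_used(32768, scalar @keys);
```

Whatever numbers this prints describe the toy hash only, so they need not match what scalar( %h ) reports. The point is the shape of the comparison: buckets actually used versus what a uniformly random spread would be expected to use.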
I guess a second question might be whether the answer to the first is even meaningful--at least insofar as this has any effect on performance. If it doesn't, I suppose the question's almost pointless, except as a point of knowledge.
Thanks,
Deane
