Wizards,

Those who've answered my previous questions about hashes have been most helpful. I especially appreciated John W. Kennedy's excellent card catalogue cabinet analogy.

I have one last (I hope) question on the topic. I understand from previous answers that not all data sets will hash "politely." Some will have a very nice one-item-to-one-bucket outcome, while others can have several, even many, items in just a few buckets, and the worst case could be "10,000 items all in one bucket." Is there a way of determining how well my data goes into its hash? As an example of my question:

        $h{$_} = $_ for 'a'..'zzz';
        print scalar( %h ), "\n";

prints out

        14056/32768

This says that the 18,278 values generated used only 14,056 of the reserved 32,768 buckets. I understand that permutations would hash to the same key (er, the same bucket?), for example "cab" and "abc" (although I suspect this would depend on the hashing algorithm, wouldn't it?). My question here is simply whether there's a way to see how "well-behaved" my data set is. For some of my own hashes, scalar( %h ) comes out to:

        59/128 -- for 69 values
        78/128 -- for 120 values

In fact, none of them come out "okay" -- that is, x buckets for x values.
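
For comparison, here is a rough sketch of what an idealised hash would do with the same counts. It assumes every key lands in a uniformly random bucket, independently of the others, which is a textbook model and not a description of Perl's actual hash function. Under that assumption, n keys in m buckets are expected to occupy about m * (1 - (1 - 1/m)**n) buckets. The function name expected_used is made up for this illustration.

```perl
use strict;
use warnings;

# Expected number of non-empty buckets when $n keys are dropped
# uniformly at random into $m buckets. This is an idealised model
# (an assumption), not Perl's real hashing algorithm.
sub expected_used {
    my ($n, $m) = @_;
    return $m * (1 - (1 - 1 / $m) ** $n);
}

# [ number of keys, observed used buckets, total buckets ],
# taken from the figures quoted in this post.
my @cases = (
    [ 18278, 14056, 32768 ],
    [    69,    59,   128 ],
    [   120,    78,   128 ],
);

for my $case (@cases) {
    my ($n, $used, $m) = @$case;
    printf "%6d keys: observed %5d of %5d buckets, ideal model predicts about %.0f\n",
        $n, $used, $m, expected_used($n, $m);
}
```

If the model is a fair yardstick, the observed figures are about what a well-mixed hash would give (roughly 14,000, 53 and 78 buckets respectively), so "x buckets for x values" is not the outcome to expect even from well-behaved data. One caveat, as far as I know: from Perl 5.26 onward scalar( %h ) returns the number of keys instead of the used/total string, and Hash::Util::bucket_ratio is the documented way to get the old figure.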

I guess a second question might be whether the answer to the first is even meaningful--at least insofar as this has any effect on performance. If it doesn't, I suppose the question's almost pointless, except as a point of knowledge.
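
On the performance side, a back-of-the-envelope sketch may help. Perl resolves collisions by chaining keys within a bucket, and the standard textbook estimate for a chained table under uniform hashing is that a successful lookup examines about 1 + (n - 1) / (2 * m) keys for n keys in m buckets. The uniform-hashing assumption and the function name expected_probes are mine, for illustration only; the code measures nothing about a real hash.

```perl
use strict;
use warnings;

# Textbook estimate of keys examined per successful lookup in a
# chained hash table with $n keys and $m buckets, assuming uniform
# hashing (an assumption, not a measurement).
sub expected_probes {
    my ($n, $m) = @_;
    return 1 + ($n - 1) / (2 * $m);
}

# [ number of keys, total buckets ] from the figures quoted in this post.
for my $case ([ 18278, 32768 ], [ 69, 128 ], [ 120, 128 ]) {
    my ($n, $m) = @$case;
    printf "%6d keys in %5d buckets: about %.2f key comparisons per successful lookup\n",
        $n, $m, expected_probes($n, $m);
}
```

By this estimate all three cases stay well under two comparisons per lookup, which suggests the bucket counts above would have little practical effect on speed.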

Thanks,

Deane
_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
