You are misreading the documentation. n + N(n-1)/2k is the formula for
expected comparisons for a random hash where each bucket is
filled to the same level. The total number of comparisons for the actual hash
is equal to the sum of the squares of the number of
entries in each bucket.
>From looking at the code, I wonder if the numbers are correct though, as the
>ratio seems to be reversed and therefore should be less
than 100%. I don't have time to investigate this though. The actual code
lives in dump.c in the core Perl distribution:
if (HvARRAY(sv) && HvKEYS(sv)) {
/* Show distribution of HEs in the ARRAY */
int freq[200];
#define FREQ_MAX ((int)(sizeof freq / sizeof freq[0] - 1))
int i;
int max = 0;
U32 pow2 = 2, keys = HvKEYS(sv);
NV theoret, sum = 0;
PerlIO_printf(file, " (");
Zero(freq, FREQ_MAX + 1, int);
for (i = 0; (STRLEN)i <= HvMAX(sv); i++) {
HE* h;
int count = 0;
for (h = HvARRAY(sv)[i]; h; h = HeNEXT(h))
count++;
if (count > FREQ_MAX)
count = FREQ_MAX;
freq[count]++;
if (max < count)
max = count;
}
for (i = 0; i <= max; i++) {
if (freq[i]) {
PerlIO_printf(file, "%d%s:%d", i,
(i == FREQ_MAX) ? "+" : "",
freq[i]);
if (i != max)
PerlIO_printf(file, ", ");
}
}
PerlIO_putc(file, ')');
/* The "quality" of a hash is defined as the total number of
comparisons needed to access every element once, relative
to the expected number needed for a random hash.
The total number of comparisons is equal to the sum of
the squares of the number of entries in each bucket.
For a random hash of n keys into k buckets, the expected
value is
n + n(n-1)/2k
*/
for (i = max; i > 0; i--) { /* Precision: count down. */
sum += freq[i] * i * i;
}
while ((keys = keys >> 1))
pow2 = pow2 << 1;
theoret = HvKEYS(sv);
theoret += theoret * (theoret-1)/pow2;
PerlIO_putc(file, '\n');
Perl_dump_indent(aTHX_ level, file, " hash quality = %.1"NVff"%%",
theoret/sum*100);
}
Cheers,
-Jan
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: December 3, 2007 10:28 AM
To: [email protected]
Subject: "Quality Percentage" of hashes as reported by Devel::Peek
Gurus,
I tried directing this query to the module's author, but the e-mail address as
listed in the module's documentation is no longer
valid--not surprising, after nearly a decade, I guess. Anyhoo, I thought I'd
ask if any of you folks might know the answer...
In the documentation for this module's Dump subroutine, a hash's "quality" is
described as a ratio of "total" comparisons versus
"expected" comparisons. There is a formula given for total comparisons, n +
n(n-1)/2k, but no formula is given to determine expected
comparisons. The question is then, how can a user determine mathematically what
an expected quality percentage might be?
I know that Dump outputs the percentage, I'd just like to be able to calculate
it predictively, instead of fiddling keys(%hash) = X
assignments and TIAS-ing for differing values of 'X.'
Is there a formula for expected comparisons available?
Thanks,
Deane Rothenmaier
Programmer/Analyst
Walgreens Corp.
847-914-5150
A word to the wise ain't necessary, it's the stupid ones who need the advice.
-- Bill Cosby
_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs