RE: "Quality Percentage" of hashes as reported by Devel::Peek

Jan Dubois Mon, 03 Dec 2007 11:51:42 -0800

You are misreading the documentation.  n + N(n-1)/2k is the formula for 
expected comparisons for a random hash where each bucket is
filled to the same level.  The total number of comparisons for the actual hash 
is equal to the sum of the squares of the number of
entries in each bucket.


 

>From looking at the code, I wonder if the numbers are correct though, as the 
>ratio seems to be reversed and therefore should be less
than 100%.  I don't have time to investigate this though.  The actual code 
lives in dump.c in the core Perl distribution:

 

        if (HvARRAY(sv) && HvKEYS(sv)) {

            /* Show distribution of HEs in the ARRAY */

            int freq[200];

#define FREQ_MAX ((int)(sizeof freq / sizeof freq[0] - 1))

            int i;

            int max = 0;

            U32 pow2 = 2, keys = HvKEYS(sv);

            NV theoret, sum = 0;

 

            PerlIO_printf(file, "  (");

            Zero(freq, FREQ_MAX + 1, int);

            for (i = 0; (STRLEN)i <= HvMAX(sv); i++) {

                HE* h;

                int count = 0;

                for (h = HvARRAY(sv)[i]; h; h = HeNEXT(h))

                    count++;

                if (count > FREQ_MAX)

                    count = FREQ_MAX;

                freq[count]++;

                if (max < count)

                    max = count;

            }

            for (i = 0; i <= max; i++) {

                if (freq[i]) {

                    PerlIO_printf(file, "%d%s:%d", i,

                                  (i == FREQ_MAX) ? "+" : "",

                                  freq[i]);

                    if (i != max)

                        PerlIO_printf(file, ", ");

                }

            }

            PerlIO_putc(file, ')');

            /* The "quality" of a hash is defined as the total number of

               comparisons needed to access every element once, relative

               to the expected number needed for a random hash.

 

               The total number of comparisons is equal to the sum of

               the squares of the number of entries in each bucket.

               For a random hash of n keys into k buckets, the expected

               value is

                                n + n(n-1)/2k

            */

 

            for (i = max; i > 0; i--) { /* Precision: count down. */

                sum += freq[i] * i * i;

            }

            while ((keys = keys >> 1))

                pow2 = pow2 << 1;

            theoret = HvKEYS(sv);

            theoret += theoret * (theoret-1)/pow2;

            PerlIO_putc(file, '\n');

            Perl_dump_indent(aTHX_ level, file, "  hash quality = %.1"NVff"%%", 
theoret/sum*100);

        }

 

Cheers,

-Jan

 

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: December 3, 2007 10:28 AM
To: [email protected]
Subject: "Quality Percentage" of hashes as reported by Devel::Peek

 


Gurus, 

I tried directing this query to the module's author, but the e-mail address as 
listed in the module's documentation is no longer
valid--not surprising, after nearly a decade, I guess.  Anyhoo, I thought I'd 
ask if any of you folks might know the answer... 

In the documentation for this module's Dump subroutine, a hash's "quality" is 
described as a ratio of "total" comparisons versus
"expected" comparisons. There is a formula given for total comparisons, n + 
n(n-1)/2k, but no formula is given to determine expected
comparisons. The question is then, how can a user determine mathematically what 
an expected quality percentage might be? 

I know that Dump outputs the percentage, I'd just like to be able to calculate 
it predictively, instead of fiddling keys(%hash) = X
assignments and TIAS-ing for differing values of 'X.' 

Is there a formula for expected comparisons available? 

Thanks, 

Deane Rothenmaier
Programmer/Analyst
Walgreens Corp.
847-914-5150

A word to the wise ain't necessary, it's the stupid ones who need the advice. 
-- Bill Cosby

_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

RE: "Quality Percentage" of hashes as reported by Devel::Peek

Reply via email to