<sighs> :: <sighs> <sigh> | <sigh>

The tests were re-run with a selected list of candidate entries. Timing for the 
candidates was compared (used) for both the existing and the new algorithms. 
The test computer was my home AMD 2100+ instead of my work Pentium Really Fast 
CPU. This accounts for some of the time differences, but, as we shall see, not 
all of them. The test harness for the first four tests (Amortized, Minimum, 
Maximum, and Random) is unchanged. The test harness for the residual tests are 
to extract the name from the Delta_Time array, stuff it into a loop invariant 
variable, and use it to call the functions. Overhead timing for testing the new 
and used algorithms are the same and no special account is made for it (the 
timing).

With the following conclusions:
1. In all cases the new algorithm is faster. This differs
   from the results at work in which the Minimum test showed
   that the existing algorithm was faster.
2. As expected, there is a gradual increase in test time for
   items with the same first letter. But there is a time
   reversal between COLOR_ACTIVEBORDER and COLOR_ACTIVECAPTION,
   and between almost anygint and DS_3DLOOK, and etc. Anomolies
   which need explanation.
3. The test for the putative minimum for the Minimum Test
   (BS_3STATE) is some 10 times slower in the new test cycle
   than the Minimum Test. I can't account for the anomoly.

The anomolous behavior may be a factor in the optimization used. I might try 
this at work to see if the results are consistent, or hey, you might try it.

Compilation was done with: gcc -O3 main.cpp GUI_Constants-new.cpp 
GUI_Constants.cpp
Using GCC Version:         gcc (GCC) 3.4.4 (cygming special) (gdc 0.12, using 
dmd 0.125)


    Test Name            count    new      old
Amortized Cost          <791998 0.219000 0.594000>
Minimum Cost            <791998 0.031000 0.047000>
Maximum Cost            <791998 0.172000 0.578000>
Random  Cost            <791998 0.250000 0.594000>
BS_3STATE               <791998 0.187000 0.484000>
BS_AUTO3STATE           <791998 0.219000 0.563000>
BS_AUTOCHECKBOX         <791998 0.203000 0.531000>
BS_AUTORADIOBUTTON      <791998 0.203000 0.547000>
BS_CHECKBOX             <791998 0.203000 0.547000>
COLOR_3DFACE            <791998 0.204000 0.422000>
COLOR_ACTIVEBORDER      <791998 0.250000 0.484000>
COLOR_ACTIVECAPTION     <791998 0.250000 0.125000>
COLOR_APPWORKSPACE      <791998 0.328000 0.469000>
DS_3DLOOK               <791998 0.156000 0.328000>

A short note on hashing.

Hashing can be divided into the following (rough) categories:
1. Closed hashing algorithms. Collisions cause a rehash
   into the same table.
   Statistics: Asymptotic behavior as the table is > 70% full.
2. Open hashing algorithms. Collisions are organized into
   a linear list with a linear search to resolve the collision.
   Statistics: Approximately O(1) for all entries.
3. Perfect hashing table. No collisions.
4. Minimal hashing table. Table size is the same as the input
   data size (see 1.)
5. Minimal Perfect hashing table. No collisions and the table
   size is the same as the data size.

It is always possible to get a Perfect hash table. The limitation is the amount 
of space needed. For example, take a telephone book for your favorite city. Use 
the full first name, full last name, full middle name, and a characteristic 
which differentiates people with the same name. Now construct a table of size 
(using English capital letters):
  Table Size = 26**(|last name| + |first name| + |middle name|) * | unique 
characteristic |

(Skipping the discussion of what all this stuff might mean) This is a really 
big table. But it is a Perfect hash table.

A Minimal hashing table just has enough entries to contain the sample data. As 
Glenn has properly pointed out, this works if the data is static. A change in 
the input data requires a change to the table (potentially).

A Minimal Perfect hashing algorithm is hard (same caveat about data being 
static). The general techniques that I have seen to construct this table are 
basically branch and bound, although I do believe that I have a non-iterative, 
complete, and perfect solution which I worked on for my Master's - I just have 
to complete the work based on 'new' knowledge.

The idea of a Minimal Perfect hashing algorithm was put into print (I believe 
firstly but I don't know for sure) by D. Knuth in Vol II of his books on the 
Art of Computer Progamming (Seminumerical Algorithms). He is most famously 
known for his statement that 1/2 day should be sufficient to construct a 
Minimal Perfect hashing algorithm.

I defer to your judgement in naming your algorithm based on historical 
precendents. My own experience is in hashing algorithms of the form:

    Key = HashFunction(mangled input);

art

=====================================================================

MAIN.CPP

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "test.h"

typedef long int DWORD;

# define NOTXSPROC
# define SIZE  sizeof(aWinOptions)/sizeof(aWinOptions[0]) - 2
# define REP   2000

static int test_rand[REP * SIZE];

extern DWORD new_constant(NOTXSPROC char *name, int arg);
extern DWORD     constant(NOTXSPROC char *name, int arg);

int main ()
{ 

    struct test_time {
        char * test_name;
        int    count;
        double new_constant;
        double old_constant;
    };
    
    enum Test { Amortized, Minimum, Maximum, Random };
    
    clock_t begin;

    int ndx;
    int rep;
    int arg;
    int ret;
    

//                               Test Name             count new  old
    test_time Delta_Time[] = { {"Amortized Cost     ",   0,  0.0, 0.0}
                             , {"Minimum Cost       ",   0,  0.0, 0.0}
                             , {"Maximum Cost       ",   0,  0.0, 0.0}
                             , {"Random  Cost       ",   0,  0.0, 0.0}
                             , {"BS_3STATE          ",   0,  0.0, 0.0}
                             , {"BS_AUTO3STATE      ",   0,  0.0, 0.0}
                             , {"BS_AUTOCHECKBOX    ",   0,  0.0, 0.0}
                             , {"BS_AUTORADIOBUTTON ",   0,  0.0, 0.0}
                             , {"BS_CHECKBOX        ",   0,  0.0, 0.0}
                             , {"COLOR_3DFACE       ",   0,  0.0, 0.0}
                             , {"COLOR_ACTIVEBORDER ",   0,  0.0, 0.0}
                             , {"COLOR_ACTIVECAPTION",   0,  0.0, 0.0}
                             , {"COLOR_APPWORKSPACE ",   0,  0.0, 0.0}
                             , {"DS_3DLOOK          ",   0,  0.0, 0.0}
                             };

   //---------------------------
   //     Amortized Test Time
   //---------------------------
    Delta_Time[Amortized].count = REP * SIZE;
    
    begin = clock();
    
    for ( rep = REP; rep; rep-- ) {
       ndx = SIZE;
       
       while ( ndx-- ) {
          if ( (ret = constant(aWinOptions[ndx].sOption, arg)) != 
aWinOptions[ndx].eOption ) 
            printf("ERROR: %s[%i] != %i\n", aWinOptions[ndx].sOption, ndx, ret);
       }
    }

    Delta_Time[Amortized].old_constant = ((double) (clock() - begin)) / 
CLOCKS_PER_SEC;
    
    begin = clock();

    for ( rep = REP; rep; rep-- ) {
       ndx = SIZE;
       
       while ( ndx-- ) {
          if ( (ret = new_constant(aWinOptions[ndx].sOption, arg)) != 
aWinOptions[ndx].eOption ) 
            printf("ERROR: %s[%i] != %i\n", aWinOptions[ndx].sOption, ndx, ret);
       }
    }

    Delta_Time[Amortized].new_constant = ((double) (clock() - begin)) / 
CLOCKS_PER_SEC;

   //---------------------------
   //     Miniumum Test Time
   //---------------------------
    
    Delta_Time[Minimum].count = REP * SIZE;
    
    begin = clock();
    
    for ( rep = REP; rep; rep-- ) {
       ndx = SIZE;
       
       while ( ndx-- ) {
          if ( (ret = constant(aWinOptions[0].sOption, arg)) != 
aWinOptions[0].eOption ) 
            printf("ERROR: %s[%i] != %i\n", aWinOptions[0].sOption, 0, ret);
       }
    }

    Delta_Time[Minimum].old_constant = ((double) (clock() - begin)) / 
CLOCKS_PER_SEC;
    
    begin = clock();
    
    for ( rep = REP; rep; rep-- ) {
       ndx = SIZE;
       
       while ( ndx-- ) {
          if ( (ret = new_constant(aWinOptions[197].sOption, arg)) != 
aWinOptions[197].eOption )
            printf("ERROR: %s[%i] != %i\n", aWinOptions[197].sOption, 197, ret);
       }
    }

    Delta_Time[Minimum].new_constant = ((double) (clock() - begin)) / 
CLOCKS_PER_SEC;

   //---------------------------
   //     Maximum Test Time
   //---------------------------
    
    Delta_Time[Maximum].count = REP * SIZE;
    
    begin = clock();
    
    for ( rep = REP; rep; rep-- ) {
       ndx = SIZE;
       
       while ( ndx-- ) {
          if ( (ret = constant(aWinOptions[eWS_EX_WINDOWEDGE].sOption, arg)) != 
aWinOptions[eWS_EX_WINDOWEDGE].eOption ) 
            printf("ERROR: %s[%i] != %i\n", 
aWinOptions[eWS_EX_WINDOWEDGE].sOption, eWS_EX_WINDOWEDGE, ret);
       }
    }

    Delta_Time[Maximum].old_constant = ((double) (clock() - begin)) / 
CLOCKS_PER_SEC;
    
    begin = clock();
    
    
    for ( rep = REP; rep; rep-- ) {
       ndx = SIZE;
       
       while ( ndx-- ) {
          if ( (ret = new_constant(aWinOptions[0].sOption, arg)) != 
aWinOptions[0].eOption )
            printf("ERROR: %s[%i] != %i\n", aWinOptions[0].sOption, 0, ret);
       }
    }

    Delta_Time[Maximum].new_constant = ((double) (clock() - begin)) / 
CLOCKS_PER_SEC;

   //---------------------------
   //     Random Test Time
   //---------------------------

      //---------------------------
      //     Setup
      //---------------------------
    
      srand(1001);

      ndx = sizeof(test_rand)/sizeof(test_rand[0]);
      
      while( ndx ) {
         int num = rand() % SIZE;
         if ( num < SIZE ) test_rand[--ndx] = num;
      }

      //---------------------------
      //  execute Random Test Time
      //---------------------------
    
    Delta_Time[Random].count = REP * SIZE;
    
    begin = clock();
    
    for ( rep = sizeof(test_rand)/sizeof(test_rand[0]); rep; rep-- ) {
       ndx = test_rand[rep - 1];
       if ( (ret = constant(aWinOptions[ndx].sOption, arg)) != 
aWinOptions[ndx].eOption ) 
         printf("ERROR: %s[%i] != %i\n", aWinOptions[ndx].sOption, ndx, ret);
    }

    Delta_Time[Random].old_constant = ((double) (clock() - begin)) / 
CLOCKS_PER_SEC;
    
    begin = clock();
    
    for ( rep = sizeof(test_rand)/sizeof(test_rand[0]); rep; rep-- ) {
       ndx = test_rand[rep - 1];
       if ( (ret = new_constant(aWinOptions[ndx].sOption, arg)) != 
aWinOptions[ndx].eOption ) 
         printf("ERROR: %s[%i] != %i\n", aWinOptions[ndx].sOption, ndx, ret);
    }

    Delta_Time[Random].new_constant = ((double) (clock() - begin)) / 
CLOCKS_PER_SEC;

   //---------------------------
   //     'k' Test Time
   //---------------------------

    for ( rep = Random + 1; rep < sizeof(Delta_Time)/sizeof(Delta_Time[0]); 
rep++ ) {
       char * name = Delta_Time[rep].test_name;
       
       ndx = REP * SIZE;

       Delta_Time[rep].count = REP * SIZE;

       begin  = clock();
       
       while ( ndx-- ) ret = constant(name, arg);

       Delta_Time[rep].old_constant = ((double) (clock() - begin)) / 
CLOCKS_PER_SEC;
    }

    for ( rep = Random + 1; rep < sizeof(Delta_Time)/sizeof(Delta_Time[0]); 
rep++ ) {
       char * name = Delta_Time[rep].test_name;

       ndx = REP * SIZE;

       Delta_Time[rep].count = REP * SIZE;

       begin  = clock();
       
       while ( ndx-- ) ret = new_constant(name, arg);

       Delta_Time[rep].new_constant = ((double) (clock() - begin)) / 
CLOCKS_PER_SEC;
    }

    printf("\n    Test Name            count    new      old\n");
    
    for ( ndx = 0; ndx < sizeof(Delta_Time)/sizeof(Delta_Time[0]); ndx++ ) {
       printf("%s     <%d %f %f>\n", Delta_Time[ndx].test_name
                                   , Delta_Time[ndx].count
                                   , Delta_Time[ndx].new_constant
                                   , Delta_Time[ndx].old_constant
             );
    }
    
    return 0;
    
};

 -------------- Original message ----------------------
From: Glenn Linderman <[EMAIL PROTECTED]>
> On approximately 10/13/2005 1:18 PM, came the following characters from 
> the keyboard of [EMAIL PROTECTED]:
> > Glenn (et alia);
> > I don't believe that the case for the new algorithm is closed, 
> 
> It's good enough for me.  Unless you wish to implement a "real" hashing 
> algorithm, based, say on some perfect hash generator.  I did that quite 
> a few years back for a similar keyword lookup situation, and it was 
> reasonably beneficial then (on slower processors).
> 
> Since you have this nice testing framework set up, I wonder if you would 
>   approximately determine k, for which this hash function becomes slower 
> than the binary search.  For sufficiently small lists of constants, the 
> existing algorithm may suffice, it would be nice to have a rule of thumb 
> as to approximately how many constants one would need to have before it 
> is better to use a binary search.
> -- 
> Glenn -- http://nevcal.com/
> ===========================
> Having identified a vast realm of ignorance, Wolfram is saying that much
> of this realm lies forever outside the light cone of human knowledge.
>                            -- Michael Swaine, Dr Dobbs Journal, Sept 2002



Reply via email to