<sighs> ::= <sighs> <sigh> | <sigh>

The tests were re-run with a selected list of candidate entries. Timing for each candidate was compared for both the existing and the new algorithms. The test machine was my home AMD 2100+ rather than my work Pentium Really Fast CPU, which accounts for some of the time differences but, as we shall see, not all of them. The test harness for the first four tests (Amortized, Minimum, Maximum, and Random) is unchanged. The harness for the remaining ("residual") tests extracts the candidate name from the Delta_Time array, stores it in a loop-invariant variable, and uses it to call the two lookup functions. The loop overhead is the same for the new and the existing algorithms, so no special account is made of it.
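For reference, the shape of that residual-test loop, condensed from the 'k' Test section of the MAIN.CPP listing at the end of this message (Delta_Time, REP, SIZE, arg, constant() and new_constant() are the names used there):

    char * name = Delta_Time[rep].test_name;   // loop-invariant candidate name
    ndx   = REP * SIZE;                        // same iteration count as the other tests
    begin = clock();
    while ( ndx-- )
        ret = constant(name, arg);             // or new_constant(name, arg) for the new algorithm
    Delta_Time[rep].old_constant = ((double) (clock() - begin)) / CLOCKS_PER_SEC;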
The conclusions:

1. In all cases the new algorithm is faster. This differs from the results at work, where the Minimum test showed the existing algorithm to be faster.

2. As expected, there is a gradual increase in test time for items with the same first letter. But there is a time reversal between COLOR_ACTIVEBORDER and COLOR_ACTIVECAPTION, and between almost anything and DS_3DLOOK, and so on. These are anomalies which need explanation.

3. The test for the putative minimum entry of the Minimum test (BS_3STATE) is roughly six times slower in the new test cycle than the Minimum Cost test (0.187 s versus 0.031 s). I can't account for the anomaly; the anomalous behavior may be related to the optimization used. I might try this at work to see if the results are consistent, or hey, you might try it.

Compilation was done with:

    gcc -O3 main.cpp GUI_Constants-new.cpp GUI_Constants.cpp

using GCC version: gcc (GCC) 3.4.4 (cygming special) (gdc 0.12, using dmd 0.125)

 Test Name              count    new      old
Amortized Cost      <791998 0.219000 0.594000>
Minimum Cost        <791998 0.031000 0.047000>
Maximum Cost        <791998 0.172000 0.578000>
Random Cost         <791998 0.250000 0.594000>
BS_3STATE           <791998 0.187000 0.484000>
BS_AUTO3STATE       <791998 0.219000 0.563000>
BS_AUTOCHECKBOX     <791998 0.203000 0.531000>
BS_AUTORADIOBUTTON  <791998 0.203000 0.547000>
BS_CHECKBOX         <791998 0.203000 0.547000>
COLOR_3DFACE        <791998 0.204000 0.422000>
COLOR_ACTIVEBORDER  <791998 0.250000 0.484000>
COLOR_ACTIVECAPTION <791998 0.250000 0.125000>
COLOR_APPWORKSPACE  <791998 0.328000 0.469000>
DS_3DLOOK           <791998 0.156000 0.328000>

A short note on hashing. Hashing can be divided into the following (rough) categories:

1. Closed hashing algorithms. Collisions cause a rehash into the same table. Statistics: lookup cost grows sharply once the table is more than about 70% full.

2. Open hashing algorithms. Collisions are organized into a linear list, with a linear search to resolve the collision. Statistics: approximately O(1) for all entries.

3. Perfect hashing table. No collisions.

4. Minimal hashing table. The table size is the same as the input data size (see 1).

5. Minimal Perfect hashing table. No collisions, and the table size is the same as the data size.

It is always possible to get a Perfect hash table; the limitation is the amount of space needed. For example, take a telephone book for your favorite city. Use the full first name, full last name, full middle name, and a characteristic which differentiates people with the same name. Now construct a table of size (using English capital letters):

    Table Size = 26**(|last name| + |first name| + |middle name|) * |unique characteristic|

(skipping the discussion of what all this stuff might mean). This is a really big table, but it is a Perfect hash table.

A Minimal hashing table has just enough entries to contain the sample data. As Glenn has properly pointed out, this works if the data is static; a change in the input data (potentially) requires a change to the table. A Minimal Perfect hashing algorithm is hard (same caveat about the data being static). The general techniques that I have seen for constructing such a table are basically branch and bound, although I do believe that I have a non-iterative, complete, and perfect solution which I worked on for my Master's - I just have to complete the work based on 'new' knowledge. The idea of a Minimal Perfect hashing algorithm was put into print (first, I believe, but I don't know for sure) by D. Knuth in Vol. III of The Art of Computer Programming (Sorting and Searching).
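To make category 2 concrete, here is a minimal, self-contained sketch of open hashing with chaining. The keys happen to be a few of the constants from the tests above, but the hash function, bucket count, and values are purely illustrative - this is not the table or the hash used by constant() or new_constant():

    /* Illustrative only: a toy open-hashing (chaining) table, not the
     * Win32::GUI constant table or its lookup functions. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NBUCKETS 8                          /* deliberately tiny */

    struct entry {
        const char   *key;
        long          value;
        struct entry *next;                     /* collision chain */
    };

    static struct entry *bucket[NBUCKETS];

    static unsigned hash_key(const char *s)     /* toy hash function */
    {
        unsigned h = 0;
        while (*s)
            h = h * 31 + (unsigned char) *s++;
        return h % NBUCKETS;
    }

    static void insert(const char *key, long value)
    {
        struct entry *e = (struct entry *) malloc(sizeof *e);
        unsigned      h = hash_key(key);
        e->key    = key;
        e->value  = value;
        e->next   = bucket[h];                  /* push onto this bucket's chain */
        bucket[h] = e;
    }

    static long lookup(const char *key)         /* hash once, then a short linear scan */
    {
        struct entry *e;
        for (e = bucket[hash_key(key)]; e; e = e->next)
            if (strcmp(e->key, key) == 0)
                return e->value;
        return -1;                              /* not found */
    }

    int main(void)
    {
        insert("BS_3STATE",           5);
        insert("DS_3DLOOK",           4);
        insert("COLOR_ACTIVECAPTION", 2);
        printf("DS_3DLOOK -> %ld\n", lookup("DS_3DLOOK"));
        return 0;
    }

A Perfect hash table (category 3) amounts to choosing hash_key so that no bucket ever holds more than one entry; a Minimal Perfect hash additionally uses exactly as many buckets as there are keys.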
In this context Knuth is perhaps best known for his statement that half a day should be sufficient to construct a Minimal Perfect hashing algorithm. I defer to your judgement in naming your algorithm according to historical precedents. My own experience is with hashing algorithms of the form:

    Key = HashFunction(mangled input);

art

=====================================================================
MAIN.CPP

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "test.h"

typedef long int DWORD;

#define NOTXSPROC
#define SIZE sizeof(aWinOptions)/sizeof(aWinOptions[0]) - 2   // note: unparenthesized, so REP * SIZE
                                                              // evaluates as REP*(table entries) - 2
                                                              // (hence the 791998 count below)
#define REP  2000

static int test_rand[REP * SIZE];       // pre-computed indices for the Random test

extern DWORD new_constant(NOTXSPROC char *name, int arg);     // new (hash) lookup
extern DWORD constant(NOTXSPROC char *name, int arg);         // existing binary-search lookup

int main ()
{
    struct test_time {
        char  *test_name;
        int    count;
        double new_constant;            // seconds for the new algorithm
        double old_constant;            // seconds for the existing algorithm
    };

    enum Test { Amortized, Minimum, Maximum, Random };

    clock_t begin;
    int     ndx;
    int     rep;
    int     arg = 0;                    // extra argument passed through to the lookups
    int     ret;

    // Test Name                       count  new  old
    test_time Delta_Time[] = {
          {"Amortized Cost     ", 0, 0.0, 0.0}
        , {"Minimum Cost       ", 0, 0.0, 0.0}
        , {"Maximum Cost       ", 0, 0.0, 0.0}
        , {"Random Cost        ", 0, 0.0, 0.0}
        , {"BS_3STATE          ", 0, 0.0, 0.0}
        , {"BS_AUTO3STATE      ", 0, 0.0, 0.0}
        , {"BS_AUTOCHECKBOX    ", 0, 0.0, 0.0}
        , {"BS_AUTORADIOBUTTON ", 0, 0.0, 0.0}
        , {"BS_CHECKBOX        ", 0, 0.0, 0.0}
        , {"COLOR_3DFACE       ", 0, 0.0, 0.0}
        , {"COLOR_ACTIVEBORDER ", 0, 0.0, 0.0}
        , {"COLOR_ACTIVECAPTION", 0, 0.0, 0.0}
        , {"COLOR_APPWORKSPACE ", 0, 0.0, 0.0}
        , {"DS_3DLOOK          ", 0, 0.0, 0.0}
    };

    //---------------------------
    // Amortized Test Time
    //---------------------------
    Delta_Time[Amortized].count = REP * SIZE;

    // existing algorithm: every entry, every repetition
    begin = clock();
    for ( rep = REP; rep; rep-- ) {
        ndx = SIZE;
        while ( ndx-- ) {
            if ( (ret = constant(aWinOptions[ndx].sOption, arg)) != aWinOptions[ndx].eOption )
                printf("ERROR: %s[%i] != %i\n", aWinOptions[ndx].sOption, ndx, ret);
        }
    }
    Delta_Time[Amortized].old_constant = ((double) (clock() - begin)) / CLOCKS_PER_SEC;

    // new algorithm: same traversal
    begin = clock();
    for ( rep = REP; rep; rep-- ) {
        ndx = SIZE;
        while ( ndx-- ) {
            if ( (ret = new_constant(aWinOptions[ndx].sOption, arg)) != aWinOptions[ndx].eOption )
                printf("ERROR: %s[%i] != %i\n", aWinOptions[ndx].sOption, ndx, ret);
        }
    }
    Delta_Time[Amortized].new_constant = ((double) (clock() - begin)) / CLOCKS_PER_SEC;

    //---------------------------
    // Minimum Test Time
    //---------------------------
    Delta_Time[Minimum].count = REP * SIZE;

    // existing algorithm, timed on entry 0
    begin = clock();
    for ( rep = REP; rep; rep-- ) {
        ndx = SIZE;
        while ( ndx-- ) {
            if ( (ret = constant(aWinOptions[0].sOption, arg)) != aWinOptions[0].eOption )
                printf("ERROR: %s[%i] != %i\n", aWinOptions[0].sOption, 0, ret);
        }
    }
    Delta_Time[Minimum].old_constant = ((double) (clock() - begin)) / CLOCKS_PER_SEC;

    // new algorithm, timed on entry 197 (its putative minimum is a different entry)
    begin = clock();
    for ( rep = REP; rep; rep-- ) {
        ndx = SIZE;
        while ( ndx-- ) {
            if ( (ret = new_constant(aWinOptions[197].sOption, arg)) != aWinOptions[197].eOption )
                printf("ERROR: %s[%i] != %i\n", aWinOptions[197].sOption, 197, ret);
        }
    }
    Delta_Time[Minimum].new_constant = ((double) (clock() - begin)) / CLOCKS_PER_SEC;

    //---------------------------
    // Maximum Test Time
    //---------------------------
    Delta_Time[Maximum].count = REP * SIZE;

    // existing algorithm, timed on eWS_EX_WINDOWEDGE
    begin = clock();
    for ( rep = REP; rep; rep-- ) {
        ndx = SIZE;
        while ( ndx-- ) {
            if ( (ret = constant(aWinOptions[eWS_EX_WINDOWEDGE].sOption, arg)) != aWinOptions[eWS_EX_WINDOWEDGE].eOption )
                printf("ERROR: %s[%i] != %i\n", aWinOptions[eWS_EX_WINDOWEDGE].sOption, eWS_EX_WINDOWEDGE, ret);
        }
    }
    Delta_Time[Maximum].old_constant = ((double) (clock() - begin)) / CLOCKS_PER_SEC;

    // new algorithm, timed on entry 0
    begin = clock();
    for ( rep = REP; rep; rep-- ) {
        ndx = SIZE;
        while ( ndx-- ) {
            if ( (ret = new_constant(aWinOptions[0].sOption, arg)) != aWinOptions[0].eOption )
                printf("ERROR: %s[%i] != %i\n", aWinOptions[0].sOption, 0, ret);
        }
    }
    Delta_Time[Maximum].new_constant = ((double) (clock() - begin)) / CLOCKS_PER_SEC;

    //---------------------------
    // Random Test Time
    //---------------------------

    //---------------------------
    // Setup
    //---------------------------
    srand(1001);
    ndx = sizeof(test_rand)/sizeof(test_rand[0]);
    while ( ndx ) {
        int num = rand() % SIZE;
        if ( num < SIZE )
            test_rand[--ndx] = num;     // fill the index table back to front
    }

    //---------------------------
    // execute Random Test Time
    //---------------------------
    Delta_Time[Random].count = REP * SIZE;

    begin = clock();
    for ( rep = sizeof(test_rand)/sizeof(test_rand[0]); rep; rep-- ) {
        ndx = test_rand[rep - 1];
        if ( (ret = constant(aWinOptions[ndx].sOption, arg)) != aWinOptions[ndx].eOption )
            printf("ERROR: %s[%i] != %i\n", aWinOptions[ndx].sOption, ndx, ret);
    }
    Delta_Time[Random].old_constant = ((double) (clock() - begin)) / CLOCKS_PER_SEC;

    begin = clock();
    for ( rep = sizeof(test_rand)/sizeof(test_rand[0]); rep; rep-- ) {
        ndx = test_rand[rep - 1];
        if ( (ret = new_constant(aWinOptions[ndx].sOption, arg)) != aWinOptions[ndx].eOption )
            printf("ERROR: %s[%i] != %i\n", aWinOptions[ndx].sOption, ndx, ret);
    }
    Delta_Time[Random].new_constant = ((double) (clock() - begin)) / CLOCKS_PER_SEC;

    //---------------------------
    // 'k' Test Time
    //---------------------------
    // time each named candidate: the name is hoisted into a loop-invariant
    // variable and the lookup alone is called REP * SIZE times
    for ( rep = Random + 1; rep < sizeof(Delta_Time)/sizeof(Delta_Time[0]); rep++ ) {
        char * name = Delta_Time[rep].test_name;
        ndx = REP * SIZE;
        Delta_Time[rep].count = REP * SIZE;
        begin = clock();
        while ( ndx-- )
            ret = constant(name, arg);
        Delta_Time[rep].old_constant = ((double) (clock() - begin)) / CLOCKS_PER_SEC;
    }

    for ( rep = Random + 1; rep < sizeof(Delta_Time)/sizeof(Delta_Time[0]); rep++ ) {
        char * name = Delta_Time[rep].test_name;
        ndx = REP * SIZE;
        Delta_Time[rep].count = REP * SIZE;
        begin = clock();
        while ( ndx-- )
            ret = new_constant(name, arg);
        Delta_Time[rep].new_constant = ((double) (clock() - begin)) / CLOCKS_PER_SEC;
    }

    printf("\n Test Name              count    new      old\n");
    for ( ndx = 0; ndx < sizeof(Delta_Time)/sizeof(Delta_Time[0]); ndx++ ) {
        printf("%s <%d %f %f>\n", Delta_Time[ndx].test_name
                                , Delta_Time[ndx].count
                                , Delta_Time[ndx].new_constant
                                , Delta_Time[ndx].old_constant );
    }

    return 0;
}

-------------- Original message ----------------------
From: Glenn Linderman <[EMAIL PROTECTED]>
> On approximately 10/13/2005 1:18 PM, came the following characters from
> the keyboard of [EMAIL PROTECTED]:
> > Glenn (et alia);
> > I don't believe that the case for the new algorithm is closed,
>
> It's good enough for me. Unless you wish to implement a "real" hashing
> algorithm, based, say on some perfect hash generator. I did that quite
> a few years back for a similar keyword lookup situation, and it was
> reasonably beneficial then (on slower processors).
>
> Since you have this nice testing framework set up, I wonder if you would
> approximately determine k, for which this hash function becomes slower
> than the binary search. For sufficiently small lists of constants, the
> existing algorithm may suffice, it would be nice to have a rule of thumb
> as to approximately how many constants one would need to have before it
> is better to use a binary search.
> --
> Glenn -- http://nevcal.com/
> ===========================
> Having identified a vast realm of ignorance, Wolfram is saying that much
> of this realm lies forever outside the light cone of human knowledge.
> -- Michael Swaine, Dr Dobbs Journal, Sept 2002