From: Glenn Linderman <[EMAIL PROTECTED]>
Date: 12/17/2005 03:51 PM
To: [EMAIL PROTECTED]
cc: Jeremy White <[EMAIL PROTECTED]>, [EMAIL PROTECTED], [EMAIL PROTECTED]forge.net
Subject: Re: [perl-win32-gui-hackers] Constants and hashing [Was: Win32::GUI::Dialog() failure]
On approximately 12/17/2005 4:39 AM, came the following characters from
the keyboard of Robert May:

> So, it was my plan to do a later implementation moving the lookup back
> into XS, using a 'minimal perfect hash' and an efficient C array.  The
> storage for this approach would be something like 8 bytes per constant
> identifier (string pointer + long value) plus the constant identifier
> string length (for the constant lookup array), plus the storage for the
> polynomial values used by the hash function.

Agreed, speed probably isn't the issue, and an mph may or may not be
necessary (it mostly benefits speed).  After some reflection from our
original discussions until now, I think it may be more worthwhile to
figure out how to make _all_ 18,000 constants available to Win32::GUI
programs without paying a huge price in memory.  Those constants that
actually get used get cached in the stash anyway...

If speed isn't the issue, then a binary search takes at most 15 probes to
find any of the 18,000 entries in the table, and the table could grow by
another ~14,000 entries (up to 2^15 = 32,768) at no cost in probes.  As
benchmarking showed some time back, the implemented 'hash' was better
than a binary search for some small number of searches; I would expect
the binary search to be faster than a hash for others.  Given a uniform
distribution of searches, the expected number of probes is around 13 or
14, close to the maximum, since about half of the entries sit at the
deepest level of the search.  Again, if speed is not a criterion and a
minor speed decrease is acceptable, then a binary search is a good way
to proceed.  The only per-entry overhead is the string pointer in the
table of entries.  (A minimal sketch of such a table is appended below.)

If you want a tailored, partitioned binary search, then one data
structure per partition is required, though the binary search function
code can be shared.  That reduces the number of probes (since each
partition's table is smaller than the whole data structure), with added
overhead to determine which partition to access.  So essentially we
trade time, i.e. probes, for the code that picks the partition, and we
add some small amount of additional complexity for multiple tables.
(See the second sketch appended below.)

Since the animal breeds, I would expect future additions and deletions
of constants; hence a need to plan for the future.  Then again, I'm not
wealthy, because I do a terrible job of predicting.

art
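
For concreteness, here is a minimal sketch of the flat-table approach in
plain C rather than XS: a static array of (string pointer, long value)
entries, sorted by name, plus a standard binary search.  This is only an
illustration of the idea, not the project's actual code; the handful of
names shown stands in for the full 18,000-entry table.

/* Illustrative sketch only: a sorted, static table of constant names
 * and values, searched with a plain binary search.  Per entry the
 * overhead is one string pointer plus one long, as discussed above. */

#include <stdio.h>
#include <string.h>

typedef struct {
    const char *name;   /* constant identifier, e.g. "WM_CLOSE" */
    long        value;  /* its numeric value */
} ConstEntry;

/* Table must be kept sorted by strcmp() order of the name field. */
static const ConstEntry constants[] = {
    { "GWL_STYLE",   -16L },
    { "SW_HIDE",       0L },
    { "SW_SHOW",       5L },
    { "WM_CLOSE",   0x0010L },
    { "WS_VISIBLE", 0x10000000L },
};

/* Classic binary search: at most ceil(log2(n)) probes.  Returns 1 and
 * stores the value on success, 0 if the name is not in the table. */
static int
lookup_constant(const char *name, long *value_out)
{
    size_t lo = 0;
    size_t hi = sizeof(constants) / sizeof(constants[0]);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, constants[mid].name);
        if (cmp == 0) {
            *value_out = constants[mid].value;
            return 1;
        }
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return 0;
}

int
main(void)
{
    long v;
    if (lookup_constant("WM_CLOSE", &v))
        printf("WM_CLOSE = 0x%04lX\n", v);
    return 0;
}

With 18,000 entries the while loop runs at most 15 times, and growing
the table changes only the data, never the search code.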
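
And an equally illustrative sketch of the partitioned variant, assuming
(purely for the example) that partitions are keyed by name prefix such
as "WM_" and "WS_": one sorted sub-table per partition, a small dispatch
table, and a single shared search routine.  Again, the prefixes and
entries are placeholders, not the project's code.

/* Sketch of a partitioned binary search: one sorted sub-table per
 * prefix, a dispatch table mapping prefixes to partitions, and a
 * single shared search function. */

#include <stdio.h>
#include <string.h>

typedef struct {
    const char *name;
    long        value;
} ConstEntry;

typedef struct {
    const char       *prefix;   /* e.g. "WM_", "WS_" */
    const ConstEntry *table;    /* sorted sub-table for that prefix */
    size_t            count;
} Partition;

static const ConstEntry wm_table[] = {
    { "WM_CLOSE",   0x0010L },
    { "WM_DESTROY", 0x0002L },
    { "WM_PAINT",   0x000FL },
};

static const ConstEntry ws_table[] = {
    { "WS_CHILD",   0x40000000L },
    { "WS_VISIBLE", 0x10000000L },
};

static const Partition partitions[] = {
    { "WM_", wm_table, sizeof(wm_table) / sizeof(wm_table[0]) },
    { "WS_", ws_table, sizeof(ws_table) / sizeof(ws_table[0]) },
};

/* Shared binary search over one partition's sub-table. */
static int
search_table(const ConstEntry *table, size_t n, const char *name, long *out)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, table[mid].name);
        if (cmp == 0) { *out = table[mid].value; return 1; }
        if (cmp < 0) hi = mid; else lo = mid + 1;
    }
    return 0;
}

/* The extra "which partition?" step that trades code for probes. */
static int
lookup_constant(const char *name, long *out)
{
    size_t i;
    for (i = 0; i < sizeof(partitions) / sizeof(partitions[0]); i++) {
        size_t plen = strlen(partitions[i].prefix);
        if (strncmp(name, partitions[i].prefix, plen) == 0)
            return search_table(partitions[i].table, partitions[i].count,
                                name, out);
    }
    return 0;   /* no partition matched the name */
}

int
main(void)
{
    long v;
    if (lookup_constant("WS_CHILD", &v))
        printf("WS_CHILD = 0x%08lX\n", v);
    return 0;
}

The dispatch loop is the added overhead mentioned above; in exchange,
each binary search runs over a smaller table, so fewer probes are needed
per lookup.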