On approximately 12/17/2005 6:46 AM, came the following characters from the keyboard of Arthur Schwarz:
> Robert; "... For example, my last email just suggested doing a form of linear searching, because it would allow space to be saved.... but this can be fully optimal in certain circumstances... what circumstances? FWIW (I really hate these acronyms, only because I rarely know what they mean, so, for what it's worth) linear searching is 'optimal' in only a few cases, to wit: 1. The locale of reference is small. 2. It is possible to load all of the list into primary cache. 3. Predictive loads load data not in primary cache into secondary cache before use (and then into primary cache before use). 4. Dynamic instruction code optimization does not intefere with the search by putting in latencies. In terms of just looking at code and operation, a binary search should be faster in most (but not all) cases. In particular, conditions based on a small locale of reference are violated for a binary search. See, for example, Self-organizing Linear Search J.H. Hester, D. S. Hirschberg ACM Computing Surveys, V17N3 for another look at linear searching. > It does several things that have ne'er before seen > the light of day. Assuming a keyword list of ASCII words (not strictly > necessary): > 1. Given a list of keywords, it selects the optimum set of columns to use > to get a minimum height search tree. I've never heard this idea codified before, but I did, in fact, attempt to manually approximate that a few times in the past when I was generating mph polynomials. Computers might be fast enough now to make it practical to calculate this automatically. Calculation of an optimal parameter set from an arbitrarily large set does not take long on average (I suspect). Worst case performance can be really bad and depends on tree depth. If at most a single keyword is removed from the keyword list at every level then the worst case is bounded by: Iterations = O((# keyword) * (maximum # parameters) * (# keywords)) O == big O (upper bound of computation) and depth == (maximum # parameters) The algorithm has nothing to do with MPH, it is merely useful as a prepass optimization step. Since the solution is deterministic (finiteness guaranteed) and not a branch-and-bound problem it doesn't suffer from some of the search issues. > 2. Given the results of the prepass, it will generate a polynomial whose > return value is an integer, 0 <= i < |# keywords| for each input keyword. I've heard this one discussed, of course without the results of the prepass. But in the codes I've seen, it was never stated as a guaranteed achievement; the operational instructions said you could try for that, but if it didn't find one, then you might increase the size of the max integer, which would result in some gaps in the sequence of possible return values, which would "help" the polynomial generator find a solution. There are a number of polynomial and non-polynomial solutions given in the literature (that I've looked at). All of them that I have so far seen depend on a search technique of some sort. One such solution which is readily available is 'gperf' in Cygwin. My proposed solution does not depend on a search and is deterministic. I'll be interested to see the Arthur Schwarz mph generator when it is published. Me to :). That doesn't mean that I necessarily think that it will be the right solution for Win32::GUI::Constant ... my thinking is quite the opposite at the moment. Not my decision. If used I am sure it will be analyzed for performance and used if it performs well.