> On approximately 12/17/2005 6:46 AM, came the following characters from
> the keyboard of Arthur Schwarz:
>> Robert;
"... For example, my last email just suggested doing a form
of linear searching, because it would allow space to be saved.... but
this can be fully optimal in certain circumstances... what
circumstances?
FWIW (I really hate these acronyms, only because I rarely
know what they mean, so, for what it's worth) linear searching
is 'optimal' in only a few cases, to wit:
1. The locality of reference is small.
2. It is possible to load all of the list into primary cache.
3. Predictive loads bring data not yet in primary cache into
   secondary cache, and then into primary cache, before use.
4. Dynamic instruction code optimization does not interfere with
   the search by introducing latencies.
In terms of just looking at code and operation, a binary search
should be faster in most (but not all) cases. In particular, a
binary search violates the conditions that depend on a small
locality of reference; a small sketch of the two searches follows
the citation below.
See, for example,
    Self-Organizing Linear Search
    J. H. Hester and D. S. Hirschberg
    ACM Computing Surveys, Vol. 17, No. 3, 1985
for another look at linear searching.
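To make the cache argument concrete, here is a minimal sketch of the
two searches over a small sorted table. The keyword list and function
names are invented for illustration; they come from neither of our
programs:

    #include <string.h>

    /* Invented keyword table, sorted so that binary search works. */
    static const char *keywords[] = {
        "break", "case", "else", "for", "if", "return", "switch", "while"
    };
    static const int nkeys = sizeof keywords / sizeof keywords[0];

    /* Linear search: one sequential pass; access is cache- and
       prefetch-friendly, which is the point of conditions 1-3 above. */
    int linear_find(const char *key)
    {
        for (int i = 0; i < nkeys; i++)
            if (strcmp(keywords[i], key) == 0)
                return i;
        return -1;
    }

    /* Binary search: fewer comparisons, but the probes jump around
       the table, so locality of reference suffers as the table grows. */
    int binary_find(const char *key)
    {
        int lo = 0, hi = nkeys - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            int cmp = strcmp(key, keywords[mid]);
            if (cmp == 0) return mid;
            if (cmp < 0)  hi = mid - 1;
            else          lo = mid + 1;
        }
        return -1;
    }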
>> It does several things that have ne'er before seen
>> the light of day. Assuming a keyword list of ASCII words (not strictly
>> necessary):
>> 1. Given a list of keywords, it selects the optimum set of columns to use
>> to get a minimum height search tree.
> I've never heard this idea codified before, but I did, in fact, attempt
> to manually approximate that a few times in the past when I was
> generating mph polynomials. Computers might be fast enough now to make
> it practical to calculate this automatically.
Calculation of an optimal parameter set from an arbitrarily large set
does not take long on average (I suspect). Worst case performance can
be really bad and depends on tree depth. If at most a single keyword
is removed from the keyword list at every level, then the worst case
is bounded by:

    Iterations = O((# keywords) * (maximum # parameters) * (# keywords))

where O is big-O (an upper bound on the computation) and the depth is
(maximum # parameters).
The algorithm has nothing to do with MPH; it is merely useful as a
prepass optimization step. Since the solution is deterministic
(finiteness guaranteed) and not a branch-and-bound problem, it doesn't
suffer from some of the usual search issues.
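To show the flavor of such a prepass, here is a rough sketch of one
greedy way to pick discriminating columns. This is only my
illustration of the idea, not the actual generator, and all the names
are invented:

    #include <string.h>

    /* Count the distinct characters appearing at column 'col' across
       the remaining keywords; keys shorter than 'col' count as '\0'. */
    static int distinct_at(const char **keys, int n, int col)
    {
        int seen[256] = {0}, count = 0;
        for (int i = 0; i < n; i++) {
            size_t len = strlen(keys[i]);
            unsigned char c =
                (size_t)col < len ? (unsigned char)keys[i][col] : 0;
            if (!seen[c]) { seen[c] = 1; count++; }
        }
        return count;
    }

    /* Greedy choice: the column that splits the keywords into the
       most groups.  Repeating this on each group yields the search
       tree; the fewer keywords a column separates, the taller the
       tree gets, which is where the worst case above comes from.    */
    static int best_column(const char **keys, int n, int maxcol)
    {
        int best = 0, best_count = 0;
        for (int col = 0; col < maxcol; col++) {
            int count = distinct_at(keys, n, col);
            if (count > best_count) { best_count = count; best = col; }
        }
        return best;
    }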
>> 2. Given the results of the prepass, it will generate a polynomial whose
>> return value is an integer, 0 <= i < |# keywords| for each input keyword.
> I've heard this one discussed, of course without the results of the
> prepass. But in the code I've seen, it was never stated as a
> guaranteed achievement; the instructions said you could try for that,
> and if no solution was found you might increase the size of the max
> integer. That leaves some gaps in the sequence of possible return
> values, which "helps" the polynomial generator find a solution.
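That widened-range trick gives a perfect but not minimal hash. A tiny
made-up example of what the result looks like; the hash function and
keyword table here are invented for illustration:

    #include <string.h>

    #define TABLE_SIZE 13   /* range 0..12 for 9 keywords: m > n */

    /* Each keyword sits at its own hash value; slots 0, 6, 9, and 10
       stay NULL -- the "gaps" left by widening the range.           */
    static const char *table[TABLE_SIZE] = {
        [1] = "for",  [2] = "double", [3] = "if",   [4] = "switch",
        [5] = "void", [7] = "while",  [8] = "long", [11] = "do",
        [12] = "case"
    };

    /* Invented hash: first character plus length, mod the table
       size.  A real generator would search for coefficients with no
       collisions; this one just happens to work for these words.    */
    static unsigned hash(const char *key)
    {
        return ((unsigned char)key[0] + (unsigned)strlen(key))
               % TABLE_SIZE;
    }

    /* Perfect (but not minimal) lookup: one probe, one compare.     */
    int lookup(const char *key)
    {
        unsigned h = hash(key);
        return table[h] != NULL && strcmp(table[h], key) == 0;
    }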
There are a number of polynomial and non-polynomial solutions given
in the literature (that I've looked at). All of the ones I have so
far seen depend on a search technique of some sort. One readily
available solution is 'gperf', which ships with Cygwin. My proposed
solution does not depend on a search and is deterministic.
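For anyone trying the search-based route, the code gperf emits is used
roughly like this. The declaration below is gperf's default C
interface as I understand it from its documentation; older versions
take 'unsigned int len', and option flags rename things:

    #include <string.h>

    /* Lookup routine generated by gperf from a keyword list; it
       returns the matching keyword, or a null pointer if none.      */
    const char *in_word_set(const char *str, size_t len);

    /* Returns nonzero when 's' is one of the keywords gperf was fed. */
    int is_keyword(const char *s)
    {
        return in_word_set(s, strlen(s)) != NULL;
    }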
> I'll be interested to see the Arthur Schwarz mph generator when it is
> published.
Me too :).
> That doesn't mean that I necessarily think that it will be
> the right solution for Win32::GUI::Constant ... my thinking is quite the
> opposite at the moment.
Not my decision. If it is considered, I am sure it will be analyzed
for performance and adopted if it performs well.