On approximately 12/17/2005 6:46 AM, came the following characters from 
the keyboard of Arthur Schwarz:

> Robert;
"...  For example, my last email just suggested doing a form 
of linear searching, because it would allow space to be saved.... but 
this can be fully optimal in certain circumstances... what 
circumstances?  

   FWIW (I really hate these acronyms, only because I rarely
   know what they mean, so, for what it's worth) linear searching
   is 'optimal' in only a few cases, to wit:
   1. The region of memory referenced is small (good locality of
      reference).
   2. It is possible to load all of the list into primary cache.
   3. Prefetching loads data not yet in primary cache into
      secondary cache, and then into primary cache, before it is
      used.
   4. Dynamic instruction optimization does not interfere with
      the search by introducing latencies.

   In terms of just looking at code and operation, a binary search
   should be faster in most (but not all) cases. In particular,
   binary search violates the locality-of-reference condition above,
   since its probes jump across the table rather than walking it
   sequentially (a sketch of both searches follows the reference
   below).

   See, for example,
     J. H. Hester and D. S. Hirschberg, "Self-Organizing Linear
     Search", ACM Computing Surveys, Vol. 17, No. 3 (1985),
   for another look at linear searching.
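
   As a concrete sketch of the trade-off in C (the keyword table and
   function names below are invented for illustration, not taken from
   any real generator):

      #include <string.h>

      /* For a table this small, the linear scan's sequential,
       * cache-friendly accesses can beat binary search's scattered
       * probes even though it makes more comparisons. */
      static const char *keywords[] = { "break", "case", "else", "for",
                                        "if", "return", "while" };
      enum { NKEYS = sizeof keywords / sizeof keywords[0] };

      /* Linear search: one predictable loop over contiguous memory. */
      int find_linear(const char *s)
      {
          for (int i = 0; i < NKEYS; i++)
              if (strcmp(keywords[i], s) == 0)
                  return i;
          return -1;
      }

      /* Binary search (table must be sorted): fewer comparisons, but
       * the probes jump around memory and the branch outcome is hard
       * to predict. */
      int find_binary(const char *s)
      {
          int lo = 0, hi = NKEYS - 1;
          while (lo <= hi) {
              int mid = lo + (hi - lo) / 2;
              int cmp = strcmp(keywords[mid], s);
              if (cmp == 0) return mid;
              if (cmp < 0)  lo = mid + 1;
              else          hi = mid - 1;
          }
          return -1;
      }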

> It does several things that have ne'er before seen
> the light of day. Assuming a keyword list of ASCII words (not strictly
> necessary):
> 1. Given a list of keywords, it selects the optimum set of columns to use
> to get a minimum height search tree.

I've never heard this idea codified before, but I did, in fact, attempt 
to manually approximate that a few times in the past when I was 
generating mph polynomials.  Computers might be fast enough now to make 
it practical to calculate this automatically.

  Calculation of an optimal parameter set from an arbitrarily large set
  does not take long on average (I suspect). Worst-case performance can
  be really bad and depends on tree depth. If at most a single keyword
  is removed from the keyword list at every level, then the worst
  case is bounded by:

      Iterations = O(K * P * K) = O(K^2 * P)

      where K   == number of keywords
            P   == maximum number of parameters (== tree depth)
      and   O() == big O (an upper bound on the computation)
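
  (A hypothetical worked instance: with K = 1,000 keywords and at most
  P = 16 parameters, the bound is 1000 * 16 * 1000 = 16,000,000
  iterations, which is trivial on modern hardware.)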

  The algorithm has nothing to do with MPH; it is merely useful as a
  prepass optimization step. Since the solution is deterministic
  (finiteness guaranteed) and not a branch-and-bound problem, it
  doesn't suffer from some of the usual search issues.
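
  As a toy sketch of what one greedy step of such a prepass might look
  like in C (the keyword set, names, and greedy criterion here are all
  invented for illustration; this is not the unpublished generator):

      #include <stdio.h>
      #include <string.h>

      static const char *keys[] = { "if", "int", "for", "float", "while" };
      enum { NKEYS = sizeof keys / sizeof keys[0] };

      /* Count distinct characters at column 'col'; keywords shorter
       * than col+1 contribute the sentinel value 0. */
      static int distinct_at(int col)
      {
          unsigned char seen[256] = { 0 };
          int n = 0;
          for (int i = 0; i < NKEYS; i++) {
              unsigned char c = col < (int)strlen(keys[i])
                                ? (unsigned char)keys[i][col] : 0;
              if (!seen[c]) { seen[c] = 1; n++; }
          }
          return n;
      }

      int main(void)
      {
          int maxcol = 0;
          for (int i = 0; i < NKEYS; i++)
              if ((int)strlen(keys[i]) > maxcol)
                  maxcol = (int)strlen(keys[i]);

          /* Greedy criterion: the column with the most distinct
           * characters separates the most keywords per tree level,
           * keeping the tree short. */
          int best = 0, bestn = 0;
          for (int col = 0; col < maxcol; col++) {
              int n = distinct_at(col);
              if (n > bestn) { bestn = n; best = col; }
          }
          printf("most discriminating column: %d (%d distinct chars)\n",
                 best, bestn);
          return 0;
      }

  Repeating the step on each subgroup until every keyword stands alone
  yields the search tree.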

> 2. Given the results of the prepass, it will generate a polynomial whose
> return value is an integer, 0 <= i < (# keywords), for each input keyword.

I've heard this one discussed, of course without the results of the
prepass. But in the code I've seen, it was never stated as a
guaranteed achievement. The operational instructions said you could
try for it, and if the generator didn't find a solution, you could
increase the size of the max integer; the resulting gaps in the
sequence of possible return values would "help" the polynomial
generator find a solution.

   There are a number of polynomial and non-polynomial solutions given
   in the literature (that I've looked at). All of them that I have
   seen so far depend on a search technique of some sort. One readily
   available example is 'gperf', distributed with Cygwin. My proposed
   solution does not depend on a search and is deterministic.
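
   For flavor, a hand-built miniature of the gperf-style,
   search-derived technique (the keyword set and 'assoc' table below
   are invented; a real generator searches for such a table
   automatically):

      #include <string.h>

      static const char *words[] = { "if", "for", "while" };

      /* Associated values, chosen so that hashing each keyword's
       * first character yields a unique index in 0..2. */
      static unsigned assoc(unsigned char c)
      {
          switch (c) {
          case 'i': return 0;
          case 'f': return 1;
          case 'w': return 2;
          default:  return 99;   /* pushes non-keywords out of range */
          }
      }

      /* Returns the keyword's index, or -1 if 's' is not a keyword. */
      int lookup(const char *s)
      {
          unsigned h = assoc((unsigned char)s[0]);
          if (h < 3 && strcmp(words[h], s) == 0)
              return (int)h;
          return -1;
      }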

I'll be interested to see the Arthur Schwarz mph generator when it is 
published.  

  Me too :).

That doesn't mean that I necessarily think that it will be 
the right solution for Win32::GUI::Constant ... my thinking is quite the 
opposite at the moment.

  Not my decision. If it is used, I am sure it will first be analyzed
  for performance and adopted only if it performs well.

