Re: [htdig3-dev] Architecture Overview: htsearch parsing revisited

Geoff Hutchison Thu, 30 Mar 2000 11:22:21 -0800
On Thu, 30 Mar 2000, Andrew Scherpbier wrote:

> If a redesign of htsearch is in order, I think you should think about
> broadening the functionality a little at the same time.
> [snip]
> I believe this can still be done with a single query/process setup.  The
> tricky part will be the limiting and cleanup of the cache.

Oh sure. IMHO, part of the problem is that without some redesign, it's not
entirely clear where this sort of functionality should go. I believe the
"in" term for my comments is "refactoring."

> (more comments below...)
> > What I've just described is IMHO some requirements for a Parser
> > class--it transforms the query into an expression tree. Then a
> 
> Suggestion:  A parent ParseTree class with a derived class for each of the
> search modes:
> BooleanParseTree, OrParseTree, AndParseTree, etc.
> Objects should tend to represent data, not functionality...

I think actually there's really only one tree, the BooleanParseTree. The
others can easily be transformed into this type (because they limit the
operators) and this is necessary b/c of htfuzzy. (see below)

Now maybe you're suggesting that the other forms are subclasses of the
BooleanParseTree and by constructing a new object, these subclasses create
the appropriate Boolean tree. The differences between the classes
would be in their parse methods. If so, we're talking about the same thing
(verbiage can often get in the way).
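
A minimal sketch of that reading, assuming a hypothetical Node type and
a toy whitespace tokenizer. None of the member names here come from the
htsearch source; they only illustrate subclasses that differ in how they
parse but all end up holding the same kind of Boolean tree.

```cpp
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type: every search mode reduces to a tree of these.
struct Node {
    enum Kind { Word, And, Or };
    Kind kind = Word;
    std::string word;                         // set only for Word leaves
    std::vector<std::unique_ptr<Node>> kids;  // set only for And / Or nodes
};

// The one real tree. Subclasses differ only in how they build `root`.
class BooleanParseTree {
public:
    virtual ~BooleanParseTree() = default;
    std::string str() const { return render(root.get()); }

protected:
    std::unique_ptr<Node> root;

    // Put every whitespace-separated word of `query` under one operator.
    static std::unique_ptr<Node> joinWords(const std::string& query, Node::Kind op) {
        std::unique_ptr<Node> node(new Node);
        node->kind = op;
        std::istringstream in(query);
        for (std::string w; in >> w; ) {
            std::unique_ptr<Node> leaf(new Node);
            leaf->word = w;
            node->kids.push_back(std::move(leaf));
        }
        return node;
    }

private:
    // Print the tree with explicit parentheses, for inspection only.
    static std::string render(const Node* n) {
        if (!n) return "";
        if (n->kind == Node::Word) return n->word;
        const std::string op = (n->kind == Node::And) ? " and " : " or ";
        std::string out = "(";
        for (std::size_t i = 0; i < n->kids.size(); ++i) {
            if (i > 0) out += op;
            out += render(n->kids[i].get());
        }
        return out + ")";
    }
};

// "All words" mode: the constructor is the parse method.
class AndParseTree : public BooleanParseTree {
public:
    explicit AndParseTree(const std::string& q) { root = joinWords(q, Node::And); }
};

// "Any word" mode: same tree type, different operator.
class OrParseTree : public BooleanParseTree {
public:
    explicit OrParseTree(const std::string& q) { root = joinWords(q, Node::Or); }
};
```

Anything downstream (fuzzy expansion, searching) would then only need to
understand BooleanParseTree, whichever mode produced it.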

> Since the fuzzy algorithms can *add* new terms to a boolean search, the fuzzy
> step needs to come after the parsing and before the searching.
> Since fuzzy is an algorithm to be applied to the parse tree, it should
> probably be incorporated into the ParseTree class.

Essentially, the fuzzy algorithms expand the ParseTree. They take words
and turn them into OR expressions (to use your terminology, they make a
new OrParseTree). An additional enhancement also occurs to me. Some fuzzy
matches may want to add weighted words (different shades of fuzzy?). So
fuzzy algorithms should return a fully functional ParseTree to add to the
leaves of our original tree. For the current algorithms (which make no
distinction), they can simply use OrParseTree(word_list).
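
Roughly, and only as a sketch: the Node layout, the FuzzyFn signature
and the toy pluralFuzzy() below are all assumptions made up for
illustration, not the htfuzzy interface. The point is just that each
word leaf gets swapped for whatever subtree the fuzzy algorithm hands
back, and that the subtree can carry weights.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type with a per-word weight.
struct Node {
    enum Kind { Word, And, Or };
    Kind kind = Word;
    std::string word;
    double weight = 1.0;                      // meaningful for Word leaves
    std::vector<std::unique_ptr<Node>> kids;  // used by And / Or nodes
};
using NodePtr = std::unique_ptr<Node>;

NodePtr makeWord(const std::string& w, double weight = 1.0) {
    NodePtr leaf(new Node);
    leaf->word = w;
    leaf->weight = weight;
    return leaf;
}

// Assumed interface: a fuzzy algorithm takes one word and returns a
// complete subtree to put in its place.
using FuzzyFn = std::function<NodePtr(const std::string&)>;

// Walk the tree and replace each word leaf with the fuzzy subtree.
// The returned subtree is not expanded again.
void expand(NodePtr& node, const FuzzyFn& fuzzy) {
    if (!node) return;
    if (node->kind == Node::Word) {
        node = fuzzy(node->word);
        return;
    }
    for (auto& kid : node->kids) expand(kid, fuzzy);
}

// Toy stand-in for a real fuzzy algorithm: the exact word at full
// weight OR a naive plural at half weight.
NodePtr pluralFuzzy(const std::string& word) {
    NodePtr alt(new Node);
    alt->kind = Node::Or;
    alt->kids.push_back(makeWord(word, 1.0));
    alt->kids.push_back(makeWord(word + "s", 0.5));
    return alt;
}
```

An algorithm that makes no distinction between matches would return the
same OR node with every weight left at 1.0.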

> Actual searching is also something that is applied to the parse tree, so it
> should probably be incorporated into the ParseTree class as well.  Its return
> value should be a Results object.
> 
>       Results searchResults = parseTree->search();

For consistency with the current code, I think we can keep the
ResultList name. Most of the current class can stay as it is and simply
be expanded.

> The Result object should be the one that generates the results with the help
> of something like an OutputRepresentation object.

I was thinking more along the lines of passing the ResultList *to* a
display agent.

> The Result object should also be in charge of paging the results:
> 
>       // Generate output for page 1 using 20 results per page
>       searchResults->output(representation, 20, 1);

See, I think this is backwards. I don't think a ResultList should need
to know *anything* about pages. What if an output method doesn't have
pages? After all, maybe a caching scheme is an output method. >:-)
Caching the results would solve the problem for someone going to the
next page, changing the sort, or making some other minor change.

(I'm implying the sort is in the output because a nice optimization is
to only do a partial sort up to the number of results needed.)
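
As a sketch of that optimization, assuming a hypothetical Match record
and a plain vector standing in for the real ResultList class: a paged
display agent only has to order the results up to the end of the page
it is asked for, which std::partial_sort does directly. The paging
lives entirely in the agent; the list itself knows nothing about it.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical match record; the vector is only a stand-in for ResultList.
struct Match {
    std::string url;
    double score;
};
using ResultList = std::vector<Match>;

// Hypothetical paged display agent helper: returns the matches for one
// page (1-based), best score first. Takes the list by value so the
// caller's copy stays in its original order.
std::vector<Match> matchesForPage(ResultList results,
                                  std::size_t perPage, std::size_t page) {
    if (page == 0 || perPage == 0) return {};
    const std::size_t first = std::min(results.size(), perPage * (page - 1));
    const std::size_t last = std::min(results.size(), perPage * page);

    // Only the first `last` positions are put in order; everything
    // after them is left unsorted.
    const auto middle = results.begin() + static_cast<std::ptrdiff_t>(last);
    std::partial_sort(results.begin(), middle, results.end(),
                      [](const Match& a, const Match& b) { return a.score > b.score; });

    return std::vector<Match>(results.begin() + static_cast<std::ptrdiff_t>(first),
                              middle);
}
```

An output method without pages (a cache writer, say) would take the
same ResultList and simply skip this step or sort the whole thing.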

> With this, the main() in htsearch would essentially be reduced to the sample
> code I included.
> All the work will be done by the appropriate objects.

I think that's my main gripe about htsearch right now. The main() does
a *lot* of the work, which makes it hard to use inheritance and so on.

> A cache class can be in charge of managing the cache size and do funky LRU
> type stuff.

Well, the cache may want to use mmap() where appropriate. But funky
LRU type stuff gives me chills. Perhaps someone else will be
interested in such things.

(I don't have nightmares about htsearch--I have enough other things to
give me nightmares! :-)

-Geoff



