On Thu, 31 Aug 2000, Quim Sanmarti wrote:
> > ... If someone wants to
> > write out a user query parser, I'd be glad to take it...
Cool. I've been quite busy this week (moving apartments). I'll put this in
and expire the ParseTree code. While that code is definitely working, the
object models are close enough and yours is further along.
> - Parsers for
> And/Or expressions
One popular request is for a method=exact, which we should add; it
shouldn't be hard.
> 'AndParser', 'OrParser' generate a single Operator on the top of a
> sequence of words/phrases. A 'NearParser' could be included also.
Or ExactParser, as above.
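
To make sure we mean the same thing by "a single Operator on the top of a
sequence of words", here is a rough sketch in Python rather than the real
classes. Word, Operator and parse_flat are names I made up for illustration,
phrases are ignored, and what "exact" would mean at evaluation time is left
open:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str


@dataclass(frozen=True)
class Operator:
    kind: str        # e.g. "and", "or", "near", "exact" (hypothetical labels)
    operands: tuple  # the Word nodes sitting under this operator


def parse_flat(query: str, kind: str) -> Operator:
    """Put one operator of the given kind on top of every word in the query."""
    words = tuple(Word(w.lower()) for w in query.split())
    return Operator(kind, words)
```

With this shape, an AndParser, OrParser, NearParser or ExactParser would
differ only in the kind they pass, e.g. parse_flat("foo bar", "and").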
> Result caching is supported, but is not persistent between eventual program
> invocations. I guess it's feasible to support a cache between invocations,
> too.
I think the idea is to allow people to turn on a small BerkeleyDB (or
other indexed file) for pre-parsed results. Querying for a word is only
going to be faster if it has pre-computed scores, but the biggies would be
for already computed And/Or/Near/Not levels and of course for the query
itself. The latter is obviously a big win when you go to Page 2 of the
results. :-)
> (problem: the order of factors is important here. Query strings 'foo and
> bar' and 'bar and foo' will store results in two different cache entries.)
I think keeping them separate may actually be easier, esp. if we're scoring
proximity. Often a user entering "foo and bar" wants things scored slightly
differently than "bar and foo".
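
Roughly what I have in mind for the whole-query cache, as an in-memory Python
sketch. QueryCache and the evaluate callable are hypothetical names, and a
plain dict stands in for whatever indexed file we would really use. The key
normalises case and whitespace but deliberately keeps word order:

```python
class QueryCache:
    def __init__(self, evaluate):
        # evaluate: hypothetical callable, query string -> ranked list of doc ids
        self._evaluate = evaluate
        self._results = {}
        self.misses = 0

    @staticmethod
    def key(query):
        # Normalise case and whitespace only; word order is kept on purpose,
        # so 'foo and bar' and 'bar and foo' get separate entries.
        return " ".join(query.lower().split())

    def page(self, query, number, size=10):
        """Return one page of results, evaluating the query only on a miss."""
        k = self.key(query)
        if k not in self._results:
            self.misses += 1
            self._results[k] = self._evaluate(query)
        start = (number - 1) * size
        return self._results[k][start:start + size]
```

Asking for page 2 of the same query then costs a lookup and a slice instead
of a second evaluation.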
> fuzzy algos. I haven't studied enough Gilles' suggestion about fuzzy
> chaining, but it can be done by deriving a class from FuzzyExpander. Fuzzy
> expansion is performed at the moment of query parse.
I think we'll want to hold off on this a bit. I agree it might be a good
idea, but we also want to avoid loops and work out a good way to specify
this in the config file. Maybe we can hold off for 3.2.1 or 3.3.
(Remember, I also want to see 3.2.0 out the door as soon as reasonably
possible.)
> Further optimisations are to be studied. For instance, it would be nice to
> be able to simplify things such as 'a or a = a', 'a not a = 0', etc...
> before or during evaluation.
I don't know how often this would kick in. Maybe if people are entering
things like "what is the weight of the moon" -> "what or is or the or
weight or of or moon". Plus, if you're doing query caching, you're going to
get a cache hit for the second item.
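
For what it's worth, the simplification itself is small. A sketch in Python,
with nested tuples standing in for the operator tree (this representation and
the EMPTY marker are mine, not the actual classes). It assumes repeating a
word is not supposed to change the score, which may not hold for us:

```python
EMPTY = ("empty",)  # hypothetical marker for "matches nothing"


def simplify(node):
    """Apply 'a or a = a', 'a and a = a' and 'a not a = 0' bottom-up."""
    if isinstance(node, str) or node == EMPTY:
        return node
    op, *args = node
    args = [simplify(a) for a in args]
    if op in ("and", "or"):
        unique = []
        for a in args:           # drop repeated operands, keep first position
            if a not in unique:
                unique.append(a)
        if op == "or":
            unique = [a for a in unique if a != EMPTY]
            if not unique:
                return EMPTY
        elif EMPTY in unique:    # 'and' with an empty operand matches nothing
            return EMPTY
        return unique[0] if len(unique) == 1 else (op, *unique)
    if op == "not":              # ("not", a, b) means "a but not b"
        left, right = args
        if left == right or left == EMPTY:
            return EMPTY
        return left if right == EMPTY else ("not", left, right)
    raise ValueError("unknown operator: %r" % (op,))
```

On the moon example, the Or expansion has "the" twice and this collapses it
to a single operand, which is about the only case I can see firing in
practice.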
> Phrase search is *quite* a bit faster. Testing phrase "las del" takes about 2
> secs on the GTD intranet's digger database (match counts for each word and
Not bad. I'm not surprised that a cleanup of that mess gives better
performance. :-)
It looks like I'll want to work on WordQuery et al. I promised long ago
to allow returning all documents for a query of "*" as well as restricting
words to specific flags like "title:Foo".
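
On the parsing side, those two cases could be recognised per term along these
lines. This is a Python sketch only: parse_term and the dict it returns are
hypothetical, and mapping a prefix like "title" onto the actual word flags is
not shown:

```python
def parse_term(term):
    """Classify one query term as match-all, field-restricted, or plain."""
    if term == "*":
        return {"match_all": True, "field": None, "word": None}
    field, sep, word = term.partition(":")
    if sep and field and word:
        # "title:Foo" -> restrict the word "foo" to the "title" field
        return {"match_all": False, "field": field.lower(), "word": word.lower()}
    # No usable prefix: treat the whole term as an unrestricted word
    return {"match_all": False, "field": None, "word": term.lower()}
```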
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/