On Monday 30 May 2005 02:44, Erik Hatcher wrote:
> I concur with Daniel on this. For the moment, my preference is to
> bring in Paul's parser into contrib/surround and let it gain some
> additional exposure there. I don't believe it's possible or even
> preferable to attempt to build one query parser to rule them all.
> While a decent general purpose one is handy, I'm finding that my
> projects really demand more custom parsing capabilities than the
> built-in QueryParser can handle and that the quirks of the current
> parser cause some frustrations sometimes.
>
> Perhaps over time, the built-in QueryParser can adopt some additional
> capabilities such as supporting the SpanQuery family but let's take
> that sort of thing slowly.
>
How about extending the surround parser to allow the use of all queries currently in Lucene? The goal would be to support as many query types as possible.

The queries not available in the current surround parser are:
- FuzzyQuery, WildcardQuery, PrefixQuery
- SpanFirstQuery
- SpanNotQuery
- MultiPhraseQuery (or the various phrase scorers)
- optional terms/clauses

FuzzyQuery and SpanFirstQuery could be done with a prefix operator that includes a number (like the nn in the nnN near operator), followed by a single query, with appropriate restrictions. A prefix operator followed by a single query is not present at the moment, but it is relatively easy to add.

SpanNotQuery always has exactly two subqueries, so it would only need an infix operator.

MultiPhraseQuery would need both an infix and a prefix operator, just like the N and W operators, plus a restriction to terms, truncations and OR as subqueries.

Left truncation could also be allowed; currently a truncation has to start with a normal character. Truncation might also be left to WildcardQuery and PrefixQuery instead of the current "equivalent" in Surround, which uses regular expressions to find the matching terms.

That leaves the optional terms/clauses, and I can't think of an easy way to handle these. Any ideas? OR does not work for this because it requires at least one of its subqueries to match. The normal QueryParser syntax for this is +aa bb cc, where bb and cc are the optional parts.

Some control over performance lies outside the language. A basic query factory must be provided to create a Lucene query from a Surround query, and this factory throws an exception when rewriting would use too many terms, much like TooManyClauses for BooleanQuery.

Regards,
Paul Elschot
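The difference between OR and optional clauses can be made concrete with a small model. This is a sketch in plain Java, not Lucene code: the class OptionalClauses and its methods are hypothetical, documents are reduced to sets of terms, and the "score" is simply the number of optional terms present. In Lucene itself the optional parts of +aa bb cc correspond to the non-required, non-prohibited clauses of a BooleanQuery.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical plain-Java model (not Lucene code) of the QueryParser
 * query "+aa bb cc". A document is modeled as the set of terms it contains.
 */
public class OptionalClauses {

    /**
     * Returns -1 when the document does not match, otherwise the number of
     * optional terms it contains (a stand-in for a relevance score).
     */
    public static int score(Set<String> doc, List<String> required, List<String> optional) {
        for (String t : required) {
            if (!doc.contains(t)) {
                return -1; // every required term must occur
            }
        }
        int hits = 0;
        for (String t : optional) {
            if (doc.contains(t)) {
                hits++; // optional terms only raise the score
            }
        }
        // Without any required term, at least one optional term must occur,
        // as with a BooleanQuery that has only optional clauses.
        if (required.isEmpty() && hits == 0) {
            return -1;
        }
        return hits;
    }

    /** OR of the terms: at least one of them must occur. */
    public static boolean matchesOr(Set<String> doc, List<String> terms) {
        for (String t : terms) {
            if (doc.contains(t)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> doc = Set.of("aa", "dd");
        List<String> required = List.of("aa");
        List<String> optional = List.of("bb", "cc");

        // +aa bb cc matches this document, even though neither bb nor cc occurs.
        System.out.println(score(doc, required, optional)); // prints 0

        // aa AND (bb OR cc) rejects it, because OR needs at least one of bb, cc.
        System.out.println(doc.contains("aa") && matchesOr(doc, optional)); // prints false
    }
}
```

In this model, replacing the optional parts by an OR changes which documents match, not only how they rank, which is the gap a separate Surround construct would have to fill.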
