Paul,
I'm swamped this weekend and all this coming week moving
(physically). I would be happy to mentor someone tackling these
changes. Could you go ahead and put your ideas on the wiki and list
me as the ASF mentor? (I know it says ASF members and committers, but
feel free to add it on my behalf).
Erik
On Jun 5, 2005, at 5:07 AM, Paul Elschot wrote:
How about putting this here:
http://wiki.apache.org/general/SummerOfCode2005
It seems to be a nice fit for the sponsor.
Regards,
Paul Elschot
On Saturday 04 June 2005 22:25, Paul Elschot wrote:
On Monday 30 May 2005 02:44, Erik Hatcher wrote:
I concur with Daniel on this. For the moment, my preference is to
bring in Paul's parser into contrib/surround and let it gain some
additional exposure there. I don't believe it's possible or even
preferable to attempt to build one query parser to rule them all.
While a decent general purpose one is handy, I'm finding that my
projects really demand more custom parsing capabilities than the
built-in QueryParser can handle and that the quirks of the current
parser cause some frustrations sometimes.
Perhaps over time, the built-in QueryParser can adopt some additional
capabilities such as supporting the SpanQuery family but let's take
that sort of thing slowly.
How about extending the surround parser to allow the use of all
queries currently in Lucene? The goal would be to allow as many
queries as possible.
The queries not available in the current surround parser are:
- FuzzyQuery, WildcardQuery, PrefixQuery
- SpanFirstQuery
- SpanNotQuery
- MultiPhraseQuery (or the various phrase scorers),
- optional terms/clauses
FuzzyQuery and SpanFirstQuery could be done with a prefix operator
including a number (like the nn in the nnN near operator) followed by a
single query, with appropriate restrictions.
A prefix operator followed by a single query is currently not present,
but relatively easy to add.
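As a rough illustration of the "number plus operator" idea, here is a minimal sketch in plain Python (not Lucene or Surround code) that splits such an operator token into its number and its operator letter. The function name parse_prefix_operator, the one-or-two-digit limit and the letter F used in the usage note are all assumptions made up for this example; the actual syntax would have to be decided in the Surround grammar.

```python
import re

# Hypothetical token shape: one or two ASCII digits followed by a single
# operator letter, mirroring the "nn" in the nnN near operator.
_PREFIX_OP = re.compile(r"([0-9]{1,2})([A-Za-z])")


def parse_prefix_operator(token):
    """Return (number, operator_letter) for a token such as '3F'.

    Raises ValueError when the token does not have the assumed shape.
    """
    match = _PREFIX_OP.fullmatch(token)
    if match is None:
        raise ValueError("not a numbered prefix operator: %r" % token)
    # Normalise the letter to upper case so '3f' and '3F' are equivalent.
    return int(match.group(1)), match.group(2).upper()
```

For example, parse_prefix_operator("3F") would give the number 3 and the letter F, which a parser could then apply to the single subquery that follows.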
SpanNotQuery always has two subqueries, so would need an infix operator
only.
MultiPhraseQuery would need an infix operator and a prefix operator,
just like the N and W operators, and a restriction to terms, truncations
and OR as subqueries.
Left truncation could also be allowed;
truncations currently have to start with a normal character.
Truncation might also be left to WildcardQuery and
PrefixQuery instead of the current "equivalent" in Surround
that uses regular expressions to find the matching terms.
That leaves the optional terms/clauses, and I can't think of an easy
way to handle these. Any ideas? OR does not work for this because it
requires at least one of its subqueries to match. The normal
QueryParser syntax for this is +aa bb cc,
where bb and cc are the optional parts.
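To make the +aa bb cc example concrete, here is a minimal sketch in plain Python (not Lucene code) of how such a clause list separates into a required part and optional parts. The function name split_clauses and the returned dictionary are invented for illustration, and the sketch assumes only a leading '+' marks a required clause; prohibited clauses, phrases and nesting are left out.

```python
def split_clauses(query_text):
    """Split whitespace-separated clauses into required and optional ones.

    Assumption for this sketch: a leading '+' marks a required clause and
    every other clause is optional.
    """
    required, optional = [], []
    for token in query_text.split():
        if token.startswith("+") and len(token) > 1:
            # Drop the '+' marker and keep the clause text itself.
            required.append(token[1:])
        else:
            optional.append(token)
    return {"required": required, "optional": optional}
```

With "+aa bb cc" this puts aa in the required group and bb and cc in the optional group. A document has to match aa, while bb and cc would only influence scoring, which is the behaviour OR cannot express.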
Some control over performance is outside the language.
A basic query factory must be provided to create a Lucene query
from a Surround query, and this throws an exception when
rewriting causes too many terms to be used,
much like the TooManyClauses for BooleanQuery.
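The idea of a factory that limits term expansion can be sketched as follows. This is plain Python under stated assumptions, not the Surround implementation: the names BasicQueryFactorySketch, TooManyTermsError and expand_truncation are hypothetical, a (field, term) tuple stands in for a Lucene term query, and the index is simulated by a list of strings.

```python
class TooManyTermsError(Exception):
    """Raised when a query expands to more terms than allowed (hypothetical name)."""


class BasicQueryFactorySketch:
    """Hands out term queries and enforces an upper bound on how many are made."""

    def __init__(self, max_terms):
        self.max_terms = max_terms
        self.terms_used = 0

    def new_term_query(self, field, term):
        # Fail as soon as one more term would exceed the configured budget.
        if self.terms_used >= self.max_terms:
            raise TooManyTermsError(
                "more than %d terms needed" % self.max_terms)
        self.terms_used += 1
        return (field, term)  # stand-in for a real term query object


def expand_truncation(factory, field, prefix, index_terms):
    """Create one term query per indexed term that starts with prefix."""
    return [factory.new_term_query(field, term)
            for term in sorted(index_terms)
            if term.startswith(prefix)]
```

Because the limit lives in the factory rather than in the query language, an application could pick a different bound per search without changing the query text.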
Regards,
Paul Elschot
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]