On Oct 13, 2005, at 3:15 AM, Paul Elschot wrote:
The main negative to this query, just like with WildcardQuery and
FuzzyQuery, is the possible performance issue.  However, just like
WildcardQuery, this really depends on how clever the indexing side of
things is and matching that cleverness with an appropriate regex.  My
actual use of these queries involves doing overlapped rotated term
indexing and also rotating the query term to have the best possible
prefix for term enumeration.  Naive use of this query using ".*foo"
of course will have the same impact as WildcardQuery using *foo - and
perhaps slightly slower with regex matching involved.
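
A minimal sketch of the rotation idea, for illustration only. It assumes a permuterm-style scheme: each term gets an end marker and every rotation of it is indexed, and a single-wildcard query is rotated so the wildcard ends up last. The class name, the '$' marker, and the helper methods are all hypothetical; this is not the scheme actually used above and not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of rotated term indexing; not Lucene code.
class TermRotationSketch {
    // Assumed end-of-term marker; it must never occur inside a real term.
    static final char END = '$';

    // Index side: produce every rotation of term + END.
    // "foo" -> "foo$", "oo$f", "o$fo", "$foo"
    static List<String> rotations(String term) {
        String marked = term + END;
        List<String> out = new ArrayList<>();
        for (int i = 0; i < marked.length(); i++) {
            out.add(marked.substring(i) + marked.substring(0, i));
        }
        return out;
    }

    // Query side: rotate a pattern containing exactly one '*' so that the
    // '*' falls at the end, and return the fixed prefix in front of it.
    // "*foo" -> "foo$", "f*r" -> "r$f"
    static String rotatedPrefix(String pattern) {
        int star = pattern.indexOf('*');
        if (star < 0 || pattern.indexOf('*', star + 1) >= 0) {
            throw new IllegalArgumentException("sketch handles exactly one '*'");
        }
        String before = pattern.substring(0, star);
        String after = pattern.substring(star + 1);
        return after + END + before;
    }

    // A term matches when one of its rotations starts with the rotated
    // prefix. A real index would seek to the prefix in the sorted term
    // list instead of generating rotations at query time.
    static boolean matches(String term, String pattern) {
        String prefix = rotatedPrefix(pattern);
        for (String rotation : rotations(term)) {
            if (rotation.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

With this layout a leading-wildcard pattern such as "*foo" becomes a lookup on the fixed prefix "foo$", so term enumeration can start at that prefix rather than scanning every term. The cost is a larger term dictionary, since each term is stored once per rotation.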

Overall, I think it is a good addition and will allow users to be
more expressive than the lower-level MultiPhraseQuery (aka
PhrasePrefixQuery).

Thoughts?


In the surround language, this was done by splitting the query term
in a fixed prefix and a remainder starting with a truncation character.
For this remainder a regular expression is built and used.
The prefix is used to limit the number of terms fed to the regular expression
matcher. The code is in SrndTruncQuery.java here:
http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/surround/src/java/org/apache/lucene/queryParser/surround/query/
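
The prefix-plus-remainder approach described above could be sketched roughly as follows. This is not the SrndTruncQuery code itself: the class and method names are hypothetical, '*' and '?' are assumed as the truncation characters (any run of characters and exactly one character), and a sorted in-memory set stands in for the index's term enumeration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.regex.Pattern;

// Hypothetical illustration of prefix-limited regex matching; not Lucene code.
class PrefixRegexSketch {
    // Length of the fixed prefix: everything before the first
    // truncation character ('*' or '?', both assumed here).
    static int prefixLength(String truncated) {
        int i = 0;
        while (i < truncated.length()
                && truncated.charAt(i) != '*'
                && truncated.charAt(i) != '?') {
            i++;
        }
        return i;
    }

    // Build a regular expression for the remainder only:
    // '*' becomes ".*", '?' becomes ".", anything else is quoted literally.
    static Pattern remainderPattern(String remainder) {
        StringBuilder re = new StringBuilder();
        for (char c : remainder.toCharArray()) {
            if (c == '*') {
                re.append(".*");
            } else if (c == '?') {
                re.append('.');
            } else {
                re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(re.toString());
    }

    // Enumerate only the terms that share the fixed prefix, and run the
    // regex on what follows the prefix in each of them.
    static List<String> matchingTerms(SortedSet<String> terms, String truncated) {
        int n = prefixLength(truncated);
        String prefix = truncated.substring(0, n);
        Pattern pattern = remainderPattern(truncated.substring(n));
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can share the prefix
            }
            if (pattern.matcher(term.substring(n)).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

The longer the fixed prefix, the fewer terms reach the regex matcher. With an empty prefix, as in a pattern that starts with a truncation character, the loop visits every term.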

Likewise with my PatternQuery - it limits the term enumeration just as WildcardQuery does, to the fixed prefix.

So, with an addition to the javadocs that the length of the prefix is
important for performance, I think a regular expression based query term
would be very useful, especially when combined with an analyzer that does
appropriate term rotation.

Right - I just mentioned the caveat to have the bases covered. It would be possible to do a PatternQuery("*") that would enumerate every term. At this point, anyone using such a query would have to do it through the API, just as they would with the SpanQuery family, so it would be for power users who hopefully understand how these queries work.

And with term rotation, as you say, things get much, much better!

    Erik

