Chuck Williams wrote:
Doug Cutting wrote:
  > It would indeed be nice to be able to short-circuit rewriting for
  > queries where it is a no-op.  Do you have a proposal for how this
could
  > be done?

First, this gets into the other part of Bug 31841.  I don't believe
MultiSearcher.rewrite() is ever called.  Rewriting is done in the
Weight's, which invoke the rewrite() method of the Searcher, which is
always the Seacher invoked by the MultiSearcher, not the MultiSearcher
itself.

This would be fixed by the proposal under consideration. Weights would be constructed much earlier, using the top-level Searcher, so rewrites would use this too.


In fact, MultiSearcher.rewrite() is broken.  It requires
Query.combine() which is unsupported except for the derived queries
(i.e., those for which rewriting is not a no-op).  When I added
topmostSearcher to get the Weight's to call the MultiSearcher.docFreq(),
that also caused them to call MultiSearcher.rewrite() which blows up on,
for example, a simple TermQuery, because there is no
TermQuery.combine().  That's why my patch contains a new default
implementation for Query.combine() (which as noted in the bug report is
probably not a good idea in general).

So, I don't believe there is any valid rewrite() implementation for
MultiSearcher to start from, unless I've completely misunderstood
something.

It looks like MultiSearcher.rewrite() was never implemented correctly since it was never called -- a latent bug. It only needs to be called when queries are rewritten to something different:


  public Query rewrite(Query original) throws IOException {
    Query[] queries = new Query[searchables.length];
    boolean changed = false;
    for (int i = 0; i < searchables.length; i++) {
      Query rewritten = searchables[i].rewrite(original);
      changed = !rewritten.equals(original);
      queries[i] = rewritten;
    }
    if (changed) {
      return original.combine(queries);
    } else {
      return original;
    }
  }

Then we'll need an implementation of combine() for all query types. The implementation for BooleanQuery is fairly simple: combine() each of the corresponding clauses. For TermQuery, PhraseQuery and SpanQuery combine should create a deduplicated OR. Derived queries already have an implementation.

To address the question above, RemoteSearchable.rewrite() should be a
no-op, i.e. always return this.  For good error handling, it should
verify that the query does not require rewriting.  This requires some
mechanism to determine whether or not a query requires rewriting.  The
challenge here is that some query types have a non-trivial rewrite()
method not because they require rewriting, but because they might have
subqueries that require rewriting (e.g., BooleanQuery).  Other query
types (e.g., MultiTermQuery) always require rewriting, while those that
implement Weight's never require it.  I think an upward incompatibility
is required in the API to address this.

If that is acceptable, then this could work:
  1.  Add a new interface called Rewritable that specifies a boolean
rewriteRequired() method.
  2.  Have Query implement Rewritable but NOT provide an implementation
for rewriteRequired().  This will force all applications to add support
for this in order to upgrade.
  2.  Change all the Weight's to call Query.maybeRewrite() instead of
Query.rewrite().
  3.  Have Query.maybeRewrite() only call Query.rewrite() if
Query.rewriteRequired() is true.
  4.  Have RemoteSearchable.maybeRewrite() throw an Exception if
Query.rewriteRequired() is true.
  5.  Implement rewriteRequired() for all the built-in Query types
(which is either true for derived queries, false for primitive queries,
or an or of rewriteRequired() for all the subqueries).

That sounds hairy. Why not just add a single new method:

   boolean Query.isRewritten() { returns true; }

Then override this in TermQuery, PhraseQuery and SpanQuery to return false, and in BooleanQuery to walk its clauses and return true iff any of them return true. As an optimization, RemoteSearchable could avoid calling rewrite() when this is true.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to