[ 
https://issues.apache.org/jira/browse/LUCENE-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887228#action_12887228
 ] 

Michael McCandless commented on LUCENE-2130:
--------------------------------------------

I think this is very important to fix...

We should somehow make MultiTermQuery.RewriteMethod support a per-segment 
API (i.e., add .setNextReader).

I'm running some unrelated perf tests, and ended up testing the fuzzy query 
united~0.6 (max edit distance=2) against the same 5M-doc Wikipedia index in 
optimized and unoptimized (13 segs) form.  The unoptimized index takes ~45 
seconds to run, while the same index optimized takes 750 msec (i.e., the 
unoptimized index is ~60X slower)!  For query united~0.7 (max edit 
distance=1), it's ~4330 msec unoptimized vs ~114 msec optimized (~38X 
slower).
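
To make the per-segment idea concrete, here is a minimal sketch in plain Java. 
It does not use Lucene classes: Segment, SegmentRewrite and rewritePerSegment 
are hypothetical names invented for illustration, and only the method name 
setNextReader is borrowed from the suggestion above. Each segment's own term 
dictionary is enumerated separately, so no merged cross-segment term enum is 
needed and each segment only sees the terms it actually contains.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class PerSegmentRewriteSketch {

    /** Hypothetical stand-in for one segment's sorted term dictionary. */
    static final class Segment {
        final String name;
        final SortedSet<String> terms;

        Segment(String name, String... terms) {
            this.name = name;
            this.terms = new TreeSet<>(List.of(terms));
        }
    }

    /** Hypothetical per-segment hook; not the real RewriteMethod API. */
    interface SegmentRewrite {
        void setNextReader(Segment segment);
        List<String> matchedTerms();
    }

    /** Collects the terms of the current segment within maxEdits of the target. */
    static final class FuzzyRewrite implements SegmentRewrite {
        private final String target;
        private final int maxEdits;
        private List<String> matched = new ArrayList<>();

        FuzzyRewrite(String target, int maxEdits) {
            this.target = target;
            this.maxEdits = maxEdits;
        }

        @Override
        public void setNextReader(Segment segment) {
            // Enumerate only this segment's terms; no merged enum, no priority queue.
            matched = new ArrayList<>();
            for (String term : segment.terms) {
                if (editDistance(target, term) <= maxEdits) {
                    matched.add(term);
                }
            }
        }

        @Override
        public List<String> matchedTerms() {
            return matched;
        }
    }

    /** Plain Levenshtein distance (insert, delete, substitute), two-row DP. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Runs the rewrite once per segment and records each segment's matching terms. */
    static Map<String, List<String>> rewritePerSegment(List<Segment> segments, String target, int maxEdits) {
        SegmentRewrite rewrite = new FuzzyRewrite(target, maxEdits);
        Map<String, List<String>> perSegment = new LinkedHashMap<>();
        for (Segment segment : segments) {
            rewrite.setNextReader(segment);
            perSegment.put(segment.name, rewrite.matchedTerms());
        }
        return perSegment;
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(
            new Segment("seg0", "unite", "united", "zebra"),
            new Segment("seg1", "untied", "apple"));
        System.out.println(rewritePerSegment(segments, "united", 1));
    }
}
```

In this toy setup each segment would then be searched with a constant-score 
boolean query over its own matched terms only. How the real rewrite would 
carry state between segments is left open here, as it is in the issue.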

> Investigate Rewriting Constant Scoring MultiTermQueries per segment
> -------------------------------------------------------------------
>
>                 Key: LUCENE-2130
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2130
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: LUCENE-2130.patch, LUCENE-2130.patch
>
>
> This issue is likely not to go anywhere, but I thought we might explore it. 
> The only idea I have come up with is fairly ugly, and unless something better 
> comes up, this is not likely to happen.
> But if we could rewrite constant score multi-term queries per segment, MTQs 
> with auto rewrite (when the heuristic doesn't cut over to constant filter) 
> or constant boolean rewrite could enumerate terms against a single segment 
> and then apply a boolean query against each segment with just the terms that 
> are known to be in that segment. This also allows you to avoid 
> DirectoryReader's MultiTermEnum and its PQ. (See Robert's comment below.)
> No biggie, not likely, but what the heck.
> So the ugly way to do it is to add a property to queries and weights - 
> lateCnstRewrite or something, that defaults to false. MTQ would return true 
> if it's in a constant score mode. On the top-level rewrite, if this is 
> detected, an empty ConstantScoreQuery is made, its Weight is flagged as 
> lateCnstRewrite, and it keeps a ref to the original MTQ query. It also gets 
> its boost set to the MTQ's boost. Then when we are searching per segment, if 
> the Weight is lateCnstRewrite, we grab the orig query, actually do the 
> rewrite against the subreader, and grab the actual constant score weight. It 
> works, I think, but it's a little ugly.
> Not sure it's worth the baggage for the win - but perhaps the objective can 
> be met in another way.
