[ https://issues.apache.org/jira/browse/LUCENE-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe Schindler updated LUCENE-2690:
----------------------------------

    Attachment: LUCENE-2690.patch

Attached is a new patch with two changes:
- Moved the BQ reordering to MTQ for now. A general reordering of BooleanQueries should be done in a separate issue (with a more performant rewrite). Currently this uses the same comparator that BQ used before. You may wonder: why not simply use a sorted map? The idea is that sorting once at the end is faster than using a TreeMap, where every term is compared against the others (even terms that later fall out of the queue). I sort the BQ clauses directly, as BQ did, so that no additional array is needed to hold all terms again. Maybe it would still be faster to copy all BytesRefs into an array first and then build the BQ? For now this should be enough. To improve it we need SorterTemplate again (for the BytesRefHash case) :-)
- Fixed an issue with the PQ in TopTermsRewrite: the bottom information was only set when the PQ overflowed. Before this issue, and again with this patch, it is set as soon as the queue is full. This was an optimization bug; the behavior is now as it always was. Maybe this explains Mike's score changes on the Wikipedia index? Mike: can you test?
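A minimal Java sketch of the two PQ points in the comment, assuming plain Strings and floats in place of BytesRef and term boosts. The class `TopTermsSketch` and its methods are hypothetical illustrations, not Lucene's TopTermsRewrite code: the collector keeps the N best terms in a `java.util.PriorityQueue`, records the bottom boost as soon as the queue reaches its size limit, and sorts the survivors by term only once at the end instead of maintaining a TreeMap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical top-N term collector; a sketch, not Lucene's TopTermsRewrite. */
public class TopTermsSketch {
    /** Hypothetical (term, boost) pair standing in for BytesRef plus score. */
    static final class ScoredTerm {
        final String term;
        final float boost;
        ScoredTerm(String term, float boost) { this.term = term; this.boost = boost; }
    }

    private final int maxSize;
    // Min-heap on boost: the head is always the weakest ("bottom") entry.
    private final PriorityQueue<ScoredTerm> pq =
        new PriorityQueue<>((a, b) -> Float.compare(a.boost, b.boost));
    // Competitive threshold; only meaningful once the queue holds maxSize entries.
    private float bottomBoost = Float.NEGATIVE_INFINITY;

    public TopTermsSketch(int maxSize) { this.maxSize = maxSize; }

    /** Returns false if the term was rejected without touching the queue. */
    public boolean collect(String term, float boost) {
        if (pq.size() >= maxSize && boost <= bottomBoost) {
            return false; // cannot beat the current bottom: skip cheaply
        }
        pq.add(new ScoredTerm(term, boost));
        if (pq.size() > maxSize) {
            pq.poll(); // evict the weakest entry
        }
        // Record the bottom as soon as the queue is full,
        // not only after an insertion has overflowed it.
        if (pq.size() == maxSize) {
            bottomBoost = pq.peek().boost;
        }
        return true;
    }

    /** Sorts the surviving terms once at the end instead of keeping a TreeMap sorted. */
    public List<String> sortedTerms() {
        List<ScoredTerm> all = new ArrayList<>(pq);
        all.sort((a, b) -> a.term.compareTo(b.term));
        List<String> out = new ArrayList<>(all.size());
        for (ScoredTerm t : all) {
            out.add(t.term);
        }
        return out;
    }
}
```

In this sketch, with a limit of 2, the bottom is known right after the second term is collected, so a weaker third term is rejected before it enters the queue; if the bottom were only updated on overflow, that term would be inserted and then immediately evicted.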
> Do MultiTermQuery boolean rewrites per segment
> ----------------------------------------------
>
>                 Key: LUCENE-2690
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2690
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2690-attributes.patch,
> LUCENE-2690-attributes.patch, LUCENE-2690-attributes.patch,
> LUCENE-2690-hack.patch, LUCENE-2690.patch, LUCENE-2690.patch,
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch,
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch,
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch,
> LUCENE-2690.patch
>
>
> MultiTermQuery currently rewrites FuzzyQuery (using
> TopTermsBooleanQueryRewrite), the auto constant rewrite method and the
> ScoringBQ rewrite methods using a MultiFields wrapper on the top-level
> reader. This is inefficient.
> This patch changes the rewrite modes to do the rewrites per segment and uses
> some additional data structures (hashed sets/maps) to exclude duplicate terms.
> All tests currently pass, but FuzzyQuery's tests should not, because its
> minimum-score handling depends on the terms being collected in order.
> Robert will fix FuzzyQuery in this issue, too. This patch is just a start.
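The per-segment idea from the issue description can be sketched as follows, under simplifying assumptions: each segment is modeled as a hypothetical `Map` from term text to that segment's document frequency, and a `HashMap` merges them so that a term matching in several segments yields a single entry (here its per-segment frequencies are summed). `PerSegmentRewriteSketch` and its methods are illustrative names only; Lucene's actual rewrite works on TermsEnum and BytesRef rather than Strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of per-segment term collection with duplicate removal. */
public class PerSegmentRewriteSketch {
    /**
     * Visits every segment's matching terms and merges them into one hashed map.
     * A term present in more than one segment is stored once; its per-segment
     * doc frequencies are summed (a stand-in for real term statistics).
     */
    public static Map<String, Integer> collectAcrossSegments(List<Map<String, Integer>> segments) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> segment : segments) {
            for (Map.Entry<String, Integer> e : segment.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    /** Returns the unique terms sorted once at the end, ready to become clauses. */
    public static List<String> clauseTerms(Map<String, Integer> merged) {
        List<String> terms = new ArrayList<>(merged.keySet());
        Collections.sort(terms);
        return terms;
    }
}
```

Because a hashed map gives no ordering, anything that relies on terms arriving in sorted order (such as the FuzzyQuery minimum-score handling mentioned above) cannot assume that order once collection is per segment.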