Good morning,

We currently use Lucene 4.3 in our project. We generate PrefixQueries automatically and pass the rewritten query to the Highlighter to highlight search terms in the search results. Until a few days ago, we used MultiTermQuery.CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE, because the Highlighter does not work with the ConstantScoreQueries generated by MultiTermQuery.ConstantScoreAutoRewrite. We also set "maxClauseCount" to a very large number to avoid the BooleanQuery.TooManyClauses exception. This worked well for years.

Recently, searches such as "a b c" or "s t am p s" caused OutOfMemoryErrors, so we now use ConstantScoreAutoRewrite and accept that some terms are not highlighted in the search results.
However, I read in the Lucene 5.0 changelog that MultiTermQuery.ConstantScoreAutoRewrite was removed in favour of MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE.

My problems:

1) PrefixQueries rewritten with MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE don't work with the default Highlighter at all.
2) Passing the original query directly to the Highlighter worked in my test cases, but those did not use a very large dataset. I have noticed that the WeightedSpanTermExtractor, which is used by the Highlighter, uses MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE, so I fear that if we do that, we will get OutOfMemoryErrors again when somebody searches for "a b c".
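
To illustrate the concern in point 2, here is a minimal sketch in plain Java without any Lucene dependency. It assumes that a boolean rewrite produces one clause per indexed term matching a prefix; the class and method names (PrefixExpansionSketch, countExpansions, totalClauses) and the tiny term list are hypothetical and only stand in for a real term dictionary.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Standalone illustration; none of these names are Lucene API. */
final class PrefixExpansionSketch {

    /** Counts the dictionary terms that start with the given prefix. */
    static int countExpansions(NavigableSet<String> terms, String prefix) {
        int count = 0;
        // In a sorted set, all terms sharing a prefix form one contiguous run
        // that begins at the first term >= prefix.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // left the run, no later term can match
            }
            count++;
        }
        return count;
    }

    /** Sums the expansions of all prefixes, i.e. the assumed clause count. */
    static int totalClauses(NavigableSet<String> terms, List<String> prefixes) {
        int total = 0;
        for (String prefix : prefixes) {
            total += countExpansions(terms, prefix);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical term dictionary; a real index holds far more terms.
        NavigableSet<String> terms = new TreeSet<>(Arrays.asList(
                "alpha", "apple", "beta", "bravo", "cat", "stamp", "stamps", "sun"));
        System.out.println(totalClauses(terms, Arrays.asList("a", "b", "c")));
    }
}
```

With single-letter prefixes, the count grows with the number of distinct terms in the index, which is why I expect the memory problem to come back on our production data.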

What method do you suggest for highlighting prefix terms? I should also mention that we are using a custom formatter and a custom text fragmenter. I have not found any tutorials for the FastVectorHighlighter. The PostingsHighlighter might work, but I'm not sure how to implement custom fragment sizes.
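
To make clear what kind of result I am after, here is a minimal sketch in plain Java that marks every word starting with one of the prefixes, without expanding the prefixes into terms first. It is not Lucene code and it ignores analysis and fragmenting; the class name PrefixHighlightSketch, the method highlight, and the <b> markup are assumptions chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Standalone illustration; none of these names are Lucene API. */
final class PrefixHighlightSketch {

    // A "word" here is simply a run of letters and digits.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    /** Wraps every word that starts with one of the prefixes in <b>...</b>. */
    static String highlight(String text, List<String> prefixes) {
        StringBuilder out = new StringBuilder();
        Matcher matcher = WORD.matcher(text);
        int last = 0;
        while (matcher.find()) {
            // Copy the text between the previous word and this one unchanged.
            out.append(text, last, matcher.start());
            String word = matcher.group();
            out.append(matchesAny(word, prefixes) ? "<b>" + word + "</b>" : word);
            last = matcher.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    /** Case-insensitive prefix check; empty prefixes are ignored. */
    private static boolean matchesAny(String word, List<String> prefixes) {
        String lower = word.toLowerCase(Locale.ROOT);
        for (String prefix : prefixes) {
            if (!prefix.isEmpty() && lower.startsWith(prefix.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(highlight("Stamps and a sunny bay", Arrays.asList("s", "b")));
    }
}
```

The work per document depends only on the length of the text and the number of prefixes, not on the number of terms in the index.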

Thanks in advance,

Nils Knappmeier

--

Nils Knappmeier | Software Engineer
intelligent views gmbh
Julius-Reiber-Str. 17 |64293 Darmstadt

Tel ++49(0)6151 - 5006-228 | Fax ++49(0)6151 - 5006-138
e-mail: n.knappme...@i-views.de | www.i-views.de


Managing directors: Jörg Kleinz, Klaus Reichenberger
The company is registered at the Amtsgericht Darmstadt (district court at the
company's registered office), no. HRB 7965

This e-mail may contain confidential and/or privileged information. If you are 
not the intended recipient (or have received this e-mail in error) please 
notify the sender immediately and delete this e-mail. Any unauthorised copying, 
disclosure or distribution of the contents in this e-mail is strictly forbidden.

