[ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570979#action_12570979
 ] 

Mark Harwood commented on LUCENE-794:
-------------------------------------

>>This may be largely irrelevant, but Solr has a ConstantScorePrefixQuery which 
>>has similar issues

No, very relevant. Only yesterday I had a user with exactly the same 
highlighting problem.

>>it seems we prob shouldn't even keep it as configurable. Just drop it then?

My nightmare scenario is systems where people are using ConstantScoreRangeQuery 
in their queries to do both latitude and longitude ranges over large areas - 
that's a lot of terms. I'd at least want the option of NOT loading them all 
into RAM at once when highlighting.

Maybe we could look at having different highlight "matchers". The existing 
approach of keeping a big bag of query terms becomes a "TermsMatcher" (it simply 
looks up tokens in a HashSet of terms). You can imagine a new "PrefixMatcher" 
which would examine tokens using "startsWith" and a "RangeMatcher" which would 
examine tokens using just a start and end term. However, there's a danger we 
could end up re-implementing a lot of query logic, so maybe the relevant 
queries/filters could implement a "Matcher" interface. That would let the same 
logic that is used when scanning TermEnum at query time be used by the 
Highlighter when looking at TokenStreams, i.e. something like this:
interface Matcher
{
   // true if the given token text would be matched by the query/filter
   boolean matches(String value);
}
Needs some more thought yet but it could be an approach.
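
To make the idea a little more concrete, here is a rough sketch of the three 
matchers mentioned above. None of these classes exist in Lucene or the contrib 
Highlighter; the class names, constructors and the inclusive/exclusive flags 
are assumptions for illustration only, and RangeMatcher assumes that plain 
String.compareTo ordering is an acceptable stand-in for term order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical interface from the comment above; not an existing Lucene API.
interface Matcher {
    boolean matches(String value);
}

// Mirrors the existing "big bag of query terms" approach: a set lookup per token.
final class TermsMatcher implements Matcher {
    private final Set<String> terms;

    TermsMatcher(String... terms) {
        this.terms = new HashSet<String>(Arrays.asList(terms));
    }

    @Override
    public boolean matches(String value) {
        return terms.contains(value);
    }
}

// Matches any token that starts with the given prefix; holds one String only.
final class PrefixMatcher implements Matcher {
    private final String prefix;

    PrefixMatcher(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public boolean matches(String value) {
        return value.startsWith(prefix);
    }
}

// Matches tokens between a start and end term without enumerating the terms.
// Assumption: String.compareTo ordering is good enough for this sketch.
final class RangeMatcher implements Matcher {
    private final String lower;
    private final String upper;
    private final boolean includeLower;
    private final boolean includeUpper;

    RangeMatcher(String lower, String upper,
                 boolean includeLower, boolean includeUpper) {
        this.lower = lower;
        this.upper = upper;
        this.includeLower = includeLower;
        this.includeUpper = includeUpper;
    }

    @Override
    public boolean matches(String value) {
        int lo = value.compareTo(lower);
        int hi = value.compareTo(upper);
        boolean aboveLower = includeLower ? lo >= 0 : lo > 0;
        boolean belowUpper = includeUpper ? hi <= 0 : hi < 0;
        return aboveLower && belowUpper;
    }
}
```

The point of the sketch is that PrefixMatcher and RangeMatcher keep only one or 
two Strings each, so a wide latitude/longitude range would not need its terms 
loaded into RAM just to highlight a document.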

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  
> ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: SpanHighlighter-01-26-2008.patch, 
> SpanHighlighter-01-28-2008.patch, spanhighlighter.patch, 
> spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, 
> spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, 
> spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, 
> spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, 
> spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
> package that scores just like QueryScorer, but assigns a score of 0 to Terms 
> that did not cause the Query hit. This gives 'actual' hit highlighting for the 
> range of SpanQuery types, PhraseQuery, and ConstantScoreRangeQuery. New Query 
> types are easy to add. There is also a new Fragmenter that attempts to 
> fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

