[jira] Updated: (LUCENE-1285) WeightedSpanTermExtractor incorrectly treats the same terms occurring in different query types

Andrzej Bialecki (JIRA) Thu, 15 May 2008 06:14:23 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Andrzej Bialecki  updated LUCENE-1285:
--------------------------------------

    Description: 
Given a BooleanQuery with multiple clauses, if a term occurs both in a Span / 
Phrase query and in a TermQuery, the results of term extraction are 
unpredictable and depend on the order of clauses. Consequently, the resulting 
highlighting is incorrect.

Example text: t1 t2 t3 t4 t2
Example query: t2 t3 "t1 t2"
Current highlighting: [t1 t2] [t3] t4 t2
Correct highlighting: [t1 t2] [t3] t4 [t2]

The problem comes from the fact that we keep a Map<termText, WeightedSpanTerm>, 
and if the same term occurs in a Phrase or Span query, the resulting 
WeightedSpanTerm will have positionSensitive=true, whereas terms added from a 
TermQuery have positionSensitive=false. Since a later put() for the same term 
text replaces the earlier entry, the end result for this particular term 
depends on the order in which the clauses are processed.
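To make the order dependence concrete, here is a minimal Java sketch. It does not use the highlighter's classes: `ExtractedTerm` and `NaiveExtraction` are hypothetical stand-ins that keep only the term text and the positionSensitive flag, and the clause processing order is modelled as a plain list. With an ordinary HashMap, whichever clause is processed last decides the flag for t2.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for WeightedSpanTerm: only the fields relevant here.
final class ExtractedTerm {
    final String text;
    final boolean positionSensitive;

    ExtractedTerm(String text, boolean positionSensitive) {
        this.text = text;
        this.positionSensitive = positionSensitive;
    }
}

final class NaiveExtraction {
    // Mimics keeping a Map<termText, term>: a later put() for the same
    // text silently replaces the earlier entry, so the last clause wins.
    static Map<String, ExtractedTerm> extract(List<ExtractedTerm> termsInClauseOrder) {
        Map<String, ExtractedTerm> terms = new HashMap<>();
        for (ExtractedTerm t : termsInClauseOrder) {
            terms.put(t.text, t);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Query t2 t3 "t1 t2": TermQuery clauses first, phrase last.
        List<ExtractedTerm> termsFirst = Arrays.asList(
                new ExtractedTerm("t2", false), new ExtractedTerm("t3", false),
                new ExtractedTerm("t1", true), new ExtractedTerm("t2", true));
        // The same clauses with the phrase processed first.
        List<ExtractedTerm> phraseFirst = Arrays.asList(
                new ExtractedTerm("t1", true), new ExtractedTerm("t2", true),
                new ExtractedTerm("t2", false), new ExtractedTerm("t3", false));

        System.out.println(extract(termsFirst).get("t2").positionSensitive);  // true
        System.out.println(extract(phraseFirst).get("t2").positionSensitive); // false
    }
}
```

In this model, processing the phrase last leaves t2 position-sensitive, so only the t2 inside "t1 t2" would qualify for highlighting and the trailing t2 would be skipped, matching the "Current highlighting" line above.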

My fix is to use a subclass of Map which, on put(), always sets the result to 
the most lax setting, i.e. if we already have a term with 
positionSensitive=true and we try to put() the same term with 
positionSensitive=false (or vice versa), we set the result to 
positionSensitive=false, as it will match both cases.
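A minimal Java sketch of the Map subclass described above, under stated assumptions: `SpanTermEntry` and `LaxPositionMap` are hypothetical names, not the highlighter's actual classes, and the entry keeps only the term text and a mutable positionSensitive flag. Whichever order the two clauses arrive in, the stored flag ends up false if either occurrence was position-insensitive.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for WeightedSpanTerm (not the Lucene class).
final class SpanTermEntry {
    final String text;
    boolean positionSensitive; // mutable so a merge can relax it

    SpanTermEntry(String text, boolean positionSensitive) {
        this.text = text;
        this.positionSensitive = positionSensitive;
    }
}

// A HashMap whose put() always keeps the most lax positionSensitive setting
// seen for a key, so the result no longer depends on insertion order.
class LaxPositionMap extends HashMap<String, SpanTermEntry> {
    @Override
    public SpanTermEntry put(String key, SpanTermEntry value) {
        SpanTermEntry previous = super.put(key, value);
        if (previous != null && !previous.positionSensitive) {
            // An earlier clause (e.g. a TermQuery) matched the term anywhere;
            // keep that, since position-insensitive matching covers both cases.
            value.positionSensitive = false;
        }
        // If the new value is already position-insensitive, it simply replaces
        // the old entry, which is also the lax result.
        return previous;
    }

    // HashMap.putAll does not route through put(), so forward each entry
    // explicitly to apply the same merge rule.
    @Override
    public void putAll(Map<? extends String, ? extends SpanTermEntry> m) {
        for (Map.Entry<? extends String, ? extends SpanTermEntry> e : m.entrySet()) {
            put(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        // TermQuery t2 first, then the phrase's t2: stays lax.
        LaxPositionMap a = new LaxPositionMap();
        a.put("t2", new SpanTermEntry("t2", false));
        a.put("t2", new SpanTermEntry("t2", true));
        // Phrase first, then TermQuery t2: also lax.
        LaxPositionMap b = new LaxPositionMap();
        b.put("t2", new SpanTermEntry("t2", true));
        b.put("t2", new SpanTermEntry("t2", false));

        System.out.println(a.get("t2").positionSensitive); // false
        System.out.println(b.get("t2").positionSensitive); // false
    }
}
```

This sketch mutates the incoming entry instead of copying it, and it only guards put() and putAll(); other HashMap mutators such as merge() or compute() would bypass the rule.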
        Summary: WeightedSpanTermExtractor incorrectly treats the same terms 
occurring in different query types  (was: WeightedSpanTermExtractor doesn')

> WeightedSpanTermExtractor incorrectly treats the same terms occurring in 
> different query types
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1285
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1285
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>            Reporter: Andrzej Bialecki 
>             Fix For: 2.4

