[jira] Updated: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

Benjamin Keil (JIRA) Wed, 28 Oct 2009 14:53:28 -0700

     [ 
https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Benjamin Keil updated LUCENE-2013:
----------------------------------

    Description: 
Since the resolution of #LUCENE-1685, users are not supposed to rewrite their 
queries before submitting them to QueryScorer:

bq.------------------------------------------------------------------------
bq.r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
bq.
bq.LUCENE-1685: The position aware SpanScorer has become the default scorer for 
Highlighting. The SpanScorer implementation has replaced QueryScorer and the 
old term highlighting QueryScorer has been renamed to QueryTermScorer. 
Multi-term queries are also now expanded by default. If you were previously 
rewriting the query for multi-term query highlighting, you should no longer do 
that (unless you switch to using QueryTermScorer). The SpanScorer API (now 
QueryScorer) has also been improved to more closely match the API of the 
previous QueryScorer implementation.
bq.------------------------------------------------------------------------

This is a great convenience for the most part, but it's causing me difficulties 
with SpanRegexQuerys, as the WeightedSpanTermExtractor uses 
Query.extractTerms() to collect the fields used in the query, but 
SpanRegexQuery does not implement this method, so highlighting any query with a 
SpanRegexQuery throws an UnsupportedOperationException.  If this issue is 
circumvented, there is still the issue of SpanRegexQuery throwing an exception 
when someone calls its getSpans() method.

I can provide the patch that I am currently using, but I'm not sure that my 
solution is optimal.  It adds two methods to SpanQuery: 
extractFields(Set<String> fields) which is equivalent to fields.add(getField()) 
except when MaskedFieldQuerys get involved, and mustBeRewrittenToGetSpans() 
which returns true by default in SpanQuery, false in SpanTermQuery, and is 
overridden in each composite SpanQuery to return a value depending on its 
components.  In this way SpanRegexQuery (and any other custom SpanQuery) does 
not need to be adjusted.
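
To make the shape of the two methods concrete, here is a minimal sketch. It is not the attached patch and uses none of Lucene's real classes: every Sketch* type is a hypothetical stand-in, the masked-field case is left out, and the rule used for composites (rewrite if any clause needs it) is an assumption about what "depending on its components" means.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration only; not Lucene's classes.
abstract class SketchSpanQuery {
    abstract String getField();

    // Default: a span query contributes exactly its own field.
    void extractFields(Set<String> fields) {
        fields.add(getField());
    }

    // Conservative default: unknown query types are assumed to need a rewrite.
    boolean mustBeRewrittenToGetSpans() {
        return true;
    }
}

// Plays the role of SpanTermQuery: spans are available without rewriting.
final class SketchSpanTermQuery extends SketchSpanQuery {
    private final String field;
    SketchSpanTermQuery(String field) { this.field = field; }
    @Override String getField() { return field; }
    @Override boolean mustBeRewrittenToGetSpans() { return false; }
}

// Plays the role of SpanRegexQuery: inherits both defaults unchanged.
final class SketchSpanRegexQuery extends SketchSpanQuery {
    private final String field;
    SketchSpanRegexQuery(String field) { this.field = field; }
    @Override String getField() { return field; }
}

// Plays the role of a composite such as SpanOrQuery.
final class SketchSpanOrQuery extends SketchSpanQuery {
    private final List<SketchSpanQuery> clauses;
    SketchSpanOrQuery(SketchSpanQuery... clauses) { this.clauses = Arrays.asList(clauses); }
    @Override String getField() { return clauses.get(0).getField(); }
    @Override void extractFields(Set<String> fields) {
        for (SketchSpanQuery clause : clauses) clause.extractFields(fields);
    }
    // Assumption: a composite needs rewriting if any clause does.
    @Override boolean mustBeRewrittenToGetSpans() {
        for (SketchSpanQuery clause : clauses) {
            if (clause.mustBeRewrittenToGetSpans()) return true;
        }
        return false;
    }
}
```

The point of the conservative default is that a custom query type which overrides nothing is still handled: it reports its own field and asks to be rewritten.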

Currently the collection of fields and non-weighted terms is done in a single 
step.  In the proposed patch the WeightedSpanTerm extraction from a SpanQuery 
proceeds in two steps.  First, if the QueryScorer's field is null, then the 
fields are collected from the SpanQuery using the extractFields() method.  
Second, the terms are collected using extractTerms(), rewriting the query for 
each field if mustBeRewrittenToGetSpans() returns true.
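
The two-step flow can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the Demo* types are hypothetical and self-contained, a Map from field name to indexed terms stands in for an IndexReader, and extracted terms are represented as plain "field:text" strings rather than WeightedSpanTerms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical minimal model; none of these types are Lucene classes.
interface DemoSpanQuery {
    void extractFields(Set<String> fields);
    boolean mustBeRewrittenToGetSpans();
    // Stands in for Query.rewrite(IndexReader); the map plays the reader.
    DemoSpanQuery rewrite(Map<String, List<String>> index);
    void extractTerms(Set<String> terms);
}

final class DemoTermQuery implements DemoSpanQuery {
    private final String field;
    private final String text;
    DemoTermQuery(String field, String text) { this.field = field; this.text = text; }
    public void extractFields(Set<String> fields) { fields.add(field); }
    public boolean mustBeRewrittenToGetSpans() { return false; }
    public DemoSpanQuery rewrite(Map<String, List<String>> index) { return this; }
    public void extractTerms(Set<String> terms) { terms.add(field + ":" + text); }
}

// Composite query; also the expanded form of a rewritten regex query.
final class DemoOrQuery implements DemoSpanQuery {
    private final List<DemoSpanQuery> clauses;
    DemoOrQuery(List<DemoSpanQuery> clauses) { this.clauses = clauses; }
    public void extractFields(Set<String> fields) {
        for (DemoSpanQuery c : clauses) c.extractFields(fields);
    }
    public boolean mustBeRewrittenToGetSpans() {
        for (DemoSpanQuery c : clauses) if (c.mustBeRewrittenToGetSpans()) return true;
        return false;
    }
    public DemoSpanQuery rewrite(Map<String, List<String>> index) {
        List<DemoSpanQuery> rewritten = new ArrayList<>();
        for (DemoSpanQuery c : clauses) rewritten.add(c.rewrite(index));
        return new DemoOrQuery(rewritten);
    }
    public void extractTerms(Set<String> terms) {
        for (DemoSpanQuery c : clauses) c.extractTerms(terms);
    }
}

// Plays the role of SpanRegexQuery: no term extraction until rewritten.
final class DemoRegexQuery implements DemoSpanQuery {
    private final String field;
    private final String regex;
    DemoRegexQuery(String field, String regex) { this.field = field; this.regex = regex; }
    public void extractFields(Set<String> fields) { fields.add(field); }
    public boolean mustBeRewrittenToGetSpans() { return true; }
    public DemoSpanQuery rewrite(Map<String, List<String>> index) {
        List<DemoSpanQuery> matches = new ArrayList<>();
        List<String> fieldTerms = index.getOrDefault(field, Collections.emptyList());
        for (String t : fieldTerms) {
            if (t.matches(regex)) matches.add(new DemoTermQuery(field, t));
        }
        return new DemoOrQuery(matches);
    }
    public void extractTerms(Set<String> terms) { throw new UnsupportedOperationException(); }
}

final class DemoExtractor {
    static Set<String> extract(DemoSpanQuery query, String scorerField,
                               Map<String, List<String>> index) {
        // Step 1: collect fields from the query only if the scorer has none.
        Set<String> fields = new LinkedHashSet<>();
        if (scorerField == null) query.extractFields(fields);
        else fields.add(scorerField);

        // Step 2: per field, rewrite if required, then collect terms.
        Set<String> terms = new LinkedHashSet<>();
        for (String field : fields) {
            DemoSpanQuery q = query;
            if (query.mustBeRewrittenToGetSpans()) {
                List<String> fieldTerms = index.getOrDefault(field, Collections.emptyList());
                q = query.rewrite(Collections.singletonMap(field, fieldTerms));
            }
            q.extractTerms(terms);
        }
        return terms;
    }
}
```

In this sketch, calling extractTerms() directly on a query that contains a DemoRegexQuery throws UnsupportedOperationException, which mirrors the failure described above; going through DemoExtractor.extract() avoids it because the query is rewritten first.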

  was:
Since the resolution of #LUCENE-1685, users are not supposed to rewrite their 
queries before submitting them to QueryScorer:

bq.{{------------------------------------------------------------------------
r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line

LUCENE-1685: The position aware SpanScorer has become the default scorer for 
Highlighting. The SpanScorer implementation has replaced QueryScorer and the 
old term highlighting QueryScorer has been renamed to QueryTermScorer. 
Multi-term queries are also now expanded by default. If you were previously 
rewriting the query for multi-term query highlighting, you should no longer do 
that (unless you switch to using QueryTermScorer). The SpanScorer API (now 
QueryScorer) has also been improved to more closely match the API of the 
previous QueryScorer implementation.
------------------------------------------------------------------------}}

This is a great convenience for the most part, but it's causing me difficulties 
with {{SpanRegexQuery}}s, as the {{WeightedSpanTermExtractor}} uses 
{{Query.extractTerms()}} to collect the fields used in the query, but 
{{SpanRegexQuery}} does not implement this method, so highlighting any query 
with a {{SpanRegexQuery}} throws an UnsupportedOperationException.  If this 
issue is circumvented, there is still the issue of {{SpanRegexQuery}} throwing 
an exception when someone calls its {{getSpans()}} method.

I can provide the patch that I am currently using, but I'm not sure that my 
solution is optimal.  It adds two methods to {{SpanQuery}}: 
{{extractFields(Set<String> fields)}} which is {{fields.add(getField())}} for 
everything except {{MaskedFieldQuery}}, and {{mustBeRewrittenToGetSpans()}} 
which returns {{true}} for {{SpanQuery}}, {{false}} for {{SpanTermQuery}}, and 
is overridden in each composite {{SpanQuery}} to return a value depending on 
its components.  In this way {{SpanRegexQuery}} (and any other custom 
{{SpanQuery}}s) do not need to be adjusted.

Currently the collection of fields and non-weighted terms is done in a single 
step.  In the proposed patch the {{WeightedSpanTerm}} extraction from a 
{{SpanQuery}} proceeds in two steps.  First, if the {{QueryScorer}}'s field is 
{{null}}, then the fields are collected from the {{SpanQuery}} using the 
{{extractFields()}} method.  Second, the terms are collected using 
{{extractTerms()}}, rewriting the query for each field if 
{{mustBeRewrittenToGetSpans()}} returns {{true}}.


Removed failed attempts at {{monospace font}} in description.

> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
>                 Key: LUCENE-2013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>         Attachments: lucene-2013-2009-10-28-2135.patch, 
> lucene-2013-2009-10-28.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their 
> queries before submitting them to QueryScorer:
> bq.------------------------------------------------------------------------
> bq.r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 
> line
> bq.
> bq.LUCENE-1685: The position aware SpanScorer has become the default scorer 
> for Highlighting. The SpanScorer implementation has replaced QueryScorer and 
> the old term highlighting QueryScorer has been renamed to QueryTermScorer. 
> Multi-term queries are also now expanded by default. If you were previously 
> rewriting the query for multi-term query highlighting, you should no longer 
> do that (unless you switch to using QueryTermScorer). The SpanScorer API (now 
> QueryScorer) has also been improved to more closely match the API of the 
> previous QueryScorer implementation.
> bq.------------------------------------------------------------------------
> This is a great convenience for the most part, but it's causing me 
> difficulties with SpanRegexQuerys, as the WeightedSpanTermExtractor uses 
> Query.extractTerms() to collect the fields used in the query, but 
> SpanRegexQuery does not implement this method, so highlighting any query with 
> a SpanRegexQuery throws an UnsupportedOperationException.  If this issue is 
> circumvented, there is still the issue of SpanRegexQuery throwing an 
> exception when someone calls its getSpans() method.
> I can provide the patch that I am currently using, but I'm not sure that my 
> solution is optimal.  It adds two methods to SpanQuery: 
> extractFields(Set<String> fields) which is equivalent to 
> fields.add(getField()) except when MaskedFieldQuerys get involved, and 
> mustBeRewrittenToGetSpans() which returns true for SpanQuery, false for 
> SpanTermQuery, and is overridden in each composite SpanQuery to return a 
> value depending on its components.  In this way SpanRegexQuery (and any other 
> custom SpanQuerys) do not need to be adjusted.
> Currently the collection of fields and non-weighted terms is done in a 
> single step.  In the proposed patch the WeightedSpanTerm extraction from a 
> SpanQuery proceeds in two steps.  First, if the QueryScorer's field is null, 
> then the fields are collected from the SpanQuery using the extractFields() 
> method.  Second, the terms are collected using extractTerms(), rewriting the 
> query for each field if mustBeRewrittenToGetSpans() returns true.
