[
https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benjamin Keil updated LUCENE-2013:
----------------------------------
Attachment: lucene-2013-2009-10-28.patch
Patch for LUCENE-2013
> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
> Key: LUCENE-2013
> URL: https://issues.apache.org/jira/browse/LUCENE-2013
> Project: Lucene - Java
> Issue Type: Bug
> Components: contrib/highlighter
> Affects Versions: 2.9
> Environment: Lucene-Java 2.9
> Reporter: Benjamin Keil
> Attachments: lucene-2013-2009-10-28.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their
> queries before submitting them to QueryScorer:
> bq.{{------------------------------------------------------------------------
> r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> LUCENE-1685: The position aware SpanScorer has become the default scorer for
> Highlighting. The SpanScorer implementation has replaced QueryScorer and the
> old term highlighting QueryScorer has been renamed to QueryTermScorer.
> Multi-term queries are also now expanded by default. If you were previously
> rewritting the query for multi-term query highlighting, you should no longer
> do that (unless you switch to using QueryTermScorer). The SpanScorer API (now
> QueryScorer) has also been improved to more closely match the API of the
> previous QueryScorer implementation.
> ------------------------------------------------------------------------}}
> This is a great convenience for the most part, but it causes problems with
> {{SpanRegexQuery}}. The {{WeightedSpanTermExtractor}} uses
> {{Query.extractTerms()}} to collect the fields used in the query, and
> {{SpanRegexQuery}} does not implement that method, so highlighting any query
> that contains a {{SpanRegexQuery}} throws an
> {{UnsupportedOperationException}}. Even if that is worked around,
> {{SpanRegexQuery}} still throws an exception when its {{getSpans()}} method
> is called.
> I can provide the patch that I am currently using, but I'm not sure that my
> solution is optimal. It adds two methods to {{SpanQuery}}. The first,
> {{extractFields(Set<String> fields)}}, is {{fields.add(getField())}} for
> everything except {{FieldMaskingSpanQuery}}. The second,
> {{mustBeRewrittenToGetSpans()}}, returns {{true}} by default in
> {{SpanQuery}} and {{false}} in {{SpanTermQuery}}, and is overridden in each
> composite {{SpanQuery}} to return a value that depends on its components.
> This way {{SpanRegexQuery}} and any other custom {{SpanQuery}}
> implementations do not need to be modified.
> Currently, fields and non-weighted terms are collected in a single step. In
> the proposed patch, {{WeightedSpanTerm}} extraction from a {{SpanQuery}}
> proceeds in two steps. First, if the {{QueryScorer}}'s field is {{null}},
> the fields are collected from the {{SpanQuery}} using the
> {{extractFields()}} method. Second, the terms are collected using
> {{extractTerms()}}, with the query rewritten for each field if
> {{mustBeRewrittenToGetSpans()}} returns {{true}}.
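
The two-step extraction described in the issue can be sketched roughly as follows. This is a minimal, self-contained model and not the attached patch: every class here ({{SketchSpanQuery}}, {{SketchSpanTermQuery}}, {{SketchSpanRegexQuery}}, {{SketchSpanOrQuery}}, {{ExtractionSketch}}) is a hypothetical stand-in rather than a Lucene class, the "index" is assumed to be a plain map from field name to indexed terms, and {{rewrite()}} takes that term list instead of an {{IndexReader}}. Only the method names {{extractFields()}} and {{mustBeRewrittenToGetSpans()}} come from the description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified stand-ins for illustration only; none of these are Lucene classes.
abstract class SketchSpanQuery {
    abstract String getField();

    // Proposed default: a span query contributes its single field.
    void extractFields(Set<String> fields) { fields.add(getField()); }

    // Proposed default: unknown query types are assumed to need rewriting.
    boolean mustBeRewrittenToGetSpans() { return true; }

    // Stand-in for Query.rewrite(IndexReader): receives the terms indexed for one field.
    SketchSpanQuery rewrite(List<String> fieldTerms) { return this; }

    // Stand-in for Query.extractTerms(Set): unsupported unless overridden.
    void extractTerms(Set<String> terms) {
        throw new UnsupportedOperationException(getClass().getSimpleName());
    }
}

final class SketchSpanTermQuery extends SketchSpanQuery {
    private final String field;
    private final String term;
    SketchSpanTermQuery(String field, String term) { this.field = field; this.term = term; }
    @Override String getField() { return field; }
    @Override boolean mustBeRewrittenToGetSpans() { return false; }
    @Override void extractTerms(Set<String> terms) { terms.add(field + ":" + term); }
}

// Keeps the defaults: must be rewritten, and extractTerms() throws until it is.
final class SketchSpanRegexQuery extends SketchSpanQuery {
    private final String field;
    private final String regex;
    SketchSpanRegexQuery(String field, String regex) { this.field = field; this.regex = regex; }
    @Override String getField() { return field; }
    @Override SketchSpanQuery rewrite(List<String> fieldTerms) {
        List<SketchSpanQuery> clauses = new ArrayList<>();
        for (String t : fieldTerms) {
            if (t.matches(regex)) clauses.add(new SketchSpanTermQuery(field, t));
        }
        return new SketchSpanOrQuery(field, clauses);
    }
}

// Composite query: its answer depends on its components.
final class SketchSpanOrQuery extends SketchSpanQuery {
    private final String field;
    private final List<SketchSpanQuery> clauses;
    SketchSpanOrQuery(String field, List<SketchSpanQuery> clauses) {
        this.field = field;
        this.clauses = clauses;
    }
    @Override String getField() { return field; }
    @Override boolean mustBeRewrittenToGetSpans() {
        for (SketchSpanQuery c : clauses) if (c.mustBeRewrittenToGetSpans()) return true;
        return false;
    }
    @Override SketchSpanQuery rewrite(List<String> fieldTerms) {
        List<SketchSpanQuery> rewritten = new ArrayList<>();
        for (SketchSpanQuery c : clauses) rewritten.add(c.rewrite(fieldTerms));
        return new SketchSpanOrQuery(field, rewritten);
    }
    @Override void extractTerms(Set<String> terms) {
        for (SketchSpanQuery c : clauses) c.extractTerms(terms);
    }
}

final class ExtractionSketch {
    // scorerField plays the role of the QueryScorer's field and may be null.
    static Set<String> extractTerms(SketchSpanQuery query, String scorerField,
                                    Map<String, List<String>> index) {
        // Step 1: collect fields from the query only when the scorer has none.
        Set<String> fields = new TreeSet<>();
        if (scorerField == null) query.extractFields(fields); else fields.add(scorerField);

        // Step 2: per field, rewrite if required, then collect the terms.
        Set<String> terms = new TreeSet<>();
        for (String field : fields) {
            List<String> fieldTerms = index.getOrDefault(field, Collections.<String>emptyList());
            SketchSpanQuery q = query.mustBeRewrittenToGetSpans() ? query.rewrite(fieldTerms) : query;
            q.extractTerms(terms);
        }
        return terms;
    }
}
```

In this model a regex query never has to implement {{extractTerms()}} itself: the extractor sees that it must be rewritten, expands it against the terms of each field, and only then collects terms from the rewritten query.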
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.