Federico Grillini created SOLR-12808:
----------------------------------------
Summary: Wrong highlighting using PatternReplaceCharFilterFactory
Key: SOLR-12808
URL: https://issues.apache.org/jira/browse/SOLR-12808
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: highlighter
Affects Versions: 7.5, 7.4, 7.2.1
Environment: Java: Oracle Corporation Java HotSpot(TM) 64-Bit Server
VM 1.8.0_162 25.162-b12
OS: Linux Debian 8.11
Reporter: Federico Grillini
Attachments: text_analysis.png
Hi,
the default highlighter seems to highlight the wrong text when the field's
analyzer uses PatternReplaceCharFilterFactory.
My query is: {{verb_esame_num_tnv:(00031665 0035 9)}}
The field type used by the field "verb_esame_num_tnv" is:
{code:xml}
<fieldType name="text_num_verbale" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="^0*([0-9]+\s+[0-9]+\s+[0-9]+)$" replacement=" $1"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="\s+" replacement=" "/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
{code}
I've attached a screenshot of the text analysis.
It seems that the highlighter uses the wrong offsets in the original text to
highlight the matched tokens.
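As a rough illustration of why offsets matter here, the sketch below applies
the same two patterns with plain java.util.regex (not Solr's or Lucene's own
classes) and compares where the first token sits before and after the
replacement. The class and method names are made up for this example, and it
only shows the shift in positions; it does not reproduce the highlighter
itself.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: emulates the two char filters with java.util.regex.
final class CharFilterOffsetSketch {

    // The two patterns from the field type, in the order they are configured.
    private static final Pattern STRIP_LEADING_ZEROS =
            Pattern.compile("^0*([0-9]+\\s+[0-9]+\\s+[0-9]+)$");
    private static final Pattern COLLAPSE_WHITESPACE = Pattern.compile("\\s+");

    // Applies both replacements in sequence, producing the text the tokenizer would receive.
    static String applyCharFilters(String input) {
        String step1 = STRIP_LEADING_ZEROS.matcher(input).replaceAll(" $1");
        return COLLAPSE_WHITESPACE.matcher(step1).replaceAll(" ");
    }

    public static void main(String[] args) {
        String original = "00031665 0035 9";
        String filtered = applyCharFilters(original);

        System.out.println("original: [" + original + "] length " + original.length());
        System.out.println("filtered: [" + filtered + "] length " + filtered.length());

        // Position of the first token in each version of the text.
        String token = "31665";
        System.out.println("offset in filtered text: " + filtered.indexOf(token));
        System.out.println("offset in original text: " + original.indexOf(token));
    }
}
```

For this input the filtered text is two characters shorter than the original,
and the token 31665 starts at offset 1 in the filtered text but at offset 3 in
the original. Any offset taken from the filtered text therefore has to be
mapped back to the original before it is used for highlighting.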
Hope this helps.
Regards.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)