[
https://issues.apache.org/jira/browse/SOLR-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259307#comment-13259307
]
Michael McCandless commented on SOLR-3390:
------------------------------------------
This is a hard problem to solve (indexing a graph).
We've made some recent baby steps towards solving it, though: token streams can
now include the PositionLengthAttribute, indicating how many positions an
"alternate path" spans. SynonymFilter now sets this attribute, but only in
certain cases (when the inserted synonym is a single token). However, we then
drop this attribute during indexing...
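To make the posLen idea concrete, here is a minimal Python sketch. It is not Lucene code, and `Token`, `to_arcs`, `drop_pos_len` and `adjacent` are hypothetical helpers. It models a token stream as position increments plus position lengths and turns it into graph arcs. It assumes a single-token synonym "dns" was inserted over the hypothetical text "domain name system lookup" with posLen=3, and shows how discarding posLen at indexing time breaks adjacency with the following word.

```python
from typing import NamedTuple

class Token(NamedTuple):
    term: str
    pos_inc: int  # like PositionIncrementAttribute: positions advanced from the previous token
    pos_len: int  # like PositionLengthAttribute: positions this token spans

def to_arcs(tokens):
    """Turn a token stream into (term, from_node, to_node) arcs of a token graph."""
    arcs, pos = [], -1
    for tok in tokens:
        pos += tok.pos_inc
        arcs.append((tok.term, pos, pos + tok.pos_len))
    return arcs

def drop_pos_len(tokens):
    """Mimic indexing: positions survive, position length is discarded (treated as 1)."""
    return [tok._replace(pos_len=1) for tok in tokens]

def adjacent(arcs, first, second):
    """True if some arc for `second` starts where an arc for `first` ends."""
    ends = {to for term, _, to in arcs if term == first}
    return any(term == second and frm in ends for term, frm, _ in arcs)

# Hypothetical stream for "domain name system lookup" after a single-token
# synonym "dns" was inserted over the three matched words (posLen=3).
stream = [
    Token("domain", 1, 1), Token("dns", 0, 3),
    Token("name", 1, 1), Token("system", 1, 1),
    Token("lookup", 1, 1),
]

print(adjacent(to_arcs(stream), "dns", "lookup"))                # True
print(adjacent(to_arcs(drop_pos_len(stream)), "dns", "lookup"))  # False
```

With posLen kept, "lookup" starts exactly where "dns" ends; once posLen is flattened to 1, "dns" appears to end right after "domain" and the two words no longer line up.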
Handling the case where the inserted synonym is multi-word is tricky... I think
dns would have to be changed to have posLen=3, so that it spans all three
positions of "domain name system".
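The same toy model can suggest what posLen=3 on "dns" would buy for the reported document. In this sketch (hypothetical helpers again; tokens are simplified, with no stop-word removal, lowercasing or stemming), `flat` mirrors the overlaps the report sees in analysis.jsp, while `graph` is one assumed shape of the stream if "dns" spanned its three-word expansion, which pushes "entry" to after "system".

```python
from typing import NamedTuple

class Token(NamedTuple):
    term: str
    pos_inc: int  # positions advanced from the previous token
    pos_len: int  # positions this token spans (PositionLengthAttribute)

def to_arcs(tokens):
    """Convert a token stream into (term, from_node, to_node) arcs."""
    arcs, pos = [], -1
    for tok in tokens:
        pos += tok.pos_inc
        arcs.append((tok.term, pos, pos + tok.pos_len))
    return arcs

def phrase_matches(arcs, words):
    """True if the words can be read along consecutive arcs of the graph."""
    ends = {to for term, _, to in arcs if term == words[0]}
    for word in words[1:]:
        ends = {to for term, frm, to in arcs if term == word and frm in ends}
    return bool(ends)

# Flattened stream, as the report's analysis.jsp output suggests:
# "name" is stacked on "entry" and "system" on "explaining".
flat = [
    Token("a", 1, 1), Token("sample", 1, 1),
    Token("dns", 1, 1), Token("domain", 0, 1),
    Token("entry", 1, 1), Token("name", 0, 1),
    Token("explaining", 1, 1), Token("system", 0, 1),
    Token("the", 1, 1), Token("details", 1, 1),
]

# Hypothetical graph stream: "dns" spans the three positions of its expansion.
graph = [
    Token("a", 1, 1), Token("sample", 1, 1),
    Token("dns", 1, 3), Token("domain", 0, 1),
    Token("name", 1, 1), Token("system", 1, 1),
    Token("entry", 1, 1), Token("explaining", 1, 1),
    Token("the", 1, 1), Token("details", 1, 1),
]

print(phrase_matches(to_arcs(flat), ["domain", "name", "system", "entry"]))   # False
print(phrase_matches(to_arcs(graph), ["domain", "name", "system", "entry"]))  # True
print(phrase_matches(to_arcs(flat), ["domain", "name", "explaining"]))        # True (spurious)
print(phrase_matches(to_arcs(graph), ["domain", "name", "explaining"]))       # False
```

Since posLen is currently dropped at indexing, this only illustrates what a graph-aware matcher could see, not what the index stores today.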
> Highlighting issue with multi-word synonyms causes to highlight the wrong
> terms
> -------------------------------------------------------------------------------
>
> Key: SOLR-3390
> URL: https://issues.apache.org/jira/browse/SOLR-3390
> Project: Solr
> Issue Type: Bug
> Components: highlighter, query parsers
> Affects Versions: 3.6
> Environment: Windows 7. (Development machine, not the server)
> Reporter: Rahul Babulal
> Labels: highlighter, multi-word, solr, synonyms
>
> I am using Solr 3.6, and when I have multi-word synonyms, the highlighting
> results have the wrong word highlighted.
> Given the following entry in the synonyms file:
> dns, domain name system
> and a document indexed with the text "A sample dns entry explaining the details":
> Searching for "name" (without quotes), the highlighted snippet is:
> "A sample dns <em>entry</em> explaining the details". (In analysis.jsp, the
> token "entry" overlaps with the token "name".)
> Searching for "system" (without quotes), the highlighted snippet is:
> "A sample dns entry <em>explaining</em> the details". (In analysis.jsp, the
> token "explaining" overlaps with the token "system".)
> Here is my schema field type:
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="false"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
> </fieldType>
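The highlighting symptom itself can be reproduced with a small offset-based simulation. This Python sketch is not Solr's highlighter: it uses a whitespace tokenizer in place of StandardTokenizer, and it assumes (as the reported snippets suggest) that each expansion word stacked onto a later position reuses the character offsets of the original token at that position. `tokenize`, `flat_synonyms` and `highlight` are hypothetical names.

```python
import re
from typing import NamedTuple

class Token(NamedTuple):
    term: str
    position: int
    start: int  # start character offset into the original text
    end: int    # end character offset (exclusive)

def tokenize(text):
    """Whitespace tokenizer standing in for StandardTokenizer (simplified)."""
    return [Token(m.group(), i, m.start(), m.end())
            for i, m in enumerate(re.finditer(r"\S+", text))]

def flat_synonyms(tokens, word, expansion):
    """Stack a multi-word expansion onto consecutive positions, one word each.

    Assumption: the k-th expansion word copies the offsets of the original
    token at position j + k, which reproduces the reported highlights.
    """
    extra = []
    for j, tok in enumerate(tokens):
        if tok.term.lower() != word:
            continue
        for k, syn in enumerate(expansion):
            if j + k < len(tokens):  # this sketch ignores overflow past the last token
                host = tokens[j + k]
                extra.append(Token(syn, host.position, host.start, host.end))
    # sorted() is stable, so each original token stays ahead of its stacked synonyms
    return sorted(tokens + extra, key=lambda t: t.position)

def highlight(text, tokens, query_term):
    """Wrap the offsets of every token matching query_term in <em> tags."""
    spans = sorted({(t.start, t.end) for t in tokens if t.term.lower() == query_term})
    pieces, last = [], 0
    for start, end in spans:
        pieces.append(text[last:start])
        pieces.append("<em>" + text[start:end] + "</em>")
        last = end
    pieces.append(text[last:])
    return "".join(pieces)

text = "A sample dns entry explaining the details"
stream = flat_synonyms(tokenize(text), "dns", ["domain", "name", "system"])
print(highlight(text, stream, "name"))    # A sample dns <em>entry</em> explaining the details
print(highlight(text, stream, "system"))  # A sample dns entry <em>explaining</em> the details
```

Because highlighting marks text by character offsets, a match on "name" lights up whatever span that token carries, which under this assumption is "entry".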
--
This message is automatically generated by JIRA.