[jira] [Commented] (SOLR-3390) Highlighting issue with multi-word synonyms causes to highlight the wrong terms
[ https://issues.apache.org/jira/browse/SOLR-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483292#comment-13483292 ]

Jonathan Cummins commented on SOLR-3390:
----------------------------------------

I think you can fix it by using a custom synonym filter factory, without setting the luceneMatchVersion to LUCENE_33 in solrconfig.xml. You can do something like:

    package your.package.name;

    import java.util.Map;
    import org.apache.lucene.util.Version;
    // Also import SynonymFilterFactory from the package your Solr version ships it in.

    public class CustomSynonymFilterFactory extends SynonymFilterFactory {
      @Override
      public void init(Map<String,String> args) {
        // Make this factory see LUCENE_33 before it initializes.
        this.setLuceneMatchVersion(Version.LUCENE_33);
        super.init(args);
      }
    }

And then, in your schema, you can do something like this:

    <filter class="your.package.name.CustomSynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

That will let it use the SlowSynonymFilter from Solr 3.3 for just the synonyms, without changing the luceneMatchVersion in solrconfig.xml. It works basically by tricking the SynonymFilterFactory class into thinking the Lucene version is 3.3 without it actually being 3.3.

Hope that helps out!

Highlighting issue with multi-word synonyms causes to highlight the wrong terms
-------------------------------------------------------------------------------

    Key: SOLR-3390
    URL: https://issues.apache.org/jira/browse/SOLR-3390
    Project: Solr
    Issue Type: Bug
    Components: highlighter, query parsers
    Affects Versions: 3.6
    Environment: Windows 7. (Development machine, not the server)
    Reporter: Rahul Babulal
    Labels: highlighter, multi-word, solr, synonyms

I am using Solr 3.6, and when I have multi-word synonyms the highlighting results have the wrong word highlighted.

If I have the below entry in the synonyms file:

    dns, domain name system

and I index something like:

    A sample dns entry explaining the details.

then searching for "name" (without quotes), in the highlight results/snippets I get:

    A sample dns <em>entry</em> explaining the details.

(The token "entry" overlaps with the token "name" in analysis.jsp.)

Searching for "system" (without quotes), in the highlight results/snippets I get:

    A sample dns entry <em>explaining</em> the details.

(The token "explaining" overlaps with the token "system" in analysis.jsp.)

Here is my schema field type:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3390) Highlighting issue with multi-word synonyms causes to highlight the wrong terms
[ https://issues.apache.org/jira/browse/SOLR-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435800#comment-13435800 ]

Angelo Quaglia commented on SOLR-3390:
--------------------------------------

There is also issue LUCENE-3668.

Is <luceneMatchVersion>LUCENE_33</luceneMatchVersion> with Solr 3.6.1 an officially supported solution? Is that good for a production system?
[jira] [Commented] (SOLR-3390) Highlighting issue with multi-word synonyms causes to highlight the wrong terms
[ https://issues.apache.org/jira/browse/SOLR-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289270#comment-13289270 ]

Okke Klein commented on SOLR-3390:
----------------------------------

Using multi-word synonyms works a lot better with LUCENE_33 because of the way SlowSynonymFilter handles them. Is there a way to get the same behavior with the new filter?
[jira] [Commented] (SOLR-3390) Highlighting issue with multi-word synonyms causes to highlight the wrong terms
[ https://issues.apache.org/jira/browse/SOLR-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259896#comment-13259896 ]

Rahul Babulal commented on SOLR-3390:
-------------------------------------

Thank you for the details. For now, I am setting the luceneMatchVersion to LUCENE_33. This sort of** fixes the highlighting issue. I am still testing to see whether that has any other side effects. Do you guys know of any problems with setting the luceneMatchVersion to LUCENE_33? I will keep an eye on this issue.

**The reason I say it sort of works is that when I search for "name" (my synonym entry being "dns, domain name system"), it matches the document that has "dns" in its index. That's because I have expand set to true.
[jira] [Commented] (SOLR-3390) Highlighting issue with multi-word synonyms causes to highlight the wrong terms
[ https://issues.apache.org/jira/browse/SOLR-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259278#comment-13259278 ]

Jan Høydahl commented on SOLR-3390:
-----------------------------------

This is due to how the multi-word synonym is inserted at the same position as the original term, and we have no way to tell whether you match the synonym or the original term, since that information is lost after analysis processing.

This case would be solved by encoding term positions as a graph, in such a way that the synonym node "domain name system" would occupy the same position as the original node "dns". This, however, would be a major change.
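
The position overlap described in Jan Høydahl's comment can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: it does not use Lucene, the class and method names (FlattenedSynonymDemo, analyze, highlightedWord) are hypothetical, and it simply stacks one synonym word per position after the trigger term, which is the behavior the reporter observed in analysis.jsp. Stop-word removal and stemming are left out, and the words are assumed to be lowercased already.

```java
import java.util.ArrayList;
import java.util.List;

public class FlattenedSynonymDemo {

    /** One indexed token: its text, its position, and whether it came from the document text. */
    static final class Tok {
        final String term;
        final int position;
        final boolean fromText;

        Tok(String term, int position, boolean fromText) {
            this.term = term;
            this.position = position;
            this.fromText = fromText;
        }
    }

    /**
     * Simplified model (an assumption, not Lucene code): keeps every document word and
     * stacks the words of a multi-word synonym onto the trigger's position and the
     * positions that follow it, one synonym word per position. Synonym words that
     * would fall past the end of the text are dropped.
     */
    static List<Tok> analyze(String[] words, String trigger, String[] synonym) {
        List<Tok> out = new ArrayList<>();
        int pending = -1; // index of the next synonym word to stack, -1 when none
        for (int pos = 0; pos < words.length; pos++) {
            out.add(new Tok(words[pos], pos, true));
            if (words[pos].equals(trigger)) {
                pending = 0;
            }
            if (pending >= 0 && pending < synonym.length) {
                out.add(new Tok(synonym[pending], pos, false));
                pending++;
            }
        }
        return out;
    }

    /** Returns the document word that shares a position with the injected query term, or null. */
    static String highlightedWord(List<Tok> tokens, String queryTerm) {
        for (Tok t : tokens) {
            if (!t.fromText && t.term.equals(queryTerm)) {
                for (Tok o : tokens) {
                    if (o.fromText && o.position == t.position) {
                        return o.term;
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] words = {"a", "sample", "dns", "entry", "explaining", "the", "details"};
        String[] synonym = {"domain", "name", "system"};
        List<Tok> tokens = analyze(words, "dns", synonym);
        System.out.println("name   -> " + highlightedWord(tokens, "name"));
        System.out.println("system -> " + highlightedWord(tokens, "system"));
    }
}
```

In this model "name" lands on the position of "entry" and "system" on the position of "explaining", which matches the wrong highlights in the report. Once the tokens are flattened this way, nothing records that "domain name system" as a whole stands for "dns".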
[jira] [Commented] (SOLR-3390) Highlighting issue with multi-word synonyms causes to highlight the wrong terms
[ https://issues.apache.org/jira/browse/SOLR-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259307#comment-13259307 ]

Michael McCandless commented on SOLR-3390:
------------------------------------------

This is a hard problem to solve (indexing a graph).

We've made some recent baby steps towards solving it, though: token streams can now include the PositionLengthAttribute, indicating how many positions an alternate path spans. SynonymFilter now sets this attribute only in certain cases (when the inserted syn is a single token). Still, we then drop this attr during indexing...

Handling the case when the inserted syn is multi-word is tricky... I think "dns" would have to be changed to have posLen=3.
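
The posLen=3 idea from Michael McCandless's comment can be sketched as a tiny token graph. This is a hedged illustration only: it does not use Lucene's PositionLengthAttribute, and the Edge class and the paths helper are hypothetical names invented for the example. Each token is modeled as an edge that starts at a position node and spans posLen positions, so "dns" with posLen=3 ends at the same node as the three-token path "domain name system".

```java
import java.util.ArrayList;
import java.util.List;

public class PositionLengthDemo {

    /** A token modeled as a graph edge: starts at node `from` and spans `posLen` positions. */
    static final class Edge {
        final String term;
        final int from;
        final int posLen; // must be >= 1 so that every walk makes progress

        Edge(String term, int from, int posLen) {
            this.term = term;
            this.from = from;
            this.posLen = posLen;
        }

        int to() {
            return from + posLen;
        }
    }

    /** Collects every term sequence that leads from `node` to `end`. */
    static void walk(List<Edge> edges, int node, int end, String prefix, List<String> out) {
        if (node == end) {
            out.add(prefix.trim());
            return;
        }
        for (Edge e : edges) {
            if (e.from == node) {
                walk(edges, e.to(), end, prefix + " " + e.term, out);
            }
        }
    }

    /** Returns all paths through the graph from node 0 to node `end`. */
    static List<String> paths(List<Edge> edges, int end) {
        List<String> out = new ArrayList<>();
        walk(edges, 0, end, "", out);
        return out;
    }

    /** The sample sentence from the issue, with "dns" widened to span the synonym path. */
    static List<Edge> sampleGraph() {
        List<Edge> g = new ArrayList<>();
        g.add(new Edge("a", 0, 1));
        g.add(new Edge("sample", 1, 1));
        g.add(new Edge("dns", 2, 3));      // posLen=3: covers the whole synonym path
        g.add(new Edge("domain", 2, 1));
        g.add(new Edge("name", 3, 1));
        g.add(new Edge("system", 4, 1));
        g.add(new Edge("entry", 5, 1));
        g.add(new Edge("explaining", 6, 1));
        return g;
    }

    public static void main(String[] args) {
        for (String p : paths(sampleGraph(), 7)) {
            System.out.println(p);
        }
    }
}
```

With the position length kept, the graph has exactly two readings of the sentence, and "name" no longer shares a position with "entry". As the comment notes, the attribute is currently dropped at indexing time, so this only shows what the information would make possible.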