[jira] [Commented] (SOLR-3390) Highlighting issue with multi-word synonyms causes to highlight the wrong terms

2012-10-24 Thread Jonathan Cummins (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483292#comment-13483292
 ] 

Jonathan Cummins commented on SOLR-3390:


I think you can fix it with a custom synonym filter factory, without 
setting the luceneMatchVersion to LUCENE_33 in solrconfig.xml.

You can just do something like:

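package your.package.name;

import java.util.Map;

import org.apache.lucene.util.Version;
// Package shown is the Solr 3.x location; adjust the import for your version.
import org.apache.solr.analysis.SynonymFilterFactory;

public class CustomSynonymFilterFactory extends SynonymFilterFactory {

  @Override
  public void init(Map<String,String> args) {
    // Make this factory alone behave as if luceneMatchVersion were LUCENE_33.
    this.setLuceneMatchVersion(Version.LUCENE_33);
    super.init(args);
  }
}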

And then, in your schema, you can do something like this:

<filter class="your.package.name.CustomSynonymFilterFactory" 
synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

That lets it use the SlowSynonymFilter from Solr 3.3 for just the 
synonyms, without changing the luceneMatchVersion in solrconfig.xml. It works 
by tricking the SynonymFilterFactory class into thinking the Lucene 
version is 3.3 when it is not.

Hope that helps out!


 Highlighting issue with multi-word synonyms causes to highlight the wrong 
 terms
 ---

                 Key: SOLR-3390
                 URL: https://issues.apache.org/jira/browse/SOLR-3390
             Project: Solr
          Issue Type: Bug
          Components: highlighter, query parsers
    Affects Versions: 3.6
         Environment: Windows 7. (Development machine, not the server)
            Reporter: Rahul Babulal
              Labels: highlighter, multi-word, solr, synonyms

 I am using Solr 3.6, and when I have multi-word synonyms the highlighting 
 results have the wrong word highlighted. 
 If I have the below entry in the synonyms file:
 dns, domain name system 
 If I index something like: "A sample dns entry explaining the details."
 Searching for "name" (without quotes), in the highlight results/snippets I get: 
 "A sample dns <em>entry</em> explaining the details." (The token "entry" 
 overlaps with the token "name" in analysis.jsp.)
 Searching for "system" (without quotes), in the highlight results/snippets I 
 get: "A sample dns entry <em>explaining</em> the details." (The token 
 "explaining" overlaps with the token "system" in analysis.jsp.)
 Here is my schema fieldType:
 <fieldType name="text_general" class="solr.TextField" 
 positionIncrementGap="100">
   <analyzer type="index">
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" 
             ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" 
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" 
             ignoreCase="true" expand="false"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" 
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
 </fieldType>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3390) Highlighting issue with multi-word synonyms causes to highlight the wrong terms

2012-08-16 Thread Angelo Quaglia (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435800#comment-13435800
 ] 

Angelo Quaglia commented on SOLR-3390:
--

There is also issue LUCENE-3668.
Is <luceneMatchVersion>LUCENE_33</luceneMatchVersion> with Solr 3.6.1 an 
officially supported solution?
Is that good for a production system?





[jira] [Commented] (SOLR-3390) Highlighting issue with multi-word synonyms causes to highlight the wrong terms

2012-06-05 Thread Okke Klein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289270#comment-13289270
 ] 

Okke Klein commented on SOLR-3390:
--

Using multi word synonyms works a lot better in LUCENE_33 because of the way 
SlowSynonymFilter handles them. Is there a way to get the same behavior with 
the new filter?




[jira] [Commented] (SOLR-3390) Highlighting issue with multi-word synonyms causes to highlight the wrong terms

2012-04-23 Thread Rahul Babulal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259896#comment-13259896
 ] 

Rahul Babulal commented on SOLR-3390:
-

Thank you for the details. 
For now, I am setting the luceneMatchVersion to LUCENE_33. This sort of** fixes 
the highlighting issue. I am still testing to see if there are any other side 
effects of that. Do you guys know of any problems with setting the 
luceneMatchVersion to LUCENE_33? 

I will keep an eye on this issue. 


**The reason why I say it "sort of" works is that when I search for "name" 
(in my case the synonym entry is "dns, domain name system"), it matches the 
document which has "dns" in its index; that's because I have expand set to true. 




[jira] [Commented] (SOLR-3390) Highlighting issue with multi-word synonyms causes to highlight the wrong terms

2012-04-22 Thread Jan Høydahl (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259278#comment-13259278
 ] 

Jan Høydahl commented on SOLR-3390:
---

This is due to how the multi-word synonym is inserted at the same position as 
the original term; we have no way to tell whether you matched the synonym or 
the original term, since that information is lost after analysis.

This case would be solved by encoding term positions as a graph in such a way 
that the synonym node "domain name system" would occupy the same position as 
the original node "dns". This, however, would be a major change.
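
To make the overlap concrete, here is a minimal sketch in plain Java. It uses no Lucene classes, and the class and method names are made up for illustration. It assumes the synonym words are simply stacked on the positions starting at the original term, as described above; it is a simplified model of the reported symptom, not the actual SynonymFilter or highlighter code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Lucene's SynonymFilter.
class FlatSynonymDemo {

    /** Returns position -> terms after stacking the synonym words at and after the matched term. */
    static Map<Integer, List<String>> expand(String[] tokens, String term, String[] synonym) {
        Map<Integer, List<String>> positions = new LinkedHashMap<>();
        // First pass: every original token gets its own position.
        for (int pos = 0; pos < tokens.length; pos++) {
            positions.computeIfAbsent(pos, k -> new ArrayList<>()).add(tokens[pos]);
        }
        // Second pass: synonym words share the positions of whatever follows the term.
        for (int pos = 0; pos < tokens.length; pos++) {
            if (tokens[pos].equals(term)) {
                for (int i = 0; i < synonym.length; i++) {
                    positions.computeIfAbsent(pos + i, k -> new ArrayList<>()).add(synonym[i]);
                }
            }
        }
        return positions;
    }

    /** Returns the original word that shares a position with queryTerm, or null if none. */
    static String originalWordAt(Map<Integer, List<String>> positions, String[] tokens, String queryTerm) {
        for (Map.Entry<Integer, List<String>> e : positions.entrySet()) {
            int pos = e.getKey();
            if (e.getValue().contains(queryTerm) && pos < tokens.length) {
                return tokens[pos];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] tokens = {"a", "sample", "dns", "entry", "explaining", "the", "details"};
        String[] synonym = {"domain", "name", "system"};
        Map<Integer, List<String>> positions = expand(tokens, "dns", synonym);
        System.out.println(positions);
        System.out.println("name   -> " + originalWordAt(positions, tokens, "name"));
        System.out.println("system -> " + originalWordAt(positions, tokens, "system"));
    }
}
```

In this model "name" lands on the position of "entry" and "system" on the position of "explaining", which matches the words that end up highlighted in the report.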




[jira] [Commented] (SOLR-3390) Highlighting issue with multi-word synonyms causes to highlight the wrong terms

2012-04-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259307#comment-13259307
 ] 

Michael McCandless commented on SOLR-3390:
--

This is a hard problem to solve (indexing a graph).

We've made some recent baby steps towards solving it, though: token streams can 
now include the PositionLengthAttribute, indicating how many positions an 
alternate path spans.  SynonymFilter now sets this attribute only in certain 
cases (when the inserted syn is a single token).  Still, we then drop this attr 
during indexing...

Handling the case when the inserted syn is multi-word is tricky... I think "dns" 
would have to be changed to have posLen=3.
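
A rough sketch of that idea in plain Java, with hypothetical names throughout. It does not use Lucene's PositionLengthAttribute or any real indexing code. It assumes each token records a start position and a position length, and that "dns" is given a length equal to the number of words in the synonym, so the synonym words sit on a parallel path and later words are shifted instead of overlapped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; no Lucene classes are used.
class GraphSynonymDemo {

    /** A token with a start position and the number of positions it spans. */
    static final class Token {
        final String term;
        final int position;
        final int positionLength;

        Token(String term, int position, int positionLength) {
            this.term = term;
            this.position = position;
            this.positionLength = positionLength;
        }

        @Override
        public String toString() {
            return term + "@" + position + "+" + positionLength;
        }
    }

    /** Expands term into a parallel multi-word path; the original term spans the whole path. */
    static List<Token> expand(String[] tokens, String term, String[] synonym) {
        List<Token> graph = new ArrayList<>();
        int pos = 0;
        for (String t : tokens) {
            if (t.equals(term)) {
                graph.add(new Token(t, pos, synonym.length)); // e.g. dns with posLen=3
                for (int i = 0; i < synonym.length; i++) {
                    graph.add(new Token(synonym[i], pos + i, 1));
                }
                pos += synonym.length; // following words start after the path
            } else {
                graph.add(new Token(t, pos, 1));
                pos += 1;
            }
        }
        return graph;
    }

    /** Lists the terms that start at the given position. */
    static List<String> termsAt(List<Token> graph, int position) {
        List<String> terms = new ArrayList<>();
        for (Token t : graph) {
            if (t.position == position) {
                terms.add(t.term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String[] tokens = {"a", "sample", "dns", "entry", "explaining", "the", "details"};
        List<Token> graph = expand(tokens, "dns", new String[] {"domain", "name", "system"});
        System.out.println(graph);
        System.out.println(termsAt(graph, 3));
    }
}
```

With this layout "name" no longer shares a position with "entry". Whether the index and the highlighter could preserve and use such lengths is exactly the open problem described above.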
