Re: Problem with words that are almost similar

On 17 Dec 2009, at 13:48, Shalin Shekhar Mangar wrote:

> 2009/12/17 Steinar Asbjørnsen steinar...@gmail.com
>
>> On 17 Dec 2009, at 12:42, Shalin Shekhar Mangar wrote:
>>
>>> For specific cases like this, you can add the word to a file and specify it in schema, for example:
>>>
>>> <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
>>
>> Ty Shalin. This is my schema.xml file:
>>
>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> I added restaurant and restaurering to protwords.txt, restarted Tomcat, but no dice.
>>
>> Do I need to use the SnowballPorterFilterFactory? And do I need to reindex the documents?
>
> Actually EnglishPorterFilterFactory is the same as SnowballPorterFilterFactory with language=English. Both will work. You will need to re-index the documents.

What I've done so far is to add both restaurant and restaurering to protwords.txt. I've also re-fed a single document (with the keyword restaurering) to check that it no longer appears in a search result for restaurant.

Do I have to re-feed every document in the index? Or restart so that Solr re-reads the protwords.txt file (this is on a different installation (prod) than the one I restarted earlier (dev))?

Steinar
Re: Problem with words that are almost similar

2009/12/18 Steinar Asbjørnsen steinar...@gmail.com

> What I've done so far is to add both restaurant and restaurering to protwords.txt. I've also re-fed a single document (with the keyword restaurering) to check that it no longer appears in a search result for restaurant.
>
> Do I have to re-feed every document in the index? Or restart so that Solr re-reads the protwords.txt file (this is on a different installation (prod) than the one I restarted earlier (dev))?

Documents already added to the index have already gone through the analysis step, and terms such as restaurering would have already been stemmed. Therefore, you'll need to re-index all documents which contained the words you have specified in protwords.txt.

--
Regards,
Shalin Shekhar Mangar.
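To illustrate the point about analysis happening at index time, here is a minimal Python sketch. It is not Solr code: `toy_stem`, `build_index`, and `search` are hypothetical stand-ins, and the two-suffix stemmer is a deliberate simplification rather than the Porter algorithm. It only shows why terms written before the protwords.txt change stay stemmed until the documents are fed again.

```python
# Toy model of index-time vs. query-time analysis. NOT Solr's API:
# every name here is a hypothetical stand-in for illustration.

def toy_stem(token, protected):
    """Strip a known suffix unless the token is on the protected list."""
    if token in protected:
        return token
    for suffix in ("ering", "ant"):  # assumed suffixes, not real Porter rules
        if token.endswith(suffix):
            return token[: -len(suffix)]
    return token

def analyze(text, protected):
    """Lowercase, split on whitespace, and stem each token."""
    return {toy_stem(t, protected) for t in text.lower().split()}

def build_index(docs, protected):
    """Map each analyzed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text, protected):
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index, query, protected):
    """Analyze the query the same way and collect matching document ids."""
    hits = set()
    for term in analyze(query, protected):
        hits |= index.get(term, set())
    return hits

docs = {1: "restaurant guide", 2: "restaurering av hus"}
no_protection = frozenset()
protected = frozenset({"restaurant", "restaurering"})

# Index built before the protected list existed: both words became "restaur".
old_index = build_index(docs, no_protection)
before_change = search(old_index, "restaurant", no_protection)

# New protected list at query time, but the old index still holds "restaur".
stale_hits = search(old_index, "restaurant", protected)

# After re-indexing every document with the new list, the terms line up again.
new_index = build_index(docs, protected)
fresh_hits = search(new_index, "restaurant", protected)

print(before_change, stale_hits, fresh_hits)
```

In this toy model the old index and the new query analysis disagree about the term for restaurant, so the query matches nothing until all affected documents are indexed again.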
Problem with words that are almost similar

Hi all.

I have a delicate problem with two words that are spelled very similarly but have completely different meanings. The actual words are restaurant (as in restaurant) and restaurering (as in restoration). Solr seems to think these words are similar enough to present hits on both of them in the same search result. Obviously this is not desirable.

Is there a way to take care of such specific cases without disabling Solr functionality for stemming and/or plurals? Or would I need to disable stemming to make this special case disappear?

I'm using the dismax query handler. The field I'm querying against is of type text.

Any help is appreciated :)

Regards,
Steinar
Re: Problem with words that are almost similar

2009/12/17 Steinar Asbjørnsen steinar...@gmail.com

> Hi all.
>
> I have a delicate problem with two words that are spelled very similarly but have completely different meanings. The actual words are restaurant (as in restaurant) and restaurering (as in restoration). Solr seems to think these words are similar enough to present hits on both of them in the same search result. Obviously this is not desirable.
>
> Is there a way to take care of such specific cases without disabling Solr functionality for stemming and/or plurals? Or would I need to disable stemming to make this special case disappear?

For specific cases like this, you can add the word to a file and specify it in schema, for example:

<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>

--
Regards,
Shalin Shekhar Mangar.
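As a rough picture of what a protected-words file does, the following Python sketch uses a toy suffix stripper. It is an assumption-laden illustration, not the Snowball/Porter algorithm and not Solr code: `toy_stem` and `SUFFIXES` are hypothetical names, and the suffix list is made up so that the two words from this thread collapse to one stem.

```python
# Toy illustration only: this is NOT the Snowball/Porter algorithm Solr uses.
SUFFIXES = ("ering", "ant", "ing")  # hypothetical suffix list, longest first

def toy_stem(token, protected=frozenset()):
    """Strip the first matching suffix unless the token is protected."""
    if token in protected:
        return token  # protected words pass through the stemmer unchanged
    for suffix in SUFFIXES:
        # Only strip when at least three characters of stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

words = ["restaurant", "restaurering", "cooking"]

# Without protection, the two different words share one stem.
unprotected = [toy_stem(w) for w in words]

# With both words protected they stay distinct; other words are still stemmed.
protected_words = frozenset({"restaurant", "restaurering"})
with_protection = [toy_stem(w, protected_words) for w in words]

print(unprotected)
print(with_protection)
```

The point of the sketch is that protection is an exact-match bypass in front of the stemmer, so stemming stays enabled for every word that is not listed.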
Re: Problem with words that are almost similar

On 17 Dec 2009, at 12:42, Shalin Shekhar Mangar wrote:

> 2009/12/17 Steinar Asbjørnsen steinar...@gmail.com
>
>> Hi all.
>>
>> I have a delicate problem with two words that are spelled very similarly but have completely different meanings. The actual words are restaurant (as in restaurant) and restaurering (as in restoration). Solr seems to think these words are similar enough to present hits on both of them in the same search result. Obviously this is not desirable.
>>
>> Is there a way to take care of such specific cases without disabling Solr functionality for stemming and/or plurals? Or would I need to disable stemming to make this special case disappear?
>
> For specific cases like this, you can add the word to a file and specify it in schema, for example:
>
> <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>

Ty Shalin. This is my schema.xml file:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I added restaurant and restaurering to protwords.txt, restarted Tomcat, but no dice.

Do I need to use the SnowballPorterFilterFactory? And do I need to reindex the documents?

Steinar
Re: Problem with words that are almost similar

2009/12/17 Steinar Asbjørnsen steinar...@gmail.com

> On 17 Dec 2009, at 12:42, Shalin Shekhar Mangar wrote:
>
>> For specific cases like this, you can add the word to a file and specify it in schema, for example:
>>
>> <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
>
> Ty Shalin. This is my schema.xml file:
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> I added restaurant and restaurering to protwords.txt, restarted Tomcat, but no dice.
>
> Do I need to use the SnowballPorterFilterFactory? And do I need to reindex the documents?

Actually EnglishPorterFilterFactory is the same as SnowballPorterFilterFactory with language=English. Both will work. You will need to re-index the documents.

--
Regards,
Shalin Shekhar Mangar.