Re: Problem with words that are almost similar

2009-12-18 Thread Steinar Asbjørnsen
On 17 Dec 2009, at 13:48, Shalin Shekhar Mangar wrote:

 Actually EnglishPorterFilterFactory is the same as
 SnowballPorterFilterFactory with language=English. Both will work. You
 will need to re-index the documents.

What I've done so far is to add both restaurant and restaurering to
protwords.txt.
I've also re-fed a single document (with the keyword restaurering) to check
that it no longer appears in a search result for restaurant.
Do I have to re-feed every document in the index?
Or do I need to restart so that Solr re-reads the protwords.txt file (this is on a
different installation (prod) than the one I restarted earlier (dev))?

Steinar

Re: Problem with words that are almost similar

2009-12-18 Thread Shalin Shekhar Mangar
2009/12/18 Steinar Asbjørnsen steinar...@gmail.com


 What I've done so far is to add both restaurant and restaurering to
 protwords.txt.
 I've also re-fed a single document (with the keyword restaurering) to
 check that it no longer appears in a search result for restaurant.
 Do I have to re-feed every document in the index?
 Or do I need to restart so that Solr re-reads the protwords.txt file (this is on a
 different installation (prod) than the one I restarted earlier (dev))?


Documents already added to the index have already gone through the analysis
step, so terms such as restaurering have already been stemmed.
Therefore, you'll need to re-index all documents that contained the words
you have specified in protwords.txt.
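A toy Python model of why the index has to be rebuilt. It assumes a hypothetical `toy_stem` stand-in for the real stemmer (not the Porter algorithm) and two made-up sample documents; `analyze`, `build_index` and `search` are likewise illustrative names, not Solr APIs. The point it shows is that the index stores terms after analysis, while a query is analyzed with whatever settings are current.

```python
from collections import defaultdict

# Hypothetical stand-in for the stemmer: strips a few suffixes in order,
# keeping a stem of at least four characters. NOT the Porter algorithm.
SUFFIXES = ("ing", "ant", "er")


def toy_stem(word: str) -> str:
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            word = word[: -len(suffix)]
    return word


def analyze(text: str, protected: set[str]) -> list[str]:
    """Lowercase, split on whitespace, and stem every token that is not protected."""
    return [t if t in protected else toy_stem(t) for t in text.lower().split()]


def build_index(docs: dict[int, str], protected: set[str]) -> dict[str, set[int]]:
    """Index time: terms are stored AFTER analysis, so the stemmed form is what is kept."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text, protected):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str, protected: set[str]) -> set[int]:
    """Query time: analyze with the CURRENT settings and intersect the postings."""
    hits = None
    for term in analyze(query, protected):
        postings = set(index.get(term, ()))
        hits = postings if hits is None else hits & postings
    return hits if hits is not None else set()


# Hypothetical sample documents.
docs = {1: "restaurant i Oslo", 2: "restaurering av hus"}
protected = {"restaurant", "restaurering"}

old_index = build_index(docs, set())  # indexed before protwords.txt was edited
print(search(old_index, "restaurant", set()))      # {1, 2}: both stored as 'restaur'
print(search(old_index, "restaurant", protected))  # set(): query term no longer matches stale 'restaur'

new_index = build_index(docs, protected)  # full re-index with the protected words
print(search(new_index, "restaurant", protected))  # {1}
```

In the same toy model, re-feeding only document 2 would leave document 1 stored as `restaur`, so a query for `restaurant` would miss it; that is why every document containing one of the protected words needs to be re-indexed.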
-- 
Regards,
Shalin Shekhar Mangar.


Problem with words that are almost similar

2009-12-17 Thread Steinar Asbjørnsen
Hi all.

I have a delicate problem when it comes to two words that are rather similar in 
the way they are typed, but when it comes to the meaning of the word they are 
completely different.
The actual words are restaurant (as in restaurant) and restaurering (as in 
restoration).

Solr seems to think these words are similar enough to present hits on both of 
them in the same search result.
Obviously this is not desirable.

Is there a way to take care of such specific cases without disabling Solr
functionality for stemming and/or plurals?
Or would I need to disable stemming to make this special case disappear?

I'm using the dismax query handler.
The field I'm querying against is of type text.

Any help is appreciated :)

Regards,
Steinar

Re: Problem with words that are almost similar

2009-12-17 Thread Shalin Shekhar Mangar
2009/12/17 Steinar Asbjørnsen steinar...@gmail.com

 Hi all.

 I have a delicate problem when it comes to two words that are rather
 similar in the way they are typed, but when it comes to the meaning of the
 word they are completely different.
 The actual words are restaurant (as in restaurant) and restaurering (as in
 restoration).

 Solr seems to think these words are similar enough to present hits on both
 of them in the same search result.
 Obviously this is not desirable.

 Is there a way to take care of such specific cases without disabling Solr
 functionality for stemming and/or plurals?
 Or would I need to disable stemming to make this special case disappear?


For specific cases like this, you can add the word to a file and specify it
in the schema, for example:

<filter class="solr.SnowballPorterFilterFactory" language="English"
        protected="protwords.txt"/>
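A minimal Python sketch of the idea, under stated assumptions: `toy_stem` is a hypothetical suffix stripper, not the Porter algorithm, chosen only so that both words reduce to the same stem (`restaur`), which is assumed to be what the English stemmer does here. `load_protected` and `stem_filter` are also hypothetical names; the parser assumes the protwords.txt layout of one word per line with `#` comment lines.

```python
# Hypothetical stand-in for the English Porter stemmer: strips a few
# suffixes in order, keeping a stem of at least four characters.
# It is NOT the real Porter algorithm.
SUFFIXES = ("ing", "ant", "er")


def toy_stem(word: str) -> str:
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            word = word[: -len(suffix)]
    return word


def load_protected(text: str) -> set[str]:
    """Parse protwords.txt-style content (assumed: one word per line, '#' comments)."""
    words = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            words.add(line)
    return words


def stem_filter(tokens: list[str], protected: set[str]) -> list[str]:
    """Stem every token except protected ones, which pass through unchanged."""
    return [t if t in protected else toy_stem(t) for t in tokens]


protwords = load_protected("# keep these apart\nrestaurant\nrestaurering\n")
tokens = ["restaurant", "restaurering"]

print(stem_filter(tokens, set()))      # ['restaur', 'restaur']: same term, so both match
print(stem_filter(tokens, protwords))  # ['restaurant', 'restaurering']: distinct terms
```

Every word not on the list still goes through the stemmer, so plurals and other inflections keep matching; only the listed words are exempt.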

-- 
Regards,
Shalin Shekhar Mangar.


Re: Problem with words that are almost similar

2009-12-17 Thread Steinar Asbjørnsen
On 17 Dec 2009, at 12:42, Shalin Shekhar Mangar wrote:

 2009/12/17 Steinar Asbjørnsen steinar...@gmail.com
 
 Hi all.
 
 I have a delicate problem when it comes to two words that are rather
 similar in the way they are typed, but when it comes to the meaning of the
 word they are completely different.
 The actual words are restaurant (as in restaurant) and restaurering (as in
 restoration).
 
 Solr seems to think these words are similar enough to present hits on both
 of them in the same search result.
 Obviously this is not desirable.
 
 Is there a way to take care of such specific cases without disabling Solr
 functionality for stemming and/or plurals?
 Or would I need to disable stemming to make this special case disappear?
 
 
 For specific cases like this, you can add the word to a file and specify it
 in the schema, for example:
 
 <filter class="solr.SnowballPorterFilterFactory" language="English"
         protected="protwords.txt"/>

Ty Shalin.

This is my schema.xml file:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
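A rough Python model of the order of the index-side chain above. It assumes a hypothetical `toy_stem` in place of EnglishPorterFilterFactory (not the Porter algorithm) and leaves out WordDelimiterFilterFactory and RemoveDuplicatesTokenFilterFactory; `index_chain` is an illustrative name. Because LowerCaseFilterFactory runs before the stemmer, the stemmer only ever sees lowercased tokens, so lowercase entries in protwords.txt are the safe choice with this chain.

```python
# Hypothetical stand-in for the stemmer; NOT the Porter algorithm.
SUFFIXES = ("ing", "ant", "er")


def toy_stem(word: str) -> str:
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            word = word[: -len(suffix)]
    return word


def index_chain(text: str, stopwords: set[str], protected: set[str]) -> list[str]:
    tokens = text.split()                                       # WhitespaceTokenizer
    tokens = [t for t in tokens if t.lower() not in stopwords]  # StopFilter, ignoreCase=true
    tokens = [t.lower() for t in tokens]                        # LowerCaseFilter
    # The stemmer runs last, on already-lowercased tokens, so in this model
    # only lowercase entries in the protected set can ever match.
    return [t if t in protected else toy_stem(t) for t in tokens]


text = "Restaurant and Restaurering"
print(index_chain(text, {"and"}, {"restaurant", "restaurering"}))  # ['restaurant', 'restaurering']
print(index_chain(text, {"and"}, {"Restaurant", "Restaurering"}))  # ['restaur', 'restaur']
```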

I added restaurant and restaurering to protwords.txt, restarted Tomcat, but no 
dice.
Do I need to use the SnowballPorterFilterFactory?
And do I need to reindex the documents?

Steinar

Re: Problem with words that are almost similar

2009-12-17 Thread Shalin Shekhar Mangar
2009/12/17 Steinar Asbjørnsen steinar...@gmail.com

 On 17 Dec 2009, at 12:42, Shalin Shekhar Mangar wrote:

 
 
  For specific cases like this, you can add the word to a file and specify it
  in the schema, for example:
 
  <filter class="solr.SnowballPorterFilterFactory" language="English"
          protected="protwords.txt"/>

 Ty Shalin.

 I added restaurant and restaurering to protwords.txt, restarted Tomcat, but
 no dice.
 Do I need to use the SnowballPorterFilterFactory?
 And do I need to reindex the documents?


Actually EnglishPorterFilterFactory is the same as
SnowballPorterFilterFactory with language=English. Both will work. You
will need to re-index the documents.

-- 
Regards,
Shalin Shekhar Mangar.