Re: stemming (maybe?) question
Yonik Seeley wrote:
> Not sure... I just took the stock solr example, and it worked fine.
> I inserted o'meara into example/exampledocs/solr.xml
>   <field name="features">Advanced o'meara Full-Text Search Capabilities using Lucene</field>
>
> then indexed everything: ./post.sh *.xml
>
> Then queried in various ways:
>   q=o'meara
>   q=omeara
>   q=o%20meara
>
> All of the queries found the solr doc.

i grabbed the original example schema.xml and made my username field use the following definition:

  <fieldType name="text_user" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1"
              catenateWords="1" catenateNumbers="1" catenateAll="0"
              splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1"
              catenateWords="0" catenateNumbers="0" catenateAll="0"
              splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

i removed the stopwords and porter stuff because for proper names i don't want that.

seems to work fine now, thanks!

-jsd-
Re: stemming (maybe?) question
Yonik Seeley wrote:
> On Thu, Mar 12, 2009 at 1:36 PM, Jon Drukman jdruk...@gmail.com wrote:
>> is it possible to make solr think that omeara and o'meara are the same thing?
>
> WordDelimiter would handle it if the document had o'meara (but you may
> or may not want the other stuff that comes with WordDelimiterFilter).
> You could also use a PatternReplaceFilter to normalize tokens like this.

the document does have o'meara in it. i tried creating a new field type based on the wiki information.

  <fieldType name="text_user" class="solr.TextField" positionIncrementGap="100">
    <fieldtype name="subword" class="solr.TextField">
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldtype>
  </fieldType>

i reindexed everything but now any search on that field returns zero results. what did i do wrong?

-jsd-
Re: stemming (maybe?) question
Not sure... I just took the stock solr example, and it worked fine.
I inserted o'meara into example/exampledocs/solr.xml
  <field name="features">Advanced o'meara Full-Text Search Capabilities using Lucene</field>

then indexed everything: ./post.sh *.xml

Then queried in various ways:
  q=o'meara
  q=omeara
  q=o%20meara

All of the queries found the solr doc.

-Yonik
http://www.lucidimagination.com

On Mon, Mar 16, 2009 at 8:34 PM, Jon Drukman jdruk...@gmail.com wrote:
> Yonik Seeley wrote:
>> On Thu, Mar 12, 2009 at 1:36 PM, Jon Drukman jdruk...@gmail.com wrote:
>>> is it possible to make solr think that omeara and o'meara are the same thing?
>>
>> WordDelimiter would handle it if the document had o'meara (but you may
>> or may not want the other stuff that comes with WordDelimiterFilter).
>> You could also use a PatternReplaceFilter to normalize tokens like this.
>
> the document does have o'meara in it. i tried creating a new field type
> based on the wiki information.
>
>   <fieldType name="text_user" class="solr.TextField" positionIncrementGap="100">
>     <fieldtype name="subword" class="solr.TextField">
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1" generateNumberParts="1"
>                 catenateWords="0" catenateNumbers="0" catenateAll="0"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1" generateNumberParts="1"
>                 catenateWords="1" catenateNumbers="1" catenateAll="0"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldtype>
>   </fieldType>
>
> i reindexed everything but now any search on that field returns zero
> results. what did i do wrong?
>
> -jsd-
stemming (maybe?) question
is it possible to make solr think that omeara and o'meara are the same thing? -jsd-
Re: stemming (maybe?) question
On Thu, Mar 12, 2009 at 1:36 PM, Jon Drukman jdruk...@gmail.com wrote:
> is it possible to make solr think that omeara and o'meara are the same thing?

WordDelimiter would handle it if the document had o'meara (but you may
or may not want the other stuff that comes with WordDelimiterFilter).
You could also use a PatternReplaceFilter to normalize tokens like this.

-Yonik
http://www.lucidimagination.com
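
For reference, the PatternReplaceFilter suggestion might look roughly like the sketch below. Nobody in the thread posted this. The field type name text_name and the choice to strip only the straight apostrophe are assumptions made for illustration. A single analyzer element with no type attribute applies to both indexing and querying, so both sides see the same normalized token.

```xml
<!-- Hypothetical field type: the name "text_name" is an assumption, not from the thread. -->
<fieldType name="text_name" class="solr.TextField" positionIncrementGap="100">
  <!-- No type attribute: the same chain is used at index time and at query time. -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Delete every apostrophe inside a token, so o'meara becomes omeara. -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="'" replacement="" replace="all"/>
  </analyzer>
</fieldType>
```

With a chain like this, q=o'meara and q=omeara should both reduce to the token omeara. Unlike the WordDelimiterFilter setup, q=o%20meara would probably not match, since the tokens o and meara are never indexed separately.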