Re: stemming (maybe?) question

2009-03-17 Thread Jon Drukman

Yonik Seeley wrote:

> Not sure... I just took the stock solr example, and it worked fine.
>
> I inserted o'meara into example/exampledocs/solr.xml
>  <field name="features">Advanced o'meara Full-Text Search
> Capabilities using Lucene</field>
>
> then indexed everything:  ./post.sh *.xml
>
> Then queried in various ways:
> q=o'meara
> q=omeara
> q=o%20meara
>
> All of the queries found the solr doc.


i grabbed the original example schema.xml and made my username field use 
the following definition:


<fieldType name="text_user" class="solr.TextField"
positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


i removed the stopwords and porter stuff because for proper names i 
don't want that.


seems to work fine now, thanks!
-jsd-
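
A rough way to see why the text_user definition above matches o'meara, omeara and "o meara" alike is to mimic the two analyzer chains in a few lines of Python. This is only a toy sketch, not Solr's code: the names word_delimiter, analyze and matches are made up for illustration, the splitter only breaks on non-alphanumeric characters (no case-change or letter/digit splitting), synonyms are skipped, duplicates are dropped globally rather than per position, and token positions are ignored.

```python
import re


def word_delimiter(token, catenate_words):
    """Toy stand-in for WordDelimiterFilter: split on non-alphanumerics.

    With catenate_words=True the joined form is emitted as well,
    e.g. "o'meara" -> ["o", "meara", "omeara"].
    """
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", token) if p]
    out = list(parts)
    if catenate_words and len(parts) > 1:
        out.append("".join(parts))
    return out


def analyze(text, catenate_words):
    """Whitespace tokenize, split words, lowercase, drop repeated tokens."""
    tokens = []
    for raw in text.split():
        for tok in word_delimiter(raw, catenate_words):
            tok = tok.lower()
            if tok not in tokens:
                tokens.append(tok)
    return tokens


def matches(doc_text, query_text):
    """True if every query term appears among the indexed terms."""
    index_terms = set(analyze(doc_text, catenate_words=True))   # index side
    query_terms = analyze(query_text, catenate_words=False)     # query side
    return bool(query_terms) and all(t in index_terms for t in query_terms)


doc = "Advanced o'meara Full-Text Search"
for q in ("o'meara", "omeara", "o meara"):
    print(q, matches(doc, q))
```

The point of the asymmetry is that catenateWords=1 on the index side stores the joined form omeara next to the parts o and meara, so the query side only needs to produce either the parts or the joined word.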



Re: stemming (maybe?) question

2009-03-16 Thread Jon Drukman

Yonik Seeley wrote:

> On Thu, Mar 12, 2009 at 1:36 PM, Jon Drukman <jdruk...@gmail.com> wrote:
>
>> is it possible to make solr think that omeara and o'meara are the same
>> thing?
>
> WordDelimiter would handle it if the document had o'meara (but you
> may or may not want the other stuff that comes with
> WordDelimiterFilter).
> You could also use a PatternReplaceFilter to normalize tokens like this.


the document does have o'meara in it.  i tried creating a new field type 
based on the wiki information.


<fieldType name="text_user" class="solr.TextField"
positionIncrementGap="100">

  <fieldtype name="subword" class="solr.TextField">
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1"
              generateNumberParts="1"
              catenateWords="0"
              catenateNumbers="0"
              catenateAll="0"
              />
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1"
              generateNumberParts="1"
              catenateWords="1"
              catenateNumbers="1"
              catenateAll="0"
              />
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldtype>
</fieldType>


i reindexed everything but now any search on that field returns zero 
results.  what did i do wrong?


-jsd-



Re: stemming (maybe?) question

2009-03-16 Thread Yonik Seeley
Not sure... I just took the stock solr example, and it worked fine.

I inserted o'meara into example/exampledocs/solr.xml
 <field name="features">Advanced o'meara Full-Text Search
Capabilities using Lucene</field>

then indexed everything:  ./post.sh *.xml

Then queried in various ways:
q=o'meara
q=omeara
q=o%20meara

All of the queries found the solr doc.

-Yonik
http://www.lucidimagination.com


On Mon, Mar 16, 2009 at 8:34 PM, Jon Drukman <jdruk...@gmail.com> wrote:
> Yonik Seeley wrote:
>>
>> On Thu, Mar 12, 2009 at 1:36 PM, Jon Drukman <jdruk...@gmail.com> wrote:
>>>
>>> is it possible to make solr think that omeara and o'meara are the same
>>> thing?
>>
>> WordDelimiter would handle it if the document had o'meara (but you
>> may or may not want the other stuff that comes with
>> WordDelimiterFilter).
>> You could also use a PatternReplaceFilter to normalize tokens like this.
>
> the document does have o'meara in it.  i tried creating a new field type
> based on the wiki information.
>
> <fieldType name="text_user" class="solr.TextField"
> positionIncrementGap="100">
>   <fieldtype name="subword" class="solr.TextField">
>     <analyzer type="query">
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.WordDelimiterFilterFactory"
>               generateWordParts="1"
>               generateNumberParts="1"
>               catenateWords="0"
>               catenateNumbers="0"
>               catenateAll="0"
>               />
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>     <analyzer type="index">
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.WordDelimiterFilterFactory"
>               generateWordParts="1"
>               generateNumberParts="1"
>               catenateWords="1"
>               catenateNumbers="1"
>               catenateAll="0"
>               />
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>   </fieldtype>
> </fieldType>
>
> i reindexed everything but now any search on that field returns zero
> results.  what did i do wrong?
>
> -jsd-
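
For anyone repeating the three test queries from this message (q=o'meara, q=omeara, q=o%20meara) from a script rather than a browser, the only fiddly part is URL-encoding the q parameter. A minimal sketch follows; the base URL is an assumption (the usual address of the stock example server, not something stated in this thread), and build_query_url is a made-up helper name.

```python
from urllib.parse import quote

# Assumed address of the stock Solr example; adjust for your own install.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_query_url(q):
    """Return a select URL with the raw query string percent-encoded."""
    return SOLR_SELECT + "?q=" + quote(q, safe="")


for q in ("o'meara", "omeara", "o meara"):
    print(build_query_url(q))
```

Note that quote() encodes the space as %20 and the apostrophe as %27; the server decodes both back before the query analyzer sees them.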


stemming (maybe?) question

2009-03-12 Thread Jon Drukman
is it possible to make solr think that omeara and o'meara are the 
same thing?


-jsd-



Re: stemming (maybe?) question

2009-03-12 Thread Yonik Seeley
On Thu, Mar 12, 2009 at 1:36 PM, Jon Drukman <jdruk...@gmail.com> wrote:
> is it possible to make solr think that omeara and o'meara are the same
> thing?

WordDelimiter would handle it if the document had o'meara (but you
may or may not want the other stuff that comes with
WordDelimiterFilter).
You could also use a PatternReplaceFilter to normalize tokens like this.

-Yonik
http://www.lucidimagination.com
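
To make the PatternReplaceFilter suggestion concrete: the idea is to run a regular expression over each token and delete the apostrophe, so that o'meara and omeara end up as the same indexed term. The sketch below shows only that idea in plain Python. It is not Solr configuration, the names APOSTROPHES and normalize_token are invented for the example, and the lowercasing step stands in for a separate lowercase filter.

```python
import re

# Straight and typographic apostrophes; extend the class if your data
# contains other variants.
APOSTROPHES = re.compile(r"['\u2019]")


def normalize_token(token):
    """Delete apostrophes from a single token, then lowercase it."""
    return APOSTROPHES.sub("", token).lower()


for token in ("o'meara", "O\u2019Meara", "omeara"):
    print(token, "->", normalize_token(token))
```

Unlike WordDelimiterFilter, this kind of normalization does not split the token, so a query for "o meara" (two words) would not match unless something else joins or splits the parts.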