NGram for misspelt words
I have configured NGram indexing for some fields. Say I search for the city Ludlow: I get results (normal search). If I search for Ludlo (with the w omitted), I get results. If I search for Ludl (with ow omitted), I still get results. These are all partial strings of the main string, so NGram works perfectly for them.

But when I type in Ludlwo (misspelt, with the characters o and w interchanged), I don't get any results. Ideally it should match on Ludl and return the results. I am not looking for edit-distance-based spell correctors. How can I make this NGram-based search work?

Here is my schema.xml (NGramFieldType):

    <fieldType name="nGram" class="solr.TextField" positionIncrementGap="100" stored="false" multiValued="true">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- potentially word delimiter, synonym filter, stop words, NOT stemming -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- potentially word delimiter, synonym filter, stop words, NOT stemming -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

**
This message may contain confidential or proprietary information intended only for the use of the addressee(s) named above or may contain information that is legally privileged. If you are not the intended addressee, or the person responsible for delivering it to the intended addressee, you are hereby notified that reading, disseminating, distributing or copying this message is strictly prohibited. If you have received this message by mistake, please immediately notify us by replying to the message and delete the original message and any copies immediately thereafter.

Thank you.
**
FAFLD
Re: NGram for misspelt words
You are creating grams only at index time and not at query time, hence 'ludlwo' does not match. For 'ludlow', your index analyzer creates the following grams:

    lu lud ludl ludlo ludlow

Your query analyzer leaves 'ludlwo' as a single term, and that term is not among them. Either you need to create grams at query time as well, or use edit distance.

On Wed, Jul 18, 2012 at 7:43 PM, Husain, Yavar yhus...@firstam.com wrote:
> But when I type in Ludlwo (misspelt, characters o and w interchanged) I
> don't get any results. It should ideally match Ludl and provide the results.
> I am not looking for edit-distance-based spell correctors. How can I make
> the above NGram-based search work?
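The point about index-only grams can be checked with a small simulation. This is only a sketch in Python, not Solr code: `edge_ngrams`, `matches_without_query_grams` and `matches_with_query_grams` are hypothetical helper names, and treating any shared gram as a match is an assumption (it corresponds to OR-style matching between query terms).

```python
def edge_ngrams(token, min_size=2, max_size=15):
    """Front edge n-grams: every prefix of the lowercased token
    whose length is between min_size and max_size."""
    token = token.lower()
    upper = min(max_size, len(token))
    return [token[:n] for n in range(min_size, upper + 1)]


# Index side, as in the posted schema: grams are stored for the indexed word.
indexed_terms = set(edge_ngrams("Ludlow"))


def matches_without_query_grams(query):
    # Query side as in the posted schema: lowercased only, no grams,
    # so the whole query string must equal one of the indexed grams.
    return query.lower() in indexed_terms


def matches_with_query_grams(query):
    # Hypothetical query side that also emits edge n-grams.
    # Assumption: any gram shared with the index counts as a match.
    return sorted(indexed_terms & set(edge_ngrams(query)))


print(edge_ngrams("Ludlow"))
for q in ("Ludlow", "Ludlo", "Ludl", "Ludlwo"):
    print(q, matches_without_query_grams(q), matches_with_query_grams(q))
```

With grams on the index side only, 'Ludlwo' finds nothing because the full string is not a prefix of 'ludlow'. Once the query is gram'd too, it shares the prefixes lu, lud and ludl with the index.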
RE: NGram for misspelt words
Thanks Sahi. I have replaced EdgeNGramFilterFactory with NGramFilterFactory, as I need substrings anywhere in the word, not just at the front or back. As you suggested, I put the same NGramFilterFactory in both the query and the index analyzer; however, it now returns no results at all, not even for the basic search.

-----Original Message-----
From: Dikchant Sahi [mailto:contacts...@gmail.com]
Sent: Wednesday, July 18, 2012 7:54 PM
To: solr-user@lucene.apache.org
Subject: Re: NGram for misspelt words

You are creating grams only while indexing and not querying hence 'ludlwo' would not match. [...] Either you need to create gram while querying also or use Edit Distance.
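For reference, here is a rough Python sketch of what gramming both sides with substring n-grams looks like for this example. It does not diagnose the "no results" problem, since the message does not show the new fieldType. The gram sizes 2 to 3 and the helper names `ngrams`, `any_gram_matches` and `all_grams_match` are assumptions made for illustration.

```python
def ngrams(token, min_size=2, max_size=3):
    """All substrings of the lowercased token whose length is
    between min_size and max_size (sizes here are assumed values)."""
    token = token.lower()
    return {
        token[i:i + n]
        for n in range(min_size, max_size + 1)
        for i in range(len(token) - n + 1)
    }


index_grams = ngrams("Ludlow")   # what the index analyzer would store
query_grams = ngrams("Ludlwo")   # what the query analyzer would emit

shared = index_grams & query_grams    # grams the misspelt query has in common
missing = query_grams - index_grams   # grams that exist only in the query


def any_gram_matches(query):
    # OR-style matching: one shared gram is enough.
    return bool(ngrams(query) & index_grams)


def all_grams_match(query):
    # AND-style matching: every query gram must be indexed.
    return ngrams(query) <= index_grams


print(sorted(shared), sorted(missing))
print(any_gram_matches("Ludlwo"), all_grams_match("Ludlwo"))
```

In this sketch the transposed query shares some grams with the indexed word but not all of them, so whether it matches depends on whether the query requires every gram or just one.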
Re: NGram for misspelt words
Have you tried the analysis window (the Analysis page in the Solr admin UI) to debug this? I believe something is wrong in the fieldType.

On Wed, Jul 18, 2012 at 8:07 PM, Husain, Yavar yhus...@firstam.com wrote:
> Thanks Sahi. I have replaced my EdgeNGramFilterFactory with
> NGramFilterFactory as I need substrings not just in front or back but
> anywhere. You are right, I put the same NGramFilterFactory in both Query and
> Index; however, now it does not return any results, not even the basic one.