RE: Removing whitespace
Thanks Alireza, Steven and Koji for the quick responses! I'll read up on those and give it a shot.

Devon Baumgarten
Re: Removing whitespace
(11/12/13 6:51), Devon Baumgarten wrote:
> Hello,
>
> I am having trouble finding how to remove/ignore whitespace when indexing.
> The only answer I have found suggested that it is necessary to write my own
> tokenizer. Is this true? I want to remove whitespace and special characters
> from the phrase and create N-grams from the result.

How about using one of the existing CharFilters?

https://builds.apache.org/job/Solr-3.x/javadoc/org/apache/solr/analysis/PatternReplaceCharFilterFactory.html
https://builds.apache.org/job/Solr-3.x/javadoc/org/apache/solr/analysis/MappingCharFilterFactory.html

koji
--
Check out "Query Log Visualizer" for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/
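To make Koji's suggestion concrete, here is a minimal Java sketch of what the two kinds of char filter do conceptually. It is not Solr's implementation or API: the class and method names are hypothetical, the regex `[^A-Za-z0-9]` is an assumed pattern, and the mapping table (space, period and hyphen mapped to nothing) is an assumed stand-in for a mapping file. Both approaches rewrite the raw text before any tokenizer sees it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Solr code) of two char-filter ideas:
 * a regex replacement and a fixed character mapping, both applied
 * to the raw text before tokenization.
 */
public class CharFilterSketch {

    // Assumed pattern: drop every character that is not a letter or digit,
    // roughly what a pattern/replacement rule with an empty replacement expresses.
    private static final Pattern NON_ALNUM = Pattern.compile("[^A-Za-z0-9]");

    public static String regexStrip(String input) {
        return NON_ALNUM.matcher(input).replaceAll("");
    }

    // Assumed mapping table: each listed character is replaced by the empty string.
    private static final Map<Character, String> MAPPING = new HashMap<>();
    static {
        MAPPING.put(' ', "");
        MAPPING.put('.', "");
        MAPPING.put('-', "");
    }

    public static String mapChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            String replacement = MAPPING.get(c);
            if (replacement != null) {
                out.append(replacement); // mapped character: emit its replacement
            } else {
                out.append(c);           // unmapped character: pass through unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Bob Dole", "Bo B. Dole"}) {
            System.out.println(s + " -> " + regexStrip(s) + " / " + mapChars(s));
        }
    }
}
```

Running `main` prints `BobDole` for "Bob Dole" and `BoBDole` for "Bo B. Dole" with either approach. Case is left untouched here; my assumption is that lowercasing would happen later in the chain, for example in a lowercase token filter. The regex is more compact, while the mapping table makes each removed character explicit.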
RE: Removing whitespace
Hi Devon,

Something like this should work for you (untested!):

Steve

> -----Original Message-----
> From: Devon Baumgarten [mailto:dbaumgar...@nationalcorp.com]
> Sent: Monday, December 12, 2011 4:52 PM
> To: 'solr-user@lucene.apache.org'
> Subject: Removing whitespace
>
> Hello,
>
> I am having trouble finding how to remove/ignore whitespace when indexing.
> The only answer I have found suggested that it is necessary to write my
> own tokenizer. Is this true? I want to remove whitespace and special
> characters from the phrase and create N-grams from the result.
>
> Ultimately, the effect I am after is that searching "bobdole" would match
> "Bob Dole", "Bo B. Dole", and maybe "Bobdo". Maybe there is a better
> way... can anyone lend some assistance?
>
> Thanks!
>
> Dev B
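Steve's configuration snippet did not survive in this copy of the thread, so the following is not his answer. It is a minimal, hypothetical Java sketch of the behaviour Devon describes: strip whitespace and punctuation, lowercase, split the result into character n-grams (2 to 3 characters here, an arbitrary choice), and score a document by the share of the query's n-grams it contains. In Solr itself I would expect a char filter followed by something like solr.KeywordTokenizerFactory, solr.LowerCaseFilterFactory and solr.NGramFilterFactory (with its minGramSize and maxGramSize settings) to play these roles, but treat that pairing as an assumption; Solr's relevance scoring is also not this simple overlap ratio.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch: normalize text, break it into character n-grams,
 * and measure how many of the query's n-grams a document shares.
 */
public class NGramMatchSketch {

    // Stand-in for a char filter plus lowercasing: keep only letters and digits.
    public static String normalize(String text) {
        return text.replaceAll("[^A-Za-z0-9]", "").toLowerCase(Locale.ROOT);
    }

    // Stand-in for an n-gram filter: every substring of length minGram..maxGram.
    public static Set<String> ngrams(String text, int minGram, int maxGram) {
        Set<String> grams = new LinkedHashSet<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= text.length(); i++) {
                grams.add(text.substring(i, i + n));
            }
        }
        return grams;
    }

    // Fraction of the query's distinct n-grams that also occur in the document.
    public static double overlap(String query, String doc, int minGram, int maxGram) {
        Set<String> q = ngrams(normalize(query), minGram, maxGram);
        Set<String> d = ngrams(normalize(doc), minGram, maxGram);
        if (q.isEmpty()) {
            return 0.0;
        }
        int shared = 0;
        for (String g : q) {
            if (d.contains(g)) {
                shared++;
            }
        }
        return (double) shared / q.size();
    }

    public static void main(String[] args) {
        for (String doc : new String[] {"Bob Dole", "Bo B. Dole", "Bobdo", "Jane Smith"}) {
            System.out.printf(Locale.ROOT, "bobdole vs %-12s %.2f%n", doc, overlap("bobdole", doc, 2, 3));
        }
    }
}
```

With these settings, "Bob Dole" and "Bo B. Dole" both normalize to `bobdole` and score 1.00, "Bobdo" shares 7 of the query's 11 distinct n-grams (0.64), and "Jane Smith" scores 0.00. Widening the gram range or lowering the acceptance threshold trades precision for looser "maybe" matches like "Bobdo".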
Re: Removing whitespace
That sounds like a strange requirement, but I think you can use CharFilters instead of implementing your own Tokenizer. Take a look at this section; maybe it helps.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories

On Mon, Dec 12, 2011 at 4:51 PM, Devon Baumgarten <
dbaumgar...@nationalcorp.com> wrote:

> Hello,
>
> I am having trouble finding how to remove/ignore whitespace when indexing.
> The only answer I have found suggested that it is necessary to write my own
> tokenizer. Is this true? I want to remove whitespace and special characters
> from the phrase and create N-grams from the result.
>
> Ultimately, the effect I am after is that searching "bobdole" would match
> "Bob Dole", "Bo B. Dole", and maybe "Bobdo". Maybe there is a better way...
> can anyone lend some assistance?
>
> Thanks!
>
> Dev B
>

--
Alireza Salimi
Java EE Developer
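Alireza's point that a char filter can replace a custom tokenizer comes down to ordering: char filters rewrite the raw character stream before the tokenizer runs. The minimal Java sketch below illustrates that ordering with hypothetical stand-ins, a regex-based `charFilter` and a plain whitespace split in place of a stock tokenizer; neither is Solr code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch: because the char-filter stage runs first, the
 * tokenizer never sees the whitespace it would otherwise split on.
 */
public class AnalysisOrderSketch {

    // Char-filter stage (assumed rule): delete whitespace and punctuation.
    public static String charFilter(String raw) {
        return raw.replaceAll("[^A-Za-z0-9]", "");
    }

    // Tokenizer stage: a plain whitespace split standing in for a stock tokenizer.
    public static List<String> whitespaceTokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        String raw = "Bo B. Dole";
        System.out.println("tokenizer only:              " + whitespaceTokenize(raw));
        System.out.println("char filter, then tokenizer: " + whitespaceTokenize(charFilter(raw)));
    }
}
```

For "Bo B. Dole", the tokenizer alone yields `[Bo, B., Dole]`, while running the char filter first yields the single token `[BoBDole]`, which a downstream n-gram filter could then split.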