Find results with or without whitespace
I'm looking for a way to index and search terms that may or may not contain spaces. An example explains it better:
- Looking for healthcare, I want to find both healthcare and health care.
- Looking for health care, I want to find both health care and healthcare.

My other constraints are:
- I will index rather long strings (extracted from Office documents).
- I want to avoid synonym lists (as they may be incomplete).
- I want to avoid specific logic (i.e. query rewriting with as many ORs as the combinations of search terms require).
- I don't want to rely on an uppercase/lowercase tokenizer (as users are... creative).

I have already tried many tokenizer/filter combinations without success, and I have not found any answer to this problem.

-- View this message in context: http://lucene.472066.n3.nabble.com/Find-results-with-or-without-whitespace-tp3117144p3117144.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Find results with or without whitespace
Thank you for your answer. I agree, I can manage predictable values through synonyms. However, most data in this index are company and product names, which sometimes leads to rather strange syntax (mixed upper/lower case, misplaced dashes or spaces). One purpose of using Solr was to help find potential duplicates before data insertion.

On the other hand, I could write a custom tokenizer/filter and a custom query builder that would test many combinations. However, I have the feeling that this is an inefficient approach. That is:

Indexing: chelsea soccer club = chelsea, soccer, club, chelseasoccer, soccerclub, chelseasoccerclub
Searching: chelsea soccerclub = (chelsea and soccerclub) or chelseasoccerclub

While search expressions are generally short, indexing will be a nightmare...
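To make the indexing/searching scheme above concrete, here is a minimal Python sketch of it. It is not Solr code: the function names (`concatenated_shingles`, `analyze`, `matches`) and the `max_size` parameter are hypothetical, and tokenization is assumed to be a plain lowercase whitespace split. It only illustrates which terms would be indexed and how the short query would be evaluated against them.

```python
from typing import List, Set


def concatenated_shingles(tokens: List[str], max_size: int = 3) -> Set[str]:
    """Return every run of 1..max_size adjacent tokens, joined without a space."""
    terms: Set[str] = set()
    for start in range(len(tokens)):
        for size in range(1, max_size + 1):
            if start + size > len(tokens):
                break  # the run would extend past the end of the text
            terms.add("".join(tokens[start:start + size]))
    return terms


def analyze(text: str, max_size: int = 3) -> Set[str]:
    """Hypothetical index-time analysis: lowercase, split on whitespace, shingle."""
    return concatenated_shingles(text.lower().split(), max_size)


def matches(query: str, document: str, max_size: int = 3) -> bool:
    """(every query token is indexed) or (the glued query is indexed)."""
    doc_terms = analyze(document, max_size)
    query_tokens = query.lower().split()
    all_tokens_found = all(token in doc_terms for token in query_tokens)
    glued_found = "".join(query_tokens) in doc_terms
    return all_tokens_found or glued_found


if __name__ == "__main__":
    print(sorted(analyze("chelsea soccer club")))
    print(matches("chelsea soccerclub", "chelsea soccer club"))
    print(matches("health care", "healthcare"))
```

With a cap on the run length, the number of indexed terms grows at most as (number of tokens) x `max_size`, so the index grows linearly with document length rather than combinatorially. In Solr itself, a shingle filter configured with an empty token separator may produce comparable terms without a custom tokenizer, but that should be verified against the Solr version in use.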