Hello, 

I am trying to create a suggest search (search results are displayed while the 
user is entering the query) for names, but the search should also return results 
if the entered name merely sounds like an indexed name. However, a perfect match 
should be ranked higher than a similar-sounding match. 


I looked at the SpellChecker contrib, but AFAIK it cannot handle incomplete 
names, i.e. the prefixes that edge n-grams would cover. 


So I came up with this idea and it would be great if anyone could tell me if 
that is sensible or if there is a better way: 


I create an analyzer to be run on the full names, which does the following: 
- lowercase 
- build edge n-grams and put these terms in the field (this would handle 
  correctly spelled input) 
- run Soundex on the n-grams and put these soundexed n-grams in the field 
  as well 
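
To make the steps above concrete, here is a rough plain-Java sketch of the 
terms such an analyzer would produce. It does not use Lucene at all: the class 
NameTokens and its methods are made-up names for illustration, and the Soundex 
is a simplified version that only handles ASCII letters. 

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not a Lucene class: models the analyzer's output terms.
class NameTokens {
    // Soundex digit for each letter a..z ('0' = vowel or otherwise uncoded letter).
    private static final String CODES = "01230120022455012623010202";

    /** All prefixes of the lowercased name with length minGram..maxGram. */
    static List<String> edgeNGrams(String name, int minGram, int maxGram) {
        String s = name.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= Math.min(maxGram, s.length()); len++) {
            grams.add(s.substring(0, len));
        }
        return grams;
    }

    /** Simplified Soundex (assumption: ASCII letters only, others are skipped). */
    static String soundex(String word) {
        StringBuilder sb = new StringBuilder();
        char last = '0';
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c < 'a' || c > 'z') continue;
            char code = CODES.charAt(c - 'a');
            if (sb.length() == 0) {
                sb.append(Character.toUpperCase(c)); // first letter is kept as-is
                last = code;
            } else if (c != 'h' && c != 'w') {       // h and w do not separate equal codes
                if (code != '0' && code != last) sb.append(code);
                last = code;
            }
            if (sb.length() == 4) break;
        }
        if (sb.length() == 0) return "";
        while (sb.length() < 4) sb.append('0');      // pad to letter + three digits
        return sb.toString();
    }

    /** Plain edge n-grams followed by the Soundex code of each n-gram. */
    static List<String> analyze(String name, int minGram, int maxGram) {
        List<String> grams = edgeNGrams(name, minGram, maxGram);
        List<String> out = new ArrayList<>(grams);
        for (String g : grams) out.add(soundex(g));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Robert", 1, 6));
    }
}
```

With this sketch, analyze("Rob", 1, 3) yields [r, ro, rob, R000, R000, R100]. 
The uppercase Soundex codes cannot collide with the lowercased n-grams, so both 
kinds of terms can share one field. 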


The incoming query will then also run through this analyzer, with OR as the 
default operator. So a correct spelling will match the normal n-grams plus the 
soundexed n-grams, leading to a good score. A misspelled name would still match 
the soundexed n-grams, leading to a somewhat lower score. 
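
The ranking argument above can be illustrated with a toy overlap count. This is 
only a stand-in: Lucene's real scoring weighs terms rather than counting them, 
the class OrMatchScore is a made-up name, and the term sets are written out by 
hand (edge n-grams plus their Soundex codes) for the example names "robert" and 
"rupert". 

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of an OR query: count how many query terms hit the indexed terms.
class OrMatchScore {
    /** Number of distinct query terms that also occur among the indexed terms. */
    static int score(Set<String> queryTerms, Set<String> indexedTerms) {
        Set<String> common = new HashSet<>(queryTerms);
        common.retainAll(indexedTerms);
        return common.size();
    }

    static Set<String> terms(String... t) {
        return new HashSet<>(Arrays.asList(t));
    }

    // Hand-written terms for "robert": edge n-grams plus distinct Soundex codes.
    static final Set<String> ROBERT = terms(
            "r", "ro", "rob", "robe", "rober", "robert",
            "R000", "R100", "R160", "R163");

    // Hand-written terms for "rupert", which has the same Soundex codes as "robert".
    static final Set<String> RUPERT = terms(
            "r", "ru", "rup", "rupe", "ruper", "rupert",
            "R000", "R100", "R160", "R163");

    public static void main(String[] args) {
        System.out.println("exact:   " + score(ROBERT, ROBERT));
        System.out.println("similar: " + score(RUPERT, ROBERT));
    }
}
```

Against an indexed "robert", the query "robert" hits all 10 terms, while 
"rupert" hits only 5 (the shared prefix "r" and the four Soundex codes), so the 
exact spelling comes out ahead in this simplified model. 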


My current problem is that I don't know how to duplicate the tokens in the 
analyzer so I can add them as normal n-grams and soundexed n-grams. I suppose 
the TeeSinkTokenFilter will get me there, but I could not figure out how to add 
all tokens back in one stream. 
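
One possible alternative to splitting and merging streams is a single filter 
that emits each incoming token twice: once unchanged, and once encoded at the 
same position (position increment 0), the way stacked synonyms are usually 
represented. The sketch below only models that idea in plain Java; 
DuplicatingFilter, Token and drain are made-up names, and the encoder is passed 
in as a function. In a real Lucene TokenFilter I believe captureState() / 
restoreState() and PositionIncrementAttribute would be the relevant pieces, but 
that is an assumption to check against the Lucene version in use. 

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical plain-Java model of a token filter that duplicates every token.
class DuplicatingFilter implements Iterator<DuplicatingFilter.Token> {
    /** A term plus its position increment (0 = same position as the previous token). */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "/" + positionIncrement;
        }
    }

    private final Iterator<String> input;
    private final Function<String, String> encoder;
    private String pending; // encoded copy waiting to be emitted, or null

    DuplicatingFilter(Iterator<String> input, Function<String, String> encoder) {
        this.input = input;
        this.encoder = encoder;
    }

    @Override
    public boolean hasNext() {
        return pending != null || input.hasNext();
    }

    @Override
    public Token next() {
        if (pending != null) {
            // Emit the encoded copy at the same position as the original.
            Token t = new Token(pending, 0);
            pending = null;
            return t;
        }
        String term = input.next();
        pending = encoder.apply(term); // remember the encoded copy for the next call
        return new Token(term, 1);
    }

    /** Runs the filter over the whole input and collects the emitted tokens. */
    static List<Token> drain(Iterator<String> input, Function<String, String> encoder) {
        List<Token> out = new ArrayList<>();
        new DuplicatingFilter(input, encoder).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        // Upper-casing is only a placeholder for a phonetic encoder such as Soundex.
        System.out.println(drain(Arrays.asList("r", "ro", "rob").iterator(),
                s -> s.toUpperCase()));
    }
}
```

Because both copies come out of one stream in order, nothing has to be merged 
afterwards, which sidesteps the recombination problem in this model. 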


To recap, my questions are: Could this approach work to create a "fuzzy 
suggest"? And how do I use the TeeSinkTokenFilter to separate and recombine the 
token stream? 


I hope that was clear, thanks for your help! 

        

Kai     




