umlauts / diacritic expansion

Michael Sokolov Tue, 16 Apr 2019 11:08:29 -0700

I'm learning how to index/search German today, and I understand that
vowels with umlauts are conventionally expanded into two ASCII
characters, e.g. "für" -> "fuer". So people may search for the expanded
form "fuer", they might search with the diacritic ("für"), or they
might lazily search using the stripped form "fur".
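
To make the three forms concrete, here is a rough sketch in plain Java
of what I mean. It is not a Lucene CharFilter or TokenFilter; the class
and method names (UmlautVariants, expand, strip, variants) are made up
for illustration, and the mapping table only covers the German
characters I know about. The source is kept ASCII-only, so the umlauts
appear as Unicode escapes.

```java
import java.text.Normalizer;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, for illustration only: lists the spellings a user
// might type for a German term containing umlauts.
public class UmlautVariants {

    // Expanded form: a-umlaut -> ae, o-umlaut -> oe, u-umlaut -> ue,
    // sharp s -> ss. Input is normalized to NFC first so that a base letter
    // followed by a combining diaeresis is handled the same way.
    static String expand(String s) {
        String composed = Normalizer.normalize(s, Normalizer.Form.NFC);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < composed.length(); i++) {
            char c = composed.charAt(i);
            switch (c) {
                case '\u00e4': sb.append("ae"); break;
                case '\u00f6': sb.append("oe"); break;
                case '\u00fc': sb.append("ue"); break;
                case '\u00c4': sb.append("Ae"); break;
                case '\u00d6': sb.append("Oe"); break;
                case '\u00dc': sb.append("Ue"); break;
                case '\u00df': sb.append("ss"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Stripped form: decompose to NFD, then drop the combining marks.
    // Sharp s has no decomposition, so it passes through unchanged.
    static String strip(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Original, expanded and stripped forms, de-duplicated, in that order.
    static Set<String> variants(String term) {
        Set<String> out = new LinkedHashSet<>();
        out.add(term);
        out.add(expand(term));
        out.add(strip(term));
        return out;
    }

    public static void main(String[] args) {
        // "f\u00fcr" is "fuer" written with u-umlaut.
        System.out.println(variants("f\u00fcr"));
    }
}
```

For a term without diacritics all three forms collapse into one, so the
set has a single entry.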


My question: is there a standard CharFilter or TokenFilter that
expands to both (ASCII) forms, for characters with umlauts and perhaps
other diacritics I might be unaware of in other languages having
similar multiple renderings in ASCII?

-Mike
