Hi, I've been looking for ways to do homophone matching in Solr for CJK languages, starting with Chinese. My inputs are words written in simplified characters, and I need to match words that use different characters but are pronounced the same way.
My conclusion is that I need to index all the possible pinyin representations of a given word. Then, at query time, I would generate all pinyin representations of the searched word and match every document containing any one of them.

My question is: which components can do that in Solr? I've been looking at ICUTransformFilterFactory, but with id="Han-Latin" it seems to do a 1-to-1 mapping between characters and pinyin, while in reality it should be a 1-to-many mapping.

Do you know of any Analyzer that could do something like:
- input: 长
- output: cháng | zhǎng | zháng

Thanks so much for your help!
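
P.S. To make the idea concrete, here is a rough Python sketch of the behaviour I am after, outside of Solr. Everything in it is an assumption made for illustration: the READINGS table is a tiny hand-written sample with tone marks dropped, and the function names (pinyin_variants, build_index, search) are hypothetical, not part of any Solr or ICU API.

```python
from itertools import product

# Hypothetical, hand-written readings table (tone marks dropped).
# A real setup would need a full character-to-readings dictionary.
READINGS = {
    "长": ["chang", "zhang"],  # one character, several readings
    "常": ["chang"],
    "城": ["cheng"],
    "成": ["cheng"],
    "掌": ["zhang"],
}

def pinyin_variants(word):
    """Return every pinyin spelling of `word`, picking one reading per character."""
    # Characters missing from the table are passed through unchanged.
    per_char = [READINGS.get(ch, [ch]) for ch in word]
    return {" ".join(combo) for combo in product(*per_char)}

def build_index(words):
    """Index each word under all of its pinyin spellings."""
    index = {}
    for word in words:
        for key in pinyin_variants(word):
            index.setdefault(key, set()).add(word)
    return index

def search(index, query):
    """Return indexed words sharing at least one pinyin spelling with `query`."""
    hits = set()
    for key in pinyin_variants(query):
        hits |= index.get(key, set())
    return hits

index = build_index(["长城", "常成", "掌"])
print(search(index, "长城"))  # should contain both 长城 and 常成
print(search(index, "长"))    # should contain 掌, via the "zhang" reading
```

The number of variants grows multiplicatively with the number of multi-reading characters in a word, so I assume a real analyzer would have to cap or prune the combinations somehow.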