Hi All, I've indexed some docs[non-english] in unicoded utf=8 format. For both indexing as well as searching/querying I'm using simpleanalyzer. For english texts when I tried with single words its working then I thought of trying for non-english texts. So I wrote those words[multiple words] in babelmap[a unicode converter] and got the unicode for the text string and tried that as query but it din't work. Earlier I've used the same method to query solr index which use lucene at the backend. I tried say this query, \u0938\u0941\u0939\u093E\u0928\u093E\u0020\u0938\u092B\u093C\u0930 which is unicoded for some non-english text, but this give me zero search result in lucene. I want to know whats going wrong. As I know at the end of the day lucene writes my non-english texts in unicodes, so if I'm reading say the index it'll have this kind of characters on the disk, right? So when I query using the same thing it should work. This used to work perfectly well with Solr where I was indexing all docs in unicode utf-8 encoding and the query was also unicoded as show above. Can someone point me what is going wrong here? May be I've to have a look over the analyzer solr was using in the default setting[i used the default setting only, and pretty sure it was using lot many analyzers/filter factory]. Thanks for all your time and appreciation.
Thanks, KK.