Hi All,
I've indexed some non-English docs in Unicode (UTF-8) format. For both
indexing and searching/querying I'm using SimpleAnalyzer. For English
text, single-word queries were working, so I thought I'd try non-English
text. I typed those words (a multi-word phrase) into BabelMap (a Unicode
converter), took the Unicode escapes for the text string, and tried that as
the query, but it didn't work. I've used the same method earlier to query a
Solr index, which uses Lucene at the backend. For example, I tried this query,

\u0938\u0941\u0939\u093E\u0928\u093E\u0020\u0938\u092B\u093C\u0930

which is the Unicode-escaped form of some non-English text, but it gives me
zero search results in Lucene. I want to know what's going wrong. As far as
I know, at the end of the day Lucene writes my non-English text as Unicode,
so if I read the index it should have these kinds of characters on disk,
right? So when I query with the same thing it should work. This used to work
perfectly well with Solr, where I was indexing all docs in UTF-8 encoding and
the query was also Unicode-escaped as shown above. Can someone point out what
is going wrong here?
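
For concreteness, this is roughly what I'm doing; a simplified sketch written
from memory against a Lucene 3.x-era API (constructor signatures vary a bit
between versions), and in my real code the docs are read from files through a
UTF-8 InputStreamReader rather than hard-coded:

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class UnicodeQueryTest {
    public static void main(String[] args) throws Exception {
        // The same non-English phrase, written with Java \u escapes;
        // the compiler turns these into real characters in the String.
        String text = "\u0938\u0941\u0939\u093E\u0928\u093E \u0938\u092B\u093C\u0930";

        Directory dir = new RAMDirectory();
        SimpleAnalyzer analyzer = new SimpleAnalyzer(Version.LUCENE_36);

        // Index one document with the same analyzer used later for querying.
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_36, analyzer));
        Document doc = new Document();
        doc.add(new Field("content", text, Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(doc);
        writer.close();

        // Query with the identical string; I expect 1 hit but get 0.
        IndexSearcher searcher = new IndexSearcher(IndexReader.open(dir));
        QueryParser parser = new QueryParser(Version.LUCENE_36, "content", analyzer);
        Query query = parser.parse(text);
        TopDocs hits = searcher.search(query, 10);
        System.out.println("hits: " + hits.totalHits);
        searcher.close();
    }
}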
Maybe I have to take a look at the analyzer Solr was using in its default
configuration (I used only the default settings, and I'm pretty sure it chains
together quite a few tokenizers and filter factories), and compare that with
what SimpleAnalyzer is doing to this text; a rough sketch of how I'd dump the
tokens is below. Thanks for all your time and help.
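
This is the token-dump sketch I mean (again written from memory against the
3.x-era API; older releases use TermAttribute instead of CharTermAttribute):

import java.io.StringReader;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ShowTokens {
    public static void main(String[] args) throws Exception {
        String text = "\u0938\u0941\u0939\u093E\u0928\u093E \u0938\u092B\u093C\u0930";
        SimpleAnalyzer analyzer = new SimpleAnalyzer(Version.LUCENE_36);

        // Print each token the analyzer emits for the phrase, in brackets,
        // so any unexpected splitting of the characters is easy to spot.
        TokenStream ts = analyzer.tokenStream("content", new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.println("[" + term.toString() + "]");
        }
        ts.end();
        ts.close();
    }
}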

Thanks,
KK.
