encoding question.

Mohammad Norouzi Tue, 13 Feb 2007 21:47:29 -0800

Hi,
I want to index data in UTF-8 encoding, so when adding a field to a document
I use the code new String(value.getBytes("utf-8")).
On the other hand, when searching I used the same snippet to convert the
query to UTF-8, but it did not work. Eventually I found a suggestion
to use new String(valueToSearch.getBytes("cp1252"),"UTF8") instead.
That worked, but I still have some problems.
First, some characters are garbled in the results I get back from Lucene; they
seem to be in cp1252 encoding.
Second, if the Java system property "file.encoding" is not cp1252, the
results come back in a completely wrong encoding, so I have to set this property
using System.setProperty("file.encoding","cp1252").
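
To make the two conversions above concrete, here is a minimal sketch of what they appear to do. It assumes the platform default charset is cp1252, which is what new String(bytes) falls back to when no charset is given. The class and method names are made up for illustration and are not part of Lucene.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingRoundTrip {
    // Assumption: "file.encoding" is cp1252, so this stands in for the default charset.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Same effect as new String(value.getBytes("utf-8")) under a cp1252 default:
    // the UTF-8 bytes are decoded again as cp1252, one character per byte.
    static String indexSideConversion(String value) {
        return new String(value.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // Same effect as new String(value.getBytes("cp1252"), "UTF8"):
    // it reverses the conversion above.
    static String searchSideConversion(String value) {
        return new String(value.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // e with acute accent, one character
        String converted = indexSideConversion(original);

        System.out.println(converted.length());                                  // 2
        System.out.println(converted.equals("\u00c3\u00a9"));                    // true
        System.out.println(searchSideConversion(converted).equals(original));    // true
    }
}
```

If this reading is right, the string stored in the index is no longer the original text but its UTF-8 bytes reinterpreted as cp1252 characters. Note also that cp1252 leaves a few byte values unassigned, so this round trip may not be lossless for every character.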


Is Lucene ignoring my UTF-8 encoding and indexing the data as cp1252?
How can I fix the garbled characters I get back when searching?

Thank you very much in advance.
--
Regards,
Mohammad
