Re: [pylucene-dev] Document encoding?

Jarek Zgoda Thu, 08 Mar 2007 00:13:40 -0800

Andi Vajda wrote:

It seems that I can't properly store UTF-8 encoded documents using PyLucene (by "properly" I mean the documents are searchable and can be returned in the form in which they were stored). Should I use only unicode objects in my search/indexing machinery code, as PyLucene returns search results' fields as unicode objects?
PyLucene wraps Java Lucene by compiling it with gcj. Java only uses Unicode. If you pass utf-8 strings to PyLucene APIs, they are converted to Unicode before being passed to the wrapped Java Lucene APIs because that's all they understand.

Conversion from byte strings to unicode requires knowing the source encoding, so I expect this to be a source of problems (not counting confusion like mine)...
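
To illustrate the concern, here is a minimal sketch of decoding at the application boundary, so that only unicode objects ever reach the indexing code. It uses plain Python only and calls no PyLucene APIs; the helper name to_unicode and the sample text are made up for the example, and UTF-8 as the source encoding is an assumption.

```python
# -*- coding: utf-8 -*-

def to_unicode(value, encoding="utf-8"):
    """Return text as a unicode object.

    Hypothetical helper: byte strings are decoded with an explicitly
    named encoding; values that are already unicode pass through.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Sample bytes as they might arrive from a file or socket (assumed UTF-8).
raw = u"Zażółć gęślą jaźń".encode("utf-8")

# Decoding with the right encoding recovers the original text.
text = to_unicode(raw)

# Decoding with a wrong encoding raises no error here, but silently
# produces different text, which would then be indexed and never match.
wrong = raw.decode("latin-1")
```

Decoding once, as early as possible, keeps the encoding decision in one place instead of leaving it to an implicit conversion further down.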


Thank you all.

--
Jarek Zgoda

"We read Knuth so you don't have to."
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
