Andi Vajda wrote:
It seems that I can't properly store UTF-8 encoded documents using
PyLucene (by "properly" I mean the documents are searchable and can be
returned in the form in which they were stored). Should I use only unicode
objects in my search/indexing machinery code, as PyLucene returns
search results' fields as unicode objects?
PyLucene wraps Java Lucene by compiling it with gcj. Java only uses
Unicode.
If you pass utf-8 strings to PyLucene APIs, they are converted to
Unicode before being passed to the wrapped Java Lucene APIs because
that's all they understand.
Conversion from byte strings to unicode requires knowing the source
encoding, so I expect this to be a source of problems (not counting
confusion like mine)...
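
To illustrate the point about source encoding, here is a minimal sketch
in plain Python. It makes no PyLucene calls; to_text is a hypothetical
helper name, and the sample sentence is only an assumption made for the
demonstration. The idea is to decode byte strings explicitly, with a
named encoding, before the text reaches any indexing code, so nothing
has to guess.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper, not part of PyLucene: normalise input to unicode
# text before it is handed to any indexing or search code.
def to_text(value, encoding="utf-8"):
    """Return text; decode byte strings using an explicitly named encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already unicode text, pass through unchanged


# Sample text with Polish diacritics, written with escapes so the source
# file's own encoding does not matter.
original = u"Za\u017c\u00f3\u0142\u0107 g\u0119\u015bl\u0105 ja\u017a\u0144"
raw = original.encode("utf-8")  # the bytes as they would sit in a file

# Decoding with the correct encoding round-trips the text.
decoded = to_text(raw)

# Decoding the same bytes with a wrong assumption (Latin-1 here) raises
# no error, but silently yields different, garbled text.
wrong = raw.decode("latin-1")
```

With the right encoding the stored and returned text match; with the
wrong one the failure is silent, which is probably the kind of problem
I would expect to run into.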
Thank you all.
--
Jarek Zgoda
"We read Knuth so you don't have to."