On Jan 21, 2005, at 4:49 AM, Eric Chow wrote:

How to create index with chinese (in utf-8 encoding ) HTML and search
with Lucene ?

Indexing and searching Chinese basically is no different than using English with Lucene. We covered a bit about it in Lucene in Action:


        http://www.lucenebook.com/search?query=chinese

And a screenshot here:

        http://www.blogscene.org/erik/LuceneInAction/i18n.html

The main issues of dealing with Chinese, and of course other languages, are encoding concerns in both indexing and querying of reading in the text and analysis (as you can see from the screenshot).

Lucene itself works with Unicode fine and you're free to index anything.

        Erik


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to