On Jun 2, 2005, at 9:06 PM, Bob Cheung wrote:
Btw, I did try running the Lucene demo (web template) to index the HTML files after I added one containing both English and Chinese characters. I was not able to search for any Chinese in that HTML file (the search returned no hits).
I wonder whether I need to change some of the Java programs to index
Chinese and/or accept Chinese as a search term. I was able to find
the HTML file if I used an English word that appeared in the added HTML
file.

Bob - Andy provided thorough information on the StandardAnalyzer issue (in short, it deals with Unicode directly, not encodings). As for the Lucene demo, you will have to adjust it to read the files in the proper encoding. The IndexFiles program reads files using the platform default encoding, which won't be sufficient for your purpose. The two classes to check are HtmlDocument and FileDocument; they read the HTML and text files that the demo indexes.
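
To illustrate what "read the files in the proper encoding" could look like, here is a minimal sketch in plain Java. It is not the demo's actual code: the class name EncodedFileReader and its methods are made up for this example, and the charset name you pass (for instance "UTF-8", or whatever your HTML files are really saved in) is an assumption you would need to verify. The idea is only to decode the bytes with an explicit charset instead of relying on the platform default.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the Lucene demo.
public class EncodedFileReader {

    // Opens the file and decodes its bytes with the named charset,
    // rather than with the platform default encoding.
    public static Reader open(Path file, String charsetName) throws IOException {
        Charset charset = Charset.forName(charsetName);
        return new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset));
    }

    // Convenience method: reads the whole file into a String,
    // useful for checking that the Chinese text decodes correctly.
    public static String readAll(Path file, String charsetName) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = open(file, charsetName)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

A Reader obtained this way could presumably be used wherever HtmlDocument and FileDocument currently create their own reader, but check the source of the demo version you have before changing it.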

    Erik

