org.apache.lucene.demo.FileDocument.Document(File) is invoked from IndexFiles and does:
    Reader reader = new BufferedReader(new InputStreamReader(is));

Notice that the InputStreamReader does not specify an encoding, so the platform default encoding is used. You should probably write your own glue code to build the index, so that each file is read in its appropriate encoding.

Also, please don't cross-post to the dev and user lists; this is an appropriate question for the user list.

Eric

-----Original Message-----
From: Satish Kagathare [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 08, 2004 3:07 AM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: Does Lucene support UNICODE?

Hello,

Does Lucene support Unicode search and indexing of Unicode data (especially Devanagari Unicode data)? Does it make any difference between UTF-8 and UTF-16 documents, given that Java strings use UTF-16?

I tried indexing Devanagari Unicode data (UTF-8 and UTF-16) using IndexFiles and IndexHTML from the Lucene demo, and searching for a query using Tomcat, but it shows only the UTF-8 files, and it also shows files that do not contain the query. It also does not show the summary of the fetched documents in the correct format.

I have also changed the Unicode range in HTMLParser.jj, StandardTokenizer.jj and QueryParser.jj, and in the analyzer used while indexing and parsing the query, but the output does not reflect any changes.

Do I have to write my own analyzer for Devanagari Unicode data, or will StandardAnalyzer work for any language? Or does it require more changes? Please mention problems and solutions.

Thanks in advance,
Satish Kagathara, IIT Bombay.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
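
As a rough illustration of the advice in the reply about reading files in an explicit encoding, here is a minimal sketch. It is not the demo's code: the class ExplicitEncodingReader and its open and readAll helpers are hypothetical names made up for this example, and it assumes a JDK recent enough to have java.nio.charset.StandardCharsets (on older JDKs the InputStreamReader constructor that takes a charset name such as "UTF-8" serves the same purpose). It only shows the decoding step; the resulting Reader would be passed to whatever indexing code currently receives the default-encoding reader.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitEncodingReader {

    // Hypothetical helper: decode the byte stream with a caller-supplied
    // charset instead of relying on the platform default encoding.
    public static Reader open(InputStream is, Charset charset) {
        return new BufferedReader(new InputStreamReader(is, charset));
    }

    // Hypothetical helper: drain a Reader into a String and close it.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        try {
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } finally {
            reader.close();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A short Devanagari sample, written with escapes so the source
        // file's own encoding does not matter.
        String sample = "\u0939\u093f\u0928\u094d\u0926\u0940";

        byte[] utf8 = sample.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = sample.getBytes(StandardCharsets.UTF_16);

        // Each byte array decodes back to the same Java string once the
        // matching charset is named explicitly.
        String fromUtf8 = readAll(open(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8));
        String fromUtf16 = readAll(open(new ByteArrayInputStream(utf16), StandardCharsets.UTF_16));

        System.out.println(sample.equals(fromUtf8));
        System.out.println(sample.equals(fromUtf16));
    }
}
```

The point of the sketch is that the encoding has to be known per file: the same bytes decoded with the wrong charset produce different characters, which is one plausible reason only files matching the default encoding would index as expected.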