[Jakarta Lucene Wiki] Updated: IndexingOtherLanguages

lucene-cvs Thu, 08 Jul 2004 06:30:04 -0700

   Date: 2004-07-08T06:30:01
   Editor: 128.230.38.21 <>
   Wiki: Jakarta Lucene Wiki
   Page: IndexingOtherLanguages
   URL: http://wiki.apache.org/jakarta-lucene/IndexingOtherLanguages


   no comment

Change Log:

------------------------------------------------------------------------------
@@ -10,7 +10,7 @@
 
  1. Know the encoding of the documents you wish to index.  Java assumes the native encoding when reading in files unless you tell it otherwise.  To create a Reader that supports reading in other encodings, see [http://java.sun.com/j2se/1.4.2/docs/api/java/io/InputStreamReader.html InputStreamReader].  I find it easiest to convert all of my files to UTF-8 before indexing, and then I read them in by doing:[[BR]]
     `Reader reader = new InputStreamReader(new FileInputStream("path to file"), "UTF-8");`
-Note:  The demo supplied with Lucene does not support UTF-8 out of the box.  You will have to modify it.
+    
 
  2. Identify the Analyzer you will use or write your own if none exists.  There are many great analyzers available that will index a wide variety of languages.  See [http://jakarta.apache.org/lucene/docs/lucene-sandbox/ Sandbox] for some.  Otherwise, look around the web.  If you are writing your own, consider donating it to the Lucene Sandbox so that others can benefit from your brilliance.  See item 3. below for what is needed in a custom analyzer.
      'Put example of writing an Analyzer here'
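Item 1 recommends converting every file to UTF-8 before indexing and then reading it back with an `InputStreamReader`. The following is a minimal sketch of that round trip using only `java.io`, assuming you already know each file's source encoding (ISO-8859-1 in the demo). The class `Utf8Converter` and its methods are hypothetical helpers for illustration, not part of Lucene.

```java
import java.io.*;

// Hypothetical helper class: not part of Lucene, shown only to illustrate item 1.
public class Utf8Converter {

    /** Decodes src using srcEncoding and writes the same characters to dest as UTF-8. */
    public static void convertToUtf8(File src, String srcEncoding, File dest) throws IOException {
        // Each stream is its own resource, so nothing leaks if a Reader/Writer constructor throws.
        try (InputStream is = new FileInputStream(src);
             Reader in = new InputStreamReader(is, srcEncoding);
             OutputStream os = new FileOutputStream(dest);
             Writer out = new OutputStreamWriter(os, "UTF-8")) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } // resources close in reverse order, so the Writer is flushed before the files close
    }

    /** Reads a whole UTF-8 file into a String, using the same Reader construction as item 1. */
    public static String readUtf8(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (InputStream is = new FileInputStream(file);
             Reader reader = new InputStreamReader(is, "UTF-8")) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File latin1 = File.createTempFile("latin1-", ".txt");
        File utf8 = File.createTempFile("utf8-", ".txt");
        try {
            // "caf\u00e9" is 4 bytes in ISO-8859-1; the accented e becomes 2 bytes in UTF-8.
            try (OutputStream os = new FileOutputStream(latin1)) {
                os.write("caf\u00e9".getBytes("ISO-8859-1"));
            }
            convertToUtf8(latin1, "ISO-8859-1", utf8);
            System.out.println(latin1.length() + " -> " + utf8.length()); // prints "4 -> 5"
            System.out.println(readUtf8(utf8).equals("caf\u00e9"));       // prints "true"
        } finally {
            latin1.delete();
            utf8.delete();
        }
    }
}
```

`InputStreamReader` substitutes the replacement character U+FFFD for byte sequences that are invalid in the named encoding instead of throwing. With a single-byte encoding such as ISO-8859-1, every byte decodes to some character, so a wrong guess about the source encoding silently produces garbled text. That is why knowing the encoding up front matters.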

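Item 2 leaves its Analyzer example as a placeholder. The sketch below does not use Lucene's API. It only illustrates, with the JDK's `java.text.BreakIterator`, the two steps a typical analyzer applies to text in another language: splitting it into words according to locale-specific boundary rules and normalizing case. The class `TokenizeSketch` and its `tokenize` method are hypothetical. Wrapping the same logic in a real Lucene `Analyzer` is left to the Lucene documentation.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of analyzer-style processing; not a Lucene Analyzer.
public class TokenizeSketch {

    /** Splits text into words using the locale's word-boundary rules and lowercases each word. */
    public static List<String> tokenize(String text, Locale locale) {
        List<String> tokens = new ArrayList<>();
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String piece = text.substring(start, end);
            // BreakIterator also returns spaces and punctuation as segments;
            // keep only segments containing at least one letter or digit.
            if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                // Locale-aware lowercasing (e.g. Turkish maps "I" to dotless "\u0131").
                tokens.add(piece.toLowerCase(locale));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, World", Locale.ENGLISH)); // prints "[hello, world]"
    }
}
```

Passing the locale to `toLowerCase` matters for languages such as Turkish, where uppercase I lowercases to a dotless ı rather than to i.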