Hi Youngho,

With Japanese, I can search for a word or phrase using StandardAnalyzer.
Did you use QueryParser? StandardAnalyzer tokenizes CJK characters into a
stream of single characters, so a word of two or more characters has to be
searched as a phrase. Pass the search string through QueryParser with the
same analyzer: it turns the multi-character term into a PhraseQuery, and you
can then search with that query.

Please see the following sample code. Replace the Japanese "contents" and
the search target "phrase" with Korean text in the program and run it.

regards,

Koji

=============================================
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.Query;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.queryParser.ParseException;

public class JapaneseByStandardAnalyzer {

    private static final String FIELD_CONTENT = "content";

    // Documents to index.
    private static final String[] contents = {
        "東京にはおいしいラーメン屋がたくさんあります。",
        "北海道にもおいしいラーメン屋があります。"
    };

    // Search target; switch to the single-character variant to compare.
    private static final String phrase = "ラーメン屋";
    //private static final String phrase = "屋";

    private static Analyzer analyzer = null;

    public static void main( String[] args ) throws IOException, ParseException {
        Directory directory = makeIndex();
        search( directory );
        directory.close();
    }

    // The same analyzer must be used for indexing and for query parsing.
    private static Analyzer getAnalyzer(){
        if( analyzer == null ){
            analyzer = new StandardAnalyzer();
            //analyzer = new CJKAnalyzer();
        }
        return analyzer;
    }

    // Build an in-memory index with one document per entry in "contents".
    private static Directory makeIndex() throws IOException {
        Directory directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter( directory, getAnalyzer(), true );
        for( int i = 0; i < contents.length; i++ ){
            Document doc = new Document();
            doc.add( new Field( FIELD_CONTENT, contents[i],
                                Field.Store.YES, Field.Index.TOKENIZED ) );
            writer.addDocument( doc );
        }
        writer.close();
        return directory;
    }

    // Parse "phrase" with QueryParser, print the resulting query and the hits.
    private static void search( Directory directory ) throws IOException, ParseException {
        IndexSearcher searcher = new IndexSearcher( directory );
        QueryParser parser = new QueryParser( FIELD_CONTENT, getAnalyzer() );
        Query query = parser.parse( phrase );
        System.out.println( "query = " + query );
        Hits hits = searcher.search( query );
        for( int i = 0; i < hits.length(); i++ ){
            System.out.println( "doc = " + hits.doc( i ).get( FIELD_CONTENT ) );
        }
        searcher.close();
    }
}
=============================================

> -----Original Message-----
> From: Youngho Cho [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 27, 2005 8:18 AM
> To: java-user@lucene.apache.org; Cheolgoo Kang
> Subject: Re: korean and lucene
>
>
> Hello Cheolgoo,
>
> Now I updated my lucene version to 1.9 for using StandardAnalyzer for Korean.
> And tested your patch which is already adopted in 1.9
>
> http://issues.apache.org/jira/browse/LUCENE-444
>
> But Still I have no good results with Korean compare with CJKAnalyzer.
>
> Single character is good match but more two character word doesn't match at all.
>
> Am I something missing or still there need some more works ?
>
>
> Thanks,
>
> Youngho.
>
>
> ----- Original Message -----
> From: "Cheolgoo Kang" <[EMAIL PROTECTED]>
> To: <java-user@lucene.apache.org>; "John Wang" <[EMAIL PROTECTED]>
> Sent: Tuesday, October 04, 2005 10:11 AM
> Subject: Re: korean and lucene
>
>
> > StandardAnalyzer's JavaCC based StandardTokenizer.jj cannot read
> > Korean part of Unicode character blocks.
> >
> > You should 1) use CJKAnalyzer or 2) add Korean character
> > block(0xAC00~0xD7AF) to the CJK token definition on the
> > StandardTokenizer.jj file.
> >
> > Hope it helps.
> >
> >
> > On 10/4/05, John Wang <[EMAIL PROTECTED]> wrote:
> > > Hi:
> > >
> > > We are running into problems with searching on korean documents. We are
> > > using the StandardAnalyzer and everything works with Chinese and Japanese.
> > > Are there known problems with Korean with Lucene?
> > >
> > > Thanks
> > >
> > > -John
> > >
> >
> > --
> > Cheolgoo
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
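
As a rough illustration of why a multi-character CJK search term has to become a phrase when the analyzer emits single-character tokens, here is a minimal sketch that does not use Lucene at all. The class UnigramPhraseSketch and its methods unigrams() and phraseMatches() are hypothetical names made up for this example. The tokenizer only imitates the one-token-per-character behaviour described in this message, and the matcher only imitates the idea behind a phrase query (the same tokens at consecutive positions). It is not how Lucene is implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; no Lucene classes are involved.
public class UnigramPhraseSketch {

    // Emit one token per letter or digit code point and drop everything else
    // (whitespace, punctuation). This is an assumption that mimics
    // single-character tokenization; it is not Lucene's StandardTokenizer.
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                tokens.add(new String(Character.toChars(cp)));
            }
            i += Character.charCount(cp);
        }
        return tokens;
    }

    // A phrase matches when all of its tokens occur in the document
    // at consecutive positions and in the same order.
    static boolean phraseMatches(List<String> doc, List<String> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(doc, phrase) >= 0;
    }

    public static void main(String[] args) {
        String[] contents = {
            "東京にはおいしいラーメン屋がたくさんあります。",
            "北海道にもおいしいラーメン屋があります。"
        };
        String phrase = "ラーメン屋";

        // The search string is tokenized the same way as the documents.
        List<String> phraseTokens = unigrams(phrase);
        System.out.println("phrase tokens = " + phraseTokens);

        for (String content : contents) {
            boolean hit = phraseMatches(unigrams(content), phraseTokens);
            System.out.println((hit ? "match    : " : "no match : ") + content);
        }
    }
}
```

With the two Japanese sentences from the sample program, both documents should be reported as matches, because the five single-character tokens of the search string appear next to each other in each sentence. Replacing the strings with Korean text only shows what happens under this sketch's own tokenization rule; whether the real StandardAnalyzer splits Korean the same way is exactly what has to be checked against your Lucene version.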