Hello,

OK, I've attached my test code for Korean, which is a slightly modified version of Koji's code.
Just put the files into the lia.analysis.i18n package of LuceneInAction and run ant.
Hopefully it helps someone.

-------- build.xml ---------
<target name="JapaneseDemo" depends="prepare"
        description="Examples of Japanese analysis">
  <info>
    Japanese Test...
  </info>
  <run-main class="lia.analysis.i18n.JapaneseDemo"/>
</target>

<target name="KoreanDemo" depends="prepare"
        description="Examples of Korean analysis">
  <info>
    Korean Test...
  </info>
  <run-main class="lia.analysis.i18n.KoreanDemo"/>
</target>

Thanks,

Youngho

----- Original Message -----
From: "Youngho Cho" <[EMAIL PROTECTED]>
To: <java-user@lucene.apache.org>; "Youngho Cho" <[EMAIL PROTECTED]>
Sent: Thursday, October 27, 2005 12:47 PM
Subject: Re: korean and lucene

> Hello all
>
> Please forgive my previous stupid message.
>
> [echo] Running lia.analysis.i18n.KoreanDemo...
> [java] [경] [기] analyzer = org.apache.lucene.analysis.standard.StandardAnalyzer
> [java] phrase = 경기
> [java] query = "경 기"
>
> I got the good result.
>
> When I compiled, I had only renamed the old lucene-1.4.3.jar to
> lucene-1.4.3.jar_bak, added the new 1.9 Lucene, and built the test package.
> After I removed lucene-1.4.3.jar_bak from the lib directory completely,
> I got the expected result !!!
>
> I don't know the reason... ( looks like my fingers made some trouble... )
>
> Anyway, thanks Koji and Cheolgoo.
> I will test further now...
>
> Youngho
>
>
> ----- Original Message -----
> From: "Youngho Cho" <[EMAIL PROTECTED]>
> To: <java-user@lucene.apache.org>
> Sent: Thursday, October 27, 2005 12:28 PM
> Subject: Re: korean and lucene
>
>
> > Hello Koji
> >
> > Here is the test result.
> > Japanese is OK!
> > Maybe ant clean had some effect.
> >
> > Anyway, please refer to the result using 1.9
> >
> > [echo] Running lia.analysis.i18n.JapaneseDemo...
> > [java] [ラ] [メ] [ン] [屋] analyzer = org.apache.lucene.analysis.standard.StandardAnalyzer
> > [java] phrase = ラ?メン屋
> > [java] query = content:ラ?メン屋
> >
> > [echo] Running lia.analysis.i18n.KoreanDemo...
> > [java] analyzer = org.apache.lucene.analysis.standard.StandardAnalyzer
> > [java] phrase = 경
> > [java] query =
> >
> > [echo] Running lia.analysis.i18n.JapaneseDemo...
> > [java] [ラ] [メン] [ン屋] analyzer = org.apache.lucene.analysis.cjk.CJKAnalyzer
> > [java] phrase = ラ?メン屋
> > [java] query = content:ラ?メン屋
> >
> > [echo] Running lia.analysis.i18n.KoreanDemo...
> > [java] [경] analyzer = org.apache.lucene.analysis.cjk.CJKAnalyzer
> > [java] phrase = 경
> > [java] query = 경
> >
> > [echo] Running lia.analysis.i18n.KoreanDemo...
> > [java] analyzer = org.apache.lucene.analysis.standard.StandardAnalyzer
> > [java] phrase = 경기
> > [java] query =
> >
> > [echo] Running lia.analysis.i18n.KoreanDemo...
> > [java] [경기] analyzer = org.apache.lucene.analysis.cjk.CJKAnalyzer
> > [java] phrase = 경기
> > [java] query = 경기
> >
> >
> > StandardAnalyzer didn't tokenize the Korean characters at all....
> >
> > Ug.... looks like
> > http://issues.apache.org/jira/browse/LUCENE-444
> > had no effect at all for Korean.
> >
> >
> > Thanks
> >
> > Youngho
> >
> > ----- Original Message -----
> > From: "Koji Sekiguchi" <[EMAIL PROTECTED]>
> > To: <java-user@lucene.apache.org>; "Youngho Cho" <[EMAIL PROTECTED]>
> > Sent: Thursday, October 27, 2005 11:47 AM
> > Subject: RE: korean and lucene
> >
> >
> > > Hello Youngho,
> > >
> > > I don't understand why you couldn't get hits result in Japanese,
> > > though, you had better check why the query was empty with Korean data:
> > >
> > > > For Korean
> > > > [echo] Running lia.analysis.i18n.KoreanDemo...
> > > > [java] phrase = 경
> > > > [java] query =
> > >
> > > The last line should be query = 경
> > > to get hits result. Can you check why StandardAnalyzer
> > > removes "경" during tokenizing?
> > >
> > > Koji
> > >
> > > > -----Original Message-----
> > > > From: Youngho Cho [mailto:[EMAIL PROTECTED]
> > > > Sent: Thursday, October 27, 2005 11:37 AM
> > > > To: java-user@lucene.apache.org
> > > > Subject: Re: korean and lucene
> > > >
> > > >
> > > > Hello Koji,
> > > >
> > > > Thanks for your kind reply.
> > > >
> > > > Yes, I used QueryParser. Normally I use the
> > > > Query = QueryParser.parse( ) method.
> > > >
> > > > I put your sample code into the lia.analysis.i18n package in LuceneInAction
> > > > and ran JapaneseDemo using 1.4 and 1.9
> > > >
> > > > results are
> > > >
> > > > [echo] Running lia.analysis.i18n.JapaneseDemo...
> > > > [java] query = content:ラ?メン屋
> > > >
> > > > I can't get hits result.
> > > >
> > > > For Korean
> > > > [echo] Running lia.analysis.i18n.KoreanDemo...
> > > > [java] phrase = 경
> > > > [java] query =
> > > >
> > > > I can't get query parse result.
> > > >
> > > > Thanks,
> > > >
> > > > Youngho
> > > >
> > > >
> > > > ----- Original Message -----
> > > > From: "Koji Sekiguchi" <[EMAIL PROTECTED]>
> > > > To: <java-user@lucene.apache.org>; "Youngho Cho" <[EMAIL PROTECTED]>
> > > > Sent: Thursday, October 27, 2005 9:48 AM
> > > > Subject: RE: korean and lucene
> > > >
> > > >
> > > > > Hi Youngho,
> > > > >
> > > > > With regard to Japanese, using StandardAnalyzer,
> > > > > I can search a word/phrase.
> > > > >
> > > > > Did you use QueryParser? StandardAnalyzer tokenizes
> > > > > CJK characters into a stream of single characters.
> > > > > Use QueryParser to get a PhraseQuery and search the query.
> > > > >
> > > > > Please see the following sample code. Replace Japanese
> > > > > "contents" and (search target) "phrase" with Korean in the
> > > > > program and run.
> > > > >
> > > > > regards,
> > > > >
> > > > > Koji
> > > > >
> > > > > =============================================
> > > > > import java.io.IOException;
> > > > > import org.apache.lucene.analysis.Analyzer;
> > > > > import org.apache.lucene.analysis.standard.StandardAnalyzer;
> > > > > import org.apache.lucene.analysis.cjk.CJKAnalyzer;
> > > > > import org.apache.lucene.store.Directory;
> > > > > import org.apache.lucene.store.RAMDirectory;
> > > > > import org.apache.lucene.index.IndexWriter;
> > > > > import org.apache.lucene.document.Document;
> > > > > import org.apache.lucene.document.Field;
> > > > > import org.apache.lucene.search.IndexSearcher;
> > > > > import org.apache.lucene.search.Hits;
> > > > > import org.apache.lucene.search.Query;
> > > > > import org.apache.lucene.queryParser.QueryParser;
> > > > > import org.apache.lucene.queryParser.ParseException;
> > > > >
> > > > > public class JapaneseByStandardAnalyzer {
> > > > >
> > > > >     private static final String FIELD_CONTENT = "content";
> > > > >     private static final String[] contents = {
> > > > >         "東京にはおいしいラーメン屋がたくさんあります。",
> > > > >         "北海道にもおいしいラーメン屋があります。"
> > > > >     };
> > > > >     private static final String phrase = "ラーメン屋";
> > > > >     //private static final String phrase = "屋";
> > > > >     private static Analyzer analyzer = null;
> > > > >
> > > > >     public static void main( String[] args ) throws IOException, ParseException {
> > > > >         Directory directory = makeIndex();
> > > > >         search( directory );
> > > > >         directory.close();
> > > > >     }
> > > > >
> > > > >     private static Analyzer getAnalyzer(){
> > > > >         if( analyzer == null ){
> > > > >             analyzer = new StandardAnalyzer();
> > > > >             //analyzer = new CJKAnalyzer();
> > > > >         }
> > > > >         return analyzer;
> > > > >     }
> > > > >
> > > > >     private static Directory makeIndex() throws IOException {
> > > > >         Directory directory = new RAMDirectory();
> > > > >         IndexWriter writer = new IndexWriter( directory, getAnalyzer(), true );
> > > > >         for( int i = 0; i < contents.length; i++ ){
> > > > >             Document doc = new Document();
> > > > >             doc.add( new Field( FIELD_CONTENT, contents[i],
> > > > >                 Field.Store.YES, Field.Index.TOKENIZED ) );
> > > > >             writer.addDocument( doc );
> > > > >         }
> > > > >         writer.close();
> > > > >         return directory;
> > > > >     }
> > > > >
> > > > >     private static void search( Directory directory ) throws IOException, ParseException {
> > > > >         IndexSearcher searcher = new IndexSearcher( directory );
> > > > >         QueryParser parser = new QueryParser( FIELD_CONTENT, getAnalyzer() );
> > > > >         Query query = parser.parse( phrase );
> > > > >         System.out.println( "query = " + query );
> > > > >         Hits hits = searcher.search( query );
> > > > >         for( int i = 0; i < hits.length(); i++ )
> > > > >             System.out.println( "doc = " + hits.doc( i ).get( FIELD_CONTENT ) );
> > > > >         searcher.close();
> > > > >     }
> > > > > }
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Youngho Cho [mailto:[EMAIL PROTECTED]
> > > > > > Sent: Thursday, October 27, 2005 8:18 AM
> > > > > > To: java-user@lucene.apache.org; Cheolgoo Kang
> > > > > > Subject: Re: korean and lucene
> > > > > >
> > > > > >
> > > > > > Hello Cheolgoo,
> > > > > >
> > > > > > Now I updated my lucene version to 1.9 for using StandardAnalyzer
> > > > > > for Korean.
> > > > > > And tested your patch which is already adopted in 1.9
> > > > > >
> > > > > > http://issues.apache.org/jira/browse/LUCENE-444
> > > > > >
> > > > > > But Still I have no good results with Korean compare with
> > > > > > CJKAnalyzer.
> > > > > >
> > > > > > Single character is good match but more two character word
> > > > > > doesn't match at all.
> > > > > >
> > > > > > Am I something missing or still there need some more works ?
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Youngho.
> > > > > >
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > From: "Cheolgoo Kang" <[EMAIL PROTECTED]>
> > > > > > To: <java-user@lucene.apache.org>; "John Wang" <[EMAIL PROTECTED]>
> > > > > > Sent: Tuesday, October 04, 2005 10:11 AM
> > > > > > Subject: Re: korean and lucene
> > > > > >
> > > > > >
> > > > > > > StandardAnalyzer's JavaCC based StandardTokenizer.jj cannot read
> > > > > > > Korean part of Unicode character blocks.
> > > > > > >
> > > > > > > You should 1) use CJKAnalyzer or 2) add Korean character
> > > > > > > block(0xAC00~0xD7AF) to the CJK token definition on the
> > > > > > > StandardTokenizer.jj file.
> > > > > > >
> > > > > > > Hope it helps.
> > > > > > >
> > > > > > >
> > > > > > > On 10/4/05, John Wang <[EMAIL PROTECTED]> wrote:
> > > > > > > > Hi:
> > > > > > > >
> > > > > > > > We are running into problems with searching on korean
> > > > > > > > documents. We are using the StandardAnalyzer and everything
> > > > > > > > works with Chinese and Japanese.
> > > > > > > > Are there known problems with Korean with Lucene?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > -John
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Cheolgoo
> > > > > > >
> > > > > > > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > > > > For additional commands, e-mail: [EMAIL PROTECTED]
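
A note on the range in Cheolgoo's message above: 0xAC00~0xD7AF is the Unicode Hangul Syllables block, and both test characters used in this thread (경 = U+ACBD, 기 = U+AE30) fall inside it. The following is only a sketch using the plain JDK, not Lucene. The class name HangulRangeCheck and its helper method are made up for illustration, and it says nothing about how StandardTokenizer.jj itself is written.

```java
public class HangulRangeCheck {

    // Range quoted in the thread for the Korean character block.
    static final int HANGUL_START = 0xAC00;
    static final int HANGUL_END = 0xD7AF;

    // Hypothetical helper: true if c lies in the quoted range.
    public static boolean isHangulSyllable(char c) {
        return c >= HANGUL_START && c <= HANGUL_END;
    }

    public static void main(String[] args) {
        // "\uACBD\uAE30" is the test phrase 경기, written with escapes
        // so the source compiles regardless of file encoding.
        String phrase = "\uACBD\uAE30";
        for (char c : phrase.toCharArray()) {
            System.out.printf("U+%04X inRange=%b jdkBlock=%s%n",
                    (int) c, isHangulSyllable(c), Character.UnicodeBlock.of(c));
        }
    }
}
```

If a tokenizer's CJK definition does not cover this range, it is plausible that such characters are dropped, which would match the empty "query =" lines in the earlier output.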
KoreanDemo.java
Description: java/
JapaneseDemo.java
Description: java/
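
For anyone comparing the token lists in the quoted output ([경] [기] from StandardAnalyzer versus [경기] from CJKAnalyzer), here is a rough sketch of the two tokenization styles: one token per character versus overlapping two-character tokens. This is not Lucene's implementation. The class NgramSketch and its methods are hypothetical, and the sketch assumes a short run of CJK text with no spaces or mixed scripts.

```java
import java.util.ArrayList;
import java.util.List;

public class NgramSketch {

    // One token per character, like the [경] [기] output in the thread.
    public static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(text.substring(i, i + 1));
        }
        return tokens;
    }

    // Overlapping two-character tokens, like the [メン] [ン屋] output.
    // A single character is kept as a one-character token.
    public static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        if (text.length() == 1) {
            tokens.add(text);
            return tokens;
        }
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String phrase = "\uACBD\uAE30"; // 경기
        System.out.println("unigrams = " + unigrams(phrase));
        System.out.println("bigrams  = " + bigrams(phrase));
    }
}
```

With one token per character, a two-character word becomes two tokens, which is why Koji suggests going through QueryParser to get a PhraseQuery (the "경 기" query in the output) rather than a single term.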