I want that Chinese Analyzer!!

On Fri, 21 Jan 2005 17:36:17 +0100, Safarnejad, Ali (AFIS)
<[EMAIL PROTECTED]> wrote:
> I've written a Chinese Analyzer for Lucene that uses a segmenter written by
> Erik Peterson. However, because the author of the segmenter does not want his
> code released under the Apache open source license (although his code _is_
> open source), I cannot place my work in the Lucene Sandbox. This is
> unfortunate, because I believe the analyzer works quite well for indexing and
> searching Chinese documents in GB2312 and UTF-8 encodings, and I'd like more
> people to test, use, and confirm this. So anyone who wants it can have it;
> just shoot me an email.
> BTW, I have also written an Arabic analyzer, which is collecting dust for
> similar reasons.
> Good luck,
> 
> Ali Safarnejad
> 
> 
> -----Original Message-----
> From: Eric Chow [mailto:[EMAIL PROTECTED]
> Sent: 21 January 2005 11:42
> To: Lucene Users List
> Subject: Re: Search Chinese in Unicode !!!
> 
> Search doesn't really work correctly with UTF-8!!!
> 
> The following is the search result I got using SearchFiles from the Lucene
> demo.
> 
> d:\Downloads\Softwares\Apache\Lucene\lucene-1.4.3\src>java
> org.apache.lucene.demo.SearchFiles c:\temp\myindex
> Usage: java SearchFiles <idnex>
> Query: ç
> Searching for: g                                <<<<<<<<<<<<      strange ??
> 3 total matching documents
> 0. ../docs/ChineseDemo.html            <<<<<<<<<<<<    this file contains
> the ç
>   -
> 1. ../docs/luceneplan.html
>   - Jakarta Lucene - Plan for enhancements to Lucene
> 2. ../docs/api/index-all.html
>   - Index (Lucene 1.4.3 API)
> Query:
> 
> Of the results above, only ChineseDemo.html contains the character I
> searched for!
> 
> The modified code in SearchFiles.java:
> 
> BufferedReader in = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
>
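
A minimal, self-contained Java sketch (not part of the original thread) of why the charset passed to InputStreamReader matters: the same input bytes decode to different strings depending on the charset named. The class name CharsetDemo, the helper readLine, and the sample character U+4E2D are illustrative assumptions; the query character in the archived output survives only as "ç", so the real one is unknown.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Hypothetical helper: decodes the first line of `input` using the named
    // charset, mirroring how SearchFiles wraps System.in in an InputStreamReader.
    static String readLine(byte[] input, String charsetName) {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(input), Charset.forName(charsetName)))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed query: U+4E2D, encoded as the three UTF-8 bytes E4 B8 AD.
        byte[] queryBytes = "\u4E2D".getBytes(StandardCharsets.UTF_8);

        // Charset matches the bytes: the single Chinese character comes back.
        String matched = readLine(queryBytes, "UTF-8");
        // Charset does not match: each byte becomes its own Latin-1 character.
        String mismatched = readLine(queryBytes, "ISO-8859-1");

        System.out.println(matched.length());    // prints 1
        System.out.println(mismatched.length()); // prints 3
    }
}
```

If the console hands SearchFiles bytes in an encoding other than the one named in the InputStreamReader, the query is garbled before QueryParser or the analyzer ever sees it. One thing worth checking, then, is that the charset in SearchFiles matches what the console actually produces; a Windows console does not necessarily emit UTF-8.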
