I want that Chinese Anayzer !!
On Fri, 21 Jan 2005 17:36:17 +0100, Safarnejad, Ali (AFIS) <[EMAIL PROTECTED]> wrote: > I've written a Chinese Analyzer for Lucene that uses a segmenter written by > Erik Peterson. However, as the author of the segmenter does not want his code > released under apache open source license (although his code _is_ > opensource), I cannot place my work in the Lucene Sandbox. This is > unfortunate, because I believe the analyzer works quite well in indexing and > searching chinese docs in GB2312 and UTF-8 encoding, and I like more people > to test, use, and confirm this. So anyone who wants it, can have it. Just > shoot me an email. > BTW, I also have written an arabic analyzer, which is collecting dust for > similar reasons. > Good luck, > > Ali Safarnejad > > > -----Original Message----- > From: Eric Chow [mailto:[EMAIL PROTECTED] > Sent: 21 January 2005 11:42 > To: Lucene Users List > Subject: Re: Search Chinese in Unicode !!! > > Search not really correct with UTF-8 !!! > > The following is the search result that I used the SearchFiles in the lucene > demo. > > d:\Downloads\Softwares\Apache\Lucene\lucene-1.4.3\src>java > org.apache.lucene.demo.SearchFiles c:\temp\myindex > Usage: java SearchFiles <idnex> > Query: ç > Searching for: g <<<<<<<<<<<< strange ?? > 3 total matching documents > 0. ../docs/ChineseDemo.html <<<<<<<<<<<< this files contains > the ç > - > 1. ../docs/luceneplan.html > - Jakarta Lucene - Plan for enhancements to Lucene > 2. ../docs/api/index-all.html > - Index (Lucene 1.4.3 API) > Query: > > From the above result only the ChineseDemo.html includes the character that I > want to search ! > > The modified code in SearchFiles.java: > > BufferedReader in = new BufferedReader(new InputStreamReader(System.in, > "UTF-8")); > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]