Search Chinese in Unicode !!!

2005-01-21 Thread Eric Chow
How do I create an index from Chinese HTML files (UTF-8 encoded) and search it
with Lucene?
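One point that matters on the indexing side is to decode each HTML file explicitly as UTF-8 rather than with the platform default charset, which on a Chinese Windows machine is usually GBK or Big5. Below is a minimal JDK-only sketch (Java 7 or later, newer than the JDKs of this thread). The class and method names are hypothetical, and the regex tag stripper is a deliberate simplification, not a real HTML parser and not what the Lucene demo does. The resulting string is what you would pass to Lucene, for example through the 1.4 API's Field.Text(String name, String value).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8HtmlText {
    // Decode raw file bytes as UTF-8 explicitly instead of relying on the
    // platform default charset (for example GBK or Big5 on Chinese Windows).
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Crude tag stripper for illustration only: it replaces anything between
    // '<' and '>' with a space and collapses whitespace. Not a real HTML parser.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    // Read an HTML file as UTF-8 and return its approximate visible text,
    // ready to be handed to whatever code builds the Lucene Document.
    static String extractText(Path htmlFile) throws IOException {
        return stripTags(decodeUtf8(Files.readAllBytes(htmlFile)));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Utf8HtmlText file.html");
            return;
        }
        System.out.println(extractText(Paths.get(args[0])));
    }
}
```

A UTF-8 file saved with a byte-order mark (as Windows Notepad does) will keep a leading U+FEFF character with this approach, because Java's UTF-8 decoder does not strip it.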

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Search Chinese in Unicode !!!

2005-01-21 Thread Eric Chow
Searching does not work correctly with UTF-8!


Below is the result I got when running SearchFiles from the Lucene demo:

d:\Downloads\Softwares\Apache\Lucene\lucene-1.4.3\src>java org.apache.lucene.demo.SearchFiles c:\temp\myindex
Usage: java SearchFiles idnex
Query: 
Searching for: g  strange ??
3 total matching documents
0. ../docs/ChineseDemo.htmlthis files contains the 

   -
1. ../docs/luceneplan.html
   - Jakarta Lucene - Plan for enhancements to Lucene
2. ../docs/api/index-all.html
   - Index (Lucene 1.4.3 API)
Query: 



Of the results above, only ChineseDemo.html actually contains the character
I searched for!




The code I modified in SearchFiles.java:


BufferedReader in = new BufferedReader(
    new InputStreamReader(System.in, "UTF-8")); // decode console input as UTF-8
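Forcing UTF-8 on System.in only helps if the console really sends UTF-8 bytes. A Windows console typically sends text in its active code page (936 or 950 on Chinese systems), so decoding those bytes as UTF-8 can corrupt the query before Lucene sees it; the "??" in the "Searching for:" line is consistent with that, though not proof. The JDK-only sketch below makes the input charset a parameter instead of hard-coding it. The class name and the query.encoding system property are hypothetical, and the fallback to the JVM default charset assumes, as on the Chinese Windows JDKs of this era, that the default matches the console code page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class QueryInput {
    // Read the first line of the stream, decoding its bytes with the given charset.
    static String firstLine(InputStream in, Charset cs) {
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(in, cs));
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "query.encoding" is a hypothetical property for this sketch; without it
        // we fall back to the JVM default charset rather than forcing UTF-8.
        String name = System.getProperty("query.encoding", Charset.defaultCharset().name());
        Charset cs = Charset.forName(name);
        System.out.print("Query: ");
        String line = firstLine(System.in, cs);
        System.out.println("Decoded with " + cs + ": " + line);
    }
}
```

Decoding with the wrong charset does not fail loudly. For example, the two characters of "Chinese" encoded as UTF-8 come back as six unrelated characters when read as ISO-8859-1, and the query parser silently searches for those instead.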




Re: Search Chinese in Unicode !!!

2005-01-21 Thread Eric Chow
I would like that Chinese Analyzer!


On Fri, 21 Jan 2005 17:36:17 +0100, Safarnejad, Ali (AFIS)
[EMAIL PROTECTED] wrote:
 I've written a Chinese Analyzer for Lucene that uses a segmenter written by
 Erik Peterson. However, since the author of the segmenter does not want his
 code released under the Apache open source license (although his code _is_
 open source), I cannot place my work in the Lucene Sandbox. This is
 unfortunate, because I believe the analyzer works quite well for indexing and
 searching Chinese documents in GB2312 and UTF-8 encoding, and I would like
 more people to test, use, and confirm this. So anyone who wants it can have
 it. Just shoot me an email.
 BTW, I have also written an Arabic analyzer, which is collecting dust for
 similar reasons.
 Good luck,
 
 Ali Safarnejad
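Ali's analyzer depends on Erik Peterson's dictionary-based segmenter, which is not reproduced here. A simpler technique that needs no dictionary is to index overlapping two-character tokens (bigrams), the approach commonly associated with the CJKAnalyzer in the Lucene sandbox. The plain-Java sketch below shows only that tokenization step. It is not a Lucene Analyzer, not Ali's code, and the class name is hypothetical. As a simplification, every non-Han character just ends a run and is dropped.

```java
import java.util.ArrayList;
import java.util.List;

public class CjkBigrams {
    // True for code points in the Han (Chinese character) script.
    static boolean isHan(int cp) {
        return Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
    }

    // Emit tokens for a finished run of Han characters: overlapping bigrams,
    // or the single character itself if the run has length one.
    private static void flush(List<String> run, List<String> tokens) {
        if (run.size() == 1) {
            tokens.add(run.get(0));
        } else {
            for (int j = 0; j + 1 < run.size(); j++) {
                tokens.add(run.get(j) + run.get(j + 1));
            }
        }
        run.clear();
    }

    // Tokenize text into CJK bigrams; non-Han characters only end a run
    // and are dropped in this simplified sketch.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        List<String> run = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isHan(cp)) {
                run.add(new String(Character.toChars(cp)));
            } else {
                flush(run, tokens);
            }
            i += Character.charCount(cp);
        }
        flush(run, tokens);
        return tokens;
    }

    public static void main(String[] args) {
        // The argument is the four-character Chinese phrase for "Chinese search".
        System.out.println(bigrams("\u4e2d\u6587\u641c\u7d22"));
    }
}
```

Applying the same bigram scheme at index time and at query time lets a two-character query match without any dictionary, at the cost of some spurious matches across word boundaries.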
 
 
 -Original Message-
 From: Eric Chow [mailto:[EMAIL PROTECTED]
 Sent: 21 January 2005 11:42
 To: Lucene Users List
 Subject: Re: Search Chinese in Unicode !!!
 
 





Generate index files into a database?

2005-01-19 Thread Eric Chow
Hello,

How can I store Lucene index files in database tables?

Eric




Search PDF ???

2004-10-24 Thread Eric Chow
Hello,

1. Is it possible to use Lucene to search PDF contents?

2. Can it search PDF files with Chinese content?


Eric




Search PDF, Excel, Word, RTF files

2002-12-18 Thread Eric Chow
Hello,

Is it possible to search PDF, Excel, Word, and RTF files with Lucene?


Could you please give me a simple example?

Best regards,
Eric

==
If you know what you are doing, 
it is not called RESEARCH!
==