I'm not aware of any Java library that can reliably extract Chinese text from PDF documents. We're planning on supporting Chinese, Japanese, and Korean in version 2 of PDFTextStream, but there's no doubt that it's a huge challenge.

Chas Emerick   |   [EMAIL PROTECTED]

PDFTextStream: fast PDF text extraction for Java applications
http://snowtide.com/home/PDFTextStream/

On Sep 8, 2004, at 5:58 AM, [EMAIL PROTECTED] wrote:

It is not about the analyzer; I need to read the text from the PDF file first.

----- Original Message -----
From: "Chandan Tamrakar" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, September 08, 2004 4:15 PM
Subject: Re: pdf in Chinese


Which analyzer are you using to index Chinese PDF documents?
I think you should use CJKAnalyzer.
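For context, Lucene's CJKAnalyzer indexes CJK text as overlapping two-character tokens (bigrams) rather than trying to find word boundaries. The class below is a simplified, stdlib-only sketch of that idea, not Lucene's code. The class and method names are hypothetical, it only considers Han (Chinese) characters, and it drops everything else, whereas the real analyzer also tokenizes Latin text and covers Japanese and Korean scripts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of CJK bigram tokenization; not Lucene's CJKTokenizer.
public class CjkBigramSketch {

    // True for code points in the Han script (Chinese ideographs).
    static boolean isHan(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.HAN;
    }

    // Emits overlapping two-character tokens for each run of Han characters.
    // A run of length one becomes a single-character token. Non-Han
    // characters only end the current run and are otherwise dropped.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        List<String> run = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isHan(cp)) {
                run.add(new String(Character.toChars(cp)));
            } else {
                flush(run, tokens);
            }
            i += Character.charCount(cp);
        }
        flush(run, tokens);
        return tokens;
    }

    // Turns the buffered run into tokens and clears it.
    private static void flush(List<String> run, List<String> tokens) {
        if (run.size() == 1) {
            tokens.add(run.get(0));
        } else {
            for (int j = 0; j + 1 < run.size(); j++) {
                tokens.add(run.get(j) + run.get(j + 1));
            }
        }
        run.clear();
    }

    public static void main(String[] args) {
        // bigrams("中文文本") returns [中文, 文文, 文本]
        System.out.println(bigrams("中文文本"));
    }
}
```

Bigrams avoid the need for a Chinese dictionary at the cost of a larger index. None of this helps, however, if the text never comes out of the PDF in the first place.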
----- Original Message -----
From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 08, 2004 11:27 AM
Subject: pdf in Chinese


Hi all,
    I use PDFBox to parse PDF files into Lucene documents. When I parse
Chinese PDF files, PDFBox does not always succeed.
    Does anyone have any advice?
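Before indexing, it can help to detect when extraction from a Chinese PDF has silently failed (for example, output that is empty, full of U+FFFD replacement characters, or mis-decoded into non-Chinese symbols). The check below is a hypothetical heuristic, not part of PDFBox or Lucene: it runs on whatever string the extractor returned, and both the class name and the `minRatio` threshold are assumptions you would tune for your own documents.

```java
// Hypothetical post-extraction sanity check; not a PDFBox or Lucene API.
public class ExtractionCheckSketch {

    // Fraction of non-whitespace code points that belong to the Han script.
    static double hanRatio(String extracted) {
        int han = 0;
        int total = 0;
        for (int i = 0; i < extracted.length(); ) {
            int cp = extracted.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) {
                continue;
            }
            total++;
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                han++;
            }
        }
        return total == 0 ? 0.0 : (double) han / total;
    }

    // Flags output as suspect if it contains U+FFFD replacement characters
    // or if fewer than minRatio of its characters are Chinese ideographs.
    static boolean looksLikeFailedChineseExtraction(String extracted, double minRatio) {
        return extracted.indexOf('\uFFFD') >= 0 || hanRatio(extracted) < minRatio;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeFailedChineseExtraction("????", 0.5)); // true
        System.out.println(looksLikeFailedChineseExtraction("中文文本", 0.5)); // false
    }
}
```

A document mixing Chinese with a lot of English or numbers would need a lower threshold, so treat a flagged result as "inspect manually" rather than as proof of failure.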


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]