Re: pdf in Chinese

Chas Emerick Wed, 08 Sep 2004 07:19:51 -0700

I'm not aware of any Java library that can reliably extract Chinese text from PDF documents. We're planning on supporting Chinese, Japanese, and Korean in version 2 of PDFTextStream, but there's no doubt that it's a huge challenge.

Chas Emerick   |   [EMAIL PROTECTED]

PDFTextStream: fast PDF text extraction for Java applications
http://snowtide.com/home/PDFTextStream/

On Sep 8, 2004, at 5:58 AM, [EMAIL PROTECTED] wrote:

it is not about analyzer ,i  need to read text from pdf file first.

----- Original Message -----
From: "Chandan Tamrakar" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, September 08, 2004 4:15 PM
Subject: Re: pdf in Chinese

which analyzer you are using to index chinese pdf documents ?
I think you should use cjkanalyzer
----- Original Message -----
From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 08, 2004 11:27 AM
Subject: pdf in Chinese

Hi all,
    i use pdfbox to parse pdf file to lucene document.when i parse

Chinese

pdf file,pdfbox is not always success.
    Is anyone have some advice?


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: pdf in Chinese

Reply via email to