I'm not aware of any Java library that can reliably extract Chinese
text from PDF documents. We're planning on supporting Chinese,
Japanese, and Korean in version 2 of PDFTextStream, but there's no
doubt that it's a huge challenge.
Chas Emerick | [EMAIL PROTECTED]
PDFTextStream: fast PDF text extraction for Java applications
http://snowtide.com/home/PDFTextStream/
On Sep 8, 2004, at 5:58 AM, [EMAIL PROTECTED] wrote:
It is not about the analyzer; I need to read the text from the PDF file first.
----- Original Message -----
From: "Chandan Tamrakar" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, September 08, 2004 4:15 PM
Subject: Re: pdf in Chinese
Which analyzer are you using to index Chinese PDF documents?
I think you should use CJKAnalyzer.
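
A minimal sketch of the idea behind the CJKAnalyzer suggestion, assuming the usual description of Lucene's CJK analysis: runs of Chinese, Japanese, or Korean characters are indexed as overlapping two-character tokens (bigrams). This is not Lucene's code. The class name CjkBigramSketch and the methods isCjk and bigrams are hypothetical names made up for illustration, and the sketch uses only the Java standard library. It also assumes the text has already been extracted from the PDF correctly, which is the step the original question is actually about.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Lucene or PDFBox.
public class CjkBigramSketch {

    // True for code points in the Han, Hiragana, Katakana, or Hangul scripts.
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
                || script == Character.UnicodeScript.HIRAGANA
                || script == Character.UnicodeScript.KATAKANA
                || script == Character.UnicodeScript.HANGUL;
    }

    // Emits overlapping two-character tokens for each run of CJK characters.
    // A run of length one yields that single character.
    // Non-CJK text is skipped in this sketch (a real analyzer would tokenize it too).
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            if (!isCjk(cps[i])) {
                i++;
                continue;
            }
            int start = i;
            while (i < cps.length && isCjk(cps[i])) {
                i++;
            }
            if (i - start == 1) {
                tokens.add(new String(cps, start, 1));
            } else {
                for (int j = start; j + 1 < i; j++) {
                    tokens.add(new String(cps, j, 2));
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [中文, 文文, 文档]
        System.out.println(bigrams("中文文档 PDF"));
    }
}
```

Bigrams avoid the need for a dictionary-based word segmenter, at the cost of a larger index. None of this helps if the PDF text extraction itself returns garbage.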
----- Original Message -----
From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 08, 2004 11:27 AM
Subject: pdf in Chinese
Hi all,
I use PDFBox to parse PDF files into Lucene documents. When I parse
Chinese PDF files, PDFBox does not always succeed.
Does anyone have any advice?
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]