[iText-questions] Problem when extracting CJK chars from PDF files

Mophy Xiong Sat, 20 Aug 2011 19:12:26 -0700

Hi all,

I'm using iText 5.1.2 to extract text from PDF files, but it returns just
two spaces (#32#32) whenever it encounters a Chinese character. An example
PDF file is attached.
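
To make the symptom easy to check, here is a minimal sketch of a test on the extracted string. It uses only the standard Java library; the class and method names (ExtractionCheck, containsHan, isOnlySpaces) are hypothetical helpers and not part of iText, and the sample strings are illustrative stand-ins for real extraction output.

```java
import java.lang.Character.UnicodeScript;

// Hypothetical helper, not part of iText: reports whether a string of
// extracted text contains Han (Chinese) characters or only spaces.
final class ExtractionCheck {

    // True if at least one code point in the text belongs to the Han script.
    static boolean containsHan(String text) {
        return text.codePoints()
                   .anyMatch(cp -> UnicodeScript.of(cp) == UnicodeScript.HAN);
    }

    // True if the text is non-empty and consists solely of U+0020 spaces.
    static boolean isOnlySpaces(String text) {
        return !text.isEmpty() && text.chars().allMatch(c -> c == ' ');
    }

    public static void main(String[] args) {
        String expected = "\u4e2d"; // one Chinese character (illustrative)
        String observed = "  ";     // the two spaces (#32#32) described above

        System.out.println("expected contains Han: " + containsHan(expected));
        System.out.println("observed contains Han: " + containsHan(observed));
        System.out.println("observed is only spaces: " + isOnlySpaces(observed));
    }
}
```

Running such a check over the output for each page would show where Chinese characters were replaced by spaces.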


I have downloaded iTextAsian.jar and iTextAsianCmaps.jar, and changed the
package namespace in both jar files (from lowagie to itextpdf).

I then added them to the classpath at run time (via "java -cp
.;.\iTextAsian.jar;.\iTextAsianCmaps.jar ..."), without success: I still got
two spaces instead of each Chinese character.
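
One way to confirm that the jars are actually visible at run time is to look up a resource from them on the classpath. The sketch below uses only the standard ClassLoader.getSystemResource call. The class name ClasspathProbe is hypothetical, and the default resource name is an assumption about where the renamed jar keeps its CJK font properties; pass the real path as the first argument if it differs.

```java
import java.net.URL;

// Hypothetical diagnostic, not part of iText: reports whether a named
// resource can be found on the application classpath.
final class ClasspathProbe {

    // Returns the URL of the resource, or null if it is not on the classpath.
    static URL locate(String resourceName) {
        return ClassLoader.getSystemResource(resourceName);
    }

    public static void main(String[] args) {
        // ASSUMPTION: this default path is a guess at the resource location
        // after renaming the package from lowagie to itextpdf.
        String name = args.length > 0
                ? args[0]
                : "com/itextpdf/text/pdf/fonts/cjkfonts.properties";

        URL url = locate(name);
        if (url == null) {
            System.out.println("not found on classpath: " + name);
        } else {
            System.out.println("found: " + url);
        }
    }
}
```

It would be run with the same -cp value as the extraction program, so that both see the same classpath.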

After that, I tried extracting all the contents of the two jar files (except
META-INF) and repacking them into itextpdf-5.1.2.jar, also without success :-(

By the way, PDFBox can extract text correctly from this PDF file.

Any suggestions?

Thanks in advance.

Regards,
Mophy

33.pdf: http://itext-general.2136553.n4.nabble.com/file/n3757883/33.pdf


