Hi all, I'm using iText 5.1.2 to extract text from PDF files. However, it returns two spaces (#32#32) whenever it encounters a Chinese character. An example PDF file is attached.
I have downloaded iTextAsian.jar and iTextAsianCmaps.jar and changed the package namespace in both jar files from lowagie to itextpdf. I then added them to the class path at run time (via "java -cp .;.\iTextAsian.jar;.\iTextAsianCmaps.jar ..."), without success: I still get two spaces instead of each Chinese character.

After that, I extracted the full contents of the two jar files (except META-INF) and packed them into itextpdf-5.1.2.jar. That did not work either :-(

By the way, PDFBox extracts the text from this PDF file correctly.

Any suggestions? Thanks in advance.

Regards,
Mophy

http://itext-general.2136553.n4.nabble.com/file/n3757883/33.pdf 33.pdf
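
To rule out a plain class path problem, a small check along these lines could show whether a given resource from the two jars is actually visible to the JVM. This is only a sketch using the standard Java API; the class name ClasspathCheck is made up for illustration, and the resource path passed on the command line is an assumption that should be replaced with an entry that really exists in the repackaged jars.

```java
import java.net.URL;

// Hypothetical helper (not part of iText): reports where a resource is
// loaded from, so a missing or mis-packaged jar becomes obvious.
class ClasspathCheck {

    /** Returns the URL the resource is loaded from, or null if it is not on the class path. */
    static URL locate(String resourcePath) {
        // Resource paths use '/' separators and no leading slash.
        return ClassLoader.getSystemResource(resourcePath);
    }

    public static void main(String[] args) {
        // Each argument is a resource path to look up, for example a CMap
        // file inside the renamed package (the exact path is an assumption).
        for (String path : args) {
            URL url = locate(path);
            System.out.println(path + " -> " + (url == null ? "NOT FOUND" : url.toString()));
        }
    }
}
```

It would be run with the same class path as the extraction program, for example "java -cp .;.\iTextAsian.jar;.\iTextAsianCmaps.jar ClasspathCheck <resource path>". If the resource is reported as NOT FOUND, the jars or the renamed package paths are the first thing to look at. If it is found, the problem more likely lies in how the fonts in this PDF are handled during extraction.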
