OK - that means that either my code is bad, or the strategy of using UniJIS-UTF16-H when no ToUnicode map is provided is flawed.
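For readers following along, here is a rough illustration of what the fallback presumably assumes. UniJIS-UTF16-H is a CMap whose input codes are UTF-16 (big-endian) code units. If a font really used that CMap, the bytes of each shown string would already be Unicode and could be decoded directly. The minimal Java sketch below rests on that assumption; it is not iText's code, and the class and method names are hypothetical. It also shows one way the assumption can yield gibberish: if the font actually uses Identity-H, the two-byte codes are glyph CIDs, not Unicode, so decoding them as UTF-16 produces unrelated characters. The CID value used here is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not iText's implementation: decode a PDF show-string's
// raw bytes on the ASSUMPTION that the font's CMap is UniJIS-UTF16-H, i.e.
// that every character code is a UTF-16BE code unit.
public class Utf16FallbackSketch {

    // Under the UniJIS-UTF16-H assumption, the raw codes are already Unicode.
    static String decodeAssumingUtf16H(byte[] codes) {
        return new String(codes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // Codes that really are UTF-16BE: U+65E5 U+672C ("Japan" in kanji).
        byte[] utf16Codes = {(byte) 0x65, (byte) 0xE5, (byte) 0x67, (byte) 0x2C};
        System.out.println(decodeAssumingUtf16H(utf16Codes).equals("\u65E5\u672C")); // prints true

        // A made-up Identity-H glyph CID 0x0421: decoding it as UTF-16 gives
        // U+0421 (a Cyrillic letter), which has nothing to do with the glyph.
        byte[] identityCid = {(byte) 0x04, (byte) 0x21};
        System.out.println(decodeAssumingUtf16H(identityCid).equals("\u0421")); // prints true
    }
}
```

When the guess is right, this approach is cheap. But it cannot detect when the guess is wrong, which matches the symptom Michael describes below.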
At this point, one of the iText developers will need to look at this. I'll send an email and see how best to proceed. It would be much easier if we had a significantly simpler source PDF that uses the same font.

- K

----------------------- Original Message -----------------------
From: "Hoppe, Michael" <[email protected]>
To: "Post all your questions about iText here" <[email protected]>
Date: Mon, 12 Jan 2009 12:46:55 +0100
Subject: Re: [iText-questions] extracting text from pdfs with japanese data

Hi Kevin,

Sorry for the delay on my side as well; I was on vacation until today.

The txt file you attached to your last mail does not show any Japanese characters, only gibberish (I am using a Unicode editor, so they should display correctly). The output should look like the txt file I attached to this mail. Or did I misunderstand you?

Thanks and greetings,
Michael

Dr. Michael Hoppe
ePublishing & eScience
Development & Applied Research
Phone +49 7247 808-251
Fax +49 7247 808-133
[email protected]

FIZ Karlsruhe
Hermann-von-Helmholtz-Platz 1
76344 Eggenstein-Leopoldshafen, Germany
www.fiz-karlsruhe.de
