CMapAwareDocumentFont already does this parsing via the CMap class, which encapsulates the parsing behind an object and makes it a lot easier to deal with.
 
I think the biggest challenge here is actually finding the appropriate CMap byte stream (either from data embedded in the PDF, or from the file system). Right now, locating the CMap information is a weak point in the content parser.
 
If the CMap data is included in a jar on the classpath, then the CMap could absolutely be read from the jar.
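
As a rough sketch of what that jar lookup could look like (this is not existing iText code): the class name CMapResourceLoader, the method readCMapBytes and the resource directory below are assumptions for illustration only, and the actual layout of iTextAsianCmaps.jar would need to be checked before relying on the path.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class CMapResourceLoader {

    // Assumed resource directory inside the jar; verify against the real jar layout.
    static final String ASSUMED_CMAP_DIR = "com/lowagie/text/pdf/fonts/cmaps/";

    /**
     * Reads the raw bytes of a named CMap from the classpath.
     * Returns null when no such resource is found, so the caller
     * can fall back to another source (for example the file system).
     */
    static byte[] readCMapBytes(ClassLoader loader, String resourceDir, String cmapName) {
        InputStream in = loader.getResourceAsStream(resourceDir + cmapName);
        if (in == null) {
            return null; // resource is not on the classpath
        }
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            try {
                in.close();
            } catch (IOException ignored) {
                // nothing useful to do if close fails
            }
        }
    }
}
```

The CMap name (for example "UniJIS-UCS2-H") would come from the font information in the PDF; a null result simply means the jar is not on the classpath or does not contain that CMap.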
 
Can the OP please send a PDF that demonstrates the issue? I'll take a look at the font information and see how tough it would be to add this type of lookup for the case where ToUnicode isn't available.
 
- K
 
----------------------- Original Message -----------------------
  
From: "Paulo Soares" <[email protected]>
To: "Post all your questions about iText here" <[email protected]>
Cc: 
Date: Tue, 16 Dec 2008 09:55:36 -0000
Subject: Re: [iText-questions] extracting text from pdfs with japanese data
  
There's code in PdfEncodings to parse the cmaps and convert to/from Unicode.
The font contains the cmap name.

Paulo

----- Original Message -----
From: "1T3XT info" <[email protected]>
To: "Post all your questions about iText here"
<[email protected]>
Sent: Tuesday, December 16, 2008 9:19 AM
Subject: Re: [iText-questions] extracting text from pdfs with japanese data


Hoppe, Michael wrote:
> The CMap-files are included in the iTextAsianCmaps.jar. So couldn’t they
> be read from that jar in case there is no font information in the pdf?

I'm just thinking out loud here, and I haven't dived into the problem yet,
but: do you think it's possible for iText to find which CMap-file is to
be inspected based on the font information available in the PDF?

As Kevin already said: this part of iText is pretty new. We're all
excited about it, but for the moment it's all highly experimental.
--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info


------------------------------------------------------------------------------
SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, Nevada.
The future of the web can't happen without you.  Join us at MIX09 to help
pave the way to the Next Web now. Learn more and register at
http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.1t3xt.com/docs/book.php

