Re: [PDFdev] Text extraction and /Differences

Mike Bremford Fri, 03 Oct 2003 03:05:44 -0700


PDFdev is a service provided by PDFzone.com | http://www.pdfzone.com
_____________________________________________________________

The names are either entries in the 'post' table (for TrueType fonts) or glyph names defined in the font program (for Type 1 fonts). With Type 1 fonts, unless they use standard glyph names, there is no way to map them back to Unicode characters. With TrueType fonts, you need to parse the embedded font program (its 'post' and 'cmap' tables) to extract the information you need.
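
To make the "standard glyph names" case concrete, here is a minimal sketch, loosely following the Adobe Glyph List naming conventions, of turning a glyph name into text. Assumptions: AGL_SAMPLE is a tiny hand-picked excerpt standing in for the full published list, and the helper names are hypothetical. Names such as "uni0041" or "u1F600" encode their code points directly, while a name like /a12 or /G8 that is neither in the list nor in one of those forms yields None (a simplification; the AGL rules map unknown components to an empty string).

```python
import re
from typing import Optional

# Hypothetical, tiny excerpt of the Adobe Glyph List (name -> text).
# A real extractor would load the full published list instead.
AGL_SAMPLE = {
    "A": "\u0041",
    "AE": "\u00C6",
    "Aacute": "\u00C1",
    "a": "\u0061",
    "f": "\u0066",
    "i": "\u0069",
    "space": "\u0020",
    "zeta": "\u03B6",
}

# "uni" + one or more groups of four uppercase hex digits (BMP only).
_UNI_RE = re.compile(r"uni((?:[0-9A-F]{4})+)")
# "u" + four to six uppercase hex digits.
_U_RE = re.compile(r"u([0-9A-F]{4,6})")


def _is_scalar(cp: int) -> bool:
    """True for Unicode scalar values (in range and not a surrogate)."""
    return cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF


def _component_to_unicode(comp: str) -> Optional[str]:
    if comp in AGL_SAMPLE:
        return AGL_SAMPLE[comp]
    m = _UNI_RE.fullmatch(comp)
    if m:
        digits = m.group(1)
        cps = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
        if all(_is_scalar(cp) for cp in cps):
            return "".join(chr(cp) for cp in cps)
    m = _U_RE.fullmatch(comp)
    if m:
        cp = int(m.group(1), 16)
        if _is_scalar(cp):
            return chr(cp)
    return None  # e.g. "a12", "G8": no recoverable meaning


def glyph_name_to_unicode(name: str) -> Optional[str]:
    """Map a glyph name to text, or None if it cannot be interpreted."""
    base = name.split(".", 1)[0]  # drop variant suffix: "A.swash" -> "A"
    if not base:
        return None
    parts = [_component_to_unicode(c) for c in base.split("_")]  # "f_i" ligature
    if any(p is None for p in parts):
        return None
    return "".join(parts)
```

For example, "Aacute" gives "Á", "uni00410042" gives "AB", "f_i" gives "fi", and "a12" or "G8" give None, which is the point where you would fall back to parsing the embedded TrueType font or give up for Type 1.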

Cheers... Mike
--
----------------------------------------------------
Mike Bremford - CTO            [EMAIL PROTECTED]
Big Faceless Organization    http://big.faceless.org

Peter Persits wrote:


Hi all,

This may be a basic question, so please bear with me...

For content extraction purposes, I need to parse the /Differences array
(unless the font has a ToUnicode entry, I assume).
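
For reference, a /Differences array lists a character code followed by glyph names assigned to that code and the consecutive codes after it, overriding the base encoding. A minimal sketch of applying it, assuming the array has already been parsed into Python ints and name strings (leading slash dropped); BASE_SAMPLE and the function names are hypothetical stand-ins, not a real library API:

```python
from typing import Dict, List, Optional, Union

# Hypothetical stand-in for a real base encoding such as StandardEncoding
# or WinAnsiEncoding: character code -> glyph name.
BASE_SAMPLE: Dict[int, str] = {65: "A", 66: "B", 97: "a"}


def apply_differences(
    base: Dict[int, str], differences: List[Union[int, str]]
) -> Dict[int, str]:
    """Overlay an already-parsed /Differences array on a base encoding.

    An integer sets the current character code; each following name is
    assigned to that code, and the code then advances by one.
    """
    encoding = dict(base)
    code: Optional[int] = None
    for item in differences:
        if isinstance(item, int):
            code = item
        elif code is None:
            raise ValueError("/Differences must start with a character code")
        else:
            encoding[code] = item
            code += 1
    return encoding


def glyph_names_for(data: bytes, encoding: Dict[int, str]) -> List[Optional[str]]:
    """Look up the glyph name for each single-byte code in a string operand."""
    return [encoding.get(b) for b in data]
```

With [65, "a12", "G8"] applied to BASE_SAMPLE, codes 65 and 66 become "a12" and "G8", so the bytes b"ABa" resolve to the glyph names ["a12", "G8", "a"]; turning those names into Unicode is the separate problem discussed in this thread.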

But what if this array contains non-standard glyph names? By "standard" I
mean those 1051 names published by Adobe ("A", "AE", "AEacute",
"AEsmall", "Aacute", "Aacutesmall", ..., "zerooldstyle", "zerosuperior",
"zeta").

Quite often, a /Differences array contains names like /a12 or /G8, etc. What
are those and what do I do with them? How do I convert these into Unicode?

Thanks in advance.

Peter


To change your subscription:
http://www.pdfzone.com/discussions/lists-pdfdev.html

