PDFdev is a service provided by PDFzone.com | http://www.pdfzone.com
_____________________________________________________________

On Fri, 2004-04-16 at 08:38, Nicola Righetti wrote:
> 
> >>What do you mean by view the character? You will find viewing software
> >>at the links I gave you. If you want to display it you will need to use
> >>an existing parser or write your own to convert the glyph data (which is
> >>a set of quadratic B-spline points) into a glyph.
> >>A TrueType font is a complex file format consisting of multiple inter-related
> >>tables - I am not clear what exactly you are trying to do.
> 
> OK, I'll try to explain better.
> I need to extract text from this PDF.
> If I extract the text as it is, a string that shows as "DUMMY" in Adobe
> Reader is coded as "\x04\x06\x05\x05\x08" (hex string) in the PDF content
> stream. By analyzing this, I can build part of my mapping:
> 
> D = 0x04
> U = 0x06
> M = 0x05
> Y = 0x08
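The deduction above can be made mechanical: pair each byte of the shown string with the character Adobe Reader displays, and flag any code that turns up with two different characters. This is a minimal Python sketch; `deduce_mapping` is a hypothetical helper, and it assumes a simple font where each glyph is one byte.

```python
def deduce_mapping(codes: bytes, shown: str) -> dict[int, str]:
    """Pair each one-byte code with the character displayed for it.

    Assumes a simple font: one byte per glyph, so len(codes) == len(shown).
    Raises ValueError if the same code is seen with two different characters.
    """
    if len(codes) != len(shown):
        raise ValueError("expected one byte per displayed character")
    mapping: dict[int, str] = {}
    for code, char in zip(codes, shown):
        previous = mapping.setdefault(code, char)
        if previous != char:
            raise ValueError(f"code 0x{code:02X} seen as both {previous!r} and {char!r}")
    return mapping


if __name__ == "__main__":
    table = deduce_mapping(b"\x04\x06\x05\x05\x08", "DUMMY")
    for code, char in sorted(table.items()):
        print(f"{char} = 0x{code:02X}")  # D = 0x04, M = 0x05, U = 0x06, Y = 0x08
```

The conflict check matters because a mistyped sample (for example a final byte of 0x06 for "Y") would otherwise silently overwrite the entry for "U".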

You should not need to look inside the TrueType font for this. The PDF
content stream should contain text-showing commands of the form

(DUMMY) Tj

or 

(xyz) Tj

or 

(\004\005) Tj
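Inside the content stream, those bytes are written either as a literal string with PDF escapes (octal \ddd, \n, \r, \t, \b, \f, \(, \), \\ and backslash line continuations) or as a hex string such as <0405>. A minimal Python sketch that turns either form back into the raw codes; it assumes your own tokenizer has already isolated the string body without its delimiters, and it does not normalise unescaped CR/CRLF inside a literal string (which PDF reads as LF).

```python
ESCAPES = {
    ord("n"): b"\n", ord("r"): b"\r", ord("t"): b"\t",
    ord("b"): b"\b", ord("f"): b"\f",
    ord("("): b"(", ord(")"): b")", ord("\\"): b"\\",
}
OCTAL = b"01234567"


def decode_literal(body: bytes) -> bytes:
    """Decode the bytes between ( and ) of a PDF literal string."""
    out = bytearray()
    i = 0
    while i < len(body):
        c = body[i]
        if c != 0x5C:  # not a backslash: copy the byte as-is
            out.append(c)
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(body):
            break
        c = body[i]
        if c in OCTAL:
            # \d, \dd or \ddd: up to three octal digits; overflow above 0xFF is dropped
            j = i
            while j < len(body) and j - i < 3 and body[j] in OCTAL:
                j += 1
            out.append(int(body[i:j], 8) & 0xFF)
            i = j
        elif c in ESCAPES:
            out += ESCAPES[c]
            i += 1
        elif c == 0x0D:  # backslash + CR or CRLF: line continuation, emits nothing
            i += 2 if body[i + 1 : i + 2] == b"\n" else 1
        elif c == 0x0A:  # backslash + LF: line continuation
            i += 1
        else:  # unknown escape: the backslash is ignored
            out.append(c)
            i += 1
    return bytes(out)


def decode_hex(body: bytes) -> bytes:
    """Decode the bytes between < and > of a PDF hex string."""
    digits = bytes(b for b in body if not chr(b).isspace())
    if len(digits) % 2:
        digits += b"0"  # an odd final digit is treated as if followed by 0
    return bytes.fromhex(digits.decode("ascii"))
```

For example, `decode_literal(rb"\004\005")` and `decode_hex(b"0405")` both give `b"\x04\x05"`; those codes are what the encoding then turns into text.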

The font's encoding settings allow you to convert these codes into text
values. Depending on the file, this mapping will come from a /ToUnicode
CMap, the font's /Encoding entry, or a /Differences array. Sometimes the
codes will look the same as the text (i.e. StandardEncoding).
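For a simple font the usual route is: start from a base encoding (code to glyph name), overlay the font's /Differences array, then map glyph names to Unicode. A minimal Python sketch; the tiny `BASE_ENCODING` and `GLYPH_LIST` tables are illustrative fragments standing in for the full encoding tables in the PDF Reference and the Adobe Glyph List, and a /ToUnicode CMap, when present, should take precedence over all of this.

```python
from typing import Union

# Fragment of StandardEncoding: codes 65..90 name the glyphs "A".."Z".
BASE_ENCODING = {code: chr(code) for code in range(65, 91)}

# Fragment of the Adobe Glyph List: glyph name -> Unicode text.
GLYPH_LIST = {chr(c): chr(c) for c in range(65, 91)}
GLYPH_LIST.update({"space": " ", "hyphen": "-"})


def apply_differences(base: dict[int, str],
                      differences: list[Union[int, str]]) -> dict[int, str]:
    """Overlay a /Differences array on a base encoding (code -> glyph name).

    Each integer sets the current code; each following name is assigned to
    that code, and the code is incremented for the next name.
    """
    encoding = dict(base)
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item
        else:
            if code is None:
                raise ValueError("/Differences must start with a code")
            encoding[code] = item
            code += 1
    return encoding


def decode(codes: bytes, encoding: dict[int, str]) -> str:
    """Map each one-byte code to text, using '?' for unknown glyph names."""
    return "".join(GLYPH_LIST.get(encoding.get(c, ""), "?") for c in codes)
```

With a hypothetical array chosen to match the codes in this thread, `decode(b"\x04\x06\x05\x05\x08", apply_differences(BASE_ENCODING, [4, "D", "M", "U", 8, "Y"]))` returns "DUMMY". If the real array uses names that are not in the glyph list, the text cannot be recovered this way.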

Some PDF files do not contain enough information to extract the text,
only to display it (files produced by certain versions of Ghostscript, for example).

My hunch is that you have not allowed for the /Differences array, or the
text is not extractable. If you send me the file I will have a quick
look.

> 
> Using this process I need to parse the entire PDF and hope to encounter all
> the possible letters/digits just to see their mapping.
> My idea is to save the FontFile2 data to a file, use a viewer to view it
> and then extract my mapping from what I see.
> Suppose that this viewer shows me that the first symbol is A, the second F,
> the third G, the fourth B; then I expect the fifth to be D, the sixth M,
> then U, then K (for example), and then Y, because I have already deduced
> their mapping from parsing the string DUMMY.
> So the final map table is:
> 
> A = 0x00
> F = 0x01
> G = 0x02
> B = 0x03
> D = 0x04
> M = 0x05
> U = 0x06
> K = 0x07
> Y = 0x08
> ....
> 
> Is this correct? What can I do to render the glyphs contained in the
> FontFile2 data?
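If you do want to inspect the font itself, the FontFile2 stream holds the embedded TrueType font program (often a subset), typically compressed with /FlateDecode. A minimal Python sketch, assuming your own PDF parser has already pulled out the raw stream bytes and the names in its /Filter entry; `inflate_fontfile2`, `list_tables` and `save_fontfile2` are hypothetical helpers. It inflates the data, writes a .ttf that ordinary font tools can open, and lists the tables from the sfnt table directory.

```python
import struct
import zlib


def inflate_fontfile2(raw: bytes, filters: list[str]) -> bytes:
    """Undo the stream filters in order; only /FlateDecode is handled here."""
    data = raw
    for name in filters:
        if name != "FlateDecode":
            raise NotImplementedError(f"filter /{name} not handled in this sketch")
        data = zlib.decompress(data)
    return data


def list_tables(font: bytes) -> dict[str, tuple[int, int]]:
    """Read the sfnt table directory: tag -> (offset, length)."""
    # Offset table: uint32 version, uint16 numTables, then three uint16 search fields.
    version, num_tables = struct.unpack_from(">IH", font, 0)
    if version not in (0x00010000, 0x74727565):  # 1.0 or 'true'
        raise ValueError(f"not a TrueType font (version 0x{version:08X})")
    tables = {}
    for i in range(num_tables):
        # Each 16-byte record: tag, checksum, offset, length.
        tag, _checksum, offset, length = struct.unpack_from(">4sIII", font, 12 + 16 * i)
        tables[tag.decode("latin-1")] = (offset, length)
    return tables


def save_fontfile2(raw: bytes, filters: list[str], path: str) -> dict[str, tuple[int, int]]:
    """Write the decoded font to path and return its table directory."""
    font = inflate_fontfile2(raw, filters)
    with open(path, "wb") as fh:
        fh.write(font)
    return list_tables(font)
```

The table directory tells you where 'glyf', 'loca', 'cmap' and the other tables sit, which is the starting point for any glyph parsing.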

The links I pointed to give you all the documentation. Basically, you
need to extract the data from several different tables and use it to
build up the set of points (quadratic B-spline outlines) which define each
glyph. Are you sure you would not be better off using some off-the-shelf
software for this? What language are you working in?
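Once a glyph's outline has been decoded from the 'glyf' table into points with on-curve flags, drawing a contour follows one TrueType rule: between two consecutive off-curve points there is an implied on-curve point at their midpoint. A minimal Python sketch of just that step; it assumes the point list is already decoded (parsing the flags and delta-encoded coordinates is not shown), and `Point` and `contour_to_segments` are hypothetical names.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float
    on_curve: bool


def contour_to_segments(points: list[Point]) -> list[tuple]:
    """Convert one closed TrueType contour into line and quadratic Bezier segments.

    Returns ("line", p0, p1) and ("quad", p0, control, p1) tuples of (x, y) pairs.
    """
    if not points:
        return []
    # Insert the implied on-curve midpoint between consecutive off-curve points.
    expanded: list[Point] = []
    n = len(points)
    for i, p in enumerate(points):
        expanded.append(p)
        q = points[(i + 1) % n]
        if not p.on_curve and not q.on_curve:
            expanded.append(Point((p.x + q.x) / 2, (p.y + q.y) / 2, True))
    # Rotate so the walk starts on an on-curve point.
    start = next(i for i, p in enumerate(expanded) if p.on_curve)
    ring = expanded[start:] + expanded[:start]
    segments = []
    m = len(ring)
    i = 0
    while i < m:
        p0 = ring[i]
        p1 = ring[(i + 1) % m]
        if p1.on_curve:
            segments.append(("line", (p0.x, p0.y), (p1.x, p1.y)))
            i += 1
        else:
            # After expansion an off-curve point is always followed by an on-curve one.
            p2 = ring[(i + 2) % m]
            segments.append(("quad", (p0.x, p0.y), (p1.x, p1.y), (p2.x, p2.y)))
            i += 2
    return segments
```

Each "quad" segment is a quadratic Bézier curve that most 2D drawing APIs can render directly, which is why an existing library usually saves a lot of work here.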



To change your subscription:
http://www.pdfzone.com/discussions/lists-pdfdev.html
