Connecting up Poppler to PoDoFo wouldn't be that difficult - I've sat Xpdf (on 
which Poppler is based) on top of other PDF libraries in the past to use its 
rendering facilities in conjunction with other PDF reading/writing needs.  
Basically you just need to replace parts of the core Object/Dict/Array/Stream 
classes.
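To make the idea of swapping out the core object classes a bit more concrete, here is a minimal C++ sketch of the seam involved. It assumes the rendering code can be made to depend only on an abstract object interface. All the names (`NumberDict`, `ObjectBackend`, `InMemoryBackend`, `pageRotation`) are hypothetical and are not part of the Xpdf, Poppler or PoDoFo APIs. A real adapter would forward to PoDoFo's objects rather than an in-memory vector.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a PDF dictionary:
// key name -> numeric value. Real PDF objects are far richer.
using NumberDict = std::map<std::string, double>;

// The abstract object layer the rendering code is written against.
// Swapping PDF libraries means providing another implementation of this.
class ObjectBackend {
public:
    virtual ~ObjectBackend() = default;
    // Page dictionary for a zero-based page index, or nullopt if absent.
    virtual std::optional<NumberDict> pageDict(int index) const = 0;
};

// "Renderer-side" code: it only sees ObjectBackend, never the library
// underneath. Reads /Rotate, defaulting to 0 (inheritance is ignored here).
int pageRotation(const ObjectBackend& backend, int index) {
    std::optional<NumberDict> page = backend.pageDict(index);
    if (!page) return 0;
    auto it = page->find("Rotate");
    return it == page->end() ? 0 : static_cast<int>(it->second);
}

// Toy backend; a real adapter would wrap PoDoFo's object classes instead.
class InMemoryBackend : public ObjectBackend {
public:
    std::vector<NumberDict> pages;
    std::optional<NumberDict> pageDict(int index) const override {
        if (index < 0 || index >= static_cast<int>(pages.size())) return std::nullopt;
        return pages[index];
    }
};
```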

I would think, however, that the big issue would be the licensing concerns.

Leonard 

-----Original Message-----
From: Craig Ringer [mailto:cr...@postnewspapers.com.au] 
Sent: Wednesday, July 22, 2009 7:54 PM
To: Trevor Kaufman
Cc: podofo-users@lists.sourceforge.net
Subject: Re: [Podofo-users] Margins and Fonts

On Wed, 2009-07-22 at 17:09 -0400, Trevor Kaufman wrote:

> Margins. Most PDFs I've seen leave some space between the edge of the
> "body" text and the edge of the page. This space I will call a margin
> (I don't know if there is an official PDF term). I'd like to figure
> out the width of the margins. As far as I can tell, there is no
> PdfPage.GetMargins() and the quick look through I did on the PDF spec
> didn't seem to mention any margins as I have defined them. The PDFs I
> want to work with will either have a bunch of body text, or be a slide
> (PowerPoint, etc.) converted to PDF. I'd like to be able to find the
> size of the margins in order to draw inside them.

As Leonard noted, there's no easy way to find the bounding box of the
PDF content. PDF does actually have definitions for boxes that you might
call margins of various sorts (PDF Reference 10.10.1, "Page Boundaries")
- but they're frequently left unset or incorrectly set, and are defined
to include "meaningful whitespace" anyway.
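For reference, the PDF Reference defines defaults for those boxes: CropBox falls back to MediaBox, and BleedBox, TrimBox and ArtBox fall back to the CropBox, with any box that extends past the MediaBox reduced to its intersection with it. The sketch below resolves those defaults, assuming the raw values have already been read from the page dictionary (including any inherited MediaBox/CropBox). The type and function names are made up for illustration. Note that an unset ArtBox simply resolves to the CropBox or MediaBox, which is why these boxes rarely tell you where the content actually is.

```cpp
#include <algorithm>
#include <optional>

// Rectangle [llx lly urx ury] in default user space units.
struct Rect {
    double llx, lly, urx, ury;
};

// Intersection of two rectangles (degenerate if they don't overlap).
Rect intersect(const Rect& a, const Rect& b) {
    return {std::max(a.llx, b.llx), std::max(a.lly, b.lly),
            std::min(a.urx, b.urx), std::min(a.ury, b.ury)};
}

// Raw boxes as found on the page, after resolving inheritance for
// MediaBox and CropBox. Only MediaBox is required.
struct PageBoxes {
    Rect mediaBox;
    std::optional<Rect> cropBox, bleedBox, trimBox, artBox;
};

// CropBox defaults to MediaBox and is clipped to it.
Rect effectiveCropBox(const PageBoxes& p) {
    return intersect(p.cropBox.value_or(p.mediaBox), p.mediaBox);
}

// BleedBox, TrimBox and ArtBox default to the effective CropBox and
// are reduced to their intersection with the MediaBox.
Rect effectiveBox(const std::optional<Rect>& box, const PageBoxes& p) {
    return intersect(box.value_or(effectiveCropBox(p)), p.mediaBox);
}
```

For example, `effectiveBox(p.artBox, p)` on a page with only a MediaBox just returns the MediaBox.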

Getting the bounding box of the content involves processing the PDF
content stream(s) and tracking parts of the graphics state while
checking where each drawing operator would draw. At present PoDoFo
doesn't do this - and in fact knows nothing about what the operators in
content streams do.
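As a heavily simplified illustration of what that processing involves, the sketch below walks a content stream, tracks only the current transformation matrix (`cm`, `q`, `Q`) and accumulates the transformed coordinates of the path operators `m`, `l`, `c` and `re`. It assumes whitespace-separated tokens with purely numeric operands, and it ignores text, images, form XObjects, clipping, line width, the `v`/`y` curve operators and whether a path is actually painted, so treat it as a hypothetical starting point rather than a correct bounding-box calculator. Including Bezier control points gives a conservative box, since a curve lies within the convex hull of its control points.

```cpp
#include <cstdlib>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Affine matrix [a b c d e f], in the same order as the operands of "cm".
struct Matrix {
    double a = 1, b = 0, c = 0, d = 1, e = 0, f = 0;
};

// Concatenates m1 and m2 in PDF's row-vector convention: transforming a
// point by the result equals transforming it by m1 and then by m2.
Matrix multiply(const Matrix& m1, const Matrix& m2) {
    return {m1.a * m2.a + m1.b * m2.c, m1.a * m2.b + m1.b * m2.d,
            m1.c * m2.a + m1.d * m2.c, m1.c * m2.b + m1.d * m2.d,
            m1.e * m2.a + m1.f * m2.c + m2.e, m1.e * m2.b + m1.f * m2.d + m2.f};
}

// Axis-aligned box in default user space; starts out empty.
struct BBox {
    double minX = std::numeric_limits<double>::infinity();
    double minY = std::numeric_limits<double>::infinity();
    double maxX = -std::numeric_limits<double>::infinity();
    double maxY = -std::numeric_limits<double>::infinity();
    bool empty() const { return minX > maxX; }
    void add(double x, double y) {
        if (x < minX) minX = x;
        if (y < minY) minY = y;
        if (x > maxX) maxX = x;
        if (y > maxY) maxY = y;
    }
};

BBox pathBoundingBox(const std::string& content) {
    std::istringstream in(content);
    std::vector<double> ops;    // operands seen since the last operator
    std::vector<Matrix> saved;  // CTM stack for q/Q
    Matrix ctm;                 // starts as the identity
    BBox box;
    // Maps a user-space point through the CTM and adds it to the box.
    auto mark = [&](double x, double y) {
        box.add(ctm.a * x + ctm.c * y + ctm.e, ctm.b * x + ctm.d * y + ctm.f);
    };
    std::string tok;
    while (in >> tok) {
        char* end = nullptr;
        double v = std::strtod(tok.c_str(), &end);
        if (end != tok.c_str() && *end == '\0') {  // numeric operand
            ops.push_back(v);
            continue;
        }
        const std::size_t n = ops.size();
        if (tok == "q") {
            saved.push_back(ctm);
        } else if (tok == "Q") {
            if (!saved.empty()) { ctm = saved.back(); saved.pop_back(); }
        } else if (tok == "cm" && n >= 6) {
            Matrix m{ops[n - 6], ops[n - 5], ops[n - 4], ops[n - 3], ops[n - 2], ops[n - 1]};
            ctm = multiply(m, ctm);  // new CTM = m x old CTM
        } else if ((tok == "m" || tok == "l") && n >= 2) {
            mark(ops[n - 2], ops[n - 1]);
        } else if (tok == "c" && n >= 6) {
            for (std::size_t i = n - 6; i < n; i += 2) mark(ops[i], ops[i + 1]);
        } else if (tok == "re" && n >= 4) {
            double x = ops[n - 4], y = ops[n - 3], w = ops[n - 2], h = ops[n - 1];
            mark(x, y); mark(x + w, y); mark(x + w, y + h); mark(x, y + h);
        }
        ops.clear();  // every other operator is ignored
    }
    return box;
}
```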

I'd really like to see a way to use Poppler's PDF content stream
processing with PoDoFo as the PDF file structure access backend, so
things like this and thumbnail generation could be handled. Right now,
though, there's nothing like that, and I haven't looked at Poppler in
detail to see what doing it would involve and how maintainable such a
modification would be.

> Fonts. I did a quick modification to the concept of the hello world
> example that takes an existing PDF and adds some text to it. The size
> of the PDF file tripled for a short line of text. I am assuming this
> is because I embedded a new font when I added the text. If this is not
> true... maybe this doesn't matter so much. Anyways, is there a way to
> get the font(s) that are already embedded in an existing PDF and reuse
> them?

The trouble there is that most PDFs contain fonts embedded as subsets.
These fonts only contain the glyphs that are actually used in the PDF
document. For a PDF with the text "aardvark" the glyphs for "a", "d",
"k", "r" and "v" would be included. If you wanted to add the text "hello
world" you'd have to re-embed the font (if you had an _identical_ copy
and could identify the embedded font), embed a different subset, or use
a different font, since it would be lacking the glyphs you needed.
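The "aardvark" example boils down to a set difference. This sketch treats characters as glyphs, which is an assumption: in a real PDF, working out which glyphs a subset contains means reading the embedded font program and mapping character codes through the font's encoding or CMap, none of which is modelled here. The function names are hypothetical.

```cpp
#include <set>
#include <string>

// Characters drawn by `text`, standing in for the glyphs a subset would
// contain. Spaces are skipped to keep the example simple.
std::set<char> glyphsUsed(const std::string& text) {
    std::set<char> s;
    for (char ch : text)
        if (ch != ' ') s.insert(ch);
    return s;
}

// Characters of `text` that a subset containing `subset` cannot render.
std::set<char> missingGlyphs(const std::set<char>& subset, const std::string& text) {
    std::set<char> missing;
    for (char ch : text)
        if (ch != ' ' && subset.count(ch) == 0) missing.insert(ch);
    return missing;
}
```

With the subset built from "aardvark", `missingGlyphs(subset, "hello world")` reports h, e, l, o and w, so that text could not be drawn with the embedded subset alone.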

Right now PoDoFo doesn't support subsetting during embedding. Dom's done
some work on this but I don't know what the current status of it is.

So - you probably could re-use an already embedded font, IF you could
determine that it was fully embedded or contained all the glyphs you
needed. Right now, though, PoDoFo doesn't offer you any help with this
so you'd have to do it using the low-level document structure.

> If this stuff is possible but not currently doable in the code, I am
> open to helping, provided I get some guidance.

I'm not sure how much help I can be right now. Looking at PdfFont.h it
appears that there's some facility for using existing fonts already, but
you'd need to be able to find the font in the document structure and
determine that it was (a) the font you needed and (b) not a subset, or a
subset containing the glyphs you need.

A class to enumerate fonts in a PDF and provide some information about
them (like subset glyphs, etc) would be a good thing to tackle.
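One thing such a class could report cheaply is whether a font is a subset. The PDF Reference specifies that a subset font's BaseFont entry (and its font descriptor's FontName) begins with a tag of exactly six uppercase letters followed by a plus sign, e.g. `EOODIA+Poetica`. The helper names below are hypothetical, and the check is only a heuristic: a missing tag doesn't prove the font is fully embedded (or embedded at all), so you'd still need to inspect the font program itself.

```cpp
#include <string>

// True if a BaseFont/FontName value carries a subset tag: exactly six
// uppercase ASCII letters followed by '+'.
bool hasSubsetTag(const std::string& baseFont) {
    if (baseFont.size() < 7 || baseFont[6] != '+') return false;
    for (int i = 0; i < 6; ++i)
        if (baseFont[i] < 'A' || baseFont[i] > 'Z') return false;
    return true;
}

// Returns the font name with any subset tag removed.
std::string stripSubsetTag(const std::string& baseFont) {
    return hasSubsetTag(baseFont) ? baseFont.substr(7) : baseFont;
}
```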

-- 
Craig Ringer



------------------------------------------------------------------------------
_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users