Re: Fwd: Google Summer of Code

mehdi houshmand Tue, 06 Mar 2012 03:08:41 -0800

Font de-duping is intrinsically a post-process action: you need the
full document, with all fonts, before you can do any font de-duping.
PostScript does this very thing (to a much lesser extent) with the
<optimize-resources> tag, as a post-process action.

Also, the requirements aren't clear here. What is it we want? Let me
try to pin that down: this shouldn't change the (I guess we can call
it) "canonical" PDF document. By that I mean if you rasterized a PDF
before and after this change, the results should be identical,
pixel-for-pixel. When Acrobat does the font de-duping (I don't
remember how much control it gives you, but if there are levels of
de-duping I would have chosen the most aggressive), the documents
aren't identical. There are aberrations caused by slight kerning
differences between various versions of Arial. This may seem trivial
when compared to bloated PDFs, but it looks tacky and lowers the high
standard of documents. You could argue this could be configurable...
But then I'd reiterate my first argument: this is a post-process
action, not the concern of FOP or the pdf-image-plugin.
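
To make the "identical, pixel-for-pixel" requirement concrete, here is a
minimal sketch of such a check. It assumes both versions of a page have
already been rasterized to BufferedImage at the same resolution by some
renderer (not shown); the class and method names are hypothetical and
not part of FOP, PDFBox or the pdf-image-plugin.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: compares two already-rasterized pages pixel by pixel. */
public class RasterCompare {

    /**
     * Counts the pixels whose ARGB values differ between the two images.
     * Returns -1 if the images do not have the same dimensions.
     */
    public static long countDifferingPixels(BufferedImage before, BufferedImage after) {
        if (before.getWidth() != after.getWidth()
                || before.getHeight() != after.getHeight()) {
            return -1;
        }
        long differing = 0;
        for (int y = 0; y < before.getHeight(); y++) {
            for (int x = 0; x < before.getWidth(); x++) {
                // getRGB returns the pixel in the default ARGB colour model
                if (before.getRGB(x, y) != after.getRGB(x, y)) {
                    differing++;
                }
            }
        }
        return differing;
    }

    /** True only if both images have the same size and every pixel matches. */
    public static boolean identical(BufferedImage before, BufferedImage after) {
        return countDifferingPixels(before, after) == 0;
    }
}
```

Under that definition, a kerning shift of even one pixel in a single
glyph would make identical() return false.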

The other issue is you have subset fonts created by FOP as well as
those imported by the pdf-image-plugin. You'd have to create some
bridge between the image loading framework and the font loading system
*cough* HACK *cough*. Alternatively, just thinking aloud here, if this
was done as a post-process *wink* *wink* *wry smile*...

Apologies if I seem argumentative here; it's not my
intention, but I feel this would be serious scope creep. I see the
pdf-image-plugin as a plugin that treats PDFs as images, nothing more.
If you want to stitch together PDFs, PDFBox is designed just for that.

Mehdi

On 6 March 2012 10:36, Chris Bowditch <bowditch_ch...@hotmail.com> wrote:
> On 06/03/2012 10:12, mehdi houshmand wrote:
>>
>> I fat-fingered the reply button instead of reply-to-all... *face-palm*
>
>
> Mehdi, Craig,
> <snip/>
>
>
>
>>> - Anything in the proposed XSL-FO 2.0 feature list (though most of it
>>> won't
>>> be realistic for GSoC projects);
>>>
>>> - Merge fop-pdf-image and implement smart merging of font, profile, and
>>> image resources. I'm working on this one at the moment, but slowly and
>>> only
>>> amid other projects.
>>
>> I really don't think that's a suitable project. I responded to your
>> post so maybe we could take this conversation elsewhere, but this
>> really isn't FOP's responsibility, or for that matter the
>> pdf-image-plugin's. If anything, I'd argue that's a PDFBox project.
>> Adobe Acrobat Pro does this kind of thing (badly, may I add) as a
>> post-process action and I think that's the correct way to do it. The
>> other thing to say is that a newcomer may not appreciate the
>> importance of fidelity where fonts are concerned. Basically it's too
>> difficult for a student given a few months and no previous experience.
>
> Sorry Mehdi, I don't agree. I think this would be a great project. Craig
> already outlined what needs to be done and there's a lot of stuff in XGC and
> FOP as well as the plug-in. I'm not sure anything is needed in PDFBox, but
> even if it is, that is an Apache project too and the student can submit patches
> there. Adobe Acrobat may make some assumptions that don't always hold true,
> but our customers are crying out for FOP to create smaller PDF files when
> importing multiple PDF images with embedded fonts. This also feels
> reasonably well defined thanks to Craig's list of TODOs and feels like it
> can be done in 3 months. It gets a +1 from me.
>
> Thanks,
>
> Chris
>
>>> --
>>> Craig Ringer
>>
>>
>
