On Thu, 2 Jun 2016 21:09:51 -0500
Jeremy Volkening <j...@base2bio.com> wrote:

> This problem is a bit hard to pin down. More detailed tests on
> another batch of PDFs gave the following results:
> 
> 1. Most (126) failed to open with the "PDF document is damaged" error
> 
> 2. A few (4) opened but failed to load the first page (e.g.
> get_page(0) returned undef)
> 
> 3. The rest (5) seemed fine - opened, read metadata, rendered to PNG
> (the test loop is sketched below)
> 
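> For reference, each file went through essentially the following
> steps (a minimal sketch: error handling is omitted, and the GIR
> version string and the exact new_from_data() argument list as
> mapped by the introspection layer are assumptions on my part):
> 
>     use Glib::Object::Introspection;
>     use Cairo;
> 
>     Glib::Object::Introspection->setup(
>         basename => 'Poppler',
>         version  => '0.18',
>         package  => 'Poppler',
>     );
> 
>     # for each $path in the batch: slurp the PDF in raw mode so
>     # that no I/O layer touches the bytes
>     open my $fh, '<:raw', $path or die "open: $!";
>     my $data = do { local $/; <$fh> };
>     close $fh;
> 
>     my $doc  = Poppler::Document->new_from_data($data, length $data);
>     my $page = $doc->get_page(0);       # undef for the group-2 files
> 
>     # render the first page to PNG via cairo
>     my ($w, $h) = $page->get_size;
>     my $surface = Cairo::ImageSurface->create('argb32', $w, $h);
>     $page->render(Cairo::Context->create($surface));
>     $surface->write_to_png("$path.png");
> 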
> The only obvious common denominators I could find among the five
> files that worked were:
> 
> a. all were marked as PDF 1.1 compliant (all others were at least
> PDF 1.4)
> b. all were a single page
> c. none seemed to use any internal compression
> 
> (b) is not relevant, since other single-page files failed. Given
> that five files worked fine, the problem appears to lie not in the
> argument type itself but in how Perl passes the data to the poppler
> libs: all 135 files passed all tests when opened with the
> "new_from_file()" constructor (letting poppler do the reads
> itself). (c) might well be relevant, since passing the working
> files through ghostscript with all default settings introduces
> compressed objects, after which those files fail testing too. I
> strongly suspect that Perl is mangling the binary data blobs
> somehow before they arrive at the C libs but am clueless as to how
> to track this down.
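> 
> (A sketch of the control path, assuming $path is absolute;
> new_from_file() takes a URI rather than a filename:
> 
>     my $doc = Poppler::Document->new_from_file("file://$path");
> 
> With this constructor every one of the 135 files opens, pages, and
> renders without error.)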

I see that the data argument to "new_from_data()" is specified in the
introspection annotations as a utf-8 char array, but the contents of
the PDF files (at least those that use compressed blocks) are not
actually valid UTF-8. Is it possible that perl-G:I:O
(Glib::Object::Introspection) sees that spec and is forcing UTF-8
encoding on the data at some point?
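
If so, the damage is easy to explain: any byte above 0x7F would get
re-encoded as a two-byte UTF-8 sequence, shifting and corrupting
everything after the first high byte as seen from the C side. Files
that happen to be pure 7-bit ASCII would pass through unchanged,
which could also explain why the uncompressed PDF 1.1 files survived.
A quick way to see the size of the effect from plain Perl
(illustrative only - this is not the actual code path inside the
bindings):

    use Encode qw(encode_utf8);

    # a stand-in "binary blob" containing every possible byte value
    my $blob = join '', map { chr } 0 .. 255;

    # what the blob looks like after a forced UTF-8 encoding
    my $mangled = encode_utf8($blob);

    printf "raw: %d bytes, after utf8: %d bytes\n",
        length($blob), length($mangled);    # prints 256 vs 384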

Jeremy
