[ https://issues.apache.org/jira/browse/PDFBOX-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954413#comment-13954413 ]
Tilman Hausherr commented on PDFBOX-457:
----------------------------------------
There was a PDFBox bug (which I have fixed) where the CCITT filter got the wrong
length if it wasn't the first filter. However, there's still an unsolved problem:
rendering that file. My current theory is that the CCITT stream I get from
580505.PR00003.000003.PDF after the flate filter is applied is broken (because
I get a perfect image by skipping 6 bytes), but that the decoders of pdf.js and
gs (whose source code is completely different from ours) are lenient.
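
The "skip 6 bytes" experiment can be sketched roughly as follows. This is only
an illustration using java.util.zip, not PDFBox code: the class name
FlateSkipSketch and both method names are made up for this example, and it
assumes the flate-encoded stream bytes have already been extracted from the PDF.
The result of skipLeading() would then be handed to whatever CCITT decoder is
being tested.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical helper, not part of PDFBox.
final class FlateSkipSketch {

    /** Inflates zlib-wrapped data, roughly what a FlateDecode filter does. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated input or preset dictionary: keep what we have
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid flate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Returns a copy of data without its first 'skip' bytes (clamped to the array). */
    static byte[] skipLeading(byte[] data, int skip) {
        int from = Math.min(Math.max(skip, 0), data.length);
        return Arrays.copyOfRange(data, from, data.length);
    }
}
```

For the experiment above, the call would be
FlateSkipSketch.skipLeading(FlateSkipSketch.inflate(rawStreamBytes), 6), where
rawStreamBytes is a placeholder for the encoded stream content.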
The bug I fixed didn't have a big impact, because CCITT data normally isn't
compressed a second time: the algorithm already works really well for most
bitonal images. The bug would just result in the CCITT image being cut off.
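
One way such a cut-off can come about is sketched below. This is an assumption
about the mechanism, not the actual PDFBox code: if a later filter in the chain
sizes its input from the stream dictionary's /Length (which counts the encoded
bytes) instead of from what the previous filter produced, it reads too little
whenever the previous filter expanded the data. The class and method names are
hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration, not part of PDFBox.
final class FilterLengthSketch {

    /**
     * Assumed buggy behaviour: limits the second filter's input to dictLength,
     * the /Length of the still-encoded stream, so decoded data gets truncated.
     */
    static byte[] inputForSecondFilterBuggy(byte[] firstFilterOutput, int dictLength) {
        int n = Math.max(0, Math.min(dictLength, firstFilterOutput.length));
        return Arrays.copyOf(firstFilterOutput, n);
    }

    /** Assumed corrected behaviour: hands over everything the first filter produced. */
    static byte[] inputForSecondFilterFixed(byte[] firstFilterOutput) {
        return firstFilterOutput.clone();
    }
}
```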
> Invalid code encountered while decoding CCITT
> ---------------------------------------------
>
> Key: PDFBOX-457
> URL: https://issues.apache.org/jira/browse/PDFBOX-457
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 0.8.0-incubator
> Reporter: Marcelo Tavares
> Assignee: Daniel Wilson
> Labels: CCITTFaxDecode, TIFF, ccitt
> Attachments: 580505.PR00003.000003.PDF,
> pdfbox-457-Scan_from_a_Xerox_WorkCentre_Pro.PDF, pdfbox-457-as_fax.pdf,
> pdfbox-457.PNG, testPDFToImage1.png
>
>
> I tried to convert the following document to an image, but I got the attached
> result.
> It parsed just the text. I also tried different formats such as JPG. I ran it
> using the PDFToImage class, passing the document path as a parameter.
> I've read that sometimes a document is not created in accordance with the PDF
> standard. But is there a possibility to ignore that? In fact, it's very
> important to me, so could I use PDFBox despite those "errors"?
> Thank you
> Marcelo