[jira] [Commented] (PDFBOX-457) Invalid code encountered while decoding CCITT

Tilman Hausherr (JIRA) Sat, 29 Mar 2014 12:38:07 -0700

    [ https://issues.apache.org/jira/browse/PDFBOX-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954413#comment-13954413 ]


Tilman Hausherr commented on PDFBOX-457:
----------------------------------------

There was a PDFBox bug (which I have corrected) where the CCITT filter got the wrong 
length if it wasn't the first filter. However, there's still an unsolved problem: 
rendering that file. My current theory is that the CCITT stream I get after 
applying the Flate filter to 580505.PR00003.000003.PDF is broken (because I get a 
perfect image by skipping 6 bytes), but that the decoders of pdf.js and gs (whose 
source code is completely different from ours) are lenient.
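To make the "perfect image by skipping 6 bytes" observation concrete, here is a minimal sketch of one possible lenient strategy: if the decoder rejects the stream, retry after dropping 1, 2, ... leading bytes. The `Decoder` interface, `decodeLeniently` and the toy decoder are hypothetical names for illustration only; this is not how PDFBox, pdf.js or gs actually handle broken streams.

```java
import java.util.Arrays;

public class LenientDecodeSketch {

    /** Hypothetical decoder interface; stands in for a real CCITT Group 3/4 decoder. */
    interface Decoder {
        byte[] decode(byte[] data) throws Exception;
    }

    /** Result of a lenient decode: the decoded bytes and how many leading bytes were skipped. */
    static final class Result {
        final byte[] decoded;
        final int skipped;

        Result(byte[] decoded, int skipped) {
            this.decoded = decoded;
            this.skipped = skipped;
        }
    }

    /**
     * Tries to decode the data as-is; if the decoder throws, retries after
     * dropping 1, 2, ... up to maxSkip leading bytes. Returns null if every attempt fails.
     */
    static Result decodeLeniently(byte[] data, Decoder decoder, int maxSkip) {
        int limit = Math.min(maxSkip, data.length);
        for (int skip = 0; skip <= limit; skip++) {
            try {
                byte[] out = decoder.decode(Arrays.copyOfRange(data, skip, data.length));
                return new Result(out, skip);
            } catch (Exception e) {
                // This offset hit an invalid code; try the next offset.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Toy decoder: accepts input only if it starts with 0x00, then returns the rest.
        Decoder toy = in -> {
            if (in.length == 0 || in[0] != 0) {
                throw new IllegalArgumentException("Invalid code");
            }
            return Arrays.copyOfRange(in, 1, in.length);
        };
        byte[] stream = {1, 2, 3, 4, 5, 6, 0, 42};
        Result r = decodeLeniently(stream, toy, 16);
        System.out.println("skipped " + r.skipped + ", decoded " + Arrays.toString(r.decoded));
        // prints "skipped 6, decoded [42]"
    }
}
```

A real decoder might "succeed" on garbage at the wrong offset, so a practical version would probably also need a sanity check (for example, that the expected number of rows was produced) before accepting a result.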

The bug I corrected didn't have a big impact, because CCITT data normally isn't 
compressed a second time: the algorithm already works really well for most 
bitonal images. The bug would just result in the CCITT image data being cut off.
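A minimal sketch of the chained-filter length bug described above, assuming a Flate stage followed by a CCITT stage, and assuming for illustration that the "wrong length" was the length of the raw, still-compressed stream. `FilterChainSketch`, `buggyHandOff` and `correctHandOff` are hypothetical names, not PDFBox's API, and a block of zero bytes stands in for real CCITT data.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FilterChainSketch {

    // Compresses data with zlib/Flate (stands in for the second compression pass).
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Fully decodes a zlib/Flate stream and returns every decoded byte.
    static byte[] flateDecode(byte[] encoded) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(encoded);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                break; // truncated input: stop rather than loop forever
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    // Buggy hand-off: the CCITT stage is limited to the length of the raw,
    // still-compressed stream, so decoded data past that point is cut off.
    static byte[] buggyHandOff(byte[] rawStream) throws DataFormatException {
        byte[] decoded = flateDecode(rawStream);
        return Arrays.copyOf(decoded, Math.min(rawStream.length, decoded.length));
    }

    // Correct hand-off: the CCITT stage receives everything the Flate stage produced.
    static byte[] correctHandOff(byte[] rawStream) throws DataFormatException {
        return flateDecode(rawStream);
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] ccittStandIn = new byte[10_000]; // zero bytes compress very well
        byte[] raw = deflate(ccittStandIn);
        System.out.println("raw stream length:    " + raw.length);
        System.out.println("bytes seen (buggy):   " + buggyHandOff(raw).length);
        System.out.println("bytes seen (correct): " + correctHandOff(raw).length);
    }
}
```

The zero-filled stand-in exaggerates the effect; real CCITT data usually gains little from Flate, which fits the point that the truncation rarely mattered in practice.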

> Invalid code encountered while decoding CCITT
> ---------------------------------------------
>
>                 Key: PDFBOX-457
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-457
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 0.8.0-incubator
>            Reporter: Marcelo Tavares
>            Assignee: Daniel Wilson
>              Labels: CCITTFaxDecode, TIFF, ccitt
>         Attachments: 580505.PR00003.000003.PDF, 
> pdfbox-457-Scan_from_a_Xerox_WorkCentre_Pro.PDF, pdfbox-457-as_fax.pdf, 
> pdfbox-457.PNG, testPDFToImage1.png
>
>
> I tried to convert the following document to an image, but I got the attached 
> result. 
> It parsed only the text. I also tried different formats, like JPG. I ran it 
> using the PDFToImage class, passing the document path as a parameter. 
> I've read that sometimes documents are not created in conformance with the PDF 
> standard. But is there a way to ignore that? In fact, it's very 
> important to me, so could I use PDFBox despite those "errors"? 
> Thank you
> Marcelo



--
This message was sent by Atlassian JIRA
(v6.2#6252)
