[jira] [Commented] (PDFBOX-3338) CCITT Fax decoder fails

Petr Slaby (JIRA) Wed, 04 May 2016 09:42:12 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15270954#comment-15270954
 ]


Petr Slaby commented on PDFBOX-3338:
------------------------------------

You mean to test my solution using the Twelve Monkeys implementation? 
Unfortunately, the decoder class in that library is not public, so for my quick 
and dirty test I just copied it, with some minor tweaks to avoid copying 
too many classes. I then used it for the K>1 path only, as that is the one used 
in my PDF. I believe this is the G3 or G32D variant, depending on the value of 
tiffOptions. G4 would not be a big deal either, except that I do not see a 
flag for the byte align option in the Twelve Monkeys library. I am not sure 
whether it is unsupported there or whether I am just missing it.
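
For reference, here is a minimal sketch of how the K parameter of the PDF 
CCITTFaxDecode filter selects the fax variant, and which TIFF tag values that 
variant corresponds to. The K ranges are taken from the PDF specification and 
the tag values from TIFF 6.0; the enum CcittVariant and its members are 
hypothetical names made up for this illustration and exist in neither PDFBox 
nor Twelve Monkeys. The byte align option is deliberately left out.

```java
/** Hypothetical helper for illustration only; not a PDFBox or Twelve Monkeys class. */
enum CcittVariant {
    // TIFF 6.0: Compression 3 = T4 (Group 3), Compression 4 = T6 (Group 4).
    // T4Options bit 0 set means two-dimensional coding.
    GROUP3_1D(3, 0L),
    GROUP3_2D(3, 1L),
    GROUP4(4, 0L); // T4Options does not apply to Group 4; 0 is a placeholder

    final int tiffCompression;
    final long t4Options;

    CcittVariant(int tiffCompression, long t4Options) {
        this.tiffCompression = tiffCompression;
        this.t4Options = t4Options;
    }

    /**
     * Maps the /K entry of the CCITTFaxDecode parameters to a variant:
     * K < 0 is pure two-dimensional (Group 4), K = 0 is pure one-dimensional
     * (Group 3, 1-D), K > 0 is mixed one- and two-dimensional (Group 3, 2-D).
     */
    static CcittVariant fromK(int k) {
        if (k < 0) {
            return GROUP4;
        }
        if (k == 0) {
            return GROUP3_1D;
        }
        return GROUP3_2D;
    }
}
```

A caller would read K from the decode parameters, call 
CcittVariant.fromK(k), and pass the resulting compression and options values 
to whichever decoder is being tested.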

Apart from that, I could probably do this. The license of Twelve Monkeys allows 
copying provided that the copyright notice remains in the copied file. (At 
least this is how I understand it, but I am not a lawyer.) This is no problem 
for a testing patch. But I do not know whether you could use it if you decide 
to adopt that solution in place of the current decoder implementation (which 
originally comes from Sun ImageIO and was made freely available by Sun some 
years ago).


> CCITT Fax decoder fails
> -----------------------
>
>                 Key: PDFBOX-3338
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3338
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.8.12, 2.0.1
>            Reporter: Petr Slaby
>         Attachments: 1.tiff, TestCCITTFaxDecoder.java
>
>
> I have a PDF which does not render in PDFBox. It contains pages from a 
> scanner, encoded as CCITT Fax TIFFs. On each page, the decoder always runs 
> into IOException("TIFFFaxDecoder: EOL encountered in black run.") (or the 
> same message with "white" instead of "black"). Unfortunately, the PDF 
> contains sensitive data and I cannot share it.
> As a test, I have replaced the TIFFFaxDecoder by the class 
> CCITTFaxDecoderStream from the Twelve Monkeys ImageIO library. All worked 
> fine after that and PDFToImage produced the expected result. 
> I have extracted the first few bytes of the TIFF to show the problem without 
> sharing the confidential content. See the attached test program and test file.
> I have tested this against the latest trunk version of PDFBox, but I think 
> the decoder implementation is basically the same in all versions. 



