[jira] [Commented] (PDFBOX-3338) CCITT Fax decoder fails

Petr Slaby (JIRA) Wed, 04 May 2016 10:36:46 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271045#comment-15271045
 ]


Petr Slaby commented on PDFBOX-3338:
------------------------------------

{quote}
> It has an Apache license, so this isn't a problem.
{quote}
Cool, that saves me some trouble.

{quote}
I suspect that the encodedByteAlign option isn't supported; one would have to 
implement it. See rev 1581603 and 1581602 / PDFBOX-1074.
{quote}
I can try; it seems quite straightforward at first glance.
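
For illustration, here is a minimal sketch of what byte alignment means for a bit-level reader, assuming the option pads each encoded row with extra bits so that the next row starts on a byte boundary. The class BitReader and its methods are hypothetical names made up for this example; this is not the code from PDFBox or Twelve Monkeys, and it assumes bits are read most significant bit first.

```java
// Hypothetical helper, for illustration only (not PDFBox or Twelve Monkeys code).
class BitReader {
    private final byte[] data;
    private int bitPos; // absolute bit position in the data, counted MSB first

    BitReader(byte[] data) {
        this.data = data;
    }

    // Reads a single bit, most significant bit first; returns -1 at end of data.
    int readBit() {
        int byteIndex = bitPos >> 3;
        if (byteIndex >= data.length) {
            return -1;
        }
        int bit = (data[byteIndex] >> (7 - (bitPos & 7))) & 1;
        bitPos++;
        return bit;
    }

    // Skips any remaining padding bits so the next read starts on a byte
    // boundary. Does nothing if the position is already aligned.
    void alignToByte() {
        bitPos = (bitPos + 7) & ~7;
    }

    int bitPosition() {
        return bitPos;
    }
}
```

A decoder supporting the option would presumably call something like alignToByte() before decoding each row; where exactly that call belongs depends on the decoder in question.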

{quote}
Another problem in that code is "continue" with a label. I've never seen that 
one before, ever. When was this added to Java?
{quote}
It has always been there. See e.g. the examples at 
https://docs.oracle.com/javase/tutorial/java/nutsandbolts/branch.html 
I hope you are just exaggerating with the word "problem"? I find the code much 
better and more readable than the current decoder class in PDFBox. At the very 
least, it does not need to jump back and forth in the input and reads it byte 
by byte instead. Not that I really understand in detail what is going on in 
either of the implementations. For that, one would have to study the standard 
first. 
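
For readers who have not come across the construct, here is a small self-contained sketch of "continue" with a label. It is not taken from either decoder; the class and method names are invented for the example. The labeled continue abandons the inner loop and proceeds with the next iteration of the outer loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, unrelated to the decoder code under discussion.
class LabeledContinueDemo {

    // Returns the indices of all rows that contain no negative value.
    static List<Integer> rowsWithoutNegatives(int[][] rows) {
        List<Integer> result = new ArrayList<>();
        nextRow:
        for (int r = 0; r < rows.length; r++) {
            for (int value : rows[r]) {
                if (value < 0) {
                    // Give up on this row and go straight to the next
                    // iteration of the outer loop.
                    continue nextRow;
                }
            }
            // Only reached if the inner loop finished without a negative value.
            result.add(r);
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] rows = { {1, 2}, {3, -1}, {5, 6} };
        System.out.println(rowsWithoutNegatives(rows)); // prints [0, 2]
    }
}
```

Without the label, the same logic would need a boolean flag plus a break, which is arguably harder to follow.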


> CCITT Fax decoder fails
> -----------------------
>
>                 Key: PDFBOX-3338
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3338
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.8.12, 2.0.1
>            Reporter: Petr Slaby
>         Attachments: 1.tiff, TestCCITTFaxDecoder.java
>
>
> I have a PDF which does not render in PDFBox. It contains pages from a 
> scanner, encoded as CCITT Fax TIFFs. On each page, the decoder always runs 
> into IOException("TIFFFaxDecoder: EOL encountered in black run.")  (or the 
> same message just with "white" instead of "black"). Unfortunately, the PDF 
> contains sensitive data and I cannot share it.
> As a test, I have replaced the TIFFFaxDecoder by the class 
> CCITTFaxDecoderStream from the Twelve Monkeys ImageIO library. All worked 
> fine after that and PDFToImage produced the expected result. 
> I have extracted the first few bytes of the TIFF to show the problem without 
> sharing the confidential content. See the attached test program and test file.
> I have tested this against the latest trunk version of PDFBox, but I think the 
> decoder implementation is basically the same in all versions. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

