[jira] [Comment Edited] (PDFBOX-2120) Regression: Type 1 font corrupted

Tilman Hausherr (JIRA) Mon, 09 Jun 2014 04:51:30 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021805#comment-14021805
 ]


Tilman Hausherr edited comment on PDFBOX-2120 at 6/9/14 11:50 AM:
------------------------------------------------------------------

One piece of good news: it does not happen when using the -nonSeq option.

The sequential parser doesn't handle indirect lengths when these come later in 
the file, so it reads the stream sequentially, looking for "endstream". This is 
NOT correct; at best it is a heuristic. PDFBOX-2079 removed a final CR LF or LF 
from such a stream, on the assumption that a PDF writing application appends 
CR LF or LF before writing "endstream". In your file, when taking the length 
into account, the stream ends with a CR, which is then followed by an LF. I 
remove both, so the font file ends with "cleartomark" without a final CR LF or 
LF, and Adobe doesn't like this.
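
To make the difference concrete, here is a minimal sketch of the two ways of 
finding the end of a stream. It is not the actual PDFBox code: the class 
StreamEndSketch and its methods are hypothetical names invented for this 
illustration, and the input is a toy byte array rather than a real PDF.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not taken from the PDFBox sources.
public class StreamEndSketch {

    /** Heuristic: scan for "endstream", then drop one trailing CR LF or LF. */
    public static byte[] readByScanning(byte[] data, int start) {
        byte[] marker = "endstream".getBytes(StandardCharsets.US_ASCII);
        int end = indexOf(data, marker, start);
        if (end < 0) {
            throw new IllegalArgumentException("no endstream keyword found");
        }
        if (end > start && data[end - 1] == '\n') {
            end--;                       // assumed end-of-line marker: LF
            if (end > start && data[end - 1] == '\r') {
                end--;                   // assumed end-of-line marker: CR LF
            }
        }
        return Arrays.copyOfRange(data, start, end);
    }

    /** Reliable: take exactly as many bytes as /Length says. */
    public static byte[] readByLength(byte[] data, int start, int length) {
        return Arrays.copyOfRange(data, start, start + length);
    }

    /** Naive search for needle in haystack, starting at index from. */
    static int indexOf(byte[] haystack, byte[] needle, int from) {
        outer:
        for (int i = from; i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Stream data is "cleartomark" + CR (12 bytes); the writer then added LF.
        byte[] file = "cleartomark\r\nendstream".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readByLength(file, 0, 12).length);  // 12: CR is kept
        System.out.println(readByScanning(file, 0).length);    // 11: CR and LF are both removed
    }
}
```

With the length, the CR stays part of the font data. With the scan, the CR is 
taken to be half of a CR LF end-of-line marker and is lost.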

The sequential parser is a dead end and nobody should use it, but it is still 
used, because the non-sequential parser needs an extra parameter :-(

I am adding an exception for streams assumed to be ASCII, detected by testing 
the beginning of the stream as in PDFBOX-1164. These streams won't get the 
filtering that I introduced with PDFBOX-2079 and thus will keep a final CR LF 
or LF.
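
A rough sketch of what such an exception could look like follows. It is an 
assumption, not the committed fix: the class AsciiStreamSketch, the probe size 
of 10 bytes and the definition of "ASCII" used here (printable characters plus 
tab, CR and LF) are all invented for illustration, and the real test in PDFBox 
may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not taken from the PDFBox sources.
public class AsciiStreamSketch {

    /** Number of leading bytes to inspect (an assumed value). */
    static final int PROBE_BYTES = 10;

    /** Guess whether a stream is ASCII by looking only at its first bytes. */
    public static boolean looksAscii(byte[] stream) {
        int n = Math.min(PROBE_BYTES, stream.length);
        if (n == 0) {
            return false;
        }
        for (int i = 0; i < n; i++) {
            int b = stream[i] & 0xFF;
            boolean printable = b >= 0x20 && b <= 0x7E;
            boolean whitespace = b == '\t' || b == '\n' || b == '\r';
            if (!printable && !whitespace) {
                return false;
            }
        }
        return true;
    }

    /**
     * Takes the bytes found before "endstream". Binary streams lose one
     * trailing CR LF or LF; streams that look like ASCII are returned as is.
     */
    public static byte[] trimBeforeEndstream(byte[] stream) {
        if (looksAscii(stream)) {
            return stream;               // the exception: keep the final EOL
        }
        int end = stream.length;
        if (end > 0 && stream[end - 1] == '\n') {
            end--;
            if (end > 0 && stream[end - 1] == '\r') {
                end--;
            }
        }
        return Arrays.copyOf(stream, end);
    }

    public static void main(String[] args) {
        byte[] font = "%!PS-AdobeFont-1.0\ncleartomark\r\n"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] binary = {0x00, 0x01, (byte) 0xFF, '\r', '\n'};
        System.out.println(trimBeforeEndstream(font).length == font.length); // true
        System.out.println(trimBeforeEndstream(binary).length);              // 3
    }
}
```

Probing only the start of the stream is enough for a Type 1 font file, whose 
first bytes are clear text even though an encrypted portion follows.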

This works for your file, but I will do some more tests.

In the future, the sequential parser should be deleted, and so should the 
filter that I introduced with PDFBOX-2079.


was (Author: tilman):
One piece of good news: it does not happen when using the -nonSeq option.

The sequential parser doesn't handle indirect lengths when these come later in 
the file, so it reads the stream sequentially, looking for "endstream". This is 
NOT correct; at best it is a heuristic. PDFBOX-2079 removed a final CR LF or LF 
from such a stream, on the assumption that a PDF writing application appends 
CR LF or LF before writing "endstream". In your file, when taking the length 
into account, the stream ends with a CR, which is then followed by an LF. I 
remove both, so the font file ends with "cleartomark" without a final CR LF or 
LF, and Adobe doesn't like this.

The sequential parser is a dead end and nobody should use it, but it is still 
used, because the non-sequential parser needs an extra parameter :-(

I am adding an exception for PostScript streams (assuming that these start with 
"%!PS"). These streams won't get the filtering that I introduced with 
PDFBOX-2079 and thus will keep a final CR LF or LF.

This works for your file, but I will do some more tests.

In the future, the sequential parser should be deleted, and so should the 
filter that I introduced with PDFBOX-2079.

> Regression: Type 1 font corrupted
> ---------------------------------
>
>                 Key: PDFBOX-2120
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2120
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: simon steiner
>            Assignee: Tilman Hausherr
>              Labels: regression
>         Attachments: t1subset2.pdf
>
>
> You get a warning when opening the output in Adobe Reader
> Blank line after "cleartomark" missing in fontfile
> java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
> WriteDecodedDoc t1subset2.pdf



--
This message was sent by Atlassian JIRA
(v6.2#6252)
