[ 
https://issues.apache.org/jira/browse/PDFBOX-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144870#comment-14144870
 ] 

Daniel Scheibe commented on PDFBOX-2350:
----------------------------------------

I think I have narrowed it down, or at least found what potentially 
goes wrong:

Inside the {{org.apache.pdfbox.pdmodel.font.PDType1Font}} constructor 
{{public PDType1Font(COSDictionary fontDictionary)}}, the try/catch 
block consisting of

{code:java}
COSStream stream = fontFile.getStream();
int length1 = stream.getInt(COSName.LENGTH1);
int length2 = stream.getInt(COSName.LENGTH2); 

// the PFB embedded as two segments back-to-back
byte[] bytes = fontFile.getByteArray();
byte[] segment1 = Arrays.copyOfRange(bytes, 0, length1);
byte[] segment2 = Arrays.copyOfRange(bytes, length1, length1 + length2);

t1 =  Type1Font.createWithSegments(segment1, segment2);
{code}

either gets an incorrect value for length1 in my case or, more likely, 
applies length1 incorrectly. I guess the following should be 
correct instead:

{code:java}
byte[] segment1 = Arrays.copyOfRange(bytes, 0, length1 - 1);
byte[] segment2 = Arrays.copyOfRange(bytes, length1 - 1, length1 + length2);
{code}

I mean, isn't the range 0 .. length1 one byte too many?
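
For reference, here is a small standalone sketch of how {{Arrays.copyOfRange}} slices a buffer. Its upper bound is exclusive, so {{copyOfRange(bytes, 0, length1)}} returns exactly length1 bytes (indices 0 to length1 - 1). The buffer and lengths are made up for illustration and are not taken from the PDF in question; the class name {{SegmentSplitDemo}} and its methods are hypothetical.

```java
import java.util.Arrays;

final class SegmentSplitDemo {

    /** Split as in the current constructor code: [0, length1) and [length1, length1 + length2). */
    static byte[][] splitCurrent(byte[] bytes, int length1, int length2) {
        byte[] segment1 = Arrays.copyOfRange(bytes, 0, length1);
        byte[] segment2 = Arrays.copyOfRange(bytes, length1, length1 + length2);
        return new byte[][] { segment1, segment2 };
    }

    /** Split as in the proposed change: the boundary moves one byte to the left. */
    static byte[][] splitProposed(byte[] bytes, int length1, int length2) {
        byte[] segment1 = Arrays.copyOfRange(bytes, 0, length1 - 1);
        byte[] segment2 = Arrays.copyOfRange(bytes, length1 - 1, length1 + length2);
        return new byte[][] { segment1, segment2 };
    }

    public static void main(String[] args) {
        // Made-up data: 4 "cleartext" bytes followed by 3 "binary" bytes.
        byte[] bytes = { 1, 2, 3, 4, 10, 20, 30 };

        byte[][] current = splitCurrent(bytes, 4, 3);
        // prints "[1, 2, 3, 4] [10, 20, 30]"
        System.out.println(Arrays.toString(current[0]) + " " + Arrays.toString(current[1]));

        byte[][] proposed = splitProposed(bytes, 4, 3);
        // prints "[1, 2, 3] [4, 10, 20, 30]"
        System.out.println(Arrays.toString(proposed[0]) + " " + Arrays.toString(proposed[1]));
    }
}
```

With these made-up values the current code yields segments of 4 and 3 bytes, while my change yields 3 and 4 bytes. So if I read the API correctly, my change would only be right when the Length1 value stored in the PDF is itself one byte too large, which would fit segment2 missing its first byte.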

While debugging and dumping out the arrays, I noticed that the byte array 
segment2 is always missing its first byte. When I changed the code as shown 
above, it worked fine: all embedded fonts are rendered correctly and the image 
looks just fine.
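
One way to double-check the boundary without eyeballing dumps might be to look for the {{eexec}} keyword. This assumes that the cleartext segment of a Type 1 font ends with {{eexec}} plus a whitespace byte or two, and that everything after it belongs to segment2. This is only a sketch: {{Length1Check}} and its methods are names I made up, not PDFBox API, and the sample bytes are invented.

```java
import java.nio.charset.StandardCharsets;

final class Length1Check {

    private static final byte[] EEXEC = "eexec".getBytes(StandardCharsets.US_ASCII);

    /** Returns the index just past the first "eexec" keyword, or -1 if it is absent. */
    static int endOfEexec(byte[] bytes) {
        outer:
        for (int i = 0; i + EEXEC.length <= bytes.length; i++) {
            for (int j = 0; j < EEXEC.length; j++) {
                if (bytes[i + j] != EEXEC[j]) {
                    continue outer; // mismatch, try the next start position
                }
            }
            return i + EEXEC.length;
        }
        return -1;
    }

    /** Describes how the declared length1 relates to the position of "eexec". */
    static String describe(byte[] bytes, int length1) {
        int end = endOfEexec(bytes);
        if (end < 0) {
            return "no eexec keyword found";
        }
        return "eexec ends at " + end + ", declared length1 = " + length1
                + ", difference = " + (length1 - end);
    }

    public static void main(String[] args) {
        // Invented stand-in for a font stream: a cleartext header, then "binary" bytes.
        byte[] bytes = "%!PS-AdobeFont-1.0\ncurrentfile eexec\nBINARY"
                .getBytes(StandardCharsets.US_ASCII);
        // Pretend the font dictionary declares this value as Length1.
        int declaredLength1 = endOfEexec(bytes) + 2;
        System.out.println(describe(bytes, declaredLength1));
    }
}
```

If my assumption holds, a small difference would just be the whitespace after {{eexec}}, and anything larger would suggest that the declared length1 does not match the embedded data.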

Anyway, I have no clue whether my change makes sense or might have a larger 
impact and break something else, but it might help you guys come up 
with a suitable fix.

> Type1 Parser hangs indefinitely
> -------------------------------
>
>                 Key: PDFBOX-2350
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2350
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 2.0.0
>         Environment: Windows 7, JDK 1.7.0_51-b13
>            Reporter: Daniel Scheibe
>         Attachments: PDFBOX-2350-289451-endless.pdf
>
>
> When rendering the first page of my pdf document the Type1Parser 
> (org.apache.fontbox.type1.Type1Parser) hangs in a loop in 
> {{parseBinary(byte[] bytes) throws IOException}}
> and "kills" our rendering pipeline. Please find the loop that hangs below:
>         // find /Private dict
>         while (!lexer.peekToken().getText().equals("Private"))
>         {
>             lexer.nextToken();
>         }
> There is no token named "Private" ever in the list of returned tokens 
> (they're empty all the time).  
> Furthermore, going deeper into the source code, it seems the class reading the 
> tokens (Type1Lexer) never advances the buffer position and always 
> returns an empty name token in the readToken(Token prevToken) method.
> Looking at the decrypted buffer, I cannot get anything useful out of it based 
> on my current understanding.
> Unfortunately I cannot provide the PDF in question as it contains confidential 
> data.
> Acrobat Reader XI Version 11.0.08 renders the document just fine.
> In addition, it seems the PDF was encrypted (40-bit RC4) with an empty 
> password and says it's PDF version 1.5.
> Does this provide enough information, or can I do anything else to help 
> nail this one down?
> I guess this might be a PDF document structure/feature that is not yet 
> completely supported, but at least PDFBox should throw an exception instead of 
> failing "silently"...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
