[jira] [Commented] (PDFBOX-3130) Recent regression in PDFTextStripper, text getting garbled

Tilman Hausherr (JIRA) Tue, 24 Nov 2015 11:48:28 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025206#comment-15025206
 ]


Tilman Hausherr commented on PDFBOX-3130:
-----------------------------------------

You have used the sort option. Without it, all the text would appear on one 
line (which would still be wrong).

The root cause is that your file has an invalid font BBox. See 
{{Root/Pages/Kids/\[0]/Resources/Font/F0/FontDescriptor/FontBBox}}. I remember 
having seen such a weird BBox before, in PDFBOX-2158.
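
To illustrate what a sanity check on a FontBBox could look like, here is a 
minimal plain-Java sketch. It is not PDFBox code: the class {{FontBBoxCheck}}, 
the method {{isPlausible}}, the sample values, and the criteria themselves 
(four finite numbers, non-zero width and height) are all assumptions made for 
illustration, and they may not match what is actually wrong with the BBox in 
the attached file.

```java
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox: a plain-Java sanity check for the
// four numbers of a /FontBBox entry, given as [llx lly urx ury].
final class FontBBoxCheck {

    // Assumption: a box counts as "plausible" when it has exactly four finite
    // values and a non-zero width and height. This is only a heuristic for
    // spotting odd values, not a rule taken from PDFBox or the PDF spec.
    static boolean isPlausible(float[] bbox) {
        if (bbox == null || bbox.length != 4) {
            return false;
        }
        for (float v : bbox) {
            if (Float.isNaN(v) || Float.isInfinite(v)) {
                return false;
            }
        }
        float width = Math.abs(bbox[2] - bbox[0]);
        float height = Math.abs(bbox[3] - bbox[1]);
        return width > 0 && height > 0;
    }

    public static void main(String[] args) {
        float[] ordinary = {-100f, -250f, 1000f, 900f}; // made-up sample values
        float[] zeroArea = {0f, 0f, 0f, 0f};            // made-up degenerate box
        System.out.println(Arrays.toString(ordinary) + " plausible: " + isPlausible(ordinary));
        System.out.println(Arrays.toString(zeroArea) + " plausible: " + isPlausible(zeroArea));
    }
}
```

A check like this would only flag suspicious values; what to fall back on 
when the BBox looks wrong is a separate decision.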

It is not really a regression, although it surfaced only recently because the 
BBox from the PDF is now used instead of the BBox from the font.

> Recent regression in PDFTextStripper, text getting garbled
> ----------------------------------------------------------
>
>                 Key: PDFBOX-3130
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3130
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.0
>            Reporter: Fred Andrews
>         Attachments: garbled text.pdf
>
>
> Text extraction using PrintTextLocations is getting garbled characters in the 
> attached snippet. 
> For this file it is getting one string of "2O(Er4env vqeheurosriAurseirueeass 
> ss/Ct:7:rh adaliaargynse csr eadc+cit6e l1ipc te+2en 6d9c1)9e 91 2933"
> This test case is about as small as I could make it and still show the 
> problem; when I reduced the file to just one line of text, the text came 
> through correctly.
> This problem shows up in RC2 and the latest development build.  I believe it 
> was OK in the development build from Nov 4.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

