[ 
https://issues.apache.org/jira/browse/PDFBOX-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler resolved PDFBOX-1561.
----------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.9.0
         Assignee: Andreas Lehmkühler

I fixed the issue in revision 1469558.

The PDF contains a lot of inline images. Apparently only "space" and "carriage 
return" are allowed as whitespace characters when parsing those inline images. 
Otherwise the parser might stop reading the image data prematurely, which leads 
to a corrupted stream.
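
To illustrate the failure mode described above, here is a minimal sketch. It is
not the code committed in revision 1469558. The class and method names
(InlineImageEndSketch, findEnd, isSpaceOrReturn, isAnyPdfWhitespace) are
hypothetical, and the accepted bytes simply follow the wording of this comment,
so the actual fix may differ in detail. It scans inline image data for the "EI"
end marker and compares a lenient whitespace check with a strict one.

```java
import java.nio.charset.StandardCharsets;

public class InlineImageEndSketch {

    // Strict check, following the comment above: only space (0x20) and
    // carriage return (0x0D) are accepted after "EI". This is an assumption
    // based on the wording of the comment, not the committed code.
    static boolean isSpaceOrReturn(int c) {
        return c == 0x20 || c == 0x0D;
    }

    // Lenient check: any PDF whitespace byte (NUL, TAB, LF, FF, CR, space).
    static boolean isAnyPdfWhitespace(int c) {
        return c == 0x00 || c == 0x09 || c == 0x0A
            || c == 0x0C || c == 0x0D || c == 0x20;
    }

    // Returns the index of the 'E' of the first "EI" that is followed by an
    // accepted whitespace byte, or -1 if no such marker is found.
    static int findEnd(byte[] data, int start, boolean strict) {
        for (int i = start; i + 2 < data.length; i++) {
            if (data[i] == 'E' && data[i + 1] == 'I') {
                int next = data[i + 2] & 0xFF;
                boolean accepted = strict
                    ? isSpaceOrReturn(next)
                    : isAnyPdfWhitespace(next);
                if (accepted) {
                    return i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Image bytes that happen to contain 'E', 'I', TAB before the
        // intended end marker "EI" followed by a space.
        byte[] data = "abEI\tcdEI Q".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(findEnd(data, 0, false)); // prints 2 (stops too early)
        System.out.println(findEnd(data, 0, true));  // prints 7 (intended marker)
    }
}
```

With the lenient check the scan ends inside the image bytes, and the remaining
bytes would then be tokenized as operators and operands, which is the kind of
situation in which an error such as "Not a number: +" can appear.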
                
> PDFBox throws exception with PDFTextStripper.getText 
> -----------------------------------------------------
>
>                 Key: PDFBOX-1561
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1561
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.7.0
>            Reporter: Markus Griesser
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.9.0
>
>         Attachments: energieausweis.zip
>
>
> I am using the .NET port of PDFBox 1.7.0. Calling PDFTextStripper::getText 
> throws the exception
> java.io.IOException: Not a number: +
> with the following call stack:
>    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext()
>    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext()
>    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(COSStream )
>    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDPage pdp, 
> PDResources pdr, COSStream coss)
>    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDPage pdp, 
> PDResources pdr, COSStream coss)
>    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDPage pdp, 
> COSStream coss)
>    at org.apache.pdfbox.util.PDFTextStripper.processPages(List l)
>    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDDocument pdd, 
> Writer w)
>    at org.apache.pdfbox.util.PDFTextStripper.getText(PDDocument pdd)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
