[ 
https://issues.apache.org/jira/browse/PDFBOX-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520164#comment-17520164
 ] 

Andreas Lehmkühler commented on PDFBOX-5413:
--------------------------------------------

I've added another check which ignores unknown objects and doesn't trigger the 
brute-force search. In this case the expected object at the given offset is 
{{11 0}}. The check added in PDFBOX-5399 detects the {{5}} directly in front of 
it, assumes something has to be wrong and triggers the brute-force search. In 
the end the object is read as {{511 0}}, so {{11 0}} is missing. 

In this case there isn't any definition for an object with the number 511, so 
"fixing" the apparently malformed PDF by replacing the number {{11 0}} with 
{{511 0}} leads to missing content. Let's assume instead that the offset and 
the object found there are correct, and that the digit {{5}} belongs to some 
garbage of the previous object. 
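
The decision described above could be sketched roughly as follows. This is not the actual PDFBox code, only a minimal illustration under stated assumptions: the class {{XrefOffsetCheck}}, its methods and the {{knownObjNrs}} set (standing in for the object numbers defined in the xref table) are all hypothetical names.

```java
import java.util.Set;

/** Illustrative sketch only; this is not PDFBox's parser code. */
final class XrefOffsetCheck {

    enum Decision { ACCEPT_OFFSET, BRUTE_FORCE_SEARCH }

    private static boolean isDigit(byte b) {
        return b >= '0' && b <= '9';
    }

    /** Reads the run of ASCII digits starting at pos; returns -1 if there is none. */
    static long readNumber(byte[] data, int pos) {
        long value = 0;
        int i = pos;
        while (i < data.length && isDigit(data[i])) {
            value = value * 10 + (data[i] - '0');
            i++;
        }
        return i == pos ? -1 : value;
    }

    /**
     * Decides whether the xref offset can be trusted.
     * knownObjNrs is a hypothetical stand-in for the object numbers
     * that have a definition in the xref table.
     */
    static Decision check(byte[] data, int offset, long expectedObjNr,
                          Set<Long> knownObjNrs) {
        // The object number at the offset must match the xref entry.
        if (offset < 0 || offset >= data.length
                || readNumber(data, offset) != expectedObjNr) {
            return Decision.BRUTE_FORCE_SEARCH;
        }
        // Walk back over any digits directly in front of the offset.
        int start = offset;
        while (start > 0 && isDigit(data[start - 1])) {
            start--;
        }
        if (start == offset) {
            return Decision.ACCEPT_OFFSET; // nothing suspicious in front
        }
        // Digits in front: e.g. "5" + "11" would form object number 511.
        long extended = readNumber(data, start);
        // Only distrust the offset if the longer number is a known object;
        // otherwise treat the extra digits as garbage and keep the offset.
        return knownObjNrs.contains(extended)
                ? Decision.BRUTE_FORCE_SEARCH
                : Decision.ACCEPT_OFFSET;
    }
}
```

With the bytes {{x511 0 obj}}, an offset pointing at the {{11}} and only object 11 known, {{check}} keeps the offset; if 511 were also a known object, it would fall back to the brute-force search.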




> Field text missing
> ------------------
>
>                 Key: PDFBOX-5413
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5413
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.26, 3.0.0 PDFBox
>            Reporter: Tilman Hausherr
>            Priority: Major
>              Labels: regression
>             Fix For: 2.0.26, 3.0.0 PDFBox
>
>         Attachments: CZIB6B5RY5HQDSEXXWSGUHSAP75CAI7Q.pdf
>
>
> The bottom field on page 2 ("AREA OF CONSIDERATION") is missing.
> This worked in 2.0.25. This is a weird case: incrementally written object 11 
> points to 0000102796. However there is a "5" just before the 11.



