[ https://issues.apache.org/jira/browse/PDFBOX-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michele Balistreri updated PDFBOX-2402:
---------------------------------------
    Description: 
The NonSequentialPDFParser fails if an object has a spurious closing tag (for
example, a PDF array with two closing brackets). In lenient mode, it would be
good to at least attempt to recover from that. Instead of throwing an
exception when the endObject string is not "endobj" or " obj", the attached
patch skips one character (the spurious one) and tries to read the string
again. It repeats this until either the file ends or an "endobj" is found.
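
A rough sketch of that recovery loop follows. It is not the attached patch:
the class LenientEndObjSketch, its methods, and the in-memory String cursor
are hypothetical stand-ins for the parser's own input handling, and the
strict (non-lenient) path that throws is left out.

```java
// Hypothetical illustration only; none of these names come from PDFBox.
class LenientEndObjSketch {
    private final String data; // stand-in for the parser's input source
    private int pos;           // current read position within data

    LenientEndObjSketch(String data) {
        this.data = data;
        this.pos = 0;
    }

    /** Skips whitespace, then reads up to the next whitespace or end of input. */
    String readString() {
        while (pos < data.length() && Character.isWhitespace(data.charAt(pos))) {
            pos++;
        }
        int start = pos;
        while (pos < data.length() && !Character.isWhitespace(data.charAt(pos))) {
            pos++;
        }
        return data.substring(start, pos); // empty string means end of input
    }

    /**
     * Tries to reach "endobj" by dropping one character at a time.
     * Returns true if "endobj" was found, false if the input ended first.
     */
    boolean recoverToEndObj() {
        while (true) {
            String token = readString();
            if (token.isEmpty()) {
                return false; // end of file reached without finding "endobj"
            }
            if (token.startsWith("endobj")) {
                return true;
            }
            // Rewind to one character past the start of the rejected token,
            // i.e. skip the (presumably spurious) character and read again.
            pos = pos - token.length() + 1;
        }
    }
}
```

For an object body that ends in "]\nendobj", the first string read is "]",
which is skipped, and the second read yields "endobj". Each pass moves the
read position forward by at least one character, so the loop always stops.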

I have a document where this worked, but unfortunately I am not allowed to
upload it. In any case, the patch should not make things worse, since it only
replaces throwing an exception with an attempt to recover.

  was:
The NonSequentialParser fails if an object has a spurious closing tag (for 
example, a PDFArray with two closing brackets). In lenient mode, it would be 
good to at least attempt recovering from that. The attached patch, instead of 
throwing an exception in case the endObject string is not "endobj" or " obj", 
skips a character (the spurious character) and tries reading a string. It 
continues until either the file ends or an "endobj" is found.

I have a document where this worked but I am not allowed to upload it, 
unfortunately. In any case the patch cannot make things worse, since it 
replaces throwing an exception with at least attempting to recover from it.


> NonSequentialPDFParser cannot recover from spurious closing brackets
> --------------------------------------------------------------------
>
>                 Key: PDFBOX-2402
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2402
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: Michele Balistreri
>         Attachments: NonSequentialPDFParser.patch
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
