[ 
https://issues.apache.org/jira/browse/PDFBOX-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655671#comment-17655671
 ] 

Andreas Lehmkühler edited comment on PDFBOX-5178 at 1/7/23 10:51 AM:
---------------------------------------------------------------------

I've added support for the index of the objects within a compressed object 
stream. It is only used for streams in which the object numbers aren't 
unique, as in the given file. However, this doesn't fix the issue; PDFBox 
only accidentally refers to the correct object. The issue was introduced when I 
"optimized" the parser to stop reading malformed dictionaries to avoid endless 
loops. I'm still searching for the correct ticket. BTW, 2.0.24 is affected as 
well.

UPDATE: It looks like PDFBOX-5163 is the root cause. I'm investigating.

UPDATE2: It seems to be more complicated. The definition of the objects within 
the object stream looks broken starting with object number 6:

{code}

2 0 3 37 6 214 14 241 14 299 22 604 24 939 26 1912 27 2947 29 3084 30 3138

{code}

The offset for object 6 should be 289 (214 + 75) but is 288. The first entry 
for object 14 actually belongs to object 13, and after that it gets confusing.
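
To make the duplicate entry easier to spot, here is a minimal sketch that splits the header shown above into (object number, offset) pairs and reports object numbers that occur more than once. The class ObjectStreamHeaderCheck and its methods are hypothetical names made up for this illustration. They are not part of PDFBox and do not reflect how the PDFBox parser handles object streams.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of PDFBox.
class ObjectStreamHeaderCheck {

    // Splits a whitespace-separated header into (object number, offset) pairs.
    static List<int[]> parsePairs(String header) {
        String[] tokens = header.trim().split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("header must contain an even number of integers");
        }
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < tokens.length; i += 2) {
            pairs.add(new int[] { Integer.parseInt(tokens[i]), Integer.parseInt(tokens[i + 1]) });
        }
        return pairs;
    }

    // Returns the object numbers that appear more than once, in order of first repetition.
    static Set<Integer> duplicateObjectNumbers(List<int[]> pairs) {
        Set<Integer> seen = new LinkedHashSet<>();
        Set<Integer> duplicates = new LinkedHashSet<>();
        for (int[] pair : pairs) {
            if (!seen.add(pair[0])) {
                duplicates.add(pair[0]);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Header copied from the object stream quoted in this comment.
        String header = "2 0 3 37 6 214 14 241 14 299 22 604 24 939 26 1912 27 2947 29 3084 30 3138";
        List<int[]> pairs = parsePairs(header);
        System.out.println("entries: " + pairs.size());
        System.out.println("duplicate object numbers: " + duplicateObjectNumbers(pairs));
    }
}
```

A check like this only flags the repeated object number 14. It cannot tell which of the two entries is mislabeled, or whether an individual offset is off by one.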

 


was (Author: lehmi):
I've added support for the index of the objects within a compressed object 
stream. But it is limited to those streams where the object numbers aren't 
unique as in the given file. But this doesn't fix the issue. PDFBox 
accidentally refers to the correct object. The issue was introduced when I 
"optimized" the parser to stop reading malformed dictionaries to avoid endless 
loops. I'm still searching for the correct ticket. BTW, 2.0.24 is affected as 
well.

UPDATE: looks like PDFBOX-5163 is the root cause. I'm investigating

> Parsing differences between 2.0.23 and 2.0.24/3.0
> -------------------------------------------------
>
>                 Key: PDFBOX-5178
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5178
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.23, 3.0.0 PDFBox
>            Reporter: Tilman Hausherr
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>         Attachments: poppler-704-0.pdf
>
>
> There are some weird differences in parsing the attached file: 2.0.23 shows 
> "BigTIFF.tif" in the /Contents of the first annotation and a loop at 
> Root/Pages/Kids/[0]/Annots/[0]/FS (always 14 0 R), while 3.0 doesn't have 
> the loop, but doesn't have "BigTIFF.tif" either. I'm not sure which one (if 
> any) is wrong.
>  
> UPDATE
> 2.0.24 shows the same behaviour as 3.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
