[ 
https://issues.apache.org/jira/browse/PDFBOX-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733612#comment-17733612
 ] 

Michael Klink edited comment on PDFBOX-5623 at 6/18/23 10:05 AM:
-----------------------------------------------------------------

{quote}The issue is in PDFXrefStreamParser's ObjectNumbers constructor, as it 
assumes that the COSInteger objects in the COSArray are necessarily sorted. In 
the case of the attached pdf, they are not, and this causes the parser to abort 
browsing the array too soon.{quote}
Strictly speaking, according to the specification that assumption is correct:

||Key||Type||Value||
|*Index*|array|(Optional) An array containing a pair of integers for each 
subsection in this section. The first integer shall be the first object number 
in the subsection; the second integer shall be the number of entries in the 
subsection.
*The array shall be sorted in ascending order by object number.*
Subsections cannot overlap; an object number shall have no more than one entry 
in a section.
Default value: [0 Size].|
_(ISO 32000-2:2020 Table 17 — Additional entries specific to a cross-reference 
stream dictionary)_
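
To make the rule concrete, here is a minimal sketch of what the specification demands of an Index array, and of how a lenient reader could still walk an unsorted one. The class {{XrefIndexCheck}} and its methods are hypothetical names made up for illustration; this is not PDFBox code and not the attached patch, and the sample arrays in the usage note are invented rather than taken from the attached PDF.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of PDFBox. */
final class XrefIndexCheck {

    private XrefIndexCheck() {
    }

    /**
     * Checks a flat Index array (first, count, first, count, ...) against
     * the rule quoted above: ascending by object number, no overlaps.
     */
    static boolean isWellFormed(long[] index) {
        if (index.length % 2 != 0) {
            return false; // entries must come in (first, count) pairs
        }
        long nextFree = 0; // lowest object number the next subsection may start at
        for (int i = 0; i < index.length; i += 2) {
            long first = index[i];
            long count = index[i + 1];
            if (first < 0 || count < 0) {
                return false;
            }
            if (first < nextFree) {
                return false; // out of order, or overlapping the previous subsection
            }
            nextFree = first + count;
        }
        return true;
    }

    /**
     * Expands the pairs into object numbers in array order, without
     * assuming that the subsections are sorted.
     */
    static List<Long> objectNumbers(long[] index) {
        if (index.length % 2 != 0) {
            throw new IllegalArgumentException("Index must hold (first, count) pairs");
        }
        List<Long> numbers = new ArrayList<>();
        for (int i = 0; i < index.length; i += 2) {
            for (long n = 0; n < index[i + 1]; n++) {
                numbers.add(index[i] + n);
            }
        }
        return numbers;
    }
}
```

With these definitions, {{isWellFormed(new long[] {0, 3, 10, 2})}} holds, while the same subsections in the order {{10, 2, 0, 3}} fail the check even though {{objectNumbers}} can still enumerate all five object numbers. A reader that tolerates the second case is being lenient beyond what the specification requires.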

Thus, the issue actually is that the PDF is broken.

So, even if the PDFBox developers decide to enable PDFBox to process your PDF, 
your customers are likely to run into problems again and again if they do not 
fix those PDFs.


> Signature Image not Rendered starting with PDFBox 2.0.23 + patch provided
> -------------------------------------------------------------------------
>
>                 Key: PDFBOX-5623
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5623
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.23, 2.0.24, 2.0.25, 2.0.26, 2.0.27, 2.0.28
>         Environment: Java 8, Windows 10 and Ubuntu 22
>            Reporter: Lionel Fradin
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>         Attachments: 
> Fixing_the_problem_when_the_COSArray_is_not_sorted_in_increasing_order_.patch,
>  PDFBOX-issue-rendering-signature.pdf, pdfbox22-page9-br.jpg, 
> pdfbox23-page9-br.jpg
>
>
> We have an online service where our customers post their PDF files so that we 
> can render them. 
> One of our customers recently noticed that one of their signed documents did 
> not show the image associated with the signature. They gave me permission to 
> share this document, which you will find attached 
> ([^PDFBOX-issue-rendering-signature.pdf]).
> The problem is on the last page, page 9. The issue can easily be reproduced 
> using pdfbox-app-2.0*.jar PDFToImage.
> Result with pdfbox 2.0.22 is:
> !pdfbox22-page9-br.jpg!
> Result with pdfbox 2.0.23 or later is:
> !pdfbox23-page9-br.jpg!
> The regression was introduced with commit (seen in git) 
> [f34a33824c4363b9b683245cb582328dc92b79ca|https://github.com/apache/pdfbox/commit/f34a33824c4363b9b683245cb582328dc92b79ca],
>  dated 2021-03-02 07:12:11+0000. The associated ticket was PDFBOX-5112.
> The issue is in PDFXrefStreamParser's ObjectNumbers constructor, as it 
> assumes that the COSInteger objects in the COSArray are necessarily sorted. 
> In the case of the attached pdf, they are not, and this causes the parser to 
> abort browsing the array too soon.
> I have a patch for that on branch 2.0: 
> [^Fixing_the_problem_when_the_COSArray_is_not_sorted_in_increasing_order_.patch]
> With this patch the image is created successfully. However, warnings appear 
> that did not exist in version 2.0.22:
> {noformat}
> Jun 16, 2023 5:18:29 PM org.apache.pdfbox.pdfparser.COSParser findObjectKey
> WARNING: found wrong object number. expected [6789] found [6791]
> Jun 16, 2023 5:18:29 PM org.apache.pdfbox.pdfparser.COSParser findObjectKey
> WARNING: found wrong object number. expected [6790] found [5327]
> Jun 16, 2023 5:18:29 PM org.apache.pdfbox.pdfparser.COSParser findObjectKey
> WARNING: found wrong object number. expected [6791] found [6485]
> Jun 16, 2023 5:18:29 PM org.apache.pdfbox.pdfparser.COSParser findObjectKey
> WARNING: found wrong object number. expected [6485] found [6789]
> {noformat}
> There may be additional fixes to be made in order to fully support this PDF. 
> I did not have time to investigate, and my knowledge of the codebase is 
> fairly limited, so help would be appreciated here.
> Thanks.


