[
https://issues.apache.org/jira/browse/PDFBOX-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165864#comment-17165864
]
Christian Appl commented on PDFBOX-4723:
----------------------------------------
I am aware that COSName and COSOperator already handle things a little
differently, as both were intended to represent singleton-like constructs, as
in: there is only one COSName.CONTENTS. I never gave that much thought, because
this is a syntax element that always means exactly the same thing, so all
instances should also refer to that same thing.
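To illustrate what I mean by singleton-like, here is a minimal sketch. The class Name and its factory method of() are hypothetical stand-ins written for this example; they are not PDFBox's COSName and make no claim about how it is implemented. The point is only that for such a type, identity and equality coincide.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class InternedNameDemo {

    // Hypothetical interned name type (assumption, not the PDFBox class).
    static final class Name {
        private static final Map<String, Name> CACHE = new ConcurrentHashMap<>();
        private final String value;

        private Name(String value) {
            this.value = value;
        }

        // Returns the one shared instance for a given name string.
        static Name of(String value) {
            return CACHE.computeIfAbsent(value, Name::new);
        }

        String value() {
            return value;
        }
    }

    public static void main(String[] args) {
        Name first = Name.of("Contents");
        Name second = Name.of("Contents");
        // Same name string, therefore the very same instance.
        System.out.println(first == second);
        // A different name string yields a different instance.
        System.out.println(first == Name.of("Type"));
    }
}
```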
In the case of COSDictionary, COSArray and COSStream, however, I claim that
even though contents may appear to be identical, this does not necessarily imply
the identity of instances of those classes. The contents of page 1 describe a
different thing entirely than the contents of page 2: no matter how similar
they may seem, each page requires its own content streams. Why is
that?
This becomes obvious when trying to customize the COSWriter in a rather
crude way (maybe something I admittedly shouldn't do), but still:
when traversing the COS structure while skipping entries that have already
been traversed, you will create pages without content streams for such
documents. This could be prevented by using an indirect reference to the same
COSObject for all pages that seem to refer to the same content stream
(following the content equality logic), but would this solution still work
if I later attempted to alter a page created this way? Would it change the
contents of all pages, since all of them refer to the same object? I think so.
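Both effects can be reproduced in a small sketch that does not depend on PDFBox. ContentStream, write() and the two "visited" sets are hypothetical names invented for this example, and the writer is reduced to a loop that skips anything it considers already traversed. This is an assumption about the general shape of such a traversal, not the actual COSWriter logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class TraversalDemo {

    // Hypothetical content stream with content-based equals()/hashCode().
    static final class ContentStream {
        String data;

        ContentStream(String data) {
            this.data = data;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ContentStream
                    && Objects.equals(data, ((ContentStream) o).data);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(data);
        }
    }

    // For each page's stream, records whether the "writer" emitted it.
    // Set.add() returns false when the set regards the entry as already present.
    static List<Boolean> write(List<ContentStream> pages, Set<ContentStream> visited) {
        List<Boolean> emitted = new ArrayList<>();
        for (ContentStream stream : pages) {
            emitted.add(visited.add(stream));
        }
        return emitted;
    }

    // "Already traversed" decided by equals()/hashCode().
    static Set<ContentStream> equalityBasedVisited() {
        return new HashSet<>();
    }

    // "Already traversed" decided by reference identity.
    static Set<ContentStream> identityBasedVisited() {
        return Collections.newSetFromMap(new IdentityHashMap<>());
    }

    // Two pages sharing one object: altering "page 1" alters "page 2" as well.
    static String aliasingEffect() {
        ContentStream shared = new ContentStream("BT (Hello) Tj ET");
        ContentStream page1 = shared;
        ContentStream page2 = shared;
        page1.data = "BT (Changed) Tj ET";
        return page2.data;
    }

    public static void main(String[] args) {
        List<ContentStream> pages = Arrays.asList(
                new ContentStream("BT (Hello) Tj ET"),
                new ContentStream("BT (Hello) Tj ET"));
        System.out.println(write(pages, equalityBasedVisited())); // [true, false]
        System.out.println(write(pages, identityBasedVisited())); // [true, true]
        System.out.println(aliasingEffect());                     // BT (Changed) Tj ET
    }
}
```

With the equality-based set, the second page's stream is skipped although it is a separate object; with the identity-based set, both are emitted. The last call shows the follow-up problem when one shared object is used for both pages instead.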
When searching for solutions to this problem, I reached the conclusion: no,
those instances must be separate entities, as anything else would result in
complicated issues.
It may not be wise to touch the COSWriter at all, or even to dive too deep into
the inner workings of PDF / PDFBox (COS structure and the like). I'm aware of
that, and my changes will collide with yours at some point. However, this is why
I'm invested in this issue and why I think that the changes discussed here
will cause issues later.
> Add equals() and hashCode() to PDAnnotation and COS objects
> -----------------------------------------------------------
>
> Key: PDFBOX-4723
> URL: https://issues.apache.org/jira/browse/PDFBOX-4723
> Project: PDFBox
> Issue Type: Sub-task
> Components: PDModel
> Affects Versions: 2.0.18
> Reporter: Maruan Sahyoun
> Assignee: Maruan Sahyoun
> Priority: Major
> Fix For: 3.0.0 PDFBox
>
> Attachments: bird_burst.heic.pdf, screenshot-1.png
>
>
> In order to properly support removeAll/retainAll for COSArrayList we need to
> detect if entries are in fact duplicates of others. This currently fails:
> even though one might add the same instance of an annotation object multiple
> times via setAnnotations, getting the annotations returns individual
> instances. See the discussion at PDFBOX-4669.
> In order to properly support removal we need to be able to detect equality,
> where an object is equal if the underlying COSDictionary has the same entries.
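
For reference, the removeAll problem from the quoted description can be sketched as follows. PlainWrapper and EqualWrapper are hypothetical classes made up for this example, with a plain Map standing in for the underlying dictionary; neither is the PDFBox PDAnnotation, and the equals() shown is only one possible reading of "has the same entries".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RemoveAllDemo {

    // Hypothetical wrapper without equals(): every instance is unique.
    static final class PlainWrapper {
        final Map<String, String> dictionary;

        PlainWrapper(Map<String, String> dictionary) {
            this.dictionary = dictionary;
        }
    }

    // Hypothetical wrapper that is equal when the wrapped entries are equal.
    static final class EqualWrapper {
        final Map<String, String> dictionary;

        EqualWrapper(Map<String, String> dictionary) {
            this.dictionary = dictionary;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof EqualWrapper
                    && dictionary.equals(((EqualWrapper) o).dictionary);
        }

        @Override
        public int hashCode() {
            return dictionary.hashCode();
        }
    }

    static Map<String, String> linkDictionary() {
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("Subtype", "Link");
        return dictionary;
    }

    // Wraps the same dictionary twice, as repeated getter calls might do,
    // then tries to remove one wrapper by passing the other to removeAll.
    static int remainingWithoutEquals() {
        Map<String, String> dictionary = linkDictionary();
        List<PlainWrapper> annotations = new ArrayList<>();
        annotations.add(new PlainWrapper(dictionary));
        List<PlainWrapper> toRemove = new ArrayList<>();
        toRemove.add(new PlainWrapper(dictionary));
        annotations.removeAll(toRemove);
        return annotations.size();
    }

    static int remainingWithEquals() {
        Map<String, String> dictionary = linkDictionary();
        List<EqualWrapper> annotations = new ArrayList<>();
        annotations.add(new EqualWrapper(dictionary));
        List<EqualWrapper> toRemove = new ArrayList<>();
        toRemove.add(new EqualWrapper(dictionary));
        annotations.removeAll(toRemove);
        return annotations.size();
    }

    public static void main(String[] args) {
        System.out.println(remainingWithoutEquals()); // 1: nothing was removed
        System.out.println(remainingWithEquals());    // 0: the entry was removed
    }
}
```

Note that an equals() based on entries would also treat two separately built dictionaries with the same entries as equal, which is exactly the case my comment above is concerned with.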