[
https://issues.apache.org/jira/browse/PDFBOX-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189176#comment-17189176
]
Christian Appl commented on PDFBOX-4723:
----------------------------------------
I quote:
"there is a difference between identity and equality. e.g. two number objects
with a content of 12 might be equal but these are still different objects."
This statement is absolutely correct - while it's opposite is also true:
When an object changed and it's content is now 13 instead of 12, it remains the
same identical object.
The native implementation of the Object#hashCode() Method states the following
contract:
{color:#8c8c8c}The general contract of \{@code hashCode} is:
{color}{color:#8c8c8c}* {color}{color:#8c8c8c}<ul>{color}{color:#8c8c8c}
{color}{color:#8c8c8c}*
{color}{color:#8c8c8c}<li>{color}{color:#8c8c8c}Whenever it is invoked on the
same object more than once during
{color}{color:#8c8c8c}* an execution of a Java application, the \{@code
hashCode} method
{color}{color:#8c8c8c}* must consistently return the same integer, provided no
information
{color}{color:#8c8c8c}* used in \{@code equals} comparisons on the object is
modified.
{color}{color:#8c8c8c}* This integer need not remain consistent from one
execution of an
{color}{color:#8c8c8c}* application to another execution of the same
application.
{color}{color:#8c8c8c}* {color}{color:#8c8c8c}<li>{color}{color:#8c8c8c}If two
objects are equal according to the \{@code equals(Object)}
{color}{color:#8c8c8c}* method, then calling the \{@code hashCode} method on
each of
{color}{color:#8c8c8c}* the two objects must produce the same integer result.
{color}{color:#8c8c8c}* {color}{color:#8c8c8c}<li>{color}{color:#8c8c8c}It is
{color}{color:#8c8c8c}<em>{color}{color:#8c8c8c}not{color}{color:#8c8c8c}</em>{color}{color:#8c8c8c}
required that if two objects are unequal
{color}{color:#8c8c8c}* according to the {@link
{color}{color:#000000}java.lang.Object{color}{color:#8c8c8c}#equals({color}{color:#000000}java.lang.Object{color}{color:#8c8c8c})}
{color}{color:#8c8c8c}* method, then calling the \{@code hashCode} method on
each of the
{color}{color:#8c8c8c}* two objects must produce distinct integer results.
However, the
{color}{color:#8c8c8c}* programmer should be aware that producing distinct
integer results
{color}{color:#8c8c8c}* for unequal objects may improve the performance of hash
tables.
{color}{color:#8c8c8c}* {color}{color:#8c8c8c}</ul>{color}
Following Revision
7c4abe695fbc73e9ce71ed37218aa608db4180ed 30.01.2020 "port equals and hashCode
from trunk to 2.0; add SmallMap for COSDictionary; fix PDPageTree"
the class COSStream overrides the methods equals and hashCode.
If an object contained in a HashTable/Map (as seen in COSWriter) is searched
using:
key = {color:#871094}objectKeys{color}.get(actual);
then, said HashTable is using the following method, to find the value for said
object:
{color:#0033b3}public synchronized {color}{color:#007e8a}V
{color}{color:#00627a}get{color}({color:#000000}Object {color}key) {
{color:#000000}Entry{color}<?,?> {color:#000000}tab{color}[] =
{color:#871094}table{color};
{color:#0033b3}int {color}{color:#000000}hash {color}= key.hashCode();
{color:#0033b3}int {color}{color:#000000}index {color}= ({color:#000000}hash
{color}& {color:#1750eb}0x7FFFFFFF{color}) %
{color:#000000}tab{color}.{color:#871094}length{color};
{color:#0033b3}for {color}({color:#000000}Entry{color}<?,?> e =
{color:#000000}tab{color}[{color:#000000}index{color}] ; e !=
{color:#0033b3}null {color}; e = e.{color:#871094}next{color}) {
{color:#0033b3}if {color}((e.{color:#871094}hash {color}==
{color:#000000}hash{color}) && e.{color:#871094}key{color}.equals(key)) {
{color:#0033b3}return
{color}({color:#007e8a}V{color})e.{color:#871094}value{color};
}
}
{color:#0033b3}return null{color};
}
COSStream is implementing the hashCode method as:
{color:#8c8c8c}/**
{color}{color:#8c8c8c} * \{@inheritDoc}
{color}{color:#8c8c8c} */
{color}{color:#9e880d}@Override
{color}{color:#0033b3}public int {color}{color:#00627a}hashCode{color}() {
{color:#000000}Object{color}[] {color:#000000}members {color}=
{{color:#871094}items{color}, {color:#871094}randomAccess{color},
{color:#871094}scratchFile{color}, {color:#871094}isWriting{color}};
{color:#0033b3}return
{color}{color:#000000}Arrays{color}.hashCode({color:#000000}members{color});
}
Where "randomAccess" is a non final field that may change at any time. A
constant and consistent hash value can not be reached using this
implementation, whenever the hashvalue of one of the fields changes (or one of
those values changes), then also the hash value of the given COSStream, will
change.
Resulting in a situation, where the identical object may be recognized as equal
according to:
{color:#0033b3}public boolean
{color}{color:#00627a}equals{color}({color:#000000}Object {color}o) {
{color:#0033b3}if {color}(o == {color:#0033b3}this{color})
{
{color:#0033b3}return true{color};
}
[...]
But it's Hashvalue will still differ.
This can lead to a paradoxical situation, where a HashTable clearly contains an
object, that is identical to the searched object (==), but the HashTable will
still claim, that such an object is neither contained, nor can it's value be
found in the HashTable, when the hash value of said object has changed in the
meantime - for some reason.
According to:
"{color:#8c8c8c}Whenever it is invoked on the same object more than once during
an execution of a Java application, the \{@code hashCode} method must
consistently return the same integer, provided no information
used in \{@code equals} comparisons on the object is modified.{color}"
Conclusion:
Either the implementation of "equals()" is erroneous, as the information
determining the hash value has changed and such objects may not be considered
as equal. (Even when poiniting to the same exact identical instance.) Which
would lead to the conclusion, that even though this == obj is true, the objects
must and can not be equal, as the object's state (content) has changed!
Or the implementation of "hashCode()" is erroneous, because information are
considered for the hashcode calculation, that are not tested for and are not
checked in the implementation of equals.
Or the implementation of "hashCode()" is erroneous and the hashcode of said
instance should be constant no matter what. (Which is implied by using "o ==
this" in equals).
As (according to contract) the hash value of an object should be consistent
when no information changed, that would be used to determine the equality of
this and another object.
However you want to define the role and purpose of equals, this behaviour is
inconsistent and violates the contract given above.
Please reevaluate and rethink overriding equals and hashCode! It is easy to
provoke situations where this implementation leads to paradox results for
java.util classes such as Collections and Maps.
The intended changes are highly problematic. (as this behaviour of COSStream is
- should it still be contained in the current revision.)
Theoretically:
A new ContentStream is created (let us call this instance at this exact point
in time A)
Now "createOutputStream" is called for this instance, resulting in:
{color:#000000}IOUtils{color}.closeQuietly({color:#871094}randomAccess{color});
{color:#871094}randomAccess {color}=
{color:#871094}scratchFile{color}.createBuffer(); (the instance at this point
in time is called B)
This results in a change to the Hashcode of that object. The hashcodes of
object A and object B are not equal.
If the "equals" method was adapted accordingly to also consider the content
information, that is checked for in "hashCode", then this would mean that A
does not equal B even though A is identical to B. (identical instance.)
Is this the intended behaviour?
Please revert this!
> Add equals() and hashCode() to PDAnnotation and COS objects
> -----------------------------------------------------------
>
> Key: PDFBOX-4723
> URL: https://issues.apache.org/jira/browse/PDFBOX-4723
> Project: PDFBox
> Issue Type: Sub-task
> Components: PDModel
> Affects Versions: 2.0.18
> Reporter: Maruan Sahyoun
> Assignee: Maruan Sahyoun
> Priority: Major
> Fix For: 3.0.0 PDFBox
>
> Attachments: bird_burst.heic.pdf, screenshot-1.png
>
>
> In order to proper support removeAll/retainAll for COSArrayList we need to
> detect if entries are in fact duplicates of others. This currently fails as
> even though one might add the same instance of an annotation object multiple
> times to setAnnotations getting the annotations will have individual
> instances. See the discussion at PDFBOX-4669.
> In order to proper support removal we need to be able to detect equality
> where an object is equal if the underlying COSDictionary has the same entries.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]