[ 
https://issues.apache.org/jira/browse/PDFBOX-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189176#comment-17189176
 ] 

Christian Appl edited comment on PDFBOX-4723 at 9/2/20, 11:41 AM:
------------------------------------------------------------------

I quote:
 "there is a difference between identity and equality. e.g. two number objects 
with a content of 12 might be equal but these are still different objects."

This statement is absolutely correct - while it's opposite is also true:
 When an object changed and it's content is now 13 instead of 12, it remains 
the same identical object.

The native implementation of the Object#hashCode() Method states the following 
contract:

{color:#8c8c8c}The general contract of \{@code hashCode} 
is:{color}{color:#8c8c8c}* {color}{color:#8c8c8c}<ul>{color}{color:#8c8c8c}* 
{color}{color:#8c8c8c}<li>{color}{color:#8c8c8c}Whenever it is invoked on the 
same object more than once during{color}{color:#8c8c8c}* an execution of a Java 
application, the \{@code hashCode} method{color}{color:#8c8c8c}* must 
consistently return the same integer, provided no 
information{color}{color:#8c8c8c}* used in \{@code equals} comparisons on the 
object is modified.{color}{color:#8c8c8c}* This integer need not remain 
consistent from one execution of an{color}{color:#8c8c8c}* application to 
another execution of the same application.{color}{color:#8c8c8c}* 
{color}{color:#8c8c8c}<li>{color}{color:#8c8c8c}If two objects are equal 
according to the \{@code equals(Object)}{color}{color:#8c8c8c}* method, then 
calling the \{@code hashCode} method on each of{color}{color:#8c8c8c}* the two 
objects must produce the same integer result.{color}{color:#8c8c8c}* 
{color}{color:#8c8c8c}<li>{color}{color:#8c8c8c}It is 
{color}{color:#8c8c8c}<em>{color}{color:#8c8c8c}not{color}{color:#8c8c8c}</em>{color}{color:#8c8c8c}
 required that if two objects are unequal{color}{color:#8c8c8c}* according to 
the {@link 
{color}{color:#000000}java.lang.Object{color}{color:#8c8c8c}#equals({color}{color:#000000}java.lang.Object{color}{color:#8c8c8c})}{color}{color:#8c8c8c}*
 method, then calling the \{@code hashCode} method on each of 
the{color}{color:#8c8c8c}* two objects must produce distinct integer results. 
However, the{color}{color:#8c8c8c}* programmer should be aware that producing 
distinct integer results{color}{color:#8c8c8c}* for unequal objects may improve 
the performance of hash tables.{color}{color:#8c8c8c}* 
{color}{color:#8c8c8c}</ul>{color}

Following Revision 
 7c4abe695fbc73e9ce71ed37218aa608db4180ed 30.01.2020 "port equals and hashCode 
from trunk to 2.0; add SmallMap for COSDictionary; fix PDPageTree"
 the class COSStream overrides the methods equals and hashCode.

If an object contained in a HashTable/Map (as seen in COSWriter) is searched 
using:

key = {color:#871094}objectKeys{color}.get(actual);
 then, said HashTable is using the following method, to find the value for said 
object:

{color:#0033b3}public synchronized {color}{color:#007e8a}V 
{color}{color:#00627a}get{color}({color:#000000}Object {color}key) {
 {color:#000000}Entry{color}<?,?> {color:#000000}tab{color}[] = 
{color:#871094}table{color};
 {color:#0033b3}int {color}{color:#000000}hash {color}= key.hashCode();
 {color:#0033b3}int {color}{color:#000000}index {color}= ({color:#000000}hash 
{color}& {color:#1750eb}0x7FFFFFFF{color}) % 
{color:#000000}tab{color}.{color:#871094}length{color};
 {color:#0033b3}for {color}({color:#000000}Entry{color}<?,?> e = 
{color:#000000}tab{color}[{color:#000000}index{color}] ; e != 
{color:#0033b3}null {color}; e = e.{color:#871094}next{color}) {
 {color:#0033b3}if {color}((e.{color:#871094}hash {color}== 
{color:#000000}hash{color}) && e.{color:#871094}key{color}.equals(key)) {
 {color:#0033b3}return 
{color}({color:#007e8a}V{color})e.{color:#871094}value{color};
 }
 }
 {color:#0033b3}return null{color};
 }

COSStream is implementing the hashCode method as:

{color:#8c8c8c}/**{color}{color:#8c8c8c} * \{@inheritDoc}{color}{color:#8c8c8c} 
*/{color}{color:#9e880d}@Override{color}{color:#0033b3}public int 
{color}{color:#00627a}hashCode{color}() {
 {color:#000000}Object{color}[] {color:#000000}members {color}= {items, 
randomAccess, {color:#871094}scratchFile{color}, {color:#871094}isWriting};
 return 
{color}{color:#000000}Arrays{color}.hashCode({color:#000000}members{color});
 }

Where "randomAccess" is a non final field that may change at any time. A 
constant and consistent hash value can not be reached using this 
implementation, whenever the hashvalue of one of the fields changes (or one of 
those values changes), then also the hash value of the given COSStream, will 
change.
 Resulting in a situation, where the identical object may be recognized as 
equal according to:

{color:#0033b3}public boolean 
{color}{color:#00627a}equals{color}({color:#000000}Object {color}o) {
 {color:#0033b3}if {color}(o == {color:#0033b3}this{color})
 {
 {color:#0033b3}return true{color};
 }
 [...]

But it's Hashvalue will still differ.

This can lead to a paradoxical situation, where a HashTable clearly contains an 
object, that is identical to the searched object (==), but the HashTable will 
still claim, that such an object is neither contained, nor can it's value be 
found in the HashTable, when the hash value of said object has changed in the 
meantime - for some reason.

According to:
 "{color:#8c8c8c}Whenever it is invoked on the same object more than once 
during an execution of a Java application, the \{@code hashCode} method must 
consistently return the same integer, provided no information
 used in \{@code equals} comparisons on the object is modified.{color}"

Conclusion:
 Either the implementation of "equals()" is erroneous, as the information 
determining the hash value has changed and such objects may not be considered 
as equal. (Even when poiniting to the same exact identical instance.) Which 
would lead to the conclusion, that even though this == obj is true, the objects 
must and can not be equal, as the object's state (content) has changed!

Or the implementation of "hashCode()" is erroneous, because information are 
considered for the hashcode calculation, that are not tested for and are not 
checked in the implementation of equals.

Or the implementation of "hashCode()" is erroneous and the hashcode of said 
instance should be constant no matter what. (Which is implied by using "o == 
this" in equals).

As (according to contract) the hash value of an object should be consistent 
when no information changed, that would be used to determine the equality of 
this and another object.
 However you want to define the role and purpose of equals, this behaviour is 
inconsistent and violates the contract given above.

Please reevaluate and rethink overriding equals and hashCode! It is easy to 
provoke situations where this implementation leads to paradox results for 
java.util classes such as Collections and Maps.
 The intended changes are highly problematic. (as this behaviour of COSStream 
is - should it still be contained in the current revision.)

Theoretically:
 A new ContentStream is created (let us call this instance at this exact point 
in time A)
 Now "createOutputStream" is called for this instance, resulting in:

{color:#000000}IOUtils{color}.closeQuietly({color:#871094}randomAccess{color});
 {color:#871094}randomAccess {color}= 
{color:#871094}scratchFile{color}.createBuffer(); (the instance at this point 
in time is called B)

This results in a change to the Hashcode of that object.  The hashcodes of 
object A and object B are not equal.
 If the "equals" method was adapted accordingly to also consider the content 
information, that is checked for in "hashCode", then this would mean that A 
does not equal B even though A is identical to B. (identical instance.)

Is this the intended behaviour?

Please revert this!


was (Author: capsvd):
I quote:
 "there is a difference between identity and equality. e.g. two number objects 
with a content of 12 might be equal but these are still different objects."

This statement is absolutely correct - while it's opposite is also true:
 When an object changed and it's content is now 13 instead of 12, it remains 
the same identical object.

The native implementation of the Object#hashCode() Method states the following 
contract:

{color:#8c8c8c}The general contract of \{@code hashCode} 
is:{color}{color:#8c8c8c}* {color}{color:#8c8c8c}<ul>{color}{color:#8c8c8c}* 
{color}{color:#8c8c8c}<li>{color}{color:#8c8c8c}Whenever it is invoked on the 
same object more than once during{color}{color:#8c8c8c}* an execution of a Java 
application, the \{@code hashCode} method{color}{color:#8c8c8c}* must 
consistently return the same integer, provided no 
information{color}{color:#8c8c8c}* used in \{@code equals} comparisons on the 
object is modified.{color}{color:#8c8c8c}* This integer need not remain 
consistent from one execution of an{color}{color:#8c8c8c}* application to 
another execution of the same application.{color}{color:#8c8c8c}* 
{color}{color:#8c8c8c}<li>{color}{color:#8c8c8c}If two objects are equal 
according to the \{@code equals(Object)}{color}{color:#8c8c8c}* method, then 
calling the \{@code hashCode} method on each of{color}{color:#8c8c8c}* the two 
objects must produce the same integer result.{color}{color:#8c8c8c}* 
{color}{color:#8c8c8c}<li>{color}{color:#8c8c8c}It is 
{color}{color:#8c8c8c}<em>{color}{color:#8c8c8c}not{color}{color:#8c8c8c}</em>{color}{color:#8c8c8c}
 required that if two objects are unequal{color}{color:#8c8c8c}* according to 
the {@link 
{color}{color:#000000}java.lang.Object{color}{color:#8c8c8c}#equals({color}{color:#000000}java.lang.Object{color}{color:#8c8c8c})}{color}{color:#8c8c8c}*
 method, then calling the \{@code hashCode} method on each of 
the{color}{color:#8c8c8c}* two objects must produce distinct integer results. 
However, the{color}{color:#8c8c8c}* programmer should be aware that producing 
distinct integer results{color}{color:#8c8c8c}* for unequal objects may improve 
the performance of hash tables.{color}{color:#8c8c8c}* 
{color}{color:#8c8c8c}</ul>{color}

Following Revision 
 7c4abe695fbc73e9ce71ed37218aa608db4180ed 30.01.2020 "port equals and hashCode 
from trunk to 2.0; add SmallMap for COSDictionary; fix PDPageTree"
 the class COSStream overrides the methods equals and hashCode.

If an object contained in a HashTable/Map (as seen in COSWriter) is searched 
using:

key = {color:#871094}objectKeys{color}.get(actual);
 then, said HashTable is using the following method, to find the value for said 
object:

{color:#0033b3}public synchronized {color}{color:#007e8a}V 
{color}{color:#00627a}get{color}({color:#000000}Object {color}key) {
 {color:#000000}Entry{color}<?,?> {color:#000000}tab{color}[] = 
{color:#871094}table{color};
 {color:#0033b3}int {color}{color:#000000}hash {color}= key.hashCode();
 {color:#0033b3}int {color}{color:#000000}index {color}= ({color:#000000}hash 
{color}& {color:#1750eb}0x7FFFFFFF{color}) % 
{color:#000000}tab{color}.{color:#871094}length{color};
 {color:#0033b3}for {color}({color:#000000}Entry{color}<?,?> e = 
{color:#000000}tab{color}[{color:#000000}index{color}] ; e != 
{color:#0033b3}null {color}; e = e.{color:#871094}next{color}) {
 {color:#0033b3}if {color}((e.{color:#871094}hash {color}== 
{color:#000000}hash{color}) && e.{color:#871094}key{color}.equals(key)) {
 {color:#0033b3}return 
{color}({color:#007e8a}V{color})e.{color:#871094}value{color};
 }
 }
 {color:#0033b3}return null{color};
 }

COSStream is implementing the hashCode method as:

{color:#8c8c8c}/**{color}{color:#8c8c8c} * \{@inheritDoc}{color}{color:#8c8c8c} 
*/{color}{color:#9e880d}@Override{color}{color:#0033b3}public int 
{color}{color:#00627a}hashCode{color}() {
 {color:#000000}Object{color}[] {color:#000000}members {color}= {items, 
randomAccess, {color:#871094}scratchFile{color}, 
{color:#871094}isWriting\{color}};
 return 
{color}{color:#000000}Arrays{color}.hashCode({color:#000000}members{color});
 }

Where "randomAccess" is a non final field that may change at any time. A 
constant and consistent hash value can not be reached using this 
implementation, whenever the hashvalue of one of the fields changes (or one of 
those values changes), then also the hash value of the given COSStream, will 
change.
 Resulting in a situation, where the identical object may be recognized as 
equal according to:

{color:#0033b3}public boolean 
{color}{color:#00627a}equals{color}({color:#000000}Object {color}o) {
 {color:#0033b3}if {color}(o == {color:#0033b3}this{color})
 {
 {color:#0033b3}return true{color};
 }
 [...]

But it's Hashvalue will still differ.

This can lead to a paradoxical situation, where a HashTable clearly contains an 
object, that is identical to the searched object (==), but the HashTable will 
still claim, that such an object is neither contained, nor can it's value be 
found in the HashTable, when the hash value of said object has changed in the 
meantime - for some reason.

According to:
 "{color:#8c8c8c}Whenever it is invoked on the same object more than once 
during an execution of a Java application, the \{@code hashCode} method must 
consistently return the same integer, provided no information
 used in \{@code equals} comparisons on the object is modified.{color}"

Conclusion:
 Either the implementation of "equals()" is erroneous, as the information 
determining the hash value has changed and such objects may not be considered 
as equal. (Even when poiniting to the same exact identical instance.) Which 
would lead to the conclusion, that even though this == obj is true, the objects 
must and can not be equal, as the object's state (content) has changed!

Or the implementation of "hashCode()" is erroneous, because information are 
considered for the hashcode calculation, that are not tested for and are not 
checked in the implementation of equals.

Or the implementation of "hashCode()" is erroneous and the hashcode of said 
instance should be constant no matter what. (Which is implied by using "o == 
this" in equals).

As (according to contract) the hash value of an object should be consistent 
when no information changed, that would be used to determine the equality of 
this and another object.
 However you want to define the role and purpose of equals, this behaviour is 
inconsistent and violates the contract given above.

Please reevaluate and rethink overriding equals and hashCode! It is easy to 
provoke situations where this implementation leads to paradox results for 
java.util classes such as Collections and Maps.
 The intended changes are highly problematic. (as this behaviour of COSStream 
is - should it still be contained in the current revision.)

Theoretically:
 A new ContentStream is created (let us call this instance at this exact point 
in time A)
 Now "createOutputStream" is called for this instance, resulting in:

{color:#000000}IOUtils{color}.closeQuietly({color:#871094}randomAccess{color});
 {color:#871094}randomAccess {color}= 
{color:#871094}scratchFile{color}.createBuffer(); (the instance at this point 
in time is called B)

This results in a change to the Hashcode of that object.  The hashcodes of 
object A and object B are not equal.
 If the "equals" method was adapted accordingly to also consider the content 
information, that is checked for in "hashCode", then this would mean that A 
does not equal B even though A is identical to B. (identical instance.)

Is this the intended behaviour?

Please revert this!

> Add equals() and hashCode() to PDAnnotation and COS objects
> -----------------------------------------------------------
>
>                 Key: PDFBOX-4723
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4723
>             Project: PDFBox
>          Issue Type: Sub-task
>          Components: PDModel
>    Affects Versions: 2.0.18
>            Reporter: Maruan Sahyoun
>            Assignee: Maruan Sahyoun
>            Priority: Major
>             Fix For: 3.0.0 PDFBox
>
>         Attachments: bird_burst.heic.pdf, screenshot-1.png
>
>
> In order to proper support removeAll/retainAll for COSArrayList we need to 
> detect if entries are in fact duplicates of others. This currently fails as 
> even though one might add the same instance of an annotation object multiple 
> times to setAnnotations getting the annotations will have individual 
> instances. See the discussion at PDFBOX-4669.
> In order to proper support removal we need to be able to detect equality 
> where an object is equal if the underlying COSDictionary has the same entries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org

Reply via email to