[ 
https://issues.apache.org/jira/browse/MATH-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18082552#comment-18082552
 ] 

Alex Herbert commented on MATH-1689:
------------------------------------

The hashCode implementation has an unreachable code path. The freqTable is 
final and is never null. The hashCode method should simply return the hashCode 
of the underlying TreeMap.

Note: This class also has a very inefficient toString implementation with 2 
calls to getSumFreq for each entry and each iteration performing a cumulative 
sum up to the entry value. I think the class is definitely legacy code that 
requires some updating. It is only used in StatUtils to get the mode of a range 
of double values.

Regarding behaviour, if two frequency tables contain the same frequency counts 
they are the same. This is irrespective of the order used to create a 
cumulative frequency. That is a behaviour of analysis on the exact same data.

Note that java.util.TreeMap does not implement equals. The implementation is in 
java.util.AbstractMap where it compares the two maps contain the same mappings. 
So the JDK's own interpretation of this is that two maps are the same 
irrespective of their sorted order. E.g. a HashMap and a TreeMap are the same 
with the same contents, even though the TreeMap is sorted.

I think it is fine to document this as ignoring the comparator for the equals 
and hashCode result.

> Frequency.equals()/hashCode() ignores the comparator
> ----------------------------------------------------
>
>                 Key: MATH-1689
>                 URL: https://issues.apache.org/jira/browse/MATH-1689
>             Project: Commons Math
>          Issue Type: Bug
>          Components: legacy
>            Reporter: Ruiqi Dong
>            Priority: Minor
>
> *Summary*
> Frequency behavior depends on the comparator used by its internal map, but 
> equals() and hashCode() ignore that comparator. Two Frequency objects with 
> different ordering semantics can therefore compare equal.
> *Affected code*
> File: 
> commons-math-legacy/src/main/java/org/apache/commons/math4/legacy/stat/Frequency.java
> {code:java}
> public long getCumFreq(T v) {
>     ...
>     Comparator<? super T> c = freqTable.comparator();
>     if (c == null) {
>         c = new NaturalComparator<>();
>     }
>     ...
> }
> public List<T> getMode() {
>     ...
> }
> @Override
> public int hashCode() {
>     final int prime = 31;
>     int result = 1;
>     result = prime * result + ((freqTable == null) ? 0 : 
> freqTable.hashCode());
>     return result;
> }
> @Override
> public boolean equals(Object obj) {
>     ...
>     Frequency<?> other = (Frequency<?>) obj;
>     return freqTable.equals(other.freqTable);
> } {code}
>  
> *Reproducer*
> Add the following test to 
> commons-math-legacy/src/test/java/org/apache/commons/math4/legacy/stat/FrequencyTest.java:
> {code:java}
> @Test
> public void testEqualsShouldConsiderComparator() {
>     Frequency<Integer> ascending = new Frequency<>();
>     Frequency<Integer> descending = new 
> Frequency<Integer>(Comparator.reverseOrder());
>     ascending.addValue(1);
>     ascending.addValue(2);
>     descending.addValue(1);
>     descending.addValue(2);
>     Assert.assertEquals(1, ascending.getCumFreq(1));
>     Assert.assertEquals(2, descending.getCumFreq(1));
>     Assert.assertEquals("[1, 2]", ascending.getMode().toString());
>     Assert.assertEquals("[2, 1]", descending.getMode().toString());
>     Assert.assertNotEquals(ascending, descending);
> } {code}
> Run:
> {code:java}
> mvn -q -pl commons-math-legacy 
> -Dtest=org.apache.commons.math4.legacy.stat.FrequencyTest test {code}
> Observed behavior:
> {code:java}
> FrequencyTest.testEqualsShouldConsiderComparator:223
> Values should be different. Actual:
> Value          Freq.   Pct.    Cum Pct.
> 2     1       50%     50%
> 1     1       50%     100% {code}
> Expected behavior:
> Frequencies with different comparators should not compare equal when their 
> observable ordering behavior differs.
>  
> The comparator affects at least getCumFreq() and getMode(). Since those 
> outputs differ, equality and hashing should not ignore the comparator.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to