[ https://issues.apache.org/jira/browse/MAHOUT-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841181#action_12841181 ]

Sean Owen commented on MAHOUT-320:
----------------------------------

After the fact here, but I'm going to fix a number of small issues in 
IntPairWritable. Three of them are important enough to call out, because I 
keep seeing these kinds of subtle bugs:

- hashCode() and equals() must not only both be implemented but also be 
consistent: objects that are equal must have equal hash codes. Frequency uses 
'pair' in hashCode() but not in equals().
   - and you can hash doubles much faster with RandomUtils.hashDouble()

- compareTo() must be consistent with equals(): it should return 0 exactly when 
equals() returns true. Frequency.compareTo() can never return 0, which is never 
correct, if only because it breaks symmetry (x.compareTo(x) must return 0).

- A compareTo() that orders by integer values should never be implemented with 
a subtraction unless you're absolutely certain overflow can't occur. Ordering 
a=Integer.MAX_VALUE and b=Integer.MIN_VALUE by the sign of a-b puts a before b, 
because a-b overflows to a negative value.
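A minimal sketch of a class that avoids all three problems. It is a hypothetical stand-in for Mahout's Frequency, not the committed fix: the field names first, second and frequency are assumptions, and it uses the JDK's Double.hashCode() instead of RandomUtils.hashDouble() so it compiles without Mahout on the classpath.

```java
/**
 * Hypothetical stand-in for Mahout's Frequency: an (int, int) pair plus a
 * double frequency. Field names are assumptions for illustration only.
 */
final class Frequency implements Comparable<Frequency> {

  private final int first;
  private final int second;
  private final double frequency;

  Frequency(int first, int second, double frequency) {
    this.first = first;
    this.second = second;
    this.frequency = frequency;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof Frequency)) {
      return false;
    }
    Frequency other = (Frequency) o;
    // Same fields as hashCode() and compareTo(). Double.compare() treats NaN
    // and -0.0 exactly as compareTo() does, so all three methods agree.
    return first == other.first
        && second == other.second
        && Double.compare(frequency, other.frequency) == 0;
  }

  @Override
  public int hashCode() {
    // Hashes exactly the fields that equals() compares.
    int result = Integer.hashCode(first);
    result = 31 * result + Integer.hashCode(second);
    result = 31 * result + Double.hashCode(frequency);
    return result;
  }

  @Override
  public int compareTo(Frequency other) {
    // Order by frequency, then by the pair; returns 0 exactly when equals() is true.
    int c = Double.compare(frequency, other.frequency);
    if (c != 0) {
      return c;
    }
    // Integer.compare() cannot overflow, unlike first - other.first.
    c = Integer.compare(first, other.first);
    if (c != 0) {
      return c;
    }
    return Integer.compare(second, other.second);
  }

  public static void main(String[] args) {
    Frequency a = new Frequency(1, 2, 0.5);
    Frequency b = new Frequency(1, 2, 0.5);
    System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true
    System.out.println(a.compareTo(b)); // 0
    // The subtraction wraps around: MAX_VALUE - MIN_VALUE == -1.
    System.out.println(Integer.MAX_VALUE - Integer.MIN_VALUE); // -1
    System.out.println(Integer.compare(Integer.MAX_VALUE, Integer.MIN_VALUE)); // 1
  }
}
```

Using Double.compare() inside equals(), rather than ==, is what keeps equals() and compareTo() consistent for the edge cases: == says 0.0 equals -0.0 and NaN equals nothing, while Double.compare() says the opposite.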

> Modify IntPairWritable in LDA implementation to be binary comparable to 
> improve performance.
> --------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-320
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-320
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.3
>            Reporter: Drew Farris
>            Assignee: Robin Anil
>            Priority: Minor
>         Attachments: MAHOUT-320.patch, MAHOUT-320.patch, MAHOUT-320.patch, 
> MAHOUT-320.patch, MAHOUT-320.patch
>
>
> Per discussion with Robin, modifying o.a.m.clustering.lda.IntPairWritable to 
> be binary comparable will improve the performance of the comparison 
> operations during a sort because no marshaling will need to occur to compare 
> IntPairWritable instances.
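A rough sketch of what "binary comparable" means here: comparing two serialized int pairs directly from their bytes, without deserializing them into objects. It assumes each pair is written as two big-endian ints (the layout DataOutput.writeInt() produces); that wire format and the class name IntPairBytes are assumptions, not Mahout's actual code. In Hadoop, this logic would live in a WritableComparator subclass's compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) method, where the static WritableComparator.readInt() does the decoding shown below.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: compares serialized (int, int) pairs in their byte form. */
final class IntPairBytes {

  private IntPairBytes() {}

  /** Writes (first, second) as 8 big-endian bytes, like DataOutput.writeInt() twice. */
  static byte[] serialize(int first, int second) {
    // ByteBuffer is big-endian by default.
    return ByteBuffer.allocate(8).putInt(first).putInt(second).array();
  }

  /** Decodes a big-endian int starting at offset. */
  static int readInt(byte[] b, int offset) {
    return ((b[offset] & 0xFF) << 24)
        | ((b[offset + 1] & 0xFF) << 16)
        | ((b[offset + 2] & 0xFF) << 8)
        | (b[offset + 3] & 0xFF);
  }

  /**
   * Compares the pairs that start at s1 in b1 and at s2 in b2. Decoding each int
   * and using Integer.compare() keeps signed order; an unsigned byte-by-byte
   * comparison would sort negative values after positive ones.
   */
  static int compare(byte[] b1, int s1, byte[] b2, int s2) {
    int c = Integer.compare(readInt(b1, s1), readInt(b2, s2));
    if (c != 0) {
      return c;
    }
    return Integer.compare(readInt(b1, s1 + 4), readInt(b2, s2 + 4));
  }
}
```

The same overflow caveat applies here: readInt(b1, s1) - readInt(b2, s2) would misorder extreme values, so the sketch uses Integer.compare() throughout. A comparator like this is typically registered for its key class with WritableComparator.define().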

-- 
This message is automatically generated by JIRA.