[ https://issues.apache.org/jira/browse/MAHOUT-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841181#action_12841181 ]
Sean Owen commented on MAHOUT-320:
----------------------------------

After the fact here, but I'm going to fix a number of small issues in IntPairWritable, and three that I think are important enough to call out. I keep seeing these kinds of subtle bugs:

- hashCode() and equals() must not only both be implemented, but must also be consistent with each other. Frequency uses 'pair' in hashCode() but not in equals(). Also, you can hash doubles much faster with RandomUtils.hashDouble().
- compareTo() must be consistent with equals(): it should return 0 exactly when equals() returns true. Frequency.compareTo() can never return 0, which is never correct, even if only because this breaks symmetry.
- A compareTo() that decides ordering based on two integer values should never be implemented with a subtraction, unless you're absolutely certain the difference can't overflow. For example, ordering a=Integer.MAX_VALUE and b=Integer.MIN_VALUE by the sign of a-b puts a before b, because a-b overflows to a negative value.

> Modify IntPairWritable in LDA implementation to be binary comparable to improve performance.
> --------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-320
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-320
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.3
>            Reporter: Drew Farris
>            Assignee: Robin Anil
>            Priority: Minor
>         Attachments: MAHOUT-320.patch, MAHOUT-320.patch, MAHOUT-320.patch, MAHOUT-320.patch, MAHOUT-320.patch
>
>
> Per discussion with Robin, modifying o.a.m.clustering.lda.IntPairWritable to be binary comparable will improve the performance of the comparison operations during a sort because no marshaling will need to occur to compare IntPairWritable instances.
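
As a minimal sketch of the three points raised in the comment above, the following shows one way a two-int key could keep equals(), hashCode() and compareTo() consistent while avoiding subtraction-based comparison. The class IntPair and the helper compareBySubtraction are hypothetical names invented for illustration; this is not the actual IntPairWritable or Frequency code from the patch, and it assumes Java 7 or later for Integer.compare().

```java
/** Hypothetical stand-in for a two-int key; not the actual Mahout IntPairWritable. */
final class IntPair implements Comparable<IntPair> {
  private final int first;
  private final int second;

  IntPair(int first, int second) {
    this.first = first;
    this.second = second;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof IntPair)) {
      return false;
    }
    IntPair other = (IntPair) o;
    return first == other.first && second == other.second;
  }

  @Override
  public int hashCode() {
    // Uses exactly the fields that equals() compares, so equal objects hash alike.
    return 31 * first + second;
  }

  @Override
  public int compareTo(IntPair other) {
    // Integer.compare() cannot overflow, unlike first - other.first.
    int c = Integer.compare(first, other.first);
    // Returns 0 only when both fields match, i.e. exactly when equals() is true.
    return c != 0 ? c : Integer.compare(second, other.second);
  }

  /** Deliberately broken comparison, kept only to demonstrate the overflow. */
  static int compareBySubtraction(int a, int b) {
    return a - b;
  }
}
```

With this sketch, comparing IntPair(Integer.MAX_VALUE, 0) with IntPair(Integer.MIN_VALUE, 0) yields a positive result, whereas compareBySubtraction(Integer.MAX_VALUE, Integer.MIN_VALUE) wraps around to a negative value and would order the larger number first.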