Having equals() effectively delegate to getName().equals(other.getName()) && equivalent(other) means that we need to be extra careful about implementations of hashCode():
If we are not going to break the contract between equals() and hashCode(), and we're having equals() *only* take into account the mathematical contents and the name, then I'd say what we need to do is implement hashCode() in a top level class like AbstractVector.

(Is something funny going on with JIRA? Seems broken...)

  -jake

On Wed, Sep 30, 2009 at 10:01 AM, Sean Owen (JIRA) <j...@apache.org> wrote:

> [ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760956#action_12760956 ]
>
> Sean Owen commented on MAHOUT-165:
> ----------------------------------
>
> Are my conclusions sound then:
>
> We agree that equals() should be 'pretty strict'. The conventional Java
> wisdom is that equals(), in fact, ought not return true for instances of
> differing classes, unless you really know what you're doing. I guess we do. :)
>
> If the idea behind equals() is "do class-specific stuff, otherwise, check
> names, and use equivalent() then", then we don't need strictEquivalence() --
> where's it used?
>
> (If I represented the logic correctly above -- is that as simple as we can
> make it? seems a touch complex)
>
> I am not sure anything is 'broken' in practice here but I sense it could be
> simpler.
>
> > Using better primitives hash for sparse vector for performance gains
> > --------------------------------------------------------------------
> >
> >                 Key: MAHOUT-165
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-165
> >             Project: Mahout
> >          Issue Type: Improvement
> >          Components: Matrix
> >    Affects Versions: 0.2
> >            Reporter: Shashikant Kore
> >            Assignee: Grant Ingersoll
> >             Fix For: 0.2
> >
> >         Attachments: colt.jar, mahout-165-trove.patch, MAHOUT-165.patch,
> >                      mahout-165.patch
> >
> > In SparseVector, we need primitives hash map for index and values. The
> > present implementation of this hash map is not as efficient as some of the
> > other implementations in non-Apache projects.
> >
> > In an experiment, I found that, for get/set operations, the primitive
> > hash of Colt performs an order of magnitude better than
> > OrderedIntDoubleMapping. For iteration it is 2x slower, though.
> >
> > Using Colt in SparseVector improved performance of canopy generation. For
> > an experimental dataset, the current implementation takes 50 minutes. Using
> > Colt reduces this duration to 19-20 minutes. That's a 60% reduction in the
> > delay.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
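
To make the hashCode() suggestion in the reply concrete, here is a minimal sketch of what a shared implementation in a top-level class could look like. It is not the actual Mahout code: VectorSketch, DenseSketch and SparseSketch are hypothetical stand-ins for AbstractVector and its subclasses, and the name is assumed to be non-null. The idea is only that hashCode() uses exactly what equals() looks at (name, size, element values) and nothing about the storage class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a shared base class such as AbstractVector.
abstract class VectorSketch {
    private final String name; // assumed non-null in this sketch

    VectorSketch(String name) { this.name = name; }

    String getName() { return name; }
    abstract int size();
    abstract double get(int index);

    // True when both vectors have the same size and the same value at every index,
    // regardless of which implementation class holds the data.
    boolean equivalent(VectorSketch other) {
        if (size() != other.size()) return false;
        for (int i = 0; i < size(); i++) {
            if (get(i) != other.get(i)) return false;
        }
        return true;
    }

    // equals() as described in the thread: name check plus equivalent().
    @Override
    public final boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof VectorSketch)) return false;
        VectorSketch other = (VectorSketch) o;
        return getName().equals(other.getName()) && equivalent(other);
    }

    // Defined once here so every subclass hashes the same way.
    // Zero elements are skipped, so a sparse vector with no stored entry at an
    // index hashes like a dense vector holding 0.0 (or -0.0) there.
    // The per-element terms are summed, so the result does not depend on
    // iteration order.
    @Override
    public final int hashCode() {
        int result = 31 * getName().hashCode() + size();
        for (int i = 0; i < size(); i++) {
            double v = get(i);
            if (v != 0.0) {
                result += (31 * i + 17) * Double.hashCode(v);
            }
        }
        return result;
    }
}

// Array-backed implementation (hypothetical).
final class DenseSketch extends VectorSketch {
    private final double[] values;

    DenseSketch(String name, double[] values) {
        super(name);
        this.values = values.clone();
    }

    @Override int size() { return values.length; }
    @Override double get(int index) { return values[index]; }
}

// Map-backed implementation (hypothetical); missing entries read as 0.0.
final class SparseSketch extends VectorSketch {
    private final int cardinality;
    private final Map<Integer, Double> entries = new HashMap<>();

    SparseSketch(String name, int cardinality) {
        super(name);
        this.cardinality = cardinality;
    }

    void set(int index, double value) { entries.put(index, value); }

    @Override int size() { return cardinality; }
    @Override double get(int index) { return entries.getOrDefault(index, 0.0); }
}
```

With this arrangement a DenseSketch and a SparseSketch that share a name and contents are equal and produce the same hash code, which is what the equals()/hashCode() contract requires. The loop over every index is for brevity only; because the sum is order-independent, a real sparse implementation could presumably iterate over just its non-zero entries and get the same value.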