Re: [jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

Grant Ingersoll Wed, 30 Sep 2009 13:17:16 -0700


On Sep 30, 2009, at 4:03 PM, Jake Mannix wrote:

Having equals() effectively delegate to
getName().equals(other.getName()) && equivalent(other) means that we need to
be extra special careful about implementations of hashCode():
if we are not going to break the contract between equals() and hashCode(),
and we're having equals() *only* take into account the mathematical contents
and the name, then I'd say what we need to do is implement hashCode() in a
top level class like AbstractVector.


That is what is happening.
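A minimal sketch of the arrangement described above, assuming a simplified hierarchy. `AbstractVector` here is a hypothetical stand-in (not Mahout's actual class), and `DemoDenseVector`/`DemoSparseVector` are illustrative names. Both `equals()` and `hashCode()` are final in the base class and use only the name, the cardinality, and the non-zero entries, so a dense and a sparse vector with the same name and contents compare equal and hash identically.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Demo entry point; declared first so `java File.java` runs it.
class VectorEqualityDemo {
    public static void main(String[] args) {
        DemoDenseVector dense = new DemoDenseVector("v", new double[] {0.0, 2.5, 0.0, 1.0});
        DemoSparseVector sparse = new DemoSparseVector("v", 4);
        sparse.set(1, 2.5);
        sparse.set(3, 1.0);
        System.out.println(dense.equals(sparse));                   // true
        System.out.println(dense.hashCode() == sparse.hashCode()); // true
    }
}

// Hypothetical, simplified base class (not Mahout's actual AbstractVector).
abstract class AbstractVector {
    private final String name;
    private final int size;

    protected AbstractVector(String name, int size) {
        this.name = name;
        this.size = size;
    }

    public String getName() { return name; }
    public int size() { return size; }
    public abstract double get(int index);

    // Mathematical equivalence: same cardinality and same value at every
    // index, whatever the storage class.
    public boolean equivalent(AbstractVector other) {
        if (size != other.size()) return false;
        for (int i = 0; i < size; i++) {
            if (Double.compare(get(i), other.get(i)) != 0) return false;
        }
        return true;
    }

    // Name plus contents; final so subclasses cannot drift from hashCode().
    @Override
    public final boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AbstractVector)) return false;
        AbstractVector other = (AbstractVector) o;
        return Objects.equals(name, other.getName()) && equivalent(other);
    }

    // Uses only what equals() uses (name, size, non-zero entries), so equal
    // vectors of different classes get the same hash code.
    @Override
    public final int hashCode() {
        int h = 31 * Objects.hashCode(name) + size;
        for (int i = 0; i < size; i++) {
            double v = get(i);
            if (v != 0.0) {
                h = 31 * (31 * h + i) + Double.hashCode(v);
            }
        }
        return h;
    }
}

// Dense storage: one slot per index.
final class DemoDenseVector extends AbstractVector {
    private final double[] values;

    DemoDenseVector(String name, double[] values) {
        super(name, values.length);
        this.values = values.clone();
    }

    @Override
    public double get(int index) { return values[index]; }
}

// Sparse storage: only non-zero entries are kept (boxed here for brevity).
final class DemoSparseVector extends AbstractVector {
    private final Map<Integer, Double> values = new HashMap<>();

    DemoSparseVector(String name, int size) { super(name, size); }

    void set(int index, double value) {
        if (value == 0.0) values.remove(index);
        else values.put(index, value);
    }

    @Override
    public double get(int index) { return values.getOrDefault(index, 0.0); }
}
```

Making both methods final in the base class is one way to stop a subclass from reintroducing class-specific hashing that would break the contract. The hashCode() loop above visits every index, which is O(size); a real sparse implementation would presumably want to walk only its stored entries in index order.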


(Is something funny going on with JIRA?  Seems broken...)


Yes, there is something wrong.  Infra is aware of it.

 -jake
On Wed, Sep 30, 2009 at 10:01 AM, Sean Owen (JIRA) <j...@apache.org> wrote:
  [
https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760956#action_12760956]

Sean Owen commented on MAHOUT-165:
----------------------------------

Are my conclusions sound then:
We agree that equals() should be 'pretty strict'. The conventional Java
wisdom is that equals(), in fact, ought not return true for instances of
differing classes, unless you really know what you're doing. I guess we do.
:)
If the idea behind equals() is "do class-specific stuff; otherwise, check
names, and then use equivalent()", then we don't need strictEquivalence().
Where's it used?
(If I represented the logic correctly above, is that as simple as we can
make it? It seems a touch complex.)
I am not sure anything is 'broken' in practice here, but I sense it could be
simpler.

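To make the distinction concrete, here is a sketch of the conventional, strict `equals()` (rejecting any other class via `getClass()`) next to a content-only `equivalent()` check. `SimpleVector`, `ArrayVector`, and `MapVector` are hypothetical names, not Mahout's API. Under the strict form, a dense and a map-backed vector with identical contents are not equal, yet they are equivalent.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Demo entry point; declared first so `java File.java` runs it.
class StrictEqualsDemo {
    public static void main(String[] args) {
        ArrayVector dense = new ArrayVector(0.0, 2.5, 0.0);
        MapVector sparse = new MapVector(3);
        sparse.set(1, 2.5);
        System.out.println(dense.equals(sparse));                    // false: different classes
        System.out.println(SimpleVector.equivalent(dense, sparse)); // true: same contents
    }
}

// Hypothetical minimal vector interface (not Mahout's Vector API).
interface SimpleVector {
    int size();
    double get(int index);

    // Content-only comparison that ignores the implementing class.
    static boolean equivalent(SimpleVector a, SimpleVector b) {
        if (a.size() != b.size()) return false;
        for (int i = 0; i < a.size(); i++) {
            if (Double.compare(a.get(i), b.get(i)) != 0) return false;
        }
        return true;
    }
}

final class ArrayVector implements SimpleVector {
    private final double[] values;

    ArrayVector(double... values) { this.values = values.clone(); }

    @Override public int size() { return values.length; }
    @Override public double get(int index) { return values[index]; }

    // Conventional strict equals: an instance of another class is never equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return Arrays.equals(values, ((ArrayVector) o).values);
    }

    @Override
    public int hashCode() { return Arrays.hashCode(values); }
}

final class MapVector implements SimpleVector {
    private final int size;
    private final Map<Integer, Double> values = new TreeMap<>();

    MapVector(int size) { this.size = size; }

    // Zeros are not stored, so equal contents give equal maps.
    void set(int index, double value) {
        if (value == 0.0) values.remove(index);
        else values.put(index, value);
    }

    @Override public int size() { return size; }
    @Override public double get(int index) { return values.getOrDefault(index, 0.0); }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        MapVector other = (MapVector) o;
        return size == other.size && values.equals(other.values);
    }

    @Override
    public int hashCode() { return 31 * size + values.hashCode(); }
}
```

With the strict form, each class only has to keep its own equals() and hashCode() consistent; cross-class comparison is left to the separate equivalence check.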
Using better primitives hash for sparse vector for performance gains
--------------------------------------------------------------------

               Key: MAHOUT-165
               URL: https://issues.apache.org/jira/browse/MAHOUT-165
           Project: Mahout
        Issue Type: Improvement
        Components: Matrix
  Affects Versions: 0.2
          Reporter: Shashikant Kore
          Assignee: Grant Ingersoll
           Fix For: 0.2
        Attachments: colt.jar, mahout-165-trove.patch, MAHOUT-165.patch, mahout-165.patch

In SparseVector, we need a primitives hash map for index and values. The
present implementation of this hash map is not as efficient as some of the
other implementations in non-Apache projects.
In an experiment, I found that, for get/set operations, the primitive
hash of Colt performs an order of magnitude better than
OrderedIntDoubleMapping. For iteration it is 2x slower, though.
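For readers unfamiliar with primitive hash maps, here is a minimal sketch of the general technique: open addressing over parallel `int[]`/`double[]` arrays, so no `Integer`/`Double` boxing happens on get/put. It is not Colt's implementation; the class name `IntDoubleOpenMap`, the linear probing, the 0.5 load factor, and the lack of removal are assumptions made for brevity.

```java
import java.util.Arrays;

// Demo entry point; declared first so `java File.java` runs it.
class OpenMapDemo {
    public static void main(String[] args) {
        IntDoubleOpenMap m = new IntDoubleOpenMap(4);
        m.put(3, 1.5);
        m.put(1000, -2.0);
        System.out.println(m.get(3) + " " + m.get(1000) + " " + m.get(7)); // 1.5 -2.0 0.0
    }
}

// Hypothetical int -> double open-addressing map (not Colt's code): keys and
// values live in parallel primitive arrays. Linear probing, load <= 0.5, no removal.
final class IntDoubleOpenMap {
    // Callback for iteration; java.util.function has no (int, double) consumer.
    interface EntryConsumer {
        void accept(int key, double value);
    }

    private static final int EMPTY = -1; // assumes keys are vector indices, i.e. >= 0

    private int[] keys;
    private double[] vals;
    private int size;

    IntDoubleOpenMap(int expectedSize) {
        int cap = 16;
        while (cap < expectedSize * 2) cap <<= 1; // power of two, at most half full
        allocate(cap);
    }

    private void allocate(int cap) {
        keys = new int[cap];
        Arrays.fill(keys, EMPTY);
        vals = new double[cap];
        size = 0;
    }

    // Returns the slot holding key, or the empty slot where it would be inserted.
    private int slot(int key) {
        int mask = keys.length - 1;
        int h = key * 0x9E3779B9; // multiplicative hashing spreads nearby indices
        int i = (h ^ (h >>> 16)) & mask;
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) & mask; // linear probing
        }
        return i;
    }

    // Absent indices read as 0.0, matching sparse-vector semantics.
    double get(int key) {
        if (key < 0) return 0.0;
        int i = slot(key);
        return keys[i] == key ? vals[i] : 0.0;
    }

    void put(int key, double value) {
        if (key < 0) throw new IllegalArgumentException("negative index: " + key);
        int i = slot(key);
        if (keys[i] == EMPTY) {
            keys[i] = key;
            size++;
        }
        vals[i] = value;
        if (size * 2 > keys.length) rehash();
    }

    int size() { return size; }

    // Iteration scans every slot, occupied or not, in hash order (not index order).
    void forEach(EntryConsumer action) {
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] != EMPTY) action.accept(keys[i], vals[i]);
        }
    }

    // Doubles the table and reinserts every occupied slot.
    private void rehash() {
        int[] oldKeys = keys;
        double[] oldVals = vals;
        allocate(oldKeys.length * 2);
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != EMPTY) {
                int i = slot(oldKeys[j]);
                keys[i] = oldKeys[j];
                vals[i] = oldVals[j];
                size++;
            }
        }
    }
}
```

Because forEach() has to scan empty slots and yields indices out of order, iteration over an open hash can plausibly be slower than over sorted parallel arrays; that is one possible contributor to the 2x figure above, though the measurements are the reporter's own.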
Using Colt in SparseVector improved performance of canopy generation. For
an experimental dataset, the current implementation takes 50 minutes. Using
Colt reduces this duration to 19-20 minutes. That's a 60% reduction in the
delay.
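Checking the quoted figure against the reported timings (50 minutes before, 19 to 20 minutes after):

```latex
\frac{50 - 20}{50} = 0.60, \qquad \frac{50 - 19}{50} = 0.62, \qquad \frac{50}{20} = 2.5
```

So the 60% figure corresponds to the 20-minute end of the range, roughly a 2.5x speedup on that dataset.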



--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

