[ 
https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760886#action_12760886
 ] 

Jake Mannix commented on MAHOUT-165:
------------------------------------

One test that is failing is the basic VectorTest case which checks equals().
What are we considering the contract of equals() to be on vectors?  I would
normally assume that equals() should return what AbstractVector.equivalent()
returns. Is this not done so that we can compare while ignoring the name?  Or
is there some more important reason why a DenseVector and a SparseVector which
are the same "vector" in the mathematical sense do not return true from
equals() on each other?

Speaking of which, why do we have these static methods "equivalent()" and
"strictEquivalence()"?  Do we need something different from "equals()" which
is true mathematical equality (currently the functionality of equivalent())?
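
To make the distinction concrete, here is a minimal sketch in Java. It is not
Mahout's code: Vec, DenseVec and SparseVec are hypothetical stand-ins, and the
two static methods only assume what their names suggest, namely that
equivalent() compares size and elements while strictEquivalence() additionally
requires the same class and the same name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch, not Mahout's AbstractVector: all names below are
// illustrative assumptions.
class VectorEqualsSketch {

    interface Vec {
        int size();
        double get(int index);
        String getName();
    }

    // Dense storage: one slot per index.
    static final class DenseVec implements Vec {
        private final String name;
        private final double[] values;

        DenseVec(String name, double... values) {
            this.name = name;
            this.values = values.clone();
        }

        public int size() { return values.length; }
        public double get(int index) { return values[index]; }
        public String getName() { return name; }
    }

    // Sparse storage: only non-zero entries are kept.
    static final class SparseVec implements Vec {
        private final String name;
        private final int cardinality;
        private final Map<Integer, Double> nonZeros = new HashMap<>();

        SparseVec(String name, int cardinality) {
            this.name = name;
            this.cardinality = cardinality;
        }

        SparseVec set(int index, double value) {
            if (value != 0.0) {
                nonZeros.put(index, value);
            } else {
                nonZeros.remove(index);
            }
            return this;
        }

        public int size() { return cardinality; }
        public double get(int index) { return nonZeros.getOrDefault(index, 0.0); }
        public String getName() { return name; }
    }

    // Mathematical equality: same size and same elements; name and
    // implementation class are ignored. NaN elements never compare equal.
    static boolean equivalent(Vec a, Vec b) {
        if (a == b) return true;
        if (a == null || b == null || a.size() != b.size()) return false;
        for (int i = 0; i < a.size(); i++) {
            if (a.get(i) != b.get(i)) return false;
        }
        return true;
    }

    // Stricter check: same class, same name, and mathematically equal.
    static boolean strictEquivalence(Vec a, Vec b) {
        if (a == b) return true;
        if (a == null || b == null) return false;
        return a.getClass() == b.getClass()
            && Objects.equals(a.getName(), b.getName())
            && equivalent(a, b);
    }

    public static void main(String[] args) {
        Vec dense = new DenseVec("a", 1.0, 0.0, 2.0);
        Vec sparse = new SparseVec("b", 3).set(0, 1.0).set(2, 2.0);
        System.out.println("equivalent: " + equivalent(dense, sparse));
        System.out.println("strictEquivalence: " + strictEquivalence(dense, sparse));
    }
}
```

In this sketch the dense and sparse vectors hold the same elements, so
equivalent() is true while strictEquivalence() is false. Whichever of the two
equals() delegates to, hashCode() would have to agree with it, which is part of
why the contract question matters.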

> Using better primitives hash for sparse vector for performance gains
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-165
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-165
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>    Affects Versions: 0.2
>            Reporter: Shashikant Kore
>            Assignee: Grant Ingersoll
>             Fix For: 0.2
>
>         Attachments: colt.jar, mahout-165-trove.patch, MAHOUT-165.patch, 
> mahout-165.patch
>
>
> In SparseVector, we need a primitive hash map for indices and values. The
> present implementation of this hash map is not as efficient as some of the
> implementations in non-Apache projects.
> In an experiment, I found that, for get/set operations, the primitive hash
> map of Colt performs an order of magnitude better than
> OrderedIntDoubleMapping. For iteration it is 2x slower, though.
> Using Colt in SparseVector improved the performance of canopy generation. For
> an experimental dataset, the current implementation takes 50 minutes. Using
> Colt reduces this to 19-20 minutes. That's a 60% reduction in running time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
