[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

Ted Dunning (JIRA) Tue, 17 Nov 2009 13:10:04 -0800

    [ 
https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779134#action_12779134
 ]


Ted Dunning commented on MAHOUT-165:
------------------------------------

bq. +1 sounds like a good idea to me. It's just a matter of adding "@deprecated 
no unit tests.." tag to each class, no?

Correct.

bq. Regarding @deprecated: sounds a little aggressive to me, but not against 
it. 

I would very much like to do it so that we have some way of keeping track of 
which parts have been validated and tested.

bq. .... Perhaps Commons Math is interested in it, too.

Commons Math has been pretty aggressively uninterested in contributions lately. 
 I have been involved in some patches to make distributions and sampling more 
usable and much more widely available.  Jake was recently trying to help get 
sparse matrices to a usable state.  The result was lots of API whining and 
complete loss of momentum for improvement.  My own opinion is that it is not 
practical to contribute anything more than completely trivial items to 
commons-math and I say that we should go forward and not wait for them.


> Using better primitives hash for sparse vector for performance gains
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-165
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-165
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>    Affects Versions: 0.2
>            Reporter: Shashikant Kore
>            Assignee: Grant Ingersoll
>             Fix For: 0.3
>
>         Attachments: colt.jar, mahout-165-trove.patch, 
> MAHOUT-165-updated.patch, mahout-165.patch, MAHOUT-165.patch, mahout-165.patch
>
>
> In SparseVector, we need a primitive hash map for indices and values. The 
> present implementation of this hash map is not as efficient as some of the 
> other implementations in non-Apache projects. 
> In an experiment, I found that, for get/set operations, Colt's primitive hash 
> map performs an order of magnitude better than OrderedIntDoubleMapping. 
> For iteration it is 2x slower, though. 
> Using Colt in SparseVector improved the performance of canopy generation. For 
> an experimental dataset, the current implementation takes 50 minutes. Using 
> Colt reduces this to 19-20 minutes. That's a 60% reduction in running 
> time. 
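
For readers unfamiliar with the term, the kind of primitive hash map the
description refers to can be sketched roughly as below. This is only a minimal
illustration of the general technique (open addressing with linear probing over
parallel int[] and double[] arrays, so no boxing of keys or values). It is not
Colt's implementation and not the code in the attached patches; the class name
IntDoubleOpenHashMap, its methods, and the choice of Integer.MIN_VALUE as the
empty-slot marker are all hypothetical. Removal is left out, since it would
need tombstones or re-insertion.

```java
import java.util.Arrays;

/** Hypothetical sketch: int -> double map using open addressing and linear probing. */
final class IntDoubleOpenHashMap {
    // Assumption: Integer.MIN_VALUE marks an empty slot and is never used as a key.
    private static final int FREE = Integer.MIN_VALUE;

    private int[] keys = new int[16];        // capacity is always a power of two
    private double[] values = new double[16];
    private int size;

    IntDoubleOpenHashMap() {
        Arrays.fill(keys, FREE);
    }

    /** Returns the slot holding key, or the free slot where it would be inserted. */
    private int slot(int key) {
        int mask = keys.length - 1;
        int h = key * 0x9E3779B9;            // cheap multiplicative bit mixing
        int i = (h ^ (h >>> 16)) & mask;
        while (keys[i] != FREE && keys[i] != key) {
            i = (i + 1) & mask;              // linear probe to the next slot
        }
        return i;
    }

    /** Absent indices read as 0.0, matching sparse-vector semantics. */
    double get(int index) {
        int i = slot(index);
        return keys[i] == index ? values[i] : 0.0;
    }

    void set(int index, double value) {
        if (index == FREE) {
            throw new IllegalArgumentException("reserved key");
        }
        int i = slot(index);
        if (keys[i] == FREE) {
            if ((size + 1) * 2 > keys.length) { // keep the load factor at or below 0.5
                grow();
                i = slot(index);
            }
            keys[i] = index;
            size++;
        }
        values[i] = value;
    }

    /** Doubles the capacity and re-inserts every stored entry. */
    private void grow() {
        int[] oldKeys = keys;
        double[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        values = new double[keys.length];
        Arrays.fill(keys, FREE);
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != FREE) {
                int i = slot(oldKeys[j]);
                keys[i] = oldKeys[j];
                values[i] = oldValues[j];
            }
        }
    }

    int size() {
        return size;
    }
}
```

A structure like this makes get/set a short probe over flat arrays, while
iteration has to scan every slot, including the empty ones. That may be part of
why a hash-based map can iterate more slowly than an ordered array mapping, in
line with the trade-off reported above.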

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
