[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

Ted Dunning (JIRA) Tue, 17 Nov 2009 11:19:07 -0800

    [ 
https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779058#action_12779058
 ]


Ted Dunning commented on MAHOUT-165:
------------------------------------

bq.    The colt tree could also be put into a separate module that lives 
alongside core, util, examples, built independently as a part of the maven 
build - optionally at first, activated via a build profile.

bq. +1 - I like this.

+1 as well.

bq.    As far as package names, would it be better to map cern.colt.* to 
org.apache.mahout.colt.* ? - that way there's no potential for the old being 
confused for the new in builds, etc.

bq. I personally think this is the way to go, but does it reduce confusion, or 
increase it? People who are used to using colt will see familiar classes, but 
in strange places. If we're really going to overhaul the whole library over 
time, this makes sense, I guess.

Absolutely.

Should all code be marked deprecated until it has unit tests, with a comment 
saying why it is deprecated?  That would give us a clear visual signal that the 
lifeguard has left the pool.
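
A rough sketch of what that marking might look like, assuming plain Java 
deprecation is what we would use. The class name and method here are 
hypothetical stand-ins, not actual Colt or Mahout code:

```java
/**
 * Hypothetical stand-in for a class ported from the Colt tree.
 *
 * @deprecated This class has no unit tests yet. The marker is meant to be
 *             removed once tests have been added.
 */
@Deprecated
public class UntestedColtPort {

  /** Trivial placeholder method so the sketch compiles and can be called. */
  public static int twice(int x) {
    return 2 * x;
  }
}
```

Callers would then get a compiler warning and, in most IDEs, a strike-through 
on the class name, which is the visual signal.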


> Using better primitives hash for sparse vector for performance gains
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-165
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-165
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>    Affects Versions: 0.2
>            Reporter: Shashikant Kore
>            Assignee: Grant Ingersoll
>             Fix For: 0.3
>
>         Attachments: colt.jar, mahout-165-trove.patch, 
> MAHOUT-165-updated.patch, mahout-165.patch, MAHOUT-165.patch, mahout-165.patch
>
>
> In SparseVector, we need a primitive hash map for indices and values. The 
> present implementation of this hash map is not as efficient as some of the 
> other implementations in non-Apache projects. 
> In an experiment, I found that, for get/set operations, the primitive hash 
> map of Colt performs an order of magnitude better than 
> OrderedIntDoubleMapping. For iteration it is 2x slower, though. 
> Using Colt in SparseVector improved the performance of canopy generation. 
> For an experimental dataset, the current implementation takes 50 minutes. 
> Using Colt reduces this to 19-20 minutes, roughly a 60% reduction in 
> running time. 
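
For readers unfamiliar with the idea, the kind of primitive int-to-double 
hash map the issue describes can be sketched as below. This is a minimal 
illustration using open addressing with linear probing, not the Colt, Trove, 
or Mahout implementation. The class name, the reserved FREE marker, and the 
hash constant are assumptions made for the example, and removal is left out 
to keep it short.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of a primitive int -> double hash map.
 * Keys and values live in plain arrays, so no boxing takes place.
 */
public class IntDoubleHashSketch {
  // Assumption: Integer.MIN_VALUE is reserved to mark an empty slot.
  private static final int FREE = Integer.MIN_VALUE;

  private int[] keys;
  private double[] values;
  private int size;

  public IntDoubleHashSketch(int expectedSize) {
    int capacity = 8;
    while (capacity < expectedSize * 2) {
      capacity <<= 1; // keep the capacity a power of two
    }
    keys = new int[capacity];
    values = new double[capacity];
    Arrays.fill(keys, FREE);
  }

  /** Finds the slot holding the key, or the free slot where it would go. */
  private int slot(int key) {
    int mask = keys.length - 1;
    int i = ((key * 0x9E3779B9) >>> 7) & mask; // cheap multiplicative scramble
    while (keys[i] != FREE && keys[i] != key) {
      i = (i + 1) & mask; // linear probing
    }
    return i;
  }

  public void put(int key, double value) {
    if (key == FREE) {
      throw new IllegalArgumentException("reserved key");
    }
    int i = slot(key);
    if (keys[i] == FREE) {
      keys[i] = key;
      size++;
    }
    values[i] = value;
    if (size * 2 > keys.length) {
      grow(); // keep the table at most half full so probing always ends
    }
  }

  /** Returns the stored value, or 0.0 for an absent key (sparse default). */
  public double get(int key) {
    int i = slot(key);
    return keys[i] == key ? values[i] : 0.0;
  }

  public int size() {
    return size;
  }

  /** Doubles the table and reinserts every stored entry. */
  private void grow() {
    int[] oldKeys = keys;
    double[] oldValues = values;
    keys = new int[oldKeys.length * 2];
    values = new double[oldKeys.length * 2];
    Arrays.fill(keys, FREE);
    size = 0;
    for (int j = 0; j < oldKeys.length; j++) {
      if (oldKeys[j] != FREE) {
        put(oldKeys[j], oldValues[j]);
      }
    }
  }
}
```

Lookups and updates touch only a few array slots on average, which is 
plausibly why get/set is fast. Iteration has to scan the whole table, 
including empty slots, which may explain part of the slower iteration noted 
in the issue.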

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
