[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

Sean Owen (JIRA) Tue, 17 Nov 2009 07:44:03 -0800

    [ 
https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778944#action_12778944
 ]


Sean Owen commented on MAHOUT-165:
----------------------------------

I generally favor including stuff as an intact library rather than cracking it 
open and modifying it, all else equal, because modifying it is a first step 
down the road to forking, and I'd hate to fork Colt without good reason.

So I suppose the easiest thing is a .jar file with the unusable classes 
stripped out. That should be it. We don't need to remove classes that merely 
depend on hep.aida.*, unless a class we need depends directly or indirectly on 
a class that references it. Ideally that's not true; we'll see in practice. 
Even then, it's just a matter of taking those out too.
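
To make the stripping step concrete, here is a rough sketch of how a jar could be copied with one package left out. It is only an illustration under stated assumptions: the class name JarStripper and its methods are hypothetical, the "hep/aida/" prefix is just the package discussed above, and it uses nothing beyond the JDK's java.util.zip classes (InputStream.transferTo needs Java 9 or later). It filters by path prefix only and makes no attempt to find classes that reference the removed ones.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Mahout or Colt.
final class JarStripper {

    // True if the entry name falls under one of the excluded path prefixes.
    public static boolean isExcluded(String entryName, List<String> excludedPrefixes) {
        for (String prefix : excludedPrefixes) {
            if (entryName.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Copies every entry of 'in' to 'out' except those under an excluded
    // prefix, and returns the number of entries that were dropped.
    public static int strip(Path in, Path out, List<String> excludedPrefixes)
            throws IOException {
        int dropped = 0;
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(in));
             ZipOutputStream zout = new ZipOutputStream(Files.newOutputStream(out))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (isExcluded(entry.getName(), excludedPrefixes)) {
                    dropped++;
                    continue;
                }
                // A fresh entry lets the output stream recompute sizes and CRC.
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zin.transferTo(zout); // reads up to the end of the current entry
                zout.closeEntry();
            }
        }
        return dropped;
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: java JarStripper colt.jar colt-stripped.jar
        int dropped = strip(Paths.get(args[0]), Paths.get(args[1]),
                List.of("hep/aida/"));
        System.out.println("Dropped " + dropped + " entries");
    }
}
```

If a class we keep turns out to reference something that was removed, the same call could be rerun with extra prefixes added to the list.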

We don't necessarily need source, and I'd favor not incorporating it as source 
and not modifying the package names. Yes, its dependencies could be changed and 
updated, but I'd say let's not bother until there is a case for doing so. What 
would we do when Colt is updated, re-port the updates? We'd have a fork on our 
hands and might as well just decide that's what we're doing.

> Using better primitives hash for sparse vector for performance gains
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-165
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-165
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>    Affects Versions: 0.2
>            Reporter: Shashikant Kore
>            Assignee: Grant Ingersoll
>             Fix For: 0.3
>
>         Attachments: colt.jar, mahout-165-trove.patch, 
> MAHOUT-165-updated.patch, mahout-165.patch, MAHOUT-165.patch, mahout-165.patch
>
>
> In SparseVector, we need a primitive hash map for indices and values. The 
> present implementation of this hash map is not as efficient as some of the 
> other implementations in non-Apache projects. 
> In an experiment, I found that, for get/set operations, the primitive hash of 
> Colt performs an order of magnitude better than OrderedIntDoubleMapping. 
> For iteration it is 2x slower, though. 
> Using Colt in SparseVector improved performance of canopy generation. For an 
> experimental dataset, the current implementation takes 50 minutes. Using 
> Colt reduces this duration to 19-20 minutes. That's a 60% reduction in the 
> running time. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
