[ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779134#action_12779134 ]
Ted Dunning commented on MAHOUT-165:
------------------------------------

bq. +1 sounds like a good idea to me. It's just a matter of adding "@deprecated no unit tests.." tag to each class, no?

Correct.

bq. Regarding @deprecated: sounds a little aggressive to me, but not against it.

I would very much like to do it so that we have some way of keeping track of which parts have been validated and tested.

bq. .... Perhaps Commons Math is interested in it, too.

Commons Math has been pretty aggressively uninterested in contributions lately. I have been involved in some patches to make distributions and sampling more usable and much more widely available. Jake was recently trying to help get sparse matrices to a usable state. The result was lots of API whining and complete loss of momentum for improvement.

My own opinion is that it is not practical to contribute anything more than completely trivial items to commons-math, and I say that we should go forward and not wait for them.

> Using better primitives hash for sparse vector for performance gains
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-165
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-165
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>    Affects Versions: 0.2
>            Reporter: Shashikant Kore
>            Assignee: Grant Ingersoll
>             Fix For: 0.3
>
>         Attachments: colt.jar, mahout-165-trove.patch, MAHOUT-165-updated.patch, mahout-165.patch, MAHOUT-165.patch, mahout-165.patch
>
>
> In SparseVector, we need a primitives hash map for index and values. The present implementation of this hash map is not as efficient as some of the other implementations in non-Apache projects.
> In an experiment, I found that, for get/set operations, the primitive hash of Colt performs an order of magnitude better than OrderedIntDoubleMapping. For iteration it is 2x slower, though.
> Using Colt in SparseVector improved performance of canopy generation. For an experimental dataset, the current implementation takes 50 minutes. Using Colt reduces this duration to 19-20 minutes. That's a 60% reduction in the delay.
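
For readers unfamiliar with what a "primitives hash" for a sparse vector looks like, here is a minimal sketch of an open-addressing int-to-double map with linear probing. It is an illustration only: the class name IntDoubleOpenHashMap, the hash mixing constant, the 0.5 load factor, and the dot() helper are all assumptions made for this example, and none of it is taken from Colt, Trove, or any of the attached patches.

```java
import java.util.Arrays;

/** Hypothetical sketch of a primitive int -> double hash map; not Colt's or Mahout's code. */
public class IntDoubleOpenHashMap {
    // Sentinel marking an empty slot. Assumption: vector indexes are never Integer.MIN_VALUE.
    private static final int FREE = Integer.MIN_VALUE;

    private int[] keys;
    private double[] values;
    private int size;

    public IntDoubleOpenHashMap() {
        keys = new int[16]; // power-of-two length so probing can use a bit mask
        values = new double[16];
        Arrays.fill(keys, FREE);
    }

    // Scramble the key so that regularly spaced indexes spread across the table.
    private static int mix(int key) {
        int h = key * 0x9E3779B9;
        return h ^ (h >>> 16);
    }

    // Linear probing: returns the slot holding the key, or the first free slot after its hash.
    // The load factor stays at or below 0.5, so a free slot always exists and the loop ends.
    private int slot(int key) {
        int mask = keys.length - 1;
        int i = mix(key) & mask;
        while (keys[i] != FREE && keys[i] != key) {
            i = (i + 1) & mask;
        }
        return i;
    }

    /** Returns the stored value, or 0.0 for an absent index (sparse vector semantics). */
    public double get(int index) {
        int i = slot(index);
        return keys[i] == index ? values[i] : 0.0;
    }

    public void put(int index, double value) {
        if (index == FREE) {
            throw new IllegalArgumentException("index is reserved as the empty-slot marker");
        }
        int i = slot(index);
        if (keys[i] == FREE) {
            keys[i] = index;
            size++;
        }
        values[i] = value;
        if (size * 2 > keys.length) {
            resize();
        }
    }

    // Double the table and reinsert every occupied slot.
    private void resize() {
        int[] oldKeys = keys;
        double[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        values = new double[keys.length];
        Arrays.fill(keys, FREE);
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != FREE) {
                int i = slot(oldKeys[j]);
                keys[i] = oldKeys[j];
                values[i] = oldValues[j];
            }
        }
    }

    public int size() {
        return size;
    }

    /** Dot product: scans this table's slots (including empty ones) and probes the other map. */
    public double dot(IntDoubleOpenHashMap other) {
        double sum = 0.0;
        for (int j = 0; j < keys.length; j++) {
            if (keys[j] != FREE) {
                sum += values[j] * other.get(keys[j]);
            }
        }
        return sum;
    }
}
```

In this sketch get and put touch only a few array slots and never box the index or value, while iteration (as in dot) has to walk the whole backing array, empty slots included. That is one plausible reason a hash-based map could be faster for get/set and slower for iteration than an ordered mapping, though the description above does not say what causes the difference it measured.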