[ https://issues.apache.org/jira/browse/MAHOUT-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628853#comment-13628853 ]
Sebastian Schelter commented on MAHOUT-1190:
--------------------------------------------

I took a look at the code and the situation is even worse, as DenseVector overrides assign and only has special handling for PlusMult (not Plus!).

I think we should extend the interfaces, so that you can ask a function whether f(0,x) = 0 and f(x,0) holds.

I think we should definitely keep SASV and make sure that all vectors behave as well as possible.

> SequentialAccessSparseVector function assignment is very slow
> -------------------------------------------------------------
>
>                 Key: MAHOUT-1190
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1190
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Dan Filimon
>
> Currently when calling .assign() on a SASV with another vector and a custom
> function, it will iterate through it and assign every single entry while also
> referring to it by index.
> This makes the process *hugely* expensive. (on a run of BallKMeans on the 20
> newsgroups data set, profiling reveals that 92% of the runtime was spent
> assigning the vectors).
> Here's a prototype patch:
> https://github.com/dfilimon/mahout/commit/63998d82bb750150a6ae09052dadf6c326c62d3d
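
To make the proposed interface extension concrete, here is a minimal sketch of a function that can be asked how it treats zeros, and an assign that uses the answer to skip entries. Everything in it is hypothetical and is not Mahout's actual API: the names ZeroAwareFunction, zeroLeftGivesZero, zeroRightKeepsLeft and ZeroAwareAssignSketch are made up for illustration, a TreeMap stands in for the sparse vector, and the second property is read as f(x,0) = x, which is an assumption because the comment leaves its right-hand side out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ZeroAwareAssignSketch {

    /** Hypothetical function interface; NOT Mahout's actual API. */
    interface ZeroAwareFunction {
        double apply(double x, double y);

        /** True if f(0, y) = 0 for every y. */
        default boolean zeroLeftGivesZero() { return false; }

        /** True if f(x, 0) = x for every x (assumed reading of the comment). */
        default boolean zeroRightKeepsLeft() { return false; }
    }

    static final ZeroAwareFunction PLUS = new ZeroAwareFunction() {
        @Override public double apply(double x, double y) { return x + y; }
        @Override public boolean zeroRightKeepsLeft() { return true; }
    };

    static final ZeroAwareFunction MULT = new ZeroAwareFunction() {
        @Override public double apply(double x, double y) { return x * y; }
        @Override public boolean zeroLeftGivesZero() { return true; }
    };

    /**
     * Sets target[i] = f(target[i], other[i]) for a vector of the given size.
     * Both vectors are sparse maps from index to non-zero value.
     * Returns the number of function evaluations performed.
     */
    static int assign(TreeMap<Integer, Double> target, TreeMap<Integer, Double> other,
                      int size, ZeroAwareFunction f) {
        // Copy the indices first so the map can be modified while we walk them.
        List<Integer> indices = new ArrayList<>();
        if (f.zeroLeftGivesZero()) {
            indices.addAll(target.keySet());   // zeros in target stay zero
        } else if (f.zeroRightKeepsLeft()) {
            indices.addAll(other.keySet());    // zeros in other leave target unchanged
        } else {
            for (int i = 0; i < size; i++) {
                indices.add(i);                // no shortcut: visit every index
            }
        }
        for (int i : indices) {
            double value = f.apply(target.getOrDefault(i, 0.0), other.getOrDefault(i, 0.0));
            if (value == 0.0) {
                target.remove(i);              // keep the representation sparse
            } else {
                target.put(i, value);
            }
        }
        return indices.size();
    }

    public static void main(String[] args) {
        TreeMap<Integer, Double> a = new TreeMap<>();
        a.put(1, 2.0);
        a.put(5, 3.0);
        TreeMap<Integer, Double> b = new TreeMap<>();
        b.put(5, 4.0);
        b.put(7, 1.0);
        int evaluations = assign(a, b, 10, PLUS);
        System.out.println(evaluations + " evaluations, result " + a);
    }
}
```

A function that declares neither property falls back to visiting all indices, which is the slow path the issue describes. The sketch still looks values up by index in the map; a real sequential-access implementation would more likely merge the two sorted index arrays in a single pass.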