[ https://issues.apache.org/jira/browse/MAHOUT-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robin Anil updated MAHOUT-1190:
-------------------------------

    Attachment: MAHOUT-1190-seq-dot-product.patch
                MAHOUT-1190-iterator-fix.patch

Latest patches

> SequentialAccessSparseVector function assignment is very slow
> -------------------------------------------------------------
>
>                 Key: MAHOUT-1190
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1190
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Dan Filimon
>         Attachments: MAHOUT-1190-1.patch, MAHOUT-1190-iterator-fix.patch,
> MAHOUT-1190-iterator-fix.patch, MAHOUT-1190.patch,
> MAHOUT-1190-seq-dot-product.patch, MAHOUT-1190-seq-dot-product.patch
>
>
> Currently, when calling .assign() on a SASV with another vector and a custom
> function, it iterates through the vector and assigns every single entry while
> also looking each one up by index.
> This makes the process *hugely* expensive. (On a run of BallKMeans on the 20
> Newsgroups data set, profiling reveals that 92% of the runtime was spent
> assigning the vectors.)
> Here's a prototype patch:
> https://github.com/dfilimon/mahout/commit/63998d82bb750150a6ae09052dadf6c326c62d3d

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
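To make the cost described in the issue concrete, here is a minimal Java sketch. It assumes a sparse vector stored as index-sorted parallel arrays. The class and method names (`SparseAssignSketch`, `SortedSparseVector`, `assignByIndex`, `assignByMerge`) are hypothetical illustrations, not Mahout's API, and this is not the code in the attached patches or the linked prototype. `assignByIndex` mirrors the slow behaviour reported above: it visits every index and does a binary-search `get` plus a possibly array-shifting `set` for each one. `assignByMerge` instead walks both vectors' nonzeros in index order and rebuilds the target in a single pass.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

/** Hypothetical illustration of slow vs. merge-based sparse assignment. */
class SparseAssignSketch {

    /** Sparse vector keeping its nonzeros in parallel, index-sorted arrays (assumed layout). */
    static final class SortedSparseVector {
        final int size;
        int[] indices;      // sorted ascending; caller must pass sorted input
        double[] values;
        int numNonZeros;

        SortedSparseVector(int size, int[] indices, double[] values) {
            this.size = size;
            this.indices = indices.clone();
            this.values = values.clone();
            this.numNonZeros = indices.length;
        }

        /** Random read: binary search over the stored indices, O(log k). */
        double get(int index) {
            int pos = Arrays.binarySearch(indices, 0, numNonZeros, index);
            return pos >= 0 ? values[pos] : 0.0;
        }

        /** Random write: inserting a new entry shifts the tail of both arrays, O(k). */
        void set(int index, double value) {
            int pos = Arrays.binarySearch(indices, 0, numNonZeros, index);
            if (pos >= 0) { values[pos] = value; return; }
            if (value == 0.0) return;                 // no need to store a new zero
            int insert = -pos - 1;
            if (numNonZeros == indices.length) {      // grow the backing arrays
                int cap = Math.max(1, indices.length * 2);
                indices = Arrays.copyOf(indices, cap);
                values = Arrays.copyOf(values, cap);
            }
            System.arraycopy(indices, insert, indices, insert + 1, numNonZeros - insert);
            System.arraycopy(values, insert, values, insert + 1, numNonZeros - insert);
            indices[insert] = index;
            values[insert] = value;
            numNonZeros++;
        }
    }

    /** Slow path: x[i] = f(x[i], y[i]) for every i, each through lookups. */
    static void assignByIndex(SortedSparseVector x, SortedSparseVector y, DoubleBinaryOperator f) {
        for (int i = 0; i < x.size; i++) {
            x.set(i, f.applyAsDouble(x.get(i), y.get(i)));
        }
    }

    /** Merge path: one ordered pass over both nonzero lists, O(kx + ky). */
    static void assignByMerge(SortedSparseVector x, SortedSparseVector y, DoubleBinaryOperator f) {
        if (f.applyAsDouble(0.0, 0.0) != 0.0) {
            assignByIndex(x, y, f);                   // every index may become nonzero
            return;
        }
        int[] outIdx = new int[x.numNonZeros + y.numNonZeros];
        double[] outVal = new double[outIdx.length];
        int i = 0, j = 0, n = 0;
        while (i < x.numNonZeros || j < y.numNonZeros) {
            // An exhausted side reports MAX_VALUE, which no real index reaches.
            int xi = i < x.numNonZeros ? x.indices[i] : Integer.MAX_VALUE;
            int yj = j < y.numNonZeros ? y.indices[j] : Integer.MAX_VALUE;
            int index;
            double result;
            if (xi == yj) {
                index = xi;
                result = f.applyAsDouble(x.values[i++], y.values[j++]);
            } else if (xi < yj) {
                index = xi;
                result = f.applyAsDouble(x.values[i++], 0.0);
            } else {
                index = yj;
                result = f.applyAsDouble(0.0, y.values[j++]);
            }
            if (result != 0.0) {                      // keep the result sparse
                outIdx[n] = index;
                outVal[n] = result;
                n++;
            }
        }
        x.indices = outIdx;
        x.values = outVal;
        x.numNonZeros = n;
    }
}
```

The merge shortcut is only valid when f(0, 0) == 0, since positions that are zero in both vectors must stay zero to be skipped; otherwise the sketch falls back to the per-index loop. Under that assumption the slow path costs roughly O(n log k) lookups over all n indices plus an O(k) shift for every newly inserted entry, while the merge touches only the k_x + k_y stored entries.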