[ https://issues.apache.org/jira/browse/MAHOUT-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092918#comment-13092918 ]
Dmitriy Lyubimov commented on MAHOUT-638:
-----------------------------------------
OK... my latest version of the code that computes the Y buffer (which may also be sparse, but I
handled that case separately) looks like this:
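{code}
// Requires java.util.Arrays, java.util.Iterator,
// org.apache.mahout.math.Vector and org.apache.mahout.math.Vector.Element.
// Computes one row of Y (the random projection of aRow) into the
// caller-supplied buffer yRow; accumDots() is defined elsewhere in the class.
public void computeYRow(Vector aRow, double[] yRow) {
  // assert yRow.length == kp;
  Arrays.fill(yRow, 0.0);
  if (aRow.isDense()) {
    // Dense input: visit every column of the row.
    int n = aRow.size();
    for (int j = 0; j < n; j++) {
      accumDots(j, aRow.getQuick(j), yRow);
    }
  } else {
    // Sparse input: visit only the non-zero elements.
    for (Iterator<Element> iter = aRow.iterateNonZero(); iter.hasNext();) {
      Element el = iter.next();
      accumDots(el.index(), el.get(), yRow);
    }
  }
}
{code}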
Does it look good to you? Do you think Y should be handled as a
RandomAccessSparseVector instead?
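For readers without the Mahout sources at hand, here is a minimal, self-contained sketch of the same dense/sparse dispatch. It assumes that `accumDots(j, a, yRow)` adds `a` times row `j` of a random projection matrix Omega with k+p columns to `yRow` (so that Y = A * Omega); the class `YRowSketch`, the dense `omega` array, and the parallel `indices`/`values` arrays standing in for a sparse row are all hypothetical and do not use Mahout's `Vector` API. Both paths should produce the same Y row, but the sparse path does work proportional to the number of non-zeros rather than to the row width.

```java
import java.util.Arrays;

public class YRowSketch {
    // Hypothetical n x kp random projection matrix (Omega), stored densely for simplicity.
    private final double[][] omega;
    private final int kp;

    public YRowSketch(double[][] omega) {
        this.omega = omega;
        this.kp = omega[0].length;
    }

    // Assumed semantics of accumDots: yRow += aElement * omega[aIndex][*].
    private void accumDots(int aIndex, double aElement, double[] yRow) {
        for (int i = 0; i < kp; i++) {
            yRow[i] += aElement * omega[aIndex][i];
        }
    }

    // Dense path: visits every column of the input row, zeros included.
    public void computeYRowDense(double[] aRow, double[] yRow) {
        Arrays.fill(yRow, 0.0);
        for (int j = 0; j < aRow.length; j++) {
            accumDots(j, aRow[j], yRow);
        }
    }

    // Sparse path: visits only the stored non-zero entries (parallel index/value arrays).
    public void computeYRowSparse(int[] indices, double[] values, double[] yRow) {
        Arrays.fill(yRow, 0.0);
        for (int e = 0; e < indices.length; e++) {
            accumDots(indices[e], values[e], yRow);
        }
    }

    public static void main(String[] args) {
        double[][] omega = {
            {1.0, 2.0},
            {0.5, -1.0},
            {3.0, 0.0},
            {-2.0, 1.0}
        };
        YRowSketch sketch = new YRowSketch(omega);
        double[] denseRow = {0.0, 4.0, 0.0, 1.0};
        int[] indices = {1, 3};
        double[] values = {4.0, 1.0};
        double[] yDense = new double[2];
        double[] ySparse = new double[2];
        sketch.computeYRowDense(denseRow, yDense);
        sketch.computeYRowSparse(indices, values, ySparse);
        System.out.println(Arrays.toString(yDense) + " " + Arrays.toString(ySparse));
    }
}
```

Running `main` should print the same Y row twice. The design point is only that the loop is chosen from the storage layout, which is what the `isDense()` check does in the snippet above.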
> Stochastic svd's is not handling well all cases of sparse vectors
> ------------------------------------------------------------------
>
> Key: MAHOUT-638
> URL: https://issues.apache.org/jira/browse/MAHOUT-638
> Project: Mahout
> Issue Type: Bug
> Components: Math
> Affects Versions: 0.5
> Reporter: Dmitriy Lyubimov
> Assignee: Dmitriy Lyubimov
> Fix For: 0.5
>
> Attachments: MAHOUT-622-2.patch, MAHOUT-638-2.patch,
> MAHOUT-638-2.patch, MAHOUT-638.patch
>
>
> The Mahout patch for the algorithm does not handle all types of sparse input
> efficiently. BtJob does not handle SequentialAccessSparseVector in a way that
> picks only the non-zero elements of the input, and QJob does not iterate over
> RandomAccessSparseVector correctly. With extremely sparse inputs (0.05%
> non-zero elements), this makes both of the aforementioned jobs (QJob and
> BtJob) terribly inefficient.
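To make the cost described above concrete, here is a rough, self-contained sketch (not Mahout code) in which a `HashMap<Integer, Double>` stands in for a hash-backed sparse vector such as RandomAccessSparseVector. The class `SparseScanCost` and the sizes used are hypothetical. Scanning every column with lookups costs n map probes regardless of sparsity, while iterating the map's entries costs only nnz steps; at 0.05% density that is roughly a factor of 2000.

```java
import java.util.HashMap;
import java.util.Map;

public class SparseScanCost {
    // Sum of a row's values plus the number of elements the loop had to visit.
    static final class ScanResult {
        final double sum;
        final long visits;

        ScanResult(double sum, long visits) {
            this.sum = sum;
            this.visits = visits;
        }
    }

    // Probes the map once per column: O(n) work no matter how sparse the row is.
    static ScanResult fullScan(Map<Integer, Double> row, int n) {
        double sum = 0.0;
        long visits = 0;
        for (int j = 0; j < n; j++) {
            visits++;
            Double v = row.get(j);
            if (v != null) {
                sum += v;
            }
        }
        return new ScanResult(sum, visits);
    }

    // Iterates only the stored entries: O(nnz) work.
    static ScanResult nonZeroScan(Map<Integer, Double> row) {
        double sum = 0.0;
        long visits = 0;
        for (Map.Entry<Integer, Double> e : row.entrySet()) {
            visits++;
            sum += e.getValue();
        }
        return new ScanResult(sum, visits);
    }

    public static void main(String[] args) {
        int n = 200_000; // hypothetical row width
        Map<Integer, Double> row = new HashMap<>();
        // 0.05% density: one non-zero every 2000 columns.
        for (int j = 0; j < n; j += 2000) {
            row.put(j, 1.0);
        }
        ScanResult full = fullScan(row, n);
        ScanResult nz = nonZeroScan(row);
        System.out.println("full scan: sum=" + full.sum + ", visits=" + full.visits);
        System.out.println("non-zero scan: sum=" + nz.sum + ", visits=" + nz.visits);
    }
}
```

Running `main` should report identical sums, but 200000 map probes for the full scan versus 100 entry visits for the non-zero iteration.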
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira