[
https://issues.apache.org/jira/browse/MAHOUT-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966229#action_12966229
]
Ted Dunning commented on MAHOUT-376:
------------------------------------
{quote}
> I think that my suggested approach handles this already.
> The block decomposition of Q via the blockwise QR decomposition implies a
> breakdown of B into column-wise blocks which
> can each be handled separately. The results are then combined using
> concatenation.
Ted, yes, I understand that part, but I think we are talking about different
things. What I am talking about is the formation of the rows of Y, well before
orthonormalization is even a concern.
What I mean is that right now VectorWritable loads the entire thing into
memory. Hence the bound on the width of A (i.e. we can't load a row of A that
is longer than the memory chunk we can afford for it).
{quote}
I understand that now.
The current limitation is that the sparse representation of a single row has
to fit into memory. That means we are limited to a few hundred million
non-zero elements per row, while the number of potential columns of A is
effectively unlimited.
The only other place that the total number of elements in a row comes into
play is in B. Using the block form of Q, however, we never
have to store an entire row of B, just manageable chunks.
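To make that concrete, here is a minimal numpy sketch of computing B = Q' A in
column chunks, so that no full (very wide) row of B is ever materialized at
once. This is illustrative only — the dimensions, chunk size, and variable
names are made up, and it is not the Mahout implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 500, 2000, 8          # A is n x m, Q is n x k (k = rank + oversampling)
A = rng.standard_normal((n, m))
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))  # stand-in orthonormal Q

# A full row of B = Q' A has m elements; instead, process column chunks of A
# independently and concatenate the results, as in the blocked scheme above.
chunk = 250
B_chunks = [Q.T @ A[:, j:j + chunk] for j in range(0, m, chunk)]
B = np.hstack(B_chunks)          # identical to Q.T @ A, never built row-wise
```

Each chunk only needs Q and a column slice of A in memory, which is what lets
the rows of B exceed any single task's memory budget.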
My real worry with your approach is that the average number of elements per row
of A is likely to be comparable to p+k. This means that Y = A \Omega will be
about as large as A. Processing that sequentially is a non-starter, and
computing Q without a block QR forces exactly that sequential pass over Y. On
the other hand, if we block-decompose Y, we want blocks that fit into memory,
because that block size lives on in B and all subsequent steps. Thus,
streaming QR is a non-issue in a blocked implementation, and the blocked
implementation parallelizes naturally.
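A numpy sketch of the two-level blocked QR idea may help — factor each row
block of the tall, skinny Y independently, then re-factor the stacked small R
matrices. Names and block sizes here are illustrative assumptions (each block
must have at least k rows), not Mahout's actual code:

```python
import numpy as np

def blockwise_qr(Y, block_rows):
    """Tall-skinny QR of Y (n x k, n >> k), one row block at a time.

    Each block is factored independently; the small R factors are stacked
    and re-factored, so no step needs all of Y (or all of Q) in memory.
    Illustrative sketch only.
    """
    k = Y.shape[1]
    qs, rs = [], []
    for i in range(0, Y.shape[0], block_rows):
        q_i, r_i = np.linalg.qr(Y[i:i + block_rows])  # per-block QR
        qs.append(q_i)
        rs.append(r_i)
    # Second-level QR on the stacked (num_blocks*k x k) pile of R factors.
    q2, r = np.linalg.qr(np.vstack(rs))
    # The j-th block of the final Q is Q_j times the j-th k-row slice of q2.
    q_blocks = [q_j @ q2[j * k:(j + 1) * k] for j, q_j in enumerate(qs)]
    return np.vstack(q_blocks), r

rng = np.random.default_rng(0)
Y = rng.standard_normal((1000, 10))   # stand-in for Y = A * Omega
Q, R = blockwise_qr(Y, block_rows=200)
```

Since the per-block factorizations are independent, they map naturally onto
separate mappers, with only the small R factors shipped to a single reducer.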
> Implement Map-reduce version of stochastic SVD
> ----------------------------------------------
>
> Key: MAHOUT-376
> URL: https://issues.apache.org/jira/browse/MAHOUT-376
> Project: Mahout
> Issue Type: Improvement
> Components: Math
> Reporter: Ted Dunning
> Assignee: Ted Dunning
> Fix For: 0.5
>
> Attachments: MAHOUT-376.patch, Modified stochastic svd algorithm for
> mapreduce.pdf, QR decomposition for Map.pdf, QR decomposition for Map.pdf, QR
> decomposition for Map.pdf, sd-bib.bib, sd.pdf, sd.pdf, sd.pdf, sd.pdf,
> sd.tex, sd.tex, sd.tex, sd.tex, SSVD working notes.pdf, SSVD working
> notes.pdf, SSVD working notes.pdf, ssvd-CDH3-or-0.21.patch.gz,
> ssvd-CDH3-or-0.21.patch.gz, ssvd-m1.patch.gz, ssvd-m2.patch.gz,
> ssvd-m3.patch.gz, Stochastic SVD using eigensolver trick.pdf
>
>
> See attached pdf for outline of proposed method.
> All comments are welcome.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.