[ 
https://issues.apache.org/jira/browse/MAHOUT-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966229#action_12966229
 ] 

Ted Dunning commented on MAHOUT-376:
------------------------------------

{quote}
> I think that my suggested approach handles this already.

> The block decomposition of Q via the blockwise QR decomposition implies a 
> breakdown of B into column-wise blocks which 
> can each be handled separately. The results are then combined using 
> concatenation.

Ted, yes, I understand that part, but I think we are talking about different 
things. What I am talking about is the formation of Y rows well before 
orthonormalization even comes into play.

What I mean is that right now VectorWritable loads the entire thing into 
memory. Hence the bound on the width of A (i.e., we can't load a row of A that 
is longer than whatever memory chunk we can afford for it).
{quote}

I understand that now.

The current limitation is that the sparse representation of a row has to fit 
into memory.  That means that we are limited to cases with a few hundred 
million non-zero elements per row and are effectively unlimited in the number 
of potential columns of A.

The only other place that the total number of elements in a row comes into 
play is in B.  Using the block form of Q, however, we never
have to store an entire row of B, just manageable chunks.
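To make the column-blocking of B concrete, here is a minimal numpy sketch (sizes and the dense stand-in for Q are illustrative assumptions, not the Mahout implementation): B = Q^T A is formed one column block at a time, so a full row of B never has to be materialized, and plain concatenation recombines the blocks.

```python
import numpy as np

# Hypothetical small sizes; in the real job A is a distributed sparse matrix.
m, n, pk = 1000, 500, 20                 # rows/cols of A; p+k projected width
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))
Q, _ = np.linalg.qr(rng.standard_normal((m, pk)))  # stand-in orthonormal Q

# Form B = Q^T A one column block at a time, so an entire row of B
# (length n) never has to live in memory at once.
block = 100
B_blocks = [Q.T @ A[:, j:j + block] for j in range(0, n, block)]
B = np.hstack(B_blocks)                  # concatenation recombines the blocks
```

Each block of B is (p+k) x block, so the working set is bounded by the block width rather than the full width of A.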

My real worry with your approach is that the average number of elements per row 
of A is likely to be comparable to p+k.  This means that Y = A \Omega will be 
about as large as A.  Processing Y sequentially is a non-starter, and computing 
Q without a block QR forces exactly that sequential processing.  On the other 
hand, if we block-decompose Y, we want blocks that fit into memory, because 
that block size lives on in B and all subsequent steps.  Thus, streaming QR is 
a non-issue in a blocked implementation, and the blocked implementation gives 
a natural parallel implementation.
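A minimal sketch of the blocked QR just described (a TSQR-style construction; the sizes are illustrative assumptions): Y = A \Omega is split into row blocks that fit in memory, each block gets a local QR, and a second QR of the stacked R factors recovers the global decomposition, with each local Q updated independently, which is what makes the scheme parallel.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, pk = 1200, 400, 25
A = rng.standard_normal((m, n))
Omega = rng.standard_normal((n, pk))      # random projection matrix
Y = A @ Omega                             # about as large as A when nnz/row ~ p+k

# Local QR on each in-memory row block of Y.
block = 300
Qs, Rs = zip(*(np.linalg.qr(Y[i:i + block]) for i in range(0, m, block)))

# Second-level QR of the stacked R factors gives the global R.
Q2, R = np.linalg.qr(np.vstack(Rs))

# Push the second-level Q back into each block; blocks are independent,
# so this step parallelizes naturally.
Q = np.vstack([Qb @ Q2[j * pk:(j + 1) * pk] for j, Qb in enumerate(Qs)])
```

The result satisfies Q R = Y with Q orthonormal, and at no point does more than one row block of Y (plus the small stacked R matrix) need to be resident in memory.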
 

> Implement Map-reduce version of stochastic SVD
> ----------------------------------------------
>
>                 Key: MAHOUT-376
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-376
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>            Reporter: Ted Dunning
>            Assignee: Ted Dunning
>             Fix For: 0.5
>
>         Attachments: MAHOUT-376.patch, Modified stochastic svd algorithm for 
> mapreduce.pdf, QR decomposition for Map.pdf, QR decomposition for Map.pdf, QR 
> decomposition for Map.pdf, sd-bib.bib, sd.pdf, sd.pdf, sd.pdf, sd.pdf, 
> sd.tex, sd.tex, sd.tex, sd.tex, SSVD working notes.pdf, SSVD working 
> notes.pdf, SSVD working notes.pdf, ssvd-CDH3-or-0.21.patch.gz, 
> ssvd-CDH3-or-0.21.patch.gz, ssvd-m1.patch.gz, ssvd-m2.patch.gz, 
> ssvd-m3.patch.gz, Stochastic SVD using eigensolver trick.pdf
>
>
> See attached pdf for outline of proposed method.
> All comments are welcome.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
