[ 
https://issues.apache.org/jira/browse/MAHOUT-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966265#action_12966265
 ] 

Dmitriy Lyubimov commented on MAHOUT-376:
-----------------------------------------

{quote}
My real worry with your approach is that the average number of elements per row 
of A is likely to be comparable to p+k. This means that Y = A \Omega will be 
about as large as A. Processing that sequentially is a non-starter and the 
computation of Q without block QR means that Y is processed sequentially. On 
the other hand, if we block decompose Y, we want blocks that fit into memory 
because that block size lives on in B and all subsequent steps. Thus, streaming 
QR is a non-issue in a blocked implementation. The blocked implementation gives 
a natural parallel implementation.

{quote}

I think you are misunderstanding it a little; the actual implementation is not that 
naive. Let me clarify.

First, there *is* blocking. Moreover, it's hierarchical blocking.

The way it works, you specify a block height, which is more than k+p but ideally 
less than what an MR split would host (you can specify more, but then you may 
produce some network traffic moving non-collocated parts of the split). Blocks are 
processed completely in parallel, hence the initial degree of parallelism is m/r, 
where r is the average block height. They can be (and are) processed independently 
among the splits. The "thin streaming QR" runs *inside* the blocks, not on the 
whole Y.
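
If it helps, here is a minimal sketch of what I mean by thin streaming QR inside 
one block (illustrative only, not the patch code, and it tracks R only; the actual 
implementation also maintains the shifting Q buffer described below): rows of Y 
arrive one at a time and are folded into a small (k+p)x(k+p) R with Givens 
rotations, so nothing wider than one row plus R ever sits in memory.

{code:java}
// Illustrative only: streaming accumulation of the R factor of one tall thin
// block, one (k+p)-wide row at a time, via Givens rotations.
public final class StreamingThinR {
  private final int n;          // n = k + p
  private final double[][] r;   // upper-triangular accumulator, n x n

  public StreamingThinR(int n) {
    this.n = n;
    this.r = new double[n][n];
  }

  /** Folds one (k+p)-wide row of Y into R; the row buffer is destroyed. */
  public void addRow(double[] y) {
    for (int j = 0; j < n; j++) {
      double a = r[j][j];
      double b = y[j];
      if (b == 0.0) {
        continue;                      // nothing to annihilate in this column
      }
      double rad = Math.hypot(a, b);
      double c = a / rad;
      double s = b / rad;
      // rotate row j of R against the incoming row; touches columns j..n-1 only
      for (int col = j; col < n; col++) {
        double x = r[j][col];
        double z = y[col];
        r[j][col] =  c * x + s * z;
        y[col]    = -s * x + c * z;
      }
    }
  }

  public double[][] getR() {
    return r;
  }
}
{code}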

Secondly, the Y matrix, or even its blocks, is never formed. What is formed is a 
shifting intermediate Q buffer of size (k+p)xr and an intermediate upper-triangular 
R of size (k+p)x(k+p). Since R is triangular, there's a rudimentary implementation 
of the Matrix interface called UpperTriangular that doesn't waste space on the 
lower triangle but still allows random access.
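
For illustration, packed upper-triangular storage along these lines could look like 
the sketch below (names and layout are mine for the example, not necessarily those 
of the UpperTriangular class in the patch):

{code:java}
// Illustrative packed storage of an upper-triangular matrix: only the
// n*(n+1)/2 entries of the upper triangle are kept, with random access.
public final class PackedUpperTriangular {
  private final int n;
  private final double[] values;   // row-major packed upper triangle

  public PackedUpperTriangular(int n) {
    this.n = n;
    this.values = new double[n * (n + 1) / 2];
  }

  // offset of (row, col), col >= row, in the packed row-major layout
  private int index(int row, int col) {
    return row * n - row * (row - 1) / 2 + (col - row);
  }

  public double get(int row, int col) {
    return col < row ? 0.0 : values[index(row, col)];
  }

  public void set(int row, int col, double v) {
    if (col < row) {
      throw new IllegalArgumentException("lower triangle is implicitly zero");
    }
    values[index(row, col)] = v;
  }
}
{code}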

Thirdly, the hierarchy. When we form Q blocks, we have to update them with the 
Givens operations resulting from merging the R matrices. This is done in the 
combiner, and it comes very naturally to it. If there are, say, z blocks in a 
mapper, then Q1 goes through the updates resulting from z merges of R, Q2 goes 
through the updates resulting from z-1 merges, and so on. Nothing gets concatenated 
(or unblocked) there except for the R sequence (but that is still a sequence, i.e. 
a sequentially accessed thing), for which I already provided memory estimates. Most 
importantly, it does not depend on the block height, so you can shrink the R 
sequence length by using taller Q blocks, but taller Q blocks also take more memory 
to process at a time. There's a sweet spot to be hit here with the parameters 
defining block height and split size, so that throughput is maximized. For k+p=500 
I don't see any memory concerns in a single combiner run.
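
To make that concrete, here is a hedged sketch of the kind of merge the combiner 
does: computing the R of QR([R1; R2]) from two upper-triangular factors with Givens 
rotations. The (c, s) pairs generated here are exactly the Givens operations that 
would be replayed over the corresponding Q blocks (again, names are illustrative, 
not the patch's):

{code:java}
// Illustrative only: merge two upper-triangular R factors in place, so that
// on return r1 holds the R of QR([r1; r2]). Each rotation mixes one row of
// r1 with one row of r2 and would also be applied to the stacked Q blocks.
public final class GivensRMerge {

  public static void mergeInPlace(double[][] r1, double[][] r2) {
    int n = r1.length;                     // n = k + p
    for (int i = 0; i < n; i++) {          // rows of r2, top to bottom
      for (int j = i; j < n; j++) {        // annihilate r2[i][j]
        double a = r1[j][j];
        double b = r2[i][j];
        if (b == 0.0) {
          continue;
        }
        double rad = Math.hypot(a, b);
        double c = a / rad;
        double s = b / rad;
        // rotate row j of r1 against row i of r2; columns j..n-1 only
        for (int col = j; col < n; col++) {
          double x = r1[j][col];
          double y = r2[i][col];
          r1[j][col] =  c * x + s * y;
          r2[i][col] = -s * x + c * y;
        }
      }
    }
  }
}
{code}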

And there's no reducer (i.e. no sizable shuffle and sort) here. At the end of this 
operation we have a bunch of Rs, whose count corresponds to the number of splits, 
and a bunch of intermediate Q blocks, still of the same size, whose count 
corresponds to the number of Q blocks.

Now we can repeat this process hierarchically with additional map-only passes over 
the Q blocks until only one R block is left. With 1G of memory, as I said, my 
estimate is that we can merge up to 1000 Rs in one MR pass. (In reality this 
implementation has 2 levels in this hierarchy, which seems to point to over 1 
billion rows, or about 1 million Q blocks of some relatively moderate height 
r >> k+p; but like I said, with just one more map-only pass one can increase the 
scale of m to single trillions.) This hierarchical merging is exactly what I meant 
by 'making MR work harder' for us.
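
The arithmetic behind those numbers, if it helps (my rough figures; the block 
height of 1000 is just an example of r >> k+p):

{code:java}
// Back-of-the-envelope scale check for the 2-level hierarchical merge.
public final class ScaleEstimate {
  public static void main(String[] args) {
    long mergesPerPass = 1000L;                    // ~1000 Rs mergeable in one 1G pass
    long qBlocks = mergesPerPass * mergesPerPass;  // 2 hierarchy levels -> ~1e6 Q blocks
    long blockHeight = 1000L;                      // example block height r >> k+p
    long rows = qBlocks * blockHeight;             // ~1e9 rows of A
    long withExtraPass = rows * mergesPerPass;     // one more map-only pass -> ~1e12
    System.out.println("Q blocks: " + qBlocks + ", rows: " + rows
        + ", with one more pass: " + withExtraPass);
  }
}
{code}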

There is a rough illustration of this hierarchical process in the doc that perhaps 
makes it clearer than words.

Also let me point out that the processes involved in R merging are *map-only*, 
which means that if we play the splitting game right in MR, there would 
*practically be no network IO*, per MR theory. This is very important, IMO, at such 
scale. The only IO that occurs is to 'slurp' R sequences from HDFS before the next 
stage of the hierarchical R-merge. For k+p = 500, each R, dense and uncompressed, 
is approximately 1 MB, so for a sequence of 1000 Rs the size of such a slurp, dense 
and uncompressed, would be about 1 GB, which is less than what I have today in a 
single Pig step over 200k proto-packed log records in production, and that finishes 
in a minute.
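
A quick size check on that slurp (my assumption in this sketch is that each R is 
stored as a packed upper triangle of doubles; stored fully dense it would be about 
twice that):

{code:java}
// Rough size estimate of the R slurp for k+p = 500 and 1000 Rs.
public final class SlurpEstimate {
  public static void main(String[] args) {
    int kp = 500;                                  // k + p
    long entriesPerR = (long) kp * (kp + 1) / 2;   // 125,250 packed entries
    long bytesPerR = entriesPerR * 8;              // ~1 MB per R as doubles
    long slurpBytes = bytesPerR * 1000;            // ~1 GB for a sequence of 1000 Rs
    System.out.printf("per R: %.2f MB, slurp for 1000 Rs: %.2f GB%n",
        bytesPerR / 1e6, slurpBytes / 1e9);
  }
}
{code}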

Bottom line: let's benchmark it, so we don't have to guess, especially if we can do 
A vector streaming. I am personally having trouble with the logistics of this so 
far, as I mentioned before, but I will get to benchmarking it sooner or later. The 
important thing for me at this point was to make sure it does the correct 
computation (which it does) and to make an educated guess about the scale (which is 
billion by million without vector streaming support, or billion by gazillion with 
vector streaming support, with the potential to extend the m scale a thousand times 
with each additional map-only pass over the Q data, which is (k+p)xm and again 
unbounded for n).


thanks. 



> Implement Map-reduce version of stochastic SVD
> ----------------------------------------------
>
>                 Key: MAHOUT-376
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-376
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>            Reporter: Ted Dunning
>            Assignee: Ted Dunning
>             Fix For: 0.5
>
>         Attachments: MAHOUT-376.patch, Modified stochastic svd algorithm for 
> mapreduce.pdf, QR decomposition for Map.pdf, QR decomposition for Map.pdf, QR 
> decomposition for Map.pdf, sd-bib.bib, sd.pdf, sd.pdf, sd.pdf, sd.pdf, 
> sd.tex, sd.tex, sd.tex, sd.tex, SSVD working notes.pdf, SSVD working 
> notes.pdf, SSVD working notes.pdf, ssvd-CDH3-or-0.21.patch.gz, 
> ssvd-CDH3-or-0.21.patch.gz, ssvd-m1.patch.gz, ssvd-m2.patch.gz, 
> ssvd-m3.patch.gz, Stochastic SVD using eigensolver trick.pdf
>
>
> See attached pdf for outline of proposed method.
> All comments are welcome.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
