[
https://issues.apache.org/jira/browse/MAHOUT-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966297#action_12966297
]
Ted Dunning commented on MAHOUT-376:
------------------------------------
{quote}
I think you are misunderstanding it a little. The actual implementation is not
that naive. Let me clarify.
{quote}
I was hoping I misunderstood it.
{quote}
And there's no reducer (i.e. any sizable shuffle and sort) here. At the end of
this operation we have a bunch of Rs, which corresponds to the number of splits,
and a bunch of intermediate Q blocks, still the same size, which corresponds to
the number of Q blocks.
Now we can repeat this process hierarchically with additional map-only passes
over Q blocks until only one R block is left. With 1G of memory, as I said, my
estimate is we can merge up to 1000 Rs per combiner with one MR pass (without
extra overhead for a single Q block and other Java things). (In reality, in this
implementation there are 2 levels in this hierarchy, which seems to point to
over 1 billion rows, or about 1 million Q blocks of some relatively moderate
height r>>k+p, but like I said, with just one additional map-only pass one can
increase the scale of m to single trillions.) This hierarchical merging is
exactly what I meant by 'making MR work harder' for us.
{quote}
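A rough check of the scale figures quoted above, as a small sketch. The fan-in
of 1000 R blocks per merge comes from the quoted estimate. The block height of
1000 rows is an assumption, chosen only because it reproduces the quoted figures
of about a million Q blocks and over a billion rows at two levels. The function
name is made up for illustration.

```python
def capacity(fan_in, levels, rows_per_block):
    """Q blocks and rows covered when each merge level combines `fan_in` R blocks."""
    q_blocks = fan_in ** levels
    return q_blocks, q_blocks * rows_per_block

FAN_IN = 1000          # R blocks merged per task, from the quoted 1G estimate
ROWS_PER_BLOCK = 1000  # assumed "moderate" Q block height, not stated in the thread

two_levels = capacity(FAN_IN, 2, ROWS_PER_BLOCK)
three_levels = capacity(FAN_IN, 3, ROWS_PER_BLOCK)
print(two_levels)    # (1000000, 1000000000)
print(three_levels)  # (1000000000, 1000000000000)
```

Under these assumptions, one extra level multiplies the reachable row count by
the fan-in, which is where the jump from billions to trillions would come from.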
Sounds to me like the reducer could replicate the combiner and thus implement
the second step of your hierarchy which would avoid the second MR pass. You
could have a single reducer which receives all combiner outputs and thus merge
everything. Since you can't guarantee that the combiner does any work, this is
best practice anyway. The specification is that the combiner will run zero or
more times.
This also raises the question of whether your combiner can be applied multiple
times. I suspect yes. You will know better than I.
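
To make the combiner question concrete, here is a minimal sketch in plain
Python, not Hadoop code. It assumes the merge step stacks R factors and
triangularizes the stack again, which may differ from what the attached patches
do. The names r_factor, merge_rs and close are hypothetical, Givens rotations
are used only to keep the example free of dependencies, and the Q block updates
are left out entirely.

```python
import math
import random

def r_factor(rows):
    """Upper-triangular R of a matrix (list of rows), via Givens rotations."""
    a = [[float(x) for x in row] for row in rows]
    n = len(a[0])
    while len(a) < n:                        # pad short inputs with zero rows
        a.append([0.0] * n)
    for j in range(n):
        for i in range(j + 1, len(a)):
            if a[i][j] == 0.0:
                continue                     # already zero, nothing to rotate
            h = math.hypot(a[j][j], a[i][j])
            c, s = a[j][j] / h, a[i][j] / h
            a[j][j], a[i][j] = h, 0.0        # rotation zeroes a[i][j]
            for k in range(j + 1, n):        # apply it to the rest of both rows
                x, y = a[j][k], a[i][k]
                a[j][k] = c * x + s * y
                a[i][k] = c * y - s * x
    # Flip row signs so the diagonal is non-negative. For full-rank input this
    # makes R unique, so results can be compared entry by entry.
    return [[-x for x in a[j]] if a[j][j] < 0 else a[j] for j in range(n)]

def merge_rs(rs):
    """Shared combiner/reducer body: stack R factors and triangularize again."""
    return r_factor([row for r in rs for row in r])

def close(a, b, tol=1e-9):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

random.seed(0)
n = 3
blocks = [[[random.gauss(0, 1) for _ in range(n)] for _ in range(5)]
          for _ in range(8)]
local_rs = [r_factor(b) for b in blocks]                # one R per split
direct = r_factor([row for b in blocks for row in b])   # R of the whole matrix

zero_runs = merge_rs(local_rs)                          # combiner never ran
once = [merge_rs(local_rs[:3]), merge_rs(local_rs[3:])]
one_run = merge_rs(once)                                # combiner ran once per group
two_runs = merge_rs([merge_rs(once[:1]), once[1]])      # combiner re-applied to its output

print(close(zero_runs, direct), close(one_run, direct), close(two_runs, direct))
```

If the three checks agree, the final R does not depend on how often the merge
ran before the reducer, which is the property a combiner needs. Because the
reducer calls the same merge_rs, it still produces the full result when the
combiner is skipped.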
> Implement Map-reduce version of stochastic SVD
> ----------------------------------------------
>
> Key: MAHOUT-376
> URL: https://issues.apache.org/jira/browse/MAHOUT-376
> Project: Mahout
> Issue Type: Improvement
> Components: Math
> Reporter: Ted Dunning
> Assignee: Ted Dunning
> Fix For: 0.5
>
> Attachments: MAHOUT-376.patch, Modified stochastic svd algorithm for
> mapreduce.pdf, QR decomposition for Map.pdf, QR decomposition for Map.pdf, QR
> decomposition for Map.pdf, sd-bib.bib, sd.pdf, sd.pdf, sd.pdf, sd.pdf,
> sd.tex, sd.tex, sd.tex, sd.tex, SSVD working notes.pdf, SSVD working
> notes.pdf, SSVD working notes.pdf, ssvd-CDH3-or-0.21.patch.gz,
> ssvd-CDH3-or-0.21.patch.gz, ssvd-m1.patch.gz, ssvd-m2.patch.gz,
> ssvd-m3.patch.gz, Stochastic SVD using eigensolver trick.pdf
>
>
> See attached pdf for outline of proposed method.
> All comments are welcome.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.