PPS: also make sure you specify numReduceTasks. The default is, I believe, 1,
which will not scale at all in the multiplication steps.
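
To illustrate why a single reducer hurts, here is a rough back-of-the-envelope model. It is not Mahout or Hadoop code: the function names and the slot and work numbers are made up for illustration, and it assumes equally sized tasks and ignores shuffle and job setup costs.

```python
import math


def waves(num_tasks: int, slots: int) -> int:
    """Scheduling rounds needed to run num_tasks on `slots` parallel slots."""
    if num_tasks <= 0 or slots <= 0:
        raise ValueError("num_tasks and slots must be positive")
    return math.ceil(num_tasks / slots)


def phase_time(total_work: float, num_tasks: int, slots: int) -> float:
    """Wall time of a phase whose total_work is split evenly over num_tasks.

    Hypothetical model: every task takes total_work / num_tasks, and tasks
    run in as many rounds as waves() says the cluster needs.
    """
    per_task = total_work / num_tasks
    return waves(num_tasks, slots) * per_task


# Reduce phase with 100 units of work on a cluster with 20 free slots.
single_reducer = phase_time(100.0, num_tasks=1, slots=20)
twenty_reducers = phase_time(100.0, num_tasks=20, slots=20)

# Map phase: doubling the input doubles the mappers; with enough slots to
# schedule them all at once, the modeled wall time stays the same.
base_map = phase_time(100.0, num_tasks=20, slots=40)
doubled_map = phase_time(200.0, num_tasks=40, slots=40)
```

In this toy model one reducer takes as long as the whole phase regardless of cluster size, while the map phase time is unchanged when the input doubles, as long as all mappers fit in one wave.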

On Tue, Nov 29, 2011 at 10:15 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> PS actually i think it should scale horizontally a little better than
> vertically but that's just a guess.
>
> On Tue, Nov 29, 2011 at 10:10 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>> On Tue, Nov 29, 2011 at 9:56 AM, Nathan Halko <nat...@spotinfluence.com> 
>> wrote:
>>>
>>> The docs look great Dmitriy.  Has anyone considered giving oversampling
>>> ssvd over lanczos which is promising.  Trying to scale out horizontally but
>>> not seeing any difference between using one slave or many slaves.  Any
>>> ideas? (I won't go into detail about the setup here, but if this sounds
>>> familiar I'd like to talk more).
>>
>> What do you mean by a slave? a mapper? a machine?
>>
>> whether you increase input horizontally or vertically, you should see
>> more mappers. If your cluster has enough capacity to schedule all
>> mappers right away, I believe you will get almost the same time (i.e.
>> almost linear scaling) for most of the jobs.
>>
>>> The basic problem with lanczos in the distributed
>>> environment seems to be that a matrix-vector multiply is not enough work to
>>> offset any setup costs, also there is not a distributed orthogonalization
>>> with lanczos and I'm getting OOM's making it difficult to scale.  I would
>>> still like to contribute what results I have found but I'm short on time so
>>> nothing besides work directly related to the completion of my thesis will
>>> happen until that is done.
>>>
>>
>>> On Fri, Nov 25, 2011 at 5:37 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>
>>> > I attached the latex source as well (lyx, actually). I would've used
>>> > Wiki if it supported mathjax. So anyone can modify the usage if need
>>> > be. (Anyone who has lyx anyway).
>>> >
>>> > Dev docs were attached to several jira issues (and I had blog
>>> > entries); if you want more recent copies of them moved over
>>> > to the wiki, I'd be happy to do that. So far there are mainly 2 working
>>> > notes, one for the original method and another for power iterations,
>>> > attached to the corresponding jiras.
>>> >
>>> >
>>> > On Fri, Nov 25, 2011 at 4:26 PM, Grant Ingersoll <gsing...@apache.org>
>>> > wrote:
>>> > > I hooked it into the Algorithms page.
>>> > >
>>> > > How do you intend to keep the PDF up to date?  I like the focus more on
>>> > > the user, but it would also be good to have some dev docs.
>>> > >
>>> > > Also, with both Lanczos and this it would be good if we could hook them
>>> > > into some real examples.
>>> > >
>>> > > On Nov 25, 2011, at 5:42 PM, Dmitriy Lyubimov wrote:
>>> > >
>>> > >> Hi,
>>> > >>
>>> > >> I put a usage and overview doc for SSVD onto the wiki. I'd appreciate
>>> > >> it if somebody else could look through it, to check for completeness
>>> > >> and offer suggestions.
>>> > >>
>>> > >> I tried to approach it as user-facing documentation, i.e. I tried to
>>> > >> avoid discussing any implementation specifics.
>>> > >>
>>> > >> I had several users and Nathan Halko trying it out and actually
>>> > >> commenting favorably on its scalability vs. Lanczos, but I don't know
>>> > >> first hand of any production use (even our own use is fairly limited
>>> > >> in terms of the input volume we ever processed, and has actually
>>> > >> somewhat diverged from this Mahout implementation). Perhaps putting it
>>> > >> more in front of users will help to receive more feedback.
>>> > >>
>>> > >> Thanks.
>>> > >> -Dmitriy
>>> > >
>>> > > --------------------------------------------
>>> > > Grant Ingersoll
>>> > > http://www.lucidimagination.com
>>> > >
>>> > >
>>> > >
>>> > >
>>> >
