PPS also make sure you specify numReduceTasks. The default is, I believe, 1, which will not scale at the multiplication steps at all.
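For example, a rough sketch of passing the reducer count on the command line. This assumes the SSVD CLI's --reduceTasks (-t) option; the paths, rank, oversampling, and reducer count below are placeholders I made up, so check `mahout ssvd --help` against your version before relying on it.

```shell
#!/bin/sh
# Hypothetical paths and sizes; adjust to your own job.
INPUT=/path/to/input/matrix
OUTPUT=/path/to/ssvd/output
REDUCERS=16   # placeholder; pick something close to your cluster's reduce capacity

# Build the SSVD command line with an explicit reducer count
# instead of relying on the Hadoop default.
ssvd_cmd() {
  echo "mahout ssvd --input $INPUT --output $OUTPUT --rank 100 --oversampling 15 --reduceTasks $REDUCERS"
}

# Dry run: only prints the command. To execute it: sh -c "$(ssvd_cmd)"
ssvd_cmd
```

The script only prints the command so it can be reviewed first; nothing is submitted to the cluster until you run the printed line yourself.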
On Tue, Nov 29, 2011 at 10:15 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> PS actually i think it should scale horizontally a little better than
> vertically but that's just a guess.
>
> On Tue, Nov 29, 2011 at 10:10 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>> On Tue, Nov 29, 2011 at 9:56 AM, Nathan Halko <nat...@spotinfluence.com>
>> wrote:
>>>
>>> The docs look great Dmitriy. Has anyone considered giving oversampling
>>> ssvd over lanczos which is promising. Trying to scale out horizontally but
>>> not seeing any difference between using one slave or many slaves. Any
>>> ideas? (I won't go into detail about the setup here but if sounds familiar
>>> I'd like to talk more).
>>
>> What do you mean by a slave? a mapper? a machine?
>>
>> whether you increase input horizontally or vertically, you should see
>> more mappers. If your cluster has enough capacity to schedule all
>> mappers right away, i believe you will get almost the same time (i.e.
>> almost linear scaling) for most of the jobs.
>>
>>> The basic problem with lanczos in the distributed
>>> environment seems to be that a matrix-vector multiply is not enough work to
>>> offset any setup costs, also there is not a distributed orthogonalization
>>> with lanczos and I'm getting OOM's making it difficult to scale. I would
>>> still like to contribute what results I have found but I'm short on time so
>>> nothing besides work directly related to the completion of my thesis will
>>> happen until that is done.
>>>
>>
>>> On Fri, Nov 25, 2011 at 5:37 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>
>>> > I attached the latex source as well (lyx, actually). I would've used
>>> > Wiki if it supported mathjax. So anyone can modify the usage if need
>>> > be. (Anyone who has lyx anyway).
>>> >
>>> > Dev docs were attached to several jira issues (and i had blog
>>> > entries), if you want more recent copies of them moved over
>>> > to wiki, i'd be happy to. Mainly, so far there are 2 working notes,
>>> > one for original method, and another for power iterations, attached to
>>> > corresponding jiras.
>>> >
>>> >
>>> > On Fri, Nov 25, 2011 at 4:26 PM, Grant Ingersoll <gsing...@apache.org>
>>> > wrote:
>>> > > I hooked it into the Algorithms page.
>>> > >
>>> > > How do you intend to keep the PDF up to date? I like the focus more on
>>> > > the user, but it would also be good to have some dev docs.
>>> > >
>>> > > Also, with both Lanczos and this it would be good if we could hook them
>>> > > into some real examples.
>>> > >
>>> > > On Nov 25, 2011, at 5:42 PM, Dmitriy Lyubimov wrote:
>>> > >
>>> > >> Hi,
>>> > >>
>>> > >> I put a usage and overview doc for SSVD onto wiki. I'd appreciate if
>>> > >> somebody else could look thru it, to scan for completeness and
>>> > >> suggestions.
>>> > >>
>>> > >> I tried to approach it as a user-facing documentation, i.e. I tried to
>>> > >> avoid discussing any implementation specifics.
>>> > >>
>>> > >> I had several users and Nathan Halko trying it out and actually
>>> > >> favorably commenting on its scalability vs. Lanczos but i don't know
>>> > >> first hand of any production use (even our own use is fairly limited
>>> > >> (in terms of input volume we ever processed) and actually somewhat
>>> > >> diverged from this Mahout implementation). Perhaps putting it more in
>>> > >> front of users will help to receive more feedback.
>>> > >>
>>> > >> Thanks.
>>> > >> -Dmitriy
>>> > >
>>> > > --------------------------------------------
>>> > > Grant Ingersoll
>>> > > http://www.lucidimagination.com