Thank you, Jake. I guess I did not have a chance to read through MAHOUT-228. I am guessing 228 is something quite different from murmur hash support in Hadoop, right?
I will read through it, though, and I may come back with more questions. Thanks.

-Dmitriy

On Mon, Mar 22, 2010 at 4:38 PM, Jake Mannix <jake.man...@gmail.com> wrote:
> Hi Dmitriy,
>
> Stochastic SVD is high on my list of pieces to get into Mahout as
> well, but it is partly dependent on getting some of Ted's murmurhash stuff
> from the SGD work he's got sitting idle in a patch on MAHOUT-228.
>
> If you could help get MAHOUT-228 finished and put in trunk, we could
> quickly move forward on MAHOUT-309. I think this can be done in
> possibly only 2 MR passes, but we can chat about that a bit more
> as we dig into it. :)
>
> -jake
>
> On Mon, Mar 22, 2010 at 4:33 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > I had a chance to touch base quickly with Ted Dunning last weekend at the
> > Bay Area machine learning camp. It's my understanding that the main
> > advantage of this method is that a partial SVD can be achieved in a
> > constant number of MR jobs (Ted's analysis seemed to imply that number
> > would be 4).
> >
> > I've been following Mahout for perhaps a couple of months and have read
> > the book (the first 6 chapters of it, anyway) in MEA, and that's about it.
> > But I have a great interest in all the work happening in this project.
> >
> > While our particular business problem may currently be addressed by
> > running a single-node iterative SVD (such as Lanczos iteration, one of
> > LAPACK's methods), it is highly likely that this will not be the case for
> > too long. We also use Hadoop and its ecosystem for our platform, so Mahout
> > comes naturally into the picture (whereas MPI does not).
> >
> > Anyway, starting next week, I will have to spend time on that business
> > need, and my boss seems to be happy if I have a chance to contribute part
> > of my time and results to Mahout (I guess he also expects results as
> > well... eventually :-) ).
> > The paper seems to be the one in the issue MAHOUT-309. I skimmed it a
> > little bit, and I have some questions regarding Ted's clarifications as
> > given at the camp this weekend and this paper (if it is even the right
> > one).
> >
> > I guess I do need some guidance if I am to do this, and I am wondering if
> > my effort is welcome (given that I will need some guidance on some details
> > of Mahout and the algorithms there). I guess my selfish desire is to speed
> > up the method's availability in Mahout.
> >
> > Thank you very much.
> > -Dmitriy
> >