On Wed, May 18, 2011 at 6:38 AM, Sean Owen <[email protected]> wrote:

> I think it first has to finish embracing MapReduce! The code base already
> uses 2.5 different versions of Hadoop. It would be better to clean up the
> modest clutter of approaches we already have before thinking about
> extending it.
For the GSoC project, which version of Hadoop's API should I follow?

> Good news is there's a fair bit of time before any other particular
> framework becomes widely used enough to merit thinking hard about.
>
> And I do think we need to focus on cleanup now rather than later. For
> example I will shortly suggest deprecating M/R jobs that use Hadoop 0.19
> APIs in the name of moving forward.
>
> On Wed, May 18, 2011 at 11:23 AM, Ted Dunning <[email protected]> wrote:
>
> > This is a theme that is going to raise itself over and over.
> >
> > I think that strategically, Mahout is going to have to embrace the
> > MapReduce nextGen work so that we can have flexible computation models.
> > We already need this with all the large scale SVD work. We could very
> > much use it for the SGD stuff. Now this gradient work could use it.
> >
> > New needs aren't going to stop.
> >
> > On Tue, May 17, 2011 at 10:17 PM, Hector Yee <[email protected]> wrote:
> >
> > > Re: boosting scalability, I've implemented it on thousands of machines,
> > > but not with mapreduce, rather with direct RPC calls. The gradient
> > > computation tends to be iterative, so one way to do it is to have each
> > > iteration run per mapreduce.
> > > Compute gradients in the mapper, gather them in the reducer, rinse and
> > > repeat.
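
To make the "compute gradients in the mapper, gather them in the reducer, rinse and repeat" idea concrete, here is a rough single-process sketch in plain Python. It does not use the Hadoop API in any version; it only simulates one map/reduce pass per iteration. The function names (`map_gradient`, `reduce_gradients`, `train`), the squared-error loss, and the toy data are all assumptions made for illustration, not anything taken from Mahout or from the boosting implementation mentioned in the thread.

```python
from functools import reduce


def map_gradient(weights, shard):
    """Mapper: partial squared-error gradient over one shard of (x, y) rows."""
    grad = [0.0] * len(weights)
    for x, y in shard:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xi in enumerate(x):
            grad[j] += err * xi
    return grad, len(shard)


def reduce_gradients(a, b):
    """Reducer: sum two partial gradients and their example counts."""
    (grad_a, n_a), (grad_b, n_b) = a, b
    return [u + v for u, v in zip(grad_a, grad_b)], n_a + n_b


def train(shards, dim, learning_rate=0.4, iterations=500):
    """Driver: one simulated map/reduce job per gradient step."""
    weights = [0.0] * dim
    for _ in range(iterations):
        partials = [map_gradient(weights, shard) for shard in shards]  # map
        total, n = reduce(reduce_gradients, partials)                  # reduce
        weights = [w - learning_rate * g / n for w, g in zip(weights, total)]
    return weights


# Hypothetical toy data: y = 2 + 3 * x, with a constant bias feature,
# split across two "machines".
shards = [
    [((1.0, 0.0), 2.0), ((1.0, 1.0), 5.0)],
    [((1.0, 2.0), 8.0), ((1.0, 3.0), 11.0)],
]
weights = train(shards, dim=2)
```

In a real Hadoop job the driver loop would launch one job per iteration and pass the current weights to the mappers through some side channel; the per-job startup cost of doing that is presumably part of why the thread turns to more flexible computation models.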
