Hey Andy,

There are no plans for this. You are correct that multiple passes aren't too difficult, but they do go against the standard map-reduce paradigm a bit if you want to avoid iterative map-reduce.
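
For concreteness, here is a minimal sketch of what "multiple passes through the data without storing the data in memory" could look like for a single tree. It assumes the data can be re-read as a stream (the hypothetical `stream()` callable starts a fresh pass each time it is called) and that candidate split thresholds per feature are fixed in advance (the hypothetical `thresholds` list, e.g. taken from a sample). The tree grows level by level: each pass routes every row to its current leaf and accumulates label counts per candidate split, then the best Gini split is chosen for each frontier leaf. Only the per-leaf counts are held in memory, never the rows. Per-node feature subsampling is included as in random forests; bootstrap sampling is omitted for brevity. None of these names are Mahout APIs. Roughly, each pass corresponds to one job in an iterative map-reduce setup, with mappers emitting the count statistics and a reducer picking the splits.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical stand-in for a dataset on disk: each call to the returned
# stream() starts a fresh pass and yields (features, label) rows.
def make_stream(rows):
    def stream():
        yield from rows
    return stream

def gini(counts):
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def route(tree, x):
    """Follow split nodes from the root to the id of the leaf that x falls into."""
    node = 0
    while "feature" in tree[node]:
        n = tree[node]
        node = n["left"] if x[n["feature"]] <= n["threshold"] else n["right"]
    return node

def grow_tree(stream, thresholds, max_depth=3, n_try=None, seed=0):
    """Grow one tree level by level; each level costs one pass over stream()."""
    rng = random.Random(seed)
    n_features = len(thresholds)
    n_try = n_try or max(1, int(math.sqrt(n_features)))
    tree = {0: {"depth": 0}}
    frontier = [0]
    next_id = 1
    while frontier:
        # Each frontier leaf considers a random subset of features.
        tried = {leaf: rng.sample(range(n_features), n_try) for leaf in frontier}
        # stats[leaf][(feature, threshold, goes_left)] -> label counts
        stats = {leaf: defaultdict(Counter) for leaf in frontier}
        totals = {leaf: Counter() for leaf in frontier}
        for x, y in stream():  # one sequential pass; rows are never stored
            leaf = route(tree, x)
            if leaf not in stats:
                continue  # row lands in a leaf that is already finished
            totals[leaf][y] += 1
            for f in tried[leaf]:
                for t in thresholds[f]:
                    stats[leaf][(f, t, x[f] <= t)][y] += 1
        new_frontier = []
        for leaf in frontier:
            node, total = tree[leaf], totals[leaf]
            node["label"] = total.most_common(1)[0][0] if total else None
            n = sum(total.values())
            best = None
            for f in tried[leaf]:
                for t in thresholds[f]:
                    left = stats[leaf][(f, t, True)]
                    right = stats[leaf][(f, t, False)]
                    nl, nr = sum(left.values()), sum(right.values())
                    if nl == 0 or nr == 0:
                        continue
                    score = (nl * gini(left) + nr * gini(right)) / n
                    if best is None or score < best[0]:
                        best = (score, f, t)
            if node["depth"] >= max_depth or best is None or best[0] >= gini(total):
                continue  # no useful split: this node stays a leaf
            _, f, t = best
            node.update(feature=f, threshold=t, left=next_id, right=next_id + 1)
            for child in (next_id, next_id + 1):
                tree[child] = {"depth": node["depth"] + 1}
                new_frontier.append(child)
            next_id += 2
        frontier = new_frontier
    return tree

def predict(tree, x):
    return tree[route(tree, x)]["label"]

def grow_forest(stream, thresholds, n_trees=5, **kw):
    # Trees differ only through their feature-subsampling seeds here.
    return [grow_tree(stream, thresholds, seed=i, **kw) for i in range(n_trees)]

def predict_forest(forest, x):
    return Counter(predict(t, x) for t in forest).most_common(1)[0][0]
```

The number of passes equals the number of tree levels grown (plus one final pass to label the deepest leaves), so the trade-off is I/O passes versus memory, which is exactly where it clashes with single-pass map-reduce.

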
It definitely would be nice to have a really competitive random forest implementation that uses the global accumulator style plus long-lived mappers. The basic idea would be to use the same sort of tricks that Vowpal Wabbit or Giraph use to get a bunch of long-lived mappers and then have them asynchronously talk to a tree repository.

On Fri, Jan 25, 2013 at 6:58 PM, Andy Twigg <[email protected]> wrote:
> Hi,
>
> I'm new to this list so I apologise if this is covered elsewhere (but
> I couldn't find it..)
>
> I'm looking at the Random Forests implementations, both mapreduce
> ("partial") and non-distributed. Both appear to require the data
> loaded into memory. Random forests should be straightforward to
> construct with multiple passes through the data without storing the
> data in memory. Is there such an implementation in Mahout? If not, is
> there a ticket/plan ?
>
> Thanks,
> Andy
>
> --
> Dr Andy Twigg
> Junior Research Fellow, St Johns College, Oxford
> Room 351, Department of Computer Science
> http://www.cs.ox.ac.uk/people/andy.twigg/
> [email protected] | +447799647538
