Hey Andy,

There are no plans for this.  You are correct that multiple passes aren't
too difficult, but if you want to avoid iterative map-reduce, they do go
against the standard map-reduce paradigm a bit.
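To make the multi-pass approach concrete, here is a minimal sketch. It is not Mahout code, and every function name in it is hypothetical. It assumes numeric features, a fixed list of candidate thresholds per feature, and a data source that can be re-read from the start on every pass (`make_stream` stands in for re-reading a file). It grows one tree level by level. Each pass routes every record to its current frontier node and counts (feature, threshold, side, label) occurrences. That is the map step, with a Counter acting as the combiner. The driver then picks the lowest-Gini split for each frontier node, which is the reduce step. Only counts are held in memory, never the records. A forest would repeat this per tree with bootstrap samples and random feature subsets.

```python
from collections import Counter, defaultdict


def gini(counts):
    """Gini impurity of a label -> count mapping."""
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0


def route(tree, x):
    """Follow finished splits from the root (node 0; children of n are 2n+1, 2n+2)."""
    node = 0
    while node in tree and tree[node][0] == "split":
        _, f, t = tree[node]
        node = 2 * node + 1 if x[f] <= t else 2 * node + 2
    return node


def map_pass(tree, records, thresholds):
    """One streaming pass: map each record to its frontier node and count.

    The per-node Counter plays the role of a combiner; records are never stored.
    """
    stats = defaultdict(Counter)
    for x, y in records:
        node = route(tree, x)
        if node in tree:  # landed on a finished leaf, nothing left to learn
            continue
        stats[node][("all", y)] += 1
        for f, ts in thresholds.items():
            for t in ts:
                stats[node][(f, t, x[f] <= t, y)] += 1
    return stats


def choose(counts, thresholds, depth, max_depth):
    """'Reduce' step for one node: best Gini split, or a majority-label leaf."""
    labels = Counter({k[1]: c for k, c in counts.items() if len(k) == 2})
    best = None
    if depth < max_depth and len(labels) > 1:
        for f, ts in thresholds.items():
            for t in ts:
                sides = [Counter({y: counts[(f, t, s, y)] for y in labels})
                         for s in (True, False)]
                n = [sum(side.values()) for side in sides]
                if 0 in n:
                    continue  # one child would be empty
                score = (n[0] * gini(sides[0]) + n[1] * gini(sides[1])) / sum(n)
                if best is None or score < best[0]:
                    best = (score, f, t)
    if best is None:
        return ("leaf", labels.most_common(1)[0][0])
    return ("split", best[1], best[2])


def grow_tree(make_stream, thresholds, max_depth):
    """Driver: one full pass over the data per tree level."""
    tree = {}
    while True:
        stats = map_pass(tree, make_stream(), thresholds)
        if not stats:  # every record now ends in a leaf
            return tree
        for node, counts in stats.items():
            depth = (node + 1).bit_length() - 1
            tree[node] = choose(counts, thresholds, depth, max_depth)


def predict(tree, x):
    return tree[route(tree, x)][1]


if __name__ == "__main__":
    DATA = [((0.0, 1.0), 0), ((1.0, 0.0), 0), ((2.0, 1.0), 1), ((3.0, 0.0), 1)]

    def make_stream():
        # Stands in for re-reading the training file on every pass.
        yield from DATA

    tree = grow_tree(make_stream, {0: [0.5, 1.5, 2.5], 1: [0.5]}, max_depth=3)
    print(tree, predict(tree, (0.2, 0.0)), predict(tree, (2.7, 1.0)))
```

Each additional tree level costs one more full scan of the data. In a Hadoop setting that means one more chained job per level, which is the iterative map-reduce pattern that cuts against the standard paradigm.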

It would definitely be nice to have a really competitive random forest
implementation that uses the global accumulator style plus long-lived
mappers.  The basic idea would be to use the same sort of tricks that
Vowpal Wabbit or Giraph use to get a bunch of long-lived mappers, and then
have those mappers talk asynchronously to a tree repository.
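The long-lived-mapper idea can be sketched in the same spirit. This is a minimal single-process sketch for one tree, under several assumptions. Python threads stand in for long-lived mappers, each of which keeps its own data shard in memory for its whole lifetime. A hypothetical `TreeRepository` class stands in for the shared tree store acting as a global accumulator. Each worker walks the repository's list of open nodes at its own pace and adds its local split counts for each node. As soon as every worker has reported on a node, the repository finalizes it (split or leaf) and appends any children for workers to pick up. This only illustrates the communication pattern. It does not reflect how Vowpal Wabbit or Giraph are actually implemented.

```python
import threading
from collections import Counter

def gini(counts):
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0

def best_split(counts, thresholds):
    """Lowest weighted-Gini (feature, threshold) among splits with two non-empty sides."""
    best, best_score = None, float("inf")
    for f, ts in thresholds.items():
        for t in ts:
            sides = [Counter({k[4]: c for k, c in counts.items()
                              if k[0] == "s" and k[1:4] == (f, t, s)})
                     for s in (True, False)]
            n = [sum(side.values()) for side in sides]
            if 0 in n:
                continue
            score = (n[0] * gini(sides[0]) + n[1] * gini(sides[1])) / sum(n)
            if score < best_score:
                best, best_score = (f, t), score
    return best

class TreeRepository:
    """Hypothetical shared tree store that doubles as a global accumulator.
    A node is identified by its path: a tuple of (feature, threshold, went_left)."""

    def __init__(self, n_workers, thresholds, max_depth):
        self.n_workers, self.thresholds, self.max_depth = n_workers, thresholds, max_depth
        self.cond = threading.Condition()
        self.open_nodes = [()]        # append-only, so all workers see the same order
        self.stats = {(): Counter()}  # path -> counts summed over all workers
        self.reports = {(): 0}        # path -> number of workers that have reported
        self.splits = {}              # path -> ("split", f, t) or ("leaf", label)
        self.finalized = 0

    def next_node(self, cursor):
        """Block until open node number `cursor` exists; None once the tree is done."""
        with self.cond:
            while cursor >= len(self.open_nodes) and self.finalized < len(self.open_nodes):
                self.cond.wait()
            return self.open_nodes[cursor] if cursor < len(self.open_nodes) else None

    def submit(self, path, counts):
        """Accumulate one worker's counts; finalize the node once all have reported."""
        with self.cond:
            self.stats[path].update(counts)
            self.reports[path] += 1
            if self.reports[path] == self.n_workers:
                self._finalize(path)
            self.cond.notify_all()

    def _finalize(self, path):
        counts = self.stats.pop(path)
        labels = Counter({k[1]: c for k, c in counts.items() if k[0] == "y"})
        split = None
        if len(path) < self.max_depth and len(labels) > 1:
            split = best_split(counts, self.thresholds)
        if split is None:
            self.splits[path] = ("leaf", labels.most_common(1)[0][0] if labels else None)
        else:
            self.splits[path] = ("split", *split)
            for went_left in (True, False):
                child = path + ((*split, went_left),)
                self.stats[child], self.reports[child] = Counter(), 0
                self.open_nodes.append(child)
        self.finalized += 1

def worker(repo, shard):
    """A long-lived 'mapper': keeps its shard in memory and serves every node in turn."""
    cursor = 0
    while (path := repo.next_node(cursor)) is not None:
        counts = Counter()
        for x, y in shard:
            if all((x[f] <= t) == left for f, t, left in path):  # record reaches node
                counts[("y", y)] += 1
                for f, ts in repo.thresholds.items():
                    for t in ts:
                        counts[("s", f, t, x[f] <= t, y)] += 1
        repo.submit(path, counts)
        cursor += 1

def predict(splits, x):
    path = ()
    while splits[path][0] == "split":
        _, f, t = splits[path]
        path += ((f, t, x[f] <= t),)
    return splits[path][1]
```

To try it, start one `threading.Thread(target=worker, args=(repo, shard))` per shard and join them all. `repo.splits` then holds the finished tree. No worker ever waits for a global barrier, only for nodes it has not yet seen. This is what separates the pattern from one map-reduce job per pass.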

On Fri, Jan 25, 2013 at 6:58 PM, Andy Twigg <[email protected]> wrote:

> Hi,
>
> I'm new to this list so I apologise if this is covered elsewhere (but
> I couldn't find it.)
>
> I'm looking at the Random Forests implementations, both mapreduce
> ("partial") and non-distributed. Both appear to require the data
> loaded into memory. Random forests should be straightforward to
> construct with multiple passes through the data without storing the
> data in memory. Is there such an implementation in Mahout? If not, is
> there a ticket/plan?
>
> Thanks,
> Andy
>
>
> --
> Dr Andy Twigg
> Junior Research Fellow, St John's College, Oxford
> Room 351, Department of Computer Science
> http://www.cs.ox.ac.uk/people/andy.twigg/
> [email protected] | +447799647538
>
