Without the -p option, BuildForest uses the in-memory variant: the dataset is fully loaded into memory on every computing node.
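
To make the difference concrete, here is a toy Python sketch. It is not Mahout code: the function names and the round-robin split are made up for illustration, and the idea that -p gives each mapper only its own slice of the data is my understanding of the partial implementation, not something taken from Mahout's sources.

```python
def split_trees(n_trees, n_workers):
    """Spread n_trees as evenly as possible over n_workers."""
    base, extra = divmod(n_trees, n_workers)
    return [base + (1 if w < extra else 0) for w in range(n_workers)]


def assign_in_memory(dataset, n_workers, n_trees):
    """In-memory style (no -p): every worker holds a full copy of the dataset."""
    shares = split_trees(n_trees, n_workers)
    return [
        {"worker": w, "rows": list(dataset), "trees": shares[w]}
        for w in range(n_workers)
    ]


def assign_partial(dataset, n_workers, n_trees):
    """Partial style (assumed meaning of -p): each worker holds only its slice."""
    shares = split_trees(n_trees, n_workers)
    return [
        {"worker": w, "rows": dataset[w::n_workers], "trees": shares[w]}
        for w in range(n_workers)
    ]


dataset = list(range(10))  # stand-in for ten data rows
in_memory_plan = assign_in_memory(dataset, n_workers=3, n_trees=10)
partial_plan = assign_partial(dataset, n_workers=3, n_trees=10)
```

In the in-memory plan every worker needs enough RAM for all ten rows, while in the partial plan each worker only sees three or four of them. That is why the in-memory variant only works when the whole dataset fits on each node.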
Without the -mr option, Mahout will still use Hadoop's commands to access the files, but I think it won't require a Hadoop cluster if the file is not on HDFS. You'll have to give it a try, though. In any case, it's easy to set up Hadoop in local mode (just take a look at Hadoop's website).

RandomForests use Hadoop's DistributedCache, a mechanism that can copy the data onto all computing nodes so that every mapper gets access to it. So yes, when using -mr without -p, Hadoop will copy the dataset to all computing nodes.

One last piece of information: Mahout's RandomForests are not meant to be used without a real computing cluster. If you want to use RandomForests on a single machine, I think Weka's implementation is better suited.

On Sat, Jul 16, 2011 at 8:26 AM, XiaoboGu <[email protected]> wrote:

> Hi,
>
> If call BuildForest without the -p option, then what algorithm is used?
>
> Regarding to the -mr option of TestForest, there are two senarioes:
> 1. If -i option is supplied with a HDFS file or path URL, will Mahout use
> Hadoop to do the classification even if without the -mr option?
> 2.If -I option is supplied with a local file path, then what does the -mr
> option will do, copy the file into the configed Hadoop cluster, or launch a
> local Hadoop instance?
>
> Regards,
>
> Xiaobo Gu
