Re: What does the -p and -mr option of BuildForest and TestForest mean

deneche abdelhakim Sat, 16 Jul 2011 23:01:25 -0700

Without the -p option, BuildForest uses the in-memory variation: the dataset is
fully loaded into memory on every computing node.
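
To make the difference concrete, here is a minimal sketch in plain Python (not Mahout's code). It assumes that in the in-memory variation every mapper sees the whole dataset and builds a share of the trees, while with -p each mapper only sees its own slice of the data. The names plan, tree_share and bootstrap are hypothetical, and the even split of rows and trees is an assumption made for illustration.

```python
import random


def tree_share(n_trees, n_mappers, m):
    """Number of trees mapper m builds when n_trees are spread evenly."""
    return n_trees // n_mappers + (1 if m < n_trees % n_mappers else 0)


def bootstrap(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap sample per tree)."""
    return [rng.choice(rows) for _ in rows]


def plan(dataset, n_mappers, n_trees, partial, seed=0):
    """Describe which rows each mapper can see and sample from.

    partial=False: every mapper holds the full dataset (in-memory variation).
    partial=True:  every mapper holds only a contiguous slice (assumed split).
    """
    rng = random.Random(seed)
    result = []
    for m in range(n_mappers):
        if partial:
            start = m * len(dataset) // n_mappers
            end = (m + 1) * len(dataset) // n_mappers
            visible = dataset[start:end]
        else:
            visible = list(dataset)
        share = tree_share(n_trees, n_mappers, m)
        samples = [bootstrap(visible, rng) for _ in range(share)]
        result.append({"mapper": m, "visible": visible, "samples": samples})
    return result
```

The point of the sketch is the memory cost: with the in-memory variation each node must be able to hold the entire dataset, whereas with a partitioned variation each node only needs room for its slice.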

Without the -mr option, Mahout will still use Hadoop's commands to access the
files, but I think it won't require a Hadoop cluster if the file is not on
HDFS. You'll have to give it a try, though. In any case, it's easy to set up
Hadoop in local mode (just take a look at Hadoop's website).

RandomForests use Hadoop's DistributedCache, a mechanism that copies the data
onto all computing nodes so that every mapper gets access to it. So yes, when
using -mr without -p, Hadoop will copy the dataset onto all computing nodes.
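
The copy-to-every-node idea can be sketched as follows. This is a local simulation in plain Python, not Hadoop's DistributedCache API: each "node" is just a directory, and distribute and mapper_row_count are hypothetical helper names introduced for illustration.

```python
import shutil
from pathlib import Path


def distribute(dataset_path, node_dirs):
    """Copy the dataset file into every node's local directory.

    Returns the path of the local copy on each node, in node order.
    """
    local_copies = []
    for node_dir in node_dirs:
        node_dir = Path(node_dir)
        node_dir.mkdir(parents=True, exist_ok=True)
        # shutil.copy returns the path of the newly created file
        local_copies.append(Path(shutil.copy(dataset_path, node_dir)))
    return local_copies


def mapper_row_count(local_path):
    """A stand-in mapper: read the node-local copy and count non-empty rows."""
    with open(local_path) as f:
        return sum(1 for line in f if line.strip())
```

Every mapper then reads its own local copy instead of the shared original, which is why the whole dataset ends up on each node.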

One last piece of information: Mahout's RandomForests are not meant to be used
without a real computing cluster. If you want to use RandomForests on a single
machine, I think Weka's implementation is better suited.

On Sat, Jul 16, 2011 at 8:26 AM, XiaoboGu <[email protected]> wrote:

> Hi,
>
> If I call BuildForest without the -p option, then what algorithm is used?
>
> Regarding the -mr option of TestForest, there are two scenarios:
> 1. If the -i option is supplied with an HDFS file or path URL, will Mahout use
> Hadoop to do the classification even without the -mr option?
> 2. If the -i option is supplied with a local file path, then what will the -mr
> option do: copy the file into the configured Hadoop cluster, or launch a
> local Hadoop instance?
>
> Regards,
>
> Xiaobo Gu
>
>
