Re: Some ideas for Mahout 0.5

Ted Dunning Mon, 04 Oct 2010 11:53:18 -0700

On Mon, Oct 4, 2010 at 11:44 AM, deneche abdelhakim <[email protected]> wrote:


> For Decision Forests, my goal for 0.5 is to add a 'full'
> implementation. Meaning, an implementation that can build random
> forests using the whole dataset, even if it's split among many
> machines. I found the following paper to be very interesting:
> http://www.cba.ua.edu/~mhardin/rainforest.pdf
> although the described approach doesn't work as is for numerical
> attributes.
>

Very cool.

I would love it if DF became a first class Mahout classifier.

As well as scaling up, it would be very nice if there were a model
compression step to help with the deployment of DF models.

>
> The implementation should at least work for the following dataset:
>
> http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2304&categoryID=248
> it's 50 GB, and a small subset is available in UCI. It contains only
> categorical attributes, and it's big enough to be a good candidate.
>

Which UCI dataset is this? The income > $50K one?

Does the AWS dataset have household income?
